CN116310736A - Image processing method, system, readable medium and electronic device - Google Patents


Info

Publication number
CN116310736A
CN116310736A CN202310311068.0A
Authority
CN
China
Prior art keywords
detection frame
probability
target detection
coordinates
category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310311068.0A
Other languages
Chinese (zh)
Inventor
高薇
朱晨阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ARM Technology China Co Ltd
Original Assignee
ARM Technology China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ARM Technology China Co Ltd filed Critical ARM Technology China Co Ltd
Priority to CN202310311068.0A priority Critical patent/CN116310736A/en
Publication of CN116310736A publication Critical patent/CN116310736A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G06N 3/08 Learning methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 10/84 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using probabilistic graphical models from image or video features, e.g. Markov models or Bayesian networks

Abstract

The application relates to the field of artificial intelligence and discloses an image processing method, system, readable medium, and electronic device. The method comprises: determining a plurality of detection frames in an image to be identified and calculating at least one category probability for each detection frame, where the category probability represents the probability that the image in the detection frame belongs to the corresponding category; selecting, from the plurality of detection frames, a target detection frame whose category probability satisfies a probability condition; acquiring the coordinates of the selected target detection frame; and processing the target detection frame based on those coordinates. In this way, the electronic device does not need to acquire the coordinates of every detection frame up front; it only needs to acquire the category probabilities first, which reduces the amount of data to be acquired so that the data can be stored in the cache. The electronic device can then read the data to be processed from the cache, improving the efficiency with which the processor acquires data.

Description

Image processing method, system, readable medium and electronic device
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to an image processing method, system, readable medium and electronic device.
Background
The electronic device can perform target detection on an image to be identified: it divides the image into a plurality of detection frames, extracts the image features in each detection frame, and identifies the category probability that each detection frame belongs to each category. The electronic device then acquires the coordinates of each detection frame and the category probabilities of each detection frame for each category, screens out the detection frames that satisfy the conditions according to their category probabilities, and performs subsequent coordinate conversion and other processing on the qualifying detection frames.
However, when the number of detection frames and/or the number of categories is large, the amount of input data that must be acquired during the screening process is large. The cache of the electronic device usually has a small storage capacity, so the required data cannot be stored in the cache and can only be stored in memory, which has a larger capacity. The electronic device can then only read the data to be operated on from the memory, whose read speed is slower, which reduces the image processing speed.
Disclosure of Invention
In view of this, embodiments of the present application provide an image processing method, system, readable medium, and electronic device.
In a first aspect, an embodiment of the present application provides an image processing method applied to an electronic device, the method including: determining a plurality of detection frames in an image to be identified, and calculating at least one category probability for each detection frame, where the category probability represents the probability that the image in the detection frame belongs to the corresponding category; selecting, from the plurality of detection frames, a target detection frame whose category probability satisfies a probability condition; acquiring the coordinates of the selected target detection frame; and processing the target detection frame based on the coordinates of the target detection frame.
With the method provided by the embodiments of the application, the electronic device does not need to acquire the coordinates of the detection frames up front; it only needs to acquire the category probabilities first, which reduces the amount of data to be acquired so that the data can be stored in the cache. In the subsequent processing, only the coordinate information of the detection frames meeting the preset condition is acquired, and the coordinate information of the other detection frames need not be acquired, further reducing the data to be acquired so that it, too, can be stored in the cache. In particular, when there are few target detection frames, this avoids having to store in the cache the coordinates of a large number of detection frames that fail the screening condition. The electronic device can then read the data to be processed from the cache, improving the efficiency with which the processor acquires data.
In a possible implementation of the first aspect described above, the probability condition includes that the category probability is greater than a probability threshold.
In a possible implementation of the first aspect, the method further includes: synchronously determining, based on a vector instruction set, whether a plurality of category probabilities satisfy the probability condition; determining a category probability that satisfies the probability condition as a target category probability; and determining the detection frame corresponding to the target category probability as the target detection frame.
In one possible implementation of the first aspect, the coordinates of the target detection frame include predicted correction values of the coordinates of the target detection frame, and the method further includes: and determining the coordinates of the target detection frame after correction based on the coordinates of the target detection frame and the prediction correction value.
In one possible implementation of the first aspect, the coordinates of the target detection frame include coordinates of a center point of the target detection frame; the method further comprises the steps of: and determining the coordinates of the vertex of the target detection frame based on the coordinates of the center point of the target detection frame.
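The center-point-to-vertex conversion mentioned above can be sketched minimally as follows, assuming detection frames are parameterized as (center x, center y, width, height); the patent does not fix the exact parameterization, so this is only one common convention:

```python
def center_to_vertices(cx, cy, w, h):
    """Convert a detection frame given by its center point and size to
    its top-left and bottom-right vertex coordinates (an assumed, common
    convention; the patent does not specify the formula)."""
    x1, y1 = cx - w / 2.0, cy - h / 2.0  # top-left vertex
    x2, y2 = cx + w / 2.0, cy + h / 2.0  # bottom-right vertex
    return (x1, y1), (x2, y2)
```

For example, a frame centered at (10, 10) with width 4 and height 6 has vertices (8, 7) and (12, 13).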
In one possible implementation of the first aspect, a neural network model runs on the electronic device, where the neural network model includes a hidden layer and a post-processing layer. The hidden layer is configured to determine a plurality of detection frames in the image to be identified and calculate at least one category probability for each detection frame, where the category probability represents the probability that the image in the detection frame belongs to the corresponding category. The post-processing layer is configured to select, from the plurality of detection frames, a target detection frame whose category probability satisfies the probability condition, acquire the coordinates of the selected target detection frame, and process the target detection frame based on the coordinates of the target detection frame.
In a second aspect, an embodiment of the present application provides an image processing system, where the system includes a processor, a memory, and a cache. The processor is configured to determine a plurality of detection frames in an image to be identified and calculate at least one category probability for each detection frame, where the category probability indicates the probability that the image in the detection frame belongs to the corresponding category; the processor is further configured to select, from the plurality of detection frames, a target detection frame whose category probability satisfies the probability condition; the memory is configured to write the coordinates of the target detection frame into the cache based on the target detection frame; and the processor is further configured to acquire the coordinates of the selected target detection frame from the cache and process the target detection frame based on the coordinates of the target detection frame.
In a possible implementation of the above second aspect, the probability condition includes that the category probability is greater than a probability threshold.
In one possible implementation of the second aspect, the processor is further configured to synchronously determine, based on a vector instruction set, whether a plurality of category probabilities satisfy the probability condition, determine a category probability that satisfies the probability condition as a target category probability, and determine the detection frame corresponding to the target category probability as the target detection frame.
In a possible implementation of the second aspect, the processor is further configured to write category probabilities of the plurality of detection frames into the memory; the memory is also used for writing the category probabilities of the plurality of detection frames into the cache; the processor is further configured to read class probabilities of the plurality of detection frames from the cache.
In a possible implementation of the second aspect, the memory is further configured to determine, from the class probabilities, the class probabilities that belong to the same class, and to write the class probabilities belonging to the same class into the cache cache line by cache line.
In one possible implementation of the second aspect, the cache includes a first sub-cache and a second sub-cache: while the memory writes class probabilities of detection frames into the second sub-cache, the processor reads class probabilities of detection frames from the first sub-cache, and vice versa.
In one possible implementation of the second aspect, the coordinates of the target detection frame include a prediction correction value of the coordinates of the target detection frame; the processor is further configured to obtain a predicted correction value of the target detection frame from the cache, and determine coordinates of the target detection frame after correction based on the coordinates of the target detection frame and the predicted correction value.
In one possible implementation of the above second aspect, the coordinates of the target detection frame include coordinates of a center point of the target detection frame; the processor is further configured to determine coordinates of vertices of the target detection frame based on coordinates of the center point of the target detection frame.
In a possible implementation of the second aspect, the processor is further configured to run a neural network model, where the neural network model includes a hidden layer and a post-processing layer. The hidden layer is configured to determine a plurality of detection frames in the image to be identified and calculate at least one category probability for each detection frame, where the category probability indicates the probability that the image in the detection frame belongs to the corresponding category. The post-processing layer is configured to select, from the plurality of detection frames, a target detection frame whose category probability satisfies the probability condition, acquire the coordinates of the selected target detection frame, and process the target detection frame based on the coordinates of the target detection frame.
In a third aspect, embodiments of the present application provide a readable medium having instructions embodied therein, which when executed by a processor of an electronic device, cause the electronic device to implement any one of the image processing methods provided in the first aspect and various possible implementations of the first aspect.
In a fourth aspect, embodiments of the present application provide an electronic device, including: a memory for storing instructions for execution by one or more processors of the electronic device; and a processor, one of the processors of the electronic device, for executing instructions to cause the electronic device to implement any one of the image processing methods provided in the first aspect and various possible implementations of the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product comprising instructions that, when executed by an electronic device, cause the electronic device to implement any one of the image processing methods provided in the first aspect and various possible implementations of the first aspect.
Drawings
FIG. 1 illustrates a schematic diagram of a target detection result, according to some embodiments of the present application;
FIG. 2 illustrates a schematic diagram of a deep neural network, according to some embodiments of the present application;
FIG. 3 illustrates a schematic diagram of a priori block generated by target detection, in accordance with some embodiments of the present application;
FIG. 4 illustrates a flow diagram of post-processing layer work, according to some embodiments provided herein;
FIG. 5 illustrates a schematic diagram of a post-processing layer reading and writing data, according to some embodiments provided herein;
FIG. 6 illustrates a flow diagram of another post-processing layer operation, according to some embodiments provided herein;
FIGS. 7A-7C illustrate schematic diagrams of an image processing system, according to some embodiments of the present application;
FIG. 8 illustrates a schematic diagram of a cached storage structure, according to some embodiments of the present application;
FIG. 9 illustrates a schematic diagram of a structure for caching write data, according to some embodiments of the present application;
FIG. 10 illustrates another schematic diagram of a structure for caching write data, according to some embodiments of the present application;
FIG. 11 illustrates a flow diagram of an image processing method, according to some embodiments of the present application;
FIG. 12 illustrates a schematic diagram of reading data from a cache and writing data, according to some embodiments of the present application;
FIG. 13 illustrates a schematic diagram of an image processing apparatus, according to some embodiments of the present application;
FIG. 14 illustrates a schematic structural diagram of an electronic device, according to some embodiments of the present application.
Detailed Description
Illustrative embodiments of the present application include, but are not limited to, image processing methods, systems, readable media, and electronic devices.
The electronic device can identify and classify target objects in images, a capability applicable to fields such as face recognition, intelligent video surveillance, autonomous driving, and unmanned supermarkets. The identification and classification of images may specifically be performed by deep neural networks (deep neural network, DNN), convolutional neural networks (convolutional neural networks, CNN), and the like.
For example, when applied to intelligent video surveillance, the electronic device first acquires image data from a camera, then extracts feature information from the image and, through that feature information, determines the category and position of the target object in the image. For example, by performing target recognition on the image shown in fig. 1, the DNN may identify the coordinates of the detection frames shown in fig. 1 and the class of the target in each detection frame: the class of the target in detection frame 1 is "cloud", in detection frame 2 "dog", and in detection frame 3 "tree".
The following describes the object-detection procedure in image processing in terms of the structure of a DNN.
As shown in fig. 2, the neural network inside the DNN may be divided into an input layer, a hidden layer, a post-processing layer, and an output layer. Generally, the first layer is the input layer, the middle layers comprise hidden layers 1 to L and a post-processing layer, and the last layer is the output layer, where L is a positive integer.
The input layer is used for acquiring input image data and preprocessing the image data. In some embodiments, the input layer may perform a series of preprocessing steps on the input data, such as image enhancement or contrast-limited adaptive histogram equalization (contrast limited adaptive histogram equalization, CLAHE) on the image acquired from the camera, and so on.
The hidden layer is used for dividing the image into a plurality of detection frames, extracting the features in each detection frame, and identifying the probability that each detection frame belongs to each category, so as to obtain a preliminary identification result.
The method specifically comprises the following steps: the electronic device places anchor points (anchors) in the image at a fixed step length and generates a plurality of detection frames of different sizes at each anchor point. As shown in fig. 3, U anchor points are set in the figure with a width spacing B and a height spacing W, where W, B, and U are positive integers. Then, centered on each of the U anchor points, R detection frames are generated, where R is a positive integer, yielding V detection frames at different positions in the image; it should be understood that V = U × R. Illustratively, taking anchor point A as an example, 3 detection frames are generated centered on anchor point A: detection frame A1, detection frame A2, and detection frame A3.
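The anchor-and-box generation step above can be sketched as follows; the image size, grid stride, and box sizes here are illustrative values, not values from the patent:

```python
import numpy as np

def generate_boxes(img_w, img_h, step_w, step_h, sizes):
    """Place anchor points on a fixed grid (stride step_w x step_h) and
    generate one detection frame per entry of `sizes` around each anchor,
    so V = U * R frames in total (U anchors, R sizes per anchor).
    Returns frames as (cx, cy, w, h) rows."""
    boxes = []
    for cy in range(step_h // 2, img_h, step_h):      # anchor rows
        for cx in range(step_w // 2, img_w, step_w):  # anchor columns
            for (w, h) in sizes:                      # R frames per anchor
                boxes.append((cx, cy, w, h))
    return np.array(boxes, dtype=np.float32)
```

For a 64 × 64 image with stride 32 and 3 sizes per anchor, this produces U = 4 anchors and V = 4 × 3 = 12 detection frames.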
The electronic device then extracts the image features in each detection frame, which reflect the color features, texture features, shape features, spatial relationship features, etc. of the objects in the image. For example, the electronic device may extract histogram of oriented gradients (histogram of oriented gradient, HOG) features or Haar-like features of each detection frame.
The electronic device further obtains, through a classification model and according to the image features of each detection frame, the probability that the image in the detection frame belongs to each category. For example, the electronic device presets C categories, where C is a positive integer, and uses the classification model to obtain the probability that the image in each detection frame belongs to each of the C categories. For example, the probabilities that the V detection frames respectively belong to category 1 form the vector s_class1 = [s_(a,1), s_(b,1), …, s_(V,1)].
The post-processing layer is used for screening the preliminary identification result: it selects the detection frames meeting the conditions from the plurality of detection frames, decodes the screened detection frames by mapping their coordinates into coordinates in the input image, and sends the decoded detection frames to the output layer as the final identification result.
The method specifically comprises the following steps: the electronic device compares the category probability of each detection frame for each category with a threshold delta. When a detection frame whose category probability is greater than the threshold delta exists, this indicates that the input image contains a target object of that category, and the position of the target object is the position corresponding to that detection frame. Finally, the coordinate information of each detection frame containing a target object is decoded, i.e., converted into coordinate information in the input image, and the decoded coordinate information of the detection frame, together with the corresponding category and category probability score, is transmitted to the output layer.
For example, the electronic device first compares the category-1 probabilities of the detection frames, s_class1 = [s_(a,1), s_(b,1), …, s_(V,1)], with the threshold delta. If there exists a detection frame whose probability of belonging to category 1 is greater than the threshold delta, that detection frame contains an object of category 1. If no detection frame's probability of belonging to category 1 is greater than the threshold delta, no object of that category exists in the image.
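The per-category threshold comparison above might look like the following NumPy sketch; the probability values and threshold are invented for illustration, and s_class1 holds the category-1 probabilities of V = 4 detection frames:

```python
import numpy as np

# Category-1 probabilities of V = 4 detection frames (invented values);
# the description writes this vector as s_class1.
s_class1 = np.array([0.12, 0.91, 0.30, 0.77])
delta = 0.5  # screening threshold

hits = s_class1 > delta                 # compare every probability with delta
box_indices = np.flatnonzero(hits)      # frames that contain a category-1 object
has_category_1 = bool(hits.any())       # does the image contain category 1 at all?
```

Here frames 1 and 3 (zero-based) exceed the threshold, so the image contains category-1 objects at those two positions.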
The output layer is used for outputting the recognition result obtained by the post-processing layer. For example, after the electronic device finishes recognizing the image, it may further label the target objects in the image according to the coordinates and categories of the detection frames, and output the labeled image.
Based on the above object detection process, the present application provides an image processing method applied to an electronic device. The electronic device obtains the coordinates of each detection frame, the category probability score of each detection frame for each category, an index value (class_index) corresponding to each category, and an index value (box_index) of each detection frame. The total number of category probability scores is the product of the number of detection frames and the number of categories, i.e., V × C. Then, the electronic device reads the class probabilities of each detection frame in turn according to class_index, screens whether each class probability of the detection frame is greater than the threshold delta, and, for detection frames meeting the screening condition, reads the coordinates according to box_index and performs subsequent decoding and other processing.
That is, as shown in fig. 4, the electronic device stores the category probabilities and coordinates of all the detection frames in the memory. The processor first reads the class probabilities of detection frame 1, which comprise the class probabilities corresponding to the C classes, and determines whether any of them is greater than the threshold. If such a class probability exists, the processor reads the coordinates of detection frame 1 and performs decoding and other processing on it. The same operation is then performed for the next detection frame 2, and so on, until all detection frames 1 to V have been processed.
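The baseline per-frame loop of fig. 4 can be sketched as below; read_coords and decode are hypothetical stand-ins for the memory read and the coordinate decoding steps, not APIs from the patent:

```python
def baseline_postprocess(class_probs, read_coords, decode, delta):
    """Baseline flow of fig. 4: walk the V detection frames one by one;
    only when a frame has some category probability above delta are its
    coordinates fetched and decoded."""
    results = []
    for box_index, probs in enumerate(class_probs):  # frames 1..V
        if max(probs) > delta:                       # screen this frame
            coords = read_coords(box_index)          # read its coordinates
            results.append(decode(coords))           # decode and keep it
    return results
```

The point of the loop is its access pattern: probabilities and coordinates of every frame are read interleaved, frame by frame, which is what the proposed method later avoids.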
Further, as shown in fig. 5, the memory 130 stores the class probability and coordinates of each detection frame, together with an index value for reading the class probabilities (class probability index value) and an index value for reading the detection frame coordinates (detection frame index value). The processor 110 must repeatedly read the class probabilities of detection frames 1 to V from the memory 130 and screen them with the probability screening module, reading the coordinates only of the detection frames that meet the screening condition.
That is, the electronic device acquires all the data of every detection frame and then repeatedly traverses class_index and box_index, screening and decoding one detection frame before screening and decoding the next.
However, when the number of detection frames and/or the number of categories is large, that is, when the values of V and/or C are large, the amount of data that the post-processing layer needs to acquire is large. The cache of the electronic device usually has a small storage capacity, so the post-processing-layer data cannot be stored in the cache and can only be stored in the larger memory. In particular, when few detection frames pass the screening, the detection frames that do not meet the screening condition occupy a large amount of storage space.
Here, the cache generally uses static random-access memory (SRAM) technology while the memory generally uses dynamic random-access memory (dynamic random access memory, DRAM) technology, so the electronic device can read data from the cache faster than from system memory. When the DNN data volume is large, the electronic device can only read the data to be operated on from the slower memory, which reduces the DNN running speed.
To solve the problem that a large post-processing-layer input slows down DNN execution, the application further provides an image processing method applied to an electronic device. The electronic device first obtains the category probabilities of the categories to which the detection frames respectively belong, then synchronously screens the plurality of category probabilities to obtain the target category probabilities meeting a preset condition. It then acquires the coordinates of the target detection frames corresponding to the target category probabilities, and decodes the acquired target detection frames. The preset condition may be that the category probability is greater than a probability threshold.
Therefore, the electronic equipment does not need to acquire the coordinates of the detection frame first, only needs to acquire the category probability first, the data quantity required to be acquired is reduced, and the data can be stored in the cache. In the subsequent processing, only the coordinate information of the detection frame meeting the preset condition is acquired, the coordinate information of other detection frames is not required to be acquired, the data required to be acquired is reduced, and the data can be stored in a cache. Particularly, for the case of fewer target detection frames, the condition that a large number of detection frame coordinates which do not meet the screening conditions need to be stored in a cache is avoided. Furthermore, the electronic device can read the data to be processed from the cache, and the efficiency of the electronic device processor for acquiring the data can be improved.
In some embodiments, the electronic device may specifically screen multiple category probabilities synchronously through a vector instruction set, such as single instruction, multiple data (single instruction multiple data, SIMD) or very long instruction word (very long instruction word, VLIW).
As shown in fig. 6, the electronic device directly obtains the category probabilities of the plurality of detection frames, i.e., the category probabilities of detection frames 1 to V, where each detection frame has category probabilities for the C categories. The electronic device then synchronously screens category probability 1 to category probability V × C, determines the detection frames whose category probability is greater than the threshold, and acquires the coordinates of only those detection frames that meet the condition.
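A NumPy sketch of this screen-first order of operations; the single vectorized comparison over the whole V × C score matrix plays the role of the SIMD screening, and only the surviving frames' coordinates are touched. All names and values are illustrative, not from the patent:

```python
import numpy as np

def screen_then_fetch(scores, coords, delta):
    """Screen all V*C category probabilities in one vectorized
    comparison (the NumPy analogue of a SIMD compare), then fetch
    coordinates only for the detection frames that survive.
    `scores` is a (V, C) matrix, `coords` a (V, 4) array."""
    box_idx, _class_idx = np.nonzero(scores > delta)  # synchronous screen
    kept = np.unique(box_idx)                         # frames with any hit
    return kept, coords[kept]                         # only these coords are read
```

In contrast to the baseline loop, the coordinate array is indexed once, after screening, so frames that fail the threshold never have their coordinates loaded.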
Specifically, fig. 7A to 7C illustrate schematic structural diagrams of an image processing system provided in an embodiment of the present application.
As shown in fig. 7A, the memory 130 stores therein category probabilities and coordinates of the respective detection frames obtained by the hidden layer recognition, and an index value for reading the category probabilities (category probability index value) and an index value for reading the detection frame coordinates (detection frame index value).
The electronic device first stores the category probability scores and category probability index values, obtained by the hidden-layer recognition and held in the memory 130, into the cache 120. The processor 110 then reads the data directly from the cache 120 and screens out the target category probabilities above the probability threshold with the probability screening unit. The processor 110 then feeds back to the memory 130 the identifications of the target category probabilities.
As shown in fig. 7B, based on the received class probability identifications, the memory 130 sends the information of the target detection frames corresponding to the target class probabilities (the detection frame coordinates and detection frame index values) to the cache 120. The processor 110 then reads the target detection frame information from the cache 120 and decodes the detection frames.
In some embodiments, in addition to the coordinates of the detection frames that meet the preset condition, the electronic device may obtain the prediction correction values of the detection frames. A prediction correction value is obtained during model training and represents the error between the predicted detection frame and the position frame where the actual target object is located. As shown in fig. 7C, after receiving the identifications of the class probabilities screened by the processor 110, the memory 130 sends the detection frame coordinates, detection frame correction values, and detection frame index values corresponding to those identifications to the cache 120. The processor 110 first merges the detection frame coordinates with the detection frame correction values and then decodes the coordinates.
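One common convention treats such correction values as center offsets plus log-scale size factors; the sketch below assumes that convention purely for illustration — the application does not specify the exact form of its prediction correction values:

```python
import numpy as np

def merge_corrections(boxes, deltas):
    """Merge (cx, cy, w, h) boxes with assumed correction values:
    additive center offsets scaled by box size, multiplicative log-scale sizes."""
    cx = boxes[:, 0] + deltas[:, 0] * boxes[:, 2]
    cy = boxes[:, 1] + deltas[:, 1] * boxes[:, 3]
    w = boxes[:, 2] * np.exp(deltas[:, 2])
    h = boxes[:, 3] * np.exp(deltas[:, 3])
    return np.stack([cx, cy, w, h], axis=1)

boxes = np.array([[10.0, 10.0, 4.0, 4.0]])   # illustrative predicted box
deltas = np.array([[0.5, 0.0, 0.0, 0.0]])    # illustrative correction value
corrected = merge_corrections(boxes, deltas)  # center x shifts by 0.5 * w = 2
```

Because the merge is plain elementwise arithmetic, it maps directly onto the vectorized instruction sets mentioned earlier.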
In other embodiments, the electronic device may also divide the cache into a plurality of sub-cache regions. When the memory 130 writes data into one sub-cache region, the processor 110 can read and process another sub-cache region into which data has already been written; thus the memory 130 can continue writing data into the cache 120 while the processor 110 processes data.
As shown in fig. 8, the cache 120 divides its storage space into sub-cache 0 and sub-cache 1. While the memory 130 writes data into sub-cache 0, the processor 110 reads and processes the data of sub-cache 1; while the memory 130 writes data into sub-cache 1, the processor 110 reads and processes the data of sub-cache 0.
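The alternating scheme is essentially a ping-pong double buffer. The following is an illustrative software analogy using two Python threads and queues, not the hardware mechanism itself; all names are assumptions:

```python
import queue
import threading

def run_pipeline(chunks):
    """Writer fills one sub-buffer while the reader drains the other."""
    results = []
    full = queue.Queue()    # sub-buffer indices ready for the reader
    free = queue.Queue()    # sub-buffer indices free for the writer
    buffers = [None, None]  # the two sub-cache regions
    for i in (0, 1):
        free.put(i)

    def writer():
        for chunk in chunks:
            i = free.get()          # wait for a free sub-buffer
            buffers[i] = chunk      # "memory writes data into the cache"
            full.put(i)
        full.put(None)              # end-of-data marker

    def reader():
        while True:
            i = full.get()
            if i is None:
                break
            results.append(sum(buffers[i]))  # "processor reads and processes"
            free.put(i)             # hand the sub-buffer back to the writer

    t = threading.Thread(target=writer)
    t.start()
    reader()
    t.join()
    return results

out = run_pipeline([[1, 2], [3, 4], [5, 6]])  # → [3, 7, 11]
```

With only one buffer the writer and reader would strictly alternate; with two, writing chunk n+1 overlaps with processing chunk n, which is the throughput gain the text describes.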
In other embodiments, since the processor 110 reads data row by row, the memory 130 also writes the class probabilities into the cache 120 row by row, i.e., all the data in the same row belong to the same class. Thus, if the class probabilities of one class need to be read, the corresponding row or rows of data in the cache 120 can be read directly.
For example, if the class probabilities are stored by columns, the data in the same column are the class probabilities of the same class, and the data in the same row belong to the same detection frame. As shown in fig. 9 (A), the first column is the class probability that each detection box belongs to class 1. As shown in fig. 9 (B), the second column is the class probability that each detection box belongs to class 2.
However, the processor 110 reads data by rows. If the processor 110 is to read the class probabilities of class 1, it must read all the data and then take the first element of each row. In other words, the processor 110 has to acquire all the class probabilities, resulting in a longer time to acquire the data.
Therefore, in the image processing method provided in the present application, the memory 130 writes the category probability into the cache 120 according to the line. As shown in fig. 10 (a), the first row is the class probability that each detection box belongs to class 1. As shown in fig. 10 (B), the second row is the class probability that each detection box belongs to class 2. If the processor 110 were to read the class probability for class 1, the data for the first line would be read directly to obtain the desired data.
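The difference between the two layouts can be shown with a small array; the shapes and values below are assumptions chosen for clarity:

```python
import numpy as np

V, C = 4, 3
# Column layout (fig. 9): rows are detection frames, columns are classes,
# so the probabilities of class 1 are scattered across every row.
probs = np.arange(V * C, dtype=np.float32).reshape(V, C)
class1_from_columns = probs[:, 0]   # strided access touching all rows

# Row layout (fig. 10): rows are classes, columns are detection frames,
# so class 1 is one contiguous, cache-line-friendly row.
by_row = probs.T.copy()
class1_from_rows = by_row[0]        # a single contiguous read
```

Both reads recover the same values; the row layout simply makes the per-class read a single contiguous span instead of a stride through the whole array.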
In summary, the image processing method provided by the present application first screens the class probabilities and then acquires only the detection frame coordinates corresponding to the screened class probabilities. Second, the cache is divided into a plurality of sub-cache regions so that the memory and the processor can write and read data alternately: the memory keeps writing data while the processor keeps reading it. In addition, the method can screen the class probabilities of a plurality of classes synchronously, and when the memory writes data into the cache, the data are written by rows so that the processor can read them conveniently. In short, the method provided by the present application can reduce the amount of data the post-processing layer needs to acquire, allow the post-processing layer to operate using the cache, improve the data reading efficiency of the processor, and increase the DNN operation rate.
The detailed flow of the image processing method provided in the present application is described below. As shown in fig. 11, the method includes:
s1110: the memory 130 writes the class probabilities to the cache 120.
The memory 130 stores the results obtained after the hidden layer extracts features from and classifies the detection frames; specifically, as shown in fig. 7A, these include the probability that each detection frame belongs to each category, the coordinates of each detection frame, and the corresponding index values (the class probability index values and the detection frame index values). The memory 130 first writes the class probabilities and the class probability index values to the cache 120.
In some embodiments, the memory 130 may write the class probability to the cache 120 on a line-by-line basis, i.e., a cache line (cacheline). Each row of data corresponds to the same category, i.e. each row of data is the category probability that different detection boxes belong to the same category. It should be appreciated that the class probabilities for the same class may also be stored in multiple cachelines, as determined by the specifications of the cache 120. For example, reference may be made to fig. 10 and its associated description.
In some embodiments, the class probability and the coordinates of each detection frame stored in the memory 130 may be obtained by the processor 110 through implicit layer recognition, or may be obtained by the processor of other devices, which is not limited in this application.
S1120: the processor 110 reads the class probabilities from the cache 120.
The processor 110 reads the class probabilities from the cache 120 according to the class probability index value, wherein the processor 110 reads the class probabilities on a row-by-row basis.
In some embodiments, the cache 120 may also be divided into multiple sub-cache regions. For example, as shown in fig. 8, the cache 120 divides its storage space into sub-cache 0 and sub-cache 1. As shown in fig. 12, in the first time period the memory 130 writes data into sub-cache 0. In the second time period, the memory 130 writes data into sub-cache 1 while the processor 110 reads data from sub-cache 0. In the third time period, the memory 130 writes data into sub-cache 0 while the processor 110 reads the data of sub-cache 1. In the fourth time period, the memory 130 writes data into sub-cache 1 while the processor 110 reads data from sub-cache 0. The operations of the second and third time periods are then performed alternately.
In other embodiments, the sizes of the multiple sub-buffers of the buffer 120 may be different, and in particular, the sizes of the multiple sub-buffers may be adjusted based on the running time of each experiment according to multiple experiments.
S1130: the processor 110 screens the class probabilities for target class probabilities.
The processor 110 may use a vector instruction set to synchronously screen the acquired class probabilities, where the preset condition for screening may be that the class probability is greater than the probability threshold. The target class probabilities that meet the preset condition are thus obtained.
The processor 110 may filter all the class probabilities simultaneously, or may divide the class probabilities into a plurality of parts, and the processor 110 filters the class probabilities a plurality of times.
In some embodiments, the preset conditions for the class probabilities of different classes may be the same or different, i.e., the probability thresholds of different classes may also differ. It should be understood that the present application does not particularly limit the method by which the processor 110 screens the class probabilities of the detection frames.
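Per-class thresholds fit naturally into the vectorized screening via broadcasting; the probabilities and thresholds below are illustrative values:

```python
import numpy as np

# Rows are detection frames, columns are classes (illustrative values).
probs = np.array([[0.9, 0.2],
                  [0.4, 0.7],
                  [0.6, 0.8]])
# A different probability threshold per class, as the text allows.
thresholds = np.array([0.5, 0.75])

mask = probs > thresholds            # broadcasts one threshold per column/class
box_idx, cls_idx = np.nonzero(mask)  # frames 0 and 2 pass for class 1;
                                     # frame 2 also passes for class 2
```

A single broadcasted comparison still screens all frames and classes at once, so per-class thresholds cost nothing extra over a single global threshold.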
S1140: processor 110 sends an identification of the target class probability to memory 130.
After the processor 110 filters the target class probability, the processor may also send an identification of the target class probability, which may specifically be a class probability index value of the target class probability, to the memory 130.
In other embodiments, the electronic device may further repeatedly perform steps S1110 to S1140. The electronic device first identifies all the detection frames with the same classification model to obtain the class probability of each detection frame, and then uses one or more other classification models to identify the target detection frames corresponding to the target class probabilities obtained in step S1130, extracting more detailed features from the different types of detection frames.
S1150: the memory 130 writes the target detection frame coordinates corresponding to the target class probabilities to the cache 120.
The memory 130 determines a detection frame (target detection frame) corresponding to the target class probability according to the identifier of the target class probability sent by the processor 110, and then writes the coordinates of the target detection frame into the cache 120.
In some embodiments, the memory 130 may further store a prediction correction value for each detection frame, where the prediction correction value is obtained during model training and indicates the error between the predicted detection frame and the position frame where the actual target object is located. As shown in fig. 7C, after receiving the identifications of the target class probabilities sent by the processor 110, the memory 130 sends the detection frame coordinates, detection frame correction values, and detection frame index values corresponding to those identifications to the cache 120.
S1160: the processor 110 reads the target detection frame coordinates from the cache 120.
The processor 110 reads the coordinates of the target detection frames from the cache 120. The cache 120 may be divided into multiple sub-cache regions, for example as shown in fig. 8; the process in which the memory 130 writes the target detection frame coordinates into the cache 120 and the processor 110 reads them from the cache 120 may refer to the process shown in fig. 12.
S1170: the processor 110 processes the target detection frame.
The processor 110 will decode the target detection frame, converting its coordinates into the coordinate system of the input image. In some embodiments, the coordinates of the target detection frame stored in the memory 130 may be only the coordinates of the center point of the target detection frame together with its width and height, and when decoding the target detection frame, the processor 110 may convert the center point coordinates and the width and height into the coordinates of the vertices of the target detection frame.
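A minimal sketch of this decoding step, assuming the stored coordinates are (center x, center y, width, height) — the exact storage format is not fixed by the application:

```python
def decode_box(cx, cy, w, h):
    """Convert a (center, size) detection frame to its corner coordinates
    (x_min, y_min, x_max, y_max)."""
    half_w, half_h = w / 2, h / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

corners = decode_box(50.0, 40.0, 20.0, 10.0)  # → (40.0, 35.0, 60.0, 45.0)
```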
In some embodiments, the processor 110 may obtain the prediction correction value of the target detection frame in addition to its coordinates. The processor 110 may then adjust the coordinates of the target detection frame according to the prediction correction value, that is, merge the prediction correction value with the coordinates of the target detection frame, so that the merged coordinates are closer to the position of the actual target object. The processor 110 decodes the target detection frame only after the merging is complete.
In other embodiments, the process of adjusting the coordinates of the target detection frame by the processor 110 according to the prediction correction value may also use a vectorized instruction set, that is, merge the coordinates of multiple target detection frames at the same time.
In other embodiments, the processor 110 may also determine whether multiple target detection frames exist within one category, i.e., whether multiple class probabilities of the same category are greater than the probability threshold. The processor 110 may then calculate the overlap area between these target detection frames by a non-maximum suppression (non maximum suppression, NMS) algorithm, determining that two target detection frames with a larger overlap area identify the same target object, while two target detection frames with a smaller overlap area identify different target objects of the same category. One target detection frame is then chosen from the multiple target detection frames corresponding to the same target object to serve as that object's detection frame, so that each target object corresponds to exactly one target detection frame.
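A compact reference implementation of this NMS step, using intersection-over-union (IoU) as the overlap measure; the IoU threshold of 0.5 and all box values are illustrative assumptions:

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box of each heavily overlapping group."""
    order = np.argsort(scores)[::-1]   # indices, best score first
    keep = []
    while len(order) > 0:
        i = order[0]
        keep.append(int(i))
        # Suppress the remaining boxes that overlap box i too much.
        order = order[1:][[iou(boxes[i], boxes[j]) <= iou_threshold
                           for j in order[1:]]]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)  # boxes 0 and 1 overlap heavily → keep 0; box 2 survives
```

Boxes 0 and 1 have IoU ≈ 0.68, so they are judged to identify the same target object and only the higher-scoring one is kept; box 2 is far away and is treated as a different object.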
In summary, the image processing method provided by the present application first screens the class probabilities and then acquires only the detection frame coordinates corresponding to the screened class probabilities. Second, the cache is divided into a plurality of sub-cache regions so that the memory and the processor can write and read data alternately: the memory keeps writing data while the processor keeps reading it. In addition, the method can screen the class probabilities of a plurality of classes synchronously, and when the memory writes data into the cache, the data are written by rows so that the processor can read them conveniently. In short, the method provided by the present application can reduce the amount of data the post-processing layer needs to acquire, allow the post-processing layer to operate using the cache, improve the data reading efficiency of the processor, and increase the DNN operation rate.
The image processing apparatus provided in the present application is described below. As shown in fig. 13, the image processing apparatus 1300 includes a determination unit 1310, a selection unit 1320, and an acquisition unit 1330.
The determining unit 1310 is configured to determine a plurality of detection frames in the image to be identified, and calculate at least one class probability of each detection frame, where the class probability represents a probability that the image in the detection frame belongs to a corresponding class. The selecting unit 1320 is configured to select a target detection frame for which there is a category probability satisfying a probability condition from a plurality of detection frames, and the acquiring unit 1330 is configured to acquire coordinates of the selected target detection frame. The determining unit 1310 is further configured to process the target detection frame based on coordinates of the target detection frame.
In some embodiments, the probability conditions include a category probability greater than a probability threshold.
In other embodiments, the determining unit 1310 is further configured to determine, based on the vector instruction set, whether the plurality of class probabilities satisfy the probability condition; determining the class probability meeting the probability condition as the target class probability; and determining the detection frame corresponding to the target class probability as a target detection frame.
In other embodiments, the coordinates of the target detection frame include a predicted correction value of the coordinates of the target detection frame, and the determining unit 1310 is further configured to determine the corrected coordinates of the target detection frame based on the coordinates of the target detection frame and the predicted correction value.
In other embodiments, the coordinates of the target detection frame include coordinates of a center point of the target detection frame; the determining unit 1310 is further configured to determine coordinates of vertices of the target detection frame based on coordinates of the center point of the target detection frame.
In other embodiments, the electronic device is provided with a neural network model, where the neural network model includes an implicit layer and a post-processing layer, and the determining unit 1310 is further configured to determine a plurality of detection frames in the image to be identified through the implicit layer, and calculate at least one class probability of each detection frame, where the class probability represents a probability that the image in the detection frame belongs to a corresponding class; the selecting unit 1320 is further configured to select, from among a plurality of detection frames, a target detection frame for which there is a category probability satisfying the probability condition through the post-processing layer, and the acquiring unit 1330 is further configured to acquire, through the post-processing layer, coordinates of the selected target detection frame, and process the target detection frame based on the coordinates of the target detection frame.
In summary, the image processing apparatus provided by the present application first screens the class probabilities and then acquires only the detection frame coordinates corresponding to the screened class probabilities. Second, the cache is divided into a plurality of sub-cache regions so that the memory and the processor can write and read data alternately: the memory keeps writing data while the processor keeps reading it. In addition, the apparatus can screen the class probabilities of a plurality of classes synchronously, and when the memory writes data into the cache, the data are written by rows so that the processor can read them conveniently. In short, the apparatus provided by the present application can reduce the amount of data the post-processing layer needs to acquire, allow the post-processing layer to operate using the cache, improve the data reading efficiency of the processor, and increase the DNN operation rate.
Further, fig. 14 illustrates a schematic structural diagram of an electronic device 100, according to some embodiments of the present application. As shown in fig. 14, the electronic device 100 includes one or more processors 101A, a neural network processor (NPU) 101B, a system memory 102, a non-volatile memory (non-volatile memory, NVM) 103, a communication interface 104, an input/output (I/O) device 105, and system control logic 106 for coupling the processor 101A, the system memory 102, the non-volatile memory 103, the communication interface 104, and the input/output (I/O) device 105. Wherein:
the processor 101A may be the aforementioned processor 110 and may include one or more processing units, for example, processing modules or processing circuits such as a central processing unit (central processing unit, CPU), a graphics processing unit (graphics processing unit, GPU), a digital signal processor (digital signal processor, DSP), a microcontroller (microcontroller unit, MCU), an artificial intelligence (artificial intelligence, AI) processor, or a field programmable gate array (field programmable gate array, FPGA), and may include one or more single-core or multi-core processors.
The neural network processor 101B may be configured to implement reasoning of the neural network model and execute instructions corresponding to the operation method of the neural network model provided in the embodiments of the present application. The neural network processor 101B may be a separate processor or may be integrated within the processor 101A.
The system memory 102 may be the above-mentioned cache 120, specifically, a volatile memory, such as a random-access memory (RAM), a double data rate synchronous dynamic random access memory (double data rate synchronous dynamic random access memory, DDR SDRAM), etc. The system memory is used to temporarily store data and/or instructions, for example, in some embodiments, the system memory 102 may be used to store instructions related to the neural network model described above.
Nonvolatile memory 103 may be the memory 130 described above and may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions. In some embodiments, the nonvolatile memory 103 may include any suitable nonvolatile memory such as flash memory and/or any suitable nonvolatile storage device, for example, a Hard Disk Drive (HDD), compact Disc (CD), digital versatile disc (digital versatile disc, DVD), solid State Drive (SSD), and the like. In some embodiments, the nonvolatile memory 103 may also be a removable storage medium, such as a Secure Digital (SD) memory card or the like.
In particular, the system memory 102 and the nonvolatile memory 103 may each include a temporary copy and a permanent copy of instructions 107. When the instructions 107 are executed by at least one of the processor 101A and/or the neural network processor 101B, the electronic device 100 implements the image processing method provided in the embodiments of the present application; see fig. 11 and its related description for details.
The communication interface 104 may include a transceiver to provide a wired or wireless communication interface for the electronic device 100 to communicate with any other suitable device via one or more networks. In some embodiments, the communication interface 104 may be integrated with other components of the electronic device 100, e.g., the communication interface 104 may be integrated in the processor 101A. In some embodiments, the electronic device 100 may communicate with other devices through the communication interface 104, e.g., the electronic device 100 may obtain a neural network model to be run from the other electronic devices through the communication interface 104.
Input/output (I/O) devices 105 may include input devices such as a keyboard, mouse, etc., output devices such as a display, etc., through which a user may interact with electronic device 100.
The system control logic 106 may include any suitable interface controller to provide any suitable interface with other modules of the electronic device 100. For example, in some embodiments, the system control logic 106 may include one or more memory controllers to provide an interface to the system memory 102 and the non-volatile memory 103.
In some embodiments, at least one of the processors 101A may be packaged together with logic for one or more controllers of the system control logic 106 to form a system package (system in package, siP). In other embodiments, at least one of the processors 101A may also be integrated on the same chip with logic for one or more controllers of the system control logic 106 to form a system-on-chip (SoC).
It is understood that the electronic device 100 may be any electronic device capable of running a neural network model, including, but not limited to, a cell phone, a wearable device (e.g., a smart watch), a tablet, a desktop computer, a laptop, a handheld computer, a notebook, an ultra-mobile personal computer (ultra-mobile personal computer, UMPC), a netbook, a personal digital assistant (personal digital assistant, PDA), or an augmented reality (augmented reality, AR)/virtual reality (virtual reality, VR) device; the embodiments of the present application are not limited thereto.
It is to be understood that the configuration of the electronic device 100 shown in fig. 14 is merely an example, and in other embodiments, the electronic device 100 may include more or fewer components than shown, or may combine certain components, or may split certain components, or may have a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Embodiments of the mechanisms disclosed herein may be implemented in hardware, software, firmware, or a combination of these implementations. Embodiments of the present application may be implemented as a computer program or program code that is executed on a programmable system including at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Program code may be applied to input instructions to perform the functions described herein and generate output information. The output information may be applied to one or more output devices in a known manner. For purposes of this application, a processing system includes any system having a processor such as, for example, a digital signal processor (digital signal processor, DSP), microcontroller, application specific integrated circuit (application specific integrated circuit, ASIC), or microprocessor.
The program code may be implemented in a high level procedural or object oriented programming language to communicate with a processing system. Program code may also be implemented in assembly or machine language, if desired. Indeed, the mechanisms described in the present application are not limited in scope to any particular programming language. In either case, the language may be a compiled or interpreted language.
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed over a network or through other computer readable media.
Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including but not limited to floppy diskettes, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memories (read-only memory, ROMs), random access memories (random access memory, RAMs), erasable programmable read-only memories (erasable programmable read-only memory, EPROMs), electrically erasable programmable read-only memories (electrically erasable programmable read-only memory, EEPROMs), magnetic or optical cards, flash memory, or tangible machine-readable memory used to transmit information over the Internet in the form of an electrical, optical, acoustical, or other propagated signal (e.g., carrier waves, infrared signals, digital signals, etc.).
Thus, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
In the drawings, some structural or methodological features may be shown in a particular arrangement and/or order. However, it should be understood that such a particular arrangement and/or ordering may not be required. Rather, in some embodiments, these features may be arranged in a different manner and/or order than shown in the illustrative figures. Additionally, the inclusion of structural or methodological features in a particular figure is not meant to imply that such features are required in all embodiments, and in some embodiments, may not be included or may be combined with other features.
It should be noted that, in the embodiments of the present application, each unit/module is a logic unit/module, and in physical aspect, one logic unit/module may be one physical unit/module, or may be a part of one physical unit/module, or may be implemented by a combination of multiple physical units/modules, where the physical implementation manner of the logic unit/module itself is not the most important, and the combination of functions implemented by the logic unit/module is the key to solve the technical problem posed by the present application. Furthermore, to highlight the innovative part of the present application, the above-described device embodiments of the present application do not introduce units/modules that are less closely related to solving the technical problems presented by the present application, which does not indicate that the above-described device embodiments do not have other units/modules.
It should be noted that, in the examples and descriptions of this patent, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
While the present application has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present application.

Claims (17)

1. An image processing method applied to an electronic device, the method comprising:
determining a plurality of detection frames in an image to be identified, and calculating at least one category probability of each detection frame, wherein the category probability represents the probability that the image in the detection frame belongs to a corresponding category;
Selecting a target detection frame with a category probability meeting probability conditions from the plurality of detection frames;
acquiring coordinates of a selected target detection frame;
and processing the target detection frame based on the coordinates of the target detection frame.
2. The method of claim 1, wherein the probability condition comprises a category probability greater than a probability threshold.
3. The method of claim 1, wherein selecting a target detection box from the plurality of detection boxes for which there is a class probability that satisfies a probability condition, comprises:
based on the vector instruction set, synchronously determining whether the multiple category probabilities meet a probability condition;
determining the class probability meeting the probability condition as a target class probability;
and determining the detection frame corresponding to the target class probability as the target detection frame.
4. The method of claim 1, wherein the coordinates of the target detection frame include predicted correction values for the coordinates of the target detection frame,
the processing the target detection frame based on the coordinates of the target detection frame includes:
and determining the coordinates of the target detection frame after correction based on the coordinates of the target detection frame and the prediction correction value.
5. The method of any one of claims 1 to 4, wherein the coordinates of the target detection frame include coordinates of a center point of the target detection frame;
the processing the target detection frame based on the coordinates of the target detection frame includes:
and determining the coordinates of the vertex of the target detection frame based on the coordinates of the center point of the target detection frame.
6. The method of claim 1, wherein a neural network model runs on the electronic device, the neural network model comprising a hidden layer and a post-processing layer, wherein
the hidden layer is configured to determine a plurality of detection frames in the image to be recognized and calculate at least one category probability for each detection frame, where the category probability represents the probability that the image in the detection frame belongs to the corresponding category; and
the post-processing layer is configured to select, from the plurality of detection frames, a target detection frame having a category probability that satisfies a probability condition, acquire coordinates of the selected target detection frame, and process the target detection frame based on the coordinates of the target detection frame.
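The hidden-layer/post-processing split of claim 6 can be sketched as two stages: the first emits candidate boxes and per-class probabilities, the second applies the probability condition. Everything below (random stand-in outputs, shapes, threshold) is a hypothetical placeholder for a real network:

```python
import numpy as np

rng = np.random.default_rng(0)

def hidden_layer(image):
    # Stand-in for the network body: emits 100 candidate boxes
    # (cx, cy, w, h) and one probability per class (3 classes) per box.
    boxes = rng.uniform(0, 224, size=(100, 4))
    probs = rng.dirichlet(np.ones(3), size=100)  # rows sum to 1
    return boxes, probs

def post_process(boxes, probs, threshold=0.6):
    # Probability condition: keep frames with any category probability
    # above the threshold; return the frames and their best scores.
    keep = (probs > threshold).any(axis=1)
    return boxes[keep], probs[keep].max(axis=1)

boxes, probs = hidden_layer(None)          # image argument unused in this sketch
selected, scores = post_process(boxes, probs)
print(selected.shape, scores.shape)
```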
7. An image processing system comprising a processor, a memory, and a cache, wherein
the processor is configured to determine a plurality of detection frames in an image to be recognized and calculate at least one category probability for each detection frame, where the category probability represents the probability that the image in the detection frame belongs to the corresponding category;
the processor is further configured to select, from the plurality of detection frames, a target detection frame having a category probability that satisfies a probability condition;
the memory is configured to write coordinates of the target detection frame into the cache based on the target detection frame; and
the processor is further configured to acquire the coordinates of the selected target detection frame from the cache and process the target detection frame based on the coordinates of the target detection frame.
8. The system of claim 7, wherein the probability condition comprises a category probability greater than a probability threshold.
9. The system of claim 8, wherein
the processor is further configured to determine in parallel, based on a vector instruction set, whether a plurality of category probabilities satisfy the probability condition,
determine a category probability that satisfies the probability condition as a target category probability, and
determine the detection frame corresponding to the target category probability as the target detection frame.
10. The system of claim 7, wherein
the processor is further configured to write the category probabilities of the plurality of detection frames into the memory;
the memory is configured to write the category probabilities of the plurality of detection frames into the cache; and
the processor is further configured to read the category probabilities of the plurality of detection frames from the cache.
11. The system of claim 10, wherein
the memory is further configured to determine, among the category probabilities, category probabilities belonging to a same category, and
to write the category probabilities belonging to the same category into the cache by cache line.
12. The system of claim 10, wherein the cache comprises a first sub-cache region and a second sub-cache region,
while the memory writes category probabilities of detection frames into the first sub-cache region, the processor is configured to read category probabilities of detection frames from the second sub-cache region, and
while the memory writes category probabilities of detection frames into the second sub-cache region, the processor is configured to read category probabilities of detection frames from the first sub-cache region.
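The alternating read/write scheme of claim 12 is classic ping-pong (double) buffering: while one sub-cache region is being filled, the other is drained, and the roles swap each round so the reader never touches the region being written. A minimal sketch with made-up category probabilities (list buffers stand in for the two cache regions):

```python
buffers = [[], []]   # stand-ins for the first and second sub-cache regions
consumed = []

batches = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]]   # made-up category probabilities
for i, batch in enumerate(batches):
    write_idx = i % 2                 # region the "memory" writes this round
    read_idx = 1 - write_idx          # region the "processor" reads this round
    consumed.extend(buffers[read_idx])
    buffers[read_idx].clear()
    buffers[write_idx].extend(batch)

# After the loop, drain whatever the final round wrote.
consumed.extend(buffers[(len(batches) - 1) % 2])
print(consumed)
```

Each round reads only the region written in the previous round, so production and consumption could proceed concurrently without contending for the same region.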
13. The system of claim 7, wherein the coordinates of the target detection frame comprise predicted correction values for the coordinates of the target detection frame; and
the processor is further configured to acquire the predicted correction values of the target detection frame from the cache, and
to determine corrected coordinates of the target detection frame based on the coordinates of the target detection frame and the predicted correction values.
14. The system of any one of claims 7 to 13, wherein the coordinates of the target detection frame include coordinates of a center point of the target detection frame;
the processor is further configured to determine the coordinates of the vertices of the target detection frame based on the coordinates of the center point of the target detection frame.
15. The system of claim 7, wherein
the processor is further configured to run a neural network model, the neural network model comprising a hidden layer and a post-processing layer, wherein
the hidden layer is configured to determine a plurality of detection frames in the image to be recognized and calculate at least one category probability for each detection frame, where the category probability represents the probability that the image in the detection frame belongs to the corresponding category; and
the post-processing layer is configured to select, from the plurality of detection frames, a target detection frame having a category probability that satisfies a probability condition, acquire coordinates of the selected target detection frame, and process the target detection frame based on the coordinates of the target detection frame.
16. A readable medium having instructions stored thereon which, when executed by a processor of an electronic device, cause the electronic device to implement the image processing method of any one of claims 1 to 6.
17. An electronic device, comprising:
a memory for storing instructions for execution by one or more processors of the electronic device;
and a processor, which is one of the one or more processors of the electronic device, configured to execute the instructions to cause the electronic device to implement the image processing method of any one of claims 1 to 6.
CN202310311068.0A 2023-03-24 2023-03-24 Image processing method, system, readable medium and electronic device Pending CN116310736A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310311068.0A CN116310736A (en) 2023-03-24 2023-03-24 Image processing method, system, readable medium and electronic device

Publications (1)

Publication Number Publication Date
CN116310736A true CN116310736A (en) 2023-06-23

Family

ID=86790282

Country Status (1)

Country Link
CN (1) CN116310736A (en)

Similar Documents

Publication Publication Date Title
US9971959B2 (en) Performing object detection operations via a graphics processing unit
KR102419136B1 (en) Image processing apparatus and method using multiple-channel feature map
US10885660B2 (en) Object detection method, device, system and storage medium
US9665542B2 (en) Determining median value of an array on vector SIMD architectures
US7406212B2 (en) Method and system for parallel processing of Hough transform computations
US11836608B2 (en) Convolution acceleration with embedded vector decompression
JP6161266B2 (en) Information processing apparatus, control method therefor, electronic device, program, and storage medium
US20200104691A1 (en) Neural processing unit (npu) direct memory access (ndma) memory bandwidth optimization
CN114120349B (en) Test paper identification method and system based on deep learning
US7401177B2 (en) Data storage device, data storage control apparatus, data storage control method, and data storage control program
US11410016B2 (en) Selective performance of deterministic computations for neural networks
CN109508782B (en) Neural network deep learning-based acceleration circuit and method
CN114241388A (en) Video instance segmentation method and segmentation device based on space-time memory information
CN111340790B (en) Bounding box determination method, device, computer equipment and storage medium
CN116310736A (en) Image processing method, system, readable medium and electronic device
CN113052303A (en) Apparatus for controlling data input and output of neural network circuit
WO2023124428A1 (en) Chip, accelerator card, electronic device and data processing method
AU2022221413A1 (en) Domo v2: on-device object detection and instance segmentation for object selection
CN111310824A (en) Multi-angle dense target detection inhibition optimization method and equipment
CN112204585A (en) Processor, electronic device and control method thereof
EP4332892A1 (en) Estimating volumes of liquid
KR102536481B1 (en) Electronic device for object recognition and controlling method of electronic device for object recognition
CN115291813B (en) Data storage method and device, data reading method and device, and equipment
WO2023069063A1 (en) Method and apparatus of efficient image data access for artificial intelligence processing
US20230316694A1 (en) Data processing systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination