CN114758140A - Target detection method, apparatus, device and medium - Google Patents

Target detection method, apparatus, device and medium

Info

Publication number
CN114758140A
Authority
CN
China
Prior art keywords
detected
image
target
pixel
event
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210329801.7A
Other languages
Chinese (zh)
Inventor
肖扬
罗涛
施佳子
刘乙赛
于海燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd (ICBC)
Priority to CN202210329801.7A
Publication of CN114758140A
Status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods

Abstract

The disclosure provides a target detection method, apparatus, device, and medium, applicable to the fields of artificial intelligence and financial technology. The target detection method comprises the following steps: receiving an event stream to be detected output by a dynamic vision sensor, wherein the event stream to be detected represents time-series pixel signals to be detected that are output when the accumulated brightness change at a target position reaches a preset threshold, and each time-series pixel signal to be detected comprises target pixel position information and target pixel brightness change information; generating an event image to be detected from the event stream to be detected within a preset time length according to the target pixel position information and the target pixel brightness change information; obtaining an image to be detected by extracting feature information of the event image to be detected; and inputting the image to be detected into a pre-trained convolutional neural network model and outputting a target detection result.

Description

Target detection method, apparatus, device and medium
Technical Field
The present disclosure relates to the fields of artificial intelligence and financial technology, and in particular to a target detection method, apparatus, device, medium, and program product.
Background
In a conventional target detection method, the image acquired by a camera device is generally output at full pixel resolution, so frame-by-frame processing is usually required to detect a moving object, and because features are designed manually from experience, false detections and missed detections can occur in complex scenes. In addition, a full-pixel image carries a large amount of data, much of it redundant, which results in low detection efficiency.
Disclosure of Invention
In view of the above, the present disclosure provides an object detection method, apparatus, device, medium, and program product.
According to an aspect of the present disclosure, there is provided a target detection method including:
receiving an event stream to be detected output by a dynamic vision sensor, wherein the event stream to be detected represents time-series pixel signals to be detected that are output when the accumulated brightness change at a target position reaches a preset threshold, and each time-series pixel signal to be detected comprises target pixel position information and target pixel brightness change information;
generating an event image to be detected from the event stream to be detected within a preset time length according to the target pixel position information and the target pixel brightness change information;
obtaining an image to be detected by extracting feature information of the event image to be detected; and
inputting the image to be detected into a pre-trained convolutional neural network model, and outputting a target detection result.
According to an embodiment of the disclosure, generating the event image to be detected from the event stream to be detected within the preset time length, according to the target pixel position information and the target pixel brightness change information, includes:
constructing N pixel matrices to be detected from the time-series pixel signals to be detected at N moments within the preset time length; and
cumulatively adding, in the time order in which the time-series pixel signals to be detected were generated, the pixel brightness change data of each pixel point in the (N-1)-th pixel matrix to the pixel brightness change data of the corresponding pixel point in the N-th pixel matrix, to obtain the event image to be detected.
According to an embodiment of the disclosure, obtaining the image to be detected by extracting feature information of the event image to be detected includes:
performing wavelet decomposition on the event image to be detected to obtain a wavelet signal of each layer; and
denoising the wavelet signals in each decomposition layer by using a wavelet threshold function to obtain the image to be detected, wherein the image to be detected comprises the feature information of the event image to be detected.
According to the embodiment of the present disclosure, the target detection result includes a target pixel position coordinate, and the target detection method further includes:
constructing a target detection frame in the image to be detected according to the position coordinates of the target pixels;
and determining the number of the targets in the image to be detected by counting the number of the target detection frames.
According to the embodiment of the disclosure, determining the number of targets in the image to be detected by counting the number of the target detection frames comprises:
respectively counting the number of target detection frames in each image to be detected;
And determining the number of the targets in the image to be detected by calculating the average value of the number of the target detection frames in the plurality of images to be detected.
According to the embodiment of the disclosure, the training method of the pre-trained convolutional neural network model comprises the following steps:
receiving a historical event stream output by a dynamic vision sensor, wherein the historical event stream represents historical time-series pixel signals output when the accumulated brightness change at a target position reaches a preset threshold, and each historical time-series pixel signal comprises historical target pixel position information and historical target pixel brightness change information;
generating a historical event image from the historical event stream within a preset time according to the historical target pixel position information and the historical target pixel brightness change information,
obtaining a sample data set by extracting characteristic information of a historical event image; and
and training the convolutional neural network model to be trained by utilizing the sample data set to obtain the convolutional neural network model trained in advance.
Another aspect of the present disclosure provides a target detection apparatus comprising a receiving module, a generating module, an extracting module, and a detecting module. The receiving module is used for receiving an event stream to be detected output by the dynamic vision sensor, wherein the event stream represents time-series pixel signals to be detected output when the accumulated brightness change at a target position reaches a preset threshold, and each time-series pixel signal to be detected comprises target pixel position information and target pixel brightness change information. The generating module is used for generating the event image to be detected from the event stream to be detected within the preset time length according to the target pixel position information and the target pixel brightness change information. The extracting module is used for extracting feature information of the event image to be detected to obtain the image to be detected. The detecting module is used for inputting the image to be detected into a pre-trained convolutional neural network model and outputting a target detection result.
According to an embodiment of the present disclosure, the generating module includes a construction unit and a generation unit. The construction unit is used for constructing N pixel matrices to be detected according to the time-series pixel signals to be detected at N moments within the preset time length. The generation unit is used for cumulatively adding, in the time order in which the time-series pixel signals to be detected were generated, the pixel brightness change data of each pixel point in the (N-1)-th pixel matrix to the pixel brightness change data of the corresponding pixel point in the N-th pixel matrix, to obtain the event image to be detected.
According to an embodiment of the disclosure, the extracting module includes a decomposition unit and a denoising unit. The decomposition unit is used for performing wavelet decomposition on the event image to be detected to obtain a wavelet signal of each layer. The denoising unit is used for denoising the wavelet signals in each decomposition layer by using a wavelet threshold function to obtain the image to be detected, wherein the image to be detected comprises the feature information of the event image to be detected.
According to the embodiment of the disclosure, the target detection device further comprises a construction module and a statistical module. The construction module is used for constructing a target detection frame in the image to be detected according to the position coordinates of the target pixels. And the counting module is used for determining the number of the targets in the image to be detected by counting the number of the target detection frames.
According to an embodiment of the present disclosure, a statistics module includes a statistics unit and a calculation unit. The statistical unit is used for respectively counting the number of the target detection frames in each image to be detected. And the calculating unit is used for determining the number of the targets in the image to be detected by calculating the average value of the number of the target detection frames in the plurality of images to be detected.
According to an embodiment of the present disclosure, the target detection apparatus further includes a training module for obtaining the pre-trained convolutional neural network model. The training module comprises a receiving unit, a generating unit, an extracting unit, and a training unit. The receiving unit is used for receiving a historical event stream output by the dynamic vision sensor, wherein the historical event stream represents historical time-series pixel signals output when the accumulated brightness change at a target position reaches a preset threshold, and each historical time-series pixel signal comprises historical target pixel position information and historical target pixel brightness change information. The generating unit is used for generating a historical event image from the historical event stream within the preset time length according to the historical target pixel position information and the historical target pixel brightness change information. The extracting unit is used for obtaining a sample data set by extracting feature information of the historical event image. The training unit is used for training the convolutional neural network model to be trained by using the sample data set, to obtain the pre-trained convolutional neural network model.
Another aspect of the present disclosure provides an electronic device including: one or more processors; a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the above-described object detection method.
The fourth aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the above-mentioned object detection method.
The fifth aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the above object detection method.
According to embodiments of the disclosure, the event image to be detected is generated from the event stream to be detected output by the dynamic vision sensor. Because the event stream contains only time-series pixel signals output when the accumulated brightness change at a target position reaches the preset threshold, redundant data and the data transmission volume are reduced. Feature information is then extracted from the event image to be detected to obtain the image to be detected, which is input into the pre-trained convolutional neural network model to output a target detection result, thereby improving target detection efficiency in complex scenes and effectively avoiding false detections and missed detections of targets.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, taken in conjunction with the accompanying drawings of which:
FIG. 1 schematically illustrates an application scenario diagram of an object detection method, apparatus, device, medium and program product according to an embodiment of the disclosure;
FIG. 2 schematically illustrates a flow chart of a target detection method according to an embodiment of the disclosure;
FIGS. 3a-3f schematically illustrate a flow chart for generating an event image to be detected according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a logic block diagram for processing an image of an event under test using a wavelet threshold function in accordance with an embodiment of the disclosure;
FIG. 5 schematically illustrates a flow chart of a training method of a pre-trained convolutional neural network according to an embodiment of the present disclosure;
FIG. 6 schematically shows a block diagram of an object detection apparatus according to an embodiment of the present disclosure; and
fig. 7 schematically shows a block diagram of an electronic device adapted to implement a target detection method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.).
It should be noted that the object detection method and apparatus of the present disclosure can be used in the financial technology field and the artificial intelligence technology field, and can also be used in any field except the financial technology field and the artificial intelligence technology field, and the application field of the object detection method and apparatus of the present disclosure is not limited.
In the technical solution of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure, application, and other handling of the personal information of the users involved all comply with the provisions of relevant laws and regulations, necessary confidentiality measures are taken, and public order and good customs are not violated.
In the technical scheme of the disclosure, before the personal information of the user is acquired or collected, the authorization or the consent of the user is acquired.
Traditional target detection methods generally adopt deep learning, which improves the detection of targets in video. However, the main tools currently used for target detection are lidar and cameras, and because lidar is more expensive, there are more application scenarios in which images are acquired with a camera device.
However, in one respect, detecting a moving object usually requires frame-by-frame processing of the images acquired by the imaging device, with features designed manually from experience, so false detections or missed detections of targets easily occur in complex scenes. In another respect, a conventional imaging device outputs full-pixel images frame by frame, so a large transmission bottleneck exists between the acquisition module and the processing module, and there is considerable redundant data. In a further respect, deep learning models are large in scale, with millions or even tens of millions of parameters, so training and using such models requires high-performance equipment, which limits the range of applications of the detection method.
In view of this, embodiments of the present disclosure provide a target detection method in which the event image to be detected is generated from the event stream to be detected output by a dynamic vision sensor. Because the event stream contains only time-series pixel signals output when the accumulated brightness change at a target position reaches a preset threshold, redundant data and the data transmission volume are reduced. Feature information is then extracted from the event image to be detected to obtain the image to be detected, which is input into a pre-trained convolutional neural network model to output a target detection result, thereby improving target detection efficiency in complex scenes and effectively avoiding false detections and missed detections of targets.
Fig. 1 schematically illustrates an application scenario diagram of object detection according to an embodiment of the present disclosure.
As shown in fig. 1, the application scenario 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. Network 104 is the medium used to provide communication links between terminal devices 101, 102, 103 and server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use terminal devices 101, 102, 103 to interact with a server 105 over a network 104 to receive or send messages and the like. The terminal devices 101, 102, 103 may be networked video terminal devices capable of receiving the output signals of dynamic vision sensors disposed within a target detection area.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for signals or data collected by the user with the terminal devices 101, 102, 103.
It should be noted that the object detection method provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the object detection device provided by the embodiment of the present disclosure may be generally disposed in the server 105. The object detection method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the object detection apparatus provided in the embodiments of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The target detection method of the disclosed embodiment will be described in detail below with fig. 2 to 5 based on the scenario described in fig. 1.
Fig. 2 schematically shows a flow chart of a target detection method according to an embodiment of the present disclosure.
As shown in fig. 2, the object detection method of this embodiment includes operations S210 to S240.
In operation S210, an event stream to be detected output by a dynamic vision sensor is received, where the event stream to be detected represents time-series pixel signals to be detected that are output when the accumulated brightness change at a target position reaches a preset threshold, and each time-series pixel signal to be detected includes target pixel position information and target pixel brightness change information.
According to an embodiment of the disclosure, the target position may be the area of the target to be detected; an event is output when the accumulated brightness change of a pixel reaches the preset threshold. The preset threshold is generally a configuration parameter of the dynamic vision sensor. An event generally includes the pixel position coordinates, the timestamp at which the event was generated, and the event polarity, where the polarity represents the pixel brightness change information as a signed value. The event stream to be detected may be the time-series signals or time-series events to be detected output by the dynamic vision sensor. For example, the event stream output at time t1 may include pixel position (A1, B1) with event polarity +m1, pixel position (A2, B2) with event polarity +m2, pixel position (A3, B3) with event polarity -n1, and pixel position (A2, B2) with event polarity -n2.
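For concreteness, below is a minimal sketch of how such an event stream might be represented in code; the Event structure, its field names, and the numeric values are illustrative assumptions and are not part of the disclosure.

```python
from dataclasses import dataclass

@dataclass
class Event:
    """One dynamic-vision-sensor event: where, when, and how brightness changed."""
    x: int            # pixel column of the event
    y: int            # pixel row of the event
    timestamp: float  # time at which the event was generated
    polarity: float   # signed brightness-change value (positive or negative)

# Hypothetical event stream at time t1, mirroring the example above
t1 = 0.001
event_stream = [
    Event(x=1, y=1, timestamp=t1, polarity=+0.30),  # (A1, B1), polarity +m1
    Event(x=2, y=2, timestamp=t1, polarity=+0.20),  # (A2, B2), polarity +m2
    Event(x=3, y=3, timestamp=t1, polarity=-0.10),  # (A3, B3), polarity -n1
    Event(x=2, y=2, timestamp=t1, polarity=-0.25),  # (A2, B2), polarity -n2
]
```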
In operation S220, an event stream to be detected within a preset time length is generated into an event image to be detected according to the target pixel position information and the target pixel brightness change information.
According to an embodiment of the present disclosure, the preset time length may span from time t1 to time t4. At time t1, the event stream contains pixel position (A1, B1) with event polarity -n1. The event stream output at time t2 includes pixel position (A2, B2) with event polarity +m3 and pixel position (A2, B2) with event polarity -n2. The event stream output at time t3 includes pixel position (A3, B3) with event polarity -n1 and pixel position (A2, B2) with event polarity -n2. The event stream output at time t4 may include pixel position (A1, B1) with event polarity +m1 and pixel position (A2, B2) with event polarity +m2. The six events from t1 to t4 can be accumulated, summing the polarities of the events at the same pixel position, to generate the event image to be detected. In that image, the brightness change information (event polarity) of the pixel at position (A1, B1) is m1-n1; that of the pixel at position (A2, B2) is m3-n2+m2-n2; and that of the pixel at position (A3, B3) is -n1. The generated event image to be detected may be a grayscale image.
In operation S230, an image to be measured is obtained by extracting feature information of the event image to be measured.
According to embodiments of the present disclosure, noise generally exists in the target detection environment and may interfere with the detection result, so the generated event image to be detected can be denoised to extract feature information and obtain the image to be detected. For example, on a farm the targets to be detected may be farmed animals, but objects such as plants and feeding tools present in the detection area may also be captured by the dynamic vision sensor and appear in the event image to be detected. Image denoising methods such as mean filtering, adaptive Wiener filtering, median filtering, morphological noise filtering, or wavelet denoising may be used to blur non-target objects in the event image to be detected and highlight the feature information of the target object.
In operation S240, the image to be detected is input to the pre-trained convolutional neural network model, and a target detection result is output.
According to embodiments of the disclosure, the target detection result may include a confidence score indicating whether the image to be detected contains a target. The confidence may be any value between 0 and 1: the closer it is to 1, the more likely the image to be detected contains the target; the closer it is to 0, the more likely it does not. The target detection result may also include the pixel position coordinates of each target, for example the pixel position coordinates (x1, y1) of target a and (x2, y2) of target b, where targets a and b are detected targets, so the number of detected targets can be determined from the number of pixel coordinates. The pixel position coordinates of a target may also be the coordinates of the four vertices of the minimum-area rectangle containing the target, in which case the number of detected targets may be determined from the number of such rectangles.
According to embodiments of the disclosure, the event image to be detected is generated from the event stream to be detected output by the dynamic vision sensor. Because the event stream contains only time-series pixel signals output when the accumulated brightness change at a target position reaches the preset threshold, redundant data and the data transmission volume are reduced. Feature information is then extracted from the event image to be detected to obtain the image to be detected, which is input into the pre-trained convolutional neural network model to output a target detection result, thereby improving target detection efficiency in complex scenes and effectively avoiding false detections and missed detections of targets.
According to an embodiment of the present disclosure, a method for generating an event image to be detected includes:
constructing N pixel matrices to be detected from the time-series pixel signals to be detected at N moments within the preset time length; and
cumulatively adding, in the time order in which the time-series pixel signals to be detected were generated, the pixel brightness change data of each pixel point in the (N-1)-th pixel matrix to the pixel brightness change data of the corresponding pixel point in the N-th pixel matrix, to obtain the event image to be detected.
Fig. 3a to 3f schematically show a flowchart of generating an event image to be measured according to an embodiment of the present disclosure.
As shown in fig. 3a, the pixel matrix at time t1 is a 4 × 4 pixel matrix, and each element represents a pixel point at one pixel position. In fig. 3a, the luminance change of the pixel point in row 2, column 2 may be +m, and that of the pixel point in row 4, column 4 may be +n. Since the only event received at this time is the pixel signal at time t1, the displayed event image is the same as the received pixel signal, as shown in fig. 3b.
As shown in fig. 3c, in the pixel matrix at time t2, the luminance change of the pixel point in row 1, column 2 may be -p, and that of the pixel point in row 2, column 4 may be -q. Taking the preset time length from t1 to t2 as an example, the luminance change data of each element in the pixel matrices to be detected at times t1 and t2 are cumulatively added to obtain the event image to be detected for the preset time length from t1 to t2. As shown in fig. 3d, in that event image, the luminance change of the pixel point in row 2, column 2 may be +m, that in row 4, column 4 may be +n, that in row 1, column 2 may be -p, and that in row 2, column 4 may be -q.
As shown in fig. 3e, in the pixel matrix at time t3, the luminance change of the pixel point in row 1, column 1 may be +h, and that of the pixel point in row 4, column 4 may be +k. Taking the preset time length from t1 to t3 as an example, the luminance change data of each element in the pixel matrices to be detected at times t1, t2, and t3 are correspondingly cumulatively added to obtain the event image to be detected for the preset time length from t1 to t3. As shown in fig. 3f, in that event image, the luminance change of the pixel point in row 2, column 2 may be +m, that in row 1, column 2 may be -p, that in row 2, column 4 may be -q, that in row 1, column 1 may be +h, and that in row 4, column 4 may be k+n.
According to the embodiment of the present disclosure, in the pixel matrix, the pixels without luminance variation may be represented as blank, and the luminance variation data thereof may be represented as zero.
According to embodiments of the disclosure, the event image to be detected is obtained by cumulatively adding the brightness change data of the pixel points contained in the event stream over the preset time length. Because the event stream contains only valid pixels, output when the brightness change of a pixel point reaches the preset threshold, accumulating the brightness change data of these valid pixel points to generate the event image to be detected reduces the input of redundant data.
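As an illustration of the cumulative addition described above, the following sketch mirrors the example of figs. 3a-3f; the matrix size and the numeric values of m, n, p, q, h, and k are assumptions chosen only for demonstration.

```python
import numpy as np

def build_event_image(pixel_matrices):
    """Cumulatively add N per-instant pixel matrices (H x W arrays of
    signed brightness changes), in time order, into one event image."""
    event_image = np.zeros_like(pixel_matrices[0], dtype=float)
    for matrix in pixel_matrices:   # time order t1, t2, ..., tN
        event_image += matrix       # element-wise cumulative addition
    return event_image

# Mirror of figs. 3a-3f: 4x4 matrices at t1, t2, t3 (values are illustrative)
m, n, p, q, h, k = 0.3, 0.2, 0.1, 0.15, 0.25, 0.05
t1_m = np.zeros((4, 4)); t1_m[1, 1] = +m; t1_m[3, 3] = +n  # row 2 col 2, row 4 col 4
t2_m = np.zeros((4, 4)); t2_m[0, 1] = -p; t2_m[1, 3] = -q  # row 1 col 2, row 2 col 4
t3_m = np.zeros((4, 4)); t3_m[0, 0] = +h; t3_m[3, 3] = +k  # row 1 col 1, row 4 col 4

image = build_event_image([t1_m, t2_m, t3_m])
assert image[3, 3] == k + n  # matches the k+n value shown in fig. 3f
```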
According to the embodiment of the disclosure, the method for obtaining the image to be detected by extracting the feature information of the event image to be detected comprises the following steps:
performing wavelet decomposition on the event image to be detected to obtain a wavelet signal of each layer;
and denoising the wavelet signals in each decomposition layer by using a wavelet threshold function to obtain an image to be detected, wherein the image to be detected comprises the characteristic information of the event image to be detected.
According to the embodiment of the disclosure, wavelet decomposition is performed on the event image to be detected to obtain a wavelet signal of each layer, and the wavelet decomposition can be performed through Matlab software.
According to the embodiment of the present disclosure, a process of denoising wavelet signals in each decomposition layer by using a wavelet threshold function is shown in fig. 4.
Fig. 4 schematically illustrates a logic block diagram for processing an event image to be measured using a wavelet threshold function according to an embodiment of the present disclosure.
As shown in fig. 4, the method of processing the event image to be detected using a wavelet threshold function of this embodiment includes operations S410 to S470.
In operation S410, performing wavelet decomposition on the event image to be detected to obtain a wavelet signal of each layer;
in operation S420, it is determined whether the wavelet coefficients of each layer's wavelet signal satisfy a preset condition, which may be belonging to the large-scale, low-resolution layer; if so, S430 is performed, and if not, S440 is performed.
In operation S430, the layer of wavelet coefficients is retained and used to perform operation S470.
In operation S440, it is determined whether the wavelet coefficient of the wavelet signal of the corresponding layer that does not satisfy the preset condition is greater than the threshold coefficient, if yes, S450 is performed, and if not, S460 is performed.
In operation S450, wavelet coefficients are retained, and the layer of wavelet coefficients is used to perform operation S470.
In operation S460, the wavelet coefficients are set to zero, and the layer wavelet coefficients are used to perform operation S470.
In operation S470, inverse wavelet transformation is performed on the wavelet signals of the corresponding decomposition layers using the judgment results of the wavelet coefficients for each layer, so as to obtain the image to be detected.
According to embodiments of the disclosure, denoising the event image to be detected via the wavelet transform has the advantages of decorrelation, multi-resolution, low entropy, and flexibility; detail features of the original event image to be detected are well preserved during denoising, which improves the accuracy of the subsequent target detection.
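The disclosure notes that the decomposition can be performed with Matlab software; as a rough Python analogue, the sketch below uses the PyWavelets library as an assumed stand-in, with the wavelet family, decomposition depth, and threshold value chosen arbitrarily rather than taken from the disclosure.

```python
import pywt  # PyWavelets

def wavelet_denoise(event_image, wavelet="db4", level=3, threshold=0.1):
    """Wavelet-threshold denoising along the lines of operations S410-S470."""
    # S410: multi-level 2-D wavelet decomposition
    coeffs = pywt.wavedec2(event_image, wavelet=wavelet, level=level)

    # S420/S430: the large-scale, low-resolution approximation layer is kept as-is
    approx, details = coeffs[0], coeffs[1:]

    # S440-S460: keep detail coefficients above the threshold, zero the rest
    # (hard thresholding zeroes coefficients whose magnitude is below the value)
    new_details = [
        tuple(pywt.threshold(band, threshold, mode="hard") for band in layer)
        for layer in details
    ]

    # S470: inverse wavelet transform reconstructs the image to be detected
    return pywt.waverec2([approx] + new_details, wavelet=wavelet)
```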
According to an embodiment of the present disclosure, the target detection method further includes:
constructing a target detection frame in the image to be detected according to the position coordinates of the target pixels;
and determining the number of the targets in the image to be detected by counting the number of the target detection frames.
According to an embodiment of the present disclosure, given the pixel position coordinates of the targets, for example the pixel position coordinates (x01, y01) of target a and (x02, y02) of target b, a rectangle of minimum area may be constructed with the pixel position coordinates (x01, y01) of target a as the center point, serving as target detection frame A; for example, the four vertex coordinates of target detection frame A may be A(x1, y1), B(x2, y2), C(x3, y3), and D(x4, y4). Similarly, a rectangle of minimum area may be constructed with the pixel position coordinates (x02, y02) of target b as the center point, and the four vertices of target detection frame B may be A(x11, y11), B(x22, y22), C(x33, y33), and D(x44, y44).
According to the embodiment of the disclosure, the number of targets in the image to be detected is determined by counting the number of target detection frames. For example: in a detection area, by the target detection method provided by the embodiment of the disclosure, n target detection frames are constructed in an image to be detected, and then the target number of the image to be detected can be determined to be n.
According to the embodiment of the disclosure, the number of targets in the image to be detected is determined by constructing the target detection frames according to the target position coordinates and counting the number of the target detection frames, so that the number of the targets in the target detection area can be accurately counted.
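A minimal sketch of this construction-and-counting step follows; it assumes each target's pixel coordinate serves as the box center and assumes a fixed box extent, whereas the disclosure's minimum-area rectangle would be derived from the detected target itself.

```python
def build_detection_boxes(centers, half_w=10, half_h=10):
    """Construct an axis-aligned detection frame around each target pixel
    coordinate; the fixed half-extent is an assumption for illustration."""
    boxes = []
    for cx, cy in centers:
        boxes.append(((cx - half_w, cy - half_h),   # vertex A
                      (cx + half_w, cy - half_h),   # vertex B
                      (cx + half_w, cy + half_h),   # vertex C
                      (cx - half_w, cy + half_h)))  # vertex D
    return boxes

centers = [(120, 80), (240, 160)]  # hypothetical targets a and b
boxes = build_detection_boxes(centers)
num_targets = len(boxes)           # counting the frames gives the target count
```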
According to the embodiment of the disclosure, the method for determining the number of targets in the image to be detected by counting the number of the target detection frames comprises the following steps:
respectively counting the number of target detection frames in each image to be detected;
and determining the number of the targets in the image to be detected by calculating the average value of the number of the target detection frames in the plurality of images to be detected.
According to embodiments of the disclosure, in a practical application scenario, multiple dynamic vision sensors may be installed in the same detection area. Following the target detection method provided by embodiments of the disclosure, the event stream output by each dynamic vision sensor can be converted into an event image and input into the convolutional neural network for detection, outputting a target detection result. The number of target detection frames in the images to be detected generated from each dynamic vision sensor over the same time period can be counted separately, and the number of targets in the images to be detected can then be determined by averaging these counts. For example, over the same time period, the number of target detection frames in the image to be detected obtained from the event stream output by dynamic vision sensor A is m, and that obtained from dynamic vision sensor B is n; the average of m and n can be taken as the number of targets in the detection area.
According to the embodiment of the disclosure, the target number in the image to be detected is determined by calculating the average value of the number of the target detection frames in the plurality of images to be detected, and the accuracy of the target data statistics can be improved.
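Illustratively, with hypothetical per-sensor counts and assuming the average is rounded to a whole number (the disclosure does not specify rounding):

```python
# Detection-frame counts m and n from two sensors over the same period (hypothetical)
counts = [7, 9]
num_targets = round(sum(counts) / len(counts))  # average of the per-sensor counts -> 8
```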
Fig. 5 schematically illustrates a flow chart of a training method of a pre-trained convolutional neural network according to an embodiment of the present disclosure.
As shown in fig. 5, the training method of the pre-trained convolutional neural network of this embodiment includes operations S510 to S540.
In operation S510, a historical event stream output by a dynamic vision sensor is received, wherein the historical event stream represents a historical time-series pixel signal output when a luminance change accumulation reaches a preset threshold value at a target position, and the historical time-series pixel signal includes historical target pixel position information and historical target pixel luminance change information.
In operation S520, generating a historical event image from the historical event stream within a preset time period according to the historical target pixel position information and the historical target pixel brightness change information;
in operation S530, a sample data set is obtained by extracting feature information of the historical event image;
in operation S540, the convolutional neural network model to be trained is trained by using the sample data set, so as to obtain a pre-trained convolutional neural network model.
According to an embodiment of the present disclosure, the convolutional neural network model may be a YOLO v5 convolutional neural network model. However, because the network structure of the YOLO v5 convolutional neural network model is based on a Darknet-53 backbone, which has too many network parameters and a large computation cost, the backbone architecture of the convolutional neural network model in embodiments of the present disclosure may instead be MobileNet v2 in order to lighten the network. The main network layer of MobileNet v2 is the depthwise separable convolutional layer: the standard convolutional layers with 3 × 3 kernels in the middle and deep layers of the YOLO v5 convolutional neural network model can be replaced with the depthwise separable convolutional layers of MobileNet v2, and the ReLU activation function of the MobileNet v2 architecture can be adopted, reducing the number of parameters and the amount of computation in the convolutional neural network model and thereby achieving lightweight computation.
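The following PyTorch sketch (the framework is an assumption; the disclosure does not name one) shows the substitution described above: a depthwise separable 3 × 3 block with ReLU standing in for a standard 3 × 3 convolution, together with a parameter-count comparison at an illustrative channel width.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 convolution per channel followed by a pointwise 1x1
    convolution, in the style of MobileNet v2's separable layers."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)  # the disclosure names the ReLU activation

    def forward(self, x):
        x = self.act(self.bn1(self.depthwise(x)))
        return self.act(self.bn2(self.pointwise(x)))

# Parameter comparison against a standard 3x3 convolution at 256 channels
std = nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False)
sep = DepthwiseSeparableConv(256, 256)
n_std = sum(p.numel() for p in std.parameters())
n_sep = sum(p.numel() for p in sep.parameters())
print(f"standard 3x3: {n_std} parameters, depthwise separable: {n_sep}")
```

At this width the separable block carries roughly an eighth of the weights of the standard convolution, which is the lightweighting effect relied on above.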
According to the embodiment of the disclosure, a convolutional neural network model to be trained is trained by using a sample data set, and supervised model training can be performed by taking whether sample data contains a target and pixel coordinates of the target as sample labels.
According to the embodiment of the disclosure, as the sample data is obtained by converting the event stream output by the dynamic vision sensor into the event image and then performing denoising processing, the sample data has less redundant data and prominent features, and the efficiency of model training is improved.
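As a compact sketch of such supervised training, the following uses placeholder tensors in place of real denoised event images and a trivial stand-in network rather than the full YOLO v5 model; every shape, label scheme, and hyperparameter here is an assumption for illustration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder sample set: 64 single-channel "event images" with a binary
# contains-target label; real samples would come from the denoised images.
images = torch.randn(64, 1, 64, 64)
labels = torch.randint(0, 2, (64, 1)).float()
loader = DataLoader(TensorDataset(images, labels), batch_size=8, shuffle=True)

# Trivial stand-in detector head, not the lightweight YOLO v5 network itself
model = nn.Sequential(nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()  # supervised loss on the presence label

for epoch in range(3):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```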
Based on the target detection method, the disclosure also provides a target detection device. The apparatus will be described in detail below with reference to fig. 6.
Fig. 6 schematically shows a block diagram of the structure of the object detection apparatus according to the embodiment of the present disclosure.
As shown in fig. 6, the object detection apparatus 600 of this embodiment includes a receiving module 610, a generating module 620, an extracting module 630, and a detecting module 640.
The receiving module 610 is configured to receive an event stream to be detected output by the dynamic vision sensor, where the event stream represents time-series pixel signals to be detected output when the accumulated brightness change at a target position reaches a preset threshold, and each time-series pixel signal to be detected includes target pixel position information and target pixel brightness change information. In an embodiment, the receiving module 610 may be configured to perform the operation S210 described above, which is not described herein again.
The generating module 620 is configured to generate an event image to be detected from the event stream to be detected within a preset time length according to the target pixel position information and the target pixel brightness change information. In an embodiment, the generating module 620 may be configured to perform the operation S220 described above, which is not described herein again.
The extracting module 630 is configured to extract feature information of the event image to be detected, so as to obtain an image to be detected. In an embodiment, the extracting module 630 may be configured to perform the operation S230 described above, which is not described herein again.
The detection module 640 is configured to input the image to be detected into a pre-trained convolutional neural network model, and output a target detection result. In an embodiment, the detecting module 640 may be configured to perform the operation S240 described above, which is not described herein again.
According to an embodiment of the present disclosure, the generating module includes a construction unit and a generation unit. The construction unit is used for constructing N pixel matrices to be detected according to the time-series pixel signals to be detected at N moments within the preset time length. The generation unit is used for cumulatively adding, in the time order in which the time-series pixel signals to be detected were generated, the pixel brightness change data of each pixel point in the (N-1)-th pixel matrix to the pixel brightness change data of the corresponding pixel point in the N-th pixel matrix, to obtain the event image to be detected.
According to an embodiment of the disclosure, the extracting module includes a decomposition unit and a denoising unit. The decomposition unit is used for performing wavelet decomposition on the event image to be detected to obtain a wavelet signal of each layer. The denoising unit is used for denoising the wavelet signals in each decomposition layer by using a wavelet threshold function to obtain the image to be detected, wherein the image to be detected comprises the feature information of the event image to be detected.
According to the embodiment of the disclosure, the target detection device further comprises a construction module and a statistical module. The construction module is used for constructing a target detection frame in the image to be detected according to the position coordinates of the target pixels. And the counting module is used for determining the number of the targets in the image to be detected by counting the number of the target detection frames.
According to an embodiment of the present disclosure, a statistics module includes a statistics unit and a calculation unit. The statistical unit is used for respectively counting the number of the target detection frames in each image to be detected. And the calculating unit is used for determining the number of the targets in the image to be detected by calculating the average value of the number of the target detection frames in the plurality of images to be detected.
According to an embodiment of the present disclosure, the target detection apparatus further includes a training module for obtaining the pre-trained convolutional neural network model. The training module comprises a receiving unit, a generating unit, an extracting unit, and a training unit. The receiving unit is used for receiving a historical event stream output by the dynamic vision sensor, wherein the historical event stream represents historical time-series pixel signals output when the accumulated brightness change at a target position reaches a preset threshold, and each historical time-series pixel signal comprises historical target pixel position information and historical target pixel brightness change information. The generating unit is used for generating a historical event image from the historical event stream within the preset time length according to the historical target pixel position information and the historical target pixel brightness change information. The extracting unit is used for obtaining a sample data set by extracting feature information of the historical event image. The training unit is used for training the convolutional neural network model to be trained by using the sample data set, to obtain the pre-trained convolutional neural network model.
According to the embodiment of the present disclosure, any plurality of the receiving module 610, the generating module 620, the extracting module 630 and the detecting module 640 may be combined into one module to be implemented, or any one of the modules may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the receiving module 610, the generating module 620, the extracting module 630, and the detecting module 640 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented by any one of three implementations of software, hardware, and firmware, or any suitable combination of any of them. Alternatively, at least one of the receiving module 610, the generating module 620, the extracting module 630, the detecting module 640 may be at least partially implemented as a computer program module, which, when executed, may perform a corresponding function.
Fig. 7 schematically shows a block diagram of an electronic device adapted to implement a target detection method according to an embodiment of the present disclosure.
As shown in fig. 7, an electronic device 700 according to an embodiment of the present disclosure includes a processor 701, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. The processor 701 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 701 may also include on-board memory for caching purposes. The processor 701 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.
In the RAM 703, various programs and data necessary for the operation of the electronic apparatus 700 are stored. The processor 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. The processor 701 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 702 and/or the RAM 703. Note that the programs may also be stored in one or more memories other than the ROM 702 and the RAM 703. The processor 701 may also perform various operations of method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 700 may also include an input/output (I/O) interface 705, which is likewise connected to bus 704. The electronic device 700 may also include one or more of the following components connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including components such as a cathode ray tube (CRT), a liquid crystal display (LCD), and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as necessary, so that a computer program read out therefrom is installed into the storage section 708 as needed.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to an embodiment of the present disclosure, a computer-readable storage medium may include the above-described ROM 702 and/or RAM 703 and/or one or more memories other than the ROM 702 and RAM 703.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method illustrated in the flow chart. When the computer program product runs in a computer system, the program code is used for causing the computer system to realize the method provided by the embodiment of the disclosure.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed in the form of a signal on a network medium, downloaded and installed through the communication section 709, and/or installed from the removable medium 711. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to wireless or wired media, or any suitable combination of the foregoing. When executed by the processor 701, the computer program performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure. According to embodiments of the present disclosure, the systems, devices, modules, units, and the like described above may be implemented by computer program modules.
In accordance with embodiments of the present disclosure, the program code of the computer programs provided by the embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, these computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or in assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, and the "C" language. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It will be appreciated by those skilled in the art that the features recited in the various embodiments of the present disclosure and/or in the claims may be combined and/or sub-combined in various ways, even if such combinations or sub-combinations are not explicitly recited in the present disclosure. In particular, such combinations and/or sub-combinations of the features recited in the various embodiments of the present disclosure and/or the claims may be made without departing from the spirit and teachings of the present disclosure. All such combinations and/or sub-combinations fall within the scope of the present disclosure.
The embodiments of the present disclosure are described above. However, these embodiments are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the respective embodiments cannot be used to advantage in combination. The scope of the present disclosure is defined by the appended claims and their equivalents. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to fall within the scope of the present disclosure.

Claims (10)

1. A method of target detection, comprising:
receiving an event stream to be detected output by a dynamic vision sensor, wherein the event stream to be detected represents a time-series pixel signal to be detected that is output when an accumulated brightness change at a target position reaches a preset threshold, the time-series pixel signal to be detected comprising target pixel position information and target pixel brightness change information;
generating an event image to be detected from the event stream to be detected within a preset time length according to the target pixel position information and the target pixel brightness change information;
obtaining an image to be detected by extracting feature information of the event image to be detected; and
inputting the image to be detected into a pre-trained convolutional neural network model, and outputting a target detection result.
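For illustration, the claimed pipeline might be sketched in Python roughly as follows; the event tuple format, the time window, the normalization standing in for the feature-extraction stage, and the model interface are assumptions of this sketch rather than limitations of the claims.

```python
import numpy as np
import torch

# Assumed DVS event format: (x, y, timestamp_us, polarity) -- the
# time-series pixel signal carrying target pixel position information
# and target pixel brightness change information.
def detect(events, model, height, width, window_us=10_000):
    """Illustrative end-to-end flow of claim 1 (all names are assumptions)."""
    # Steps 1-2: accumulate the event stream over a preset time length
    # into an event image to be detected (cf. the sketch under claim 2).
    frame = np.zeros((height, width), dtype=np.float32)
    for x, y, t, p in events:
        if t < window_us:
            frame[y, x] += 1.0 if p > 0 else -1.0
    # Step 3: feature extraction; a simple normalization stands in here
    # for the wavelet denoising of claim 3 (cf. the sketch under claim 3).
    frame = (frame - frame.mean()) / (frame.std() + 1e-8)
    # Step 4: run the pre-trained convolutional neural network model.
    with torch.no_grad():
        tensor = torch.from_numpy(frame)[None, None]  # shape (1, 1, H, W)
        return model(tensor)  # target detection result
```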
2. The method according to claim 1, wherein the generating of the event image to be detected from the event stream to be detected within the preset time length according to the target pixel position information and the target pixel brightness change information comprises:
constructing N pixel matrices to be detected according to the time-series pixel signals to be detected at N moments within the preset time length; and
sequentially accumulating, in the time order in which the time-series pixel signals to be detected were generated, the pixel brightness change data of each pixel point in the (N-1)-th pixel matrix and the pixel brightness change data of each pixel point in the N-th pixel matrix, so as to obtain the event image to be detected.
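A minimal sketch of the accumulation recited in claim 2, assuming each of the N moments contributes a list of (x, y, delta) events in which delta is the signed brightness change; the event format and image dimensions are illustrative assumptions.

```python
import numpy as np

def build_event_image(signals, height, width):
    """Sketch of claim 2: chronologically accumulate N pixel matrices.

    `signals` is assumed to be a list of N per-moment event lists,
    ordered by the time at which the time-series pixel signals were
    generated; each event is an (x, y, delta) tuple.
    """
    # Construct the N pixel matrices, one per moment.
    matrices = []
    for moment in signals:
        m = np.zeros((height, width), dtype=np.float32)
        for x, y, delta in moment:
            m[y, x] += delta  # pixel brightness change data
        matrices.append(m)
    # Sequentially add the (N-1)-th accumulated result and the N-th
    # matrix to obtain the event image to be detected.
    event_image = np.zeros((height, width), dtype=np.float32)
    for m in matrices:
        event_image += m
    return event_image
```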
3. The method according to claim 1, wherein the obtaining of the image to be detected by extracting the feature information of the event image to be detected comprises:
performing wavelet decomposition on the event image to be detected to obtain a wavelet signal of each decomposition layer; and
denoising the wavelet signal in each decomposition layer by using a wavelet threshold function to obtain the image to be detected, wherein the image to be detected comprises the feature information of the event image to be detected.
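The wavelet stage of claim 3 might be sketched with the PyWavelets library as follows; the Daubechies wavelet, the two-level decomposition, and the universal soft-threshold rule are illustrative assumptions, since the claim does not fix a particular wavelet threshold function.

```python
import numpy as np
import pywt

def wavelet_denoise(event_image, wavelet="db2", level=2):
    """Sketch of claim 3: per-layer soft-threshold wavelet denoising."""
    # Decompose into one approximation plus per-layer detail coefficients.
    coeffs = pywt.wavedec2(event_image, wavelet, level=level)
    denoised = [coeffs[0]]  # keep the coarse approximation unchanged
    for details in coeffs[1:]:  # (horizontal, vertical, diagonal) per layer
        # Universal threshold with a MAD noise estimate from the diagonal
        # band (one assumed choice of wavelet threshold function).
        sigma = np.median(np.abs(details[-1])) / 0.6745
        thr = sigma * np.sqrt(2 * np.log(event_image.size))
        denoised.append(tuple(
            pywt.threshold(d, thr, mode="soft") for d in details))
    # Reconstruct the image to be detected from the denoised layers.
    return pywt.waverec2(denoised, wavelet)
```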
4. The method according to claim 1, wherein the target detection result comprises target pixel position coordinates, the method further comprising:
constructing a target detection frame in the image to be detected according to the target pixel position coordinates; and
determining the number of targets in the image to be detected by counting the number of target detection frames.
5. The method according to claim 4, wherein the determining of the number of targets in the image to be detected by counting the number of target detection frames comprises:
counting the number of target detection frames in each of a plurality of images to be detected; and
determining the number of targets in the image to be detected by calculating the average of the numbers of target detection frames over the plurality of images to be detected.
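Claims 4 and 5 amount to counting detection frames and averaging the counts over several images; a minimal sketch, assuming per-image detections arrive as lists of boxes and that the average is rounded to an integer count:

```python
def count_targets(detections_per_image):
    """Sketch of claims 4-5: average the number of target detection
    frames over a plurality of images to be detected (rounding the
    average is an assumption; the claims do not specify it).
    """
    counts = [len(boxes) for boxes in detections_per_image]
    return round(sum(counts) / len(counts)) if counts else 0
```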
6. The method according to claim 1, wherein the training of the pre-trained convolutional neural network model comprises:
receiving a historical event stream output by the dynamic vision sensor, wherein the historical event stream represents a historical time-series pixel signal that is output when an accumulated brightness change at a target position reaches a preset threshold, the historical time-series pixel signal comprising historical target pixel position information and historical target pixel brightness change information;
generating a historical event image from the historical event stream within a preset time length according to the historical target pixel position information and the historical target pixel brightness change information;
obtaining a sample data set by extracting feature information of the historical event image; and
training a convolutional neural network model to be trained by using the sample data set, to obtain the pre-trained convolutional neural network model.
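A minimal PyTorch training sketch of claim 6, assuming the sample data set yields (event image, label) pairs produced by the same accumulation and feature-extraction steps as claims 2 and 3; the optimizer, loss, batch size, and epoch count are illustrative assumptions.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader

def train(model: nn.Module, dataset, epochs=10, lr=1e-3):
    """Sketch of claim 6: fit the CNN to be trained on the sample data set."""
    loader = DataLoader(dataset, batch_size=32, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()  # assumed loss for illustration
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model  # the pre-trained convolutional neural network model
```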
7. A target detection apparatus, comprising:
the system comprises a first receiving module, a second receiving module and a third receiving module, wherein the first receiving module is used for receiving an event stream to be detected output by a dynamic vision sensor, the event stream to be detected represents a time sequence pixel signal to be detected which is output when the brightness change accumulation reaches a preset threshold value at a target position, and the time sequence pixel signal to be detected comprises target pixel position information and target pixel brightness change information;
a first generating module for generating an event image to be detected from the event stream to be detected within a preset time length according to the target pixel position information and the target pixel brightness change information;
a first extracting module for obtaining an image to be detected by extracting feature information of the event image to be detected; and
a detection module for inputting the image to be detected into a pre-trained convolutional neural network model and outputting a target detection result.
8. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method according to any one of claims 1 to 6.
9. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 6.
10. A computer program product comprising a computer program which, when executed by a processor, implements a method according to any one of claims 1 to 6.
CN202210329801.7A 2022-03-30 2022-03-30 Target detection method, apparatus, device and medium Pending CN114758140A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210329801.7A CN114758140A (en) 2022-03-30 2022-03-30 Target detection method, apparatus, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210329801.7A CN114758140A (en) 2022-03-30 2022-03-30 Target detection method, apparatus, device and medium

Publications (1)

Publication Number Publication Date
CN114758140A true CN114758140A (en) 2022-07-15

Family

ID=82329197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210329801.7A Pending CN114758140A (en) 2022-03-30 2022-03-30 Target detection method, apparatus, device and medium

Country Status (1)

Country Link
CN (1) CN114758140A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115497028A (en) * 2022-10-10 2022-12-20 中国电子科技集团公司信息科学研究院 Event-driven dynamic hidden target detection and identification method and device
CN115497028B (en) * 2022-10-10 2023-11-07 中国电子科技集团公司信息科学研究院 Event-driven-based dynamic hidden target detection and recognition method and device
CN116708655A (en) * 2022-10-20 2023-09-05 荣耀终端有限公司 Screen control method based on event camera and electronic equipment
CN116708655B (en) * 2022-10-20 2024-05-03 荣耀终端有限公司 Screen control method based on event camera and electronic equipment

Similar Documents

Publication Publication Date Title
US10650236B2 (en) Road detecting method and apparatus
US10019652B2 (en) Generating a virtual world to assess real-world video analysis performance
CN114758140A (en) Target detection method, apparatus, device and medium
CN112800860B (en) High-speed object scattering detection method and system with coordination of event camera and visual camera
US20160148383A1 (en) Estimating rainfall precipitation amounts by applying computer vision in cameras
CN110472599B (en) Object quantity determination method and device, storage medium and electronic equipment
US11087140B2 (en) Information generating method and apparatus applied to terminal device
CN112419202B (en) Automatic wild animal image recognition system based on big data and deep learning
CN113378834B (en) Object detection method, device, apparatus, storage medium, and program product
CN112766206B (en) High-order video vehicle detection method and device, electronic equipment and storage medium
US11599974B2 (en) Joint rolling shutter correction and image deblurring
US20210365726A1 (en) Method and apparatus for object detection in image, vehicle, and robot
CN112364843A (en) Plug-in aerial image target positioning detection method, system and equipment
CN111767853A (en) Lane line detection method and device
WO2024002211A1 (en) Image processing method and related apparatus
CN113011562A (en) Model training method and device
CN109389569A (en) Based on the real-time defogging method of monitor video for improving DehazeNet
JP2021157847A (en) Method, apparatus, device, and readable storage medium for recognizing abnormal license plate
EP2447912B1 (en) Method and device for the detection of change in illumination for vision systems
US20230048649A1 (en) Method of processing image, electronic device, and medium
Daithankar et al. ADAS vision system with video super resolution: need and scope
CN111582012A (en) Method and device for detecting small target ship
CN116823884A (en) Multi-target tracking method, system, computer equipment and storage medium
CN108921012B (en) Method for processing image video frame by using artificial intelligence chip
Asif et al. Performance Evaluation of Deep Learning Algorithm Using High‐End Media Processing Board in Real‐Time Environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination