CN112784711A - Moving object detection method and device

Info

Publication number: CN112784711A
Application number: CN202110022668.6A
Authority: CN
Inventors: 陈宏伟; 黄泓皓; 胡成洋
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2021-01-08
Filing date: 2021-01-08
Publication date: 2021-05-11
Also published as: WO2022148423A1

Abstract

The disclosure provides a moving object detection method and device. The moving object detection method comprises the following steps: receiving an optical signal from a scene to be detected, and performing optical imaging on the scene to be detected by using the optical signal to generate an imaging optical signal of the scene to be detected, wherein the scene to be detected comprises a moving object to be detected; sequentially modulating the imaging optical signal with the modulation signals in a modulation signal set within a predetermined time period to generate a modulated optical signal, and generating a coded image based on the modulated optical signal; and detecting the coded image by using a moving object detection algorithm matched with the modulation signal set so as to identify the object to be detected in the coded image. The moving object detection method can identify the category of the moving object to be detected and multiple sets of time-ordered position information from a single coded image without a high-speed camera, and can thus realize efficient tracking detection of high-speed moving objects.

Description

Moving object detection method and device

Technical Field

The present disclosure relates to the field of computer vision, and more particularly, to a moving object detection method and apparatus.

Background

Object detection is one of the classical problems in the field of computer vision; its task is usually to identify the location and class of objects in an image. With the development of machine learning techniques such as deep learning, object detection algorithms for static objects in still images have matured, but practical applications often require tracking detection of moving objects. Conventionally, a video signal containing a moving object is generated and object detection is performed on the video signal frame by frame to realize moving object detection. Accurate object detection requires that no frame of the video signal contain motion blur, which is generally achieved by capturing the video with a high-speed camera. However, the use of a high-speed camera not only reduces the light flux per frame, but also significantly increases the data burden and cost of the system.

Disclosure of Invention

In order to solve the above problems, the present disclosure proposes a moving object detection method and apparatus capable of detecting a moving object without a high-speed camera.

According to an aspect of an embodiment of the present disclosure, there is provided a moving object detection method including: receiving an optical signal from a scene to be detected, and performing optical imaging on the scene to be detected by using the optical signal to generate an imaging optical signal of the scene to be detected, wherein the scene to be detected comprises a moving object to be detected; sequentially modulating the imaging optical signal with a modulation signal in a modulation signal set to generate a modulated optical signal within a predetermined time period, and generating a coded image based on the modulated optical signal; and detecting the coded image by using a moving object detection algorithm matched with the modulation signal set so as to identify the object to be detected in the coded image.

According to an example of the embodiment of the present disclosure, the modulation signal set includes a first number of modulation signals, the predetermined time period has a first duration, each modulation signal modulates the imaging light signal for a second duration, and the first duration is greater than or equal to the product of the first number and the second duration; wherein sequentially modulating the imaging light signal with the modulation signals in the modulation signal set within the predetermined time period to obtain a coded image comprises: sequentially selecting each of the first number of modulation signals in the modulation signal set, and modulating the imaging light signal with the selected modulation signal to obtain, within the second duration, a modulated light signal corresponding to that modulation signal, thereby sequentially obtaining, within the first duration, a first number of modulated light signals respectively corresponding to the first number of modulation signals; and forming the coded image using the first number of modulated light signals obtained within the first duration.

According to an example of the embodiment of the present disclosure, modulating the imaging light signal with the modulation signal to obtain a modulated light signal corresponding to the modulation signal within the second duration includes: inputting the modulation signal into a spatial light modulator, the spatial light modulator comprising a plurality of sub-units; adjusting the plurality of sub-units of the spatial light modulator with the modulation signal; and modulating the spatial distribution of the imaging light signal by using the adjusted plurality of sub-units to obtain the modulated light signal corresponding to the modulation signal within the second duration.

According to an example of the embodiment of the present disclosure, forming the encoded image using the first number of modulated light signals within the first duration comprises: continuously acquiring the first number of modulated light signals with an image detector for the first duration, and generating the encoded image based on the acquired light signals.

According to an example of the embodiment of the present disclosure, detecting the coded image by using a moving object detection algorithm matched with the modulation signal set to identify the object to be detected in the coded image includes: detecting the coded image to determine the position and the category of the object to be detected in the coded image.

According to an example of the embodiment of the present disclosure, determining the position and the category of the object to be detected in the encoded image includes: determining, based on the encoded image, a plurality of decoded images that respectively correspond to the first number of modulation signals; and determining the position and the category of the object to be detected in the plurality of decoded images.

According to an example of the embodiment of the present disclosure, the modulation signal set matched with the moving object detection algorithm is determined by: acquiring a training data set, wherein the training data set comprises a training image sequence and the labeled position information and labeled categories of one or more moving objects contained in each training image in the training image sequence; and performing supervised training on the moving object detection algorithm by using the labeled position information and the labeled categories so as to determine the modulation signal set.

According to an example of an embodiment of the present disclosure, the moving object detection algorithm comprises a motion encoding module and a moving object detection module, and performing supervised training on the moving object detection algorithm with the labeled position information and labeled categories to determine the modulation signal set comprises: encoding, by the motion encoding module, the training image sequence using an encoding signal set to obtain a training encoded image; performing object detection on the training encoded image by using the moving object detection module to obtain a detection result; and supervising the detection result with the labeled position information and the labeled categories to obtain a trained encoding signal set, and determining the trained encoding signal set as the modulation signal set.

According to an example of the embodiment of the present disclosure, encoding, by the motion encoding module, the training image sequence using an encoding signal set to obtain a training encoded image comprises: multiplying each encoding signal in the encoding signal set pixel by pixel with the corresponding training image in the training image sequence, and summing the multiplication results to obtain the training encoded image.

According to another aspect of the embodiments of the present disclosure, there is provided a moving object detection device including: an imaging lens configured to receive an optical signal from a scene to be detected and perform optical imaging on the scene to be detected by using the optical signal to generate an imaging optical signal of the scene to be detected, wherein the scene to be detected comprises a moving object to be detected; an encoding unit configured to sequentially modulate the imaging light signal with the modulation signals in a modulation signal set within a predetermined time period to generate a modulated light signal, and generate an encoded image based on the modulated light signal; and a detection unit configured to detect the coded image by using a moving object detection algorithm matched with the modulation signal set so as to identify the object to be detected in the coded image.

According to an example of the embodiment of the present disclosure, the modulation signal set includes a first number of modulation signals, the predetermined time period has a first duration, each modulation signal modulates the imaging light signal for a second duration, and the first duration is greater than or equal to the product of the first number and the second duration, and the encoding unit is further configured to: sequentially select each of the first number of modulation signals in the modulation signal set and modulate the imaging light signal with the selected modulation signal to obtain, within the second duration, a modulated light signal corresponding to that modulation signal, thereby obtaining, within the first duration, a first number of modulated light signals respectively corresponding to the first number of modulation signals; and form the encoded image using the first number of modulated light signals obtained within the first duration.

According to an example of the embodiment of the present disclosure, the encoding unit includes a spatial light modulator, and the spatial light modulator includes a plurality of sub-units and is configured to: receive a modulation signal; adjust the plurality of sub-units of the spatial light modulator with the modulation signal; and modulate the spatial distribution of the imaging light signal by using the adjusted plurality of sub-units to obtain a modulated light signal corresponding to the modulation signal within the second duration.

According to an example of an embodiment of the present disclosure, the encoding unit further comprises an image detector configured to: continuously acquire the first number of modulated light signals for the first duration and generate the encoded image based on the acquired light signals.

According to an example of an embodiment of the present disclosure, the detection unit is further configured to: detect the coded image by using a moving object detection algorithm matched with the modulation signal set so as to determine the position and the category of the object to be detected in the coded image.

According to an example of an embodiment of the present disclosure, the detection unit is further configured to: determine, based on the encoded image, a plurality of decoded images that respectively correspond to the first number of modulation signals; and determine the position and the category of the object to be detected in the plurality of decoded images.

According to an example of an embodiment of the present disclosure, the moving object detection algorithm comprises a motion encoding module and a moving object detection module, and performing supervised training on the moving object detection algorithm with the labeled position information and labeled categories to determine the modulation signal set comprises: encoding, by the motion encoding module, the training image sequence using an encoding signal set to obtain a training encoded image; performing object detection on the training encoded image by using the moving object detection module to obtain a detection result; and supervising the detection result with the labeled position information and the labeled categories to obtain a trained encoding signal set, and determining the trained encoding signal set as the modulation signal set.

According to another aspect of an example of an embodiment of the present disclosure, there is provided a moving object detection device including: an imaging lens configured to receive an optical signal from a scene to be detected and perform optical imaging on the scene to be detected by using the optical signal to generate an imaging optical signal of the scene to be detected, wherein the scene to be detected comprises a moving object to be detected; a spatial light modulator configured to receive a modulation signal set and to modulate the imaging light signal under control of the modulation signal set to generate a modulated light signal; an image detector configured to generate an encoded image based on the modulated light signal; and one or more processors configured to: sequentially provide the modulation signals in the modulation signal set to the spatial light modulator within a predetermined time period, so as to control the spatial light modulator to modulate the imaging light signal with the modulation signals to generate modulated light signals, and control the image detector to generate a coded image based on the modulated light signals; and detect the coded image by using a moving object detection algorithm matched with the modulation signal set so as to identify the object to be detected in the coded image.

According to the moving object detection method and the moving object detection device in the above aspects of the embodiments of the disclosure, the modulation signal set is used to sequentially modulate the imaging optical signal of the scene to be detected to obtain the coded image, and the moving object detection algorithm matched with the modulation signal set is used to perform object detection on the coded image. The category of the moving object to be detected and multiple sets of time-ordered position information can therefore be identified from a single coded image without a high-speed camera, thereby realizing tracking detection of high-speed moving objects, greatly improving the detection efficiency for moving objects, and reducing the cost and data burden of the system.

Drawings

The above and other objects, features and advantages of the embodiments of the present disclosure will become more apparent by describing in more detail the embodiments of the present disclosure with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. In the drawings, like reference numbers generally represent like parts or steps.

FIG. 1 shows an overall architecture of a moving object detection system according to an example of an embodiment of the present disclosure;

FIG. 2 shows a flow chart of a moving object detection method according to an embodiment of the present disclosure;

FIG. 3 shows a schematic diagram of a training process for a moving object detection algorithm according to an example of an embodiment of the present disclosure;

FIG. 4A shows a schematic diagram of detection results according to an example of an embodiment of the present disclosure;

FIG. 4B shows a schematic diagram of a detection result according to another example of an embodiment of the present disclosure;

FIG. 4C shows a schematic diagram of a detection result according to another example of an embodiment of the present disclosure;

FIG. 5 shows a schematic structural diagram of a moving object detection device according to an embodiment of the present disclosure; and

FIG. 6 shows a schematic structural diagram of a moving object detection device according to an embodiment of the present disclosure.

Detailed Description

The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure. It is to be understood that the described embodiments are merely exemplary of some, and not all, of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without any inventive step, are intended to be within the scope of the present disclosure.

In practical applications, moving objects often need to be tracked and detected. For a high-speed moving object in particular, a high-speed camera is usually required to capture the video used for detection, so as to avoid the motion blur caused by the high-speed motion of the object. The embodiments of the disclosure provide a moving object detection method that can realize tracking detection of a high-speed moving object without a high-speed camera. In the moving object detection method according to the embodiments of the disclosure, a single coded image of a scene to be detected containing a moving object can be obtained within any predetermined time period (for example, with an exposure time far longer than that of a high-speed camera), and the category of the moving object and multiple sets of time-ordered position information are detected from the single coded image. This realizes efficient detection of moving objects, especially high-speed moving objects, greatly improves the detection efficiency for moving objects, and reduces the cost and data burden of the system.

The moving object detection method and apparatus according to the embodiments of the present disclosure can be implemented, for example, as a moving object detection system including a hardware portion and a software portion. FIG. 1 shows the overall architecture of an example moving object detection system according to an embodiment of the present disclosure, where the hardware portion may include an imaging system 110 for generating a coded image of a scene to be detected containing an object to be detected, and the software portion may include a moving object detection algorithm 120 for detecting the coded image to identify the object to be detected. FIG. 1 also shows an example structure of the imaging system 110 and the moving object detection algorithm 120. The imaging system 110 may include, for example, an imaging lens, a spatial light modulator, an image detector, a relay lens, and other required devices, and the moving object detection algorithm 120 may include, for example, a moving object detection module 121. The embodiments of the disclosure are not limited thereto, however, and the imaging system 110 and the moving object detection algorithm 120 may also include other required devices or structures. As shown in FIG. 1, the modulation signal set matched with the moving object detection algorithm 120 is input into the imaging system 110, the imaging system 110 generates a coded image of the scene to be detected, and the moving object detection algorithm 120 then detects the coded image to obtain a detection result for the object to be detected.

Next, a moving object detection method according to an embodiment of the present disclosure is specifically described with reference to FIG. 2. FIG. 2 shows a flow chart of a moving object detection method 200 according to an embodiment of the disclosure.

As shown in FIG. 2, in step S210, an optical signal from a scene to be detected is received, and the scene to be detected is optically imaged using the optical signal to generate an imaging optical signal of the scene to be detected. The scene to be detected includes moving objects to be detected, such as moving bicycles, speeding automobiles, flying airplanes, running animals, or moving objects of any kind and number; the embodiments of the present disclosure do not specifically limit the kinds and number of the objects to be detected in the scene to be detected. According to an example of the embodiment of the present disclosure, an imaging lens may be used to receive the optical signal of the scene to be detected, and the received optical signal may be used to optically image the scene to be detected to generate the imaging optical signal of the scene to be detected. The imaging lens may be part of an imaging system and may be, for example, a convex lens, a concave lens, or various combinations thereof, which is not specifically limited by the embodiments of the present disclosure. The generated imaging light signal is a time-varying two-dimensional light field signal representing the scene to be detected, and may also be referred to as a time-varying imaging light signal. It reflects, in real time, changes in the scene to be detected during imaging, such as the motion of an object to be detected in the scene.

In step S220, the imaging optical signal is sequentially modulated with the modulation signals in the modulation signal set within a predetermined time period to generate a modulated optical signal, and an encoded image is generated based on the modulated optical signal. Here, the predetermined time period has a first duration. The first duration may be, for example, the exposure time of the imaging system, and may be set arbitrarily according to actual requirements; for example, it may be set to be much longer than the exposure time of a high-speed camera, or to any other suitable duration, which is not specifically limited by the embodiments of the present disclosure.

According to an example of an embodiment of the present disclosure, the modulation signal set includes a first number of modulation signals, wherein each modulation signal may be a two-dimensional matrix corresponding to the imaging light signal, which is a two-dimensional light field signal; for example, each modulation signal may be a two-dimensional matrix composed of 0s and 1s. Here, the first number may be set according to practical application requirements, and the embodiments of the present disclosure are not particularly limited in this regard. According to an example of an embodiment of the present disclosure, the modulation signal set may be determined by training a moving object detection algorithm through machine learning, so that the determined modulation signal set is matched with the moving object detection algorithm, as described in further detail below.

According to an example of the embodiment of the present disclosure, the length of time for which each modulation signal in the modulation signal set modulates the imaging light signal is a second duration, and the first duration of the predetermined time period is greater than or equal to the product of the first number and the second duration. That is, the imaging light signal may be modulated a plurality of times, each time with a different modulation signal, within the predetermined time period. For example, when the first duration is equal to the product of the first number and the second duration, the imaging light signal is modulated with the first number of modulation signals within the predetermined time period, and the number of modulations is equal to the first number. Specifically, each of the first number of modulation signals in the modulation signal set may be selected in sequence, and the imaging optical signal may be modulated with the selected modulation signal. Since each modulation signal modulates the imaging optical signal for the second duration, the modulated optical signal corresponding to the selected modulation signal is obtained within the second duration. The modulation is performed continuously a plurality of times within the predetermined time period, thereby obtaining, within the first duration, a first number of modulated optical signals respectively corresponding to the first number of modulation signals. Subsequently, a coded image is formed using the first number of modulated optical signals obtained within the first duration.
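As a non-limiting illustration, the timing relationship above can be sketched in Python. The function name `modulation_schedule`, the back-to-back placement of the modulation slots from the start of the exposure, and the numeric values are assumptions made for this sketch only; the disclosure itself requires only that the first duration be greater than or equal to the product of the first number and the second duration.

```python
def modulation_schedule(first_duration, second_duration, first_number):
    """Return the (start, end) time of each modulation slot in one exposure.

    Assumption for illustration: slots are placed back to back, starting
    at time 0 of the predetermined time period.
    """
    if first_duration < first_number * second_duration:
        raise ValueError(
            "first duration must be >= first number x second duration"
        )
    return [
        (k * second_duration, (k + 1) * second_duration)
        for k in range(first_number)
    ]


# Hypothetical values: 8 modulation signals of 1.0 time unit each,
# inside an exposure of 8.0 time units.
slots = modulation_schedule(first_duration=8.0, second_duration=1.0, first_number=8)
for index, (start, end) in enumerate(slots):
    print(f"modulation signal {index}: {start} to {end}")
```

With these hypothetical values the first duration equals the product of the first number and the second duration, so the number of modulations equals the first number.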

According to an example of an embodiment of the present disclosure, a spatial light modulator may be employed to modulate the imaging light signal with a modulation signal. A spatial light modulator is a device that modulates the spatial distribution of light waves. It may comprise a plurality of individual sub-units arranged in a one-dimensional or two-dimensional array, each of which may independently change its optical properties, such as reflectivity, refractive index, or transmittance, under the control of an optical signal or an electrical signal, and may thereby modulate the spatial distribution of the light waves passing through it. For example, the spatial light modulator may modulate optical parameters of the light wave such as amplitude, intensity, phase, and polarization state. The spatial light modulator may be a liquid crystal spatial light modulator, a digital micromirror array, or the like, which is not particularly limited by the embodiments of the present disclosure. In the disclosed embodiment, each modulation signal in the modulation signal set can be used as a control signal of the spatial light modulator. Specifically, a modulation signal is input to the spatial light modulator, and the plurality of sub-units of the spatial light modulator are adjusted by the modulation signal. For example, in the case where the modulation signal is a two-dimensional matrix, the plurality of sub-units of the spatial light modulator may be individually controlled by the values of the respective elements of the modulation signal matrix to adjust the optical properties of the sub-units. The spatial distribution of the imaging optical signal can then be modulated by the adjusted plurality of sub-units, for example by modulating optical parameters of the imaging optical signal such as amplitude, intensity, phase, and polarization state, so as to obtain a modulated optical signal corresponding to the input modulation signal within a modulation time whose length is the second duration.

It should be noted that, although the spatial light modulator is used to modulate the imaging light signal in the above example, the embodiments of the present disclosure are not limited thereto, and any other device capable of changing the spatial distribution of the imaging light signal may be used.

As described above, after the first number of modulated light signals corresponding to the first number of modulation signals are obtained within the first duration of the predetermined time period, the encoded image is formed using the obtained first number of modulated light signals. According to an example of an embodiment of the present disclosure, the first number of modulated light signals may be continuously acquired with an image detector for the first duration, and a coded image may be generated based on the acquired light signals. Here, the image detector may be any device capable of converting an optical signal into an electrical signal, for example, a Charge Coupled Device (CCD) sensor, a Complementary Metal Oxide Semiconductor (CMOS) sensor, and the like, which is not particularly limited in the embodiments of the present disclosure. For example, the image detector may continuously collect the modulated optical signal through the relay lens for the first duration (this collection may be referred to as exposure), and generate a coded image after performing processing such as photoelectric conversion on the collected optical signal. The relay lens may be, for example, a convex lens, a concave lens, or various combinations thereof, which is not particularly limited by the embodiments of the present disclosure. In general, the exposure time of the image detector is much longer than the modulation time of the spatial light modulator. Therefore, when the spatial light modulator and the image detector are employed, the exposure time of the imaging system according to the embodiment of the present disclosure depends on the exposure time of the image detector, and the first duration of the predetermined time period can be set to the exposure time of the image detector.
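As a non-limiting illustration, the formation of a single coded image during one exposure can be simulated numerically. The sketch below follows the pixel-by-pixel multiply-and-sum operation that the disclosure describes for the motion encoding module; treating the scene as constant within each second duration, using binary masks, and the names `encode`, `frames`, and `masks` are assumptions made for this sketch, and effects such as sensor noise and photoelectric conversion are ignored.

```python
import numpy as np


def encode(frames, masks):
    """Simulate one coded exposure.

    frames: array of shape (N, H, W), the scene during each modulation slot
            (assumed constant within a slot for this sketch).
    masks:  array of shape (N, H, W), one modulation signal per slot.
    Returns the (H, W) coded image: the sum over slots of mask * frame.
    """
    frames = np.asarray(frames, dtype=np.float64)
    masks = np.asarray(masks, dtype=np.float64)
    if frames.shape != masks.shape:
        raise ValueError("frames and masks must have the same shape")
    # The detector integrates all modulated signals over the exposure.
    return (frames * masks).sum(axis=0)


# Hypothetical example: 8 random frames and 8 random binary masks of 4x4 pixels.
rng = np.random.default_rng(0)
frames = rng.random((8, 4, 4))
masks = rng.integers(0, 2, size=(8, 4, 4))
coded = encode(frames, masks)
print(coded.shape)
```

The result is one two-dimensional image regardless of the first number, which is why a single readout of the image detector suffices for the whole predetermined time period.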

After the coded image is obtained, in step S230, the coded image may be detected by using a moving object detection algorithm matched with the modulation signal set, so as to identify the object to be detected in the coded image. According to an example of an embodiment of the present disclosure, the moving object detection algorithm may include a moving object detection module, which may include, for example, a motion decoding module and an object detection module to perform motion decoding and object detection on the coded image, respectively. In the embodiment of the present disclosure, both the motion decoding module and the object detection module may be constructed from neural networks, and the object detection module may be implemented by using a neural-network-based object recognition algorithm, for example, a Region-based Convolutional Neural Network (R-CNN), Fast R-CNN, a Single Shot MultiBox Detector (SSD), and the like, which is not limited in this embodiment of the present disclosure. As previously mentioned, the modulation signal set may be determined by training the moving object detection algorithm through machine learning, so that the determined modulation signal set is matched with the moving object detection algorithm.

According to an example of the embodiment of the present disclosure, when a moving object detection algorithm is used to detect a coded image, the category of an object to be detected can be identified. For example, when the object to be detected is a moving car, the category of the object to be detected can be identified as "car" by detecting the encoded image. In addition, with the moving object detection method according to the embodiment of the present disclosure, the position of the object to be detected in the coded image can also be determined when detecting the coded image. After the category and the position of the object to be detected are identified, they may be marked in the encoded image; for example, the identified object to be detected may be enclosed in a bounding box, and its category (e.g., "car") may be labeled at the bounding box.

In addition, the moving object detection method according to the embodiment of the present disclosure can also detect a plurality of sets of position information and categories of an object to be detected from a single encoded image. Specifically, when the encoded image is decoded using the moving object detection algorithm, a plurality of decoded images may be obtained based on a single encoded image, and the plurality of decoded images may respectively correspond to respective ones of the first number of modulation signals. According to an example of an embodiment of the present disclosure, the number of decoded images obtained from a single encoded image may be equal to the first number of modulation signals. The object detection is performed on the obtained plurality of decoded images respectively, and the position and the category of the object to be detected in each decoded image in the plurality of decoded images can be determined. Since the imaging light signal of the scene to be detected is time-varying, and each modulation signal in the first number of modulation signals is sequentially selected to modulate the imaging light signal within a predetermined time period to generate the encoded image, the plurality of decoded images respectively corresponding to each modulation signal correspond to the scene to be detected at different times, and therefore, the position of the object to be detected in the plurality of decoded images can reflect the motion trajectory of the object to be detected in the scene to be detected.
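As a non-limiting illustration, the sketch below shows how per-image detection results could be arranged into a motion trajectory. It assumes that decoding and detection have already been performed, that each decoded image contains one detected object, and that each result is a category with a bounding box in the hypothetical form (x_min, y_min, x_max, y_max); none of these conventions is specified by the disclosure.

```python
def trajectory(detections):
    """Convert time-ordered detections into a list of (category, center).

    detections: list ordered by decoded image index (earliest first), each
                item being (category, (x_min, y_min, x_max, y_max)).
    """
    return [
        (category, ((x_min + x_max) / 2, (y_min + y_max) / 2))
        for category, (x_min, y_min, x_max, y_max) in detections
    ]


# Hypothetical detections from three decoded images of one coded image.
dets = [
    ("car", (0, 10, 20, 30)),
    ("car", (10, 10, 30, 30)),
    ("car", (20, 10, 40, 30)),
]
track = trajectory(dets)
print(track)
```

Because the decoded images correspond to successive modulation signals, the order of the list is the time order, and the sequence of box centers approximates the path of the object during the exposure.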

The method of training the moving object detection algorithm to obtain the modulation signal set is further described below with reference to fig. 3. Fig. 3 shows a schematic diagram of a training process of a moving object detection algorithm according to an example of an embodiment of the present disclosure. As shown in fig. 3, the moving object detection algorithm may include a moving object detection module 320, and the moving object detection module 320 may further include, for example, a motion decoding module 321 and an object detection module 322 to perform motion decoding and object detection on the encoded image, respectively. The motion decoding module 321 and the object detection module 322 may be constructed using a neural network, for example, a network structure including residual blocks, convolution blocks, and the like, which is not particularly limited in the embodiment of the present disclosure. In addition, the object detection module 322 may be implemented using a neural-network-based object recognition algorithm, such as R-CNN, Fast R-CNN, SSD, etc., which is not specifically limited by the embodiment of the disclosure. In addition, the moving object detection algorithm may further include a motion encoding module 310, where the motion encoding module 310 is a mathematical description of the hardware components of the moving object detection system, including the imaging lens, the spatial light modulator, the image detector, and the like; that is, the motion encoding module 310 may simulate the physical process of generating the encoded image. Thus, a set of modulation signals suitable for use in a moving object detection system may be obtained by machine learning training of a moving object detection algorithm that includes the motion encoding module 310.

When training the moving object detection algorithm, a training data set for training the moving object detection algorithm is first obtained, where the training data set may include a training image sequence, and each training image in the training image sequence includes annotated position information and annotated categories of one or more moving objects. Here, the training image sequence is a set of a plurality of training images that are continuous in time; for example, the training image sequence may be a segment of a video signal, and each frame image of the video signal is taken as a training image in the training image sequence. The number of training images used in each training may be equal to the first number of modulation signals described above, or may be greater than the first number, which is not specifically limited in the embodiments of the present disclosure. For example, in the case where the training image sequence included in the training data set has 80 training images and the first number is 8, 8 training images may be selected for each training. The training data set may be taken from a published annotated data set used for video object detection, such as the ImageNet video object detection data set (ImageNet VID), and the like. Then, supervised training is performed on the moving object detection algorithm by using the annotated position information and the annotated categories in the training data set, so as to determine the modulation signal set.
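
The example of 80 training images and a first number of 8 can be sketched as follows. The present disclosure does not state how the 8 images are selected for each training; this sketch assumes consecutive, non-overlapping windows, and the function name `split_into_clips` is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_clips(sequence: Sequence[T], clip_length: int) -> List[List[T]]:
    """Cut a time-ordered sequence into consecutive, non-overlapping clips.

    Trailing items that do not fill a whole clip are dropped (an assumption).
    """
    if clip_length <= 0:
        raise ValueError("clip_length must be positive")
    last_start = len(sequence) - clip_length
    return [list(sequence[i:i + clip_length])
            for i in range(0, last_start + 1, clip_length)]


# Frame indices stand in for the 80 training images of the example.
frame_ids = list(range(80))
clips = split_into_clips(frame_ids, clip_length=8)
```

Under these assumptions the 80-image sequence yields 10 clips of 8 temporally continuous images each; overlapping windows would be an equally plausible choice.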

Specifically, as shown in fig. 3, a training image sequence is first encoded by the motion encoding module 310 using a coding signal set to obtain a training coded image. As mentioned above, the motion encoding module is a mathematical simulation of the physical process of generating the coded image. In the example of step S220, the spatial light modulator sequentially selects the modulation signals in the modulation signal set to modulate the time-varying imaging light signal within a predetermined time period to generate a first number of modulated light signals, and the image detector then continuously collects the first number of modulated light signals to generate the coded image. This corresponds to multiplying the time-varying imaging light signal by the modulation signals in the modulation signal set and summing the results within the predetermined time period. Therefore, when the moving object detection algorithm is trained, the coding signals in the coding signal set can each be multiplied pixel by pixel by the corresponding training image in the training image sequence, and the multiplication results can be summed to obtain the training coded image. Here, the coding signal set used in the training process corresponds to the modulation signal set in step S220 described above; that is, the trained coding signal set may be used as the modulation signal set of the imaging system in the moving object detection system to modulate the imaging light signal. As shown in the example of fig. 3, a training image sequence including a moving beagle is multiplied by a coding signal set and summed to obtain a training coded image. It can be seen that, after motion encoding, a series of training images each including a clear beagle image is encoded as one image in which the beagle image becomes blurred. Here, the coding signals in the coding signal set may be set to any suitable initial values, which is not specifically limited by the embodiments of the present disclosure.
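
The pixel-by-pixel multiplication and summation described above can be illustrated with the following minimal sketch. It uses plain Python lists for grayscale frames; the function name `encode_frames`, the 2x2 frame size, and the sample mask values are assumptions chosen for illustration and do not represent the actual motion encoding module 310.

```python
from typing import List

Image = List[List[float]]  # a grayscale frame stored as rows of pixel values


def encode_frames(frames: List[Image], masks: List[Image]) -> Image:
    """Multiply each frame by its mask pixel by pixel, then sum the products."""
    if len(frames) != len(masks):
        raise ValueError("exactly one mask is needed per frame")
    height, width = len(frames[0]), len(frames[0][0])
    coded = [[0.0] * width for _ in range(height)]
    for frame, mask in zip(frames, masks):
        for r in range(height):
            for c in range(width):
                coded[r][c] += frame[r][c] * mask[r][c]
    return coded


# Two hypothetical 2x2 frames in which a bright pixel moves from left to right,
# paired with two hypothetical binary masks (matrices composed of 0 and 1).
frames = [[[1.0, 0.0], [0.0, 0.0]],
          [[0.0, 1.0], [0.0, 0.0]]]
masks = [[[1, 0], [1, 1]],
         [[0, 1], [1, 0]]]
coded_image = encode_frames(frames, masks)
```

In this toy case both positions of the moving pixel end up in the single coded image, which mirrors how a sharp image sequence is merged into one blurred coded image.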

Subsequently, the moving object detection module 320 performs object detection on the training coded image to obtain a detection result. Specifically, the training coded image may first be decoded by the motion decoding module 321 of the moving object detection module 320 to obtain a plurality of decoded images, the number of which may, for example, correspond to the number of coding signals in the coding signal set and to the number of training images in the training image sequence; then, the object detection module 322 of the moving object detection module 320 performs object detection on the obtained plurality of decoded images to obtain detection results, which may be, for example, the categories of one or more moving objects and the sets of position information in the plurality of decoded images, and the like.

Because the category and position information of the one or more moving objects included in each training image in the training image sequence are annotated, the moving object detection algorithm can be trained in a supervised manner by comparing the detection result with the annotated categories and annotated position information of the one or more moving objects. For example, the error between the detection result and the annotated categories and annotated position information can be calculated, and the moving object detection algorithm can be trained by minimizing this error, so that the coding signal set and the network parameters in the moving object detection module are continuously optimized until the optimal coding signal set and the optimal network parameters are obtained.
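
The error between the annotations and the detection result can take many forms, and the present disclosure does not fix one. The following sketch uses a deliberately simple, hypothetical error: a penalty of 1 for each wrong category plus the mean absolute difference of the box coordinates, averaged over the decoded images. The function name, the box format, and the sample values are assumptions; a practical training setup would normally use a differentiable loss so that the coding signal set and the network parameters can be optimized by gradient-based methods.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)
Detection = Tuple[str, Box]              # (category, box) for one decoded image


def detection_error(predicted: List[Detection],
                    annotated: List[Detection]) -> float:
    """Average per-image error: category mismatch plus mean absolute box error."""
    if len(predicted) != len(annotated) or not annotated:
        raise ValueError("need the same non-zero number of predictions and labels")
    total = 0.0
    for (p_cat, p_box), (a_cat, a_box) in zip(predicted, annotated):
        category_error = 0.0 if p_cat == a_cat else 1.0
        box_error = sum(abs(p - a) for p, a in zip(p_box, a_box)) / 4
        total += category_error + box_error
    return total / len(annotated)


# Hypothetical annotations and detection results for two decoded images.
annotated = [("car", (0, 0, 4, 4)), ("car", (2, 0, 6, 4))]
predicted = [("car", (0, 0, 4, 4)), ("dog", (2, 0, 6, 8))]
error = detection_error(predicted, annotated)
```

Here the first image is detected perfectly, while the second has a wrong category and one box coordinate off by 4 pixels, so the average error is 1.0; training would seek to drive this value toward zero.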

After machine learning training is performed on the moving object detection algorithm to obtain an optimal coding signal set and optimal network parameters, the moving object detection algorithm is fixed. The trained coding signal set can be used as the modulation signal set in the moving object detection system according to the embodiment of the disclosure, so as to image and modulate a scene to be detected containing a moving object to be detected and generate a coded image; then, the fixed moving object detection algorithm is used to detect the coded image so as to identify the object to be detected from the coded image. The specific steps are as described above for the moving object detection method 200 with reference to fig. 2, and are not described herein again.

Fig. 4A to 4C show examples of detection results obtained by the moving object detection method according to the embodiment of the present disclosure. In the example of fig. 4A, (a) is a coded image obtained by using the moving object detection method according to the embodiment of the present disclosure, which includes blurred bicycle images. The coded image is obtained by modulating the time-varying imaging light signal of a scene to be detected including a moving bicycle with eight modulation signals, respectively, and capturing the modulated signals; that is, the first number of modulation signals in the modulation signal set in the above step S220 is 8. After the coded image is detected by using the moving object detection method according to the embodiment of the disclosure, the category of the bicycle and the corresponding eight sets of position information can be identified from the single coded image. For convenience of explanation, the detection results shown in (b) - (i) use a high-definition image sequence of the bicycle taken with a high-speed camera as a background, in which the different positions of the bicycle are marked with rectangular boxes and the category of the bicycle is labeled in each image. In (b) - (i), the different positions and the category of the bicycle determined by the moving object detection method according to the embodiment of the disclosure are labeled in time sequence; that is, the motion trajectory of the bicycle is recovered, and tracking detection of the moving bicycle is realized.

In the example of fig. 4B, (a) is a coded image obtained by using the moving object detection method according to the embodiment of the present disclosure, which includes blurred car images. The coded image is obtained by modulating the time-varying imaging light signal of a scene to be detected including a moving car with eight modulation signals, respectively, and capturing the modulated signals; that is, the first number of modulation signals in the modulation signal set in the above step S220 is 8. After the coded image is detected using the moving object detection method according to the embodiment of the present disclosure, the category of the car and the corresponding eight sets of position information can be identified from the single coded image, as shown in (b) - (i). Similarly, for ease of explanation, a high-definition image sequence of the car taken with a high-speed camera is used as the background of (b) - (i). In (b) - (i), the different positions and the category of the car determined by the moving object detection method according to the embodiment of the disclosure are labeled in time sequence; that is, the motion trajectory of the car is recovered, and tracking detection of the moving car is realized.

In addition, the moving object detection method according to the embodiment of the disclosure can realize tracking detection of a plurality of moving objects at the same time. In the example of fig. 4C, (a) is a coded image obtained by using the moving object detection method according to the embodiment of the present disclosure, which includes a plurality of blurred airplane images. The coded image is obtained by modulating the time-varying imaging light signal of a scene to be detected including a plurality of moving airplanes with eight modulation signals, respectively, and capturing the modulated signals; that is, the first number of modulation signals in the modulation signal set in the above step S220 is 8. After the coded image is detected using the moving object detection method according to the embodiment of the present disclosure, the category of each airplane and the corresponding eight sets of position information of each airplane can be identified from the single coded image, as shown in (b) - (i). Similarly, for ease of explanation, a high-definition image sequence of the airplanes photographed with a high-speed camera is used as the background of (b) - (i). In (b) - (i), the different positions and the category of each airplane determined by the moving object detection method according to the embodiment of the disclosure are labeled in time sequence; that is, the motion trajectories of the plurality of airplanes are recovered simultaneously, and tracking detection of the airplanes moving at high speed is realized.

As can be seen from the above description and the examples of fig. 4A to 4C, with the moving object detection method according to the embodiment of the present disclosure, a single coded image of the moving object can be generated within a longer exposure time, and the category of the moving object and a plurality of sets of position information in time sequence can be detected from the single coded image, so that the efficiency of object detection is greatly improved. In particular, when tracking detection is performed on a high-speed moving object, the motion trajectory of the high-speed moving object over a long time can be recovered while capturing only a few coded images. A high-speed camera is therefore not required, and the cost and data burden of the system are greatly reduced while efficient detection of the high-speed moving object is achieved.

Next, a moving object detection device according to an embodiment of the present disclosure is described with reference to fig. 5. Fig. 5 shows a schematic structural diagram of a moving object detection device 500 according to an embodiment of the present disclosure. Since the moving object detection device 500 has the same details as the moving object detection method 200 described above in conjunction with fig. 2, a detailed description of the same is omitted here for the sake of simplicity. As shown in fig. 5, the moving object detection device 500 includes an imaging lens 510, an encoding unit 520, and a detection unit 530. The apparatus 500 may include other components in addition to the three units, however, since these components are not related to the contents of the embodiments of the present disclosure, illustration and description thereof are omitted herein.

The imaging lens 510 is configured to receive the optical signal from the scene to be detected and optically image the scene to be detected with the optical signal to generate an imaging optical signal of the scene to be detected. The scene to be detected includes moving objects to be detected, such as moving bicycles, speeding automobiles, soaring airplanes, running animals, and moving objects of any kind or number; here, the kind and number of the objects to be detected are not specifically limited in the embodiment of the present disclosure. The imaging lens may be part of an imaging system and may be, for example, a convex lens, a concave lens, or various combinations thereof, which are not specifically limited by the embodiments of the present disclosure. The generated imaging light signal is a time-varying two-dimensional light field signal that represents the scene to be detected. The imaging light signal may also be referred to as a time-varying imaging light signal, and it may reflect, in real time, changes in the scene to be detected during imaging, such as the motion of an object to be detected in the scene to be detected.

The encoding unit 520 is configured to sequentially modulate the imaging light signal with the modulation signals in the modulation signal set within a predetermined period of time to obtain an encoded image. Here, the predetermined time period has a first duration, and the first duration may be, for example, an exposure time of the imaging system, and may be arbitrarily set according to actual requirements, for example, the first duration may be set to be much longer than an exposure time of a high-speed camera, or may be set to other suitable durations, which is not specifically limited by the embodiment of the present disclosure.

According to an example of an embodiment of the present disclosure, the modulation signal set includes a first number of modulation signals, wherein each modulation signal may be a two-dimensional matrix corresponding to the imaging light signal as a two-dimensional light field signal, e.g., a two-dimensional matrix composed of 0 and 1. Here, the first number may be set according to practical application requirements, and the embodiment of the present disclosure is not particularly limited in this regard. According to an example of an embodiment of the present disclosure, the set of modulation signals may be determined by machine learning training a moving object detection algorithm, and the determined set of modulation signals is matched to the moving object detection algorithm. The details of machine learning training the moving object detection algorithm to determine the modulation signal set are similar to the process described above with reference to fig. 3, and therefore a repeated description of the same is omitted here.

According to an example of the embodiment of the present disclosure, a time length in which each of the modulation signals in the modulation signal set modulates the imaging light signal is a second time length, and the first time length of the predetermined period is greater than or equal to a product of the first number and the second time length. That is, the encoding unit 520 may modulate the imaging light signal with different modulation signals a plurality of times, respectively, within a predetermined period of time. For example, when the first duration of the predetermined period is equal to the product of the first number and the second duration, the encoding unit 520 may modulate the imaging light signal with the first number of modulation signals a plurality of times within the predetermined period, and the number of times of modulation is equal to the first number. Specifically, the encoding unit 520 may sequentially select each of the first number of modulation signals in the modulation signal set, and modulate the imaging optical signal with the selected modulation signal, where as described above, if the duration of modulating the imaging optical signal with each modulation signal is the second duration, the modulated optical signal corresponding to the selected modulation signal may be obtained within the second duration; the encoding unit 520 performs a plurality of modulations consecutively within a predetermined time period, thereby obtaining a first number of modulated optical signals respectively corresponding to the first number of modulated signals within a first duration of the predetermined time period. Subsequently, a coded image is formed using the resulting first number of modulated optical signals for a first duration.
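
The relation between the first duration, the first number, and the second duration can be illustrated with the following minimal sketch, which lays out back-to-back modulation slots inside the predetermined time period and rejects a configuration that violates the stated condition. The function name `modulation_schedule`, the back-to-back layout starting at time zero, and the sample durations (in arbitrary time units) are assumptions made for illustration.

```python
from typing import List, Tuple


def modulation_schedule(first_number: int,
                        second_duration: float,
                        first_duration: float) -> List[Tuple[float, float]]:
    """Return the (start, end) time of each modulation slot in the time period.

    Slots are assumed to run back to back from time zero. The first duration
    must be at least first_number * second_duration.
    """
    if first_duration < first_number * second_duration:
        raise ValueError(
            "first duration must be >= first number x second duration")
    return [(k * second_duration, (k + 1) * second_duration)
            for k in range(first_number)]


# Eight modulation signals, each applied for 0.5 time units, within an
# exposure of 4.0 time units (the case where the product equals the duration).
slots = modulation_schedule(first_number=8, second_duration=0.5,
                            first_duration=4.0)
```

In this configuration the eighth slot ends exactly when the predetermined time period ends; a longer first duration would leave idle time after the last modulation.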

According to an example of an embodiment of the present disclosure, the encoding unit 520 may include, for example, a spatial light modulator. A spatial light modulator is a device that modulates the spatial distribution of light waves and may comprise a plurality of individual sub-units arranged in a one-dimensional array or a two-dimensional array, each of which may independently change its optical properties, such as reflectivity, refractive index, transmittance, etc., under the control of an optical signal or an electrical signal, and may thereby modulate the spatial distribution of the light waves passing through it. For example, the spatial light modulator may modulate optical parameters such as amplitude, intensity, phase, polarization state, etc. of the light wave. The spatial light modulator may be a liquid crystal type spatial light modulator, a digital micromirror array, etc., and this is not particularly limited by the embodiments of the present disclosure. In the disclosed embodiment, each modulation signal in the modulation signal set can be used as a control signal of the spatial light modulator. In particular, the spatial light modulator is configured to receive the modulation signal and to adjust the plurality of sub-units of the spatial light modulator with the modulation signal; e.g., in the case where the modulation signal is a two-dimensional matrix, the plurality of sub-units of the spatial light modulator may be individually controlled with the values of the individual elements of the modulation signal matrix to adjust the optical properties of the plurality of sub-units. Thus, the spatial light modulator may modulate the spatial distribution of the imaging optical signal by using the adjusted plurality of sub-units, for example, may modulate optical parameters such as amplitude, intensity, phase, and polarization state of the imaging optical signal, so as to obtain a modulated optical signal corresponding to the input modulation signal within a modulation time having the length of the second duration.

It should be noted that, although the spatial light modulator is used to modulate the imaging light signal in the above example, the embodiments of the present disclosure are not limited thereto, and the encoding unit 520 may also include any other device capable of changing the spatial distribution of the imaging light signal.

According to an example of the disclosed embodiment, the encoding unit 520 may further include an image detector configured to continuously acquire a first number of modulated light signals for a first duration and generate an encoded image based on the acquired light signals. Here, the image detector may be any device capable of converting an optical signal into an electrical signal, for example, a Charge Coupled Device (CCD) sensor, a Complementary Metal Oxide Semiconductor (CMOS) sensor, and the like, which is not particularly limited in the embodiments of the present disclosure. For example, the image detector may continuously collect the modulated optical signal through the relay lens for the first duration (this collection may be referred to as exposure), and generate a coded image after performing processes such as photoelectric conversion on the collected optical signal. The relay lens may be, for example, a convex lens, a concave lens, or various combinations thereof, which are not particularly limited by the embodiments of the present disclosure. Generally, the exposure time of the image detector is much longer than the time the spatial light modulator takes for a single modulation. Therefore, in the case of employing the spatial light modulator and the image detector, the exposure time of the imaging system according to the embodiment of the present disclosure will depend on the exposure time of the image detector, and thus the first duration of the predetermined period of time may be equal to the exposure time of the image detector.

The detection unit 530 is configured to detect the encoded image using a moving object detection algorithm matching the set of modulation signals to identify an object to be detected in the encoded image. According to an example of an embodiment of the present disclosure, a moving object detection algorithm may include a moving object detection module. The moving object detection module may include, for example, a motion decoding module and an object detection module to perform motion decoding and object detection on the encoded image, respectively. The motion decoding module and the object detection module may be constructed using a neural network, and the object detection module may be implemented using a neural-network-based object recognition algorithm, such as R-CNN, Fast R-CNN, SSD, etc., which is not limited in this disclosure. As previously mentioned, the set of modulation signals may be determined by machine learning training of the moving object detection algorithm, and the determined set of modulation signals is matched to the moving object detection algorithm.

According to an example of the embodiment of the present disclosure, the detection unit 530 may identify the category of the object to be detected when the coded image is detected using the moving object detection algorithm. For example, when the object to be detected is a moving car, the class of the object to be detected can be identified as "car" by detecting the encoded image. In addition, with the moving object detection device according to the embodiment of the present disclosure, the detection unit 530 may also determine the position of the object to be detected in the encoded image when detecting the encoded image. After identifying the category and the position of the object to be detected, the category of the object to be detected may be labeled in the encoded image, for example, the identified object to be detected may be selected using a border, and its category (e.g., labeled "car") may be labeled at the border.

In addition, the moving object detection device according to the embodiment of the present disclosure can detect a plurality of sets of position information of the object to be detected from a single coded image. Specifically, when the detection unit 530 decodes the encoded image using the moving object detection algorithm, a plurality of decoded images may be obtained based on a single encoded image, and the plurality of decoded images may respectively correspond to respective ones of the first number of modulation signals. According to an example of an embodiment of the present disclosure, the number of decoded images obtained from a single encoded image may be equal to the first number of modulation signals. The detection unit 530 performs object detection on the obtained plurality of decoded images, respectively, and can determine the position and the category of the object to be detected in each of the plurality of decoded images. Since the imaging light signal of the scene to be detected is time-varying and each of the first number of modulation signals is sequentially selected for modulating the imaging light signal within a predetermined time period to generate the encoded image, the plurality of decoded images respectively corresponding to the modulation signals correspond to the scene to be detected at different times. Therefore, the positions of the object to be detected in the plurality of decoded images may reflect the motion trajectory of the object to be detected, as shown in the exemplary detection results in fig. 4A-4C.

By utilizing the moving object detection device according to the above-mentioned embodiment of the present disclosure, a single coded image of a moving object can be generated within a long exposure time, and the category of the moving object and a plurality of sets of position information in time sequence can be detected from the single coded image, so that the efficiency of object detection is greatly improved. In particular, when tracking detection is performed on a high-speed moving object, the motion trajectory of the high-speed moving object over a long time can be recovered while capturing only a few coded images. A high-speed camera is therefore not required, and the cost and data burden of the system are greatly reduced while efficient detection of the high-speed moving object is realized.

Next, a moving object detection device according to an embodiment of the present disclosure is described with reference to fig. 6. Fig. 6 shows a schematic structural diagram of a moving object detection device 600 according to an embodiment of the present disclosure. Since the moving object detection device 600 has the same details as the moving object detection method 200 described above in conjunction with fig. 2, a detailed description of the same is omitted here for the sake of simplicity. As shown in fig. 6, the moving object detection device 600 may include an imaging lens 610, a spatial light modulator 620, an image detector 630, and one or more processors 640. In addition to these four units, the moving object detection apparatus 600 may further include other components, for example, one or more storage devices, input/output components, and the like, which are not particularly limited by the embodiments of the present disclosure.

The imaging lens 610 is configured to receive the optical signal from the scene to be detected and optically image the scene to be detected with the optical signal to generate an imaging optical signal of the scene to be detected. Here, the step of generating the imaging light signal is similar to step S210 of the moving object detection method described above with reference to fig. 2 and to the function of the imaging lens 510 described with reference to fig. 5, and therefore a repeated description is omitted here for the sake of simplicity. The spatial light modulator 620 is configured to receive the set of modulation signals and modulate the imaging light signal under control of the set of modulation signals to generate a modulated light signal; the image detector 630 is configured to generate a coded image based on the modulated light signal. Here, the step of generating the coded image is similar to step S220 of the moving object detection method described above with reference to fig. 2 and to the function of the encoding unit 520 described with reference to fig. 5, and thus a repeated description is omitted here for the sake of simplicity.

The one or more processors 640 are configured to sequentially provide the modulation signals in the modulation signal set to the spatial light modulator 620 within a predetermined time period, to control the spatial light modulator 620 to modulate the imaging light signal with the modulation signals to generate a modulated light signal, and to control the image detector 630 to generate an encoded image based on the modulated light signal. In addition, the one or more processors 640 are further configured to detect the encoded image using a moving object detection algorithm that matches the modulation signal set to identify an object to be detected in the encoded image. Here, the step of detecting the encoded image is similar to step S230 of the moving object detection method described above with reference to fig. 2 and to the function of the detection unit 530 described with reference to fig. 5, and therefore a repeated description is omitted here for the sake of simplicity.

Those skilled in the art will appreciate that the disclosure of the present disclosure is susceptible to numerous variations and modifications. For example, the various devices or components described above may be implemented in hardware, or may be implemented in software, firmware, or a combination of some or all of the three.

Furthermore, as used in this disclosure and in the claims, the terms "a," "an," and/or "the" do not specifically denote the singular and may also include the plural, unless the context clearly dictates otherwise. The use of "first," "second," and similar terms in this disclosure is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. Likewise, the word "comprising" or "comprises", and the like, means that the element or item preceding the word covers the elements or items listed after the word and their equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect.

Furthermore, flow charts are used in this disclosure to illustrate operations performed by systems according to embodiments of the present disclosure. It should be understood that the preceding or following operations are not necessarily performed in the exact order shown. Rather, various steps may be processed in reverse order or simultaneously. Meanwhile, other operations may be added to the processes, or one or more operations may be removed from the processes.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

While the present disclosure has been described in detail above, it will be apparent to those skilled in the art that the present disclosure is not limited to the embodiments described in the present specification. The present disclosure can be implemented with modifications and variations without departing from the spirit and scope of the present disclosure as defined by the claims. Accordingly, the description in the present specification is for the purpose of illustration and is not intended to limit the present disclosure in any way.

Claims

1. A moving object detection method comprising:

receiving an optical signal from a scene to be detected, and performing optical imaging on the scene to be detected by using the optical signal to generate an imaging optical signal of the scene to be detected, wherein the scene to be detected comprises a moving object to be detected;

sequentially modulating the imaging optical signal with a modulation signal in a modulation signal set to generate a modulated optical signal within a predetermined time period, and generating a coded image based on the modulated optical signal; and

detecting the coded image by using a moving object detection algorithm matched with the modulation signal set so as to identify the object to be detected in the coded image.

2. The moving object detection method according to claim 1,

wherein the modulation signal set includes a first number of modulation signals, the predetermined time period has a first duration, each modulation signal modulates the imaging optical signal for a second duration, and the first duration is greater than or equal to the product of the first number and the second duration;

wherein sequentially modulating the imaging optical signal with a modulation signal in the modulation signal set within the predetermined time period to generate the modulated optical signal, and generating the coded image based on the modulated optical signal, comprises:

sequentially selecting each of the first number of modulation signals in the modulation signal set and modulating the imaging optical signal with the selected modulation signal to obtain, within the second duration, a modulated optical signal corresponding to that modulation signal, thereby sequentially obtaining, within the first duration, a first number of modulated optical signals respectively corresponding to the first number of modulation signals; and

forming the coded image using the first number of modulated optical signals obtained within the first duration.
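As an illustration only, the timing condition recited above (first duration greater than or equal to the first number times the second duration) can be checked with a short Python sketch. The function name `build_schedule`, the `Slot` record, and the millisecond values are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Slot:
    index: int        # which modulation signal is active in this slot
    start_ms: float   # slot start within the predetermined time period
    end_ms: float     # slot end within the predetermined time period


def build_schedule(first_number: int, first_duration_ms: float,
                   second_duration_ms: float) -> List[Slot]:
    """Assign one consecutive time slot to each modulation signal.

    Raises ValueError if the slots do not fit into the first duration.
    """
    if first_number <= 0 or second_duration_ms <= 0:
        raise ValueError("first number and second duration must be positive")
    if first_duration_ms < first_number * second_duration_ms:
        raise ValueError("first duration must be >= first number x second duration")
    return [Slot(k, k * second_duration_ms, (k + 1) * second_duration_ms)
            for k in range(first_number)]


# Illustrative values: 8 modulation signals, 5 ms each, 40 ms exposure.
schedule = build_schedule(first_number=8, first_duration_ms=40.0,
                          second_duration_ms=5.0)
```

In this sketch the slots are packed back to back from the start of the exposure; the claim only requires that they fit, so other placements within the first duration would also satisfy it.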

3. The moving object detection method of claim 2, wherein modulating the imaging optical signal with the modulation signal to obtain a modulated optical signal corresponding to the modulation signal within the second duration comprises:

inputting the modulation signal into a spatial light modulator, the spatial light modulator comprising a plurality of sub-units;

adjusting the plurality of sub-units of the spatial light modulator with the modulation signal; and

modulating the spatial distribution of the imaging optical signal with the adjusted plurality of sub-units to obtain the modulated optical signal corresponding to the modulation signal within the second duration.

4. The moving object detection method of claim 2, wherein forming the coded image using the first number of modulated optical signals within the first duration comprises:

continuously acquiring the first number of modulated optical signals with an image detector throughout the first duration, and generating the coded image based on the acquired optical signals.

5. The moving object detection method of claim 1, wherein detecting the coded image using a moving object detection algorithm matched with the modulation signal set to identify the object to be detected in the coded image comprises:

detecting the coded image to determine the position and the category of the object to be detected in the coded image.

6. The moving object detection method of claim 5, wherein determining the position and the category of the object to be detected in the coded image comprises:

determining, based on the coded image, a plurality of decoded images respectively corresponding to each of the first number of modulation signals; and

determining the position and the category of the object to be detected in the plurality of decoded images.

7. The moving object detection method of claim 1, wherein the modulation signal set matched with the moving object detection algorithm is determined by:

acquiring a training data set, wherein the training data set comprises a training image sequence and, for each training image in the training image sequence, labeled position information and a labeled category of one or more moving objects contained in that training image; and

performing supervised training on the moving object detection algorithm using the labeled position information and the labeled category so as to determine the modulation signal set.

8. The moving object detection method of claim 7, wherein the moving object detection algorithm comprises a motion coding module and a moving object detection module, and wherein performing supervised training on the moving object detection algorithm using the labeled position information and the labeled category to determine the modulation signal set comprises:

encoding, by the motion coding module, the training image sequence using a coding signal set to obtain a training coded image;

performing object detection on the training coded image using the moving object detection module to obtain a detection result; and

performing supervised training on the detection result using the labeled position information and the labeled category to obtain a trained coding signal set, and determining the trained coding signal set as the modulation signal set.
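The joint training recited above can be sketched in NumPy as follows. This is a toy illustration under strong assumptions and not the disclosed network: the detection module is replaced by a single linear layer that regresses labeled positions only (no category), the training data are synthetic single-pixel trajectories, and the gradients are written out by hand. All names, sizes, and the learning rate are hypothetical, and the learned codes are left unconstrained, whereas a physical modulator would typically restrict their values.

```python
import numpy as np

rng = np.random.default_rng(0)
T, H, W, N = 3, 4, 4, 64   # frames per sequence, image height/width, sequences

# Synthetic training set: one bright pixel moving one column per frame.
frames = np.zeros((N, T, H * W))
labels = np.zeros((N, 2 * T))   # labeled (row, col) per frame, scaled to [0, 1]
for n in range(N):
    r, c = rng.integers(0, H), rng.integers(0, W - T + 1)
    for t in range(T):
        frames[n, t, r * W + c + t] = 1.0
        labels[n, 2 * t:2 * t + 2] = (r / (H - 1), (c + t) / (W - 1))

# Learnable coding signal set (one code per frame) and stand-in linear detector.
codes = 0.5 + 0.1 * rng.standard_normal((T, H * W))
weights = 0.1 * rng.standard_normal((H * W, 2 * T))


def loss_and_grads(codes, weights):
    coded = (codes[None] * frames).sum(axis=1)    # motion coding: (N, H*W)
    err = coded @ weights - labels                # detection result minus labels
    loss = 0.5 * np.mean(np.sum(err ** 2, axis=1))
    g_weights = coded.T @ err / N
    g_coded = err @ weights.T / N                 # gradient w.r.t. coded images
    g_codes = (g_coded[:, None, :] * frames).sum(axis=0)
    return loss, g_codes, g_weights


initial_loss, _, _ = loss_and_grads(codes, weights)
for _ in range(300):                              # plain full-batch gradient descent
    _, g_codes, g_weights = loss_and_grads(codes, weights)
    codes -= 0.1 * g_codes
    weights -= 0.1 * g_weights
final_loss, _, _ = loss_and_grads(codes, weights)

# After training, the learned codes play the role of the modulation signal set.
modulation_signal_set = codes.reshape(T, H, W)
```

The point of the sketch is only that the codes and the detector share one loss, so the supervision from the labels shapes the codes as well as the detector.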

9. The moving object detection method of claim 8, wherein encoding, by the motion coding module, the training image sequence using the coding signal set to obtain the training coded image comprises:

multiplying each coding signal in the coding signal set pixel by pixel with the corresponding training image in the training image sequence, and summing the multiplication results to obtain the training coded image.
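The pixel-by-pixel multiplication and summation described above amounts to a weighted sum over the frame axis. A minimal NumPy sketch follows; the function name `encode_sequence`, the array shapes, and the random binary codes are assumptions made for illustration.

```python
import numpy as np


def encode_sequence(frames: np.ndarray, codes: np.ndarray) -> np.ndarray:
    """Multiply each frame by its code pixel by pixel and sum over frames.

    frames, codes: arrays of shape (T, H, W). Returns one coded image (H, W).
    """
    if frames.shape != codes.shape:
        raise ValueError("frames and codes must have the same shape")
    return (codes * frames).sum(axis=0)


# Toy sequence: a bright pixel moving one column to the right per frame.
T, H, W = 3, 4, 5
frames = np.zeros((T, H, W))
for t in range(T):
    frames[t, 1, t + 1] = 1.0

rng = np.random.default_rng(0)
codes = rng.integers(0, 2, size=(T, H, W)).astype(float)  # hypothetical binary codes
coded = encode_sequence(frames, codes)
```

With all-ones codes the result is an ordinary long exposure containing the whole motion trail; non-trivial codes mark each frame's contribution differently within the same single image.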

10. A moving object detection device comprising:

an imaging lens configured to receive an optical signal from a scene to be detected and to perform optical imaging on the scene to be detected by using the optical signal to generate an imaging optical signal of the scene to be detected, wherein the scene to be detected comprises a moving object to be detected;

an encoding unit configured to sequentially modulate the imaging optical signal with a modulation signal in a modulation signal set to generate a modulated optical signal within a predetermined time period, and to generate a coded image based on the modulated optical signal; and

a detection unit configured to detect the coded image using a moving object detection algorithm matching the set of modulation signals to identify the object to be detected in the coded image.