WO2022148423A1 - Moving object detection method and apparatus - Google Patents

Moving object detection method and apparatus

Info

Publication number
WO2022148423A1
WO2022148423A1 PCT/CN2022/070677 CN2022070677W WO2022148423A1 WO 2022148423 A1 WO2022148423 A1 WO 2022148423A1 CN 2022070677 W CN2022070677 W CN 2022070677W WO 2022148423 A1 WO2022148423 A1 WO 2022148423A1
Authority
WO
WIPO (PCT)
Prior art keywords
moving object
object detection
modulated
signal
image
Prior art date
Application number
PCT/CN2022/070677
Other languages
English (en)
French (fr)
Inventor
陈宏伟
黄泓皓
胡成洋
Original Assignee
清华大学
Priority date
Filing date
Publication date
Application filed by 清华大学
Publication of WO2022148423A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding

Definitions

  • The present disclosure relates to the field of computer vision, and more particularly to a moving object detection method and apparatus.
  • Object detection is one of the classic problems in computer vision; its task is usually to identify the location and class of objects in an image.
  • With the development of machine learning technologies such as deep learning, object detection algorithms for static objects in static images are becoming increasingly mature.
  • In practice, however, tracking and detection of moving objects are often required.
  • A video signal containing the moving objects can be captured, and frame-by-frame object detection can be performed on it to detect the moving objects.
  • Accurate moving object detection requires that each frame of the video signal be free of motion blur, which is usually achieved by shooting the video with a high-speed camera.
  • A high-speed camera, however, not only reduces the luminous flux per frame but also significantly increases the data burden and cost of the system.
  • The present disclosure proposes a moving object detection method and apparatus that can detect moving objects without a high-speed camera.
  • A moving object detection method, comprising: receiving an optical signal from a scene to be tested, and optically imaging the scene using the optical signal to generate an imaging optical signal of the scene, wherein the scene to be tested includes a moving object to be detected; within a predetermined period of time, sequentially modulating the imaging optical signal using the modulation signals in a modulation signal set to generate a modulated optical signal, and generating an encoded image based on the modulated optical signal; and detecting the encoded image using a moving object detection algorithm matched with the modulation signal set, so as to identify the object to be detected in the encoded image.
  • The modulation signal set includes a first number of modulation signals, the predetermined period of time has a first duration, each modulation signal modulates the imaging optical signal for a second duration, and the first duration is greater than or equal to the product of the first number and the second duration.
  • Sequentially modulating the imaging optical signal with the modulation signals in the modulation signal set and generating the encoded image includes: sequentially selecting each of the first number of modulation signals in the modulation signal set and modulating the imaging optical signal with the selected modulation signal, so as to obtain, within the second duration, a modulated optical signal corresponding to that modulation signal; obtaining in sequence, within the first duration, a first number of modulated optical signals respectively corresponding to the first number of modulation signals; and forming the encoded image from the first number of modulated optical signals within the first duration.
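  • The timing constraint above can be illustrated with a minimal sketch. The function name `modulation_schedule`, its parameters, and the assumption that the modulation intervals run back to back from the start of the exposure are illustrative only and are not taken from the disclosure; durations are in arbitrary integer time units.

```python
def modulation_schedule(first_number, first_duration, second_duration):
    """Return the (start, end) interval of each modulation signal within one exposure.

    Hypothetical helper: assumes the modulation signals are applied back to back,
    starting at time 0 of the predetermined period of time.
    """
    if first_number <= 0 or second_duration <= 0:
        raise ValueError("first_number and second_duration must be positive")
    # Constraint stated in the text:
    # first duration >= first number * second duration.
    if first_duration < first_number * second_duration:
        raise ValueError("first duration is too short for all modulation signals")
    return [
        (k * second_duration, (k + 1) * second_duration)
        for k in range(first_number)
    ]


# Example: 8 modulation signals of 5 time units each fit in a 40-unit exposure.
schedule = modulation_schedule(first_number=8, first_duration=40, second_duration=5)
```

  • With these example values the exposure is filled exactly; a longer first duration would leave idle time after the last modulation signal, which the stated inequality also allows.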
  • Modulating the imaging optical signal with the modulation signal to obtain, within the second duration, a modulated optical signal corresponding to the modulation signal includes: inputting the modulation signal into a spatial light modulator comprising a plurality of subunits; adjusting the plurality of subunits of the spatial light modulator with the modulation signal; and modulating the spatial distribution of the imaging optical signal with the adjusted subunits, so as to obtain, within the second duration, the modulated optical signal corresponding to the modulation signal.
  • Forming the encoded image from the first number of modulated optical signals within the first duration includes: continuously acquiring the first number of modulated optical signals with an image detector within the first duration, and generating the encoded image based on the acquired optical signals.
  • Detecting the encoded image using the moving object detection algorithm matched with the modulation signal set, so as to identify the object to be detected in the encoded image, includes: detecting the encoded image to determine the position and category of the object to be detected in the encoded image.
  • Determining the position and category of the object to be detected in the encoded image includes: determining, based on the encoded image, a plurality of decoded images respectively corresponding to the modulation signals in the first number of modulation signals; and determining the position and category of the object to be detected in the plurality of decoded images.
  • The modulation signal set matched with the moving object detection algorithm is determined by: acquiring a training data set that includes a training image sequence together with the annotated position information and annotated category of one or more moving objects in each training image of the sequence; and performing supervised training of the moving object detection algorithm using the annotated position information and annotated categories, so as to determine the modulation signal set.
  • The moving object detection algorithm includes a motion encoding module and a moving object detection module, and performing the supervised training to determine the modulation signal set includes: encoding the training image sequence with an encoding signal set using the motion encoding module to obtain a training encoded image; performing object detection on the training encoded image using the moving object detection module to obtain a detection result; and supervising the detection result with the annotated position information and annotated categories to obtain a trained encoding signal set, which is taken as the modulation signal set.
  • Encoding the training image sequence with the encoding signal set using the motion encoding module includes: multiplying each encoding signal in the encoding signal set pixel by pixel with the corresponding training image in the training image sequence, and summing the products to obtain the training encoded image.
  • A moving object detection apparatus, comprising: an imaging lens configured to receive an optical signal from a scene to be tested and to optically image the scene using the optical signal to generate an imaging optical signal of the scene, wherein the scene to be tested includes a moving object to be detected; an encoding unit configured to sequentially modulate the imaging optical signal within a predetermined period of time using the modulation signals in a modulation signal set to generate a modulated optical signal, and to generate an encoded image based on the modulated optical signal; and a detection unit configured to detect the encoded image using a moving object detection algorithm matched with the modulation signal set, so as to identify the object to be detected in the encoded image.
  • The modulation signal set includes a first number of modulation signals, the predetermined period of time has a first duration, each modulation signal modulates the imaging optical signal for a second duration, and the first duration is greater than or equal to the product of the first number and the second duration.
  • The encoding unit is further configured to: sequentially select each of the first number of modulation signals in the modulation signal set and modulate the imaging optical signal with the selected modulation signal, so as to obtain, within the second duration, a modulated optical signal corresponding to that modulation signal; obtain in sequence, within the first duration, a first number of modulated optical signals respectively corresponding to the first number of modulation signals; and form the encoded image from the first number of modulated optical signals within the first duration.
  • The encoding unit includes a spatial light modulator. The spatial light modulator includes a plurality of subunits and is configured to: receive a modulation signal; adjust the plurality of subunits using the modulation signal; and modulate the spatial distribution of the imaging optical signal with the adjusted subunits, so as to obtain, within the second duration, a modulated optical signal corresponding to the modulation signal.
  • The encoding unit further includes an image detector configured to continuously acquire the first number of modulated optical signals within the first duration and to generate the encoded image based on the acquired optical signals.
  • The detection unit is further configured to detect the encoded image using the moving object detection algorithm matched with the modulation signal set, so as to determine the position and category of the object to be detected in the encoded image.
  • The detection unit is further configured to: determine, based on the encoded image, a plurality of decoded images respectively corresponding to the modulation signals in the first number of modulation signals; and determine the position and category of the object to be detected in the plurality of decoded images.
  • In the apparatus, the modulation signal set matched with the moving object detection algorithm is determined by: acquiring a training data set that includes a training image sequence together with the annotated position information and annotated category of one or more moving objects in each training image of the sequence; and performing supervised training of the moving object detection algorithm using the annotated position information and annotated categories, so as to determine the modulation signal set.
  • The moving object detection algorithm includes a motion encoding module and a moving object detection module, and performing the supervised training to determine the modulation signal set includes: encoding the training image sequence with an encoding signal set using the motion encoding module to obtain a training encoded image; performing object detection on the training encoded image using the moving object detection module to obtain a detection result; and supervising the detection result with the annotated position information and annotated categories to obtain a trained encoding signal set, which is taken as the modulation signal set.
  • Encoding the training image sequence with the encoding signal set using the motion encoding module includes: multiplying each encoding signal in the encoding signal set pixel by pixel with the corresponding training image in the training image sequence, and summing the products to obtain the training encoded image.
  • A moving object detection apparatus, comprising: an imaging lens configured to receive an optical signal from a scene to be tested and to optically image the scene using the optical signal to generate an imaging optical signal of the scene, wherein the scene to be tested includes a moving object to be detected; a spatial light modulator configured to receive a modulation signal set and to modulate the imaging optical signal under the control of the modulation signal set to generate a modulated optical signal; an image detector configured to generate an encoded image based on the modulated optical signal; and one or more processors configured to: sequentially provide the modulation signals in the modulation signal set to the spatial light modulator within a predetermined period of time, so as to control the spatial light modulator to modulate the imaging optical signal with the modulation signals and generate the modulated optical signal; control the image detector to generate the encoded image based on the modulated optical signal; and detect the encoded image using a moving object detection algorithm matched with the modulation signal set, so as to identify the object to be detected in the encoded image.
  • In the method and apparatus of the present disclosure, the encoded image is obtained by sequentially modulating the imaging optical signal of the scene to be tested with the modulation signal set, and object detection is performed on the encoded image with the moving object detection algorithm matched with that modulation signal set.
  • As a result, the category of the moving object to be detected and multiple sets of its position information in chronological order can be identified from a single encoded image without a high-speed camera. This enables tracking and detection of high-speed moving objects, improves the efficiency of moving object detection, and reduces the cost and data burden of the system.
  • FIG. 1 shows the overall architecture of a moving object detection system according to an example of an embodiment of the present disclosure
  • FIG. 2 shows a flowchart of a moving object detection method according to an embodiment of the present disclosure
  • FIG. 3 shows a schematic diagram of a training process of a moving object detection algorithm according to an example of an embodiment of the present disclosure
  • FIG. 4A shows a schematic diagram of an example detection result according to an embodiment of the present disclosure
  • FIG. 4B shows a schematic diagram of a detection result according to another example of an embodiment of the present disclosure.
  • FIG. 4C shows a schematic diagram of a detection result according to another example of an embodiment of the present disclosure.
  • FIG. 5 shows a schematic structural diagram of a moving object detection apparatus according to an embodiment of the present disclosure.
  • FIG. 6 shows a schematic structural diagram of a moving object detection apparatus according to an embodiment of the present disclosure.
  • The embodiments of the present disclosure provide a moving object detection method that can track and detect high-speed moving objects without a high-speed camera.
  • With this method, a single encoded image of a scene to be tested containing moving objects can be acquired over an arbitrary predetermined period of time (e.g., much longer than the exposure time of a high-speed camera), and the category of the moving objects and multiple sets of their position information in chronological order can be detected from that single encoded image.
  • This enables efficient detection of moving objects, especially high-speed ones, and reduces the cost and data burden of the system.
  • the moving object detection method and apparatus can be implemented, for example, as a moving object detection system including a hardware part and a software part.
  • FIG. 1 illustrates the overall architecture of an example moving object detection system according to an embodiment of the present disclosure. The hardware part may include an imaging system 110 for generating an encoded image of a scene to be tested that contains objects to be detected, and the software part may include a moving object detection algorithm 120 that detects the encoded image to identify the objects to be detected.
  • Example structures of the imaging system 110 and the moving object detection algorithm 120 are also shown in FIG. 1. As shown in FIG. 1, the imaging system 110 may include, for example, an imaging lens, a spatial light modulator, an image detector, and a relay lens, and the moving object detection algorithm 120 may include a moving object detection module 121. The embodiments of the present disclosure are not limited thereto; the imaging system 110 and the moving object detection algorithm 120 may also include other required devices or structures.
  • The modulation signal set matched with the moving object detection algorithm 120 is input into the imaging system 110, which generates an encoded image of the scene to be tested; the moving object detection algorithm 120 then detects the encoded image to obtain the detection result for the object to be detected.
  • FIG. 2 shows a flowchart of a moving object detection method 200 according to an embodiment of the present disclosure.
  • In step S210, an optical signal from the scene to be tested is received, and the scene is optically imaged using the optical signal to generate an imaging optical signal of the scene.
  • The scene to be tested includes moving objects to be detected, such as moving bicycles, speeding cars, flying planes, running animals, or moving objects of any other type and number; the type and number of objects to be detected are not specifically limited.
  • An imaging lens may be used to receive the optical signal from the scene to be tested and to optically image the scene with the received optical signal, generating the imaging optical signal of the scene.
  • The imaging lens may be part of the imaging system and may be, for example, a convex lens, a concave lens, or a combination thereof, which is not specifically limited in this embodiment of the present disclosure.
  • The generated imaging optical signal is a two-dimensional light field signal that represents the scene to be tested and changes with time. It can therefore also be called a time-varying imaging optical signal, and it reflects in real time the changes in the scene during imaging, such as the motion of the object to be detected.
  • In step S220, within a predetermined period of time, the imaging optical signal is sequentially modulated by the modulation signals in the modulation signal set to generate a modulated optical signal, and an encoded image is generated based on the modulated optical signal.
  • The predetermined period of time has a first duration. The first duration can be, for example, the exposure time of the imaging system and can be set arbitrarily according to actual needs; for example, it can be set to be much longer than the exposure time of a high-speed camera, or to another suitable duration, which is not specifically limited in this embodiment of the present disclosure.
  • The modulation signal set includes a first number of modulation signals. Because the imaging optical signal is a two-dimensional light field signal, each modulation signal may be a corresponding two-dimensional matrix, e.g., a two-dimensional matrix composed of 0s and 1s.
  • The first number may be set according to actual application requirements, which is not specifically limited in this embodiment of the present disclosure.
  • The modulation signal set can be determined by performing machine learning training on the moving object detection algorithm, so that the determined modulation signal set is matched with the moving object detection algorithm; this is described in further detail below.
  • Each modulation signal in the modulation signal set modulates the imaging optical signal for a second duration, and the first duration of the predetermined period of time is greater than or equal to the product of the first number and the second duration. That is, within the predetermined period of time, the imaging optical signal may be modulated multiple times, each time with a different modulation signal.
  • For example, the imaging optical signal may be modulated with the first number of modulation signals, so that the number of modulations equals the first number.
  • Each of the first number of modulation signals in the modulation signal set may be selected in sequence and used to modulate the imaging optical signal.
  • Since each modulation lasts for the second duration, the modulated optical signal corresponding to the selected modulation signal is obtained within the second duration. The modulation is repeated consecutively within the predetermined period of time,
  • so that a first number of modulated optical signals respectively corresponding to the first number of modulation signals are obtained within the first duration. The encoded image is then formed from these modulated optical signals within the first duration.
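  • The timing relationship and the exposure described above can be summarized in one expression. The symbols are illustrative and do not appear in the source: $N$ is the first number, $T_1$ the first duration, $T_2$ the second duration, $M_k(x, y)$ the $k$-th modulation signal, $I(x, y, t)$ the time-varying imaging optical signal, and $E(x, y)$ the encoded image. The sketch assumes intensity modulation, modulation intervals that run back to back from $t = 0$, and a detector that integrates light linearly.

```latex
T_1 \ge N\,T_2, \qquad
E(x, y) = \sum_{k=1}^{N} \int_{(k-1)T_2}^{kT_2} M_k(x, y)\, I(x, y, t)\, \mathrm{d}t
```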
  • A spatial light modulator may be employed to modulate the imaging optical signal with a modulation signal.
  • A spatial light modulator is a device that modulates the spatial distribution of light waves. It can include multiple independent subunits arranged in a one-dimensional or two-dimensional array. Under control, these subunits change their optical properties, such as reflectivity, refractive index, or transmittance, and thereby modulate the spatial distribution of the light waves passing through them.
  • Spatial light modulators can modulate optical parameters of light waves such as amplitude, intensity, phase, and polarization state.
  • The spatial light modulator may be, for example, a liquid crystal spatial light modulator or a digital micromirror array, which is not specifically limited in this embodiment of the present disclosure.
  • Each modulation signal in the modulation signal set may be used as a control signal of the spatial light modulator.
  • The modulation signal is input into the spatial light modulator and used to adjust its multiple subunits.
  • For example, the values of the modulation signal matrix can be used to control the respective subunits of the spatial light modulator and thus adjust their optical properties.
  • The adjusted subunits then modulate the spatial distribution of the imaging optical signal, for example its amplitude, intensity, phase, or polarization state, so that a modulated optical signal corresponding to the input modulation signal is obtained within the modulation time.
  • Although a spatial light modulator is used to modulate the imaging optical signal in the above example, the embodiments of the present disclosure are not limited thereto; any other device capable of changing the spatial distribution of the imaging optical signal may also be used.
  • An encoded image is formed from the obtained first number of modulated optical signals.
  • An image detector may be used to continuously acquire the first number of modulated optical signals within the first duration, and the encoded image may be generated based on the acquired optical signals.
  • The image detector may be any device capable of converting an optical signal into an electrical signal, such as a charge-coupled device (CCD) sensor or a complementary metal-oxide-semiconductor (CMOS) sensor, which is not specifically limited in this embodiment of the present disclosure.
  • The image detector may continuously acquire the modulated optical signal through the relay lens for the first duration (this acquisition may be referred to as exposure) and perform photoelectric conversion on the acquired optical signal to generate the encoded image.
  • The relay lens may be, for example, a convex lens, a concave lens, or a combination thereof, which is not specifically limited in this embodiment of the present disclosure.
  • The exposure time of the image detector is much longer than the time the spatial light modulator needs for a single modulation. Therefore, when a spatial light modulator and an image detector are used, the exposure time of the imaging system according to the embodiment of the present disclosure depends on the exposure time of the image detector, and the first duration of the predetermined period of time may be set to the exposure time of the image detector.
  • The encoded image may be detected using a moving object detection algorithm matched with the modulation signal set, so as to identify the object to be detected in the encoded image.
  • The moving object detection algorithm may include a moving object detection module, which may in turn include, for example, a motion decoding module and an object detection module that perform motion decoding and object detection on the encoded image, respectively.
  • Both the motion decoding module and the object detection module can be built from neural networks. The object detection module can be implemented with a neural-network-based object recognition algorithm, such as a region-based convolutional neural network (R-CNN), Faster R-CNN, or a Single Shot MultiBox Detector (SSD), which is not specifically limited in this embodiment of the present disclosure.
  • The modulation signal set can be determined by performing machine learning training on the moving object detection algorithm, so that the determined modulation signal set is matched with the moving object detection algorithm.
  • When a moving object detection algorithm is used to detect an encoded image, the category of the object to be detected can be identified. For example, when the object to be detected is a moving car, its category can be identified as "car" by detecting the encoded image.
  • The position of the object to be detected in the encoded image can also be determined. After the category and position have been identified, they can be marked in the encoded image; for example, a bounding box can be drawn around the identified object and its category (for example, "car") can be labeled at the box.
  • The moving object detection method can also detect the category and multiple sets of position information of the object to be detected from a single encoded image.
  • Multiple decoded images may be obtained from a single encoded image, each corresponding to one of the first number of modulation signals.
  • The number of decoded images obtained from a single encoded image may be equal to the first number.
  • Because the modulation signals in the first number of modulation signals are selected in sequence within the predetermined period of time to modulate the imaging optical signal and generate the encoded image,
  • the decoded images corresponding to the respective modulation signals correspond to the scene to be tested at different times. The positions of the object to be detected in the decoded images therefore reflect its motion trajectory in the scene to be tested.
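  • As a minimal sketch of how per-image detections could be turned into a motion trajectory: the data layout (a chronological list of `(category, box)` detections per decoded image, with boxes as `(x_min, y_min, x_max, y_max)`), the function names, and the rule of keeping the first matching detection per image are assumptions made for illustration, not the disclosure's implementation.

```python
def box_center(box):
    """Center (x, y) of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def trajectory(detections_per_image, category):
    """Collect (image_index, center) pairs for one category, in chronological order.

    detections_per_image: one list of (category, box) tuples per decoded image,
    ordered by the time of the corresponding modulation signal (assumed layout).
    """
    path = []
    for image_index, detections in enumerate(detections_per_image):
        for label, box in detections:
            if label == category:
                path.append((image_index, box_center(box)))
                break  # keep only the first matching detection per decoded image
    return path


# Example: a "car" moving to the right across three decoded images.
detections = [
    [("car", (10, 20, 30, 40))],
    [("car", (14, 20, 34, 40))],
    [("car", (18, 20, 38, 40))],
]
car_path = trajectory(detections, "car")
```

  • A scene with several objects of the same category would need an association step between decoded images, which this sketch deliberately leaves out.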
  • FIG. 3 shows a schematic diagram of a training process of an example moving object detection algorithm according to an embodiment of the present disclosure.
  • The moving object detection algorithm may include a moving object detection module 320, which may further include, for example, a motion decoding module 321 and an object detection module 322 that perform motion decoding and object detection on the encoded image, respectively.
  • The motion decoding module 321 and the object detection module 322 may be built from neural networks and may include, for example, network structures such as residual blocks and convolution blocks, which are not specifically limited in this embodiment of the present disclosure.
  • The object detection module 322 may be implemented with a neural-network-based object recognition algorithm, such as R-CNN, Faster R-CNN, or SSD, which is not specifically limited in this embodiment of the present disclosure.
  • The moving object detection algorithm may further include a motion encoding module 310, which is a mathematical description of the hardware part of the moving object detection system (the imaging lens, spatial light modulator, image detector, and so on). That is, the motion encoding module 310
  • simulates the physical process of generating an encoded image. A modulation signal set suitable for the moving object detection system can therefore be obtained by performing machine learning training on the moving object detection algorithm including the motion encoding module 310.
  • the training data set may include a training image sequence, and each training image in the training image sequence contains one or more Annotated location information and annotated category of a moving object.
  • the training image sequence is a collection of multiple training images that are consecutive in time.
  • the training image sequence may be a video signal, and each frame of the video signal is used as each training image in the training image sequence.
  • the number of training images used in each training may be equal to the first number of modulated signals described above, or may be greater than or equal to the first number, which is not specifically limited in this embodiment of the present disclosure.
  • for example, when the training image sequence included in the training data set has 80 training images and the first number is 8, 8 training images may be selected for each round of training.
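  • As a rough illustration of the selection described above, the following Python sketch splits an 80-frame sequence into groups of 8 consecutive frames. The helper name `split_into_clips` and the choice of non-overlapping groups are assumptions made for this example; the disclosure only states that 8 training images may be selected for each training.

```python
def split_into_clips(num_frames, clip_length):
    """Return (start, stop) index pairs of consecutive, non-overlapping clips.

    Assumption for illustration: clips do not overlap and leftover frames
    at the end of the sequence are dropped.
    """
    if clip_length <= 0:
        raise ValueError("clip_length must be positive")
    last_start = num_frames - clip_length
    return [(start, start + clip_length)
            for start in range(0, last_start + 1, clip_length)]


# 80 training images, first number of modulation signals = 8
clips = split_into_clips(80, 8)
```

  • Each `(start, stop)` pair indexes one group of temporally consecutive training images that would be encoded into a single training encoded image.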
  • the training dataset can come from public annotated datasets for video object detection, such as ImageNet Video Object Detection dataset (ImageNet VID), etc.
  • the moving object detection algorithm is trained under supervision using the annotated position information and annotated categories in the training data set, so as to determine the modulation signal set.
  • the motion encoding module 310 is used to encode the training image sequence by using the encoding signal set to obtain the training encoded image.
  • the motion encoding module is a mathematical simulation of the physical process of generating an encoded image.
  • in the imaging system, the spatial light modulator sequentially selects the modulation signals in the modulation signal set within a predetermined period of time to modulate the time-varying imaging light signal, generating a first number of modulated light signals, and the image detector then continuously collects the first number of modulated light signals to generate an encoded image. This step is equivalent to multiplying the time-varying imaging light signal by the modulation signal set and summing the products over the predetermined period of time.
  • the encoded signals in the encoded signal set can be multiplied pixel by pixel with the corresponding training images in the training image sequence, and the multiplication results can be summed to obtain the training encoded image.
  • the encoded signal set used in the training process corresponds to the modulation signal set in the above step S220; that is, the encoded signal set after training can be used as the modulation signal set of the imaging system in the moving object detection system to modulate the imaging light signal.
  • a training image sequence including a moving hound is multiplied and summed with the encoded signal set to obtain a training encoded image.
  • the encoded signal in the encoded signal set may be set to any appropriate initial value, which is not specifically limited in this embodiment of the present disclosure.
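  • The pixel-wise multiply-and-sum encoding described above can be sketched in a few lines of NumPy. This is a minimal sketch, not the disclosed implementation: the function name `encode_sequence`, the array sizes, and the random binary initial values are assumptions chosen only for illustration.

```python
import numpy as np


def encode_sequence(frames, codes):
    """Simulate motion encoding: multiply each frame by its encoded signal
    pixel by pixel, then sum the products over the time axis.

    frames, codes: arrays of shape (N, H, W), one encoded signal per frame.
    Returns a single (H, W) training encoded image.
    """
    frames = np.asarray(frames, dtype=np.float64)
    codes = np.asarray(codes, dtype=np.float64)
    if frames.shape != codes.shape:
        raise ValueError("frames and codes must both have shape (N, H, W)")
    return (frames * codes).sum(axis=0)


rng = np.random.default_rng(0)
num_codes, height, width = 8, 4, 6  # illustrative sizes only
frames = rng.random((num_codes, height, width))              # stand-in training images
codes = rng.integers(0, 2, size=(num_codes, height, width))  # 0/1 initial values
encoded = encode_sequence(frames, codes)
```

  • The result has the shape of a single frame, which mirrors the fact that the image detector integrates all modulated light signals into one encoded image.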
  • the training encoded image can first be decoded by using the motion decoding module 321 of the moving object detection module 320 to obtain a plurality of decoded images, the number of which corresponds to the number of training images in the training image sequence; then, the object detection module 322 of the moving object detection module 320 is used to perform object detection on the obtained decoded images to obtain detection results, which may be, for example, the categories of one or more moving objects and multiple sets of their position information in the decoded images.
  • supervised training can be performed on the above detection results by using the annotated categories and annotated position information of the one or more moving objects. For example, the error between the annotated categories and annotated position information and the detection results can be calculated, and the moving object detection algorithm can be trained under supervision by minimizing this error, so as to continuously optimize the encoded signal set and the network parameters of the moving object detection module until the optimal encoded signal set and the optimal network parameters are obtained.
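  • The joint optimization described above, in which the encoded signal set and the decoder parameters are updated together by minimizing an error, can be illustrated with a deliberately simplified toy. Everything specific here is an assumption for illustration: a per-pixel linear weight stands in for the motion decoding network, a reconstruction error stands in for the detection error, the encoded signals are continuous-valued, and the data are random numbers rather than annotated training images.

```python
import numpy as np


def forward(codes, decoder, frames):
    """Toy forward pass: simulated encoding, then a per-pixel linear 'decoder'."""
    encoded = (codes * frames).sum(axis=0)   # (H, W) training encoded image
    decoded = decoder * encoded              # (N, H, W), one decoded image per code
    loss = np.mean((decoded - frames) ** 2)  # stand-in for the detection error
    return encoded, decoded, loss


def train_step(codes, decoder, frames, lr):
    """One gradient-descent step updating codes and decoder weights together."""
    encoded, decoded, loss = forward(codes, decoder, frames)
    d_decoded = 2.0 * (decoded - frames) / frames.size  # dLoss/dDecoded
    grad_decoder = d_decoded * encoded                  # chain rule through decoder
    grad_encoded = (d_decoded * decoder).sum(axis=0)    # gradient w.r.t. encoded image
    grad_codes = grad_encoded * frames                  # chain rule through encoding
    return codes - lr * grad_codes, decoder - lr * grad_decoder, loss


rng = np.random.default_rng(0)
frames = rng.random((8, 8, 8))        # synthetic stand-in for 8 training images
codes = np.full(frames.shape, 0.5)    # arbitrary initial encoded signal set
decoder = np.full(frames.shape, 0.1)  # arbitrary initial decoder weights

losses = []
for _ in range(100):
    codes, decoder, loss = train_step(codes, decoder, frames, lr=2.0)
    losses.append(loss)
final_loss = forward(codes, decoder, frames)[2]
```

  • The point of the sketch is only that gradients flow through the simulated encoding step, so the encoded signals are learned alongside the decoder rather than fixed in advance.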
  • after the machine learning training of the moving object detection algorithm has been performed to obtain the optimal encoded signal set and the optimal network parameters, the moving object detection algorithm is fixed.
  • the encoded signal set obtained by training can then be applied to the moving object detection system according to the embodiment of the present disclosure as the modulation signal set, so as to image and modulate a scene to be tested that includes a moving object to be detected and generate an encoded image; the fixed moving object detection algorithm then detects the encoded image to identify the object to be detected from the encoded image.
  • the specific steps are as described in the steps of the moving object detection method 200 described above with reference to FIG. 2 , and are not repeated here.
  • FIGS. 4A-4C illustrate examples of detection results obtained using the moving object detection method according to an embodiment of the present disclosure.
  • (a) is an encoded image obtained by using the moving object detection method according to an embodiment of the present disclosure, which includes a blurred image of a bicycle; the encoded image is obtained by using eight modulation signals to respectively modulate the time-varying imaging light signal of the scene to be tested containing the moving bicycle and then performing image acquisition, that is to say, the first number of modulation signals in the modulation signal set of the above step S220 is eight.
  • the category of the bicycle and the corresponding eight sets of position information can be identified from the single coded image.
  • the detection results in (b)-(i) are displayed on a high-definition image sequence of the bicycle captured by a high-speed camera, which serves as the background; in each image, the position of the bicycle is marked with a rectangular box and the category of the bicycle is labeled.
  • the different positions and the category of the bicycle determined by the moving object detection method according to the embodiment of the present disclosure are marked in chronological order, that is, the motion trajectory of the bicycle is restored and tracking detection of the moving bicycle is realized.
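  • To make the link between the time-ordered position information and a motion trajectory concrete, the following Python sketch converts a list of bounding boxes into a list of box centers. The `(x_min, y_min, x_max, y_max)` box format, the helper names, and the coordinate values are assumptions for illustration and are not taken from the figures.

```python
def box_center(box):
    """Return the center (x, y) of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def trajectory(boxes_in_time_order):
    """Map time-ordered boxes (one per decoded image) to time-ordered centers."""
    return [box_center(box) for box in boxes_in_time_order]


# Hypothetical positions of one object in eight decoded images (made-up numbers)
boxes = [(10 * k, 20, 10 * k + 40, 60) for k in range(8)]
path = trajectory(boxes)
```

  • With eight sets of position information from one encoded image, `path` contains eight points ordered by time, which is the sense in which the trajectory is restored.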
  • (a) is an encoded image obtained by using the moving object detection method according to an embodiment of the present disclosure, which includes a blurred image of a car; the encoded image is obtained by using eight encoded signals to respectively modulate the time-varying imaging light signal of the scene to be tested containing the moving car and then performing image acquisition, that is to say, the first number of modulation signals in the modulation signal set of the above step S220 is eight.
  • the category of the car and the corresponding eight sets of position information can be identified from a single coded image, as shown in (b)-(i).
  • a high-definition image sequence of a car captured with a high-speed camera is used as the background for (b)-(i).
  • the different positions and the category of the car determined by the moving object detection method according to the embodiment of the present disclosure are marked in chronological order, that is, the motion trajectory of the car is restored and tracking detection of the moving car is realized.
  • the moving object detection method according to the embodiment of the present disclosure can realize the tracking and detection of multiple moving objects at the same time.
  • (a) is an encoded image obtained by using the moving object detection method according to an embodiment of the present disclosure, which includes blurred images of a plurality of airplanes; the encoded image is obtained by using eight encoded signals to respectively modulate the time-varying imaging light signal of the scene to be tested containing the moving airplanes and then performing image acquisition, that is to say, the first number of modulation signals in the modulation signal set of the above step S220 is eight.
  • the category of each airplane and the corresponding eight sets of position information of each airplane can be identified from a single encoded image, as shown in (b)-(i).
  • a high-definition image sequence of an aircraft captured with a high-speed camera is used as the background for (b)-(i).
  • the different positions and the category of each airplane determined by the moving object detection method according to the embodiment of the present disclosure are marked in chronological order, that is, the motion trajectories of the multiple airplanes are restored at the same time, realizing tracking detection of airplanes moving at high speed.
  • with the moving object detection method according to the embodiment of the present disclosure, a single encoded image of a moving object can be generated within a longer exposure time, and the category of the moving object and multiple sets of position information in chronological order can be detected from that single encoded image, so that the efficiency of object detection is greatly improved. In particular, when tracking and detecting high-speed moving objects, the method can recover the motion trajectory of a high-speed moving object over a long period of time from only a few encoded images and without a high-speed camera, which greatly reduces the cost and data burden of the system while realizing efficient detection of high-speed moving objects.
  • FIG. 5 shows a schematic structural diagram of a moving object detection apparatus 500 according to an embodiment of the present disclosure. Since the moving object detection apparatus 500 has the same details as the moving object detection method 200 described above in conjunction with FIG. 2 , the detailed description of the same content is omitted here for simplicity. As shown in FIG. 5 , the moving object detection apparatus 500 includes an imaging lens 510 , an encoding unit 520 and a detection unit 530 . In addition to these three units, the apparatus 500 may further include other components, however, since these components are not related to the content of the embodiments of the present disclosure, their illustration and description are omitted here.
  • the imaging lens 510 is configured to receive the light signal from the scene to be tested, and to optically image the scene to be tested by using the light signal to generate an imaged light signal of the scene to be tested.
  • the scene to be tested includes moving objects to be detected, such as moving bicycles, speeding cars, soaring airplanes, running animals, and other moving objects of any type or quantity; the embodiment of the present disclosure does not specifically limit the type or quantity of the objects to be detected in the scene to be tested.
  • the imaging lens may be a part of the imaging system, and may be, for example, a convex lens, a concave lens, or various combinations thereof, which is not specifically limited in this embodiment of the present disclosure.
  • the generated imaging light signal is a two-dimensional light field signal that represents the scene to be tested and changes with time, so the imaging light signal can also be called a time-varying imaging light signal; it can reflect, in real time, changes in the scene to be tested during imaging, such as the motion of the object to be detected in the scene.
  • the encoding unit 520 is configured to sequentially modulate the imaging optical signal with the modulation signal in the modulation signal set within a predetermined period of time, so as to obtain an encoded image.
  • the predetermined time period has a first duration. The first duration can be, for example, the exposure time of the imaging system, and can be set arbitrarily according to actual needs; for example, the first duration can be set to be much longer than the exposure time of a high-speed camera, or to another suitable duration, which is not specifically limited in this embodiment of the present disclosure.
  • the modulation signal set includes a first number of modulation signals, wherein each modulation signal may be a two-dimensional matrix corresponding to the imaging light signal as a two-dimensional light field signal, e.g., a two-dimensional matrix composed of 0s and 1s.
  • the first number may be set according to actual application requirements, which is not specifically limited in this embodiment of the present disclosure.
  • the modulation signal set may be determined by performing machine learning training on the moving object detection algorithm, and the determined modulation signal set matches the moving object detection algorithm. The details of the machine learning training of the moving object detection algorithm to determine the modulated signal set is similar to the process described above with reference to FIG. 3 , so repeated description of the same content is omitted here.
  • the duration for which each modulation signal in the modulation signal set modulates the imaging optical signal is a second duration, and the first duration of the predetermined time period is greater than or equal to the product of the first number and the second duration. That is to say, within the predetermined time period, the encoding unit 520 may modulate the imaging optical signal multiple times with different modulation signals. For example, when the first duration of the predetermined time period is equal to the product of the first number and the second duration, the encoding unit 520 may use the first number of modulation signals to modulate the imaging optical signal multiple times within the predetermined time period, and the number of modulations is equal to the first number.
  • specifically, the encoding unit 520 may sequentially select each modulation signal of the first number of modulation signals in the modulation signal set and use the selected modulation signal to modulate the imaging optical signal. As described above, each modulation signal modulates the imaging optical signal for the second duration, so the modulated optical signal corresponding to the selected modulation signal can be obtained within the second duration; the encoding unit 520 modulates continuously multiple times within the predetermined time period, so that a first number of modulated optical signals respectively corresponding to the first number of modulation signals are obtained within the first duration of the predetermined time period.
  • an encoded image is formed using the obtained first quantity of modulated optical signals.
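  • The timing relationship above (first duration greater than or equal to the first number times the second duration) can be checked with a small Python sketch that also lists the time slot of each modulation signal. The function name `modulation_schedule`, the use of integer microseconds, and the numeric values are assumptions for illustration; the disclosure does not give concrete durations.

```python
def modulation_schedule(first_duration, first_number, second_duration):
    """Return the (start, end) time slot of each modulation signal.

    Raises ValueError when the predetermined time period is too short,
    i.e. when first_duration < first_number * second_duration.
    """
    if first_duration < first_number * second_duration:
        raise ValueError("first duration is shorter than first number x second duration")
    return [(k * second_duration, (k + 1) * second_duration)
            for k in range(first_number)]


# Illustrative values in microseconds: 8000 us exposure, 8 signals, 1000 us each
slots = modulation_schedule(8000, 8, 1000)
```

  • Here the first duration equals the product exactly, so the eight slots tile the whole exposure; a longer first duration would simply leave unused time after the last slot in this sketch.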
  • the encoding unit 520 may include, for example, a spatial light modulator.
  • a spatial light modulator is a device that modulates the spatial distribution of light waves. It can include multiple independent subunits arranged in a one-dimensional array or a two-dimensional array. These subunits can be used to modulate the spatial distribution of light waves passing through them by changing their optical properties under control, such as reflectivity, refractive index, transmittance, etc.
  • spatial light modulators can modulate optical parameters such as amplitude, intensity, phase, and polarization state of light waves.
  • the spatial light modulator may be a liquid crystal spatial light modulator or a digital microlens array, etc., which is not specifically limited in this embodiment of the present disclosure.
  • each modulation signal in the modulation signal set may be used as a control signal of the spatial light modulator.
  • the spatial light modulator is configured to receive the modulation signal and use the modulation signal to adjust a plurality of subunits of the spatial light modulator. For example, when the modulation signal is a two-dimensional matrix, the values of the elements of the modulation signal matrix can be used to separately control the plurality of subunits of the spatial light modulator so as to adjust their optical properties. The spatial light modulator can then use the adjusted subunits to modulate the spatial distribution of the imaging optical signal; for example, optical parameters such as the amplitude, intensity, phase, and polarization state of the imaging optical signal can be modulated, so that a modulated optical signal corresponding to the input modulation signal is obtained within a modulation time whose length is the second duration.
  • it should be noted that although the spatial light modulator is used to modulate the imaging light signal in the above example, the embodiment of the present disclosure is not limited thereto, and the encoding unit 520 may also include any other device capable of changing the spatial distribution of the imaging light signal.
  • the encoding unit 520 may further include an image detector, and the image detector is configured to continuously collect the first number of modulated optical signals within the first duration and to generate the encoded image based on the collected optical signals.
  • the image detector may be any device capable of converting an optical signal into an electrical signal, such as a charge-coupled device (CCD) sensor, a complementary metal oxide semiconductor (CMOS) sensor, etc., which is not specifically limited in this embodiment of the present disclosure.
  • the image detector may continuously collect the modulated optical signal (for example, may be referred to as exposure) through the relay lens for a first period of time, and perform photoelectric conversion on the collected optical signal to generate an encoded image.
  • the relay lens may be, for example, a convex lens, a concave lens, or various combinations thereof, which is not specifically limited in this embodiment of the present disclosure.
  • in general, the exposure time of the image detector is much longer than the modulation time of the spatial light modulator; therefore, when the spatial light modulator and the image detector are used, the exposure time of the imaging system according to the embodiment of the present disclosure depends on the exposure time of the image detector, and thus the first duration of the predetermined time period may be equal to the exposure time of the image detector.
  • the detection unit 530 is configured to detect the encoded image using a moving object detection algorithm matched to the modulated signal set to identify the object to be detected in the encoded image.
  • the moving object detection algorithm may include a moving object detection module.
  • the moving object detection module may include, for example, a motion decoding module and an object detection module, so as to perform motion decoding and object detection on the encoded image, respectively. Both the motion decoding module and the object detection module can be constructed with a neural network, and the object detection module can be implemented by using an object recognition algorithm based on a neural network, such as RCNN, Faster-RCNN, SSD, etc., which is not specifically limited in this embodiment of the present disclosure.
  • the modulation signal set can be determined by performing machine learning training on the moving object detection algorithm, and the determined modulation signal set is matched with the moving object detection algorithm.
  • when the detection unit 530 detects the encoded image by using the moving object detection algorithm, it can identify the category of the object to be detected. For example, when the object to be detected is a moving car, the category of the object to be detected can be identified as "car" by detecting the encoded image.
  • when the detection unit 530 detects the encoded image, the position of the object to be detected in the encoded image can also be determined.
  • the category of the object to be detected can be marked in the encoded image; for example, a box can be drawn around the identified object to be detected, and its category can be labeled at the box (for example, labeled as "car").
  • the moving object detection apparatus can detect multiple sets of position information of the object to be detected from a single encoded image.
  • when the detection unit 530 uses the moving object detection algorithm to decode the encoded image, multiple decoded images may be obtained from a single encoded image, and the multiple decoded images may respectively correspond to the modulation signals of the first number of modulation signals.
  • the number of decoded images obtained from a single encoded image may be equal to the first number of modulated signals.
  • the detection unit 530 performs object detection on the obtained multiple decoded images respectively, and can determine the position and category of the object to be detected in the multiple decoded images.
  • since each modulation signal of the first number of modulation signals is selected in sequence within the predetermined time period to modulate the imaging optical signal and generate the encoded image, the multiple decoded images corresponding to the respective modulation signals correspond to the scene to be tested at different times. Therefore, the positions of the object to be detected in the multiple decoded images can reflect the motion trajectory of the object to be detected, as shown in the example detection results of FIGS. 4A-4C.
  • with the moving object detection apparatus according to the embodiment of the present disclosure, a single encoded image of a moving object can be generated within a longer exposure time, and the category of the moving object and multiple sets of position information in chronological order can be detected from that single encoded image, so that the efficiency of object detection is greatly improved. In particular, when tracking and detecting high-speed moving objects, the apparatus can recover the motion trajectory of a high-speed moving object over a long period of time from only a few encoded images and without a high-speed camera, which greatly reduces the cost and data burden of the system while achieving efficient detection of high-speed moving objects.
  • FIG. 6 shows a schematic structural diagram of a moving object detection apparatus 600 according to an embodiment of the present disclosure. Since the details of the moving object detection apparatus 600 are the same as those of the moving object detection method 200 described above in conjunction with FIG. 2 , the detailed description of the same content is omitted here for the sake of simplicity.
  • the moving object detection apparatus 600 may include an imaging lens 610 , a spatial light modulator 620 , an image detector 630 , and one or more processors 640 .
  • the moving object detection apparatus 600 may further include other components, for example, one or more storage devices, input/output components, etc., which are not specifically limited in this embodiment of the present disclosure.
  • the imaging lens 610 is configured to receive a light signal from the scene to be tested, and to optically image the scene to be tested by using the light signal to generate an imaging light signal of the scene to be tested.
  • the step of generating the imaging light signal is similar to step S210 of the moving object detection method described above with reference to FIG. 2 and to the function of the imaging lens 510 described with reference to FIG. 5, so repeated description of the same content is omitted here for simplicity.
  • the spatial light modulator 620 is configured to receive the modulation signal set and modulate the imaging light signal under the control of the modulation signal set to generate a modulated light signal; the image detector 630 is configured to generate an encoded image based on the modulated light signal. Here, the step of generating an encoded image is similar to step S220 of the moving object detection method described above with reference to FIG. 2 and to the function of the encoding unit 520 described with reference to FIG. 5, so repeated description of the same content is omitted here for simplicity.
  • the one or more processors 640 are configured to sequentially provide the modulation signals in the modulation signal set to the spatial light modulator 620 within a predetermined period of time, so as to control the spatial light modulator 620 to modulate the imaging light signal with the modulation signals and generate a modulated light signal, and to control the image detector 630 to generate an encoded image based on the modulated light signal.
  • the one or more processors 640 are further configured to detect the encoded image using a moving object detection algorithm matched to the modulation signal set to identify an object to be detected in the encoded image.
  • the step of detecting the encoded image is similar to step S230 of the moving object detection method described above with reference to FIG. 2 and to the function of the detection unit 530 described with reference to FIG. 5, so repeated description of the same content is omitted here for simplicity.
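  • The control flow of the apparatus 600 (the processor hands each modulation signal to the spatial light modulator in turn while the image detector keeps collecting) can be mimicked in software. The classes `SimulatedSLM` and `SimulatedDetector`, the callable `scene_at`, and all sizes below are hypothetical stand-ins written for this sketch; they do not correspond to any real device interface.

```python
import numpy as np


class SimulatedSLM:
    """Stand-in for spatial light modulator 620: holds one mask at a time."""

    def __init__(self):
        self.mask = None

    def load(self, mask):
        self.mask = np.asarray(mask, dtype=np.float64)

    def modulate(self, light):
        # Each matrix element scales the light passing through one subunit.
        return light * self.mask


class SimulatedDetector:
    """Stand-in for image detector 630: accumulates light during one exposure."""

    def __init__(self, shape):
        self.accumulator = np.zeros(shape)

    def collect(self, light):
        self.accumulator += light

    def read_out(self):
        image = self.accumulator.copy()
        self.accumulator[:] = 0.0
        return image


def capture_encoded_image(scene_at, masks, slm, detector):
    """Feed the masks to the SLM in order and integrate on the detector."""
    for slot, mask in enumerate(masks):
        slm.load(mask)
        detector.collect(slm.modulate(scene_at(slot)))
    return detector.read_out()


rng = np.random.default_rng(1)
scene_frames = rng.random((8, 4, 4))           # scene during each modulation slot
masks = rng.integers(0, 2, size=(8, 4, 4))     # hypothetical 0/1 modulation signals
encoded_image = capture_encoded_image(
    lambda slot: scene_frames[slot], masks, SimulatedSLM(), SimulatedDetector((4, 4))
)
```

  • In this simulation the captured image equals the sum over slots of mask times scene, which is the multiply-and-sum relationship that the motion encoding module models during training.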
  • flowcharts are used in this disclosure to illustrate operations performed by a system according to an embodiment of the present disclosure. It should be understood that the preceding or following operations are not necessarily performed in exact order. Rather, the various steps may be processed in reverse order or concurrently. At the same time, other operations can be added to these processes, or a step or steps can be removed from these processes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Studio Devices (AREA)
  • Image Analysis (AREA)

Abstract

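The present disclosure provides a moving object detection method and apparatus. The moving object detection method includes: receiving a light signal from a scene to be tested, and optically imaging the scene to be tested by using the light signal to generate an imaging light signal of the scene to be tested, wherein the scene to be tested includes a moving object to be detected; within a predetermined time period, sequentially modulating the imaging light signal with the modulation signals in a modulation signal set to generate modulated light signals, and generating an encoded image based on the modulated light signals; and detecting the encoded image by using a moving object detection algorithm matched with the modulation signal set, so as to identify the object to be detected in the encoded image. The moving object detection method provided by the present disclosure can identify, from a single encoded image and without a high-speed camera, the category of the moving object to be detected and multiple sets of position information in chronological order, enabling efficient tracking detection of high-speed moving objects.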

Description

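Moving object detection method and apparatus
This application claims priority to Chinese Patent Application No. 202110022668.6, filed on January 8, 2021, the disclosure of which is incorporated herein by reference in its entirety as part of this application.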
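Technical Field
The present disclosure relates to the field of computer vision, and more particularly, to a moving object detection method and apparatus.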
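Background
Object detection is one of the classic problems in computer vision; its task is usually to identify the position and category of objects in an image. With the development of machine learning techniques such as deep learning, object detection algorithms for static objects in static images have become increasingly mature, whereas practical applications often require tracking detection of moving objects. Traditionally, moving object detection can be achieved by generating a video signal that includes the moving object and performing object detection on the video signal frame by frame. Accurate moving object detection presupposes that no frame of the video signal contains motion blur, which can usually be achieved by shooting the video with a high-speed camera. However, using a high-speed camera not only reduces the luminous flux of each frame but also significantly increases the data burden and cost of the system.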
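Summary
To solve the above problems, the present disclosure proposes a moving object detection method and apparatus capable of detecting moving objects without a high-speed camera.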
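According to one aspect of the embodiments of the present disclosure, a moving object detection method is provided, including: receiving a light signal from a scene to be tested, and optically imaging the scene to be tested by using the light signal to generate an imaging light signal of the scene to be tested, wherein the scene to be tested includes a moving object to be detected; within a predetermined time period, sequentially modulating the imaging light signal with the modulation signals in a modulation signal set to generate modulated light signals, and generating an encoded image based on the modulated light signals; and detecting the encoded image by using a moving object detection algorithm matched with the modulation signal set, so as to identify the object to be detected in the encoded image.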
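According to an example of an embodiment of the present disclosure, the modulation signal set includes a first number of modulation signals, the predetermined time period has a first duration, each modulation signal modulates the imaging light signal for a second duration, and the first duration is greater than or equal to the product of the first number and the second duration; wherein sequentially modulating the imaging light signal with the modulation signals in the modulation signal set within the predetermined time period to obtain the encoded image includes: sequentially selecting each modulation signal of the first number of modulation signals in the modulation signal set, and modulating the imaging light signal with that modulation signal to obtain, within the second duration, a modulated light signal corresponding to that modulation signal, so that a first number of modulated light signals respectively corresponding to the first number of modulation signals are obtained in sequence within the first duration; and forming the encoded image, within the first duration, from the first number of modulated light signals.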
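According to an example of an embodiment of the present disclosure, modulating the imaging light signal with the modulation signal to obtain, within the second duration, the modulated light signal corresponding to the modulation signal includes: inputting the modulation signal into a spatial light modulator that includes a plurality of subunits; adjusting the plurality of subunits of the spatial light modulator by using the modulation signal; and modulating the spatial distribution of the imaging light signal with the adjusted subunits, so as to obtain, within the second duration, the modulated light signal corresponding to the modulation signal.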
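According to an example of an embodiment of the present disclosure, forming the encoded image from the first number of modulated light signals within the first duration includes: continuously collecting the first number of modulated light signals within the first duration by using an image detector, and generating the encoded image based on the collected light signals.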
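According to an example of an embodiment of the present disclosure, detecting the encoded image by using the moving object detection algorithm matched with the modulation signal set to identify the object to be detected in the encoded image includes: detecting the encoded image to determine the position and category of the object to be detected in the encoded image.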
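According to an example of an embodiment of the present disclosure, determining the position and category of the object to be detected in the encoded image includes: determining, based on the encoded image, a plurality of decoded images respectively corresponding to the modulation signals of the first number of modulation signals; and determining the position and category of the object to be detected in the plurality of decoded images.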
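According to an example of an embodiment of the present disclosure, the modulation signal set matched with the moving object detection algorithm is determined by the following steps: acquiring a training data set, the training data set including a training image sequence and the annotated position information and annotated category of one or more moving objects contained in each training image of the training image sequence; and performing supervised training on the moving object detection algorithm by using the annotated position information and annotated category, so as to determine the modulation signal set.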
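According to an example of an embodiment of the present disclosure, the moving object detection algorithm includes a motion encoding module and a moving object detection module, and performing supervised training on the moving object detection algorithm by using the annotated position information and annotated category to determine the modulation signal set includes: encoding the training image sequence with an encoded signal set by using the motion encoding module, to obtain a training encoded image; performing object detection on the training encoded image by using the moving object detection module, to obtain a detection result; and performing supervised training on the detection result by using the annotated position information and the annotated category, to obtain a trained encoded signal set, and determining the trained encoded signal set as the modulation signal set.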
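According to an example of an embodiment of the present disclosure, encoding the training image sequence with the encoded signal set by using the motion encoding module to obtain the training encoded image includes: multiplying the encoded signals in the encoded signal set pixel by pixel with the corresponding training images in the training image sequence, and summing the multiplication results, to obtain the training encoded image.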
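According to another aspect of the embodiments of the present disclosure, a moving object detection apparatus is provided, including: an imaging lens configured to receive a light signal from a scene to be tested, and to optically image the scene to be tested by using the light signal to generate an imaging light signal of the scene to be tested, wherein the scene to be tested includes a moving object to be detected; an encoding unit configured to, within a predetermined time period, sequentially modulate the imaging light signal with the modulation signals in a modulation signal set to generate modulated light signals, and generate an encoded image based on the modulated light signals; and a detection unit configured to detect the encoded image by using a moving object detection algorithm matched with the modulation signal set, so as to identify the object to be detected in the encoded image.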
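According to an example of an embodiment of the present disclosure, the modulation signal set includes a first number of modulation signals, the predetermined time period has a first duration, each modulation signal modulates the imaging light signal for a second duration, and the first duration is greater than or equal to the product of the first number and the second duration; and the encoding unit is further configured to: sequentially select each modulation signal of the first number of modulation signals in the modulation signal set, and modulate the imaging light signal with that modulation signal to obtain, within the second duration, a modulated light signal corresponding to that modulation signal, so that a first number of modulated light signals respectively corresponding to the first number of modulation signals are obtained in sequence within the first duration; and form the encoded image, within the first duration, from the first number of modulated light signals.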
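According to an example of an embodiment of the present disclosure, the encoding unit includes a spatial light modulator, the spatial light modulator includes a plurality of subunits and is configured to: receive a modulation signal; adjust the plurality of subunits of the spatial light modulator by using the modulation signal; and modulate the spatial distribution of the imaging light signal with the adjusted subunits, so as to obtain, within the second duration, the modulated light signal corresponding to the modulation signal.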
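According to an example of an embodiment of the present disclosure, the encoding unit further includes an image detector configured to: continuously collect the first number of modulated light signals within the first duration, and generate the encoded image based on the collected light signals.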
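According to an example of an embodiment of the present disclosure, the detection unit is further configured to: detect the encoded image by using the moving object detection algorithm matched with the modulation signal set, so as to determine the position and category of the object to be detected in the encoded image.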
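According to an example of an embodiment of the present disclosure, the detection unit is further configured to: determine, based on the encoded image, a plurality of decoded images respectively corresponding to the modulation signals of the first number of modulation signals; and determine the position and category of the object to be detected in the plurality of decoded images.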
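According to an example of an embodiment of the present disclosure, the modulation signal set matched with the moving object detection algorithm is determined by the following steps: acquiring a training data set, the training data set including a training image sequence and the annotated position information and annotated category of one or more moving objects contained in each training image of the training image sequence; and performing supervised training on the moving object detection algorithm by using the annotated position information and annotated category, so as to determine the modulation signal set.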
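According to an example of an embodiment of the present disclosure, the moving object detection algorithm includes a motion encoding module and a moving object detection module, and performing supervised training on the moving object detection algorithm by using the annotated position information and annotated category to determine the modulation signal set includes: encoding the training image sequence with an encoded signal set by using the motion encoding module, to obtain a training encoded image; performing object detection on the training encoded image by using the moving object detection module, to obtain a detection result; and performing supervised training on the detection result by using the annotated position information and annotated category, to obtain a trained encoded signal set, and determining the trained encoded signal set as the modulation signal set.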
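According to an example of an embodiment of the present disclosure, encoding the training image sequence with the encoded signal set by using the motion encoding module to obtain the training encoded image includes: multiplying the encoded signals in the encoded signal set pixel by pixel with the corresponding training images in the training image sequence, and summing the multiplication results, to obtain the training encoded image.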
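According to another aspect of the examples of the embodiments of the present disclosure, a moving object detection apparatus is provided, including: an imaging lens configured to receive a light signal from a scene to be tested, and to optically image the scene to be tested by using the light signal to generate an imaging light signal of the scene to be tested, wherein the scene to be tested includes a moving object to be detected; a spatial light modulator configured to receive a modulation signal set and to modulate the imaging light signal under the control of the modulation signal set to generate modulated light signals; an image detector configured to generate an encoded image based on the modulated light signals; and one or more processors configured to: within a predetermined time period, sequentially provide the modulation signals in the modulation signal set to the spatial light modulator, so as to control the spatial light modulator to modulate the imaging light signal with the modulation signals to generate the modulated light signals, and control the image detector to generate the encoded image based on the modulated light signals; and detect the encoded image by using a moving object detection algorithm matched with the modulation signal set, so as to identify the object to be detected in the encoded image.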
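With the moving object detection method and apparatus according to the above aspects of the embodiments of the present disclosure, an encoded image is obtained by sequentially modulating the imaging light signal of the scene to be tested with the modulation signal set, and object detection is performed on the encoded image by the moving object detection algorithm matched with the modulation signal set. The category of the moving object to be detected and multiple sets of position information in chronological order can thus be identified from a single encoded image without a high-speed camera, which realizes tracking detection of high-speed moving objects, greatly improves the efficiency of moving object detection, and reduces the cost and data burden of the system.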
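Brief Description of the Drawings
The above and other objects, features and advantages of the embodiments of the present disclosure will become more apparent from the more detailed description of the embodiments in conjunction with the accompanying drawings. The drawings are provided for further understanding of the embodiments of the present disclosure, constitute a part of the specification, and serve, together with the embodiments, to explain the present disclosure without limiting it. In the drawings, the same reference numerals generally denote the same components or steps.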
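FIG. 1 shows the overall architecture of an example moving object detection system according to an embodiment of the present disclosure;
FIG. 2 shows a flowchart of a moving object detection method according to an embodiment of the present disclosure;
FIG. 3 shows a schematic diagram of a training process of an example moving object detection algorithm according to an embodiment of the present disclosure;
FIG. 4A shows a schematic diagram of detection results of an example according to an embodiment of the present disclosure;
FIG. 4B shows a schematic diagram of detection results of another example according to an embodiment of the present disclosure;
FIG. 4C shows a schematic diagram of detection results of another example according to an embodiment of the present disclosure;
FIG. 5 shows a schematic structural diagram of a moving object detection apparatus according to an embodiment of the present disclosure; and
FIG. 6 shows a schematic structural diagram of a moving object detection apparatus according to an embodiment of the present disclosure.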
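Detailed Description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.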
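In practical applications, tracking detection of moving objects is often required. For high-speed moving objects in particular, a high-speed camera is usually needed to shoot the video used for moving object detection, in order to overcome the motion blur caused by the high-speed motion of the object. The embodiments of the present disclosure provide a moving object detection method that can realize tracking detection of high-speed moving objects without a high-speed camera. In the moving object detection method according to the embodiments of the present disclosure, a single encoded image of a scene to be tested containing a moving object can be obtained within any predetermined time period (for example, one much longer than the exposure time of a high-speed camera), and the category of the moving object and multiple sets of position information in chronological order can be detected from that single encoded image. This realizes efficient detection of moving objects, especially high-speed moving objects, greatly improves the efficiency of moving object detection, and reduces the cost and data burden of the system.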
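The moving object detection method and apparatus according to the embodiments of the present disclosure can be implemented, for example, as a moving object detection system including a hardware part and a software part. FIG. 1 shows the overall architecture of an example moving object detection system according to an embodiment of the present disclosure, in which the hardware part may include an imaging system 110 for generating an encoded image of a scene to be tested containing the object to be detected, and the software part may include a moving object detection algorithm 120 for detecting the encoded image to identify the object to be detected. FIG. 1 also shows example structures of the imaging system 110 and the moving object detection algorithm 120. As shown in FIG. 1, the imaging system 110 may include, for example, an imaging lens, a spatial light modulator, an image detector, a relay lens, and other required devices, and the moving object detection algorithm 120 may include, for example, a moving object detection module 121; however, the embodiments of the present disclosure are not limited thereto, and the imaging system 110 and the moving object detection algorithm 120 may also include other required devices or structures. As shown in FIG. 1, the modulation signal set matched with the moving object detection algorithm 120 is input into the imaging system 110, the imaging system 110 generates an encoded image of the scene to be tested, and the moving object detection algorithm 120 then detects the encoded image to obtain the detection result of the object to be detected.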
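The moving object detection method according to an embodiment of the present disclosure is described in detail below with reference to FIG. 2. FIG. 2 shows a flowchart of the moving object detection method 200 according to an embodiment of the present disclosure.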
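As shown in FIG. 2, in step S210, a light signal from the scene to be tested is received, and the scene to be tested is optically imaged by using the light signal to generate an imaging light signal of the scene to be tested. The scene to be tested includes moving objects to be detected, such as moving bicycles, speeding cars, soaring airplanes, running animals, and other moving objects of any type or quantity; the embodiments of the present disclosure do not specifically limit the type or quantity of the objects to be detected in the scene to be tested. According to an example of an embodiment of the present disclosure, an imaging lens can be used to receive the light signal of the scene to be tested and to optically image the scene by using the received light signal, so as to generate the imaging light signal of the scene to be tested. The imaging lens may be a part of the imaging system, and may be, for example, a convex lens, a concave lens, or various combinations thereof, which is not specifically limited in the embodiments of the present disclosure. The generated imaging light signal is a two-dimensional light field signal that represents the scene to be tested and changes with time, so the imaging light signal can also be called a time-varying imaging light signal; it can reflect, in real time, changes in the scene to be tested during imaging, such as the motion of the object to be detected in the scene.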
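In step S220, within a predetermined time period, the imaging optical signal is modulated in sequence by the modulation signals in a modulation signal set to generate a modulated optical signal, and an encoded image is generated based on the modulated optical signal. Here, the predetermined time period has a first duration. The first duration may be, for example, the exposure time of the imaging system and may be set arbitrarily according to actual requirements; for example, it may be set far longer than the exposure time of a high-speed camera, or to any other suitable duration, which the embodiments of the present disclosure do not specifically limit.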
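According to an example of the embodiments of the present disclosure, the modulation signal set includes a first number of modulation signals, each of which may be a two-dimensional matrix corresponding to the imaging optical signal as a two-dimensional light field signal, for example a two-dimensional matrix of 0s and 1s. The first number may be set according to the requirements of the actual application, which the embodiments of the present disclosure do not specifically limit. According to an example of the embodiments of the present disclosure, the modulation signal set may be determined by machine learning training of the moving object detection algorithm, so that the determined modulation signal set is matched to the moving object detection algorithm, as described in further detail below.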
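According to an example of the embodiments of the present disclosure, each modulation signal in the modulation signal set modulates the imaging optical signal for a second duration, and the first duration of the predetermined time period is greater than or equal to the product of the first number and the second duration. That is, within the predetermined time period, the imaging optical signal can be modulated multiple times, each time with a different modulation signal. For example, when the first duration equals the product of the first number and the second duration, the imaging optical signal can be modulated with the first number of modulation signals within the predetermined time period, the number of modulations being equal to the first number. Specifically, each of the first number of modulation signals in the modulation signal set may be selected in sequence, and the imaging optical signal is modulated with the selected modulation signal. As noted above, each modulation signal modulates the imaging optical signal for the second duration, so a modulated optical signal corresponding to the selected modulation signal is obtained within the second duration. The modulation is performed repeatedly and continuously within the predetermined time period, so that a first number of modulated optical signals, corresponding respectively to the first number of modulation signals, are obtained within the first duration. The encoded image is then formed within the first duration from the first number of modulated optical signals thus obtained.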
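The timing constraint above (first duration at least the first number times the second duration) can be checked with a few lines of code. The following is a minimal sketch only; the class name `ModulationSchedule`, its field names, and the numeric values in the usage note are illustrative assumptions and do not appear in the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModulationSchedule:
    """Hypothetical container for the three timing quantities in the text."""
    num_signals: int      # the "first number" of modulation signals
    slot_duration: float  # the "second duration" per signal, in seconds
    exposure: float       # the "first duration" (predetermined period), in seconds

    def is_valid(self) -> bool:
        # first duration >= first number * second duration
        # (a tiny tolerance absorbs floating-point rounding)
        return self.exposure + 1e-12 >= self.num_signals * self.slot_duration

    def slot_windows(self):
        # Time window (start, end) during which the k-th signal is applied,
        # assuming the signals are applied back to back from time zero.
        return [(k * self.slot_duration, (k + 1) * self.slot_duration)
                for k in range(self.num_signals)]
```

For instance, eight signals of 5 ms each fit exactly into a 40 ms exposure, while a 30 ms exposure would be too short for the same set.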
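According to an example of the embodiments of the present disclosure, a spatial light modulator may be used to modulate the imaging optical signal with the modulation signals. A spatial light modulator is a device that modulates the spatial distribution of a light wave. It may contain multiple independent subunits arranged in a one-dimensional or two-dimensional array, each of which can independently change its own optical properties, such as reflectivity, refractive index, or transmittance, under the control of an optical or electrical signal, so that the subunits can be used to modulate the spatial distribution of the light wave passing through them. For example, a spatial light modulator can modulate optical parameters of a light wave such as amplitude, intensity, phase, and polarization state. The spatial light modulator may be a liquid-crystal spatial light modulator, a digital microlens array, or the like, which the embodiments of the present disclosure do not specifically limit. In the embodiments of the present disclosure, each modulation signal in the modulation signal set may serve as a control signal of the spatial light modulator. Specifically, a modulation signal is input to the spatial light modulator and used to adjust its subunits. For example, when the modulation signal is a two-dimensional matrix, the values of the individual elements of the matrix may be used to control the respective subunits of the spatial light modulator, so as to adjust their optical properties. The adjusted subunits can then be used to modulate the spatial distribution of the imaging optical signal, for example its amplitude, intensity, phase, or polarization state, so as to obtain, within a modulation time equal to the second duration, a modulated optical signal corresponding to the input modulation signal.
It should be noted that although a spatial light modulator is used in the above example to modulate the imaging optical signal, the embodiments of the present disclosure are not limited thereto, and any other device capable of changing the spatial distribution of the imaging optical signal may be used.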
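As described above, after the first number of modulated optical signals corresponding to the first number of modulation signals have been obtained within the first duration of the predetermined time period, the encoded image is formed from them. According to an example of the embodiments of the present disclosure, an image detector may continuously acquire the first number of modulated optical signals during the first duration and generate the encoded image based on the acquired optical signals. The image detector may be any device capable of converting an optical signal into an electrical signal, for example a charge-coupled device (CCD) sensor or a complementary metal-oxide-semiconductor (CMOS) sensor, which the embodiments of the present disclosure do not specifically limit. For example, the image detector may continuously acquire the modulated optical signals through a relay lens during the first duration (this may be called exposure) and generate the encoded image after processing such as photoelectric conversion of the acquired optical signals. The relay lens may be, for example, a convex lens, a concave lens, or various combinations thereof, which the embodiments of the present disclosure do not specifically limit. Typically, the exposure time of the image detector is far longer than the time the spatial light modulator needs for a single modulation. When a spatial light modulator and an image detector are used, the exposure time of the imaging system according to the embodiments of the present disclosure therefore depends on the exposure time of the image detector, and the first duration of the predetermined time period may be set to the exposure time of the image detector.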
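The interplay between the sequentially applied modulation signals and the integrating detector can be simulated numerically. The sketch below is an illustration under stated assumptions, not the disclosed hardware: the scene is sampled at a fixed number of instants per modulation slot, the modulation is treated as an intensity mask, the exposure is assumed to span exactly one slot per mask, and the function name `expose` and its parameters are hypothetical.

```python
import numpy as np


def expose(scene_samples, masks, samples_per_slot):
    """Simulate one long exposure of a masked, time-varying scene.

    scene_samples: array of shape (T * samples_per_slot, H, W), the scene
                   intensity sampled at evenly spaced instants.
    masks:         array of shape (T, H, W), one modulation signal per slot.
    Returns the (H, W) image accumulated by the detector.
    """
    scene_samples = np.asarray(scene_samples, dtype=float)
    masks = np.asarray(masks, dtype=float)
    n_samples = scene_samples.shape[0]
    if n_samples != masks.shape[0] * samples_per_slot:
        raise ValueError("exposure must span exactly one slot per mask")
    # Index of the mask that is active at each sampling instant.
    slot_index = np.arange(n_samples) // samples_per_slot
    # The modulator applies the active mask; the detector sums over time.
    modulated = scene_samples * masks[slot_index]
    return modulated.sum(axis=0)
```

With one sample per slot this reduces to a pixel-wise multiply followed by a sum over the slots.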
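After the encoded image is obtained, in step S230, the encoded image may be detected using a moving object detection algorithm matched to the modulation signal set, so as to identify the object to be detected in the encoded image. According to an example of the embodiments of the present disclosure, the moving object detection algorithm may include a moving object detection module, which may include, for example, a motion decoding module and an object detection module that perform motion decoding and object detection on the encoded image, respectively. In the embodiments of the present disclosure, both the motion decoding module and the object detection module may be built with neural networks, and the object detection module may be implemented with a neural-network-based object recognition algorithm, for example a region-based convolutional neural network (RCNN), Faster-RCNN, or the single shot multibox detector (SSD), which the embodiments of the present disclosure do not specifically limit. As noted above, the modulation signal set may be determined by machine learning training of the moving object detection algorithm, so that the determined modulation signal set is matched to the moving object detection algorithm.
According to an example of the embodiments of the present disclosure, the category of the object to be detected can be identified when the encoded image is detected with the moving object detection algorithm. For example, when the object to be detected is a moving car, detecting the encoded image can identify its category as "car". In addition, with the moving object detection method according to the embodiments of the present disclosure, the position of the object to be detected in the encoded image can also be determined during detection. After the category and position have been identified, the category may be annotated in the encoded image; for example, a bounding box may be used to mark the identified object, with its category annotated at the box (for example, "car").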
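Furthermore, the moving object detection method according to the embodiments of the present disclosure can detect multiple sets of position information and the category of the object to be detected from a single encoded image. Specifically, when the encoded image is decoded with the moving object detection algorithm, multiple decoded images can be obtained from the single encoded image, and these decoded images may correspond respectively to the individual modulation signals among the first number of modulation signals. According to an example of the embodiments of the present disclosure, the number of decoded images obtained from a single encoded image may equal the first number of modulation signals. Performing object detection on each of the decoded images determines the position and category of the object to be detected in each decoded image. Because the imaging optical signal of the scene under test varies with time, and the individual modulation signals are selected in sequence within the predetermined time period to modulate the imaging optical signal and generate the encoded image, the decoded images corresponding to the individual modulation signals correspond to the scene under test at different instants. The positions of the object to be detected in the decoded images therefore reflect its motion trajectory in the scene under test.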
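How per-frame detections turn into a trajectory can be sketched in a few lines. This is an illustrative assumption about the data layout, not part of the disclosure: each decoded image is taken to yield a list of `(label, box)` pairs with boxes given as `(x_min, y_min, x_max, y_max)`, and the helper names `box_center` and `trajectory` are hypothetical.

```python
def box_center(box):
    """Center point of an (x_min, y_min, x_max, y_max) bounding box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def trajectory(detections, label):
    """Collect the box centers of one category in decoded-frame order.

    detections: list with one entry per decoded image (time order); each
                entry is a list of (label, box) pairs found in that image.
    Returns a list of (frame_index, (cx, cy)); frames in which the
    category was not detected are skipped.
    """
    path = []
    for frame_index, frame_detections in enumerate(detections):
        for detected_label, box in frame_detections:
            if detected_label == label:
                # Keep the first match only; this sketch tracks one object.
                path.append((frame_index, box_center(box)))
                break
    return path
```

Tracking several objects of the same category, as in the multi-aircraft example of FIG. 4C, would additionally require associating boxes across frames, which this sketch does not attempt.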
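The method of training the moving object detection algorithm to obtain the modulation signal set is further described below with reference to FIG. 3. FIG. 3 shows a schematic diagram of the training process of a moving object detection algorithm according to an example of the embodiments of the present disclosure. As shown in FIG. 3, the moving object detection algorithm may include a moving object detection module 320, which may further include, for example, a motion decoding module 321 and an object detection module 322 that perform motion decoding and object detection on the encoded image, respectively. The motion decoding module 321 and the object detection module 322 may be built with neural networks and may include, for example, network structures such as residual blocks and convolutional blocks, which the embodiments of the present disclosure do not specifically limit. In addition, the object detection module 322 may be implemented with a neural-network-based object recognition algorithm, for example RCNN, Faster-RCNN, or SSD, which the embodiments of the present disclosure do not specifically limit. The moving object detection algorithm may further include a motion encoding module 310, which is a mathematical description of the hardware part of the moving object detection system, including the imaging lens, the spatial light modulator, and the image detector; that is, the motion encoding module 310 can simulate the physical process of generating the encoded image. A modulation signal set suited to the moving object detection system can therefore be obtained by machine learning training of the moving object detection algorithm including the motion encoding module 310.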
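To train the moving object detection algorithm, a training data set is first obtained. The training data set may include a training image sequence, and each training image in the sequence carries annotated position information and annotated categories for one or more moving objects. Here, a training image sequence is a set of multiple training images that are consecutive in time; for example, it may be a video signal, with each frame of the video serving as one training image of the sequence. The number of training images used in each training iteration may equal the first number of modulation signals described above, or may be greater than the first number, which the embodiments of the present disclosure do not specifically limit. For example, when a training image sequence in the training data set has 80 training images and the first number is 8, each training iteration may use 8 training images. The training data set may come from a public annotated data set for video object detection, for example the ImageNet video object detection data set (ImageNet VID). The moving object detection algorithm is then trained under supervision using the annotated position information and annotated categories in the training data set, so as to determine the modulation signal set.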
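The 80-image, first-number-8 example above amounts to cutting a sequence into clips whose length matches the number of modulation signals. The following sketch shows one way to do this; the choice of non-overlapping clips, the function name `split_into_clips`, and the placeholder frame size are assumptions made for illustration, since the disclosure does not state how the 8 images are chosen.

```python
import numpy as np


def split_into_clips(sequence, clip_len):
    """Cut a frame sequence into consecutive, non-overlapping clips.

    sequence: array of shape (N, H, W), frames in time order.
    clip_len: frames per clip (the first number of modulation signals).
    Trailing frames that do not fill a whole clip are dropped.
    """
    n_clips = len(sequence) // clip_len
    return [sequence[i * clip_len:(i + 1) * clip_len] for i in range(n_clips)]


# An 80-frame sequence with placeholder 4x6 frames yields ten 8-frame clips.
video = np.zeros((80, 4, 6))
clips = split_into_clips(video, 8)
```

Each clip would be paired with the annotations of its own frames when it is used for one training iteration.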
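Specifically, as shown in FIG. 3, the motion encoding module 310 first encodes the training image sequence with an encoding signal set to obtain a training encoded image. As noted above, the motion encoding module is a mathematical simulation of the physical process of generating the encoded image. In the example of step S220 above, the spatial light modulator selects the modulation signals in the modulation signal set in sequence within the predetermined time period to modulate the time-varying imaging optical signal, generating a first number of modulated optical signals, and the image detector then continuously acquires these modulated optical signals to generate the encoded image. This step is equivalent to multiplying the time-varying imaging optical signal within the predetermined time period by the modulation signal set and summing the results. Accordingly, when training the moving object detection algorithm, each encoding signal in the encoding signal set may be multiplied pixel by pixel with the corresponding training image in the training image sequence, and the products summed, to obtain the training encoded image. The encoding signal set used in training corresponds to the modulation signal set in step S220 above; that is, the trained encoding signal set can serve as the modulation signal set of the imaging system in the moving object detection system to modulate the imaging optical signal. In the example shown in FIG. 3, a training image sequence containing a moving hound is multiplied by the encoding signal set and summed to obtain the training encoded image. As can be seen, after motion encoding, a series of training images containing a sharp image of the hound is encoded into a single image in which the hound appears blurred. The encoding signals in the encoding signal set may be set to any suitable initial values, which the embodiments of the present disclosure do not specifically limit.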
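The pixel-wise multiply-and-sum performed by the motion encoding module can be written directly with NumPy. The sketch below is a minimal illustration: the function name `encode_frames`, the array shapes, and the random binary masks are assumptions chosen for the example, not values from the disclosure.

```python
import numpy as np


def encode_frames(frames, masks):
    """Encode T frames into one image: sum over t of masks[t] * frames[t].

    frames: array of shape (T, H, W), the training image sequence.
    masks:  array of shape (T, H, W), the encoding signal set.
    Returns the (H, W) training encoded image.
    """
    frames = np.asarray(frames, dtype=float)
    masks = np.asarray(masks, dtype=float)
    if frames.shape != masks.shape:
        raise ValueError("need exactly one mask per frame, with equal sizes")
    # Pixel-wise product per frame, then summation along the time axis.
    return (frames * masks).sum(axis=0)


rng = np.random.default_rng(0)
frames = rng.random((8, 4, 6))               # eight placeholder 4x6 frames
masks = rng.integers(0, 2, size=(8, 4, 6))   # random 0/1 initial masks
coded = encode_frames(frames, masks)
```

Because every output pixel mixes contributions from several instants, a moving object appears blurred in `coded`, matching the hound example of FIG. 3.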
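Next, the moving object detection module 320 performs object detection on the training encoded image to obtain a detection result. Specifically, the motion decoding module 321 of the moving object detection module 320 may first decode the training encoded image to obtain multiple decoded images, whose number may correspond, for example, to the number of encoding signals in the encoding signal set and to the number of training images in the training image sequence. The object detection module 322 of the moving object detection module 320 then performs object detection on the decoded images to obtain the detection result, which may be, for example, the categories of one or more moving objects and multiple sets of position information in the decoded images.
Because the categories and position information of the one or more moving objects in each training image of the training image sequence are annotated, the annotated categories and annotated position information can be used for supervised training on the above detection result. For example, the error between the annotated categories and position information and the detection result may be computed, and the moving object detection algorithm trained under supervision by minimizing this error, thereby continually optimizing the encoding signal set and the network parameters of the moving object detection module until the optimal encoding signal set and optimal network parameters are obtained.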
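The error between annotations and detection results can take many forms, and the disclosure does not name one. As a hedged illustration only, the sketch below combines a softmax cross-entropy term for the category with a smooth-L1 term for the box coordinates, a combination commonly used by detectors of the Faster-RCNN and SSD family; the function names and the `box_weight` parameter are hypothetical.

```python
import numpy as np


def cross_entropy(logits, target_index):
    """Softmax cross-entropy of one prediction against the annotated class."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max()          # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[target_index])


def smooth_l1(pred_box, true_box, beta=1.0):
    """Smooth-L1 distance summed over the four box coordinates."""
    diff = np.abs(np.asarray(pred_box, dtype=float)
                  - np.asarray(true_box, dtype=float))
    per_coord = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return float(per_coord.sum())


def detection_error(logits, pred_box, true_class, true_box, box_weight=1.0):
    """Assumed total error: classification term plus weighted box term."""
    return (cross_entropy(logits, true_class)
            + box_weight * smooth_l1(pred_box, true_box))
```

In an actual training run, an automatic-differentiation framework would propagate the gradient of such an error back through the detection and decoding networks and through the multiply-and-sum encoder to the encoding signals themselves.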
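After machine learning training of the moving object detection algorithm has yielded the optimal encoding signal set and optimal network parameters, the moving object detection algorithm is fixed. The trained encoding signal set may then be applied as the modulation signal set in the moving object detection system according to the embodiments of the present disclosure, to image and modulate a scene under test containing a moving object to be detected and thereby generate an encoded image. The fixed moving object detection algorithm is then used to detect the encoded image and identify the object to be detected in it. The specific steps are as described above for the moving object detection method 200 with reference to FIG. 2 and are not repeated here.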
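FIGS. 4A-4C show examples of detection results obtained with the moving object detection method according to the embodiments of the present disclosure. In the example of FIG. 4A, (a) is an encoded image obtained with the method, containing a blurred image of a bicycle. This encoded image was obtained by modulating the time-varying imaging optical signal of a scene under test containing a moving bicycle with eight modulation signals in turn and acquiring the result; that is, the first number of the modulation signal set in step S220 above is 8. After the encoded image is detected with the method, the category of the bicycle and eight corresponding sets of position information can be identified from the single encoded image. For ease of illustration, the detection results are shown in (b)-(i) against a background of a high-definition image sequence of the bicycle captured with a high-speed camera, where in each image a rectangular box marks the position of the bicycle and its category is annotated. In (b)-(i), the different positions and the category of the bicycle determined with the method are annotated in chronological order; that is, the motion trajectory of the bicycle is recovered and tracking detection of the moving bicycle is achieved.
In the example of FIG. 4B, (a) is an encoded image obtained with the moving object detection method according to the embodiments of the present disclosure, containing a blurred image of a car. This encoded image was obtained by modulating the time-varying imaging optical signal of a scene under test containing a moving car with eight modulation signals in turn and acquiring the result; that is, the first number of the modulation signal set in step S220 above is 8. After the encoded image is detected with the method, the category of the car and eight corresponding sets of position information can be identified from the single encoded image, as shown in (b)-(i). Similarly, for ease of illustration, a high-definition image sequence of the car captured with a high-speed camera is used as the background of (b)-(i). In (b)-(i), the different positions and the category of the car determined with the method are annotated in chronological order; that is, the motion trajectory of the car is recovered and tracking detection of the moving car is achieved.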
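In addition, the moving object detection method according to the embodiments of the present disclosure can track and detect multiple moving objects simultaneously. In the example of FIG. 4C, (a) is an encoded image obtained with the method, containing blurred images of several aircraft. This encoded image was obtained by modulating the time-varying imaging optical signal of a scene under test containing moving aircraft with eight modulation signals in turn and acquiring the result; that is, the first number of the modulation signal set in step S220 above is 8. After the encoded image is detected with the method, the category of each aircraft and eight corresponding sets of position information for each aircraft can be identified from the single encoded image, as shown in (b)-(i). Similarly, for ease of illustration, a high-definition image sequence of the aircraft captured with a high-speed camera is used as the background of (b)-(i). In (b)-(i), the different positions and the category of each aircraft determined with the method are annotated in chronological order; that is, the motion trajectories of multiple aircraft are recovered simultaneously and tracking detection of the high-speed aircraft is achieved.
As the above description and the examples of FIGS. 4A-4C show, with the moving object detection method according to the embodiments of the present disclosure, a single encoded image of a moving object can be generated over a relatively long exposure time, and the category of the moving object and multiple sets of position information in chronological order can be detected from that single encoded image, which greatly improves the efficiency of object detection. In particular, when tracking and detecting high-speed moving objects, the method can recover the motion trajectory of a high-speed moving object over a long period while capturing only a few encoded images and without a high-speed camera, achieving efficient detection of high-speed moving objects while substantially reducing system cost and data burden.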
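A moving object detection apparatus according to the embodiments of the present disclosure is described below with reference to FIG. 5. FIG. 5 shows a schematic structural diagram of a moving object detection apparatus 500 according to an embodiment of the present disclosure. Because the details of the moving object detection apparatus 500 are the same as those of the moving object detection method 200 described above in conjunction with FIG. 2, a detailed description of the same content is omitted here for simplicity. As shown in FIG. 5, the moving object detection apparatus 500 includes an imaging lens 510, an encoding unit 520, and a detection unit 530. Besides these three units, the apparatus 500 may include other components; since these components are unrelated to the content of the embodiments of the present disclosure, their illustration and description are omitted here.
The imaging lens 510 is configured to receive an optical signal from the scene under test and to optically image the scene under test using the optical signal to generate an imaging optical signal of the scene under test. The scene under test includes a moving object to be detected, for example a moving bicycle, a speeding car, a flying aircraft, a running animal, or moving objects of any other kind or number; the embodiments of the present disclosure do not specifically limit the kind or number of objects to be detected. The imaging lens may be part of an imaging system and may be, for example, a convex lens, a concave lens, or various combinations thereof, which the embodiments of the present disclosure do not specifically limit. The generated imaging optical signal is a time-varying two-dimensional light field signal representing the scene under test, so it may also be called a time-varying imaging optical signal. It reflects in real time the changes in the scene under test during imaging, such as the motion of the object to be detected.
The encoding unit 520 is configured to, within a predetermined time period, modulate the imaging optical signal in sequence with the modulation signals in a modulation signal set to obtain an encoded image. Here, the predetermined time period has a first duration. The first duration may be, for example, the exposure time of the imaging system and may be set arbitrarily according to actual requirements; for example, it may be set far longer than the exposure time of a high-speed camera, or to any other suitable duration, which the embodiments of the present disclosure do not specifically limit.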
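According to an example of the embodiments of the present disclosure, the modulation signal set includes a first number of modulation signals, each of which may be a two-dimensional matrix corresponding to the imaging optical signal as a two-dimensional light field signal, for example a two-dimensional matrix of 0s and 1s. The first number may be set according to the requirements of the actual application, which the embodiments of the present disclosure do not specifically limit. According to an example of the embodiments of the present disclosure, the modulation signal set may be determined by machine learning training of the moving object detection algorithm, so that the determined modulation signal set is matched to the moving object detection algorithm. The details of this training are similar to the process described above with reference to FIG. 3, so a repeated description of the same content is omitted here.
According to an example of the embodiments of the present disclosure, each modulation signal in the modulation signal set modulates the imaging optical signal for a second duration, and the first duration of the predetermined time period is greater than or equal to the product of the first number and the second duration. That is, within the predetermined time period, the encoding unit 520 can modulate the imaging optical signal multiple times, each time with a different modulation signal. For example, when the first duration equals the product of the first number and the second duration, the encoding unit 520 can modulate the imaging optical signal with the first number of modulation signals within the predetermined time period, the number of modulations being equal to the first number. Specifically, the encoding unit 520 may select each of the first number of modulation signals in the modulation signal set in sequence and modulate the imaging optical signal with the selected modulation signal. As noted above, each modulation signal modulates the imaging optical signal for the second duration, so a modulated optical signal corresponding to the selected modulation signal is obtained within the second duration. The encoding unit 520 performs the modulation repeatedly and continuously within the predetermined time period, so that a first number of modulated optical signals, corresponding respectively to the first number of modulation signals, are obtained within the first duration. The encoded image is then formed within the first duration from the first number of modulated optical signals thus obtained.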
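According to an example of the embodiments of the present disclosure, the encoding unit 520 may include, for example, a spatial light modulator. A spatial light modulator is a device that modulates the spatial distribution of a light wave. It may contain multiple independent subunits arranged in a one-dimensional or two-dimensional array, each of which can independently change its own optical properties, such as reflectivity, refractive index, or transmittance, under the control of an optical or electrical signal, so that the subunits can be used to modulate the spatial distribution of the light wave passing through them. For example, a spatial light modulator can modulate optical parameters of a light wave such as amplitude, intensity, phase, and polarization state. The spatial light modulator may be a liquid-crystal spatial light modulator, a digital microlens array, or the like, which the embodiments of the present disclosure do not specifically limit. In the embodiments of the present disclosure, each modulation signal in the modulation signal set may serve as a control signal of the spatial light modulator. Specifically, the spatial light modulator is configured to receive a modulation signal and to use it to adjust its subunits. For example, when the modulation signal is a two-dimensional matrix, the values of the individual elements of the matrix may be used to control the respective subunits of the spatial light modulator, so as to adjust their optical properties. The spatial light modulator can then use the adjusted subunits to modulate the spatial distribution of the imaging optical signal, for example its amplitude, intensity, phase, or polarization state, so as to obtain, within a modulation time equal to the second duration, a modulated optical signal corresponding to the input modulation signal.
It should be noted that although a spatial light modulator is used in the above example to modulate the imaging optical signal, the embodiments of the present disclosure are not limited thereto, and the encoding unit 520 may include any other device capable of changing the spatial distribution of the imaging optical signal.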
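According to an example of the embodiments of the present disclosure, the encoding unit 520 may further include an image detector configured to continuously acquire the first number of modulated optical signals during the first duration and to generate the encoded image based on the acquired optical signals. The image detector may be any device capable of converting an optical signal into an electrical signal, for example a charge-coupled device (CCD) sensor or a complementary metal-oxide-semiconductor (CMOS) sensor, which the embodiments of the present disclosure do not specifically limit. For example, the image detector may continuously acquire the modulated optical signals through a relay lens during the first duration (this may be called exposure) and generate the encoded image after processing such as photoelectric conversion of the acquired optical signals. The relay lens may be, for example, a convex lens, a concave lens, or various combinations thereof, which the embodiments of the present disclosure do not specifically limit. Typically, the exposure time of the image detector is far longer than the time the spatial light modulator needs for a single modulation. When a spatial light modulator and an image detector are used, the exposure time of the imaging system according to the embodiments of the present disclosure therefore depends on the exposure time of the image detector, and the first duration of the predetermined time period may equal the exposure time of the image detector.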
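The detection unit 530 is configured to detect the encoded image using a moving object detection algorithm matched to the modulation signal set, so as to identify the object to be detected in the encoded image. According to an example of the embodiments of the present disclosure, the moving object detection algorithm may include a moving object detection module, which may include, for example, a motion decoding module and an object detection module that perform motion decoding and object detection on the encoded image, respectively. Both the motion decoding module and the object detection module may be built with neural networks, and the object detection module may be implemented with a neural-network-based object recognition algorithm, for example RCNN, Faster-RCNN, or SSD, which the embodiments of the present disclosure do not specifically limit. As noted above, the modulation signal set may be determined by machine learning training of the moving object detection algorithm, so that the determined modulation signal set is matched to the moving object detection algorithm.
According to an example of the embodiments of the present disclosure, the detection unit 530 can identify the category of the object to be detected when detecting the encoded image with the moving object detection algorithm. For example, when the object to be detected is a moving car, detecting the encoded image can identify its category as "car". In addition, with the moving object detection apparatus according to the embodiments of the present disclosure, the detection unit 530 can also determine the position of the object to be detected in the encoded image during detection. After the category and position have been identified, the category may be annotated in the encoded image; for example, a bounding box may be used to mark the identified object, with its category annotated at the box (for example, "car").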
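In addition, the moving object detection apparatus according to the embodiments of the present disclosure can detect multiple sets of position information of the object to be detected from a single encoded image. Specifically, when the detection unit 530 decodes the encoded image with the moving object detection algorithm, multiple decoded images can be obtained from the single encoded image, and these decoded images may correspond respectively to the individual modulation signals among the first number of modulation signals. According to an example of the embodiments of the present disclosure, the number of decoded images obtained from a single encoded image may equal the first number of modulation signals. The detection unit 530 performs object detection on each of the decoded images and can thereby determine the position and category of the object to be detected in the decoded images. Because the imaging optical signal of the scene under test varies with time, and the individual modulation signals are selected in sequence within the predetermined time period to modulate the imaging optical signal and generate the encoded image, the decoded images corresponding to the individual modulation signals correspond to the scene under test at different instants. The positions of the object to be detected in the decoded images therefore reflect its motion trajectory, as shown by the example detection results in FIGS. 4A-4C.
With the moving object detection apparatus according to the above embodiments of the present disclosure, a single encoded image of a moving object can be generated over a relatively long exposure time, and the category of the moving object and multiple sets of position information in chronological order can be detected from that single encoded image, which greatly improves the efficiency of object detection. In particular, when tracking and detecting high-speed moving objects, the apparatus can recover the motion trajectory of a high-speed moving object over a long period while capturing only a few encoded images and without a high-speed camera, achieving efficient detection of high-speed moving objects while substantially reducing system cost and data burden.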
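A moving object detection apparatus according to the embodiments of the present disclosure is described below with reference to FIG. 6. FIG. 6 shows a schematic structural diagram of a moving object detection apparatus 600 according to an embodiment of the present disclosure. Because the details of the moving object detection apparatus 600 are the same as those of the moving object detection method 200 described above in conjunction with FIG. 2, a detailed description of the same content is omitted here for simplicity. As shown in FIG. 6, the moving object detection apparatus 600 may include an imaging lens 610, a spatial light modulator 620, an image detector 630, and one or more processors 640. Besides these four units, the moving object detection apparatus 600 may include other components, for example one or more storage devices and input/output components, which the embodiments of the present disclosure do not specifically limit.
The imaging lens 610 is configured to receive an optical signal from the scene under test and to optically image the scene under test using the optical signal to generate an imaging optical signal of the scene under test. The step of generating the imaging optical signal is similar in detail to step S210 of the moving object detection method described above with reference to FIG. 2 and to the function of the imaging lens 510 described with reference to FIG. 5, so a repeated description of the same content is omitted here for simplicity. The spatial light modulator 620 is configured to receive a modulation signal set and, under control of the modulation signal set, to modulate the imaging optical signal to generate a modulated optical signal. The image detector 630 is configured to generate an encoded image based on the modulated optical signal. The step of generating the encoded image is similar in detail to step S220 of the moving object detection method described above with reference to FIG. 2 and to the function of the encoding unit 520 described with reference to FIG. 5, so a repeated description of the same content is omitted here for simplicity.
The one or more processors 640 are configured to, within a predetermined time period, provide the modulation signals in the modulation signal set to the spatial light modulator 620 in sequence, so as to control the spatial light modulator 620 to modulate the imaging optical signal with the modulation signals to generate the modulated optical signal, and to control the image detector 630 to generate the encoded image based on the modulated optical signal. The one or more processors 640 are further configured to detect the encoded image using a moving object detection algorithm matched to the modulation signal set, so as to identify the object to be detected in the encoded image. The step of detecting the encoded image is similar in detail to step S230 of the moving object detection method described above with reference to FIG. 2 and to the function of the detection unit 530 described with reference to FIG. 5, so a repeated description of the same content is omitted here for simplicity.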
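Those skilled in the art will understand that the content disclosed herein admits many variations and improvements. For example, the various devices or components described above may be implemented in hardware, or in software, firmware, or a combination of some or all of the three.
Furthermore, as used in the present disclosure and the claims, unless the context clearly indicates otherwise, words such as "a", "an", "one", and/or "the" do not specifically denote the singular and may also include the plural. The terms "first", "second", and similar words used in the present disclosure do not indicate any order, quantity, or importance, but are used only to distinguish different components. Likewise, words such as "comprise" or "include" mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. Words such as "connected" or "coupled" are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
In addition, flowcharts are used in the present disclosure to illustrate the operations performed by a system according to the embodiments of the present disclosure. It should be understood that the preceding or following operations are not necessarily performed exactly in order. Rather, the various steps may be processed in reverse order or simultaneously. Other operations may also be added to these processes, or one or more steps may be removed from them.
Unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by a person of ordinary skill in the art to which the present disclosure belongs. It should also be understood that terms such as those defined in common dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant art, and should not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The present disclosure has been described in detail above, but it will be apparent to those skilled in the art that the present disclosure is not limited to the embodiments described in this specification. The present disclosure may be implemented in modified and altered forms without departing from its spirit and scope as determined by the claims. The description in this specification is therefore for illustrative purposes and has no limiting meaning for the present disclosure.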

Claims (10)

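  1. A moving object detection method, comprising:
    receiving an optical signal from a scene under test, and optically imaging the scene under test using the optical signal to generate an imaging optical signal of the scene under test, wherein the scene under test includes a moving object to be detected;
    within a predetermined time period, modulating the imaging optical signal in sequence with modulation signals in a modulation signal set to generate a modulated optical signal, and generating an encoded image based on the modulated optical signal; and
    detecting the encoded image using a moving object detection algorithm matched to the modulation signal set, so as to identify the object to be detected in the encoded image.
  2. The moving object detection method of claim 1,
    wherein the modulation signal set includes a first number of modulation signals, the predetermined time period has a first duration, each modulation signal modulates the imaging optical signal for a second duration, and the first duration is greater than or equal to the product of the first number and the second duration;
    wherein modulating the imaging optical signal in sequence with the modulation signals in the modulation signal set within the predetermined time period to obtain the encoded image comprises:
    selecting in sequence each of the first number of modulation signals in the modulation signal set, and modulating the imaging optical signal with the modulation signal to obtain, within the second duration, a modulated optical signal corresponding to the modulation signal, and obtaining in sequence, within the first duration, a first number of modulated optical signals corresponding respectively to the first number of modulation signals;
    forming the encoded image, within the first duration, from the first number of modulated optical signals.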
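  3. The moving object detection method of claim 2, wherein modulating the imaging optical signal with the modulation signal to obtain, within the second duration, the modulated optical signal corresponding to the modulation signal comprises:
    inputting the modulation signal to a spatial light modulator, the spatial light modulator including a plurality of subunits;
    adjusting the plurality of subunits of the spatial light modulator with the modulation signal; and
    modulating the spatial distribution of the imaging optical signal with the adjusted plurality of subunits to obtain, within the second duration, the modulated optical signal corresponding to the modulation signal.
  4. The moving object detection method of claim 2, wherein forming the encoded image from the first number of modulated optical signals within the first duration comprises:
    continuously acquiring the first number of modulated optical signals with an image detector during the first duration, and generating the encoded image based on the acquired optical signals.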
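  5. The moving object detection method of claim 1, wherein detecting the encoded image using the moving object detection algorithm matched to the modulation signal set, so as to identify the object to be detected in the encoded image, comprises:
    detecting the encoded image to determine the position and category of the object to be detected in the encoded image.
  6. The moving object detection method of claim 5, wherein determining the position and category of the object to be detected in the encoded image comprises:
    determining, based on the encoded image, a plurality of decoded images corresponding respectively to the individual modulation signals among the first number of modulation signals; and
    determining the position and category of the object to be detected in the plurality of decoded images.
  7. The moving object detection method of claim 1, wherein the modulation signal set matched to the moving object detection algorithm is determined by the following steps:
    obtaining a training data set, the training data set including a training image sequence and annotated position information and annotated categories of one or more moving objects contained in each training image of the training image sequence;
    performing supervised training of the moving object detection algorithm using the annotated position information and annotated categories to determine the modulation signal set.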
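  8. The moving object detection method of claim 7, wherein the moving object detection algorithm includes a motion encoding module and a moving object detection module, and wherein performing supervised training of the moving object detection algorithm using the annotated position information and annotated categories to determine the modulation signal set comprises:
    encoding, by the motion encoding module, the training image sequence using an encoding signal set to obtain a training encoded image;
    performing object detection on the training encoded image by the moving object detection module to obtain a detection result; and
    performing supervised training on the detection result using the annotated position information and the annotated categories to obtain a trained encoding signal set, and determining the trained encoding signal set as the modulation signal set.
  9. The moving object detection method of claim 8, wherein encoding, by the motion encoding module, the training image sequence using the encoding signal set to obtain the training encoded image comprises:
    multiplying each encoding signal in the encoding signal set pixel by pixel with the corresponding training image in the training image sequence, and summing the products to obtain the training encoded image.
  10. A moving object detection apparatus, comprising:
    an imaging lens configured to receive an optical signal from a scene under test and to optically image the scene under test using the optical signal to generate an imaging optical signal of the scene under test, wherein the scene under test includes a moving object to be detected;
    an encoding unit configured to, within a predetermined time period, modulate the imaging optical signal in sequence with modulation signals in a modulation signal set to generate a modulated optical signal, and to generate an encoded image based on the modulated optical signal; and
    a detection unit configured to detect the encoded image using a moving object detection algorithm matched to the modulation signal set, so as to identify the object to be detected in the encoded image.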
PCT/CN2022/070677 2021-01-08 2022-01-07 Moving object detection method and apparatus WO2022148423A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110022668.6A CN112784711A (zh) 2021-01-08 2021-01-08 Moving object detection method and apparatus
CN202110022668.6 2021-01-08

Publications (1)

Publication Number Publication Date
WO2022148423A1 true WO2022148423A1 (zh) 2022-07-14

Family

ID=75756830

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/070677 WO2022148423A1 (zh) 2021-01-08 2022-01-07 Moving object detection method and apparatus

Country Status (2)

Country Link
CN (1) CN112784711A (zh)
WO (1) WO2022148423A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112784711A (zh) * 2021-01-08 2021-05-11 清华大学 Moving object detection method and apparatus
CN113630517B (zh) * 2021-10-08 2022-01-25 清华大学 Intelligent light-field imaging method and apparatus with integrated photoelectric sensing and computing
CN114279330B (zh) * 2021-12-27 2023-11-21 中国科学院合肥物质科学研究院 Correlated imaging method and system with high-speed modulation and synchronous acquisition

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150276387A1 (en) * 2014-02-14 2015-10-01 Palo Alto Research Center Incorporated Spatial modulation of light to determine object position
CN110175971A (zh) * 2019-05-27 2019-08-27 大连海事大学 Deep learning image reconstruction method for multispectral single-pixel imaging
CN111562588A (zh) * 2019-02-13 2020-08-21 英飞凌科技股份有限公司 Method, apparatus and computer program for detecting the presence of atmospheric particles
CN112784711A (zh) * 2021-01-08 2021-05-11 清华大学 Moving object detection method and apparatus

Also Published As

Publication number Publication date
CN112784711A (zh) 2021-05-11

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22736585

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22736585

Country of ref document: EP

Kind code of ref document: A1