WO2020078229A1 - Target object recognition method and apparatus, storage medium, and electronic apparatus - Google Patents

Target object recognition method and apparatus, storage medium, and electronic apparatus

Info

Publication number
WO2020078229A1
WO2020078229A1 PCT/CN2019/110058 CN2019110058W WO2020078229A1 WO 2020078229 A1 WO2020078229 A1 WO 2020078229A1 CN 2019110058 W CN2019110058 W CN 2019110058W WO 2020078229 A1 WO2020078229 A1 WO 2020078229A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
images
original model
model
pixels
Prior art date
Application number
PCT/CN2019/110058
Other languages
English (en)
French (fr)
Inventor
陈炳文
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 filed Critical 腾讯科技(深圳)有限公司
Priority to EP19873881.7A priority Critical patent/EP3869459B1/en
Publication of WO2020078229A1 publication Critical patent/WO2020078229A1/zh
Priority to US17/074,502 priority patent/US11443498B2/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/20 Image enhancement or restoration using local operators
    • G06T 5/30 Erosion or dilatation, e.g. thinning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/12 Edge-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/136 Segmentation; Edge detection involving thresholding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/90 Determination of colour characteristics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/10 Image acquisition
    • G06V 10/12 Details of acquisition arrangements; Constructional details thereof
    • G06V 10/14 Optical characteristics of the device performing the acquisition or on the illumination arrangements
    • G06V 10/143 Sensing or illuminating at different wavelengths
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10048 Infrared image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10141 Special mode during image acquisition
    • G06T 2207/10152 Varying illumination
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20092 Interactive image processing based on input by user
    • G06T 2207/20104 Interactive definition of region of interest [ROI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30232 Surveillance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Definitions

  • This application relates to the field of image processing, and in particular, to recognition of target objects.
  • Moving target detection refers to detecting the changed regions in a sequence of images and extracting the moving target from the background image.
  • Post-processing steps such as target classification, tracking, and behavior understanding only consider the pixel regions in the image that correspond to the moving target; therefore, correct detection and segmentation of moving targets is very important for post-processing.
  • Embodiments of the present application provide a method and device for identifying a target object, a storage medium, and an electronic device, so as to at least solve the technical problem of low recognition accuracy of the target object in the related art.
  • A method for identifying a target object is provided, including: acquiring a first image and a second image, where the first image is an image obtained by shooting a target scene under visible light and the second image is an image obtained by shooting the target scene under infrared light; determining, through a prediction model, the predicted infrared intensity value corresponding to each pixel in the first image; obtaining the difference between the actual infrared intensity value of each pixel in the second image and the predicted infrared intensity value corresponding to the pixel at the same position in the first image; and determining that pixels in the second image whose difference is greater than a first threshold are the pixels where the target object is located in the target scene.
  • A target object recognition apparatus is provided, including: a first acquisition unit configured to acquire a first image and a second image, where the first image is an image obtained by shooting a target scene under visible light and the second image is an image obtained by shooting the target scene under infrared light; a prediction unit configured to determine, through a prediction model, the predicted infrared intensity value corresponding to each pixel in the first image; a second acquisition unit configured to obtain the difference between the actual infrared intensity value of each pixel in the second image and the predicted infrared intensity value corresponding to the pixel at the same position in the first image; and an identification unit configured to determine that pixels in the second image whose difference is greater than a first threshold are the pixels where the target object is located in the target scene.
  • a storage medium is further provided, and the storage medium includes a stored program, and the above method is executed when the program runs.
  • an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor.
  • the processor executes the foregoing method through the computer program.
  • a computer program product including instructions, which when executed on a computer, causes the computer to execute the method described above.
  • By using adaptive function reconstruction, a nonlinear prediction model representing the background of the target scene can be established effectively; infrared and visible light information can be fused effectively, shadow interference and the infrared halo effect can be suppressed, and background clutter can be suppressed to highlight the target. This solves the technical problem of low recognition accuracy for target objects in the related art, thereby achieving the technical effect of accurately identifying the target object in the presence of interference.
  • FIG. 1 is a schematic diagram of a hardware environment of a target object recognition method according to an embodiment of the present application
  • FIG. 2 is a flowchart of an optional target object recognition method according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of an optional scene visible light image according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of an optional scene infrared image according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an optional scene target object according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of an optional scene target object according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of an optional prediction model according to an embodiment of the present application.
  • FIG. 8 is a schematic diagram of an optional prediction result according to an embodiment of the present application.
  • FIG. 9 is a schematic diagram of an optional target object recognition device according to an embodiment of the present application.
  • FIG. 10 is a structural block diagram of a terminal according to an embodiment of the present application.
  • a method embodiment of a target object recognition method is provided.
  • the above target object recognition method may be applied to a processing device, and the processing device may include a terminal and / or a server.
  • the processing device may include the server 101.
  • the server 101 is connected to the terminal 103 through a network.
  • The network includes, but is not limited to, a wide area network, a metropolitan area network, or a local area network (such as an internal property-management network or an internal company network). The terminal 103 is a capture terminal capable of capturing visible light images and infrared images, including but not limited to visible light surveillance cameras, infrared surveillance cameras, mobile phones with cameras, and tablet computers with cameras. The server is a device used for surveillance video storage and/or surveillance video analysis.
  • The above hardware environment can be the hardware environment of security monitoring, automatic monitoring, and remote monitoring systems in fields such as banks, museums, transportation roads, commercial institutions, military institutions, public security bureaus, power departments, factories and mines, intelligent communities, and space exploration agencies.
  • the terminal may be a high-definition camera and an infrared camera located in the same position in these systems
  • the server may be a server located in the central control room of the system, so as to realize intelligent target detection and target tracking using a computer.
  • the above hardware environment may also be a hardware environment in an artificial intelligence system
  • the terminal may be a visible light sensor or an infrared sensor of an intelligent device such as an aircraft in the system
  • the server may be an Internet server that communicates with the aircraft. Using the method of the present application, objects that appear in the visible area can be automatically located.
  • FIG. 2 is a flowchart of an optional target object recognition method according to an embodiment of the present application.
  • As shown in FIG. 2, the method may include the following steps:
  • Step S202: the server obtains a first image and a second image, where the first image is an image obtained by shooting the target scene under visible light, and the second image is an image obtained by shooting the target scene under infrared light.
  • The above first image and second image may each be one frame from a captured sequence of continuous video frames, or may be single images taken separately. The first image and the second image are images with the same framing (that is, the same target scene), and their shooting times are close (that is, the interval is less than a preset value, such as 0.02 seconds), for example, video frames at the same frame position in a visible light video frame sequence and an infrared video frame sequence captured at the same time.
  • The above target scene is the area in which the target object is to be identified; it may be the scene of the area monitored by a terminal in a monitoring system, or the area currently being recognized by an aircraft in an artificial intelligence system.
  • A visible light sensor can detect red, green, and blue spectral energy and convert it into a color image with rich color, texture, and structure information; such images match the human visual perception system and are easy to understand and analyze. An infrared sensor transforms the invisible infrared radiation of the scene into an image that can be observed by the human eye; it has good environmental adaptability and high sensitivity and is suitable for detecting and identifying weak target signals. Moreover, the infrared radiation of the infrared sensor device itself is extremely weak, so it is a passive detection device with good concealment. Therefore, combining visible light images and infrared images can effectively enrich target and scene information and improve the detection rate.
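  • As a concrete illustration of step S202, the following minimal sketch reads one synchronized frame pair from a visible light stream and an infrared stream; it assumes both streams are ordinary video files readable by OpenCV, and the file names are hypothetical.

```python
import cv2

# Hypothetical file names; in practice the streams come from the surveillance system.
visible_cap = cv2.VideoCapture("visible_light.mp4")
infrared_cap = cv2.VideoCapture("infrared.mp4")

def next_frame_pair():
    """Return one (first image, second image) pair at the same frame position,
    or None when either stream is exhausted."""
    ok_v, first_image = visible_cap.read()    # visible light frame (BGR)
    ok_t, second_image = infrared_cap.read()  # infrared frame
    if not (ok_v and ok_t):
        return None
    # If the infrared stream is stored with three channels, keep a single
    # intensity channel as the actual infrared intensity values.
    if second_image.ndim == 3:
        second_image = cv2.cvtColor(second_image, cv2.COLOR_BGR2GRAY)
    return first_image, second_image
```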
  • Step S204: the server determines, through the prediction model, the predicted infrared intensity value corresponding to each pixel in the first image.
  • the prediction model is a model obtained by using a set of third images captured under visible light as model input and a set of fourth images captured under infrared light as model output.
  • the third set of images and the fourth set of images are images of the same scene.
  • the purpose of the training is to enable the model to convert the visible light image into an infrared image in the same scene.
  • the prediction model can convert the first image into an infrared image.
  • The predicted infrared intensity value of each pixel in that infrared image is determined using the color value of the pixel at the same position in the first image.
  • The above prediction model includes multiple prediction functions, each assigned a corresponding weight (a prediction function can be a basis function).
  • The input of each prediction function is the input of the prediction model, and the output of the prediction model is the cumulative sum of the products of the outputs of all prediction functions and their corresponding weights.
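  • As a concrete illustration of this weighted sum of prediction functions, the sketch below evaluates such a model at a single pixel; the power-product form of the basis functions follows the description later in this document, the numeric values are purely illustrative, and the input features are assumed to be nonnegative.

```python
import numpy as np

def basis_function(x, r):
    """f_i(x): product of the input features raised to preset powers r_ij."""
    return np.prod(np.power(x, r))

def predict_infrared(x, weights, exponents):
    """Predicted infrared intensity = sum_i a_i * f_i(x).

    x         : input features for one pixel (e.g. its a and b color values)
    weights   : a_1 ... a_k, one weight per basis function
    exponents : k x d array of preset powers r_ij
    """
    return sum(a * basis_function(x, r) for a, r in zip(weights, exponents))

# Example with d = 2 input features and k = 3 basis functions.
x = np.array([0.3, 0.2])
weights = np.array([1.5, 0.7, -0.4])                        # a_i (learned)
exponents = np.array([[0.1, 1.0], [1.0, 0.0], [2.0, 1.0]])  # preset r_ij
print(predict_infrared(x, weights, exponents))
```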
  • Step S206: the server obtains the difference between the actual infrared intensity value of each pixel in the second image and the predicted infrared intensity value corresponding to the pixel at the same position in the first image.
  • Step S208: the server determines that pixels in the second image whose difference is greater than the first threshold are the pixels where the target object is located in the target scene.
  • The above embodiment is described by taking the case in which the target object identification method of the embodiment of the present application is performed by the server 101 as an example.
  • the method for identifying the target object in the embodiment of the present application may also be performed by the terminal 103.
  • The method for identifying the target object in the embodiment of the present application may also be performed jointly by the server 101 and the terminal 103; for example, the terminal 103 performs step S202 and the server performs the remaining steps.
  • the method for identifying the target object performed by the terminal 103 in the embodiment of the present application may also be performed by a client installed thereon.
  • By adopting adaptive function reconstruction, a nonlinear prediction model representing the background of the target scene can be established effectively; infrared and visible light information can be fused effectively, shadow interference and the infrared halo effect can be suppressed, and background clutter can be suppressed to highlight the target. This solves the technical problem of low recognition accuracy for target objects in the related art and achieves the technical effect of accurately identifying the target object in the presence of interference.
  • In an intelligent community, the community can be divided into several sub-regions, and the monitoring terminal of each sub-region can monitor the situation of that sub-region in real time.
  • The infrared video and visible light video collected by the terminal are transmitted in real time to the server in the community's central control room in order to automatically monitor the situation of the community.
  • After receiving the infrared video and the visible light video, the server can obtain, by analyzing the visible light video, the first image of the target scene under visible light (that is, the scene in the terminal's monitoring area).
  • FIG. 3 shows first images collected from multiple scenes; the second images of the corresponding target scenes under infrared light, obtained by analyzing the infrared video, are shown in FIG. 4.
  • The second images in FIG. 4 correspond one-to-one with the first images in FIG. 3: the position of a first image in the visible light video is the same as the position of the corresponding second image in the infrared video (they can be regarded as images at the same frame position, such as the first image in the upper left corner of FIG. 3 and the second image in the upper left corner of FIG. 4).
  • This is equivalent to acquiring the first image obtained by shooting the target scene under visible light and the second image obtained by shooting the target scene under infrared light at the same time.
  • the server determines the predicted infrared intensity value corresponding to the pixel in the first image through the prediction model.
  • The prediction model is trained by using a set of third images captured under visible light as the model input and a set of fourth images captured under infrared light as the model output; the set of third images and the set of fourth images are images of the same scene.
  • the above prediction model may be pre-trained, or may be trained when step S204 is executed.
  • An optional training method is shown in steps 11 to 15:
  • Step 11 Before determining the predicted infrared intensity value corresponding to the pixel in the first image through the prediction model, acquire a set of third images and a set of fourth images obtained by shooting the target scene.
  • the image used in training is an image that should at least include the target scene.
  • it can be an image that includes only the target scene, or an image that includes the target scene and other adjacent scenes.
  • the number of images in the above set of third images is the same as the number of images in the set of fourth images, and there is a one-to-one correspondence between the images in the set of third images and the images in the set of fourth images.
  • Each third image has a corresponding fourth image with the same framing content.
  • Step 12: frame by frame, use images from the set of third images as the input of the original model and use the images of the same frame (or the same framing content) from the set of fourth images as the output of the original model to train the original model.
  • Using an image from the set of third images as the input of the original model frame by frame and using the image of the same frame from the set of fourth images as the output of the original model to train the original model includes steps S121 to S122:
  • Step S121: the color values of the pixels in the third image are input into the original model, and the intensity values of the pixels in the fourth image of the same frame are used as the output of the original model.
  • The color values of the pixels in the third image are used as the input of the multiple prediction functions in the original model, and the output of the original model is the cumulative sum of the products of each prediction function's output and its corresponding weight.
  • In step S121, if the color type of the pixels in the third image is not a color type based on physiological characteristics (such as the Lab color type), the color type of the pixels in the third image is first converted into a color type based on physiological characteristics; then the color values of the first color channel (such as the a channel) and the second color channel (such as the b channel) of the pixels in the color-type-converted third image are input into the model.
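  • For example, the color-type conversion and channel selection described above could be implemented as follows; this minimal sketch assumes OpenCV's BGR-to-Lab conversion, and the variable names are illustrative.

```python
import cv2
import numpy as np

def lab_ab_features(bgr_image):
    """Convert a visible light image to the Lab color type and return the a and b
    channel values of every pixel as the model input features."""
    lab = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2LAB)
    a_channel = lab[:, :, 1].astype(np.float32)
    b_channel = lab[:, :, 2].astype(np.float32)
    # One feature vector (a, b) per pixel, flattened to shape (num_pixels, 2).
    return np.stack([a_channel.ravel(), b_channel.ravel()], axis=1)
```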
  • the above prediction function may be a basis function.
  • the basis function is a group of elements of a special basis in the function space. Continuous functions in the function space (such as the functions used to represent the model) can be expressed as a linear combination of a series of basis functions, just like each vector in the vector space can be expressed as a linear combination of basis vectors.
  • A basis function can be represented as $f_i(x) = \prod_{j=1}^{d} x_j^{r_{ij}}$, where $a_i$ denotes the weight of the i-th basis function $f_i(x)$, $r_{ij}$ denotes the j-th parameter (exponent) of the basis function and is preset (for example $r_{i1} = 0.1$, $r_{i2} = 1$, $r_{i3} = 2$, and so on), and $d$ is an integer denoting the upper limit of $j$, that is, the number of model input features.
  • The function representing the target model can be represented as $\hat{y} = \sum_{i=1}^{k} a_i f_i(x)$, where $k$ denotes the number of basis functions, $\hat{y}$ denotes the predicted value, and $f_i(x)$ denotes a basis function, that is, the product of powers of the input features.
  • Step S122: the color values of the pixels in the third image and the intensity values of the pixels in the fourth image of the same frame are used to initialize the weights corresponding to the prediction functions and the parameters inside the prediction functions, completing the training of the original model.
  • This is equivalent to taking the color value of a pixel in the third image as the value of $x$ in $f_i(x)$ and taking the intensity value of the pixel in the fourth image as the value of $\hat{y}$, so that the undetermined parameters of the function can be solved.
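  • The weight-determination step S122 can be sketched as follows. The patent does not specify the solver, so the ordinary least-squares fit below is an assumption; the variable names are illustrative, and the input features are assumed to have been scaled to be nonnegative.

```python
import numpy as np

def fit_weights(features, infrared_values, exponents):
    """Fit the basis-function weights a_i so that sum_i a_i * f_i(x) approximates
    the observed infrared intensity of each background training sample.

    features        : (num_samples, d) array of model inputs (e.g. Lab a, b values)
    infrared_values : (num_samples,) array of observed infrared intensities
    exponents       : (k, d) array of preset powers r_ij
    """
    # Design matrix: column i holds f_i(x) evaluated on every training sample.
    design = np.column_stack([
        np.prod(np.power(features, r), axis=1) for r in exponents
    ])
    weights, *_ = np.linalg.lstsq(design, infrared_values, rcond=None)
    return weights
```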
  • Step 13: when the number of training iterations reaches a certain amount, use a test image captured under visible light as the input of the trained original model, and determine whether the predicted image output by the original model matches a verification image captured under infrared light, in order to confirm whether the parameters obtained during training fit the data.
  • Step 14: when the test image captured under visible light is used as the input of the trained original model and the predicted image output by the original model matches the verification image captured under infrared light, that is, when the similarity between the two reaches a certain threshold (such as 99%), the trained original model is used as the prediction model; the test image and the verification image are images of the same framing area of the target scene.
  • When computing the similarity, the intensity value q1 of each pixel in the predicted image can be compared with the intensity value q2 of the pixel at the same position in the verification image; if the intensity values at the same position are the same, the pixel is counted as matching, and the similarity can be expressed as the ratio between the number n of matching pixels in the predicted image and the total number m of pixels in the verification image.
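  • A minimal sketch of that similarity check, under the assumption that "the same" means exact equality of the intensity values at the same position:

```python
import numpy as np

def similarity(predicted_image, verification_image):
    """Ratio n / m, where n is the number of pixels whose intensity matches at the
    same position and m is the total number of pixels in the verification image."""
    assert predicted_image.shape == verification_image.shape
    n = np.count_nonzero(predicted_image == verification_image)
    m = verification_image.size
    return n / m

# The trained original model is accepted as the prediction model once
# similarity(...) reaches a threshold such as 0.99.
```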
  • Step 15: when the test image is used as the input of the trained original model and the predicted image output by the original model does not match the verification image, continue to use the images in the set of third images as the input of the original model and the images of the same frame in the set of fourth images as the output of the original model to train the original model, until the predicted image output by the trained original model matches the verification image.
  • The set of third images used in the above training process are background images illuminated by visible light.
  • The set of fourth images are background images of the same framing area illuminated by infrared light.
  • The trained model is therefore equivalent to a background model: if the model input comes from background pixels, the predicted output value is very close to the infrared value of those background pixels, whereas if pixels of the target object are input, the predicted value differs greatly from their actual infrared value, so the model can be used for target recognition.
  • determining the predicted infrared intensity value corresponding to the pixel in the first image through the prediction model may include the following steps:
  • step S21 the color values of the pixels in the first image are input to the prediction model.
  • In step S21, when the color values of the pixels in the first image are input into the prediction model, it can first be determined whether the color type of the pixels in the first image is based on physiological characteristics. If so, they are input directly; if not, that is, if the color type of the pixels in the first image is not a color type based on physiological characteristics, the color type of the pixels in the first image is converted into a color type based on physiological characteristics, and then the color values of the first color channel and the second color channel of the pixels in the color-type-converted first image are input into the prediction model.
  • Step S22: the prediction functions in the prediction model are called, and the predicted infrared intensity values corresponding to the pixels in the first image are determined according to the color values of those pixels.
  • The server obtains the difference between the actual infrared intensity value of each pixel in the second image and the predicted infrared intensity value corresponding to the pixel at the same position in the first image. If the model input comes from a background pixel, the predicted output value is very close to that pixel's actual infrared value; in other words, for background pixels the difference between the actual infrared intensity value in the second image and the predicted infrared intensity value corresponding to the same position in the first image is small (less than the first threshold), whereas for pixels of the target object the difference between the predicted infrared intensity value and the actual infrared value is large (greater than the first threshold). Therefore, whether a pixel belongs to the target object can be judged from this difference.
  • the server determines that pixels whose difference value in the second image is greater than the first threshold are pixels where the target object is located in the target scene.
  • Determining that pixels in the second image whose difference is greater than the first threshold are the pixels where the target object is located in the target scene includes: traversing each pixel in the second image; setting the intensity value of pixels whose difference is greater than the first threshold to a second threshold (such as the intensity value corresponding to white), and setting the intensity value of pixels whose difference is not greater than the first threshold to a third threshold (such as the intensity value corresponding to black), where the second threshold and the third threshold are different; after all pixels in the second image have been traversed, the target object is described by the pixels in the second image whose intensity value is the second threshold, as shown in FIG. 5, where each image in FIG. 5 corresponds to one of the four images in FIG. 3.
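  • Putting steps S206 and S208 together, the background suppression and binarization could look like the sketch below; taking the absolute value of the difference and using 255/0 as the second and third thresholds (white and black) are assumptions, and all names are illustrative.

```python
import numpy as np

def detect_target(predicted_infrared, actual_infrared, first_threshold,
                  second_threshold=255, third_threshold=0):
    """Background suppression map and binary target mask.

    predicted_infrared : per-pixel prediction from the model (same shape as the infrared image)
    actual_infrared    : the second image (real infrared intensity values)
    """
    # Background suppression map G_t: difference between actual and predicted values.
    suppression_map = np.abs(actual_infrared.astype(np.float32)
                             - predicted_infrared.astype(np.float32))
    # Pixels whose difference exceeds the first threshold are taken as the target.
    mask = np.where(suppression_map > first_threshold,
                    second_threshold, third_threshold).astype(np.uint8)
    return suppression_map, mask
```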
  • The detection effect of a related-art solution is shown in FIG. 6. Taking the target object marked by the white box in the upper left corner of FIG. 5 and FIG. 6 as an example, the technical solution of this application can eliminate shadow interference, the halo effect, and infrared interference in the infrared image, making the outline of the target object clearer.
  • The technical solution of this application can effectively integrate infrared and visible light information, suppress shadow interference and the infrared halo effect, and effectively suppress background clutter to highlight the target.
  • In one related-art approach to detecting a moving aerial target in a visible light image, the ROI (region of interest) gray image of the current frame is obtained by setting an ROI frame in the gray image of the Nth frame; after image preprocessing, binarization, binary inversion, and dilation of the current-frame ROI gray image, connected regions are filtered by a screening method to obtain the target image. This can address the problems that traditional target detection methods are not suitable for detection against a moving background and that the target is lost against a moving background, ensuring the real-time performance and accuracy of moving target detection under a moving background.
  • Infrared remote sensing detects and locates the target by receiving its thermal energy; it reflects the radiation characteristics of the scene and has strong anti-interference and recognition capability. However, when the contrast is low, it is likely to miss targets with small thermal radiation and to falsely detect brighter background areas. Visible light images, by contrast, represent the reflection characteristics of the scene and have better contrast and a rich gray-level distribution, but they depend strongly on illumination and their working time is limited.
  • In the related art, infrared and visible light collaborative target detection methods can be roughly divided into two categories: collaborative detection based on fusion-then-detection and collaborative detection based on detection-then-fusion.
  • The fusion-then-detection collaborative method first integrates the infrared and visible light images into a fused image to highlight the target and enhance scene contrast, and then designs a detection scheme to detect the target from the fused result.
  • the focus of such methods is the formulation of efficient fusion models, such as the use of non-parametric models, codebook models, and directional gradient histogram models to distinguish various types of targets based on probability fusion theory.
  • The detection-then-fusion collaborative method moves the fusion step to after single-source target detection.
  • The target detection task is first performed on each single data source according to its type, and then a fusion strategy (mostly a probability fusion strategy or a threshold fusion strategy) is used to adjust the respective detection results and obtain the target.
  • The main idea of the above related-art approaches is to use morphological operations and adaptive threshold algorithms for target detection, which is not suitable for complex outdoor scenes (such as backgrounds with trees swaying in the wind or building interference) and is likely to cause false alarms and low detection rates.
  • The present application also provides an embodiment: an infrared and visible light collaborative target detection method based on adaptive basis function reconstruction. It acquires several frames BG_v of the visible light video (i.e. the set of third images) and several frames BG_t of the infrared video (i.e. the set of fourth images) to establish a collaborative background model M (i.e. the prediction model); then, for the background model M, it combines the current frame F_t (including the first image under visible light and the second image under infrared light) to perform background clutter suppression and obtain the background suppression map G_t (the predicted intensity value of each pixel in G_t is predicted from the pixels of the first image), and finally uses an adaptive threshold segmentation algorithm to detect the target.
  • The use of the adaptive basis function reconstruction technique can effectively establish a nonlinear background model, effectively integrate infrared and visible light information, suppress shadow interference and the infrared halo effect, and effectively suppress background clutter to highlight the target.
  • In actual deployment, the infrared and visible light collaborative target detection method based on adaptive basis function reconstruction involved in this application can be invoked on a server through an API (Application Programming Interface) service call or nested through an SDK (Software Development Kit). The server can be implemented according to the actual deployment scenario, and the algorithm can run in a server operating system environment such as Linux or Windows.
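  • As a purely illustrative sketch of such a service call, the snippet below exposes the detector behind an HTTP endpoint; the web framework (Flask), the route name, the field names, and the helper functions predict_infrared_image and detect_target (standing for the model prediction and thresholding sketched earlier) are assumptions, not taken from the patent.

```python
from flask import Flask, request, jsonify
import cv2
import numpy as np

app = Flask(__name__)

@app.route("/detect", methods=["POST"])  # hypothetical endpoint name
def detect():
    # The caller uploads the visible light frame and the infrared frame.
    first = cv2.imdecode(np.frombuffer(request.files["visible"].read(), np.uint8),
                         cv2.IMREAD_COLOR)
    second = cv2.imdecode(np.frombuffer(request.files["infrared"].read(), np.uint8),
                          cv2.IMREAD_GRAYSCALE)
    predicted = predict_infrared_image(first)                 # model prediction (see earlier sketches)
    _, mask = detect_target(predicted, second, first_threshold=25.0)
    return jsonify({"target_pixels": int(np.count_nonzero(mask))})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```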
  • Step 1: obtain several frames BG_v of the visible light video, as shown in FIG. 3, and several frames BG_t of the infrared video, as shown in FIG. 4, to establish a collaborative background model M, which is built using the adaptive basis function reconstruction technique.
  • FIG. 7 shows a schematic diagram of an optional background model
  • Step 1.1: convert the RGB values of the visible light image into Lab color values, use the a and b color values as the model input features (that is, the input shown in FIG. 7), and use the infrared intensity values as the model output (also shown in FIG. 7); the input and output together form T training samples X.
  • Step 1.2: for the training samples X obtained in step 1.1, the adaptive basis function reconstruction technique is used to establish the background model M(i, j), of the form $\hat{y} = \sum_{i=1}^{k} a_i f_i(x)$ with $f_i(x) = \prod_{j=1}^{d} x_j^{r_{ij}}$, where $a_i$ denotes the weight of a basis function, $k$ denotes the number of basis functions, $d$ denotes the number of model input features, and $f_i(x)$ denotes a basis function, that is, the product of powers of the input features.
  • Step 2: for the background model M obtained in step 1, combined with the current frame F_t, background clutter suppression is performed to obtain the background suppression map G_t, that is, an image composed of the differences between the predicted infrared intensity values (obtained by predicting from the pixels of the visible light image in the current frame, i.e. the first image) and the real infrared intensity values; an adaptive threshold segmentation algorithm is then used to detect the target.
  • Step 2.1: obtain the background suppression map G_t from the background model obtained in step 1, that is, the per-pixel difference between the real infrared intensity value and the predicted infrared intensity value.
  • Step 2.2: an adaptive threshold algorithm, such as the OTSU threshold algorithm (i.e. the Otsu method or maximum between-class variance method), is used to calculate the segmentation threshold and extract the target.
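  • For instance, the adaptive threshold segmentation step could be realized with OpenCV's Otsu thresholding; scaling the suppression map to the 8-bit range first is an implementation assumption.

```python
import cv2
import numpy as np

def otsu_segment(suppression_map):
    """Binarize the background suppression map G_t with the Otsu
    (maximum between-class variance) threshold to extract the target."""
    g8 = cv2.normalize(suppression_map, None, 0, 255,
                       cv2.NORM_MINMAX).astype(np.uint8)
    threshold, mask = cv2.threshold(g8, 0, 255,
                                    cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return threshold, mask
```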
  • a nonlinear local background model can be effectively established, infrared and visible light information can be effectively fused, shadow interference and infrared halo effects can be suppressed, and background clutter can be effectively suppressed to highlight the target.
  • To verify the effect, this application uses six actually collected complex outdoor scene videos for verification experiments (see FIG. 3 and FIG. 4) and compares against other algorithms, as shown in FIG. 8 and Table 1. The experiments verify that the technical scheme of this application can effectively detect targets in different complex scenes; compared with the codebook method and the weighted single Gaussian method, the F1 score (a comprehensive evaluation index) of this scheme is as high as 90.9% (the flatter curve at the top of FIG. 8), the interference of visible light shadows and the infrared halo effect is suppressed better, and the scheme has better detection stability and stronger scene adaptability.
  • The comparison of the average detection indices of the three detection methods is shown in Table 1.
  • This application also uses other methods for comparative verification, comparing against two algorithms: the codebook method (CB) and the weighted single Gaussian method (SWG).
  • The evaluation results of the three indices for the three detection methods are shown in FIG. 8 and Table 1. From the chart, the overall detection performance of the detection algorithms can be compared: the weighted single Gaussian algorithm has a higher detection rate but lower accuracy, the detection rate and accuracy of the codebook algorithm are average, and the present method has a high detection rate and accuracy as well as good detection stability, with an F1 score as high as 90.9%.
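  • The evaluation indices mentioned above can be computed from a binary detection mask and a ground-truth mask, for example as below; treating the detection rate as pixel-level recall and the accuracy as pixel-level precision is an assumption about the terminology.

```python
import numpy as np

def evaluate(detected_mask, ground_truth_mask):
    """Pixel-level recall (detection rate), precision (accuracy), and F1 score."""
    detected = detected_mask > 0
    truth = ground_truth_mask > 0
    tp = np.count_nonzero(detected & truth)
    fp = np.count_nonzero(detected & ~truth)
    fn = np.count_nonzero(~detected & truth)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f1
```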
  • The method according to the above embodiments can be implemented by means of software plus a necessary general hardware platform, and of course it can also be implemented by hardware, but in many cases the former is the better implementation.
  • The technical solutions of the present application, or the part that contributes to the existing technology, can essentially be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) and includes several instructions to enable a terminal device (which may be a mobile phone, computer, server, network device, etc.) to execute the methods described in the embodiments of the present application.
  • FIG. 9 is a schematic diagram of an optional target object recognition device according to an embodiment of the present application. As shown in FIG. 9, the device may include: a first acquisition unit 901, a prediction unit 903, a second acquisition unit 905, and a recognition unit 907.
  • The first acquisition unit 901 is configured to acquire a first image and a second image, where the first image is an image obtained by shooting the target scene under visible light, and the second image is an image obtained by shooting the target scene under infrared light.
  • The above first image and second image may each be one frame from a captured sequence of continuous video frames, or may be single images taken separately. The first image and the second image are images with the same framing (that is, the same target scene), and their shooting times are close (that is, the interval is less than a preset value, such as 0.02 seconds), for example, video frames at the same frame position in a visible light video frame sequence and an infrared video frame sequence captured at the same time.
  • The above target scene is the area in which the target object is to be identified; it may be the scene of the area monitored by a terminal in a monitoring system, or the area currently being recognized by an aircraft in an artificial intelligence system.
  • A visible light sensor can detect red, green, and blue spectral energy and convert it into a color image with rich color, texture, and structure information; such images match the human visual perception system and are easy to understand and analyze. An infrared sensor transforms the invisible infrared radiation of the scene into an image that can be observed by the human eye; it has good environmental adaptability and high sensitivity and is suitable for detecting and identifying weak target signals. Moreover, the infrared radiation of the infrared sensor device itself is extremely weak, so it is a passive detection device with good concealment. Therefore, combining visible light images and infrared images can effectively enrich target and scene information and improve the detection rate.
  • The prediction unit 903 is configured to determine, through a prediction model, the predicted infrared intensity value corresponding to each pixel in the first image, where the prediction model is trained by using a set of third images captured under visible light as the model input and a set of fourth images captured under infrared light as the model output, and the set of third images and the set of fourth images are images of the same scene.
  • the purpose of the training is to enable the model to convert the visible light image into an infrared image in the same scene.
  • the prediction model can convert the first image into an infrared image.
  • The predicted infrared intensity value of each pixel in that infrared image is determined using the color value of the pixel at the same position in the first image.
  • The above prediction model includes multiple prediction functions, each assigned a corresponding weight (a prediction function can be a basis function).
  • The input of each prediction function is the input of the prediction model, and the output of the prediction model is the cumulative sum of the products of the outputs of all prediction functions and their corresponding weights.
  • the second obtaining unit 905 is configured to obtain the difference between the actual infrared intensity value of the pixel in the second image and the predicted infrared intensity value corresponding to the pixel at the same position in the first image.
  • the recognition unit 907 is configured to determine that the pixel point in the second image whose difference value is greater than the first threshold is the pixel point where the target object is located in the target scene.
  • first obtaining unit 901 in this embodiment may be used to perform step S202 in the embodiment of the present application
  • prediction unit 903 in this embodiment may be used to perform step S204 in the embodiment of the present application
  • the second obtaining unit 905 in the embodiment may be used to perform step S206 in the embodiment of the present application
  • the identification unit 907 in the embodiment may be used to perform step S208 in the embodiment of the present application.
  • Through the above modules, adaptive function reconstruction is used to effectively establish a nonlinear prediction model representing the background of the target scene; infrared and visible light information can be fused effectively, shadow interference and the infrared halo effect can be suppressed, and background clutter can be suppressed to highlight the target. This solves the technical problem of low recognition accuracy for target objects in the related art and achieves the technical effect of accurately identifying the target object in the presence of interference.
  • The prediction unit may include: an input module for inputting the color values of the pixels in the first image into the prediction model; and a prediction module for calling the prediction functions in the prediction model and determining, according to the color values of the pixels in the first image, the predicted infrared intensity values corresponding to those pixels.
  • The apparatus of the present application may further include: a third acquisition unit, configured to acquire a set of third images and a set of fourth images obtained by shooting the target scene before the predicted infrared intensity value corresponding to the pixels in the first image is determined through the prediction model; a training unit, configured to, frame by frame, use images from the set of third images as the input of the original model and images of the same frame from the set of fourth images as the output of the original model to train the original model; a first verification unit, configured to use the trained original model as the prediction model when a test image captured under visible light is used as the input of the trained original model and the predicted image output by the original model matches a verification image captured under infrared light, where the test image and the verification image are images of the target scene; and a second verification unit, configured to, when the test image is used as the input of the trained original model and the predicted image output by the original model does not match the verification image, continue to use the images in the set of third images as the input of the original model and the images of the same frame in the set of fourth images as the output of the original model to train the original model until the predicted image output by the trained original model matches the verification image.
  • The training unit may train the original model, frame by frame, using images from the set of third images as the input of the original model and images of the same frame from the set of fourth images as the output of the original model, in the following way: input the color values of the pixels in the third image into the original model, and use the intensity values of the pixels in the fourth image of the same frame as the output of the original model, where the color values of the pixels in the third image are used as the input of the multiple prediction functions in the original model and the output of the original model is the cumulative sum of the products of each prediction function's output and its corresponding weight; then use the color values of the pixels in the third image and the intensity values of the pixels in the fourth image of the same frame to initialize the weights corresponding to the prediction functions and the parameters inside the prediction functions, completing the training of the original model.
  • When the identification unit determines that pixels in the second image whose difference is greater than the first threshold are the pixels where the target object is located in the target scene, it can do so by traversing each pixel in the second image, setting the intensity value of pixels whose difference is greater than the first threshold to the second threshold and setting the intensity value of pixels whose difference is not greater than the first threshold to the third threshold, where the second threshold and the third threshold are different; after all pixels in the second image have been traversed, the target object is described by the pixels in the second image whose intensity value is the second threshold.
  • When the first acquisition unit obtains the first image and the second image, it may obtain the first image by shooting the target scene under visible light and the second image by shooting the target scene under infrared light at the same time.
  • An adaptive threshold segmentation algorithm is then used to detect the target. The use of the adaptive basis function reconstruction technique can effectively establish a nonlinear background model, effectively integrate infrared and visible light information, suppress shadow interference and the infrared halo effect, and effectively suppress background clutter to highlight the target.
  • the above-mentioned module can run in the hardware environment shown in FIG. 1, and can be implemented by software or hardware, where the hardware environment includes a network environment.
  • According to an embodiment of the present application, a server or terminal for implementing the above target object recognition method is also provided.
  • As shown in FIG. 10, the terminal may include: one or more processors 1001 (only one is shown in the figure), a memory 1003, and a transmission device 1005; the terminal may further include an input/output device 1007.
  • The memory 1003 can be used to store software programs and modules, such as the program instructions/modules corresponding to the target object recognition method and device in the embodiments of the present application. The processor 1001 runs the software programs and modules stored in the memory 1003, thereby performing various functional applications and data processing, that is, realizing the above-mentioned target object recognition method.
  • the memory 1003 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory.
  • the memory 1003 may further include memories remotely provided with respect to the processor 1001, and these remote memories may be connected to the terminal through a network. Examples of the above network include but are not limited to the Internet, intranet, local area network, mobile communication network, and combinations thereof.
  • the above-mentioned transmission device 1005 is used for receiving or sending data via a network, and can also be used for data transmission between the processor and the memory.
  • Specific examples of the aforementioned network may include a wired network and a wireless network.
  • the transmission device 1005 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices and routers through a network cable to communicate with the Internet or a local area network.
  • In one example, the transmission device 1005 is a radio frequency (RF) module, which is used to communicate with the Internet in a wireless manner.
  • the memory 1003 is used to store an application program.
  • the processor 1001 may call the application program stored in the memory 1003 through the transmission device 1005 to perform the following steps:
  • acquire a first image and a second image, where the first image is an image obtained by shooting the target scene under visible light, and the second image is an image obtained by shooting the target scene under infrared light;
  • determine, through the prediction model, the predicted infrared intensity value corresponding to each pixel in the first image, where, optionally, the prediction model is trained by using a set of third images captured under visible light as the model input and a set of fourth images captured under infrared light as the model output, and the set of third images and the set of fourth images are images of the same scene;
  • obtain the difference between the actual infrared intensity value of each pixel in the second image and the predicted infrared intensity value corresponding to the pixel at the same position in the first image;
  • determine that pixels in the second image whose difference is greater than the first threshold are the pixels where the target object is located in the target scene.
  • the processor 1001 is also used to perform the following steps:
  • when a test image captured under visible light is used as the input of the trained original model and the predicted image output by the original model matches a verification image captured under infrared light, use the trained original model as the prediction model, where the test image and the verification image are images of the target scene;
  • when the test image is used as the input of the trained original model and the predicted image output by the original model does not match the verification image, continue to use the images in the set of third images as the input of the original model and the images of the same frame in the set of fourth images as the output of the original model to train the original model until the predicted image output by the trained original model matches the verification image.
  • In the above electronic device, by "acquiring a first image and a second image, where the first image is an image obtained by shooting the target scene under visible light and the second image is an image obtained by shooting the target scene under infrared light; determining, through the prediction model, the predicted infrared intensity value corresponding to each pixel in the first image, where the prediction model is trained by using a set of third images captured under visible light as the model input and a set of fourth images captured under infrared light as the model output, and the set of third images and the set of fourth images are images of the same scene; obtaining the difference between the actual infrared intensity value of each pixel in the second image and the predicted infrared intensity value corresponding to the pixel at the same position in the first image; and determining that pixels in the second image whose difference is greater than the first threshold are the pixels where the target object is located in the target scene", adaptive function reconstruction is used, which can effectively establish a nonlinear prediction model representing the background of the target scene, effectively integrate infrared and visible light information, suppress shadow interference and the infrared halo effect, and effectively suppress background clutter to highlight the target, thereby solving the technical problem of low recognition accuracy for target objects in the related art and achieving the technical effect of accurately identifying the target object in the presence of interference.
  • The structure shown in FIG. 10 is merely an illustration, and the terminal may be terminal equipment such as a smartphone (for example, an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (MID), or a PAD.
  • FIG. 10 does not limit the structure of the above electronic device.
  • the terminal may further include more or fewer components than those shown in FIG. 10 (such as a network interface, a display device, etc.), or have a configuration different from that shown in FIG. 10.
  • The program may be stored in a computer-readable storage medium, which may include a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, etc.
  • the embodiments of the present application also provide a storage medium.
  • the above storage medium may be used to execute the program code of the identification method of the target object.
  • the above storage medium may be located on at least one network device among multiple network devices in the network shown in the above embodiment.
  • the storage medium is set to store program code for performing the following steps:
  • acquire a first image and a second image, where the first image is an image obtained by shooting the target scene under visible light, and the second image is an image obtained by shooting the target scene under infrared light;
  • determine, through the prediction model, the predicted infrared intensity value corresponding to each pixel in the first image, where, optionally, the prediction model is trained by using a set of third images captured under visible light as the model input and a set of fourth images captured under infrared light as the model output, and the set of third images and the set of fourth images are images of the same scene;
  • obtain the difference between the actual infrared intensity value of each pixel in the second image and the predicted infrared intensity value corresponding to the pixel at the same position in the first image;
  • determine that pixels in the second image whose difference is greater than the first threshold are the pixels where the target object is located in the target scene.
  • the storage medium is also configured to store program code for performing the following steps:
  • when a test image captured under visible light is used as the input of the trained original model and the predicted image output by the original model matches a verification image captured under infrared light, use the trained original model as the prediction model, where the test image and the verification image are images of the target scene;
  • when the test image is used as the input of the trained original model and the predicted image output by the original model does not match the verification image, continue to use the images in the set of third images as the input of the original model and the images of the same frame in the set of fourth images as the output of the original model to train the original model until the predicted image output by the trained original model matches the verification image.
  • An embodiment of the present application also provides a computer program product including instructions, which when run on a server, causes the server to execute the method provided in the foregoing embodiment.
  • the above storage medium may include, but is not limited to, various media that can store program code, such as a USB flash drive, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, or an optical disc.
  • if the integrated unit in the above embodiment is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in the above computer-readable storage medium.
  • based on this understanding, the technical solution of the present application, in essence, or the part contributing to the existing technology, or all or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for enabling one or more computer devices (which may be personal computers, servers, network devices, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application.
  • the disclosed client may be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the unit is only a logical function division.
  • in actual implementation, there may be another division manner; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, units or modules, and may be in electrical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or software function unit.
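As promised above, the following is a minimal, hedged sketch of the stored program-code steps, written in Python with NumPy. It assumes the prediction model takes the weighted basis-function form described later in the disclosure (preset exponents r_ij, learned weights a_i); the function name `recognize_target` and the array layout are illustrative assumptions, not part of the original filing.

```python
import numpy as np

def recognize_target(visible_ab, ir_frame, weights, exponents, first_threshold):
    """Hedged sketch: predict the infrared intensity of each pixel from its
    visible-light (a, b) features, compare with the actual infrared frame,
    and mark pixels whose difference exceeds the first threshold as target pixels.

    visible_ab:      (H, W, 2) Lab a/b features of the first image
    ir_frame:        (H, W) actual infrared intensities of the second image
    weights:         (k,) basis-function weights a_i
    exponents:       (k, 2) preset powers r_ij
    first_threshold: scalar threshold on the per-pixel difference
    """
    h, w, _ = visible_ab.shape
    feats = visible_ab.reshape(-1, 2).astype(np.float64)
    # predicted IR intensity: sum_i a_i * prod_j x_j ** r_ij for every pixel
    design = np.column_stack([np.prod(feats ** r_i, axis=1) for r_i in exponents])
    predicted = (design @ weights).reshape(h, w)
    diff = np.abs(ir_frame.astype(np.float64) - predicted)
    return (diff > first_threshold).astype(np.uint8)  # 1 where the target object lies
```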

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

本申请公开了一种目标对象的识别方法和装置、存储介质、电子装置。其中,该方法包括:获取第一图像和第二图像,第一图像为在可见光下对目标场景拍摄得到的,第二图像为在红外线下对目标场景拍摄得到的;通过预测模型确定第一图像中像素点对应的预测红外强度值;获取第二图像中的像素点的实际红外强度值与第一图像中相同位置上的像素点对应的预测红外强度值之间的差值;确定第二图像中的差值大于第一阈值的像素点为目标场景中目标对象所在的像素点。本申请解决了相关技术中对目标对象的识别准确率较低的技术问题。

Description

目标对象的识别方法和装置、存储介质、电子装置
本申请要求于2018年10月15日提交中国专利局、申请号为201811197547.X、申请名称为“目标对象的识别方法和装置、存储介质、电子装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及图像处理领域,具体而言,涉及目标对象的识别。
背景技术
运动目标检测是指在序列图像中检测出变化区域并将运动目标从背景图像中提取出来,通常情况下,目标分类、跟踪和行为理解等后处理过程仅仅考虑图像中对应于运动目标的像素区域,因此运动目标的正确检测与分割对于后期处理非常重要。然而,由于场景的动态变化,如天气、光照、阴影及杂乱背景干扰等的影响,使得运动目标的检测与分割变得相当困难。
发明内容
本申请实施例提供了一种目标对象的识别方法和装置、存储介质、电子装置,以至少解决相关技术中对目标对象的识别准确率较低的技术问题。
根据本申请实施例的一个方面,提供了一种目标对象的识别方法,包括:获取第一图像和第二图像,第一图像为在可见光下对目标场景拍摄得到的图像,第二图像为在红外线下对目标场景拍摄得到的图像;通过预测模型确定第一图像中的像素点对应的预测红外强度值;获取第二图像中的像素点的实际红外强度值与第一图像中相同位置上的像素点对应的预测红外强度值之间的差值;确定第二图像中的差值大于第一阈值的像素点为目标场景中目标对象所在的像素点。
根据本申请实施例的另一方面,还提供了一种目标对象的识别装置,包括:第一获取单元,用于获取第一图像和第二图像,第一图像为在可见光下对目标场景拍摄得到的图像,第二图像为在红外线下对目标场景拍摄得到的图像;预测单元,用于通过预测模型确定第一图像中的像素点对应的预测红外强度值;第二获取单元,用于获取第二图像中的像素点的实际红外强度值与第一图像中相同位置上的像素点对应的预测红外强度值之间的差值;识别单元,用于确定第二图像中的差值大于第一阈值的像素点为目标场景中目标对象所在的像素 点。
根据本申请实施例的另一方面,还提供了一种存储介质,该存储介质包括存储的程序,程序运行时执行上述的方法。
根据本申请实施例的另一方面,还提供了一种电子装置,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,处理器通过计算机程序执行上述的方法。
根据本申请实施例的另一方面,还提供了一种包括指令的计算机程序产品,当其在计算机上运行时,使得所述计算机执行权上述的方法。
在本申请实施例中,采用自适应函数重建,能有效建立非线性的用于表示目标场景的背景的预测模型,能有效融合红外光与可见光信息,抑制阴影干扰与红外光环效应,能有效抑制背景杂波突显目标,可以解决相关技术中对目标对象的识别准确率较低的技术问题,进而达到在存在干扰的情况下仍然能够准确识别目标对象的技术效果。
附图说明
此处所说明的附图用来提供对本申请的进一步理解,构成本申请的一部分,本申请的示意性实施例及其说明用于解释本申请,并不构成对本申请的不当限定。在附图中:
图1是根据本申请实施例的目标对象的识别方法的硬件环境的示意图;
图2是根据本申请实施例的一种可选的目标对象的识别方法的流程图;
图3是根据本申请实施例的一种可选的场景可见光图像的示意图;
图4是根据本申请实施例的一种可选的场景红外图像的示意图;
图5是根据本申请实施例的一种可选的场景目标对象的示意图;
图6是根据本申请实施例的一种可选的场景目标对象的示意图;
图7是根据本申请实施例的一种可选的预测模型的示意图;
图8是根据本申请实施例的一种可选的预测结果的示意图;
图9是根据本申请实施例的一种可选的目标对象的识别装置的示意图;
以及
图10是根据本申请实施例的一种终端的结构框图。
具体实施方式
为了使本技术领域的人员更好地理解本申请方案,下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分的实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都应当属于本申请保护的范围。
需要说明的是,本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的本申请的实施例能够以除了在这里图示或描述的那些以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。
随着科技的发展、社会的进步、生活水平的提高,团体和个人的安防意识都在不断增强,视频监控系统也就得到了越来越广泛的应用,另外,在人工智能领域,人、动物等的智能识别也越来越普通;在监控领域、人工智能等领域中,并不能在存在场景的动态变化,如天气、光照、阴影及杂乱背景干扰等的情况下准确检测出目标对象。
为了克服以上场景中存在的问题,根据本申请实施例的一方面,提供了一种目标对象的识别方法的方法实施例。
可选地,在本实施例中,上述目标对象的识别方法可以应用于处理设备中,处理设备可以包括终端和/或服务器。例如图1所示的由服务器101和用户终端103所构成的硬件环境中,处理设备包括服务器103。如图1所示,服务器101通过网络与终端103进行连接,上述网络包括但不限于:广域网、城域网或局域网(如物业内部网络、公司内部网络等),终端103为可进行拍摄可见光图像和红外光图像拍摄的终端,包括并不限定于可见光监控摄像机、红外光监控摄像机、具有摄像头的手机、具有摄像头的平板电脑等;服务器为用于进行监控视频存储和/或监控视频分析的设备。
上述的硬件环境可以是银行、博物馆、交通道路、商业机构、军事机构、 公安局、电力部门、厂矿部门、智能小区、空间探测机构等领域的安全监控、自动监控和远程监控系统的硬件环境,其中,终端可以为这些系统中的位于同一位置的高清摄像机和红外摄像机,服务器可以是位于系统中控室的服务器,以实现利用计算机实现智能的目标检测和目标跟踪。
上述的硬件环境还可以是人工智能系统中硬件环境,终端可以为系统中飞行器等智能设备的可见光传感器、红外传感器,服务器可以是与飞行器通讯连接的互联网服务器。采用本申请的方法可以自动定位出在可视区域出现的对象。
将本申请的方法应用于监控、人工智能等领域时,本申请实施例的目标对象的识别方法可以由监控、人工智能系统中的服务器101来执行,图2是根据本申请实施例的一种可选的目标对象的识别方法的流程图,如图2所示,该方法可以包括以下步骤:
步骤S202,服务器获取第一图像和第二图像,第一图像为在可见光下对目标场景拍摄得到的图像,第二图像为在红外线下对目标场景拍摄得到的图像。
上述的第一图像和第二图像可以为拍摄的连续视频帧序列中的一帧,也可为单独拍到的一张图像,第一图像和第二图像为取景(即目标场景)相同的图像,且第一图像和第二图像为拍摄时间接近(即小于预先设定的数值,如0.02秒)的图像,如为同时拍摄得到的可见光视频帧序列和红外视频帧序列中的相同帧(即帧位置相同的视频帧)。
上述的目标场景即待进行目标对象识别的区域,可以为监控系统中终端所监控区域的场景、人工智能系统中飞行器当前所能够识别的区域。
可见光传感器能探测红绿蓝光谱能量,将其转化为彩色图像,具有丰富的色彩、纹理和结构等信息,且符合人类视觉感知体系,便于理解分析;基于红外传感器的侦查系统能接收来自目标和背景的红外辐射,将不可见的辐射转变成人眼可观测的图像,环境适应性好,灵敏度高,适合于弱小目标信号的探测和鉴别,而红外传感装置自身的红外辐射极其微弱,属于无源探测装置,隐蔽性好。因此,采用可见光图像与红外光图像结合的方式能有效丰富目标和场景信息,提高检测率。
步骤S204,服务器通过预测模型确定第一图像中的像素点对应的预测红外 强度值。
在一种可能的实现方式中,预测模型是使用在可见光下拍摄得到的一组第三图像作为模型输入并使用在红外线下拍摄得到的一组第四图像作为模型输出进行训练得到的模型,一组第三图像和一组第四图像是相同场景的图像。
在进行模型训练时,训练的目的在于使得模型能够将可见光图像转换为相同场景下的红外图像,换言之,预测模型能够将第一图像转为一张红外图像,该红外图像中每个像素点的预测红外强度值是利用第一图像中相同位置的像素点的颜色值确定的。
上述的预测模型中包括多个分配有相应权重的预测函数(预测函数可以为基函数),预测函数的输入即为预测模型的输入,预测模型的输出即为对所有预测函数的输出与相应的权重之间的乘积的累积和。
步骤S206,服务器根据所述第二图像的实际红外线强度值和所述预测红外线强度值,确定所述第一图像和第二图像中相同位置上像素点的红外线强度值的差值。
步骤S208,服务器确定第二图像中的差值大于第一阈值的像素点为目标场景中目标对象所在的像素点。
上述实施例以本申请实施例的目标对象的识别方法由服务器101来执行为例进行说明,本申请实施例的目标对象的识别方法也可以由终端103来执行,本申请实施例的目标对象的识别方法还可以是由服务器101和终端103共同执行,由终端103执行步骤S202,服务器执行剩余步骤。其中,终端103执行本申请实施例的目标对象的识别方法也可以是由安装在其上的客户端来执行。
通过上述步骤S202至步骤S208,采用自适应函数重建,能有效建立非线性的用于表示目标场景的背景的预测模型,能有效融合红外光与可见光信息,抑制阴影干扰与红外光环效应,能有效抑制背景杂波突显目标,可以解决相关技术中对目标对象的识别准确率较低的技术问题,进而达到在存在干扰的情况下仍然能够准确识别目标对象的技术效果。
下面以智能小区为例进一步详述本申请的技术方案。
在步骤S202提供的技术方案中，在智能小区中，可以将小区分为若干个子区域，每个子区域的监控用的终端可以实时监控该子区域的情况，终端采集到的红外视频和可见光视频会被实时传输给小区中控室的服务器，以便于自动监控小区的情况，服务器在接收到红外视频和可见光视频之后，可以通过对可见光视频的解析获取在可见光下目标场景（即终端监控区域的场景）的第一图像，如图3所示的采集到的多个场景的第一图像，并通过对红外视频解析获取在红外线下目标场景的第二图像，如图4所示的采集到的与图3中的第一图像一一对应的第二图像，第一图像在可见光视频中的位置与第二图像在红外视频中的位置相同（可以认为是相同帧位置的图像，如图3中左上角的第一图像和图4中的左上角的第二图像），即相当于获取的是在可见光下对目标场景拍摄得到的第一图像和同一时刻在红外线下对目标场景拍摄得到的第二图像。
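The frame-pairing step described above can be illustrated with the following hedged Python/OpenCV sketch; the file paths and the generator name `read_paired_frames` are hypothetical, and the sketch simply assumes the two videos are already time-synchronized over the same target scene, as the text requires.

```python
import cv2

def read_paired_frames(visible_path: str, infrared_path: str):
    """Yield time-aligned (visible, infrared) frame pairs from two videos
    recorded simultaneously over the same target scene."""
    cap_v = cv2.VideoCapture(visible_path)
    cap_t = cv2.VideoCapture(infrared_path)
    try:
        while True:
            ok_v, frame_v = cap_v.read()
            ok_t, frame_t = cap_t.read()
            if not (ok_v and ok_t):
                break
            yield frame_v, frame_t  # same frame position in both sequences
    finally:
        cap_v.release()
        cap_t.release()
```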
在步骤S204提供的技术方案中,服务器通过预测模型确定第一图像中的像素点对应的预测红外强度值,预测模型是使用在可见光下拍摄得到的一组第三图像作为模型输入并使用在红外线下拍摄得到的一组第四图像作为模型输出进行训练得到的模型,一组第三图像和一组第四图像是相同场景的图像。
上述的预测模型可以是预先训练好的,也可以是在执行步骤S204的时候训练的,一种可选的训练方式如步骤11-步骤14所示:
步骤11,在通过预测模型确定第一图像中的像素点对应的预测红外强度值之前,获取对目标场景进行拍摄得到的一组第三图像和一组第四图像。
需要说明的是,训练时所使用的图像是至少应该包括目标场景的图像,换言之,可以为仅包括目标场景的图像,也可以是包括目标场景和其他相邻场景的图像。
上述的一组第三图像中图像的数量和一组第四图像中图像的数量张数相同,且一组第三图像中图像与一组第四图像中图像是一一对应关系,换言之,每张第三图像均存在一张与之取景内容相同的第四图像。
步骤12,逐帧地使用一组第三图像中的图像作为原始模型的输入并使用一组第四图像中相同帧(或称为取景内容相同)的图像作为原始模型的输出来对原始模型进行训练。
可选地,逐帧地使用一组第三图像中的图像作为原始模型的输入并使用一组第四图像中相同帧的图像作为原始模型的输出来对原始模型进行训练包括步骤S121-步骤S122:
步骤S121,将第三图像中的像素点的颜色值输入至原始模型,并将相同帧的第四图像中的像素点的强度值作为原始模型输出,第三图像中的像素点的颜色值用于作为原始模型中多个预测函数的输入,原始模型的输出为对多个预测函数中每个预测函数与对应的权重之间的乘积的累积和。
在步骤S121所示的实施例中,在第三图像中的像素点的颜色类型不为基于生理特征的颜色类型(如Lab颜色值类型)的情况下,将第三图像中的像素点的颜色类型转换为基于生理特征的颜色类型;然后将进行颜色类型转换后的第三图像中的像素点的第一颜色通道(如a通道)的颜色值和第二颜色通道(如b通道)的颜色值输入至预测模型。
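A hedged sketch of the colour-space handling in step S121: convert a BGR frame to the Lab colour space and keep the a and b channels as the two model-input features. OpenCV is used here as an assumed implementation; the original text only requires conversion to a physiologically based colour type such as Lab.

```python
import cv2
import numpy as np

def extract_ab_features(bgr_frame: np.ndarray) -> np.ndarray:
    """Return an (H, W, 2) array holding the Lab a and b channel values,
    which serve as the per-pixel input features of the model."""
    lab = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2LAB)  # OpenCV frames are BGR
    return lab[:, :, 1:3].astype(np.float32)
```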
上述的预测函数可为基函数,在数学中,基函数是函数空间一组特殊的基的元素。对于函数空间中的连续函数(如用于表示模型的函数)都可以表示成一系列基函数的线性组合,就像是在向量空间中每个向量都可以表示成基向量的线性组合一样。
基函数可以用 $f_i(x)=\prod_{j=1}^{d} x_j^{\,r_{ij}}$ 表示，其中，$a_i$ 表示第 $i$ 个基函数 $f_i(x)$ 的权重；$x_j^{\,r_{ij}}$ 表示基函数的第 $j$ 个参数项，$r_{ij}$ 为预先设定好的，如 $r_{i1}$ 为0.1，$r_{i2}$ 为1，$r_{i3}$ 为2等，$d$ 为用于表示 $j$ 的取值上限的整数，即模型输入特征的数目。
表示目标模型的函数可以用 $\hat{y}=\sum_{i=1}^{k} a_i f_i(x)$ 表示，其中，$k$ 表示基函数数目；$\hat{y}$ 表示预测值；$f_i(x)$ 表示基函数，即输入特征的幂次的乘积。
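The weighted basis-function predictor above can be written compactly as follows; this is a sketch only, and the helper names are not from the original disclosure. The exponents r_ij are preset constants (for example 0.1, 1, 2), so each basis function is a fixed product of powers of the input features.

```python
import numpy as np

def basis_function(x: np.ndarray, r_i: np.ndarray) -> float:
    """f_i(x): product of the input features raised to the preset powers r_ij."""
    return float(np.prod(x ** r_i))

def predict_intensity(x: np.ndarray, weights: np.ndarray, exponents: np.ndarray) -> float:
    """y_hat = sum over i of a_i * f_i(x), i.e. the model output for one pixel."""
    return float(sum(a_i * basis_function(x, r_i) for a_i, r_i in zip(weights, exponents)))

# example with two input features (Lab a and b values) and three basis functions
x = np.array([128.0, 130.0])
exponents = np.array([[0.1, 0.0], [1.0, 0.0], [0.0, 2.0]])
weights = np.array([0.5, 0.2, 0.001])
print(predict_intensity(x, weights, exponents))
```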
步骤S122，利用第三图像中的像素点的颜色值和相同帧的第四图像中的像素点的强度值来初始化预测函数对应的权重和预测函数内部的参数，以完成对原始模型的训练。其相当于是将第三图像中的像素点的颜色值作为 $f_i(x)$ 中 $x$ 的取值，而将第四图像中的像素点的强度值作为 $\hat{y}$ 的取值，从而通过内部的激活函数来求解函数中待确定的参数。
步骤13,当训练的次数达到一定量之后,使用在可见光下拍摄得到的测试 图像作为训练后的原始模型的输入、并判断原始模型输出的预测图像与在红外线下拍摄得到的验证图像是否相匹配,以确认训练时的参数是否拟合完毕。
步骤14,当使用在可见光下拍摄得到的测试图像作为训练后的原始模型的输入、且原始模型输出的预测图像与在红外线下拍摄得到的验证图像相匹配,即二者之间的相似度达到某个阈值(如99%)的情况下,将训练后的原始模型作为预测模型,测试图像和验证图像为目标场景中同一取景区域的图像。
在求解相似度时可以通过比较预测图像中每个像素点的强度值 $q_1$ 与验证图像中相同位置的像素点的强度值 $q_2$ 实现，如若相同位置的像素点满足 $1-|q_2-q_1|/q_2$ 大于一个固定阈值（如95%），则认为二者是相同的像素点，相似度可以用预测图像中与验证图像中相同像素点的数量 $n$ 与验证图像中像素点的数量 $m$ 之间的比值表示。
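A hedged sketch of the matching check described above: a pixel counts as "the same" when its predicted intensity is within a relative tolerance of the verification intensity, and the overall similarity is the fraction of such pixels. The function name and the zero-division guard are assumptions added for illustration.

```python
import numpy as np

def image_match_ratio(predicted: np.ndarray, verification: np.ndarray,
                      pixel_similarity: float = 0.95) -> float:
    """Fraction n/m of pixels judged identical between the predicted image
    and the verification image."""
    q1 = predicted.astype(np.float64)
    q2 = verification.astype(np.float64)
    rel_sim = 1.0 - np.abs(q2 - q1) / np.maximum(q2, 1e-6)  # guard against division by zero
    return float((rel_sim > pixel_similarity).mean())

# training is considered finished once image_match_ratio(...) reaches e.g. 0.99
```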
步骤15,当使用测试图像作为训练后的原始模型的输入、且原始模型输出的预测图像与验证图像不匹配的情况下,继续使用一组第三图像中的图像作为原始模型的输入并使用一组第四图像中相同帧的图像作为原始模型的输出来对原始模型进行训练,直至训练后的原始模型输出的预测图像与验证图像相匹配。
上述训练过程中使用的一组第三图像为可见光照射下的背景图像,而一组第四图像为相同取景区域内红外光照射下的背景图像,在使用上述方法完成对模型的训练之后,模型相当于能够建立背景模型,如果模型输入的数据是背景像素点,则模型预测输出值就跟该背景像素点的红外值很接近,目标对象的像素点输进去,与背景像素点的红外值差值就很大,即可使用该模型进行对象识别,可选地,通过预测模型确定第一图像中的像素点对应的预测红外强度值可以包括如下步骤:
步骤S21,将第一图像中的像素点的颜色值输入至预测模型。
在步骤S21所示的实施例中,将第一图像中的像素点的颜色值输入至预测模型时,可判断第一图像中的像素点的颜色类型是否为基于生理特征的颜色类型,若是则直接输入,若不是,即在第一图像中的像素点的颜色类型不为基于生理特征的颜色类型的情况下,将第一图像中的像素点的颜色类型转换为基于生理特征的颜色类型,然后将进行颜色类型转换后的第一图像中的像素点的第 一颜色通道的颜色值和第二颜色通道的颜色值输入至预测模型。
步骤S22,调用预测模型中的多种类型的预测函数,根据第一图像中的像素点的颜色值确定第一图像中相同位置上的像素点对应的预测红外强度值。
其相当于是将第一图像中的像素点的颜色值作为 $f_i(x)$ 中 $x$ 的取值，并通过函数的参数求解出每个像素点的强度值 $\hat{y}$ 的取值，从而完成预测。
在步骤S206提供的技术方案中,服务器获取第二图像中的像素点的实际红外强度值与第一图像中相同位置上的像素点对应的预测红外强度值之间的差值。如果模型输入的数据是背景像素点,则模型预测输出值就跟该背景像素点的红外值很接近,换言之,第二图像中的像素点的实际红外强度值与第一图像中相同位置上的像素点对应的预测红外强度值之间的差值就很小,小于第一阈值,目标对象的像素点输进去后预测得到的红外强度值与背景像素点的红外值差值就很大,大于第一阈值,故可以通过差值来判断像素点是否为目标对象上的像素点。
在步骤S208提供的技术方案中,服务器确定第二图像中的差值大于第一阈值的像素点为目标场景中目标对象所在的像素点。
可选地,确定第二图像中的差值大于第一阈值的像素点为目标场景中目标对象所在的像素点包括:遍历第二图像中的每个像素点,将第二图像中的差值大于第一阈值的像素点的强度值设置为第二阈值(如白色对应的强度值),并将第二图像中的差值不大于第一阈值的像素点的强度值设置为第三阈值(如黑色对应的强度值),第二阈值与第三阈值为不同的阈值;在遍历完第二图像中的所有像素点之后,通过第二图像中强度值为第二阈值的像素点来描述目标对象,如图5所示,图5中每幅图像分别对应图3的四幅图像中的一副。
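The traversal just described amounts to a simple per-pixel binarisation of the difference image; the sketch below assumes white (255) as the second threshold value and black (0) as the third threshold value, which the text only gives as examples.

```python
import numpy as np

def difference_to_mask(diff: np.ndarray, first_threshold: float,
                       second_threshold: int = 255, third_threshold: int = 0) -> np.ndarray:
    """Set pixels whose difference exceeds the first threshold to the second
    threshold value (target) and all other pixels to the third threshold
    value (background), so the remaining white pixels outline the target."""
    mask = np.full(diff.shape, third_threshold, dtype=np.uint8)
    mask[diff > first_threshold] = second_threshold
    return mask
```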
而在相关技术中方案中实现检测的效果如图6所示,以图5中和图6中左上角图中被白色方框标出的目标对象为例,采用本申请的技术方案,能够消除阴影干扰、红外图像中存在光环效应以及人为干扰等,使得目标对象的轮廓更加清晰,本申请的技术方案能有效融合红外与可见光信息,抑制阴影干扰与红外光环效应,能有效抑制背景杂波突显目标。
作为一种可选的实施例,下面结合具体的实施方式详述本申请的技术方案。
在一个可选的实施例中,在进行可见光图像空中运动目标的检测时,可通过在第N帧灰度图像上设置ROI框(全称为region of interest,即感兴趣区域),获得当前帧ROI灰度图像,对当前帧ROI灰度图像进行图像预处理、图像二值化处理、图像二值取反处理和图像膨胀处理后,再使用筛选方法筛选连通区域获得目标图像的方法,可解决传统目标检测方法不适用于运动背景下的目标检测和目标穿过运动背景导致目标丢失的问题,保证了运动背景下运动目标检测的实时性和准确性。
红外遥感通过接收目标辐射的热能对其进行探测和定位,反映场景的辐射特性,具有较强抗干扰能力和识别能力,但对比度低时很可能漏检某些热能辐射较小的目标,误检部分较亮的背景区域;而可见光图像表征景物的反射特性,图像的对比度较好,具有丰富的灰度级分布,但其对光照的依赖性较强,工作时间受限。
相关技术中红外与可见光协同目标检测方法大致可分为两大类:基于先融合后检测的协同目标检测和基于先检测后融合的协同目标检测。基于先融合后检测的红外与可见光协同目标检测方法依据融合策略先将红外与可见光图像整合为一幅融合图像,以此突显目标增强场景对比度,再依据融合情况制定检测方案来检测目标。此类方法的重点在于高效融合模型的制定,例如采用非参数模型、码本模型和方向梯度直方图模型依据概率融合理论来区分各类目标。基于先检测后融合的红外与可见光协同目标检测方法将融合部分移至单源目标检测之后来完成,先依据数据源类型进行单一数据源的目标检测任务,再制定融合策略(多为概率融合策略或阈值融合策略)来调节各自检测结果得到目标。此类方法致力于鲁棒背景模型与显著特征的研究,例如双特征混合参数背景模型和显著轮廓图特征。
上述实施方式的主体思想是采用形态学运算和自适应阈值算法来进行目标检测,其不适合复杂户外场景(如有风树木摇摆背景、建筑物干扰等),容易造成虚警、检测率低。当可见光图像中存在阴影干扰、红外图像中存在光环效应、以及人为干扰的情况下,容易造成虚警、目标检测率低等技术问题。
为了克服上述实施方式中的缺陷，本申请还提供了一种实施方式，即一种基于自适应基函数重建的红外与可见光协同目标检测方法：获取可见光视频的若干帧图像 $BG_v$（即一组第三图像）、红外视频的若干帧图像 $BG_t$（即一组第四图像），来建立协同背景模型 $M$（即预测模型）；针对前述步骤中得到的背景模型 $M$，结合当前帧 $F_t$（包括可见光下的第一图像和红外光下的第二图像），进行背景杂波抑制，得到背景杂波抑制后的背景抑制图 $G_t$（$G_t$ 中每个像素点的预测强度值即利用第一图像的像素点预测得到的），再采用自适应阈值分割算法来检测目标。采用自适应基函数重建技术，能有效建立非线性背景模型，能有效融合红外与可见光信息，抑制阴影干扰与红外光环效应，能有效抑制背景杂波突显目标。
本申请涉及的基于自适应基函数重建的红外与可见光协同目标检测方法,实际可部署于采用API(英文全称为Application Programming Interface,即应用程序编程接口)服务调用或SDK(英文全称为Software Development Kit,即软件开发工具包)嵌套的方式调用的服务器上,该服务器可结合实际落地场景实现,算法可运行于服务器的linux或window等系统环境中。
下面通过具体实施例,并结合附图,对本申请的技术方案作进一步说明。
步骤1，获取可见光视频的若干帧图像 $BG_v$（如图3所示）和红外视频的若干帧图像 $BG_t$（如图4所示），以此来建立协同背景模型 $M$，背景模型 $M$ 是基于自适应基函数重建技术进行获取，图7示出了一种可选的背景模型的示意图；
分配 $T$ 帧图像数据（$F_t, t=1\ldots T$）来建立背景模型，包含红外部分 $F_t^{t}$、可见光部分 $F_t^{v}$，针对每个像素点位置，分别建立一个背景模型 $M(i,j)$：
步骤11,将可见光的RGB值转换成Lab颜色值,将a、b颜色值作为模型输入特征,即如图7所示的输入,将红外的强度值作为模型的输出,即如图7所示的输出,共组成T个训练样本X;
步骤12，针对步骤11得到的训练样本X，采用自适应基函数重建技术建立背景模型 $M(i,j)$：

$$\hat{y}=\sum_{i=1}^{k} a_i f_i(x), \qquad f_i(x)=\prod_{j=1}^{d} x_j^{\,r_{ij}}$$

其中，$a_i$ 表示基函数的权重；$k$ 表示基函数数目；$\hat{y}$ 表示预测值；$d$ 表示模型输入特征数目；$f_i(x)$ 表示基函数，即输入特征的幂次的乘积。
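Because the exponents r_ij are preset, fitting M(i, j) reduces to estimating the weights a_i from the T training samples of one pixel location. The original text only says the weights and internal parameters are initialised from the samples; the least-squares solver below is an assumption used to make the sketch runnable.

```python
import numpy as np

def fit_pixel_background_model(ab_samples: np.ndarray, ir_samples: np.ndarray,
                               exponents: np.ndarray) -> np.ndarray:
    """Fit the basis-function weights a_i of the background model M(i, j)
    for a single pixel location.

    ab_samples: (T, 2) Lab a/b features of this pixel over T background frames
    ir_samples: (T,)   infrared intensities of this pixel over the same frames
    exponents:  (k, 2) preset powers r_ij, one row per basis function
    """
    # Design matrix: column i holds f_i(x) evaluated on every training sample.
    design = np.column_stack([np.prod(ab_samples ** r_i, axis=1) for r_i in exponents])
    weights, *_ = np.linalg.lstsq(design, ir_samples, rcond=None)
    return weights
```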
步骤2，针对步骤1中得到的背景模型 $M$，结合当前帧 $F_t$，进行背景杂波抑制，得到背景杂波抑制后的背景抑制图 $G_t$，即由对当前帧中的可见光图像（即第一图像）中的像素点进行预测得到的预测红外强度值与真实红外强度值的差值组成的图像，采用自适应阈值分割算法来检测目标。
步骤21，针对步骤1中得到的背景模型，获取背景抑制图 $G_t$：

$$G_t(i,j)=\left|\,F_t^{t}(i,j)-\hat{y}(i,j)\,\right|$$

即取当前帧红外图像的强度值与背景模型的预测值的绝对差值，如图4所示。
步骤22，采用OTSU阈值算法（即大津法或最大类间方差法）等自适应阈值算法来计算阈值 $\theta$，以提取目标：若 $G_t(i,j)>\theta$，则将像素 $(i,j)$ 判定为目标像素，否则判定为背景。
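Steps 21 and 22 can be sketched together as follows, assuming 8-bit single-channel intensity images; `cv2.threshold` with `THRESH_OTSU` computes the adaptive threshold θ automatically. The function name is illustrative and not from the original disclosure.

```python
import cv2
import numpy as np

def suppress_and_segment(ir_frame: np.ndarray, predicted_ir: np.ndarray):
    """Background suppression G_t = |IR - prediction| followed by OTSU
    thresholding to extract the target pixels."""
    g_t = cv2.absdiff(ir_frame.astype(np.uint8), predicted_ir.astype(np.uint8))
    theta, target_mask = cv2.threshold(g_t, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return g_t, theta, target_mask
```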
采用本申请的技术方案,利用自适应基函数重建技术,能有效建立非线性局部背景模型,能有效融合红外与可见光信息,抑制阴影干扰与红外光环效应,能有效抑制背景杂波突显目标。
为了验证本方法的有效性,本申请采用实际采集到的六段复杂户外场景视频进行验证实验,参见图3、图4,并与其它算法进行比较验证,如图8所示和表1所示,验证得出:本申请的技术方案能有效地检测出不同复杂场景下的目标;相较于码本方法和加权单高斯方法,本方案的F1(一种评判指标)综合指标高达90.9%(即最上方较为平缓的曲线),能较好地抑制可见光阴影、红外光环效应干扰,检测稳定性较好,场景适应性较强。三种检测方法的平均检测指标比较如表1所示:
表1　三种检测方法的平均检测指标比较（原文以图像形式给出该表格）。
为了验证本方法的有效性,本申请采用一些其它方法进行对比验证。并对比了两种算法:码本方法(CB)、加权单高斯方法(SWG)。
上述两种方法以及本申请的方法这三种检测方法针对户外场景的检测结果如图8和表1所示,本方法能有效抑制阴影和光环效应,有效检测不同场景的目标。
为了更客观地评价本方法的检测性能,我们采用目标检测领域标准通用的Precision、Recall和F1指标来评价本方法检测结果;其中,Recall表示检测率,为检测到的真实目标数与真实目标总数之比;Precision表示准确率,为检测到的真实目标数与检测到的目标总数之比。一个好的目标检测方法应具有较高的Recall值,同时也能保持较高的Precision值;较高的F1值也意味着好的检测性能。
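For reference, the evaluation metrics defined above translate directly into code; the counting of "detected true targets" is assumed to be done elsewhere, and the function name is illustrative.

```python
def detection_metrics(true_detections: int, total_detections: int, total_true_targets: int):
    """Recall = detected true targets / all true targets;
    Precision = detected true targets / all detections;
    F1 = harmonic mean of precision and recall."""
    recall = true_detections / total_true_targets if total_true_targets else 0.0
    precision = true_detections / total_detections if total_detections else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1
```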
三种检测方法的三种指标评价结果如附图8和表1所示。从图表中可以比较各种检测算法的总体检测性能:加权单高斯算法具有较高的检测率,但准确率较低,码本算法的检测率和准确率表现一般;而本申请方法同时具有较高的检测率和准确率,且具有较好的检测稳定性,本方法的F1指标高达90.9%。
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本申请并不受所描述的动作顺序的限制,因为依据本申请,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定是本申请所必须的。
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到根据上述实施例的方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体 现出来,该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台终端设备(可以是手机,计算机,服务器,或者网络设备等)执行本申请各个实施例所述的方法。
根据本申请实施例的另一个方面,还提供了一种用于实施上述目标对象的识别方法的目标对象的识别装置。图9是根据本申请实施例的一种可选的目标对象的识别装置的示意图,如图9所示,该装置可以包括:第一获取单元901、预测单元903、第二获取单元905以及识别单元907。
第一获取单元901,用于获取第一图像和第二图像,其中,第一图像为在可见光下对目标场景拍摄得到的图像,第二图像为在红外线下对目标场景拍摄得到的图像。
上述的第一图像和第二图像可以为拍摄的连续视频帧序列中的一帧,也可为单独拍到的一张图像,第一图像和第二图像为取景(即目标场景)相同的图像,且第一图像和第二图像为拍摄时间接近(即小于预先设定的数值,如0.02秒)的图像,如为同时拍摄得到的可见光视频帧序列和红外视频帧序列中的相同帧(即帧位置相同的视频帧)。
上述的目标场景即待进行目标对象识别的区域,可以为监控系统中终端所监控区域的场景、人工智能系统中飞行器当前所能够识别的区域。
可见光传感器能探测红绿蓝光谱能量,将其转化为彩色图像,具有丰富的色彩、纹理和结构等信息,且符合人类视觉感知体系,便于理解分析;基于红外传感器的侦查系统能接收来自目标和背景的红外辐射,将不可见的辐射转变成人眼可观测的图像,环境适应性好,灵敏度高,适合于弱小目标信号的探测和鉴别,而红外传感装置自身的红外辐射极其微弱,属于无源探测装置,隐蔽性好。因此,采用可见光图像与红外光图像结合的方式能有效丰富目标和场景信息,提高检测率。
预测单元903,用于通过预测模型确定第一图像中的像素点对应的预测红外强度值,其中,预测模型是使用在可见光下拍摄得到的一组第三图像作为模型输入并使用在红外线下拍摄得到的一组第四图像作为模型输出进行训练得到的模型,一组第三图像和一组第四图像是相同场景的图像。
在进行模型训练时,训练的目的在于使得模型能够将可见光图像转换为相 同场景下的红外图像,换言之,预测模型能够将第一图像转为一张红外图像,该红外图像中每个像素点的预测红外强度值是利用第一图像中相同位置的像素点的颜色值确定的。
上述的预测模型中包括多个分配有相应权重的预测函数(预测函数可以为基函数),预测函数的输入即为预测模型的输入,预测模型的输出即为对所有预测函数的输出与相应的权重之间的乘积的累积和。
第二获取单元905,用于获取第二图像中的像素点的实际红外强度值与第一图像中相同位置上的像素点对应的预测红外强度值之间的差值。
识别单元907,用于确定第二图像中的差值大于第一阈值的像素点为目标场景中目标对象所在的像素点。
需要说明的是,该实施例中的第一获取单元901可以用于执行本申请实施例中的步骤S202,该实施例中的预测单元903可以用于执行本申请实施例中的步骤S204,该实施例中的第二获取单元905可以用于执行本申请实施例中的步骤S206,该实施例中的识别单元907可以用于执行本申请实施例中的步骤S208。
此处需要说明的是,上述模块与对应的步骤所实现的示例和应用场景相同,但不限于上述实施例所公开的内容。需要说明的是,上述模块作为装置的一部分可以运行在如图1所示的硬件环境中,可以通过软件实现,也可以通过硬件实现。
通过上述模块,采用自适应函数重建,能有效建立非线性的用于表示目标场景的背景的预测模型,能有效融合红外光与可见光信息,抑制阴影干扰与红外光环效应,能有效抑制背景杂波突显目标,可以解决相关技术中对目标对象的识别准确率较低的技术问题,进而达到在存在干扰的情况下仍然能够准确识别目标对象的技术效果。
可选地,预测单元可包括:输入模块,用于将第一图像中的像素点的颜色值输入至预测模型;预测模块,用于调用预测模型中的多种类型的预测函数,根据第一图像中的像素点的颜色值确定第一图像中相同位置上的像素点对应的预测红外强度值。
可选地,本申请的装置还可包括:第三获取单元,用于在通过预测模型确定第一图像中的像素点对应的预测红外强度值之前,获取对目标场景进行拍摄 得到的一组第三图像和一组第四图像;训练单元,用于逐帧地使用一组第三图像中的图像作为原始模型的输入并使用一组第四图像中相同帧的图像作为原始模型的输出来对原始模型进行训练;第一验证单元,用于当使用在可见光下拍摄得到的测试图像作为训练后的原始模型的输入、且原始模型输出的预测图像与在红外线下拍摄得到的验证图像相匹配的情况下,将训练后的原始模型作为预测模型,其中,测试图像和验证图像为目标场景的图像;第二验证单元,用于当使用测试图像作为训练后的原始模型的输入、且原始模型输出的预测图像与验证图像不匹配的情况下,继续使用一组第三图像中的图像作为原始模型的输入并使用一组第四图像中相同帧的图像作为原始模型的输出来对原始模型进行训练,直至训练后的原始模型输出的预测图像与验证图像相匹配。
可选地,训练单元逐帧地使用一组第三图像中的图像作为原始模型的输入并使用一组第四图像中相同帧的图像作为原始模型的输出来对原始模型进行训练可以通过如下方式实现:将第三图像中的像素点的颜色值输入至原始模型,并将相同帧的第四图像中的像素点的强度值作为原始模型输出,其中,第三图像中的像素点的颜色值用于作为原始模型中多个预测函数的输入,原始模型的输出为对多个预测函数中每个预测函数与对应的权重之间的乘积的累积和;利用第三图像中的像素点的颜色值和相同帧的第四图像中的像素点的强度值来初始化预测函数对应的权重和预测函数内部的参数,以完成对原始模型的训练。
可选地,识别单元在确定第二图像中的差值大于第一阈值的像素点为目标场景中目标对象所在的像素点时,可以通过如下方式实现:遍历第二图像中的每个像素点,将第二图像中的差值大于第一阈值的像素点的强度值设置为第二阈值,并将第二图像中的差值不大于第一阈值的像素点的强度值设置为第三阈值,其中,第二阈值与第三阈值为不同的阈值;在遍历完第二图像中的所有像素点之后,通过第二图像中强度值为第二阈值的像素点来描述目标对象。
可选地,第一获取单元获取第一图像和第二图像时,可获取在可见光下对目标场景拍摄得到的第一图像和同一时刻在红外线下对目标场景拍摄得到的第二图像。
采用本申请提供的技术方案，即一种基于自适应基函数重建的红外与可见光协同目标检测方案：获取可见光视频的若干帧图像 $BG_v$（即一组第三图像）、红外视频的若干帧图像 $BG_t$（即一组第四图像），来建立协同背景模型 $M$（即预测模型）；针对前述步骤中得到的背景模型 $M$，结合当前帧 $F_t$（包括可见光下的第一图像和红外光下的第二图像），进行背景杂波抑制，得到背景杂波抑制后的背景抑制图 $G_t$（$G_t$ 中每个像素点的预测强度值即利用第一图像的像素点预测得到的），再采用自适应阈值分割算法来检测目标。采用自适应基函数重建技术，能有效建立非线性背景模型，能有效融合红外与可见光信息，抑制阴影干扰与红外光环效应，能有效抑制背景杂波突显目标。
此处需要说明的是,上述模块与对应的步骤所实现的示例和应用场景相同,但不限于上述实施例所公开的内容。需要说明的是,上述模块作为装置的一部分可以运行在如图1所示的硬件环境中,可以通过软件实现,也可以通过硬件实现,其中,硬件环境包括网络环境。
根据本申请实施例的另一个方面,还提供了一种用于实施上述目标对象的识别方法的服务器或终端。
图10是根据本申请实施例的一种终端的结构框图,如图10所示,该终端可以包括:一个或多个(图中仅示出一个)处理器1001、存储器1003、以及传输装置1005,如图10所示,该终端还可以包括输入输出设备1007。
其中,存储器1003可用于存储软件程序以及模块,如本申请实施例中的目标对象的识别方法和装置对应的程序指令/模块,处理器1001通过运行存储在存储器1003内的软件程序以及模块,从而执行各种功能应用以及数据处理,即实现上述的目标对象的识别方法。存储器1003可包括高速随机存储器,还可以包括非易失性存储器,如一个或者多个磁性存储装置、闪存、或者其他非易失性固态存储器。在一些实例中,存储器1003可进一步包括相对于处理器1001远程设置的存储器,这些远程存储器可以通过网络连接至终端。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。
上述的传输装置1005用于经由一个网络接收或者发送数据,还可以用于处理器与存储器之间的数据传输。上述的网络具体实例可包括有线网络及无线网络。在一个实例中,传输装置1005包括一个网络适配器(Network Interface  Controller,NIC),其可通过网线与其他网络设备与路由器相连从而可与互联网或局域网进行通讯。在一个实例中,传输装置1005为射频(Radio Frequency,RF)模块,其用于通过无线方式与互联网进行通讯。
其中,具体地,存储器1003用于存储应用程序。
处理器1001可以通过传输装置1005调用存储器1003存储的应用程序,以执行下述步骤:
获取第一图像和第二图像,其中,第一图像为在可见光下对目标场景拍摄得到的图像,第二图像为在红外线下对目标场景拍摄得到的图像;
通过预测模型确定第一图像中的像素点对应的预测红外强度值,其中,可选的,预测模型是使用在可见光下拍摄得到的一组第三图像作为模型输入并使用在红外线下拍摄得到的一组第四图像作为模型输出进行训练得到的模型,一组第三图像和一组第四图像是相同场景的图像;
获取第二图像中的像素点的实际红外强度值与第一图像中相同位置上的像素点对应的预测红外强度值之间的差值;
确定第二图像中的差值大于第一阈值的像素点为目标场景中目标对象所在的像素点。
处理器1001还用于执行下述步骤:
获取对目标场景进行拍摄得到的一组第三图像和一组第四图像;
逐帧地使用一组第三图像中的图像作为原始模型的输入并使用一组第四图像中相同帧的图像作为原始模型的输出来对原始模型进行训练;
当使用在可见光下拍摄得到的测试图像作为训练后的原始模型的输入、且原始模型输出的预测图像与在红外线下拍摄得到的验证图像相匹配的情况下,将训练后的原始模型作为预测模型,其中,测试图像和验证图像为目标场景的图像;
当使用测试图像作为训练后的原始模型的输入、且原始模型输出的预测图像与验证图像不匹配的情况下,继续使用一组第三图像中的图像作为原始模型的输入并使用一组第四图像中相同帧的图像作为原始模型的输出来对原始模型进行训练,直至训练后的原始模型输出的预测图像与验证图像相匹配。
采用本申请实施例,采用“获取第一图像和第二图像,第一图像为在可见 光下对目标场景拍摄得到的图像,第二图像为在红外线下对目标场景拍摄得到的图像;通过预测模型确定第一图像中的像素点对应的预测红外强度值,预测模型是使用在可见光下拍摄得到的一组第三图像作为模型输入并使用在红外线下拍摄得到的一组第四图像作为模型输出进行训练得到的模型,一组第三图像和一组第四图像是相同场景的图像;获取第二图像中的像素点的实际红外强度值与第一图像中相同位置上的像素点对应的预测红外强度值之间的差值;确定第二图像中的差值大于第一阈值的像素点为目标场景中目标对象所在的像素点”的方式,采用自适应函数重建,能有效建立非线性的用于表示目标场景的背景的预测模型,能有效融合红外光与可见光信息,抑制阴影干扰与红外光环效应,能有效抑制背景杂波突显目标,可以解决相关技术中对目标对象的识别准确率较低的技术问题,进而达到在存在干扰的情况下仍然能够准确识别目标对象的技术效果。
可选地,本实施例中的具体示例可以参考上述实施例中所描述的示例,本实施例在此不再赘述。
本领域普通技术人员可以理解,图10所示的结构仅为示意,终端可以是智能手机(如Android手机、iOS手机等)、平板电脑、掌上电脑以及移动互联网设备(Mobile Internet Devices,MID)、PAD等终端设备。图10其并不对上述电子装置的结构造成限定。例如,终端还可包括比图10中所示更多或者更少的组件(如网络接口、显示装置等),或者具有与图10所示不同的配置。
本领域普通技术人员可以理解上述实施例的各种方法中的全部或部分步骤是可以通过程序来指令终端设备相关的硬件来完成,该程序可以存储于一计算机可读存储介质中,存储介质可以包括:闪存盘、只读存储器(Read-Only Memory,ROM)、随机存取器(Random Access Memory,RAM)、磁盘或光盘等。
本申请的实施例还提供了一种存储介质。可选地,在本实施例中,上述存储介质可以用于执行目标对象的识别方法的程序代码。
可选地,在本实施例中,上述存储介质可以位于上述实施例所示的网络中的多个网络设备中的至少一个网络设备上。
可选地,在本实施例中,存储介质被设置为存储用于执行以下步骤的程序 代码:
获取第一图像和第二图像,其中,第一图像为在可见光下对目标场景拍摄得到的图像,第二图像为在红外线下对目标场景拍摄得到的图像;
通过预测模型确定第一图像中的像素点对应的预测红外强度值,其中,可选的,预测模型是使用在可见光下拍摄得到的一组第三图像作为模型输入并使用在红外线下拍摄得到的一组第四图像作为模型输出进行训练得到的模型,一组第三图像和一组第四图像是相同场景的图像;
获取第二图像中的像素点的实际红外强度值与第一图像中相同位置上的像素点对应的预测红外强度值之间的差值;
确定第二图像中的差值大于第一阈值的像素点为目标场景中目标对象所在的像素点。
可选地,存储介质还被设置为存储用于执行以下步骤的程序代码:
获取对目标场景进行拍摄得到的一组第三图像和一组第四图像;
逐帧地使用一组第三图像中的图像作为原始模型的输入并使用一组第四图像中相同帧的图像作为原始模型的输出来对原始模型进行训练;
当使用在可见光下拍摄得到的测试图像作为训练后的原始模型的输入、且原始模型输出的预测图像与在红外线下拍摄得到的验证图像相匹配的情况下,将训练后的原始模型作为预测模型,其中,测试图像和验证图像为目标场景的图像;
当使用测试图像作为训练后的原始模型的输入、且原始模型输出的预测图像与验证图像不匹配的情况下,继续使用一组第三图像中的图像作为原始模型的输入并使用一组第四图像中相同帧的图像作为原始模型的输出来对原始模型进行训练,直至训练后的原始模型输出的预测图像与验证图像相匹配。
本申请实施例还提供了一种包括指令的计算机程序产品,当其在服务器上运行时,使得服务器执行上述实施例提供的方法。
可选地,本实施例中的具体示例可以参考上述实施例中所描述的示例,本实施例在此不再赘述。
可选地,在本实施例中,上述存储介质可以包括但不限于:U盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access  Memory)、移动硬盘、磁碟或者光盘等各种可以存储程序代码的介质。
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。
上述实施例中的集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在上述计算机可读取的存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在存储介质中,包括若干指令用以使得一台或多台计算机设备(可为个人计算机、服务器或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。
在本申请的上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。
在本申请所提供的几个实施例中,应该理解到,所揭露的客户端,可通过其它的方式实现。其中,以上所描述的装置实施例仅仅是示意性的,例如所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,单元或模块的间接耦合或通信连接,可以是电性或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
以上所述仅是本申请的优选实施方式,应当指出,对于本技术领域的普通技术人员来说,在不脱离本申请原理的前提下,还可以做出若干改进和润饰,这些改进和润饰也应视为本申请的保护范围。

Claims (14)

  1. 一种目标对象的识别方法,其特征在于,包括:
    处理设备获取第一图像和第二图像,其中,所述第一图像为在可见光下对目标场景拍摄得到的图像,所述第二图像为在红外线下对所述目标场景拍摄得到的图像;所述处理设备包括终端和/或服务器;
    所述处理设备通过预测模型确定所述第一图像中的像素点对应的预测红外强度值;
    所述处理设备根据所述第二图像的实际红外线强度值和所述预测红外线强度值,确定所述第一图像和第二图像中相同位置上像素点的红外线强度值的差值;
    所述处理设备确定所述第二图像中的所述差值大于第一阈值的像素点为所述目标场景中目标对象所在的像素点。
  2. 根据权利要求1所述的方法,其特征在于,所述处理设备通过预测模型确定所述第一图像中的像素点对应的预测红外强度值包括:
    所述处理设备获取所述第一图像中的像素点的颜色值;
    所述处理设备调用所述预测模型中的多种类型的预测函数,根据所述第一图像中的像素点的颜色值确定所述第一图像中相同位置上的像素点对应的预测红外强度值。
  3. 根据权利要求2所述的方法,其特征在于,所述处理设备获取所述第一图像中的像素点的颜色值包括:
    所述处理设备在所述第一图像中的像素点的颜色类型不为基于生理特征的颜色类型的情况下,将所述第一图像中的像素点的颜色类型转换为基于生理特征的颜色类型;
    进行颜色类型转换后,所述处理设备获取所述第一图像中像素点的第一颜色通道的颜色值和第二颜色通道的颜色值。
  4. 根据权利要求1至3中任意一项所述的方法,其特征在于,所述预测模型是使用在可见光下拍摄得到的一组第三图像作为模型输入并使用在红外线下拍摄得到的一组第四图像作为模型输出进行训练得到的模型,所述一组第三图像和所述一组第四图像是相同场景的图像。
  5. 根据权利要求4所述的方法,其特征在于,在所述处理设备通过预测模型确定所述第一图像中的像素点对应的预测红外强度值之前,所述方法还包括:
    所述处理设备获取对所述目标场景进行拍摄得到的所述一组第三图像和所述一组第四图像;
    所述处理设备逐帧地使用所述一组第三图像中的图像作为原始模型的输入,并使用所述一组第四图像中相同帧的图像作为所述原始模型的输出来对所述原始模型进行训练;
    所述处理设备当使用在可见光下拍摄得到的测试图像作为训练后的所述原始模型的输入、且所述原始模型输出的预测图像与在红外线下拍摄得到的验证图像相匹配的情况下,将训练后的所述原始模型作为所述预测模型,其中,所述测试图像和所述验证图像为所述目标场景的图像;
    当所述处理设备使用所述测试图像作为训练后的所述原始模型的输入、且所述原始模型输出的所述预测图像与所述验证图像不匹配的情况下,所述处理设备继续使用所述一组第三图像中的图像作为所述原始模型的输入,并使用所述一组第四图像中相同帧的图像作为所述原始模型的输出来对所述原始模型进行训练,直至训练后的所述原始模型输出的所述预测图像与所述验证图像相匹配。
  6. 根据权利要求5所述的方法,其特征在于,所述处理设备逐帧地使用所述一组第三图像中的图像作为原始模型的输入,并使用所述一组第四图像中相同帧的图像作为所述原始模型的输出来对所述原始模型进行训练包括:
    所述处理设备将所述第三图像中的像素点的颜色值输入至所述原始模型,并将相同帧的所述第四图像中的像素点的强度值作为所述原始模型输出,其中,所述第三图像中的像素点的颜色值用于作为所述原始模型中多个预测函数的输入,所述原始模型的输出为对所述多个预测函数中每个预测函数与对应的权重之间的乘积的累积和;
    所述处理设备利用所述第三图像中的像素点的颜色值和相同帧的所述第四图像中的像素点的强度值,初始化所述预测函数对应的权重和所述预测函数内部的参数,以完成对所述原始模型的训练。
  7. 根据权利要求1至3中任意一项所述的方法,其特征在于,所述处理设备确定所述第二图像中的所述差值大于第一阈值的像素点为所述目标场景中目标对象所在的像素点包括:
    所述处理设备遍历所述第二图像中的每个像素点,将所述第二图像中的所述差值大于所述第一阈值的像素点的强度值设置为第二阈值,并将所述第二图像中的所述差值不大于所述第一阈值的像素点的强度值设置为第三阈值,其中,所述第二阈值与所述第三阈值为不同的阈值;
    所述处理设备在遍历完所述第二图像中的所有像素点之后,通过所述第二图像中强度值为所述第二阈值的像素点来描述所述目标对象。
  8. 根据权利要求1至3中任意一项所述的方法,其特征在于,所述处理设备获取第一图像和第二图像包括:
    所述处理设备获取在可见光下对所述目标场景拍摄得到的所述第一图像和同一时刻在红外线下对所述目标场景拍摄得到的所述第二图像。
  9. 一种目标对象的识别装置,其特征在于,包括:
    第一获取单元,用于获取第一图像和第二图像,其中,所述第一图像为在可见光下对目标场景拍摄得到的图像,所述第二图像为在红外线下对所述目标场景拍摄得到的图像;
    预测单元,用于通过预测模型确定所述第一图像中的像素点对应的预测红外强度值;
    第二获取单元，用于根据所述第二图像的实际红外线强度值和所述预测红外线强度值，确定所述第一图像和第二图像中相同位置上像素点的红外线强度值的差值；
    识别单元,用于确定所述第二图像中的所述差值大于第一阈值的像素点为所述目标场景中目标对象所在的像素点。
  10. 根据权利要求9所述的装置,其特征在于,所述预测单元包括:
    输入模块,用于获取所述第一图像中的像素点的颜色值;
    预测模块,用于调用所述预测模型中的多种类型的预测函数,根据所述第一图像中的像素点的颜色值确定所述第一图像中相同位置上的像素点对应的预测红外强度值。
  11. 根据权利要求9或10所述的装置,其特征在于,所述装置还包括:
    第三获取单元,用于在通过预测模型确定所述第一图像中的像素点对应的预测红外强度值之前,获取对所述目标场景进行拍摄得到的一组第三图像和一组第四图像;
    训练单元,用于逐帧地使用所述一组第三图像中的图像作为原始模型的输入,并使用所述一组第四图像中相同帧的图像作为所述原始模型的输出来对所述原始模型进行训练;
    第一验证单元,用于当使用在可见光下拍摄得到的测试图像作为训练后的所述原始模型的输入、且所述原始模型输出的预测图像与在红外线下拍摄得到的验证图像相匹配的情况下,将训练后的所述原始模型作为所述预测模型,其中,所述测试图像和所述验证图像为所述目标场景的图像;
    第二验证单元,用于当使用所述测试图像作为训练后的所述原始模型的输入、且所述原始模型输出的所述预测图像与所述验证图像不匹配的情况下,继续使用所述一组第三图像中的图像作为所述原始模型的输入,并使用所述一组第四图像中相同帧的图像作为所述原始模型的输出来对所述原始模型进行训练,直至训练后的所述原始模型输出的所述预测图像与所述验证图像相匹配。
  12. 一种存储介质,其特征在于,所述存储介质包括存储的程序,其中,所述程序运行时执行上述权利要求1至8任一项中所述的方法。
  13. 一种电子装置,包括存储器、处理器及存储在所述存储器上并可在所述处理器上运行的计算机程序,其特征在于,所述处理器通过所述计算机程序执行上述权利要求1至8任一项中所述的方法。
  14. 一种包括指令的计算机程序产品,当其在计算机上运行时,使得所述计算机执行权利要求1-8任意一项所述的方法。
PCT/CN2019/110058 2018-10-15 2019-10-09 目标对象的识别方法和装置、存储介质、电子装置 WO2020078229A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP19873881.7A EP3869459B1 (en) 2018-10-15 2019-10-09 Target object identification method and apparatus, storage medium and electronic apparatus
US17/074,502 US11443498B2 (en) 2018-10-15 2020-10-19 Target object recognition method and apparatus, storage medium, and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811197547.XA CN109461168B (zh) 2018-10-15 2018-10-15 目标对象的识别方法和装置、存储介质、电子装置
CN201811197547.X 2018-10-15

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/074,502 Continuation US11443498B2 (en) 2018-10-15 2020-10-19 Target object recognition method and apparatus, storage medium, and electronic device

Publications (1)

Publication Number Publication Date
WO2020078229A1 true WO2020078229A1 (zh) 2020-04-23

Family

ID=65607710

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/110058 WO2020078229A1 (zh) 2018-10-15 2019-10-09 目标对象的识别方法和装置、存储介质、电子装置

Country Status (4)

Country Link
US (1) US11443498B2 (zh)
EP (1) EP3869459B1 (zh)
CN (1) CN109461168B (zh)
WO (1) WO2020078229A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111880575A (zh) * 2020-08-10 2020-11-03 重庆依塔大数据研究院有限公司 基于颜色追踪的控制方法、装置、存储介质及机器人
CN113111807A (zh) * 2021-04-20 2021-07-13 北京嘀嘀无限科技发展有限公司 一种目标识别的方法和系统
CN115861039A (zh) * 2022-11-21 2023-03-28 北京城市网邻信息技术有限公司 信息展示方法、装置、设备及介质
CN117336453A (zh) * 2023-11-27 2024-01-02 湖南苏科智能科技有限公司 一种安检图像转换方法、系统、设备及存储介质
CN117471392A (zh) * 2023-12-27 2024-01-30 矽电半导体设备(深圳)股份有限公司 探针针尖的检测方法、系统、电子设备及存储介质

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115937776A (zh) * 2017-09-15 2023-04-07 杭州海康威视数字技术股份有限公司 监控方法、装置、系统、电子设备及计算机可读存储介质
CN109461168B (zh) 2018-10-15 2021-03-16 腾讯科技(深圳)有限公司 目标对象的识别方法和装置、存储介质、电子装置
CN110059746A (zh) * 2019-04-18 2019-07-26 达闼科技(北京)有限公司 一种创建目标检测模型的方法、电子设备及存储介质
CN110176027B (zh) * 2019-05-27 2023-03-14 腾讯科技(深圳)有限公司 视频目标跟踪方法、装置、设备及存储介质
CN110244314B (zh) * 2019-06-24 2022-04-19 北京机械设备研究所 一种“低慢小”目标探测识别系统与方法
CN111145168B (zh) * 2019-12-31 2023-04-14 华东理工大学 碳纤维复合材料缺陷的检测方法及系统、存储介质
CN111652242B (zh) * 2020-04-20 2023-07-04 北京迈格威科技有限公司 图像处理方法、装置、电子设备及存储介质
CN111598088B (zh) * 2020-05-15 2023-12-29 京东方科技集团股份有限公司 目标检测方法、装置、计算机设备及可读存储介质
CN111914672B (zh) * 2020-07-08 2023-08-04 浙江大华技术股份有限公司 图像标注方法和装置及存储介质
CN111968057A (zh) * 2020-08-24 2020-11-20 浙江大华技术股份有限公司 图像降噪方法、装置、存储介质及电子装置
CN113077533B (zh) * 2021-03-19 2023-05-12 浙江大华技术股份有限公司 一种图像融合方法、装置以及计算机存储介质
CN113034533B (zh) * 2021-04-06 2022-05-20 电子科技大学 一种基于空时平稳性的红外小目标检测方法
CN113111806A (zh) * 2021-04-20 2021-07-13 北京嘀嘀无限科技发展有限公司 用于目标识别的方法和系统
CN113298744B (zh) * 2021-06-07 2022-10-28 长春理工大学 一种端到端的红外与可见光图像融合方法
CN114648547B (zh) * 2022-03-09 2023-06-27 中国空气动力研究与发展中心计算空气动力研究所 用于反无人机红外探测系统的弱小目标检测方法和装置
CN114662594B (zh) * 2022-03-25 2022-10-04 浙江省通信产业服务有限公司 一种目标特征识别分析系统
CN116385260B (zh) * 2022-05-19 2024-02-09 上海玄戒技术有限公司 图像处理方法、装置、芯片、电子设备及介质
CN115050016B (zh) * 2022-08-15 2023-01-17 深圳市爱深盈通信息技术有限公司 车牌检测方法、装置、设备终端和可读存储介质
CN115965843B (zh) * 2023-01-04 2023-09-29 长沙观谱红外科技有限公司 一种可见光和红外图像融合方法
CN115953566B (zh) * 2023-03-15 2023-05-16 深圳市普雷德科技有限公司 一种用于红外热成像仪的特征分析方法、系统及介质
CN116309501B (zh) * 2023-03-27 2024-02-02 北京鹰之眼智能健康科技有限公司 一种疮面类型预测方法、电子设备和存储介质
CN116109893B (zh) * 2023-04-11 2023-09-15 宁波长壁流体动力科技有限公司 一种矿井场景图像分类方法、系统及存储介质
CN116797993B (zh) * 2023-05-13 2024-03-19 全景智联(武汉)科技有限公司 一种基于智慧社区场景的监控方法、系统、介质及设备
CN116977184B (zh) * 2023-08-01 2024-03-19 南方电网数字电网研究院股份有限公司 输电线路图像获取方法、装置、计算机设备和存储介质
CN116994295B (zh) * 2023-09-27 2024-02-02 华侨大学 基于灰度样本自适应选择门的野生动物类别识别方法
CN117496201B (zh) * 2023-12-29 2024-04-05 深圳市五轮科技股份有限公司 一种用于电子烟、雾化器和电池杆的识别方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103402044A (zh) * 2013-08-07 2013-11-20 重庆大学 一种基于多源视频融合的目标识别与跟踪系统
US20140375821A1 (en) * 2013-06-25 2014-12-25 Pixart Imaging Inc. Detection system
CN108510556A (zh) * 2018-03-30 2018-09-07 百度在线网络技术(北京)有限公司 用于处理图像的方法和装置
CN109461168A (zh) * 2018-10-15 2019-03-12 腾讯科技(深圳)有限公司 目标对象的识别方法和装置、存储介质、电子装置

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7693331B2 (en) * 2006-08-30 2010-04-06 Mitsubishi Electric Research Laboratories, Inc. Object segmentation using visible and infrared images
CN101727665B (zh) * 2008-10-27 2011-09-07 广州飒特电力红外技术有限公司 红外图像和可见光图像融合的方法及装置
EP2549759B1 (en) * 2011-07-19 2016-01-13 Axis AB Method and system for facilitating color balance synchronization between a plurality of video cameras as well as method and system for obtaining object tracking between two or more video cameras
CN105575034B (zh) * 2014-10-14 2019-06-07 哈尔滨新光光电科技有限公司 一种双波段森林防火智能监控软件图像处理分析方法
CN104778722A (zh) * 2015-03-20 2015-07-15 北京环境特性研究所 一种传感器的数据融合方法
CN108280819B (zh) * 2018-02-02 2022-03-25 北京理工雷科电子信息技术有限公司 一种双载荷遥感图像融合方法
CN108509892B (zh) * 2018-03-28 2022-05-13 百度在线网络技术(北京)有限公司 用于生成近红外图像的方法和装置

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140375821A1 (en) * 2013-06-25 2014-12-25 Pixart Imaging Inc. Detection system
CN103402044A (zh) * 2013-08-07 2013-11-20 重庆大学 一种基于多源视频融合的目标识别与跟踪系统
CN108510556A (zh) * 2018-03-30 2018-09-07 百度在线网络技术(北京)有限公司 用于处理图像的方法和装置
CN109461168A (zh) * 2018-10-15 2019-03-12 腾讯科技(深圳)有限公司 目标对象的识别方法和装置、存储介质、电子装置

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111880575A (zh) * 2020-08-10 2020-11-03 重庆依塔大数据研究院有限公司 基于颜色追踪的控制方法、装置、存储介质及机器人
CN113111807A (zh) * 2021-04-20 2021-07-13 北京嘀嘀无限科技发展有限公司 一种目标识别的方法和系统
CN113111807B (zh) * 2021-04-20 2024-06-07 北京嘀嘀无限科技发展有限公司 一种目标识别的方法和系统
CN115861039A (zh) * 2022-11-21 2023-03-28 北京城市网邻信息技术有限公司 信息展示方法、装置、设备及介质
CN117336453A (zh) * 2023-11-27 2024-01-02 湖南苏科智能科技有限公司 一种安检图像转换方法、系统、设备及存储介质
CN117336453B (zh) * 2023-11-27 2024-01-30 湖南苏科智能科技有限公司 一种安检图像转换方法、系统、设备及存储介质
CN117471392A (zh) * 2023-12-27 2024-01-30 矽电半导体设备(深圳)股份有限公司 探针针尖的检测方法、系统、电子设备及存储介质
CN117471392B (zh) * 2023-12-27 2024-03-29 矽电半导体设备(深圳)股份有限公司 探针针尖的检测方法、系统、电子设备及存储介质

Also Published As

Publication number Publication date
US20210034901A1 (en) 2021-02-04
CN109461168B (zh) 2021-03-16
US11443498B2 (en) 2022-09-13
EP3869459A4 (en) 2021-12-15
EP3869459B1 (en) 2023-09-27
EP3869459A1 (en) 2021-08-25
CN109461168A (zh) 2019-03-12

Similar Documents

Publication Publication Date Title
WO2020078229A1 (zh) 目标对象的识别方法和装置、存储介质、电子装置
US10599958B2 (en) Method and system for classifying an object-of-interest using an artificial neural network
US10896323B2 (en) Method and device for image processing, computer readable storage medium, and electronic device
CN112560657B (zh) 烟火识别方法、装置、计算机设备和存储介质
CN108777815B (zh) 视频处理方法和装置、电子设备、计算机可读存储介质
Çetin et al. Video fire detection–review
WO2020073505A1 (zh) 基于图像识别的图像处理方法、装置、设备及存储介质
CN106886216B (zh) 基于rgbd人脸检测的机器人自动跟踪方法和系统
US20160260306A1 (en) Method and device for automated early detection of forest fires by means of optical detection of smoke clouds
TW202026948A (zh) 活體檢測方法、裝置以及儲存介質
WO2019052318A1 (zh) 一种电梯轿厢监控方法、装置及系统
CN105930822A (zh) 一种人脸抓拍方法及系统
US10997469B2 (en) Method and system for facilitating improved training of a supervised machine learning process
US20190096066A1 (en) System and Method for Segmenting Out Multiple Body Parts
CN105184308B (zh) 一种基于全局优化决策的遥感图像建筑物检测分类方法
WO2024051067A1 (zh) 红外图像处理方法、装置及设备、存储介质
KR101944374B1 (ko) 이상 개체 검출 장치 및 방법, 이를 포함하는 촬상 장치
KR20180001356A (ko) 지능형 영상 보안 시스템
CN113936252A (zh) 基于视频监控的电瓶车智能管理系统及方法
KR102171384B1 (ko) 영상 보정 필터를 이용한 객체 인식 시스템 및 방법
CN113158963A (zh) 一种高空抛物的检测方法及装置
CN115601712B (zh) 适用于现场安全措施的图像数据处理方法及系统
KR102457470B1 (ko) 영상 분석을 이용한 인공지능 기반 강수 판단장치 및 방법
KR102299250B1 (ko) 복합 영상 데이터를 이용한 입출력 인원 계수 장치 및 방법
CN111191575B (zh) 一种基于火苗跳动建模的明火检测方法及系统

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19873881

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2019873881

Country of ref document: EP

Effective date: 20210517