CN118135545A - Method and device for detecting in-vehicle carryover, electronic equipment and readable storage medium

Info

Publication number: CN118135545A
Authority: CN (China)
Application number: CN202410169073.7A
Other languages: Chinese (zh)
Prior art keywords: image, frame, detection frame, detection, target detection
Inventors: 刘浩杰, 龙柏君
Assignee (original and current): Streamax Technology Co Ltd
Application filed by Streamax Technology Co Ltd; priority to CN202410169073.7A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)
Abstract

The embodiment of the application discloses a method, a device, electronic equipment and a readable storage medium for detecting in-vehicle carryover. The method includes: acquiring at least two frames of images to be detected; performing differential processing on the gray-scale image of each frame of image to be detected and the gray-scale image of an initial reference frame to obtain a differential image; performing binarization processing on the differential image with a first preset threshold to obtain a binarized image; and performing contour detection on the binarized image to obtain an object detection frame set, where the first preset threshold of each frame of image to be detected may be the same or different. A union of the object detection frame sets of the frames of images to be detected is taken to obtain a detection frame union, and a target detection frame set is determined from the detection frame union to obtain a carryover detection result. By processing multiple frames with a single binarization threshold or multiple binarization thresholds, i.e., multi-frame multi-threshold or multi-frame single-threshold, the interference of illumination conditions, scenes and the like on the detection rate and detection accuracy is reduced, and detection stability is better.

Description

Method and device for detecting in-vehicle carryover, electronic equipment and readable storage medium
Technical Field
The application belongs to the technical field of computer vision and image processing, and particularly relates to a method and a device for detecting in-vehicle carryover, electronic equipment and a readable storage medium.
Background
In-vehicle carryover detection is a technique that uses computer vision and image processing to detect and identify objects or items left inside a vehicle. It serves vehicle safety, monitoring, and the prevention of lost articles.
Currently, conventional in-vehicle carryover detection methods generally rely on computer vision techniques (e.g., edge detection, color segmentation, shape analysis) to detect carryover. However, the detection rate and detection accuracy of these methods are easily affected by scene and illumination conditions, so their detection stability is poor.
Disclosure of Invention
The embodiment of the application provides a method, a device, electronic equipment and a readable storage medium for detecting in-vehicle carryover, which can solve the problem of poor detection stability in existing in-vehicle carryover detection methods.
In a first aspect, an embodiment of the present application provides a method for detecting a carryover in a vehicle, including:
Acquiring at least two frames of images to be detected;
For each frame of image to be detected, converting the image to be detected into a first gray-scale image, and performing differential processing on the first gray-scale image and the gray-scale image of a first initial reference frame to obtain a first differential image; performing binarization processing on the first differential image with a first preset threshold to obtain a first binarized image; and performing contour detection on the first binarized image to obtain a first object detection frame set; where the first preset threshold of each frame of image to be detected may be different or the same, the image to be detected and the first initial reference frame are both in-vehicle environment images, and the acquisition time of the image to be detected is later than that of the first initial reference frame;
Taking a union of the first object detection frame sets of the frames of images to be detected to obtain a first detection frame union, where the first detection frame union includes a plurality of first detection frames;
And determining a first target detection frame set from the first detection frame union to obtain a carryover detection result, where the first target detection frame set includes at least one first target detection frame, and a first target detection frame is a first detection frame whose number of intersecting detection frames is greater than a preset number.
From the above, the embodiment of the application collects at least two frames of images to be detected, processes each frame with one or more binarization thresholds to obtain a first object detection frame set, and finally determines the first target detection frame set based on the first object detection frame sets of the multiple frames to obtain the carryover detection result. Processing based on multiple frames of images to be detected and a single binarization threshold or multiple binarization thresholds, i.e., multi-frame multi-threshold or multi-frame single-threshold, reduces the interference of illumination conditions and scenes (e.g., different carryover positions, complex scenes) on the detection rate and detection accuracy, so detection stability is better.
In some possible implementations of the first aspect, determining the first set of target detection boxes from the first union of detection boxes includes:
Determining an intersection-over-union (IOU) value between the first detection frame and each second detection frame for each first detection frame; if the IOU value is greater than a first threshold, determining that the second detection frame is an intersecting detection frame of the first detection frame; and determining the number of intersecting detection frames, where a second detection frame is a first detection frame in the first detection frame union other than the current first detection frame;
For each first detection frame, determining that the first detection frame is a first target detection frame when the number of the intersected detection frames is greater than a preset number;
And obtaining a first target detection frame set according to at least one first target detection frame.
In this implementation, the intersecting detection frames of each first detection frame are determined by the IOU value, and a first target detection frame, i.e., a candidate left-behind item, is determined based on the number of intersecting detection frames. This further improves the stability of in-vehicle carryover detection.
In some possible implementations of the first aspect, determining the first set of target detection boxes from the first union of detection boxes further includes:
When the number of the intersecting detection frames is smaller than or equal to the preset number, determining that the first detection frames are second target detection frames, and obtaining a second target detection frame set according to at least one second target detection frame;
After determining the first set of target detection frames from the first set of detection frames, the method further comprises:
Cutting out a first image area and a second image area from each frame of to-be-detected image according to the first target detection frame set and the second target detection frame set, wherein the first image area is an image area where the first target detection frame in the to-be-detected image is located, and the second image area is an image area where the second target detection frame in the to-be-detected image is located;
Respectively inputting the first image area and the second image area into a pre-trained classification model to obtain a classification result output by the classification model, wherein the classification result is used for representing the probability that the first image area or the second image area contains the carryover;
And obtaining a final residue detection result according to the classification result.
In this implementation, after the carryover detection result is obtained based on multiple frames with multiple thresholds or multiple frames with a single threshold, the result is further filtered and screened by a pre-trained classification model, thereby further improving detection accuracy.
In some possible implementations of the first aspect, the training process of the classification model includes:
Acquiring at least two frames of training sample images;
Aiming at each frame of training sample image, converting the training sample image into a second gray level image, and carrying out differential processing on the second gray level image and the gray level image of a second initial reference frame to obtain a second differential image; performing binarization processing on the second differential image by using a second preset threshold value to obtain a second binarized image; performing contour detection on the second binarized image to obtain a second object detection frame set; the second preset threshold value of each frame of training sample image is different or the same, the training sample image and the second initial reference frame are both in-vehicle environment images, and the acquisition time of the training sample image is later than that of the second initial reference frame;
Taking a union of the second object detection frame sets of the frames of training sample images to obtain a second detection frame union, where the second detection frame union includes a plurality of second detection frames;
Determining a third target detection frame set and a fourth target detection frame set from the second detection frame union, wherein the third target detection frame set comprises at least one third target detection frame, the fourth target detection frame set comprises at least one fourth target detection frame, the third target detection frame is a second detection frame with the number of the intersected detection frames being larger than the preset number, and the fourth target detection frame is a second detection frame with the number of the intersected detection frames being smaller than or equal to the preset number;
Cutting out a third image area and a fourth image area from each frame of training sample image according to the third target detection frame set and the fourth target detection frame set, where the third image area is the image area where the third target detection frame is located in the training sample image, and the fourth image area is the image area where the fourth target detection frame is located in the training sample image;
And taking the third image area as a positive sample and the fourth image area as a negative sample, and performing model training on the pre-constructed classification model to obtain a classification model after training.
In this implementation, the carryover detection method based on multiple frames with multiple thresholds or multiple frames with a single threshold automatically marks the carryover image areas and non-carryover image areas in the training sample images, and the automatically labelled positive and negative samples are used to train the classification model, so a great deal of manpower and material resources need not be spent on sample annotation.
In some possible implementations of the first aspect, contour detection is performed on the first binarized image to obtain a first set of object detection frames, including:
Contour detection is carried out on the first binarized image, and a contour point set of each object in the first binarized image is obtained;
For each contour point set, determining the coordinates of a first position point and the coordinates of a second position point corresponding to the contour point set, and determining the coordinates of a first detection frame corresponding to the contour point set according to the coordinates of the first position point and the coordinates of the second position point;
And obtaining a first object detection frame set according to the coordinates of the first detection frame corresponding to each contour point set.
In a second aspect, an embodiment of the present application provides an in-vehicle legacy detection device, including:
the detection image acquisition module is used for acquiring at least two frames of images to be detected;
The first object detection module is used for converting the image to be detected into a first gray level image aiming at each frame of image to be detected, and carrying out differential processing on the first gray level image and the gray level image of a first initial reference frame to obtain a first differential image; performing binarization processing on the first differential image by using a first preset threshold value to obtain a first binarized image; performing contour detection on the first binarized image to obtain a first object detection frame set; the first preset threshold value of each frame of image to be detected is different or the same, the image to be detected and the first initial reference frame are both in-vehicle environment images, and the acquisition time of the image to be detected is later than that of the first initial reference frame;
The union-taking module is used for taking a union of the first object detection frame sets of the frames of images to be detected to obtain a first detection frame union, where the first detection frame union includes a plurality of first detection frames;
The first determining module is used for determining a first target detection frame set from the first detection frame union set to obtain a legacy detection result, wherein the first target detection frame set comprises at least one first target detection frame, and the first target detection frame is a first detection frame with the number of intersecting detection frames being larger than the preset number.
In some possible implementations of the second aspect, the first determining module is further configured to: when the number of the intersecting detection frames is smaller than or equal to the preset number, determining that the first detection frames are second target detection frames, and obtaining a second target detection frame set according to at least one second target detection frame;
the apparatus further comprises:
the clipping module is used for clipping a first image area and a second image area from each frame of to-be-detected image according to the first target detection frame set and the second target detection frame set, wherein the first image area is an image area where the first target detection frame in the to-be-detected image is located, and the second image area is an image area where the second target detection frame in the to-be-detected image is located;
The classification filtering module is used for respectively inputting the first image area and the second image area into a pre-trained classification model to obtain a classification result output by the classification model, wherein the classification result is used for representing the probability that the first image area or the second image area contains the carryover;
and the final result determining module is used for obtaining a final legacy detection result according to the classification result.
In some possible implementations of the second aspect, the apparatus further includes a classification model training module for:
Acquiring at least two frames of training sample images;
Aiming at each frame of training sample image, converting the training sample image into a second gray level image, and carrying out differential processing on the second gray level image and the gray level image of a second initial reference frame to obtain a second differential image; performing binarization processing on the second differential image by using a second preset threshold value to obtain a second binarized image; performing contour detection on the second binarized image to obtain a second object detection frame set; the second preset threshold value of each frame of training sample image is different or the same, the training sample image and the second initial reference frame are both in-vehicle environment images, and the acquisition time of the training sample image is later than that of the second initial reference frame;
Taking a union of the second object detection frame sets of the frames of training sample images to obtain a second detection frame union, where the second detection frame union includes a plurality of second detection frames;
Determining a third target detection frame set and a fourth target detection frame set from the second detection frame union, wherein the third target detection frame set comprises at least one third target detection frame, the fourth target detection frame set comprises at least one fourth target detection frame, the third target detection frame is a second detection frame with the number of the intersected detection frames being larger than the preset number, and the fourth target detection frame is a second detection frame with the number of the intersected detection frames being smaller than or equal to the preset number;
Cutting out a third image area and a fourth image area from each frame of training sample image according to the third target detection frame set and the fourth target detection frame set, where the third image area is the image area where the third target detection frame is located in the training sample image, and the fourth image area is the image area where the fourth target detection frame is located in the training sample image;
And taking the third image area as a positive sample and the fourth image area as a negative sample, and performing model training on the pre-constructed classification model to obtain a classification model after training.
In a third aspect, an embodiment of the present application provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the method according to any one of the first aspects when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program which when executed by a processor performs a method as in any one of the first aspects above.
In a fifth aspect, embodiments of the present application provide a computer program product for, when run on an electronic device, causing the electronic device to perform the method of any one of the first aspects.
It will be appreciated that the advantages of the second to fifth aspects may be found in the relevant description of the first aspect, and are not described here again.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments or the description of the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a method for detecting in-vehicle carryover according to an embodiment of the present application;
FIG. 2 is a schematic block diagram of another flow chart of a method for detecting in-vehicle carryover according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a training process of a classification model according to an embodiment of the present application;
fig. 4 is a block diagram of a device for detecting carryover in a vehicle according to an embodiment of the present application;
fig. 5 is a schematic block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in the present description and the appended claims, the term "if" may be interpreted, depending on the context, as "when" or "upon" or "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining" or "in response to determining" or "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
The method for detecting the in-vehicle carryover provided by the embodiment of the application can be applied to electronic equipment such as vehicle-mounted monitoring equipment and the like, for example, a vehicle-mounted video analysis integrated machine. The embodiment of the application does not limit the specific type of the electronic equipment.
Referring to fig. 1, a flow chart of a method for detecting carryover in a vehicle according to an embodiment of the application may include the following steps:
Step S101, at least two frames of images to be detected are acquired.
The image to be detected is an in-vehicle environment image; that is, the in-vehicle environment is photographed by an image acquisition device to obtain the image to be detected. The number of images to be detected is greater than or equal to 2, i.e., multiple frames of images to be detected are collected, and the specific number can be set according to actual needs. For example, the (N-2)-th, (N-1)-th, and N-th frames of images are acquired as images to be detected during the running of the vehicle.
Step S102, for each frame of image to be detected, converting the image to be detected into a first gray-scale image, and performing differential processing on the first gray-scale image and the gray-scale image of a first initial reference frame to obtain a first differential image; performing binarization processing on the first differential image with a first preset threshold to obtain a first binarized image; and performing contour detection on the first binarized image to obtain a first object detection frame set; where the first preset thresholds of the frames of images to be detected may be different or the same, the images to be detected and the first initial reference frame are all in-vehicle environment images, and the acquisition time of the images to be detected is later than that of the first initial reference frame.
The first initial reference frame is also an in-vehicle environment image obtained by photographing the in-vehicle environment with the image acquisition device, but its acquisition time differs from that of the images to be detected. Normally, an in-vehicle environment image is collected first and used as the first initial reference frame, and it is converted into a gray-scale image for storage (or stored directly); multiple frames of images to be detected are then acquired and processed against the first initial reference frame.
For example, the first initial reference frame may be an empty image acquired in an empty state, which may refer to a state when there is no passenger in the vehicle. The image frame to be detected may be an image acquired during a certain period of time after the passenger gets off (for example, during the running of the vehicle after the passenger gets off).
In some embodiments, contour detection may be performed on the first binarized image first to obtain a set of contour points for each object in the first binarized image. It will be appreciated that the first binarized image may include one or more objects or objects, which may be living bodies (e.g. humans or animals), and that detection of the first binarized image by a contour detection algorithm may result in a set of contour points for each object. An object comprises a plurality of contour points, which form a set of contour points of the object. Of course, the first binarized image may not include an object or article, but a set of contour points may be obtained when contour detection is performed on the first binarized image for various reasons (e.g., algorithmic false detection).
After the contour point sets of the respective objects are obtained, coordinates of a first position point and coordinates of a second position point corresponding to the contour point sets may be determined for the respective contour point sets, and coordinates of a first detection frame corresponding to the contour point sets may be determined according to the coordinates of the first position point and the coordinates of the second position point. The first position point may be a position point of an upper left corner corresponding to the contour point set, the second position point may be a contour point of a lower right corner corresponding to the contour point set, that is, an upper left corner coordinate and a lower right corner coordinate corresponding to the contour point set are found, and then a rectangular frame coordinate of the contour point set is obtained according to the upper left corner coordinate and the lower right corner coordinate, so as to obtain a detection frame of the object.
Of course, the coordinates of the first position point and the second position point are obtained in order to determine the detection frame of the object, so the two points need not be the upper-left and lower-right corners and may be at other positions. For example, the first position point may be the position point of the lower-left corner corresponding to the contour point set, and the second position point may be the position point of the upper-right corner corresponding to the contour point set.
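As an illustrative sketch (not the patent's own code), the rectangle coordinates can be derived from a contour point set by taking the coordinate extremes as the two position points; the helper name and the OpenCV-style point-array layout are assumptions:

```python
import numpy as np

def contour_to_box(contour):
    # Hypothetical helper: derive [x1, y1, x2, y2] from a contour point set
    # by taking the top-left and bottom-right extremes as the two position
    # points described above. Assumes an OpenCV-style (N, 1, 2) point array.
    pts = np.asarray(contour).reshape(-1, 2)
    x1, y1 = pts[:, 0].min(), pts[:, 1].min()  # first position point
    x2, y2 = pts[:, 0].max(), pts[:, 1].max()  # second position point
    return [int(x1), int(y1), int(x2), int(y2)]
```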
After the first detection frame of each contour point set is obtained through the above process, a first object detection frame set may be formed from the plurality of first detection frames, where the first object detection frame set includes one or more first detection frames.
For each frame of image to be detected, the first detection frames of all objects in the image to be detected are obtained through the process, and then a first object detection frame set of the image is obtained.
It should be noted that, in the process of performing binarization processing on each frame of image to be detected, the binarization threshold value of each frame of image to be detected may be the same or different. If the images are different, a plurality of different binarization thresholds, namely a plurality of multi-frame multi-threshold values, are corresponding to the multi-frame images to be detected; if the two threshold values are the same, the multiple frames of images to be detected correspond to the same binarization threshold value, namely the multiple frames of single threshold values.
Compared with a single-frame single-threshold scheme, i.e., one based on a single frame of image to be detected and a single binarization threshold, both multi-frame multi-threshold and multi-frame single-threshold processing improve the stability of in-vehicle carryover detection. Specifically, a carryover detection algorithm based on a single frame and a single threshold is easily disturbed by various complex conditions such as different illumination, different carryover positions, carryover types, and different vehicle models, so the stability of its detection rate and detection accuracy is poor. Processing based on multiple frames with multiple thresholds or a single threshold reduces such interference, giving better stability of the detection rate and detection accuracy.
Further, the detection rate of multi-frame single-threshold is higher than that of multi-frame multi-threshold, but its false detection rate (or false alarm rate) is also higher than that of multi-frame multi-threshold, i.e., the detection accuracy of multi-frame multi-threshold is higher than that of multi-frame single-threshold. Thus, multi-frame multi-threshold achieves a better balance between the false detection rate and the detection rate.
For example, first, an image of the empty-vehicle state is collected as an initial reference frame I_f, and I_f is converted into a gray-scale map for storage; during the running of the vehicle, the (N-2)-th, (N-1)-th and N-th frames of in-vehicle environment images are acquired as images to be detected. Taking the N-th frame as an example, the image to be detected is converted into a gray-scale image I_N; image differential processing is performed on I_N and I_f based on OpenCV to obtain a differential image I_d; and binarization processing is performed on I_d with a binarization threshold θ_1 to obtain a binarized result R_1, i.e., the first binarized image. Contour detection based on OpenCV is performed on the first binarized image R_1 to obtain the contour point sets of the different objects (for example, contour point set C_1, contour point set C_2, etc.); for each contour point set, the corresponding upper-left and lower-right corner coordinates are found, yielding the rectangular detection frames S_N of the different objects. For example, the rectangular detection frame coordinates corresponding to contour point set C_1 and contour point set C_2 may be obtained as [x_1, y_1, x_2, y_2] and [x_3, y_3, x_4, y_4].
It will be appreciated that the processing of the (N-1)-th and (N-2)-th frames of images to be detected is similar to that of the N-th frame above. The (N-1)-th frame is binarized with a binarization threshold θ_2, giving the rectangular detection frame coordinates S_(N-1) of the different objects in that frame; the (N-2)-th frame is binarized with a binarization threshold θ_3, giving the rectangular detection frame coordinates S_(N-2). Of course, the binarization thresholds of the three frames of images to be detected may also be the same.
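A minimal OpenCV sketch of this per-frame processing, assuming BGR input frames and a stored gray-scale reference frame (the function name and flag choices are assumptions, not the patent's implementation):

```python
import cv2

def detect_boxes(frame_bgr, ref_gray, threshold):
    # Convert the frame to gray scale (I_N), difference it against the
    # reference frame (I_d), binarize with the given threshold (R), and
    # extract rectangular detection boxes from the detected contours (S).
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, ref_gray)
    _, binary = cv2.threshold(diff, threshold, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        boxes.append([x, y, x + w, y + h])  # [x1, y1, x2, y2]
    return boxes
```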
Step S103, taking a union of the first object detection frame sets of the frames of images to be detected to obtain a first detection frame union, where the first detection frame union includes a plurality of first detection frames.
Illustratively, the first object detection frame sets of the (N-2)-th, (N-1)-th and N-th frames are obtained, and the union of the three frames' first object detection frame sets is taken, i.e., S = S_N ∪ S_(N-1) ∪ S_(N-2).
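In code terms, the union can be a simple concatenation of the per-frame box lists; a sketch reusing the hypothetical detect_boxes helper above, with the frame variables and thresholds assumed to be defined elsewhere (e.g., read from a video capture):

```python
boxes_n   = detect_boxes(frame_n,  ref_gray, theta_1)  # S_N
boxes_n_1 = detect_boxes(frame_n1, ref_gray, theta_2)  # S_(N-1)
boxes_n_2 = detect_boxes(frame_n2, ref_gray, theta_3)  # S_(N-2)
union_boxes = boxes_n + boxes_n_1 + boxes_n_2          # S = S_N ∪ S_(N-1) ∪ S_(N-2)
```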
Step S104, determining a first target detection frame set from the first detection frame union set to obtain a legacy detection result, wherein the first target detection frame set comprises at least one first target detection frame, and the first target detection frame is a first detection frame with the number of intersecting detection frames being greater than a preset number.
The first target detection frame may be a detection frame of the legacy, that is, the first target detection frame set is obtained, that is, a legacy detection result may be obtained.
The intersecting detection frame refers to a detection frame intersecting with the current detection frame. For example, the first detection frames are combined and four first detection frames including a detection frame 1, a detection frame 2, a detection frame 3 and a detection frame 4 are integrated; for each first detection frame, an intersection detection frame of the first detection frame is calculated. For example, for detection frame 1, it can be determined by calculation that detection frame 2, detection frame 3, and detection frame 4 all intersect it, i.e., detection frame 2, detection frame 3, and detection frame 4 are all intersecting detection frames of detection frame 1; for detection frame 2, the general calculation may determine that only detection frame 1 intersects it, i.e. detection frame 1 is the intersecting detection frame of detection frame 2.
In some embodiments, whether two detection frames intersect may be determined by the intersection over union (IOU) value between the detection frames.
In the embodiment of the present application, IOU = (intersection area) / (union area), where the intersection area is the area of the overlap of the two detection frames and the union area is their combined area. The IOU value ranges from 0 to 1: when it equals 0, the two detection frames do not overlap; when it equals 1, they overlap completely.
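A minimal sketch of this IOU computation, assuming boxes in [x1, y1, x2, y2] form as in the example above:

```python
def iou(box_a, box_b):
    # Intersection area: overlap of the two rectangles, clamped at zero.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    # Union area: combined area of both rectangles.
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```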
After the first union of detection frames is obtained, all of the first detection frames in the first union of detection frames are traversed. For each first detection box, determining an IOU value between the first detection box and each second detection box. The second detection frame is a first detection frame in the first detection frame union except the current first detection frame, for example, the first detection frame union comprises a detection frame 1, a detection frame 2, a detection frame 3 and a detection frame 4, and the current first detection frame is the detection frame 1, and the detection frame 2, the detection frame 3 and the detection frame 4 are all taken as the second detection frame; if the current first detection frame is detection frame 2, detection frame 1, detection frame 3 and detection frame 4 are all second detection frames.
If the IOU value of the two detection frames is larger than a first threshold value, the two detection frames are considered to be intersected, namely, a second detection frame corresponding to the IOU value is determined to be intersected with the current first detection frame, and the second detection frame is the intersected detection frame of the current first detection frame; conversely, if the IOU value of the two detection boxes is less than or equal to the first threshold, then the two detection boxes are considered disjoint. The first threshold may be set according to practical application requirements, and is not limited herein.
According to the process, for each first detection frame, the number of detection frames intersected with the first detection frame is determined according to the magnitude of the IOU value, namely the number of intersected detection frames of the first detection frame is counted. The intersecting detection box refers to a first detection box with an IOU value greater than a first threshold with the current detection box.
After the number of intersecting detection frames of each first detection frame in the first detection frame union is determined, the first target detection frames are determined according to these counts. If the number of intersecting detection frames of a first detection frame is greater than the preset number, that first detection frame is considered a first target detection frame; otherwise, if the number is less than or equal to the preset number, it is considered a second target detection frame. The preset number may be set according to actual application requirements and is not limited here; for example, it may be set to 2.
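The counting and splitting step can be sketched as follows, reusing the iou helper above; the default values of iou_thresh and preset_num are illustrative assumptions:

```python
def split_target_boxes(union_boxes, iou_thresh=0.1, preset_num=2):
    # For each box, count the other boxes whose IOU with it exceeds the
    # first threshold; boxes with more than preset_num intersecting boxes
    # become first target boxes, the rest second target boxes.
    first_targets, second_targets = [], []
    for i, box in enumerate(union_boxes):
        count = sum(1 for j, other in enumerate(union_boxes)
                    if j != i and iou(box, other) > iou_thresh)
        (first_targets if count > preset_num else second_targets).append(box)
    return first_targets, second_targets
```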
After the plurality of first target detection frames are obtained, the plurality of first target detection frames may then constitute a first set of target detection frames. Since the first target detection frame is a detection frame of the legacy, after the first target detection frame set is obtained, it can be considered that the legacy detection result is obtained.
In the above process, the intersecting detection frames of each first detection frame are determined by the IOU value, and the first target detection frame, i.e., a candidate left-behind item, is determined based on the number of intersecting detection frames. This further improves the stability of in-vehicle carryover detection.
From the above, the embodiment of the application processes multiple frames of images to be detected with a single binarization threshold or multiple binarization thresholds, i.e., multi-frame multi-threshold or multi-frame single-threshold, which reduces the interference of illumination conditions and scenes (e.g., different carryover positions, complex scenes) on the detection rate and detection accuracy, so detection stability is better.
In other embodiments, the first target detection frame may actually be a detection frame of a background area, i.e., the background area is mistakenly identified as carryover, which lowers the accuracy of in-vehicle carryover detection.
In order to further improve the accuracy of in-vehicle carryover detection, after the first target detection frame set is obtained, it is not taken directly as the final carryover detection result; instead, further judgment is performed on the first target detection frame set through a classification model.
Referring to fig. 2, another flow chart of a method for detecting in-vehicle carryover according to an embodiment of the present application may include the following steps:
Step S201, at least two frames of images to be detected are acquired.
Step S202, converting an image to be detected into a first gray scale image aiming at each frame of the image to be detected, and carrying out differential processing on the first gray scale image and a gray scale image of a first initial reference frame to obtain a first differential image; performing binarization processing on the first differential image by using a first preset threshold value to obtain a first binarized image; performing contour detection on the first binarized image to obtain a first object detection frame set; the first preset threshold values of the images to be detected of the frames are different or the same, the images to be detected and the first initial reference frame are all environment images in the vehicle, and the acquisition time of the images to be detected is later than that of the first initial reference frame.
Step S203, taking a union of the first object detection frame sets of the frames of images to be detected to obtain a first detection frame union, where the first detection frame union includes a plurality of first detection frames.
Step S204, determining a first target detection frame set and a second target detection frame set from a first detection frame union set, wherein the first target detection frame set comprises at least one first target detection frame, and the first target detection frame is a first detection frame with the number of intersecting detection frames being larger than the preset number; the second target detection frame set comprises at least one second target detection frame, and the second target detection frame is a first detection frame with the number of the intersecting detection frames being smaller than or equal to the preset number.
It should be understood that steps S201 to S204 are similar to those described in fig. 1, and detailed description is omitted herein.
It should be noted that, in step S204, in addition to the first set of target detection frames, a second set of target detection frames needs to be determined. Specifically, after the number of intersecting detection frames of the first detection frames in the union set of the first detection frames is determined, when the number of intersecting detection frames is less than or equal to a preset number, the first detection frames are determined to be second target detection frames, and a second target detection frame set is obtained according to at least one second target detection frame.
Step S205, according to the first target detection frame set and the second target detection frame set, a first image area and a second image area are cut out from the to-be-detected images of each frame, where the first image area is an image area where the first target detection frame is located in the to-be-detected image, and the second image area is an image area where the second target detection frame is located in the to-be-detected image.
The first target detection frame may be considered as a detection frame of a carryover and the second target detection frame may be considered as a detection frame of a background area. Based on the first target detection frame, a first image area can be cut out from the image to be detected, wherein the first image area is an image area where the first target detection frame, namely, the legacy is located. The first image area may refer to an image area containing carryover.
Similarly, based on the second target detection frame, a second image area, which is an image area that does not include the carryover, may be cut out from the image to be detected.
It can be understood that the background area may be mistakenly identified as a carryover, and the carryover area may be mistakenly identified as the background area, so that the first image area and the second image area may be cut out, and whether the first image area and the second image area include the image area of the carryover may be judged again, so as to further improve the accuracy of detecting the carryover.
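Cropping the image areas is plain array slicing; a sketch under the same coordinate convention as above, with the frame and target-box variables taken from the earlier illustrative sketches:

```python
def crop_regions(image, boxes):
    # Cut out the image area where each detection box is located.
    return [image[y1:y2, x1:x2] for (x1, y1, x2, y2) in boxes]

first_regions  = crop_regions(frame_n, first_targets)   # candidate carryover areas
second_regions = crop_regions(frame_n, second_targets)  # candidate background areas
```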
Step S206, the first image area and the second image area are respectively input into a pre-trained classification model, a classification result output by the classification model is obtained, and the classification result is used for representing the probability that the first image area or the second image area contains the carryover.
Step S207, obtaining a final residue detection result according to the classification result.
In general, when the probability value represented by the classification result is greater than a preset probability threshold, the image area is considered an image area containing carryover; otherwise, when the probability value is less than or equal to the preset probability threshold, the image area is considered not to contain carryover and is regarded as a background area. According to this process, each first image area and each second image area is filtered with the classification model to judge whether it contains carryover, giving the final carryover detection result.
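A sketch of this filtering step, assuming `classifier` is any pre-trained callable mapping an image region to a carryover probability (the 0.5 default threshold is an illustrative assumption, not from the patent):

```python
def filter_by_classifier(regions, classifier, prob_thresh=0.5):
    # Keep only the regions whose predicted carryover probability exceeds
    # the preset probability threshold.
    return [r for r in regions if classifier(r) > prob_thresh]

final_carryover = filter_by_classifier(first_regions + second_regions, classifier)
```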
In the embodiment of the application, the first target detection frame set and the second target detection frame set are obtained based on multiple frames and multiple thresholds or multiple frames and a single threshold, so that the stability of detecting the legacy is improved, and the detection accuracy is further improved by further filtering and screening the detection result of the legacy through a pre-trained classification model.
The training of a classification model requires a large-scale labelled dataset. To train the classification model, it is often necessary to create a dataset covering different scenarios and different in-vehicle carryover, and to annotate the dataset manually, which consumes a lot of manpower and material resources.
Based on the above, the embodiment of the application also provides a training method for the classification model, so as to reduce this consumption of manpower and material resources.
Illustratively, referring to the schematic diagram of the classification model training process provided by the embodiment of the present application shown in fig. 3, the process may include the following steps:
Step S301, at least two frames of training sample images are acquired.
Step S302, aiming at each frame of training sample image, converting the training sample image into a second gray level image, and carrying out differential processing on the second gray level image and a gray level image of a second initial reference frame to obtain a second differential image; performing binarization processing on the second differential image by using a second preset threshold value to obtain a second binarized image; performing contour detection on the second binarized image to obtain a second object detection frame set; the second preset threshold value of each frame of training sample image is different or the same, the training sample image and the second initial reference frame are both in-vehicle environment images, and the acquisition time of the training sample image is later than that of the second initial reference frame.
The second initial reference frame is an in-vehicle environment image of an empty vehicle state, and the training sample image is an in-vehicle environment image acquired during running of the vehicle.
The second preset threshold value can be set according to actual needs. The second preset threshold value of each frame of training sample image can be the same, namely a multi-frame single threshold value; or may be different, i.e., multiple frames multiple thresholds.
Step S303, taking a union of the second object detection frame sets of the frames of training sample images to obtain a second detection frame union, where the second detection frame union includes a plurality of second detection frames.
Step S304, determining a third target detection frame set and a fourth target detection frame set from the second detection frame union, wherein the third target detection frame set comprises at least one third target detection frame, the fourth target detection frame set comprises at least one fourth target detection frame, the third target detection frame is a second detection frame with the number of intersecting detection frames being larger than the preset number, and the fourth target detection frame is a second detection frame with the number of intersecting detection frames being smaller than or equal to the preset number.
The third target detection frame may be a detection frame of a legacy, and the fourth target detection frame may be a detection frame of a background area. In a specific application, the intersecting detection frames of the second detection frames may be determined based on the IOU value. Specific procedures can be found above and will not be described here.
Step S305, according to the third target detection frame set and the fourth target detection frame set, a third image area and a fourth image area are cut out from each frame of training sample image, where the third image area is the image area where the third target detection frame is located in the training sample image, and the fourth image area is the image area where the fourth target detection frame is located in the training sample image.
It should be noted that the specific process of steps S301 to S305 is likewise multi-frame single-threshold or multi-frame multi-threshold processing, similar to the process of steps S201 to S205 in fig. 1 and fig. 2; reference may be made to the relevant content above, which is not repeated here.
Step S306, taking the third image area as a positive sample and the fourth image area as a negative sample, and performing model training on the pre-constructed classification model to obtain a trained classification model.
Illustratively, the third target detection frame set is S_RNMS+ and the fourth target detection frame set is S_RNMS-. According to S_RNMS+ and S_RNMS-, image areas containing carryover and image areas not containing carryover are cut out from each frame of training sample image; the image areas containing carryover are taken as positive samples for model training, and the image areas without carryover as negative samples; and the classification model is trained on the positive and negative samples to obtain the trained classification model.
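A sketch of assembling the auto-labelled training set from the cropped areas; the variable names are hypothetical, and crop_regions is the slicing helper sketched earlier:

```python
# Positive samples: areas containing carryover (from S_RNMS+);
# negative samples: areas without carryover (from S_RNMS-).
positive_samples = crop_regions(sample_frame, third_targets)
negative_samples = crop_regions(sample_frame, fourth_targets)

train_images = positive_samples + negative_samples
train_labels = [1] * len(positive_samples) + [0] * len(negative_samples)
# train_images / train_labels can then feed any standard image-classifier
# training loop.
```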
In the embodiment of the application, the carryover detection method based on multiple frames with multiple thresholds or multiple frames with a single threshold automatically marks the carryover image areas and non-carryover areas in the training sample images, and the automatically labelled positive and negative samples are used to train the classification model, so a great deal of manpower and material resources need not be spent on sample annotation.
It should be understood that the sequence numbers of the steps in the above embodiments do not mean the order of execution, and the execution order of the processes should be determined by the functions and the internal logic, and should not be construed as limiting the implementation process of the embodiments of the present application.
Corresponding to the in-vehicle carryover detection method described in the above embodiments, fig. 4 shows a block diagram of the in-vehicle carryover detection device according to the embodiment of the present application; for convenience of explanation, only the portions related to the embodiment of the present application are shown.
Referring to fig. 4, the apparatus includes:
a detection image acquisition module 41, configured to acquire at least two frames of images to be detected;
The first object detection module 42 is configured to convert, for each frame of an image to be detected, the image to be detected into a first gray scale image, and perform differential processing on the first gray scale image and a gray scale image of a first initial reference frame to obtain a first differential image; performing binarization processing on the first differential image by using a first preset threshold value to obtain a first binarized image; performing contour detection on the first binarized image to obtain a first object detection frame set; the first preset threshold value of each frame of image to be detected is different or the same, the image to be detected and the first initial reference frame are both in-vehicle environment images, and the acquisition time of the image to be detected is later than that of the first initial reference frame;
A union taking module 43, configured to take a union of a first object detection frame set of each frame of an image to be detected, to obtain a first detection frame union, where the first detection frame union includes a plurality of first detection frames;
the first determining module 44 is configured to determine a first target detection frame set from the first detection frame union set to obtain a legacy detection result, where the first target detection frame set includes at least one first target detection frame, and the first target detection frame is a first detection frame with a number of intersecting detection frames greater than a preset number.
In some possible implementations, the first determining module 44 is further configured to: when the number of the intersecting detection frames is smaller than or equal to the preset number, determining that the first detection frames are second target detection frames, and obtaining a second target detection frame set according to at least one second target detection frame;
The apparatus further comprises: the clipping module is used for clipping a first image area and a second image area from each frame of to-be-detected image according to the first target detection frame set and the second target detection frame set, wherein the first image area is an image area where the first target detection frame in the to-be-detected image is located, and the second image area is an image area where the second target detection frame in the to-be-detected image is located;
The classification filtering module is used for respectively inputting the first image area and the second image area into a pre-trained classification model to obtain a classification result output by the classification model, wherein the classification result is used for representing the probability that the first image area or the second image area contains the carryover;
and the final result determining module is used for obtaining a final carryover detection result according to the classification result.
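A minimal sketch of the cropping and classification filtering performed by these modules might look as follows; the 64x64 input size, the single-logit sigmoid output, and the probability threshold are assumptions, since the patent requires only a pre-trained classification model that scores the cropped regions.

import torch
import torchvision.transforms as T

preprocess = T.Compose([T.ToPILImage(), T.Resize((64, 64)), T.ToTensor()])

def filter_regions(image, boxes, model, prob_thresh=0.5):
    # image is an HxWx3 uint8 array; boxes are (x1, y1, x2, y2) detection
    # frames. Keeps the boxes the classifier scores as likely carryover.
    # prob_thresh is an illustrative cutoff.
    model.eval()
    kept = []
    with torch.no_grad():
        for (x1, y1, x2, y2) in boxes:
            crop = image[y1:y2, x1:x2]
            prob = torch.sigmoid(model(preprocess(crop).unsqueeze(0))).item()
            if prob >= prob_thresh:
                kept.append((x1, y1, x2, y2))
    return kept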
In some possible implementations, the apparatus further includes a classification model training module to:
Acquiring at least two frames of training sample images;
Aiming at each frame of training sample image, converting the training sample image into a second gray level image, and carrying out differential processing on the second gray level image and the gray level image of a second initial reference frame to obtain a second differential image; performing binarization processing on the second differential image by using a second preset threshold value to obtain a second binarized image; performing contour detection on the second binarized image to obtain a second object detection frame set; the second preset threshold value of each frame of training sample image is different or the same, the training sample image and the second initial reference frame are both in-vehicle environment images, and the acquisition time of the training sample image is later than that of the second initial reference frame;
Taking a union of the second object detection frame sets of each frame of training sample image to obtain a second detection frame union set, where the second detection frame union set includes a plurality of second detection frames;
Determining a third target detection frame set and a fourth target detection frame set from the second detection frame union, wherein the third target detection frame set comprises at least one third target detection frame, the fourth target detection frame set comprises at least one fourth target detection frame, the third target detection frame is a second detection frame with the number of the intersected detection frames being larger than the preset number, and the fourth target detection frame is a second detection frame with the number of the intersected detection frames being smaller than or equal to the preset number;
Cutting out a third image area and a fourth image area from each frame of training sample image according to the third target detection frame set and the fourth target detection frame set, wherein the third image area is the image area where the third target detection frame is located in the training sample image, and the fourth image area is the image area where the fourth target detection frame is located in the training sample image;
And taking the third image area as a positive sample and the fourth image area as a negative sample, and performing model training on the pre-constructed classification model to obtain the trained classification model.
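The patent fixes only how the positive and negative samples are constructed, not the model or the training procedure. A hedged sketch with an assumed small convolutional network and binary cross-entropy loss:

import torch
import torch.nn as nn

class SmallClassifier(nn.Module):
    # Illustrative binary classifier; the patent does not fix an architecture.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(), nn.Linear(32 * 16 * 16, 1))  # assumes 64x64 crops

    def forward(self, x):
        return self.net(x)

def train_classifier(model, loader, epochs=10, lr=1e-3):
    # loader yields (crop_batch, label_batch): third image areas labeled 1
    # (positive samples), fourth image areas labeled 0 (negative samples).
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x).squeeze(1), y.float())
            loss.backward()
            opt.step()
    return model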
It should be noted that, because the content of information interaction and execution process between the above devices/units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be found in the method embodiment section, and will not be described herein.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 5, the electronic apparatus 5 of this embodiment includes: at least one processor 50 (only one is shown in fig. 5), a memory 51, and a computer program 52 stored in the memory 51 and executable on the at least one processor 50, the processor 50 implementing the steps in any of the various in-vehicle carryover detection method embodiments described above when executing the computer program 52.
The electronic device 5 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The electronic device may include, but is not limited to, the processor 50 and the memory 51. It will be appreciated by those skilled in the art that fig. 5 is merely an example of the electronic device 5 and is not meant to be limiting of the electronic device 5, which may include more or fewer components than shown, combine certain components, or have different components; for example, it may also include input-output devices, network access devices, and the like.
The processor 50 may be a Central Processing Unit (CPU); the processor 50 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 51 may in some embodiments be an internal storage unit of the electronic device 5, such as a hard disk or a memory of the electronic device 5. The memory 51 may also be an external storage device of the electronic device 5 in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device 5. Further, the memory 51 may also include both an internal storage unit and an external storage device of the electronic device 5. The memory 51 is used for storing an operating system, application programs, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program. The memory 51 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
The embodiment of the application also provides electronic equipment, which comprises: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, which when executed by the processor performs the steps of any of the various method embodiments described above.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements steps for implementing the various method embodiments described above.
Embodiments of the present application provide a computer program product which, when run on an electronic device, causes the electronic device to perform steps that may be carried out in the various method embodiments described above.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present application may implement all or part of the flow of the methods of the above embodiments by instructing related hardware through a computer program; the computer program may be stored in a computer readable storage medium, and when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include at least: any entity or device capable of carrying the computer program code to a photographing device/terminal apparatus, a recording medium, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In some jurisdictions, computer readable media may not include electrical carrier signals and telecommunications signals, in accordance with legislation and patent practice.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not detailed or described in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus, electronic device and method may be implemented in other manners. For example, the apparatus/electronic device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (10)

1. A method for detecting in-vehicle carryover, the method comprising:
Acquiring at least two frames of images to be detected;
Converting the image to be detected into a first gray level image aiming at the image to be detected of each frame, and carrying out differential processing on the first gray level image and a gray level image of a first initial reference frame to obtain a first differential image; performing binarization processing on the first differential image by using a first preset threshold value to obtain a first binarized image; performing contour detection on the first binarized image to obtain a first object detection frame set; the first preset threshold value of the image to be detected of each frame is different or the same, the image to be detected and the first initial reference frame are both in-vehicle environment images, and the acquisition time of the image to be detected is later than that of the first initial reference frame;
A union set is taken for the first object detection frame set of the image to be detected of each frame, and a first detection frame union set is obtained, wherein the first detection frame union set comprises a plurality of first detection frames;
And determining a first target detection frame set from the first detection frame union set to obtain a carryover detection result, wherein the first target detection frame set comprises at least one first target detection frame, and the first target detection frames are first detection frames with the number of intersecting detection frames being greater than a preset number.
2. The method of claim 1, wherein determining a first set of target detection frames from the first union of detection frames comprises:
Determining, for each first detection frame, an intersection-over-union (IOU) value between the first detection frame and each second detection frame; if the IOU value is greater than a first threshold value, determining that the second detection frame is an intersecting detection frame of the first detection frame; and determining the number of the intersecting detection frames, wherein the second detection frames are the first detection frames in the first detection frame union set other than the current first detection frame;
For each first detection frame, when the number of the intersecting detection frames is greater than the preset number, determining that the first detection frame is the first target detection frame;
and obtaining the first target detection frame set according to at least one first target detection frame.
3. The method of claim 2, wherein determining a first set of target detection frames from the first union of detection frames, further comprises:
when the number of the intersecting detection frames is smaller than or equal to the preset number, determining that the first detection frames are second target detection frames, and obtaining a second target detection frame set according to at least one second target detection frame;
after determining the first set of target detection frames from the first set of detection frames union, the method further comprises:
Cutting out a first image area and a second image area from the images to be detected according to the first target detection frame set and the second target detection frame set, wherein the first image area is an image area where the first target detection frame is located in the images to be detected, and the second image area is an image area where the second target detection frame is located in the images to be detected;
Respectively inputting the first image area and the second image area into a pre-trained classification model to obtain a classification result output by the classification model, wherein the classification result is used for representing the probability that the first image area or the second image area contains the carryover;
And obtaining a final carryover detection result according to the classification result.
4. The method of claim 3, wherein the training process of the classification model comprises:
Acquiring at least two frames of training sample images;
Converting the training sample image into a second gray level image aiming at each frame of the training sample image, and carrying out differential processing on the second gray level image and a gray level image of a second initial reference frame to obtain a second differential image; performing binarization processing on the second differential image by using a second preset threshold value to obtain a second binarized image; performing contour detection on the second binarized image to obtain a second object detection frame set; the second preset threshold value of the training sample image of each frame is different or the same, the training sample image and the second initial reference frame are both in-vehicle environment images, and the acquisition time of the training sample image is later than that of the second initial reference frame;
Taking a union of the second object detection frame sets of the training sample images of each frame to obtain a second detection frame union, wherein the second detection frame union comprises a plurality of second detection frames;
Determining a third target detection frame set and a fourth target detection frame set from the second detection frame union, wherein the third target detection frame set comprises at least one third target detection frame, the fourth target detection frame set comprises at least one fourth target detection frame, the third target detection frame is a second detection frame with the number of intersecting detection frames being larger than the preset number, and the fourth target detection frame is a second detection frame with the number of intersecting detection frames being smaller than or equal to the preset number;
Cutting out a third image area and a fourth image area from the training sample images according to the third target detection frame set and the fourth target detection frame set, wherein the third image area is the image area where the third target detection frame is located in the training sample image, and the fourth image area is the image area where the fourth target detection frame is located in the training sample image;
and taking the third image area as a positive sample, and taking the fourth image area as a negative sample, and performing model training on a pre-constructed classification model to obtain the trained classification model.
5. The method of claim 1, wherein contour detecting the first binarized image to obtain a first set of object detection frames comprises:
performing contour detection on the first binarized image to obtain a contour point set of each object in the first binarized image;
for each contour point set, determining coordinates of a first position point and a second position point corresponding to the contour point set, and determining coordinates of a first detection frame corresponding to the contour point set according to the coordinates of the first position point and the coordinates of the second position point;
And obtaining the first object detection frame set according to the coordinates of the first detection frames corresponding to the contour point sets.
6. An in-vehicle carryover detection device, comprising:
the detection image acquisition module is used for acquiring at least two frames of images to be detected;
The first object detection module is used for converting the image to be detected into a first gray level image aiming at the image to be detected of each frame, and carrying out differential processing on the first gray level image and a gray level image of a first initial reference frame to obtain a first differential image; performing binarization processing on the first differential image by using a first preset threshold value to obtain a first binarized image; performing contour detection on the first binarized image to obtain a first object detection frame set; the first preset threshold value of the image to be detected of each frame is different or the same, the image to be detected and the first initial reference frame are both in-vehicle environment images, and the acquisition time of the image to be detected is later than that of the first initial reference frame;
The union taking module is used for taking union of the first object detection frame sets of the images to be detected of each frame to obtain a first detection frame union, and the first detection frame union comprises a plurality of first detection frames;
The first determining module is configured to determine a first target detection frame set from the first detection frame union set to obtain a carryover detection result, where the first target detection frame set includes at least one first target detection frame, and the first target detection frame is a first detection frame whose number of intersecting detection frames is greater than a preset number.
7. The apparatus of claim 6, wherein,
The first determination module is further configured to: when the number of the intersecting detection frames is smaller than or equal to the preset number, determining that the first detection frames are second target detection frames, and obtaining a second target detection frame set according to at least one second target detection frame;
the apparatus further comprises:
the clipping module is configured to clip a first image area and a second image area from the to-be-detected image according to the first target detection frame set and the second target detection frame set, where the first image area is an image area where the first target detection frame is located in the to-be-detected image, and the second image area is an image area where the second target detection frame is located in the to-be-detected image;
The classification filtering module is used for respectively inputting the first image area and the second image area into a pre-trained classification model to obtain a classification result output by the classification model, wherein the classification result is used for representing the probability that the first image area or the second image area contains the carryover;
and the final result determining module is used for obtaining a final carryover detection result according to the classification result.
8. The apparatus of claim 7, further comprising a classification model training module to:
Acquiring at least two frames of training sample images;
Converting the training sample image into a second gray level image aiming at each frame of the training sample image, and carrying out differential processing on the second gray level image and a gray level image of a second initial reference frame to obtain a second differential image; performing binarization processing on the second differential image by using a second preset threshold value to obtain a second binarized image; performing contour detection on the second binarized image to obtain a second object detection frame set; the second preset threshold value of the training sample image of each frame is different or the same, the training sample image and the second initial reference frame are both in-vehicle environment images, and the acquisition time of the training sample image is later than that of the second initial reference frame;
Taking a union of the second object detection frame sets of the training sample images of each frame to obtain a second detection frame union, wherein the second detection frame union comprises a plurality of second detection frames;
Determining a third target detection frame set and a fourth target detection frame set from the second detection frame union, wherein the third target detection frame set comprises at least one third target detection frame, the fourth target detection frame set comprises at least one fourth target detection frame, the third target detection frame is a second detection frame with the number of intersecting detection frames being larger than the preset number, and the fourth target detection frame is a second detection frame with the number of intersecting detection frames being smaller than or equal to the preset number;
Cutting out a third image area and a fourth image area from the training sample images according to the third target detection frame set and the fourth target detection frame set, wherein the third image area is the image area where the third target detection frame is located in the training sample image, and the fourth image area is the image area where the fourth target detection frame is located in the training sample image;
and taking the third image area as a positive sample, and taking the fourth image area as a negative sample, and performing model training on a pre-constructed classification model to obtain the trained classification model.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 5 when executing the computer program.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method according to any one of claims 1 to 5.
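As an aside on claim 5, deriving the detection frame coordinates from a contour point set reduces to taking coordinate extremes, assuming (the claim does not say this) that the first and second position points are the top-left and bottom-right extremes of the contour:

def contour_to_box(points):
    # points: iterable of (x, y) contour coordinates. Identifying the
    # first/second position points with the min/max extremes is an
    # interpretation, not stated in the claim.
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    first_point = (min(xs), min(ys))      # first position point
    second_point = (max(xs), max(ys))     # second position point
    return (*first_point, *second_point)  # detection frame (x1, y1, x2, y2)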
CN202410169073.7A 2024-02-04 2024-02-04 Method and device for detecting in-vehicle carryover, electronic equipment and readable storage medium Pending CN118135545A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410169073.7A CN118135545A (en) 2024-02-04 2024-02-04 Method and device for detecting in-vehicle carryover, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410169073.7A CN118135545A (en) 2024-02-04 2024-02-04 Method and device for detecting in-vehicle carryover, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN118135545A true CN118135545A (en) 2024-06-04

Family

ID=91231823

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410169073.7A Pending CN118135545A (en) 2024-02-04 2024-02-04 Method and device for detecting in-vehicle carryover, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN118135545A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination