CN114926503A - Object detection method, device, equipment and medium - Google Patents
- Publication number
- CN114926503A (application number CN202210601744.3A)
- Authority
- CN
- China
- Prior art keywords
- image
- target object
- target
- comparison
- position area
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/248—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/40—Filling a planar surface by adding surface attributes, e.g. colour or texture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/254—Analysis of motion involving subtraction of images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
- G06T7/74—Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20224—Image subtraction
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The application provides an object detection method, device, equipment and medium. The method comprises the following steps: comparing a first image to be detected with a preset background model to obtain a comparison map; determining the position area and position parameters of each target object in the comparison map; determining an effective pixel value for each position area in the first image; filling the position area of the target object in the comparison map with the effective pixel value and expanding the outline of the filled position area until all position areas are filled and covered, obtaining target data with an expanded outline; replacing the position area of the target object in the first image with the target data to obtain an optimized second image; and down-sampling the second image to obtain a feature map, performing detection on the feature map, and determining the detection result of the target object. Because the outline of the target object and the area its pixels occupy are both enlarged, the detection rate of the target object is greatly increased, and the accuracy of detection is also improved.
Description
Technical Field
The present application relates to the field of video monitoring or image processing, and in particular, to an object detection method, apparatus, device, and medium.
Background
Object detection is a foundation of computer vision technology and can detect various target objects, such as human figures, animals, or things, included in an image. In practical applications, target detection can be applied to many scenes; it generally locates a target object in an image and assigns a corresponding label to it.
However, when target detection is performed on an image, the acquired input image is down-sampled, typically reducing its resolution to 2 MP (megapixels) or even lower. A tiny object, or a tiny moving object, becomes even smaller after this reduction, so it cannot be detected accurately.
Summary of the Application
In view of the above-mentioned shortcomings of the prior art, the present application provides an object detection method, device, apparatus and medium to solve the above-mentioned technical problems.
The application provides an object detection method, which comprises the following steps:
comparing a first image to be detected with a preset background model to obtain a comparison graph, wherein the first image comprises a target object to be detected;
determining a position area of the target object and a position parameter of the target object in the comparison graph;
determining effective pixel values of the position areas in the first image according to the position areas of the target object;
filling a position area of the target object in the comparison graph by using the effective pixel value, and expanding the outline of the filled position area until all the position areas are filled and covered to obtain the target data with the expanded outline;
replacing the position area of the target object in the first image according to the target data to obtain an optimized second image;
and performing down-sampling processing on the second image to obtain a characteristic diagram, detecting according to the characteristic diagram, and determining the detection result of the target object.
In a possible embodiment, comparing the first image to be detected with a preset background model to obtain a comparison map includes:
acquiring a first image of continuous multiple frames, and converting the first image into a gray scale image;
comparing the gray level image with a preset background model, and determining the gray level comparison image according to a comparison difference value;
and performing thresholding (binarization) preprocessing on the target object in each frame of the gray comparison image using a preset gray threshold to obtain the comparison map.
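The grayscale comparison step above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the patent's actual implementation; the function name `comparison_map` and the default threshold of 30 are assumptions:

```python
import numpy as np

def comparison_map(gray_frame, background, gray_threshold=30):
    # Absolute gray-level difference between the frame and the
    # background model, binarized with the preset gray threshold.
    diff = np.abs(gray_frame.astype(np.int16) - background.astype(np.int16))
    return np.where(diff > gray_threshold, 255, 0).astype(np.uint8)

# Toy 4x4 grayscale frame: one bright moving pixel on a dark background.
bg = np.zeros((4, 4), dtype=np.uint8)
frame = bg.copy()
frame[1, 2] = 200
cmp_map = comparison_map(frame, bg)
```

The cast to a signed type before subtracting avoids the unsigned-integer wraparound that would otherwise corrupt the difference.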
In a possible implementation manner, before the image comparing the first image to be detected with the preset background model to obtain the comparison map, the method further includes:
quantizing the background with a Gaussian probability density function, fitting each pixel point with a plurality of Gaussian distributions, and constructing a background model for the background scene in use, the background model being used in image comparison to determine the target object; or, alternatively,
performing a difference operation between the current frame image and the background image by an inter-frame difference algorithm, which exploits the correlation between two adjacent frames and takes the previous frame image as the current background image, to obtain a background model for detecting the target object.
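The second alternative (inter-frame difference) can be sketched as follows. This is a minimal illustration, assuming a threshold of 25; the Gaussian-mixture alternative would typically be built with a dedicated mixture-of-Gaussians background subtractor instead:

```python
import numpy as np

def frame_difference(prev_frame, cur_frame, threshold=25):
    # Inter-frame difference: the previous frame serves as the current
    # background image; pixels that changed between the two adjacent
    # frames mark the moving target object.
    diff = np.abs(cur_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return (diff > threshold).astype(np.uint8)

prev = np.full((3, 3), 10, dtype=np.uint8)
cur = prev.copy()
cur[0, 0] = 100  # one pixel changed by the moving object
motion_mask = frame_difference(prev, cur)
```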
In a possible embodiment, before determining the position area of the target object and the position parameter of the target object in the comparison map, the method further includes:
and denoising the comparison map by filtering out noise objects that are smaller than a preset minimum detection size, to obtain the denoised comparison map.
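One way to realize this denoising step is to drop connected components below the minimum detection size. The sketch below uses a plain flood fill for clarity; it is an illustrative implementation, not the patent's, and the name `remove_small_objects` is an assumption:

```python
import numpy as np
from collections import deque

def remove_small_objects(mask, min_size):
    # Drop 4-connected components smaller than the preset minimum
    # detection size, treating them as noise objects.
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    out = np.zeros_like(mask)
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not seen[sy, sx]:
                # Flood-fill one connected component.
                comp, queue = [], deque([(sy, sx)])
                seen[sy, sx] = True
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
                if len(comp) >= min_size:
                    for y, x in comp:
                        out[y, x] = 1
    return out

mask = np.zeros((5, 5), dtype=np.uint8)
mask[0, 0] = 1       # isolated single-pixel speck: treated as noise
mask[2:4, 2:4] = 1   # 2x2 blob: kept as a real target
clean = remove_small_objects(mask, min_size=2)
```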
In a possible implementation manner, the position area includes position information of an outer frame area formed around the target object, the position parameters include a center coordinate, a length and a width of the position area, and the target object is in one-to-one correspondence with the position area of the comparison map in the first image.
In a possible implementation manner, the filling a position area in the comparison map where the target object is located by using the effective pixel value, and expanding a contour of the filled position area until all the position areas are filled and covered by the filling, to obtain target data with an expanded contour, includes:
filling position areas in the comparison graph according to the effective pixel values, wherein the effective pixel values are determined according to the pixel mean value of each position area of the first image;
after the filling is finished, the outline of the filled position area is expanded according to a preset amplification ratio, and the expanded position area is continuously filled with the effective pixel values;
and obtaining target data with enlarged outline in the comparison graph until all position areas are filled and covered.
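The fill-and-expand loop described above might look like the following sketch. The function name, the amplification ratio of 1.5, and the fixed number of expansion steps are all assumptions for illustration; the patent expands until all position areas are covered rather than for a fixed count:

```python
import numpy as np

def fill_and_expand(shape, box, fill_value, scale=1.5, steps=3):
    # box = (y, x, height, width) of the target's position area in the
    # comparison map. The box is filled with the effective pixel value,
    # then enlarged about its centre by the amplification ratio `scale`
    # and refilled, a fixed number of times here for illustration.
    img_h, img_w = shape
    cy, cx = box[0] + box[2] / 2, box[1] + box[3] / 2
    bh, bw = float(box[2]), float(box[3])
    canvas = np.zeros(shape, dtype=np.uint8)
    for _ in range(steps):
        y0, y1 = max(0, int(cy - bh / 2)), min(img_h, int(cy + bh / 2))
        x0, x1 = max(0, int(cx - bw / 2)), min(img_w, int(cx + bw / 2))
        canvas[y0:y1, x0:x1] = fill_value
        bh *= scale
        bw *= scale
    return canvas

# A 2x2 target at (3, 3) in an 8x8 map, filled with effective value 128.
target_data = fill_and_expand((8, 8), box=(3, 3, 2, 2), fill_value=128)
```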
In a possible implementation manner, the replacing the position area where the target object is located in the first image according to the target data to obtain an optimized second image includes:
determining target data of the target object in the enlarged outline of the comparison graph according to the position area of the target object in the first image; and replacing the position areas in the first image by the target data corresponding to the target object, and replacing the position areas corresponding to all the target objects in the first image one by one to form an optimized second image.
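The replacement step can be sketched as a simple region copy, assuming `box` is `(y, x, height, width)` and the target data has the same dimensions as the first image. This is an illustrative sketch, not the patent's implementation:

```python
import numpy as np

def replace_region(first_image, box, target_data):
    # Write the contour-enlarged target data back over the target's
    # position area in the first image, leaving the original untouched.
    y, x, h, w = box
    second_image = first_image.copy()
    second_image[y:y + h, x:x + w] = target_data[y:y + h, x:x + w]
    return second_image

first = np.zeros((6, 6), dtype=np.uint8)
patch = np.full((6, 6), 128, dtype=np.uint8)  # enlarged-outline target data
second = replace_region(first, (1, 1, 4, 4), patch)
```

Repeating this for every target object's position area yields the optimized second image.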
In a possible implementation manner, the replacing, one by one, the position areas corresponding to all the target objects in the first image to form the optimized second image includes:
if, after the target data are substituted, the position areas of any two target objects in the second image cross or overlap, comparing how much free space surrounds the two overlapping target objects, selecting the one with more surrounding free space as the target to be moved, moving it until the two target objects no longer overlap, and recording the position information of the moved target object before the move.
The present application further provides an object detection device, the device comprising:
the image comparison module is used for carrying out image comparison on a first image to be detected and a preset background model to obtain a comparison image, wherein the first image comprises a target object to be detected;
the position determining module is used for determining a position area of the target object and a position parameter of the target object in the comparison map;
the pixel value determining module is used for determining effective pixel values of all the position areas in the first image according to the position areas of the target object;
the optimization processing module is used for filling a position area of the target object in the comparison graph by using the effective pixel value, expanding the outline of the filled position area until all the position areas are filled and covered, and obtaining the target data with the expanded outline;
the target replacement module is used for replacing the position area of the target object in the first image according to the target data to obtain an optimized second image;
and the object detection module is used for performing down-sampling processing on the second image to obtain a characteristic diagram, detecting according to the characteristic diagram and determining the detection result of the target object.
The application also provides an electronic device comprising a processor, a memory and a communication bus;
the communication bus is used for connecting the processor and the memory;
the processor is configured to execute the computer program stored in the memory to implement the method according to any one of the embodiments described above.
The present application also provides a computer-readable storage medium having stored thereon a computer program,
the computer program is for causing a computer to perform a method as in any one of the embodiments described above.
Beneficial effects of this application: the method compares a first image with a preset background model to obtain a comparison map; obtains the position area, position parameters and effective pixel value corresponding to each target object; fills the position area formed by the target object in the comparison map and expands its outline to obtain target data with an enlarged outline; and replaces the position area where the target object is located in the first image with the target data. On the one hand, an optimized second image of the target object is obtained, target detection is then performed on the second image, and the detection result of the target object is determined, so the detection rate of the target object is greatly increased; on the other hand, the accuracy of intelligent detection is also improved.
Drawings
Fig. 1 is a schematic diagram of an application environment of an implementation of an object detection method provided in an embodiment of the present application;
FIG. 2 is a flow chart of an object detection method provided in an embodiment of the present application;
FIG. 3 is a flowchart illustrating a method for detecting an object according to an embodiment of the present disclosure;
FIG. 4 is a flowchart of a target object optimization process provided in an embodiment of the present application;
FIG. 5 is a block diagram of an object detection device provided in an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 7 is a comparison diagram between an image to be detected and a background model according to an embodiment of the present application;
FIG. 8 is a diagram illustrating a predetermined minimum inspection size comparison according to an embodiment of the present application;
fig. 9 is a comparison diagram of noise reduction processing according to an embodiment of the present application;
FIG. 10 is a comparison chart of location area determination provided by an embodiment of the present application;
FIG. 11 is a schematic illustration of a target object fill provided by an embodiment of the present application;
fig. 12 is an enlarged schematic view of a contour of a target object according to an embodiment of the present application.
Detailed Description
The following embodiments of the present application are described by specific examples, and other advantages and effects of the present application will be readily apparent to those skilled in the art from the disclosure of the present application. The present application is capable of other and different embodiments and its several details are capable of modifications and/or changes in various respects, all without departing from the spirit of the present application. It should be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments only illustrate the basic idea of the present application. The drawings show only the components related to the present application and are not drawn according to the number, shape and size of the components in an actual implementation; in practice the type, number and proportion of the components may vary freely, and their layout may be more complicated.
In the following description, numerous details are set forth to provide a more thorough explanation of the embodiments of the present application, however, it will be apparent to one skilled in the art that the embodiments of the present application may be practiced without these specific details, and in other embodiments, well-known structures and devices are shown in block diagram form rather than in detail in order to avoid obscuring the embodiments of the present application.
To make the objects, technical solutions and advantages of the present application more clear, the following detailed description of the embodiments of the present application will be made with reference to the accompanying drawings.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive subject, and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence base technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, blockchains, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Computer Vision (CV) technology is a science that studies how to make machines "see": cameras and computers are used in place of human eyes to identify, track, and measure targets, and further image processing is performed so that the result becomes an image more suitable for human observation or can be transmitted to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technologies, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, and also include common biometric technologies such as face recognition and fingerprint recognition.
Machine Learning (ML) is a multi-domain cross discipline, and relates to a plurality of disciplines such as probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and the like. The method specially studies how a computer simulates or realizes the learning behavior of human beings so as to acquire new knowledge or skills and reorganize the existing knowledge structure to continuously improve the performance of the computer. Machine learning is the core of artificial intelligence, is the fundamental approach to make computers have intelligence, and is applied in various fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and formal education learning.
The application provides an object detection method, and relates to the technical fields of artificial intelligence, machine learning and the like. For example, the target detection of the image to be detected based on the model obtained by training can be realized by using the technologies of machine simulation, cloud computing and the like in the artificial intelligence technology, for example, the image to be detected is divided into a plurality of slices by using the target model; extracting pixel point characteristics and context characteristics of each slice by using a characteristic extraction layer of the target model; and detecting small targets in the high-resolution image by using the feature map of the obtained slice. Of course, the above machine learning technique may also be used to perform reinforcement learning on the initial model by using the sample set, so as to obtain a more robust target model.
In the related art, motion detection of pixel blocks as small as 8 × 8 can be achieved at resolutions up to 4K, and of blocks as small as 4 × 4 at 4MP resolution. This improves the detection of small moving objects in monitoring scenes and supports specific user services, such as high-altitude falling-object alarms: small thrown objects can be detected, the service detection rate is higher, and alarms are more accurate with fewer misses.
At 4MP resolution, existing moving-object detection algorithms require a moving object to be at least 8 × 8 pixels, and at 4K resolution at least 16 × 16 pixels. This is because existing detection algorithms down-sample the input image, generally to about 2MP resolution, and on a weaker hardware platform the reduced resolution is even lower. In the down-sampled image a small moving object becomes smaller still, so the algorithm cannot identify and detect it accurately; this is a significant problem faced by current algorithms. In conventional schemes, the algorithm is continuously optimized, background modeling or field-environment modeling is refined, and moving objects are repeatedly sampled to improve the detection rate, but for very small objects these measures bring little improvement.
Fig. 1 is a schematic diagram of an application environment of an object detection method according to an embodiment of the present application. As shown in fig. 1, the implementation environment network architecture may include a server 01 (or server cluster) and a cluster of user terminals. The user terminal cluster may comprise one or more user terminals; the number of user terminals is not limited here. As shown in fig. 1, the system may specifically include user terminals 100a, 100b, 100c, ..., and 100n, each of which may be connected to the server 01 via a network, so that each user terminal can interact with the server 01 over the network. The specific manner of the network connection is not limited here; for example, the connection may be made directly or indirectly by wired communication, or directly or indirectly by wireless communication.
Wherein, each user terminal in the user terminal cluster may include: the intelligent terminal comprises an intelligent terminal with an image data processing function, such as a smart phone, a tablet personal computer, a notebook computer, a desktop computer, an intelligent sound box, an intelligent watch, a vehicle-mounted terminal and an intelligent television. It should be understood that each user terminal in the user terminal cluster shown in fig. 1 may be installed with a target application (i.e., an application client), and when the application client runs in each user terminal, data interaction may be performed with the server 01 shown in fig. 1. The application client may include a social client, a multimedia client (e.g., a video client), an entertainment client (e.g., a game client), an education client, a live client, and the like. The application client may be an independent client, or may be an applet integrated in a client (for example, a social client, an education client, a multimedia client, and the like), which is not limited herein.
As shown in fig. 1, the server 01 in the embodiment of the present application may be a server corresponding to the application client. The server 01 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing services.
For convenience of understanding, in the embodiment of the present application, one user terminal may be selected as a target user terminal from the plurality of user terminals shown in fig. 1. For example, the user terminal 100a shown in fig. 1 may be used as a target user terminal in the embodiment of the present application, and a target application (i.e., an application client) may be integrated in the target user terminal. At this time, the target user terminal may implement data interaction with the server 01 through the service data platform corresponding to the application client. The object detection method can be performed in any equipment such as a server, a terminal, a server cluster or a cloud computing service cluster. For example, the server or the terminal may have a function of detecting the target object, for example, the server performs object detection on the acquired image to be detected based on the image to be detected and the target model.
Referring to fig. 2, a schematic flow chart of an object detection method provided in an embodiment of the present application is detailed as follows:
step S101, comparing a first image to be detected with a preset background model to obtain a comparison graph, wherein the first image comprises a target object to be detected; the target object is in particular a moving, tiny object.
Specifically, the detection of the target object may refer to the detection of a defect having a small size or a relatively small size, and may also refer to a minute object or a fine object. In one possible example, the detection of a tiny object may refer to detection within a range of sizes less than a certain threshold. For example, detection is performed with a defect having a size smaller than 32 × 32 as a target. In another possible example, the detection of the tiny objects may also refer to the detection of defects within a range of relative sizes smaller than a certain threshold; for example, the detection is performed with the aim of defects whose width and height are respectively lower than one tenth of the width and height of the image to be detected.
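The two size criteria named above (absolute size below 32 × 32, or relative size below one tenth of the image) can be combined in a small helper. This is an illustrative sketch; the function name and default limits merely restate the examples in the text:

```python
def is_tiny(obj_w, obj_h, img_w, img_h, abs_limit=32, rel_fraction=0.1):
    # Absolute criterion: both sides smaller than a fixed threshold.
    absolute = obj_w < abs_limit and obj_h < abs_limit
    # Relative criterion: both sides below a tenth of the image size.
    relative = obj_w < img_w * rel_fraction and obj_h < img_h * rel_fraction
    return absolute or relative

tiny = is_tiny(10, 10, 1920, 1080)     # small absolute size
large = is_tiny(300, 300, 1920, 1080)  # exceeds both criteria
```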
It should be noted that the image to be detected may be an image with a higher resolution; that is, the image to be detected may be an image whose resolution exceeds the target resolution threshold and whose size does not exceed the target size threshold. For example, the original image of the target object with high resolution and small size acquired by a high-definition camera can be obtained; different from the common target detection method based on the neural network, the target detection method has a better detection effect on the images with high resolution and small size.
In one possible example, the target object may be a 3C (Computer, Communication, consumer electronics) or an accessory of a 3C product, such as a Computer, tablet, cell phone or digital audio player, cell phone camera holder, etc. For example, in the embodiment of the present application, the image data of 3C fine or minute product accessories collected by the camera may be used.
In another possible example, the target object may be a moving target object, for example, a small or delicate and in motion target object, which may be a slipper, cup, cigarette butt, small stone, etc. thrown from high altitude.
It should be further noted that the detailed steps of step S101 specifically include:
acquiring a first image of continuous multiple frames, and converting the first image into a gray scale image;
specifically, the first image under the same background is converted into a gray-scale image from the first frame to the nth frame, and the binary gray-scale image of the tiny object in the motion state can be accurately acquired by acquiring the first images of the continuous frames.
Comparing the gray scale image with a preset background model, and determining the gray scale comparison image according to a comparison difference value;
specifically, the gray comparison is carried out on the two images, and a gray comparison image of the moving target object is determined according to the gray comparison difference.
And performing thresholding (binarization) preprocessing on the target object in each frame of the gray comparison image using a preset gray threshold to obtain the comparison map.
Through the prejudgment and comparison, a comparison graph of the tiny objects in the motion state can be better acquired.
Step S102, determining a position area of the target object and a position parameter of the target object in the comparison map;
specifically, the target object in the graph is labeled by comparison, so that the position area labeled by the target object and the position parameter of the target object are obtained.
The position area comprises position information of an outer frame area formed by surrounding the target object, the position parameters comprise a center coordinate, a length and a width of the position area, and the target object is in one-to-one correspondence with the position area of the first image and the comparison image, namely, the position information and the position parameters of the target object can be quickly acquired, and the first image or the comparison image can be quickly positioned and searched conveniently.
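For a single labeled target, the position parameters (centre coordinate, length and width of the outer bounding frame) can be extracted from its mask as sketched below. This is an illustration under the assumption that the mask contains exactly one target; the function name is hypothetical:

```python
import numpy as np

def position_parameters(mask):
    # Position area and parameters of one labeled target: the outer
    # bounding frame plus its centre coordinate, length and width.
    ys, xs = np.nonzero(mask)
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    length, width = int(y1 - y0), int(x1 - x0)
    center = (y0 + length / 2, x0 + width / 2)
    return {"center": center, "length": length, "width": width,
            "box": (int(y0), int(x0), length, width)}

mask = np.zeros((6, 6), dtype=np.uint8)
mask[1:3, 1:4] = 1   # one 2x3 target in the comparison map
params = position_parameters(mask)
```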
In practical applications, many detection targets are inclined; for example, if a detection target inclined at 45° is labeled with an axis-aligned rectangular frame, the labeled area occupies far more space than the target itself, reducing the accuracy and efficiency of target detection.
In a possible example, the detection target can be further labeled by using a variable-angle rectangular frame, and the detection target can be adjusted according to the actual condition of the actual detection target and the actual application scene, so that the region obtained by labeling is fully attached to the detection target, the area occupied by the labeling region is reduced, and the accuracy and the efficiency of target detection are improved.
Step S103, determining effective pixel values of all the position areas in the first image according to the position areas of the target object;
specifically, the target objects are in one-to-one correspondence with the position areas of the comparison graph in the first image, and the effective pixel values are obtained by determining the pixel points of each position area in the first image by using the average value of the pixel points of each position area, or the average value of the pixel points of the position area corresponding to a certain target object, that is, the average value of the pixel points of the corresponding position area is determined according to the position information of each target object in the first image, so that the effective pixel values of the current target object can be accurately reflected, and pixel distortion is not caused.
Step S104, filling the position area of the target object in the comparison map with the effective pixel value, and expanding the contour of the filled position area until all the position areas are filled and covered, to obtain the contour-expanded target data;
Specifically, the position area of the target object in the comparison map is filled with the effective pixel value obtained from the first image, the contour of the filled position area is expanded, and filling continues in the expanded position area until the filled region covers the entire position area. This yields the contour-expanded target data, i.e., the position area formed by expanding the contour of the target object.
In one possible example, feature extraction processing may also be performed on the position region where the target object is located: the obtained feature maps are pooled with several kernels of different sizes, the pooled feature maps are concatenated along the channel dimension, the result is then down-sampled, and the fused multi-scale feature map is output for detection and recognition.
S105, replacing a position area where a target object is located in the first image according to the target data to obtain an optimized second image;
Specifically, completely replacing the position area with the target data enlarges the contour of the target object and increases the pixel area it occupies, which makes subsequent detection more accurate and easier.
And S106, performing down-sampling processing on the second image to obtain a characteristic diagram, detecting according to the characteristic diagram, and determining the detection result of the target object.
Down-sampling is the process of reducing the sampling rate of a signal, typically used to reduce the data transmission rate or the data size. The second image is down-sampled to obtain a feature map, and the feature map is processed with a target detection algorithm to determine the detection result of the target object.
Specifically, under normal conditions the resolution of the image fed into the target detection algorithm is kept within 2 MP; if hardware performance is insufficient, the resolution supplied to the algorithm must be even lower.
In the current video monitoring field, camera resolutions keep growing: 4 MP has become mainstream and the share of 4K cameras keeps rising. The higher the resolution, the higher the demands on chip performance. Constrained by the hardware platform and hardware cost, the resolution of the image fed to the algorithm cannot be too high within these limited resources, or the running speed and performance indicators of the detection algorithm suffer.
The process of converting an image from its original resolution to the resolution the algorithm requires as input is called down-sampling. Converting 4 MP to 2 MP halves the pixel count, and the pixel area of the corresponding target object is halved with it; converting 4K to 2 MP reduces the pixel area of the target object by a factor of four. Once the pixel area of a target object shrinks in the down-sampled image, the detection algorithm may fail to detect an object that should be of concern, which seriously degrades the whole intelligent service and prevents the expected monitoring effect from being achieved.
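The down-sampling step can be illustrated with a minimal block-averaging sketch. Real systems would typically use a hardware scaler or a library resize with interpolation; the factor, array sizes, and function name below are illustrative assumptions only.

```python
import numpy as np

def downsample(image, factor):
    """Reduce resolution by averaging `factor` x `factor` pixel blocks
    (a simple stand-in for the down-sampling step described above)."""
    h, w = image.shape[:2]
    h, w = h - h % factor, w - w % factor        # crop to a multiple of factor
    img = image[:h, :w]
    return img.reshape(h // factor, factor,
                       w // factor, factor, -1).mean(axis=(1, 3))

img = np.arange(16 * 16 * 3, dtype=np.float64).reshape(16, 16, 3)
small = downsample(img, 2)    # 16x16 -> 8x8: pixel count falls by a factor of 4
```

Note that a linear factor of 2 quarters the pixel count, which is why a small object occupying only a few pixels before down-sampling can vanish below the detector's minimum size afterwards.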
This embodiment enables the detection of tiny moving objects, improving both the detection rate and the accuracy for moving targets. The effect is most pronounced in high-altitude falling-object scenes: besides ordinary scene monitoring, it can effectively detect all kinds of objects thrown from buildings, such as slippers and cups, and even cigarette butts and small stones.
In one possible example, an SSD (Single Shot MultiBox Detector) model is used as the target detection network, with a ResNet-101 (residual network) replacing the original VGG (Visual Geometry Group) backbone. With residual learning, a residual block is built from a 1×1 convolutional layer for dimension reduction, a 3×3 convolutional layer for feature extraction, and a 1×1 convolutional layer for dimension restoration; the input and output features of the residual block are added and serve as the input of the next residual block. This maintains accuracy while reducing computation and extracts richer semantic information. To extract and classify the features of the target object against a complex background, a combination of local and global attention may be used: a Gram matrix, obtained by multiplying the feature map by its transpose, yields the local features; global pooling of the extracted features yields the global features while preserving the original spatial and semantic information; the local features are then multiplied by the global features to produce the final features of the feature extraction unit. This makes the network focus on the detection target and ignore background information that interferes with feature extraction.
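The local/global attention combination described above can be sketched shape-wise in numpy. This is only an illustrative sketch under stated assumptions: the feature layout `(C, N)`, the normalization by `N`, and the element-wise fusion are choices made here for clarity, and a real network's exact formulation may differ.

```python
import numpy as np

def attention_features(feat):
    """Combine local (Gram-matrix) and global (pooled) attention.

    `feat` has shape (C, N): C channels over N = H*W spatial positions.
    """
    C, N = feat.shape
    gram = feat @ feat.T / N                      # (C, C) channel correlations
    local = gram @ feat                           # re-weight channels by correlation
    global_ = feat.mean(axis=1, keepdims=True)    # (C, 1) global pooling
    return local * global_                        # fuse local and global attention

feat = np.random.default_rng(0).normal(size=(8, 25))
out = attention_features(feat)                    # same (C, N) shape as the input
```

The output keeps the spatial layout of the input, so it can feed the next feature extraction unit directly.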
To address the SSD model's weak detection of small targets, the feature map of the layer responsible for small targets is enlarged by deconvolution and fused with the feature map produced by the feature extraction unit. The feature information of images at different scales, extracted by different feature extraction units, is thus used effectively, avoiding the loss of useful information, improving the stability of the network to some extent, and improving the detection accuracy for small objects such as dry batteries, melon seed husks, and cigarette ends.
Please refer to fig. 3, which is a flowchart illustrating an object detection method according to an embodiment of the present application, in detail as follows:
in a possible implementation manner, before comparing the first image to be detected with the preset background model to obtain the comparison map, the method further includes:
Step S100, quantizing the background with a Gaussian probability density function, fitting each pixel point with a plurality of Gaussian distributions, and constructing a background model for the background scene in use, the background model being used in the image comparison that determines the target object; or, using an inter-frame difference algorithm that exploits the correlation between two adjacent frames: the previous frame serves as the current background image, and the difference between the current frame and the background image yields a background model for detecting the target object.
In this embodiment, a background image of a scene may be modeled by a single gaussian, an inter-frame difference, or other conventional background modeling methods, and the modeled background model may be used for image comparison, so as to conveniently and quickly obtain a target moving object in an image.
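The inter-frame difference branch above is simple enough to sketch directly; the Gaussian-mixture branch would in practice use a library routine such as OpenCV's background subtractors. The threshold value, function name, and toy frames below are illustrative assumptions.

```python
import numpy as np

def frame_difference(prev_frame, cur_frame, threshold=25):
    """Inter-frame difference: the previous frame serves as the current
    background image, and pixels whose absolute difference exceeds the
    threshold are marked as moving foreground."""
    diff = np.abs(cur_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return (diff > threshold).astype(np.uint8)   # 1 = moving pixel

prev = np.zeros((6, 6), dtype=np.uint8)
cur = prev.copy()
cur[2:4, 2:4] = 200                              # a small object appears
mask = frame_difference(prev, cur)               # comparison map of moving pixels
```

The widening to int16 before subtracting avoids uint8 wrap-around, which would otherwise mark darkening pixels incorrectly.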
In a possible embodiment, before determining the position area of the target object and the position parameter of the target object in the comparison map, the method further includes:
and step S10, carrying out noise reduction processing on the comparison map, and filtering noise objects in the comparison map which are lower than a preset minimum detection size to obtain the comparison map after noise reduction.
Specifically, the preset minimum detection size may be set according to a preset rule, for example manually, mainly based on the capture resolution of the current camera device. Filtering out noise objects below the preset minimum detection size reduces the influence of noise, removes interference data from the comparison map, and improves the detection accuracy for target objects in the comparison map.
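The size filter can be sketched as a simple predicate over candidate regions. The `(cx, cy, w, h)` box tuples and the function name are assumptions carried over from the earlier sketches, not the disclosed implementation.

```python
def filter_noise_boxes(boxes, min_w, min_h):
    """Drop candidate regions smaller than the preset minimum detection
    size; `boxes` are (cx, cy, w, h) tuples."""
    return [b for b in boxes if b[2] >= min_w and b[3] >= min_h]

boxes = [(15, 15, 10, 12),   # a real moving object
         (40, 40, 2, 3),     # a noise speck below the minimum size
         (60, 20, 8, 8)]
kept = filter_noise_boxes(boxes, min_w=4, min_h=4)
```

Only the two regions at or above the minimum size survive; the 2x3 speck is removed, which is exactly the traversal-and-discard behaviour described for the denoising step.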
Referring to fig. 4, a flowchart of the target object optimization processing provided in an embodiment of the present application: the position area in the comparison map where the target object is located is filled with the effective pixel value, and the contour of the filled position area is expanded until all position areas are filled and covered, yielding the contour-expanded target data. The details are as follows:
step S301, filling position areas in the comparison graph according to the effective pixel values, wherein the effective pixel values are determined according to the pixel mean value of each position area of the first image;
specifically, in a position area of the target object in the comparison map (i.e., a position area formed by a rectangular frame surrounding the target object), the position area formed in the rectangular frame is filled with effective pixel values, and the uncovered background in the position area is filled and covered according to the effective pixel values.
Step S302, after the filling, enlarging the contour of the filled position area according to a preset enlargement ratio, and continuing to fill the enlarged position area with the effective pixel value;
Specifically, after a position area is filled, its contour is enlarged according to a preset enlargement ratio; for example, the length and width of the rectangular frame are each adaptively increased by half the length and half the width of the minimum detection size. This enlarges the contour of the position area, and the enlarged position area is then filled with the effective pixel value.
Step S303, obtaining the target data with the expanded outline in the comparison graph until all the position areas are filled and covered.
In the embodiment, the outline and the pixel occupied area of the target object are reasonably and effectively enlarged through the method, and the target object in the comparison graph is optimized.
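Steps S301 to S303 can be sketched together as a single fill-and-expand operation. The half-of-minimum-size expansion rule follows the text; the function name, box layout, and clamping at the image border are illustrative assumptions.

```python
import numpy as np

def fill_and_expand(canvas, box, value, min_w, min_h):
    """Fill a position area with the effective pixel value, then expand
    its contour by half the minimum detection size in each dimension and
    fill the expanded area as well (steps S301 and S302 above)."""
    cx, cy, w, h = box
    w2, h2 = w + min_w // 2, h + min_h // 2      # expanded contour
    x0 = max(int(cx - w2 // 2), 0)               # clamp to the image border
    y0 = max(int(cy - h2 // 2), 0)
    canvas[y0:y0 + h2, x0:x0 + w2] = value
    return (cx, cy, w2, h2)                       # the contour-expanded box

canvas = np.zeros((40, 40, 3), dtype=np.uint8)
new_box = fill_and_expand(canvas, (20, 20, 10, 10), (120, 60, 30), 8, 8)
```

A 10x10 area with an 8x8 minimum size becomes 14x14; the object's pixel footprint grows by roughly a factor of two, which is the optimization the embodiment relies on.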
In a possible implementation manner, the replacing a position area where a target object is located in the first image according to the target data to obtain an optimized second image includes:
determining target data of the target object in the enlarged outline of the comparison graph according to the position area of the target object in the first image; and replacing the position areas corresponding to all the target objects in the first image one by replacing the position areas in the first image with the target data corresponding to the target objects to form an optimized second image.
Specifically, the position areas of the target objects in the first image (i.e., the original image) are replaced one by one according to the position information of the target objects in the first image, and the target areas are replaced with target data of the target objects subjected to filling processing and contour expansion processing in the comparison graph.
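The write-back of optimized target data into the first image can be sketched as a region replacement. The function name and box layout are assumptions consistent with the earlier sketches; copying the input rather than mutating it is a choice made here for clarity.

```python
import numpy as np

def replace_region(first_image, target_patch, box):
    """Write the optimized target data back into the first image at the
    position area given by `box` = (cx, cy, w, h). `target_patch` must
    already have shape (h, w, channels)."""
    cx, cy, w, h = box
    x0, y0 = int(cx - w // 2), int(cy - h // 2)
    out = first_image.copy()                     # leave the original intact
    out[y0:y0 + h, x0:x0 + w] = target_patch
    return out                                   # the optimized second image

img = np.zeros((30, 30, 3), dtype=np.uint8)
patch = np.full((10, 10, 3), 200, dtype=np.uint8)
second = replace_region(img, patch, (15, 15, 10, 10))
```

Repeating this for every target object, one box at a time, produces the optimized second image described above.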
In a possible implementation manner, the replacing, one by one, the position areas corresponding to all the target objects in the first image to form the optimized second image includes:
if, after the target data is substituted in, the position areas of any two target objects in the second image cross or coincide, comparing the spaciousness of the surroundings of the two overlapping target objects, selecting the target object with the more spacious surroundings as the object to be moved, moving it until the two objects no longer overlap, and recording the position information of the moved object before the move.
Specifically, whether two position areas overlap may be determined from the position information of the areas corresponding to the two target objects: if the position information overlaps, the position areas of the two objects are judged to overlap after the target data is substituted. The spaciousness around a target object may be judged from the distances between the center of its position area and the centers of the other target objects, for example by counting how many of these distances exceed a preset value: the larger the distances, or the more of them that are large, the more spacious and sparse the surroundings.
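The overlap test and the distance-count notion of spaciousness can be sketched as follows. Both function names, the axis-aligned overlap criterion, and the threshold values are illustrative assumptions.

```python
def boxes_overlap(a, b):
    """Axis-aligned overlap test for (cx, cy, w, h) boxes: two boxes
    overlap when their center distance along each axis is less than the
    sum of their half-extents."""
    return (abs(a[0] - b[0]) * 2 < a[2] + b[2] and
            abs(a[1] - b[1]) * 2 < a[3] + b[3])

def spaciousness(box, others, min_dist):
    """Count how many other object centers lie farther than `min_dist`;
    a larger count means a more spacious (sparser) neighbourhood."""
    return sum(1 for o in others
               if ((box[0] - o[0]) ** 2 + (box[1] - o[1]) ** 2) ** 0.5 > min_dist)

a, b = (10, 10, 8, 8), (14, 10, 8, 8)            # these two boxes overlap
others = [(40, 40, 4, 4), (42, 44, 4, 4)]
clash = boxes_overlap(a, b)
# The box with the higher spaciousness count would be chosen as the one to move.
```

In the embodiment, the chosen object is shifted step by step until `boxes_overlap` is false, and its pre-move position is recorded so the true location can still be reported.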
In this embodiment, if position areas overlap after the target data of the target objects is substituted in, the above approach not only avoids mutual interference between the substituted target data but also preserves the true position information of the target data, ensuring that no positional distortion occurs after the replacement.
In other embodiments, the acquired original image (i.e., the first image) is preprocessed without down-sampling. First, a background model is determined by modeling the background scene. Consecutive frames are then acquired and converted to grayscale, and comparing them with the background model yields a grayscale comparison image for each frame. Each frame's comparison image is then thresholded: regions where the contrast is obvious are amplified and regions where it is not are suppressed, so that the area occupied by the moving object of interest becomes more prominent; see fig. 7 for details.
At this point the comparison map is not yet the final result. Because the original image contains noise, the comparison map must be denoised; see fig. 8, a schematic comparison of the preset minimum detection size provided by an embodiment of the present application. All regions of the comparison map are traversed with the set minimum target detection size: when the area occupied by a moving object of interest in the comparison map is smaller than the minimum target detection size, that occupied region is removed entirely, and only regions greater than or equal to the minimum target detection size are retained in the comparison map. Fig. 9 is a detailed comparison diagram of the noise reduction processing according to an embodiment of the present disclosure. In this way, unwanted noise interference is clearly filtered out and the signal-to-noise ratio of the target objects in the comparison map is improved.
On this basis, since the target moving objects remaining in the comparison map are all larger than the minimum target detection size, the center coordinates and the length and width of each target moving object region are obtained; see fig. 10, position area determination in the comparison map provided by an embodiment of the present application. The effective pixels of each corresponding region in the original image are then copied and their average RGB value is computed, and the pixel points of the region that lack pixels are filled until the whole region is covered; see fig. 11, a target object filling schematic provided by an embodiment of the present application. The whole region is then enlarged by the preset number of pixel points, so that the contour of the target moving object grows and the overall pixel area increases substantially; see fig. 12, a schematic of enlarging the target object contour provided by an embodiment of the present application. Finally, the optimized pixel data is copied back to replace the pixel data at the corresponding position in the original image, and the image is down-sampled and sent to the target detection algorithm for detection, which can greatly improve the detection rate.
The above method for optimizing the target object in the first image is described in detail as follows:
background modeling: the image of the scene can be modeled through single gauss, interframe difference or other traditional background modeling modes, and modeling data can be used for image comparison, so that a moving target object in the image is obtained.
Image comparison: comparing the original image acquired by monitoring with the modeling data to obtain compared data, then converting the compared data into a gray comparison graph, and performing gray judgment through a set threshold to obtain the comparison graph;
and (3) denoising the comparison graph: in order to eliminate the interference data in the comparison graph, noise reduction needs to be performed on the comparison graph, and data in a region lower than the area of the minimum target moving object is eliminated. By setting a minimum block of detection pixels, the detected regions will be clear compared to the area of the regions below this size in the map.
And (3) detecting a target moving object of the comparison graph to obtain the region center coordinate and length and width data:
after noise reduction, the remaining effective regions are all target moving object regions which can be used for analysis, and rectangular regions of the target moving objects are obtained through rectangular fitting to obtain central coordinates and length and width data;
acquiring original image data in a comparison map target moving object region:
and corresponding the target moving object region in the comparison graph to the region of the original image one by one, and copying data in the original image.
Area filling and contour enlargement of the original image data in the comparison-graph target moving object area:
And (4) performing area filling on the copied original image data: because the actual contour of the data region is not rectangular, the parts of the region without data need to be filled, and the fill value is the average of all the RGB data.
After the filling, contour expansion is performed: the length and the width of the region are each increased by half of the length and half of the width of the minimum detection size, and the expanded region is then filled with the same average value.
Copying the optimized data into the original image, and replacing the original data:
and copying the data in the expanded region into the original image again, and replacing the original data, wherein the pixel area is expanded before the moving target in the original image is opposite, so that the algorithm detection is facilitated.
Down-sampling and algorithm detecting a target moving object:
and sending the optimized original image to a target detection algorithm to detect an actual target moving object.
In the above manner, the first image is compared with a preset background model to obtain a comparison map; the position area, position parameters, and effective pixel value corresponding to each target object are obtained; the position area formed by the target object in the comparison map is filled and its contour expanded to obtain contour-expanded target data; and the position area where the target object is located in the first image is replaced with the target data. This enlarges both the contour of the target object and the pixel area it occupies in the image, so that even a tiny target object can be detected accurately: on the one hand, the detection rate of the target object is greatly increased; on the other hand, the accuracy of intelligent detection is also improved.
Referring to fig. 5, the present embodiment provides an object detecting apparatus 500, which includes:
the background modeling module 501, which quantizes the background with a Gaussian probability density function, fits each pixel point with a plurality of Gaussian distributions, and constructs a background model for the background scene in use, the background model being used in the image comparison that determines the target object; or,
which uses an inter-frame difference algorithm that, based on the correlation between two adjacent frames, takes the previous frame as the current background image and computes the difference between the current frame and the background image to obtain a background model for detecting the target object;
an image comparison module 502, configured to perform image comparison on a first image to be detected and a preset background model to obtain a comparison graph, where the first image includes a target object to be detected;
the denoising module 503 is configured to denoise the comparison map, and filter a noise object in the comparison map that is lower than a preset minimum detection size to obtain the comparison map after denoising.
A position determining module 504, configured to determine a position area of the target object and a position parameter of the target object in the comparison map;
a pixel value determining module 505, configured to determine, according to a position region of the target object, an effective pixel value of each position region in the first image;
an optimization processing module 506, which fills the position area of the target object in the comparison graph by using the effective pixel value, expands the outline of the filled position area until all the position areas are filled and covered, and obtains the target data with the expanded outline;
a target replacement module 507, configured to replace a location area where a target object is located in the first image according to the target data, so as to obtain an optimized second image;
and the object detection module 508 is configured to perform downsampling on the second image to obtain a feature map, perform detection according to the feature map, and determine a detection result of the target object.
In this embodiment, the apparatus essentially provides a set of modules for executing the methods of the above embodiments; for specific functions and technical effects, reference may be made to the above method embodiments, which are not repeated here.
Referring to fig. 7, an embodiment of the present application further provides an electronic device 600, which includes a processor 601, a memory 602, and a communication bus 603;
a communication bus 603 is used to connect the processor 601 and the memory 602;
the processor 601 is configured to execute the computer program stored in the memory 602 to implement the method according to one or more of the above-described embodiments.
The embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, the computer program being used for causing a computer to execute the method according to any one of the above-mentioned embodiments.
Embodiments of the present application also provide a non-volatile readable storage medium, where one or more modules (programs) are stored in the storage medium; when the one or more modules are applied to a device, they may cause the device to execute the instructions included in an embodiment of the present application.
It should be noted that the computer readable medium of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The above embodiments are merely illustrative of the principles and utilities of the present application and are not intended to limit the present application. Any person skilled in the art can modify or change the above-described embodiments without departing from the spirit and scope of the present application. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical concepts disclosed in the present application shall be covered by the claims of the present application.
Claims (11)
1. An object detection method, characterized in that the method comprises:
comparing a first image to be detected with a preset background model to obtain a comparison graph, wherein the first image comprises a target object to be detected;
determining a position area of the target object and a position parameter of the target object in the comparison map;
determining effective pixel values of the position areas in the first image according to the position areas of the target object;
filling a position area of the target object in the comparison graph by using the effective pixel value, and expanding the outline of the filled position area until all the position areas are filled and covered to obtain the target data with the expanded outline;
replacing the position area of the target object in the first image according to the target data to obtain an optimized second image;
and performing down-sampling processing on the second image to obtain a characteristic diagram, detecting according to the characteristic diagram, and determining the detection result of the target object.
2. The method of claim 1, wherein the comparing the first image to be tested with the preset background model to obtain a comparison graph comprises:
acquiring a first image of continuous multiple frames, and converting the first image into a gray scale image;
comparing the gray scale image with a preset background model, and determining the gray scale comparison image according to a comparison difference value;
and carrying out scaling pretreatment on the target object of each frame of the gray comparison image by using a preset gray threshold value to obtain a comparison image.
3. The method of claim 1, wherein before the step of comparing the first image to be tested with the preset background model to obtain the comparison map, the method further comprises:
quantizing the background by using a Gaussian probability density function, fitting each pixel point by using a plurality of Gaussian distributions, and constructing a background model for the background scene in use, wherein the background model is used for performing image comparison to determine a target object; or,
and performing difference operation on the current frame image and the background image by using an inter-frame difference algorithm according to the correlation between two adjacent frames of images and taking the previous frame image as the current background image to obtain a background model for detecting the target object.
4. The method of claim 1, wherein before determining the position area of the target object and the position parameter of the target object in the comparison map, the method further comprises:
denoising the comparison map by filtering out noise objects smaller than a preset minimum detection size, to obtain a denoised comparison map.
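One way to realize claim 4's size-based denoising is connected-component filtering: label each 4-connected foreground blob and drop blobs below the minimum size. The helper name is hypothetical; libraries such as `scipy.ndimage.label` do the labeling step more efficiently:

```python
import numpy as np
from collections import deque

def remove_small_objects(mask, min_size):
    """Drop 4-connected foreground blobs smaller than min_size pixels."""
    mask = mask.astype(bool)
    out = np.zeros_like(mask)
    seen = np.zeros_like(mask)
    h, w = mask.shape
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not seen[sy, sx]:
                # Breadth-first search collects one connected blob.
                q, blob = deque([(sy, sx)]), [(sy, sx)]
                seen[sy, sx] = True
                while q:
                    y, x = q.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                            blob.append((ny, nx))
                if len(blob) >= min_size:   # keep only large-enough objects
                    for y, x in blob:
                        out[y, x] = True
    return out

noisy = np.zeros((6, 6), dtype=np.uint8)
noisy[0, 0] = 1          # 1-pixel noise
noisy[2:4, 2:4] = 1      # 4-pixel target object
clean = remove_small_objects(noisy, min_size=3)
```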
5. The method according to any one of claims 1 to 4, wherein the position area comprises position information of an outer bounding frame formed around the target object, the position parameters comprise the center coordinates, length, and width of the position area, and the target objects in the first image correspond one-to-one to the position areas in the comparison map.
6. The method according to any one of claims 1 to 4, wherein filling the position area of the target object in the comparison map with the effective pixel value, and expanding the outline of the filled position area until all the position areas are filled and covered, to obtain target data with an expanded outline, comprises:
filling the position areas in the comparison map with the effective pixel values, wherein each effective pixel value is determined from the pixel mean of the corresponding position area in the first image;
after the filling is completed, expanding the outline of each filled position area by a preset enlargement ratio, and filling the expanded position area with the effective pixel value;
and repeating until all position areas are filled and covered, thereby obtaining target data with enlarged outlines in the comparison map.
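A minimal sketch of claim 6's fill-then-expand step for a single box, assuming `(x, y, w, h)` boxes and a centre-anchored enlargement (the patent does not fix the anchor point; the names and the 1.5 ratio are illustrative):

```python
import numpy as np

def fill_and_expand(first_image, box, scale=1.5):
    """Fill a target's box with the region's mean pixel value (the
    'effective pixel value'), then enlarge the box about its centre."""
    x, y, w, h = box
    effective = float(first_image[y:y + h, x:x + w].mean())
    cx, cy = x + w / 2.0, y + h / 2.0
    nw, nh = int(round(w * scale)), int(round(h * scale))
    nx, ny = int(round(cx - nw / 2.0)), int(round(cy - nh / 2.0))
    H, W = first_image.shape[:2]
    nx, ny = max(nx, 0), max(ny, 0)          # clamp to image bounds
    nw, nh = min(nw, W - nx), min(nh, H - ny)
    target_data = np.full((nh, nw), effective)
    return (nx, ny, nw, nh), target_data

image = np.full((10, 10), 7.0)
new_box, target_data = fill_and_expand(image, (2, 2, 4, 4))
```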
7. The method according to any one of claims 1 to 4, wherein replacing the position area of the target object in the first image according to the target data to obtain the optimized second image comprises:
determining, according to the position area of the target object in the first image, the target data of the target object within the enlarged outline in the comparison map; and replacing the position areas in the first image with the target data corresponding to each target object, one by one for all target objects, to form the optimized second image.
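Claim 7's one-by-one replacement reduces to pasting each target's data back over its position area; a sketch with hypothetical names, assuming boxes as `(x, y, w, h)`:

```python
import numpy as np

def replace_regions(first_image, replacements):
    """Paste each target's data over its position area, one by one,
    producing the optimized second image."""
    second_image = first_image.copy()
    for (x, y, w, h), data in replacements:
        second_image[y:y + h, x:x + w] = data[:h, :w]
    return second_image

first = np.zeros((6, 6))
patch = np.full((2, 2), 5.0)
second = replace_regions(first, [((1, 1, 2, 2), patch)])
```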
8. The method of claim 7, wherein replacing, one by one, the position areas corresponding to all target objects in the first image to form the optimized second image comprises:
if, after the target data are replaced, the position areas of any two target objects in the second image overlap, comparing the amount of free space around the two overlapping target objects, selecting the target object with the more spacious surroundings as the one to be moved, moving it until the two no longer overlap, and recording the position of that target object before the move.
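Claim 8 can be sketched as an axis-aligned overlap test plus a shift loop. The patent does not specify how "spaciousness" is measured, so this hypothetical `separate` helper leaves the choice of which box moves to the caller and only demonstrates the move-until-disjoint and record-original-position parts:

```python
def boxes_overlap(a, b):
    """Axis-aligned overlap test for (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def separate(fixed, movable, step=1, max_iter=1000):
    """Shift the movable box rightward until it no longer overlaps the
    fixed one; return the moved box and its recorded original position."""
    original = movable
    moved = list(movable)
    for _ in range(max_iter):
        if not boxes_overlap(fixed, tuple(moved)):
            break
        moved[0] += step
    return tuple(moved), original

moved_box, recorded = separate((0, 0, 4, 4), (2, 0, 4, 4))
```

Shifting only rightward is a simplification; a fuller implementation would search the direction with the most free space.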
9. An object detection device, the device comprising:
an image comparison module, configured to compare a first image to be detected with a preset background model to obtain a comparison map, wherein the first image comprises a target object to be detected;
a position determining module, configured to determine a position area of the target object and a position parameter of the target object in the comparison map;
a pixel value determining module, configured to determine, according to the position areas of the target object, effective pixel values of the position areas in the first image;
an optimization processing module, configured to fill a position area of the target object in the comparison map with the effective pixel value, and expand the outline of the filled position area until all the position areas are filled and covered, to obtain target data with an expanded outline;
a target replacement module, configured to replace the position area of the target object in the first image according to the target data to obtain an optimized second image;
and an object detection module, configured to perform down-sampling on the second image to obtain a feature map, perform detection based on the feature map, and determine a detection result for the target object.
10. An electronic device comprising a processor, a memory, and a communication bus;
the communication bus is used for connecting the processor and the memory;
the processor is configured to execute a computer program stored in the memory to implement the method of any one of claims 1-8.
11. A computer-readable storage medium having a computer program stored thereon,
the computer program causing a computer to perform the method of any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210601744.3A CN114926503A (en) | 2022-05-30 | 2022-05-30 | Object detection method, device, equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114926503A true CN114926503A (en) | 2022-08-19 |
Family
ID=82812235
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210601744.3A Pending CN114926503A (en) | 2022-05-30 | 2022-05-30 | Object detection method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114926503A (en) |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113822977A (en) | Image rendering method, device, equipment and storage medium | |
CN112927363B (en) | Voxel map construction method and device, computer readable medium and electronic equipment | |
CN114758337B (en) | Semantic instance reconstruction method, device, equipment and medium | |
CN112861575A (en) | Pedestrian structuring method, device, equipment and storage medium | |
CN109886271B (en) | Image accurate segmentation method integrating deep learning network and improving edge detection | |
CN112954399B (en) | Image processing method and device and computer equipment | |
CN111062854A (en) | Method, device, terminal and storage medium for detecting watermark | |
WO2023082453A1 (en) | Image processing method and device | |
CN112989085A (en) | Image processing method, image processing device, computer equipment and storage medium | |
CN112801047A (en) | Defect detection method and device, electronic equipment and readable storage medium | |
CN116453121B (en) | Training method and device for lane line recognition model | |
CN117197462A (en) | Lightweight foundation cloud segmentation method and system based on multi-scale feature fusion and alignment | |
CN112686828B (en) | Video denoising method, device, equipment and storage medium | |
CN114648604A (en) | Image rendering method, electronic device, storage medium and program product | |
CN114155524A (en) | Single-stage 3D point cloud target detection method and device, computer equipment and medium | |
CN117036392A (en) | Image detection method and related device | |
CN117078602A (en) | Image stretching recognition and model training method, device, equipment, medium and product | |
CN110197459B (en) | Image stylization generation method and device and electronic equipment | |
CN117292122A (en) | RGB-D significance object detection and semantic segmentation method and system | |
CN115424029A (en) | Small target detection method for improving YOLOX network structure | |
CN113628349B (en) | AR navigation method, device and readable storage medium based on scene content adaptation | |
CN116883770A (en) | Training method and device of depth estimation model, electronic equipment and storage medium | |
CN114926503A (en) | Object detection method, device, equipment and medium | |
CN112651351B (en) | Data processing method and device | |
CN115131291A (en) | Object counting model training method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||