CN117576670A - Fine granularity identification method based on cascade neural network and target space-time continuity - Google Patents
- Publication number: CN117576670A
- Application number: CN202311378666.6A
- Authority
- CN
- China
- Prior art keywords
- target
- fine
- interested
- image
- recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V20/60 — Scenes; scene-specific elements; type of objects
- G06N3/045 — Neural networks; architecture; combinations of networks
- G06N3/0464 — Neural networks; architecture; convolutional networks [CNN, ConvNet]
- G06V10/764 — Image or video recognition or understanding using pattern recognition or machine learning; classification, e.g. of video objects
- G06V10/82 — Image or video recognition or understanding using pattern recognition or machine learning; neural networks
- G06V2201/07 — Indexing scheme relating to image or video recognition or understanding; target detection
- Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application provides a fine-grained recognition method based on a cascaded neural network and the spatio-temporal continuity of targets. The method comprises: acquiring multiple temporally consecutive frames of images to be recognized; performing target detection on each image to be recognized with a target detection algorithm to obtain the position distribution of all targets of interest; tracking the targets of interest with a target tracking algorithm, based on their spatio-temporal continuity, to determine the newly appearing targets of interest in each image to be recognized; performing fine-grained recognition on the newly appearing targets of interest with a fine-grained recognition algorithm to obtain recognition results for all targets of interest; and obtaining, from the position distribution and the recognition results, the position information and fine-grained classification result of the targets of interest in each image to be recognized, thereby achieving efficient fine-grained recognition of targets.
Description
Technical Field
The application relates to the field of target recognition, and in particular to a fine-grained recognition method based on a cascaded neural network and the spatio-temporal continuity of targets.
Background
Video monitoring provides rich, intuitive scene information, making it convenient for people to remotely observe environmental conditions and their changes, and in particular the targets of interest in the environment. Target recognition and monitoring methods based on computer vision offer high efficiency, low cost, and wide coverage, and are widely applied in fields such as traffic, security, and ecological protection. The artificial intelligence techniques that have developed rapidly in recent years provide very effective technical support for target recognition: they can automatically and intelligently recognize and classify targets in images, greatly improving working efficiency, reducing manual intervention, and enabling intelligent monitoring and management.
In practical applications, what is often needed is not only the coarse classification of targets (e.g. bird versus vehicle, i.e. coarse-grained recognition) but also their subdivision (e.g. white crane versus grey crane, i.e. fine-grained recognition). Coarse-grained target recognition has been studied extensively, and many neural-network-based model algorithms exist that achieve good recognition results from a single frame. Fine-grained recognition, however, still faces several problems. On the one hand, distinguishing fine-grained categories requires a more complex network structure to extract subtle features; running fine-grained recognition on every frame of a continuous video produces an enormous computational load, making real-time recognition impossible and recognition efficiency low. On the other hand, factors such as occlusion and blur can prevent effective features of a target from being extracted from a single frame, which lowers recognition accuracy.
Disclosure of Invention
The present application aims to solve, at least to some extent, one of the technical problems in the related art.
Therefore, a first object of the present application is to propose a fine-grained recognition method based on a cascaded neural network and target spatio-temporal continuity, so as to achieve efficient fine-grained target recognition.
A second object of the present application is to propose a fine-grained recognition system based on cascaded neural networks and target spatio-temporal continuity.
A third object of the present application is to propose an electronic device.
A fourth object of the present application is to propose a computer readable storage medium.
To achieve the above object, an embodiment of the first aspect of the present application provides a fine-grained recognition method based on a cascaded neural network and target spatio-temporal continuity, comprising the following steps:
acquiring multiple temporally consecutive frames of images to be recognized;
performing target detection on each image to be recognized with a target detection algorithm to obtain the position distribution of all targets of interest;
tracking the targets of interest with a target tracking algorithm, based on target spatio-temporal continuity, to determine the newly appearing targets of interest in each image to be recognized;
performing fine-grained recognition on the newly appearing targets of interest with a fine-grained recognition algorithm to obtain recognition results for all targets of interest;
and obtaining, from the position distribution and the recognition results, the position information and fine-grained classification result of the targets of interest in each image to be recognized.
In the method of the first aspect of the present application, the multiple frames of images to be recognized are acquired directly or extracted from a video.
In the method of the first aspect of the present application, the target detection algorithm is a lightweight target detection algorithm.
In the method of the first aspect of the present application, tracking the targets of interest with a target tracking algorithm based on target spatio-temporal continuity to determine the newly appearing targets of interest in each image to be recognized comprises: using the appearance information and motion information of the targets of interest, matching the targets of interest across temporally consecutive images to be recognized with the target tracking algorithm to obtain a continuous trajectory for each target of interest, and thereby determining the newly appearing targets of interest in each image.
The method of the first aspect of the present application further comprises: after obtaining the recognition result of each target of interest in the current frame, judging the reliability of each recognition result; if there is a target of interest whose reliability does not meet the requirement, performing fine-grained recognition, in the next frame, on both the corresponding target of interest and any newly appearing targets of interest, using the fine-grained recognition algorithm.
To achieve the above object, an embodiment of the second aspect of the present application provides a fine-grained recognition system based on a cascaded neural network and target spatio-temporal continuity, comprising:
an acquisition module for acquiring multiple temporally consecutive frames of images to be recognized;
a target detection module for performing target detection on each image to be recognized with a target detection algorithm to obtain the position distribution of all targets of interest;
a target tracking module for tracking the targets of interest with a target tracking algorithm, based on target spatio-temporal continuity, to determine the newly appearing targets of interest in each image to be recognized;
a fine-grained recognition module for performing fine-grained recognition on the newly appearing targets of interest with a fine-grained recognition algorithm to obtain recognition results for all targets of interest;
and an output module for obtaining, from the position distribution and the recognition results, the position information and fine-grained classification result of the targets of interest in each image to be recognized.
In the system of the second aspect of the present application, the acquisition module acquires the multiple frames of images to be recognized directly or extracts them from a video.
In the system of the second aspect of the present application, the target detection algorithm used in the target detection module is a lightweight target detection algorithm.
In the system of the second aspect of the present application, the target tracking module is specifically configured to: use the appearance information and motion information of the targets of interest to match the targets of interest across temporally consecutive images to be recognized with the target tracking algorithm, obtain a continuous trajectory for each target of interest, and thereby determine the newly appearing targets of interest in each image.
In the system of the second aspect of the present application, the fine-grained recognition module is further configured to: after obtaining the recognition result of each target of interest in the current frame, judge the reliability of each recognition result; if there is a target of interest whose reliability does not meet the requirement, perform fine-grained recognition, in the next frame, on both the corresponding target of interest and any newly appearing targets of interest, using the fine-grained recognition algorithm.
To achieve the above object, an embodiment of a third aspect of the present application provides an electronic device, including: a processor, and a memory communicatively coupled to the processor; the memory stores computer-executable instructions; the processor executes the computer-executable instructions stored in the memory to implement the method set forth in the first aspect of the present application.
To achieve the above object, an embodiment of a fourth aspect of the present application proposes a computer-readable storage medium having stored therein computer-executable instructions for implementing the method proposed in the first aspect of the present application when being executed by a processor.
According to the fine-grained recognition method, system, electronic device, and storage medium based on a cascaded neural network and target spatio-temporal continuity of the present application, the target detection algorithm, target tracking algorithm, and fine-grained recognition algorithm form a cascaded neural network: the target detection algorithm rapidly detects the targets of interest in the images to be recognized; the target tracking algorithm then tracks the targets of interest across the images to identify the newly appearing ones; and fine-grained recognition is performed only on those newly appearing targets. Repeated fine-grained recognition of the same target is thus avoided, the computational load is reduced, and more efficient fine-grained target recognition is achieved.
Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
fig. 1 is a schematic flow chart of a fine granularity identification method based on cascading neural network and target space-time continuity according to an embodiment of the present application;
fig. 2 is a diagram of recognition results of a first frame in three time-continuous frames of images to be recognized according to an embodiment of the present application;
fig. 3 is a diagram of recognition results of a second frame in three time-continuous frames of images to be recognized according to an embodiment of the present application;
fig. 4 is a diagram of recognition results of a third frame in three time-continuous frames of images to be recognized according to an embodiment of the present application;
fig. 5 is a block diagram of a fine-grained recognition system based on cascaded neural networks and target spatio-temporal continuity according to an embodiment of the application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary and intended for the purpose of explaining the present application and are not to be construed as limiting the present application.
The following describes a fine granularity recognition method and system based on cascading neural network and target space-time continuity according to an embodiment of the present application with reference to the accompanying drawings.
Current target classification methods face several problems. Distinguishing fine-grained categories requires a more complex network structure to extract subtle features, and running fine-grained recognition on every frame of a continuous video produces an enormous computational load, making real-time recognition impossible and recognition efficiency low. In addition, factors such as occlusion and blur can prevent effective features of a target from being extracted from a single frame, which lowers recognition accuracy. To address these problems, the embodiments of the present application provide a fine-grained recognition method based on a cascaded neural network and target spatio-temporal continuity, so as to achieve efficient and accurate fine-grained target recognition.
Fig. 1 is a schematic flow chart of a fine granularity identification method based on cascading neural network and target space-time continuity according to an embodiment of the present application. As shown in fig. 1, the fine granularity identification method based on the cascade neural network and the target space-time continuity comprises the following steps:
step S101, a plurality of frames of images to be identified which are continuous in time are acquired.
In step S101, the multiple frames of images to be recognized may be acquired directly or extracted from a video. The video may be an online real-time video or an offline stored video.
In step S101, when the temporally consecutive images to be recognized are obtained from a video, they may be formed by extracting frames from the video according to actual requirements.
In step S101, if the image size of the images to be recognized does not meet the input requirement of the target detection algorithm, each frame must additionally be preprocessed (e.g. cropped) to fit the input of the target detection network (also referred to as the target detection algorithm).
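For illustration only (not part of the claimed implementation), the frame-extraction and preprocessing logic of step S101 can be sketched in Python as pure index and geometry arithmetic; the function names and the choice of a centered square crop are assumptions of this sketch, not taken from the application.

```python
def sample_frame_indices(total_frames, video_fps, target_fps):
    """Indices of video frames to keep when downsampling from
    video_fps to target_fps (e.g. 30 fps video -> 10 fps images)."""
    step = video_fps / target_fps
    return [int(i * step) for i in range(int(total_frames / step))]

def center_crop_box(width, height, size):
    """(left, top, right, bottom) of a centered size x size crop,
    used to fit a frame to the detector's input size."""
    left = (width - size) // 2
    top = (height - size) // 2
    return (left, top, left + size, top + size)
```

For example, a 30 fps video downsampled to 10 fps keeps every third frame, and a 640 x 480 frame cropped to 448 x 448 uses the box (96, 16, 544, 464).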
Step S102, performing object detection on each image to be identified by using an object detection algorithm to obtain the position distribution of all the objects of interest.
In step S102, the target detection algorithm may be a lightweight target detection algorithm. A lightweight algorithm uses a network with low complexity and few parameters, such as YOLO or MobileNet, and can rapidly detect the position distribution of all targets of interest in each image to be recognized.
In step S102, the target detection algorithm is a model trained in advance, and the targets of interest are determined by the labels used during training. For example, if the training label is "bird", then the targets of interest obtained by target detection in this step are all the birds in the image to be recognized.
In step S102, the target detection algorithm detects only the positions of the targets of interest in the image to be recognized; it does not provide their category information.
Step S103, tracking the targets of interest with a target tracking algorithm based on target spatio-temporal continuity, so as to determine the newly appearing targets of interest in each image to be recognized.
In step S103, the target tracking algorithm may be a neural-network-based tracking algorithm such as DeepSORT (a multi-target tracking algorithm), so as to improve the robustness and accuracy of tracking.
Specifically, in step S103, tracking the targets of interest with a target tracking algorithm based on target spatio-temporal continuity to determine the newly appearing targets of interest in each image to be recognized comprises: using the appearance information and motion information of the targets of interest, matching the targets of interest across temporally consecutive images to be recognized with the target tracking algorithm to obtain a continuous trajectory for each target of interest, and thereby determining the newly appearing targets of interest in each image.
The target tracking method uses spatial association information, such as the appearance and motion information of each target of interest (hereafter simply "target"), to match targets across consecutive associated frames and obtain a continuous trajectory for each target. It also manages target data: different targets are assigned different labels, newly appearing targets are assigned new labels, and disappeared targets are removed from the current matching library.
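The label management described above can be illustrated with a deliberately simplified Python sketch. The application's embodiment uses DeepSORT with appearance features; the greedy IoU-only matcher below is an assumption of this sketch, kept minimal to show how matched targets keep their labels, unmatched detections receive new labels, and disappeared targets leave the matching library.

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2); intersection over union.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

class SimpleTracker:
    """Greedy IoU tracker: matched boxes keep their ID, unmatched
    boxes get new IDs, and disappeared IDs are dropped."""
    def __init__(self, iou_thresh=0.3):
        self.iou_thresh = iou_thresh
        self.tracks = {}   # track id -> last known box
        self.next_id = 0

    def update(self, boxes):
        assigned, new_ids = {}, []
        unmatched = dict(self.tracks)
        for box in boxes:
            best_id, best_iou = None, self.iou_thresh
            for tid, prev in unmatched.items():
                score = iou(box, prev)
                if score > best_iou:
                    best_id, best_iou = tid, score
            if best_id is None:
                best_id = self.next_id      # newly appearing target
                self.next_id += 1
                new_ids.append(best_id)
            else:
                del unmatched[best_id]
            assigned[best_id] = box
        self.tracks = assigned              # disappeared targets removed
        return assigned, new_ids
```

The `new_ids` returned for each frame are exactly the "newly appearing targets of interest" that step S104 will pass to the fine-grained recognizer.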
Step S104, performing fine-grained recognition on the newly appearing targets of interest with a fine-grained recognition algorithm, so as to obtain recognition results for all targets of interest.
In step S104, the fine-grained recognition model uses a neural network with deeper layers to extract finer target features and thereby distinguish fine-grained categories. The fine-grained recognition algorithm is a model trained in advance, and the category information of a target of interest is determined by the labels used during training. For example, when the target of interest is "bird" and the fine-grained recognition algorithm is trained with bird categories including "grey crane" and "white crane", the recognition result (also called the fine-grained classification result) for any target of interest obtained in this step is one of those categories.
In step S104, the input to the fine-grained recognition algorithm is the newly appearing targets of interest in each frame. For the first frame, all targets of interest are new, so fine-grained recognition is performed on all of them. For each subsequent frame, taking any frame as the current frame, the targets added relative to the previous frame are the newly appearing targets of interest, and fine-grained recognition is performed only on them. Repeated fine-grained recognition of the same target is thus avoided, the computational load is reduced, and efficient fine-grained recognition is achieved.
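The saving comes from caching one fine-grained result per track label. A minimal, hypothetical Python sketch (the class and method names are not from the application) shows that the expensive classifier runs once per track rather than once per detection per frame:

```python
class RecognitionScheduler:
    """Run the (expensive) fine-grained classifier only for track IDs
    not seen before; reuse cached results for already-tracked targets."""
    def __init__(self, classify):
        self.classify = classify   # callable: image crop -> category label
        self.results = {}          # track id -> cached label
        self.calls = 0             # how often the classifier actually ran

    def labels_for_frame(self, crops_by_id):
        for tid, crop in crops_by_id.items():
            if tid not in self.results:       # newly appearing target
                self.results[tid] = self.classify(crop)
                self.calls += 1
        return {tid: self.results[tid] for tid in crops_by_id}
```

With two frames sharing one tracked target, the classifier is invoked twice in total instead of three times, and the gap widens as targets persist across many frames.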
In addition, because factors such as occlusion and blur can prevent effective features from being extracted from a single frame, recognition accuracy may be low. Step S104 therefore further includes: after obtaining the recognition result of each target of interest in the current frame, judging the reliability of each recognition result; if there is a target of interest whose reliability does not meet the requirement, performing fine-grained recognition, in the next frame, on both the corresponding target of interest and any newly appearing targets of interest, using the fine-grained recognition algorithm.
The reliability of a recognition result can be judged from the network prediction score: the higher the score, the higher the reliability. If the prediction score is greater than or equal to a score threshold (e.g. 0.7), the reliability meets the requirement; otherwise it does not. A target whose reliability does not meet the requirement is marked for fine-grained recognition again in the next frame, until the reliability is sufficiently high, and the historical result is updated with the new recognition result. This improves recognition accuracy under occlusion, blur, and similar conditions.
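As a sketch of this reliability gate (assuming, as the embodiment suggests, that the history keeps the best-scoring result so far; the function name is hypothetical):

```python
SCORE_THRESHOLD = 0.7   # threshold named in the embodiment (step D)

def update_result(history, tid, label, score, threshold=SCORE_THRESHOLD):
    """Record the new recognition result for track `tid` if it improves
    on the stored one; return True while the track should be marked for
    re-recognition in the next frame (best score still below threshold)."""
    prev = history.get(tid)
    if prev is None or score >= prev[1]:
        history[tid] = (label, score)   # update history with better result
    return history[tid][1] < threshold  # still unreliable -> rerun next frame
```

A partially occluded target scored 0.5 stays flagged for re-recognition; once a later frame yields 0.9, the history is updated and the flag clears, and a subsequent low-scoring frame does not overwrite the reliable result.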
In step S104, the fine-grained recognition algorithm is applied only to newly appearing targets of interest and to targets whose previous recognition result had low reliability. Exploiting the spatio-temporal continuity of the targets in this way avoids repeated fine-grained recognition, reduces the computational load, and improves operating efficiency. At the same time, for targets of interest in complex scenes involving occlusion or blur, recognition results from multiple frames are fused based on the reliability measure, enhancing the stability and accuracy of recognition.
Step S105, obtaining, based on the position distribution and the recognition results, the position information and fine-grained classification result of the targets of interest in each image to be recognized.
Specifically, in step S105, the position distribution of the targets of interest produced by the target detection network for each image to be recognized is combined with the recognition result of each target of interest, finally yielding the position information and fine-grained classification results of all targets of interest in all images to be recognized.
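This final join can be sketched as a per-frame merge keyed on the track label (the output record layout and the "unknown" placeholder for not-yet-classified targets are assumptions of this sketch):

```python
def summarize_frame(boxes_by_id, labels_by_id):
    """Join detector positions with fine-grained labels per track ID,
    producing one output record per target of interest in the frame."""
    return [
        {"id": tid, "box": box, "label": labels_by_id.get(tid, "unknown")}
        for tid, box in sorted(boxes_by_id.items())
    ]
```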
To verify the effect of the method of the present application, experiments were performed taking bird fine grain identification as an example, and the specific contents are as follows:
step A: bird video data are acquired through a real-time video monitoring device, frame extraction processing of 10 frames per second is carried out on the video, time-continuous image data are obtained, and then the image size is converted into 448 x 448 so as to be input into a target detection algorithm.
Step B: target detection is performed on each of the temporally consecutive images with the lightweight target detection algorithm YOLOv5 to obtain the position distribution of all targets of interest. For example, the YOLOv5s6 network may be used: it is a lighter-weight model consisting of 4 convolutional layers, 8 Bottleneck layers, and 1 output layer, with about 10.5M parameters in total, which enables fast and efficient real-time target detection in resource-constrained environments.
Step C: the detected targets of interest are tracked with the target tracking algorithm DeepSORT, which matches detections across associated consecutive frames using appearance and motion information to obtain a continuous trajectory for each target, and manages target data by assigning different labels to different targets and new labels to newly appearing targets. DeepSORT combines motion and appearance information to perform more accurate association matching of each frame, using the Hungarian algorithm for the association metric, and therefore achieves high tracking performance; it also extracts features with a neural network, improving robustness under missed detections and occlusion.
Step D: fine-grained recognition is performed on the targets of interest with a fine-grained recognition algorithm, and the reliability of the recognition results is judged. The fine-grained recognition model is convnext-base_4xb32_cmb200, which is based on a convolutional neural network architecture containing 4 parallel branches, each with 32 convolution kernels; this architecture allows the model to extract image features at different levels. In this step, fine-grained recognition is applied only to the newly appearing targets of interest in each frame and to targets whose previous recognition reliability was poor. Reliability is judged from the prediction score: the higher the score, the higher the reliability. If the prediction score is below 0.7, i.e. the reliability is low, the target is marked for fine-grained recognition again in the next frame, until the reliability is sufficiently high, and the historical result is updated with the new recognition result. Exploiting the spatio-temporal continuity of the targets avoids repeated fine-grained recognition, reducing the computational load and improving operating efficiency, while also improving recognition accuracy under occlusion, blur, and similar conditions.
Step E: finally, combining the target detection position distribution results with the fine-grained classification results to obtain the position information and the fine-grained classification result of every target of interest in each frame of image data.
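Step E amounts to joining the two per-target outputs on the track label. A minimal sketch follows, in which the dictionary layouts are assumptions for illustration rather than structures given in the patent text:

```python
def merge_outputs(positions, classifications):
    """Join detection positions with fine-grained labels per target (Step E).

    positions: track_id -> (x, y, w, h) bounding box from Steps B-C;
    classifications: track_id -> (class_name, score) from Step D.
    """
    merged = {}
    for tid, box in positions.items():
        name, score = classifications.get(tid, ("unknown", 0.0))
        merged[tid] = {"box": box, "class": name, "score": score}
    return merged
```

Targets that have not yet received a reliable fine-grained label (e.g. still occluded) keep their position with a placeholder class until a later frame fills it in.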
Fig. 2 is a diagram of the recognition results for the first of three temporally continuous frames of images to be recognized according to an embodiment of the present application; fig. 3 is a diagram of the recognition results for the second of those frames, and fig. 4 for the third.
As can be seen from figs. 2 to 4: first, a grey crane appears in the picture of fig. 2 and is successfully located and recognized; in fig. 3, two grey cranes appear almost simultaneously on the right side of the picture, of which the nearer one is successfully located and recognized while the farther one is not recognized because it is partially occluded; in fig. 4, as the targets move, the grey crane that originally appeared in fig. 2 has left the picture, while the previously occluded grey crane appears in full and is successfully located and recognized, demonstrating the effectiveness of the method.
In order to implement the above embodiment, the application further provides a fine granularity identification system based on the cascade neural network and the target space-time continuity.
Fig. 5 is a block diagram of a fine-grained recognition system based on cascaded neural networks and target spatio-temporal continuity according to an embodiment of the application.
As shown in fig. 5, the fine-granularity recognition system based on the cascade neural network and the target space-time continuity includes an acquisition module 11, a target detection module 12, a target tracking module 13, a fine-granularity recognition module 14, and an output module 15, wherein:
an acquisition module 11, configured to acquire multiple temporally continuous frames of images to be identified;
a target detection module 12, configured to perform target detection on each image to be identified by using a target detection algorithm to obtain the position distribution of all targets of interest;
a target tracking module 13, configured to track the targets of interest by using a target tracking algorithm based on target space-time continuity, so as to determine the newly appearing targets of interest in each image to be identified;
a fine-grained recognition module 14, configured to perform fine-grained recognition on the newly appearing targets of interest by using a fine-grained recognition algorithm to obtain recognition results of all targets of interest;
and an output module 15, configured to obtain the position information and the fine-grained classification result of each target of interest in each image to be identified based on the position distribution and the recognition results.
Further, in one possible implementation of the embodiment of the present application, the acquisition module 11 acquires the multiple frames of images to be identified directly or from video.
Further, in one possible implementation of the embodiment of the present application, in the target detection module 12, the target detection algorithm is a lightweight target detection algorithm.
Further, in one possible implementation of the embodiment of the present application, the target tracking module 13 is specifically configured to: match the targets of interest across associated temporally continuous images to be identified by using a target tracking algorithm, according to the appearance information and motion information of the targets of interest, so as to obtain a continuous trajectory of each target of interest and thereby determine the newly appearing targets of interest in each image to be identified.
Further, in one possible implementation of the embodiment of the present application, the fine-grained recognition module 14 is further configured to: after the recognition result of each target of interest in the current frame of the image to be identified is obtained, judge the reliability of the recognition result; and, if there is a target of interest whose reliability does not meet the requirement, perform fine-grained recognition, using the fine-grained recognition algorithm, on the corresponding target of interest in the next frame of the image to be identified as well as on the newly appearing targets of interest in that frame.
It should be noted that the foregoing explanation of the embodiments of the fine granularity identification method based on the cascade neural network and target space-time continuity also applies to the fine granularity identification system of these embodiments, and will not be repeated here.
In the embodiments of the present application, the target detection algorithm, the target tracking algorithm and the fine-grained recognition algorithm form a cascaded neural network: the target detection algorithm rapidly detects the targets of interest in the images to be identified, the target tracking algorithm then tracks the targets of interest in each image to determine the newly appearing targets, and fine-grained recognition is performed only on those new targets. Repeated fine-grained recognition of the same target is thus avoided, the amount of computation is reduced, and more efficient fine-grained recognition of targets is achieved. In addition, for targets in complex scenes such as occlusion and blur, multi-frame recognition results can be fused based on the reliability judgment, enhancing the stability and accuracy of recognition. That is, efficient and accurate fine-grained target recognition is achieved through the cascaded neural network and target spatio-temporal continuity.
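Putting the three stages together, one frame of the cascade described above can be sketched as below; `detector`, `tracker` and `classify` are illustrative stand-ins for the three networks, and the 0.7 threshold is the reliability bound from the method description.

```python
def process_frame(frame, detector, tracker, classify, history, thresh=0.7):
    """Run detection -> tracking -> (conditional) fine-grained recognition.

    `history` maps track_id -> (label, score) and persists across frames, so
    reliable results are reused rather than recomputed; its layout and the
    callables' signatures are assumptions for this sketch.
    """
    tracks = tracker(detector(frame))            # {track_id: bounding box}
    results = {}
    for tid, box in tracks.items():
        prev = history.get(tid)
        if prev is None or prev[1] < thresh:     # new target or unreliable result
            history[tid] = classify(frame, box)  # fine-grained recognition
        label, score = history[tid]
        results[tid] = {"box": box, "label": label, "score": score}
    return results
```

Because `history` carries over between calls, a target that stays in view is classified once and then tracked for free, which is the source of the computational savings claimed above.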
In order to achieve the above embodiments, the present application further proposes an electronic device including: a processor, a memory communicatively coupled to the processor; the memory stores computer-executable instructions; the processor executes the computer-executable instructions stored in the memory to implement the methods provided by the previous embodiments.
In order to implement the above embodiment, the present application further proposes a computer-readable storage medium, in which computer-executable instructions are stored, which when executed by a processor are configured to implement the method provided in the foregoing embodiment.
In order to implement the above embodiments, the present application also proposes a computer program product comprising a computer program which, when executed by a processor, implements the method provided by the above embodiments.
The collection, storage, use, processing, transmission, provision, disclosure and other handling of the information involved in this application all comply with the provisions of the relevant laws and regulations and do not violate public order and good morals.
It should be noted that personal information from users should be collected only for legitimate and reasonable purposes and should not be shared or sold outside of those purposes. Such collection or sharing should take place only after informed consent has been obtained from the user, including but not limited to having the user read the user agreement or notice and sign the agreement or authorization concerning the related user information before using the functionality. In addition, all necessary steps should be taken to safeguard access to such personal information data and to ensure that others with access to it adhere to the applicable privacy policies and procedures.
The present application contemplates embodiments in which users may selectively prevent the use of, or access to, personal information data; that is, hardware and/or software may be provided to prevent or block access to such data. Once personal information data is no longer needed, risk can be minimized by limiting data collection and deleting the data. In addition, where applicable, personally identifying information is removed from such data to protect user privacy.
In the foregoing descriptions of embodiments, descriptions of the terms "one embodiment," "some embodiments," "example," "particular example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "plurality" is at least two, such as two, three, etc., unless explicitly defined otherwise.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present application also includes additional implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved, as would be understood by those skilled in the art of the embodiments of the present application.
Logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction-execution system, apparatus or device, such as a computer-based system, a processor-containing system, or another system that can fetch instructions from the instruction-execution system, apparatus or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate or transport the program for use by or in connection with the instruction-execution system, apparatus or device. More specific examples (a non-exhaustive list) of the computer-readable medium include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be captured electronically, for instance by optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction-execution system. For example, if implemented in hardware, as in another embodiment, any one of the following techniques well known in the art, or a combination thereof, may be used: discrete logic circuits with logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and so on.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. Although embodiments of the present application have been shown and described above, it is to be understood that the above embodiments are illustrative and are not to be construed as limiting the application; variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the application.
Claims (12)
1. A fine granularity identification method based on a cascade neural network and target space-time continuity, characterized by comprising the following steps:
acquiring multiple temporally continuous frames of images to be identified;
performing target detection on each image to be identified by using a target detection algorithm to obtain the position distribution of all targets of interest;
tracking the targets of interest by using a target tracking algorithm based on target space-time continuity, so as to determine the newly appearing targets of interest in each image to be identified;
performing fine-grained recognition on the newly appearing targets of interest by using a fine-grained recognition algorithm to obtain recognition results of all targets of interest;
and obtaining the position information and the fine-grained classification result of each target of interest in each image to be identified based on the position distribution and the recognition results.
2. The fine granularity identification method based on a cascade neural network and target space-time continuity according to claim 1, wherein the multiple frames of images to be identified are acquired directly or extracted from video.
3. The fine granularity identification method based on a cascade neural network and target space-time continuity according to claim 1, wherein the target detection algorithm is a lightweight target detection algorithm.
4. The fine granularity identification method based on a cascade neural network and target space-time continuity according to claim 1, wherein tracking the targets of interest by using a target tracking algorithm based on target space-time continuity to determine the newly appearing targets of interest in each image to be identified comprises:
matching the targets of interest across associated temporally continuous images to be identified by using a target tracking algorithm, according to the appearance information and motion information of the targets of interest, so as to obtain a continuous trajectory of each target of interest and thereby determine the newly appearing targets of interest in each image to be identified.
5. The fine granularity identification method based on a cascade neural network and target space-time continuity according to claim 1, further comprising:
after obtaining the recognition result of each target of interest in the current frame of the image to be identified, judging the reliability of the recognition result;
if there is a target of interest whose reliability does not meet the requirement, performing fine-grained recognition, by using the fine-grained recognition algorithm, on the corresponding target of interest in the next frame of the image to be identified as well as on the newly appearing targets of interest in that frame.
6. A fine granularity identification system based on a cascade neural network and target space-time continuity, characterized by comprising:
an acquisition module, configured to acquire multiple temporally continuous frames of images to be identified;
a target detection module, configured to perform target detection on each image to be identified by using a target detection algorithm to obtain the position distribution of all targets of interest;
a target tracking module, configured to track the targets of interest by using a target tracking algorithm based on target space-time continuity, so as to determine the newly appearing targets of interest in each image to be identified;
a fine-grained recognition module, configured to perform fine-grained recognition on the newly appearing targets of interest by using a fine-grained recognition algorithm to obtain recognition results of all targets of interest;
and an output module, configured to obtain the position information and the fine-grained classification result of each target of interest in each image to be identified based on the position distribution and the recognition results.
7. The fine granularity identification system based on a cascade neural network and target space-time continuity according to claim 6, wherein the acquisition module acquires the multiple frames of images to be identified directly or from video.
8. The fine granularity identification system based on a cascade neural network and target space-time continuity according to claim 6, wherein the target detection algorithm in the target detection module is a lightweight target detection algorithm.
9. The fine granularity identification system based on a cascade neural network and target space-time continuity according to claim 6, wherein the target tracking module is specifically configured to:
match the targets of interest across associated temporally continuous images to be identified by using a target tracking algorithm, according to the appearance information and motion information of the targets of interest, so as to obtain a continuous trajectory of each target of interest and thereby determine the newly appearing targets of interest in each image to be identified.
10. The fine granularity identification system based on a cascade neural network and target space-time continuity according to claim 6, wherein the fine-grained recognition module is further configured to:
after obtaining the recognition result of each target of interest in the current frame of the image to be identified, judge the reliability of the recognition result;
and, if there is a target of interest whose reliability does not meet the requirement, perform fine-grained recognition, by using the fine-grained recognition algorithm, on the corresponding target of interest in the next frame of the image to be identified as well as on the newly appearing targets of interest in that frame.
11. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the method of any one of claims 1-5.
12. A computer readable storage medium having stored therein computer executable instructions which when executed by a processor are adapted to carry out the method of any one of claims 1-5.
Priority Applications (1)
Application CN202311378666.6A — priority date 2023-10-23, filing date 2023-10-23 — CN117576670A: Fine granularity identification method based on cascade neural network and target space-time continuity
Publications (1)
Publication CN117576670A (en) — publication date 2024-02-20
Family ID: 89863269
Family Applications (1)
Application CN202311378666.6A — Fine granularity identification method based on cascade neural network and target space-time continuity — priority date 2023-10-23, filing date 2023-10-23, country CN, status Pending
Legal Events
PB01 — Publication
SE01 — Entry into force of request for substantive examination