CN117576670A - Fine granularity identification method based on cascade neural network and target space-time continuity - Google Patents

Fine granularity identification method based on cascade neural network and target space-time continuity

Info

Publication number
CN117576670A
CN117576670A (application number CN202311378666.6A)
Authority
CN
China
Prior art keywords
target
fine
interested
image
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311378666.6A
Other languages
Chinese (zh)
Inventor
鉴海防
郑帅康
王洪昌
张凌赫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanchang High Tech Industry Collaborative Innovation Research Institute Chinese Academy Of Sciences
Institute of Semiconductors of CAS
Original Assignee
Nanchang High Tech Industry Collaborative Innovation Research Institute Chinese Academy Of Sciences
Institute of Semiconductors of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanchang High Tech Industry Collaborative Innovation Research Institute Chinese Academy Of Sciences, Institute of Semiconductors of CAS filed Critical Nanchang High Tech Industry Collaborative Innovation Research Institute Chinese Academy Of Sciences
Priority to CN202311378666.6A
Publication of CN117576670A
Legal status: Pending

Links

Classifications

    • G06V20/60: Image or video recognition or understanding; scenes, scene-specific elements; type of objects
    • G06N3/045: Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology; combinations of networks
    • G06N3/0464: Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology; convolutional networks [CNN, ConvNet]
    • G06V10/764: Image or video recognition or understanding using pattern recognition or machine learning; using classification, e.g. of video objects
    • G06V10/82: Image or video recognition or understanding using pattern recognition or machine learning; using neural networks
    • G06V2201/07: Indexing scheme relating to image or video recognition or understanding; target detection
    • Y02D10/00: Climate change mitigation technologies in ICT; energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a fine-granularity identification method based on a cascaded neural network and target space-time continuity. The method comprises: acquiring multiple temporally continuous frames of images to be identified; performing target detection on each image to be identified with a target detection algorithm to obtain the position distribution of all targets of interest; tracking the targets of interest with a target tracking algorithm based on target space-time continuity, so as to determine the newly appearing targets of interest in each image to be identified; performing fine-grained recognition on the newly appearing targets of interest with a fine-grained recognition algorithm to obtain recognition results for all targets of interest; and obtaining, based on the position distribution and the recognition results, the position information and fine-grained classification results of the targets of interest in each image to be identified, thereby realizing efficient fine-grained recognition of targets.

Description

Fine granularity identification method based on cascade neural network and target space-time continuity
Technical Field
The application relates to the field of target identification, in particular to a fine granularity identification method based on a cascade neural network and target space-time continuity.
Background
Video monitoring can provide rich and intuitive scene information, allowing people to remotely understand environmental conditions and their changes, and in particular to observe targets of interest in the environment. Target recognition and monitoring methods based on computer vision have the advantages of high efficiency, low cost and wide coverage, and are widely applied in fields such as traffic, security and ecological protection. The rapidly developing artificial intelligence technology of recent years provides very effective technical support for target recognition: it can automatically and intelligently recognize and classify targets in images, greatly improving work efficiency, reducing manual intervention, and realizing intelligent monitoring and management.
In practical applications, it is often necessary not only to classify targets coarsely (e.g. as birds or vehicles, i.e. coarse-grained identification) but also to subdivide them (e.g. into white cranes and gray cranes, i.e. fine-grained identification). Coarse-grained target identification has been studied extensively, and many neural-network-based models can achieve good recognition results from a single frame. Fine-grained identification, however, still faces several problems. On the one hand, distinguishing fine-grained differences between targets requires a more complex network structure to extract fine features; if fine-grained recognition is performed on every frame of a continuous video, the huge computational load prevents real-time recognition and lowers recognition efficiency. On the other hand, due to factors such as occlusion and blurring, effective features of a target may not be extractable from a single frame, which leads to lower recognition accuracy.
Disclosure of Invention
The present application aims to solve, at least to some extent, one of the technical problems in the related art.
Therefore, a first object of the present application is to propose a fine-granularity recognition method based on a cascade neural network and target space-time continuity, so as to realize efficient target fine-granularity recognition.
A second object of the present application is to propose a fine-grained recognition system based on cascaded neural networks and target spatio-temporal continuity.
A third object of the present application is to propose an electronic device.
A fourth object of the present application is to propose a computer readable storage medium.
To achieve the above object, an embodiment of a first aspect of the present application provides a fine granularity identification method based on a cascaded neural network and target space-time continuity, including the following steps:
acquiring multiple temporally continuous frames of images to be identified;
performing target detection on each image to be identified by using a target detection algorithm to obtain the position distribution of all targets of interest;
tracking the targets of interest by using a target tracking algorithm based on target space-time continuity, so as to determine the newly appearing targets of interest in each image to be identified;
performing fine-grained recognition on the newly appearing targets of interest by using a fine-grained recognition algorithm to obtain recognition results for all targets of interest;
and obtaining the position information and fine-grained classification results of the targets of interest in each image to be identified based on the position distribution and the recognition results.
In the method of the first aspect of the present application, the multiple frames of images to be identified are acquired directly or extracted from a video.
In the method of the first aspect of the present application, the target detection algorithm is a lightweight target detection algorithm.
In the method of the first aspect of the present application, tracking the targets of interest by using a target tracking algorithm based on target space-time continuity to determine the newly appearing targets of interest in each image to be identified includes: using the appearance information and motion information of the targets of interest, associating and matching the targets of interest across temporally continuous images to be identified with a target tracking algorithm, so as to obtain a continuous track for each target of interest and thereby determine the newly appearing targets of interest in each image to be identified.
In the method of the first aspect of the present application, the method further comprises: after obtaining the recognition result of each target of interest in the current frame of the image to be identified, judging the reliability of the recognition result; and if there is a target of interest whose reliability does not meet the requirement, performing fine-grained recognition, by using the fine-grained recognition algorithm, on the corresponding target of interest in the next frame of the image to be identified together with the newly appearing targets of interest in that frame.
To achieve the above object, an embodiment of a second aspect of the present application provides a fine-grained identification system based on a cascade neural network and target spatio-temporal continuity, including:
the acquisition module is used for acquiring multiple temporally continuous frames of images to be identified;
the target detection module is used for performing target detection on each image to be identified by using a target detection algorithm to obtain the position distribution of all targets of interest;
the target tracking module is used for tracking the targets of interest by using a target tracking algorithm based on target space-time continuity, so as to determine the newly appearing targets of interest in each image to be identified;
the fine-grained recognition module is used for performing fine-grained recognition on the newly appearing targets of interest by using a fine-grained recognition algorithm to obtain recognition results for all targets of interest;
and the output module is used for obtaining the position information and fine-grained classification results of the targets of interest in each image to be identified based on the position distribution and the recognition results.
In the system of the second aspect of the present application, the acquisition module acquires the multiple frames of images to be identified directly or extracts them from a video.
In the system of the second aspect of the present application, in the target detection module, the target detection algorithm is a lightweight target detection algorithm.
In the system of the second aspect of the present application, the target tracking module is specifically configured to: using the appearance information and motion information of the targets of interest, associate and match the targets of interest across temporally continuous images to be identified with a target tracking algorithm, so as to obtain a continuous track for each target of interest and thereby determine the newly appearing targets of interest in each image to be identified.
In the system of the second aspect of the present application, the fine-grained recognition module is further configured to: after obtaining the recognition result of each target of interest in the current frame of the image to be identified, judge the reliability of the recognition result; and if there is a target of interest whose reliability does not meet the requirement, perform fine-grained recognition, by using the fine-grained recognition algorithm, on the corresponding target of interest in the next frame of the image to be identified together with the newly appearing targets of interest in that frame.
To achieve the above object, an embodiment of a third aspect of the present application provides an electronic device, including: a processor, and a memory communicatively coupled to the processor; the memory stores computer-executable instructions; the processor executes the computer-executable instructions stored in the memory to implement the method set forth in the first aspect of the present application.
To achieve the above object, an embodiment of a fourth aspect of the present application proposes a computer-readable storage medium having stored therein computer-executable instructions for implementing the method proposed in the first aspect of the present application when being executed by a processor.
According to the fine-grained identification method, system, electronic device and storage medium based on a cascaded neural network and target space-time continuity provided by the present application, a cascaded neural network is formed from a target detection algorithm, a target tracking algorithm and a fine-grained recognition algorithm. The target detection algorithm quickly detects the targets of interest in the images to be identified; the target tracking algorithm then tracks the targets of interest across the images to determine the newly appearing ones; and fine-grained recognition is performed only on the newly appearing targets of interest. Repeated fine-grained recognition of the same target is thus avoided, the computational load is reduced, and more efficient fine-grained target recognition is achieved.
Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
fig. 1 is a schematic flow chart of a fine granularity identification method based on cascading neural network and target space-time continuity according to an embodiment of the present application;
fig. 2 is a diagram of recognition results of a first frame in three time-continuous frames of images to be recognized according to an embodiment of the present application;
fig. 3 is a diagram of recognition results of a second frame in three time-continuous frames of images to be recognized according to an embodiment of the present application;
fig. 4 is a diagram of recognition results of a third frame in three time-continuous frames of images to be recognized according to an embodiment of the present application;
fig. 5 is a block diagram of a fine-grained recognition system based on cascaded neural networks and target spatio-temporal continuity according to an embodiment of the application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary and intended for the purpose of explaining the present application and are not to be construed as limiting the present application.
The following describes a fine granularity recognition method and system based on cascading neural network and target space-time continuity according to an embodiment of the present application with reference to the accompanying drawings.
Current target classification methods face several problems. Distinguishing fine-grained differences between targets requires a more complex network structure to extract fine features; performing fine-grained recognition on every frame of a continuous video produces a huge computational load, preventing real-time recognition and lowering recognition efficiency. Moreover, due to factors such as occlusion and blurring, effective features of a target may not be extractable from a single frame, which leads to low recognition accuracy. To address these problems, the embodiments of the present application provide a fine-granularity identification method based on a cascaded neural network and target space-time continuity, so as to realize efficient and accurate fine-grained target recognition.
Fig. 1 is a schematic flow chart of a fine granularity identification method based on cascading neural network and target space-time continuity according to an embodiment of the present application. As shown in fig. 1, the fine granularity identification method based on the cascade neural network and the target space-time continuity comprises the following steps:
step S101, a plurality of frames of images to be identified which are continuous in time are acquired.
In step S101, a plurality of frames of images to be identified may be acquired directly or obtained through video. Wherein the video may be an online real-time video or an offline stored video.
In step S101, when the multiple frames of images to be identified with continuous time are acquired through the video, the multiple frames of images to be identified with continuous time may be formed by frame extraction of the video according to actual requirements.
In step S101, if the image size of the image to be identified does not meet the input requirement of the target detection algorithm, the image size of each frame of the image to be identified is further required to be preprocessed (e.g. cropped) to adapt to the input requirement of the target detection network (also referred to as the target detection algorithm).
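As a minimal sketch of this preprocessing step (for illustration only, not part of the disclosed method), the following Python/OpenCV snippet extracts temporally continuous frames from a video at a chosen rate and resizes them to the detector input size; the function name, sampling rate and target size are illustrative assumptions.

```python
import cv2

def extract_frames(video_path, sample_rate_hz=10, target_size=(448, 448)):
    """Decode a video, keep roughly `sample_rate_hz` frames per second,
    and resize each kept frame to the assumed detector input size."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or sample_rate_hz  # fall back if FPS is unknown
    step = max(int(round(fps / sample_rate_hz)), 1)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(cv2.resize(frame, target_size))
        idx += 1
    cap.release()
    return frames
```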
Step S102, performing object detection on each image to be identified by using an object detection algorithm to obtain the position distribution of all the objects of interest.
In step S102, the target detection algorithm may employ a lightweight target detection algorithm. A lightweight target detection algorithm uses a network with low complexity and a small number of parameters, such as YOLO or MobileNet, and can quickly detect the position distribution of all targets of interest in each image to be identified.
In step S102, the target detection algorithm is a model trained in advance, and the targets of interest correspond to the labels used during training. For example, if the training label is "bird", then the targets of interest obtained by target detection in this step are all the birds in the image to be identified.
In step S102, the target detection algorithm only detects the positions of the targets of interest in the image to be identified and does not provide their category information.
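For illustration, the sketch below loads a lightweight YOLOv5 detector through the public ultralytics torch.hub entry point (the application does not prescribe this particular implementation) and keeps only the bounding-box positions, discarding the coarse class output, in line with the description above; the confidence threshold is an assumed value.

```python
import torch

# Load a lightweight YOLOv5 model from the public ultralytics hub
# (the detector and weights actually used may differ).
detector = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

def detect_targets(frame, conf_thres=0.25):
    """Return bounding boxes [x1, y1, x2, y2] of candidate targets of interest.
    `frame` is assumed to be an RGB image array; only positions are kept,
    fine-grained classification happens in a later step."""
    results = detector(frame)
    boxes = []
    for x1, y1, x2, y2, conf, cls in results.xyxy[0].tolist():
        if conf >= conf_thres:
            boxes.append([x1, y1, x2, y2])
    return boxes
```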
Step S103, tracking the interested target by utilizing a target tracking algorithm based on the target space-time continuity so as to determine the new interested target in each image to be identified.
In step S103, the target tracking algorithm may employ a neural-network-based tracking algorithm, such as DeepSORT (a multi-target tracking algorithm), to improve the robustness and accuracy of tracking.
Specifically, in step S103, tracking the targets of interest by using a target tracking algorithm based on target space-time continuity to determine the newly appearing targets of interest in each image to be identified includes: using the appearance information and motion information of the targets of interest, associating and matching the targets of interest across temporally continuous images to be identified with the target tracking algorithm, so as to obtain a continuous track for each target of interest and thereby determine the newly appearing targets of interest in each image to be identified.
The target tracking algorithm uses spatio-temporal association information, such as the appearance information and motion information of each target of interest (hereinafter simply referred to as a target), to match targets across the associated consecutive frames and obtain a continuous track for each target, and it manages the target data information. The management includes: assigning different labels to different targets, assigning new labels to newly appearing targets, and removing disappeared targets from the current matching library.
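The label management described above can be reduced to simple set bookkeeping over the tracker output. The sketch below assumes the tracker returns (track_id, box) pairs for each frame; the function and variable names are illustrative assumptions, not part of the application.

```python
def update_track_registry(active_tracks, previous_ids):
    """Given the tracker output for the current frame as (track_id, box) pairs
    and the set of track IDs seen in earlier frames, return the IDs of newly
    appearing targets, the IDs of disappeared targets, and the updated ID set."""
    current_ids = {tid for tid, _ in active_tracks}
    new_ids = current_ids - previous_ids    # targets that have just appeared
    lost_ids = previous_ids - current_ids   # targets that have left the scene
    return new_ids, lost_ids, current_ids
```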
And step S104, carrying out fine-grained recognition on the newly-appearing interesting targets by utilizing a fine-grained recognition algorithm so as to obtain recognition results of all the interesting targets.
In step S104, the fine-grained recognition model adopts a neural network with deeper layers to extract finer target features, so as to distinguish different fine-grained categories. The fine-grained recognition algorithm is a model trained in advance, and the category information of the targets of interest corresponds to the labels used when training the model. For example, when the target of interest is "bird" and the fine-grained recognition algorithm is trained with bird categories such as "gray crane" and "white crane", the recognition result (also referred to as the fine-grained classification result) obtained for any target of interest after fine-grained recognition in this step is one of those categories.
In step S104, the input data of the fine-grained recognition algorithm are the newly appearing targets of interest in each frame of the images to be identified. Specifically, for the first frame all targets of interest are newly appearing, so fine-grained recognition is performed on all targets of interest in the first frame. For each subsequent frame, taking any frame as the current frame, the newly appearing targets of interest in the current frame are obtained by comparison with the previous frame, and fine-grained recognition is performed only on these newly appearing targets of interest. Repeated fine-grained recognition of the same target is thus avoided, the computational load is reduced, and efficient fine-grained target recognition is realized.
In addition, considering that effective features of a target may not be extractable from a single frame due to factors such as occlusion and blurring, which lowers recognition accuracy, step S104 further includes: after obtaining the recognition result of each target of interest in the current frame of the image to be identified, judging the reliability of the recognition result; and if there is a target of interest whose reliability does not meet the requirement, performing fine-grained recognition, with the fine-grained recognition algorithm, on the corresponding target of interest in the next frame together with the newly appearing targets of interest in that frame.
The reliability of a recognition result can be measured by the network prediction score: the higher the score, the higher the reliability. If the prediction score is greater than or equal to a score threshold (e.g. 0.7), the reliability meets the requirement; otherwise it does not. A target whose reliability does not meet the requirement is marked so that fine-grained recognition is performed on it again in the next frame, until the reliability is sufficiently high, and the historical result is updated with the new recognition result, which improves recognition accuracy under occlusion, blurring and similar conditions.
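A minimal sketch of this "recognize only new or unreliable targets" logic is given below. The cache layout and the classify_crop(crop) -> (label, score) interface are assumptions made for illustration, with the 0.7 threshold taken from the example above; the sketch keeps the more reliable of the old and new results for each track.

```python
SCORE_THRESHOLD = 0.7      # prediction-score threshold used as the reliability criterion

# track_id -> {"label": str, "score": float}; results are carried across frames
recognition_cache = {}

def recognize_frame(frame, active_tracks, classify_crop):
    """Run the fine-grained classifier only on newly appearing targets and on
    targets whose previous result was unreliable (score below the threshold)."""
    for track_id, (x1, y1, x2, y2) in active_tracks:
        cached = recognition_cache.get(track_id)
        if cached is not None and cached["score"] >= SCORE_THRESHOLD:
            continue  # a reliable result already exists; skip re-recognition
        crop = frame[int(y1):int(y2), int(x1):int(x2)]
        label, score = classify_crop(crop)
        if cached is None or score > cached["score"]:
            recognition_cache[track_id] = {"label": label, "score": score}
    return recognition_cache
```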
In step S104, the fine-grained recognition algorithm therefore performs fine-grained recognition only on newly appearing targets of interest and on targets of interest whose previous recognition result has low reliability. By exploiting the space-time continuity of the targets, repeated fine-grained recognition is avoided, the computational load is reduced, and the operation efficiency is improved. Meanwhile, for targets of interest in complex scenes with occlusion, blurring and the like, multi-frame recognition results are fused based on the reliability measure, which enhances the stability and accuracy of recognition.
Step S105, based on the position distribution and the recognition result, obtaining the position information and the fine granularity classification result of the interested target of each image to be recognized.
Specifically, in step S105, the position distribution of the targets of interest obtained by the target detection network for each image to be identified and the recognition results of the targets of interest in each image to be identified are summarized, finally yielding the position information and fine-grained classification results of all targets of interest in all images to be identified.
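Continuing the sketches above, the summarizing of step S105 amounts to joining the tracked positions with the cached fine-grained labels; the dictionary fields below are illustrative assumptions.

```python
def summarize_frame(active_tracks, recognition_cache):
    """Combine the tracker's position output with the cached fine-grained
    labels to produce the per-frame result described in step S105."""
    frame_result = []
    for track_id, box in active_tracks:
        rec = recognition_cache.get(track_id, {"label": "unknown", "score": 0.0})
        frame_result.append({
            "track_id": track_id,
            "box": box,                 # position information from detection/tracking
            "category": rec["label"],   # fine-grained classification result
            "score": rec["score"],
        })
    return frame_result
```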
To verify the effect of the method of the present application, experiments were performed taking fine-grained bird recognition as an example. The specific contents are as follows:
Step A: bird video data are acquired with a real-time video monitoring device, the video is sampled at 10 frames per second to obtain temporally continuous image data, and the image size is then converted to 448 x 448 for input to the target detection algorithm.
Step B: target detection is performed on each of the temporally continuous images with the lightweight target detection algorithm YOLOv5 to obtain the position distribution information of all targets of interest. For example, the YOLOv5s6 network is used; YOLOv5s6 is a lighter-weight model consisting of 4 convolutional layers, 8 Bottleneck layers and 1 output layer, totalling about 10.5M, which enables it to achieve fast and efficient real-time target detection in resource-constrained environments.
Step C: the detected targets of interest are tracked with the target tracking algorithm DeepSORT. Using appearance information and motion information, the detected targets of interest are matched across the associated consecutive frames to obtain a continuous track for each target; the target data information is managed, different labels are assigned to different targets, and new labels are assigned to newly appearing targets. DeepSORT combines motion and appearance information when processing the associations of each frame and performs more accurate association matching, using the Hungarian algorithm for the association assignment, so it has high tracking performance; at the same time, it extracts appearance features with a neural network, which improves robustness under missed detections and occlusion.
Step D: and carrying out fine-grained recognition on the interested target by using a fine-grained recognition algorithm, and judging the reliability of a recognition result. The fine-grained recognition model employs a convnext-base_4xb32_cmb200, which is based on a convolutional neural network architecture, containing 4 parallel branches, each branch having 32 convolutional kernels. With this architecture, the model is able to extract image features of different levels. Wherein the step only carries out fine-grained recognition on the new interested target or the interested target with poor previous recognition reliability of each frame. The identification reliability is determined by using the predictive score, and the higher the score is, the higher the reliability is. If the prediction score is lower than 0.7, namely the reliability is lower, marking that fine granularity identification is performed again on the next frame until the reliability is higher, and updating the historical result by using the new identification result. The space-time continuity of the target is utilized to avoid repeated fine granularity recognition, so that the calculated amount is reduced, the operation efficiency is improved, and meanwhile, the recognition accuracy under the conditions of shielding, blurring and the like is improved.
Step E: and finally, obtaining the position information and the fine granularity classification result of the interested target of each image data by utilizing the target detection position distribution result and the fine granularity identification classification result.
Fig. 2 is a diagram of recognition results of a first frame in three time-continuous frames of images to be recognized according to an embodiment of the present application. Fig. 3 is a diagram of recognition results of a second frame in three time-continuous frames of images to be recognized according to an embodiment of the present application. Fig. 4 is a diagram of recognition results of a third frame in three time-continuous frames of images to be recognized according to an embodiment of the present application.
As can be seen from fig. 2 to fig. 4: first, a gray crane appears in the picture of fig. 2 and is successfully located and recognized; in fig. 3, two gray cranes appear almost simultaneously on the right side of the picture, of which the nearer one is successfully located and recognized while the farther one is not recognized because it is partially occluded; in fig. 4, as the targets move, the gray crane that appeared first has left the picture, while the previously occluded gray crane has become fully visible and is successfully located and recognized, demonstrating the effectiveness of the method.
In order to implement the above embodiment, the application further provides a fine granularity identification system based on the cascade neural network and the target space-time continuity.
Fig. 5 is a block diagram of a fine-grained recognition system based on cascaded neural networks and target spatio-temporal continuity according to an embodiment of the application.
As shown in fig. 5, the fine-granularity recognition system based on the cascade neural network and the target space-time continuity includes an acquisition module 11, a target detection module 12, a target tracking module 13, a fine-granularity recognition module 14, and an output module 15, wherein:
an acquiring module 11, configured to acquire multiple frames of images to be identified that are continuous in time;
the target detection module 12 is configured to perform target detection on each image to be identified by using a target detection algorithm, so as to obtain the position distribution of all the objects of interest;
the target tracking module 13 is used for tracking the interested target by utilizing a target tracking algorithm based on the space-time continuity of the target so as to determine the newly appeared interested target in each image to be identified;
the fine granularity recognition module 14 is configured to perform fine granularity recognition on the newly appearing objects of interest by using a fine granularity recognition algorithm, so as to obtain recognition results of the objects of interest;
and the output module 15 is used for obtaining the position information and the fine granularity classification result of the interested target of each image to be identified based on the position distribution and the identification result.
Further, in one possible implementation manner of the embodiment of the present application, the acquiring module 11 acquires the multi-frame image to be identified directly or through video.
Further, in one possible implementation of the embodiment of the present application, in the target detection module 12, the target detection algorithm is a lightweight target detection algorithm.
Further, in one possible implementation manner of the embodiment of the present application, the target tracking module 13 is specifically configured to: using the appearance information and motion information of the targets of interest, associate and match the targets of interest across temporally continuous images to be identified with a target tracking algorithm, so as to obtain a continuous track for each target of interest and thereby determine the newly appearing targets of interest in each image to be identified.
Further, in one possible implementation of the embodiment of the present application, the fine-grained recognition module 14 is further configured to: after obtaining the recognition result of each target of interest in the current frame of the image to be identified, judge the reliability of the recognition result; and if there is a target of interest whose reliability does not meet the requirement, perform fine-grained recognition, by using the fine-grained recognition algorithm, on the corresponding target of interest in the next frame of the image to be identified together with the newly appearing targets of interest in that frame.
It should be noted that the foregoing explanation of the embodiments of the method for identifying fine granularity based on the cascade neural network and the target space-time continuity is also applicable to the fine granularity identification system based on the cascade neural network and the target space-time continuity of the embodiments, and will not be repeated herein.
In the embodiment of the present application, a cascaded neural network is formed from the target detection algorithm, the target tracking algorithm and the fine-grained recognition algorithm. The target detection algorithm quickly detects the targets of interest in the images to be identified; the target tracking algorithm then tracks the targets of interest across the images to determine the newly appearing ones; and fine-grained recognition is performed only on the newly appearing targets of interest. Repeated fine-grained recognition of the same target is thus avoided, the computational load is reduced, and more efficient fine-grained target recognition is achieved. In addition, for targets in complex scenes with occlusion, blurring and the like, multi-frame recognition results can be fused based on the reliability judgment, enhancing the stability and accuracy of recognition. In other words, through the cascaded neural network and target space-time continuity, efficient and accurate fine-grained target recognition is realized.
In order to achieve the above embodiments, the present application further proposes an electronic device including: a processor, a memory communicatively coupled to the processor; the memory stores computer-executable instructions; the processor executes the computer-executable instructions stored in the memory to implement the methods provided by the previous embodiments.
In order to implement the above embodiment, the present application further proposes a computer-readable storage medium, in which computer-executable instructions are stored, which when executed by a processor are configured to implement the method provided in the foregoing embodiment.
In order to implement the above embodiments, the present application also proposes a computer program product comprising a computer program which, when executed by a processor, implements the method provided by the above embodiments.
The collection, storage, use, processing, transmission, provision, disclosure and other handling of the information involved in the present application all comply with the relevant laws and regulations and do not violate public order and good customs.
It should be noted that personal information from users should be collected for legitimate and reasonable uses and not shared or sold outside of these legitimate uses. In addition, such collection/sharing should be performed after receiving user informed consent, including but not limited to informing the user to read user agreements/user notifications and signing agreements/authorizations including authorization-related user information before the user uses the functionality. In addition, any necessary steps are taken to safeguard and ensure access to such personal information data and to ensure that other persons having access to the personal information data adhere to their privacy policies and procedures.
The present application contemplates embodiments that may provide a user with selective prevention of use or access to personal information data. That is, the present disclosure contemplates that hardware and/or software may be provided to prevent or block access to such personal information data. Once personal information data is no longer needed, risk can be minimized by limiting data collection and deleting data. In addition, personal identification is removed from such personal information, as applicable, to protect the privacy of the user.
In the foregoing descriptions of embodiments, descriptions of the terms "one embodiment," "some embodiments," "example," "particular example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "plurality" is at least two, such as two, three, etc., unless explicitly defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and additional implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present application.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., a ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. As with the other embodiments, if implemented in hardware, may be implemented using any one or combination of the following techniques, as is well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable Gate Arrays (PGAs), field Programmable Gate Arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like. Although embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the application, and that variations, modifications, alternatives, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the application.

Claims (12)

1. The fine granularity identification method based on the cascade neural network and the target space-time continuity is characterized by comprising the following steps of:
acquiring multiple temporally continuous frames of images to be identified;
performing target detection on each image to be identified by using a target detection algorithm to obtain the position distribution of all targets of interest;
tracking the targets of interest by using a target tracking algorithm based on target space-time continuity, so as to determine the newly appearing targets of interest in each image to be identified;
performing fine-grained recognition on the newly appearing targets of interest by using a fine-grained recognition algorithm to obtain recognition results for all targets of interest;
and obtaining the position information and fine-grained classification results of the targets of interest in each image to be identified based on the position distribution and the recognition results.
2. The method for fine-granularity recognition based on cascading neural networks and target space-time continuity according to claim 1, wherein the multi-frame images to be recognized are obtained directly or through video.
3. The method for fine-grained identification based on cascading neural networks and target space-time continuity according to claim 1, wherein the target detection algorithm employs a lightweight target detection algorithm.
4. The method for fine-grained recognition based on cascading neural networks and target spatiotemporal continuity according to claim 1, wherein tracking the target of interest with a target tracking algorithm based on target spatiotemporal continuity to determine a new target of interest in each image to be recognized comprises:
using the appearance information and motion information of the targets of interest, associating and matching the targets of interest across temporally continuous images to be identified with a target tracking algorithm, so as to obtain a continuous track for each target of interest and thereby determine the newly appearing targets of interest in each image to be identified.
5. The method for fine-grained identification based on cascading neural networks and target spatiotemporal continuity of claim 1, further comprising:
after obtaining the recognition result of each target of interest in the current frame of the image to be identified, judging the reliability of the recognition result;
and if there is a target of interest whose reliability does not meet the requirement, performing fine-grained recognition, by using a fine-grained recognition algorithm, on the corresponding target of interest in the next frame of the image to be identified together with the newly appearing targets of interest in that frame.
6. A fine-grained recognition system based on cascading neural networks and target spatiotemporal continuity, comprising:
the acquisition module is used for acquiring multiple temporally continuous frames of images to be identified;
the target detection module is used for performing target detection on each image to be identified by using a target detection algorithm to obtain the position distribution of all targets of interest;
the target tracking module is used for tracking the targets of interest by using a target tracking algorithm based on target space-time continuity, so as to determine the newly appearing targets of interest in each image to be identified;
the fine-grained recognition module is used for performing fine-grained recognition on the newly appearing targets of interest by using a fine-grained recognition algorithm to obtain recognition results for all targets of interest;
and the output module is used for obtaining the position information and fine-grained classification results of the targets of interest in each image to be identified based on the position distribution and the recognition results.
7. The fine-grained recognition system based on cascading neural networks and target spatio-temporal continuity of claim 6, wherein the acquisition module acquires the multi-frame image to be recognized directly or through video.
8. The fine-grained recognition system based on cascading neural networks and target spatiotemporal continuity of claim 6, wherein in the target detection module, the target detection algorithm employs a lightweight target detection algorithm.
9. The fine-grained recognition system based on cascading neural networks and target spatiotemporal continuity according to claim 6, wherein the target tracking module is specifically configured to:
using the appearance information and motion information of the targets of interest, associating and matching the targets of interest across temporally continuous images to be identified with a target tracking algorithm, so as to obtain a continuous track for each target of interest and thereby determine the newly appearing targets of interest in each image to be identified.
10. The fine-grained identification system based on cascading neural networks and target spatiotemporal continuity of claim 6, wherein the fine-grained identification module is further configured to:
after obtaining the recognition result of each target of interest in the current frame of the image to be identified, judging the reliability of the recognition result;
and if there is a target of interest whose reliability does not meet the requirement, performing fine-grained recognition, by using a fine-grained recognition algorithm, on the corresponding target of interest in the next frame of the image to be identified together with the newly appearing targets of interest in that frame.
11. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the method of any one of claims 1-5.
12. A computer readable storage medium having stored therein computer executable instructions which when executed by a processor are adapted to carry out the method of any one of claims 1-5.
CN202311378666.6A 2023-10-23 2023-10-23 Fine granularity identification method based on cascade neural network and target space-time continuity Pending CN117576670A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311378666.6A CN117576670A (en) 2023-10-23 2023-10-23 Fine granularity identification method based on cascade neural network and target space-time continuity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311378666.6A CN117576670A (en) 2023-10-23 2023-10-23 Fine granularity identification method based on cascade neural network and target space-time continuity

Publications (1)

Publication Number Publication Date
CN117576670A true CN117576670A (en) 2024-02-20

Family

ID=89863269

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311378666.6A Pending CN117576670A (en) 2023-10-23 2023-10-23 Fine granularity identification method based on cascade neural network and target space-time continuity

Country Status (1)

Country Link
CN (1) CN117576670A (en)

Similar Documents

Publication Publication Date Title
US9299162B2 (en) Multi-mode video event indexing
Cucchiara et al. Vehicle Detection under Day and Night Illumination.
Faro et al. Adaptive background modeling integrated with luminosity sensors and occlusion processing for reliable vehicle detection
US8744125B2 (en) Clustering-based object classification
US20080095435A1 (en) Video segmentation using statistical pixel modeling
Kumar et al. Study of robust and intelligent surveillance in visible and multi-modal framework
CN116311166A (en) Traffic obstacle recognition method and device and electronic equipment
Sharma et al. Automatic vehicle detection using spatial time frame and object based classification
KR101690050B1 (en) Intelligent video security system
CN113920585A (en) Behavior recognition method and device, equipment and storage medium
CN111667507A (en) Method for tracking vehicle track on highway
Zhang et al. A front vehicle detection algorithm for intelligent vehicle based on improved gabor filter and SVM
CN111639585A (en) Self-adaptive crowd counting system and self-adaptive crowd counting method
US12100214B2 (en) Video-based public safety incident prediction system and method therefor
Płaczek A real time vehicle detection algorithm for vision-based sensors
CN117576670A (en) Fine granularity identification method based on cascade neural network and target space-time continuity
CN114187666B (en) Identification method and system for watching mobile phone while walking
Neto et al. Computer-vision-based surveillance of intelligent transportation systems
CN114912536A (en) Target identification method based on radar and double photoelectricity
CN212084368U (en) Highway vehicle trajectory tracking system
Vujović et al. Traffic video surveillance in different weather conditions
Khatri et al. Video analytics based identification and tracking in smart spaces
CN111008580A (en) Human behavior analysis method and device based on intelligent security of park
Giriprasad et al. Anomalies detection from video surveillance using support vector trained deep neural network classifier
CN116156149B (en) Detection method and device for detecting camera movement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination