CN117542085B - Park scene pedestrian detection method, device and equipment based on knowledge distillation - Google Patents

Park scene pedestrian detection method, device and equipment based on knowledge distillation

Info

Publication number
CN117542085B
CN117542085B (application CN202410036468.XA)
Authority
CN
China
Prior art keywords
model
target
pedestrian
knowledge distillation
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410036468.XA
Other languages
Chinese (zh)
Other versions
CN117542085A (en)
Inventor
佘亮
曾阳艳
曹文治
梁伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University of Technology
Original Assignee
Hunan University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University of Technology filed Critical Hunan University of Technology
Priority to CN202410036468.XA priority Critical patent/CN117542085B/en
Publication of CN117542085A publication Critical patent/CN117542085A/en
Application granted granted Critical
Publication of CN117542085B publication Critical patent/CN117542085B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/042 Knowledge-based neural networks; Logical representations of neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/096 Transfer learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a park scene pedestrian detection method, device and equipment based on knowledge distillation, comprising the following steps: acquiring a training data set, wherein the training data set comprises pedestrian detection data with pedestrian labels; training a first model and a second model with the training data set to obtain a target teacher model and a target student model, wherein the first model is an anchor-free pedestrian detection model whose backbone network is ResNet, and the second model is an anchor-free pedestrian detection model whose backbone network is ResNet18; aligning and matching the target teacher model and the target student model through feature-based knowledge distillation and output-based knowledge distillation to obtain a pedestrian recognition model; and performing pedestrian detection in the park scene with the pedestrian recognition model. Compression of the pedestrian detection model is achieved through knowledge distillation, which reduces the computational load of pedestrian detection tasks in park scenes and improves the efficiency of park pedestrian recognition.

Description

Park scene pedestrian detection method, device and equipment based on knowledge distillation
Technical Field
The invention relates to the field of data processing, in particular to a method, a device and equipment for detecting pedestrians in a park scene based on knowledge distillation.
Background
In recent years, with the wide application of related technologies such as artificial intelligence, big data and cloud computing, park intelligence has achieved major breakthroughs and development. To ensure the normal operation of a park and maintain its safety, real-time pedestrian monitoring of the park is required. With the introduction of deep neural networks, pedestrian detection technology has advanced considerably and detection accuracy has improved greatly. At the same time, the parameter counts and computational cost of deep-network-based pedestrian detection models have also increased sharply. More and more parks employ this approach for real-time pedestrian monitoring.
In the process of realizing the invention, the inventor found that the prior art has at least the following technical problems: in a park scene, the computing resources available for pedestrian detection are very limited, and because park personnel come and go frequently and at irregular times, the personnel monitoring task places very high efficiency and real-time requirements on the model.
Disclosure of Invention
The embodiment of the invention provides a park scene pedestrian detection method and device based on knowledge distillation, computer equipment and a storage medium, so as to improve pedestrian recognition efficiency.
In order to solve the technical problems, an embodiment of the present application provides a method for detecting a pedestrian in a park scene based on knowledge distillation, where the method for detecting a pedestrian in a park scene based on knowledge distillation includes:
Acquiring a training data set, wherein the training data set comprises pedestrian detection data of pedestrian labels;
Training a first model and a second model by adopting the training data set to obtain a target teacher model and a target student model, wherein the first model is an anchor-free pedestrian detection model whose backbone network is ResNet, and the second model is an anchor-free pedestrian detection model whose backbone network is ResNet18;
Aligning and matching the target teacher model and the target student model by adopting feature-based knowledge distillation and output-based knowledge distillation to obtain a pedestrian recognition model;
and adopting the pedestrian recognition model to detect pedestrians in the park scene.
Optionally, the training dataset is a pedestrian detection dataset CityPersons including street scenes in different places, different time periods, and different weather conditions.
Optionally, the training the first model and the second model with the training data set respectively, to obtain a target teacher model and a target student model includes:
Training the first model with the training data set and model hyperparameters so that the detection performance of the first model reaches a preset condition, obtaining the target teacher model; and pre-training the second model with the training data set and the model hyperparameters so that the second model reaches a convergence state, obtaining the target student model, wherein the model hyperparameters are a loss function, an optimizer and a learning rate.
Optionally, the training data set is trained and recognized by the target teacher model and the target student model, and neck features, detection maps and a spliced map are obtained in sequence after the input passes through the backbone network;
the step of aligning and matching the target teacher model and the target student model by adopting feature-based knowledge distillation and output-based knowledge distillation, and the step of obtaining the pedestrian recognition model comprises the following steps:
minimizing the distance between neck features of the target teacher model and the target student model by adopting a feature distillation mode;
And calculating to obtain an output-based knowledge distillation loss value according to the spliced graph of the target teacher model and the target student model.
Optionally, the aligning and matching the target teacher model and the target student model by using feature-based knowledge distillation and output-based knowledge distillation, to obtain a pedestrian recognition model further includes:
the Gaussian mask is calculated using the following formula:
M(i,j) = max_{1<=k<=K} exp( -(i - x_k)^2 / (2(w_k/2)^2) - (j - y_k)^2 / (2(h_k/2)^2) )
wherein M(i,j) denotes the value of the point at position (i,j) in the Gaussian mask, K is the number of pedestrian targets in the picture corresponding to the Gaussian mask, (x_k, y_k) are the coordinates of the center point of the k-th pedestrian, and w_k and h_k are the ground-truth width and height of the detection box of that pedestrian;
calculating a teacher truth mask from the target center point heat map of the teacher model, denoted H^T, and the ground-truth target center point map, denoted Y;
calculating a student truth mask in the same way from the target center point heat map H^S generated by the student model and the ground-truth map Y;
fusing the Gaussian mask, the teacher truth mask and the student truth mask to obtain a fused mask;
and calculating local loss based on the fusion mask, and training based on the local loss to obtain the pedestrian recognition model.
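A rough sketch of the mask fusion and local loss steps above: element-wise maximum is assumed as the fusion rule and a mask-weighted SmoothL1 as the local loss; both choices, and the normalization, are assumptions for illustration only.

```python
import numpy as np

def fuse_masks(gauss_mask, teacher_mask, student_mask):
    """Fuse three (H, W) masks into one distillation weight map.
    Element-wise maximum is an assumed fusion rule."""
    return np.maximum(np.maximum(gauss_mask, teacher_mask), student_mask)

def local_loss(feat_t, feat_s, fused_mask):
    """Mask-weighted SmoothL1 between teacher/student features (C, H, W),
    so distillation focuses on the regions the fused mask highlights."""
    diff = feat_t - feat_s
    ax = np.abs(diff)
    elem = np.where(ax < 1.0, 0.5 * diff ** 2, ax - 0.5)  # SmoothL1, element-wise
    # Broadcast the (H, W) weight map over the channel dimension.
    weighted = elem * fused_mask[None, :, :]
    return weighted.sum() / (fused_mask.sum() * feat_t.shape[0] + 1e-8)
```

Regions where every mask is zero contribute nothing, which is the point of restricting distillation to informative local areas.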
Optionally, the detection maps are a target center point heat map, a scale heat map and an offset map, and the aligning and matching of the target teacher model and the target student model by feature-based knowledge distillation and output-based knowledge distillation to obtain the pedestrian recognition model further includes:
performing channel compression on the neck feature and on each detection map to obtain corresponding attention maps, and subtracting and splicing the attention maps to obtain a bridging matrix;
splicing the bridging matrices of the three detection maps to obtain a combined bridging matrix;
and calculating an information-flow knowledge distillation loss based on the combined bridging matrix, and training according to the information-flow knowledge distillation loss to obtain the pedestrian recognition model.
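The bridging-matrix construction can be sketched roughly as follows. Mean-absolute pooling is assumed for channel compression, and the precise subtract-and-splice arrangement is an assumption based on the description above.

```python
import numpy as np

def attention_map(feat):
    """Channel compression: average absolute activation over channels.
    Mean-abs pooling is an assumed compression; (C, H, W) -> (H, W)."""
    return np.abs(feat).mean(axis=0)

def bridging_matrix(neck_feat, det_map):
    """Sketch of a bridging matrix for one detection map: subtract the
    neck attention from the detection attention, then splice. The exact
    subtract/splice order is an assumption."""
    a_neck = attention_map(neck_feat)
    a_det = attention_map(det_map)
    return np.stack([a_det - a_neck, a_det], axis=0)  # (2, H, W)

def combined_bridging_matrix(neck_feat, center, scale, offset):
    """Splice the bridging matrices of the three detection maps."""
    mats = [bridging_matrix(neck_feat, m) for m in (center, scale, offset)]
    return np.concatenate(mats, axis=0)  # (6, H, W)
```

The combined matrix captures how information flows from the neck feature into each detection head, which the information-flow distillation loss then compares between teacher and student.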
In order to solve the above technical problems, an embodiment of the present application further provides a device for detecting pedestrians in a park scene based on knowledge distillation, including:
the acquisition module is used for acquiring a training data set, wherein the training data set comprises pedestrian detection data of pedestrian labels;
the training module is used for training a first model and a second model with the training data set to obtain a target teacher model and a target student model, wherein the first model is an anchor-free pedestrian detection model whose backbone network is ResNet, and the second model is an anchor-free pedestrian detection model whose backbone network is ResNet18;
the alignment module is used for aligning and matching the target teacher model and the target student model by adopting feature-based knowledge distillation and output-based knowledge distillation to obtain a pedestrian recognition model;
and the detection module is used for detecting pedestrians in the park scene by adopting the pedestrian recognition model.
Optionally, the training data set is trained and recognized by the target teacher model and the target student model, and neck features, detection maps and a spliced map are obtained in sequence after the input passes through the backbone network;
The alignment module includes:
a first alignment unit for minimizing a distance between neck features of the target teacher model and the target student model by means of feature distillation;
and the second alignment unit is used for calculating and obtaining an output-based knowledge distillation loss value according to the spliced graph of the target teacher model and the target student model.
Optionally, the aligning and matching the target teacher model and the target student model by using feature-based knowledge distillation and output-based knowledge distillation, to obtain a pedestrian recognition model further includes:
a first calculation unit for calculating a Gaussian mask, wherein M(i,j) denotes the value of the point at position (i,j) in the Gaussian mask, K is the number of pedestrian targets in the picture corresponding to the Gaussian mask, (x_k, y_k) are the coordinates of the center point of the k-th pedestrian, and w_k and h_k are the ground-truth width and height of the detection box of that pedestrian;
a second calculation unit for calculating a teacher truth mask from the target center point heat map H^T of the teacher model and the ground-truth target center point map Y;
a third calculation unit for calculating a student truth mask from the target center point heat map H^S generated by the student model and the ground-truth map Y;
The fusion unit is used for fusing the Gaussian mask, the teacher true value mask and the student true value mask to obtain a fusion mask;
And the first training unit is used for calculating the local loss based on the fusion mask and training based on the local loss to obtain the pedestrian recognition model.
Optionally, the alignment module further includes:
the generating unit is used for respectively performing channel compression on the neck feature and on each detection map to obtain corresponding attention maps, and subtracting and splicing the attention maps to obtain a bridging matrix;
the splicing unit is used for splicing the bridging matrices of the three detection maps to obtain a combined bridging matrix;
and the second training unit is used for calculating an information-flow knowledge distillation loss based on the combined bridging matrix and training according to the information-flow knowledge distillation loss to obtain the pedestrian recognition model.
In order to solve the technical problem, the embodiment of the application also provides a computer device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the steps of the method for detecting the pedestrians in the park scene based on knowledge distillation when executing the computer program.
To solve the above technical problem, the embodiments of the present application further provide a computer readable storage medium storing a computer program, where the computer program when executed by a processor implements the steps of the above method for detecting a pedestrian in a park scene based on knowledge distillation.
According to the method, device, computer equipment and storage medium for detecting pedestrians in a park scene based on knowledge distillation provided by the embodiment of the invention, a training data set is obtained, wherein the training data set comprises pedestrian detection data with pedestrian labels; a first model and a second model are trained with the training data set to obtain a target teacher model and a target student model, wherein the first model is an anchor-free pedestrian detection model whose backbone network is ResNet, and the second model is an anchor-free pedestrian detection model whose backbone network is ResNet18; the target teacher model and the target student model are aligned and matched through feature-based knowledge distillation and output-based knowledge distillation to obtain a pedestrian recognition model; and pedestrian detection in the park scene is performed with the pedestrian recognition model. Compression of the pedestrian detection model is achieved through knowledge distillation, which reduces the computational load of pedestrian detection tasks in park scenes and improves the efficiency of park pedestrian recognition.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments of the present invention will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a knowledge distillation based campus scene pedestrian detection method of the present application;
FIG. 3 is an exemplary diagram of the basic distillation architecture of a pedestrian detection model without anchor boxes of the present application;
FIG. 4 is a diagram showing an exemplary structure of mask fusion according to the present application;
FIG. 5 is an exemplary diagram of a bridging matrix generation process in accordance with one embodiment of the present application;
FIG. 6 is a schematic structural diagram of one embodiment of a knowledge distillation based campus scene pedestrian detection device in accordance with the present application;
FIG. 7 is a schematic diagram of an embodiment of a computer device in accordance with the application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the applications herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having" and any variations thereof in the description of the application and the claims and the description of the drawings above are intended to cover a non-exclusive inclusion. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, as shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that, the method for detecting the pedestrian in the park scene based on the knowledge distillation provided by the embodiment of the application is executed by the server, and correspondingly, the device for detecting the pedestrian in the park scene based on the knowledge distillation is arranged in the server.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. Any number of terminal devices, networks and servers may be provided according to implementation requirements, and the terminal devices 101, 102, 103 in the embodiment of the present application may specifically correspond to application systems in actual production.
Referring to fig. 2, fig. 2 shows a method for detecting pedestrians in a park scene based on knowledge distillation according to an embodiment of the present invention, and the method is applied to a server in fig. 1 for illustration, and is described in detail as follows:
s201: and acquiring a training data set, wherein the training data set comprises pedestrian detection data of pedestrian labels.
Optionally, the training dataset is a pedestrian detection dataset CityPersons, including street scenes in different places, different time periods, different weather conditions.
S202: training the first model and the second model with the training data set to obtain a target teacher model and a target student model, wherein the first model is an anchor-free pedestrian detection model whose backbone network is ResNet, and the second model is an anchor-free pedestrian detection model whose backbone network is ResNet18.
Specifically, in this embodiment, pedestrian detection algorithms are divided into two types according to whether anchor boxes are set: anchor-based detection algorithms and anchor-free detection algorithms. Compared with anchor-based algorithms, anchor-free algorithms have fewer parameters and a higher detection speed, so they are better suited to the actual application scenario of a park. Knowledge distillation is a model compression method based on a teacher-student framework, in which the teacher model is more complex, generally a wider and deeper model with more parameters, while the student model is simpler, generally a shallower model with fewer parameters. Knowledge distillation enables the student model to learn the useful information in the teacher model, achieving efficient knowledge transfer between large and small models.
Preferably, this embodiment selects an anchor-free pedestrian detection model based on center point and scale prediction (Center and Scale Prediction, CSP).
Optionally, training the first model and the second model with the training data set respectively, to obtain the target teacher model and the target student model includes:
Training the first model with the training data set and model hyperparameters so that the detection performance of the first model reaches a preset condition, obtaining the target teacher model; and pre-training the second model with the training data set and the model hyperparameters so that the second model reaches a convergence state, obtaining the target student model, wherein the model hyperparameters are a loss function, an optimizer and a learning rate.
In this embodiment, to train the teacher model, an anchor-free pedestrian detection model whose backbone network is ResNet is selected as the teacher model, and hyperparameters such as the loss function, optimizer and learning rate are used to train it so that it reaches high detection performance. To train the student model, an anchor-free pedestrian detection model whose backbone network is ResNet18 is selected as the student model, and it is pre-trained with hyperparameters such as the loss function, optimizer and learning rate until it reaches a convergence state.
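As a rough illustration of this teacher/student setup, the sketch below builds two toy anchor-free CSP-style detectors of different widths and attaches an optimizer. The tiny convolutional backbone, the layer widths, the SGD optimizer and the learning rate are all placeholder assumptions standing in for the ResNet/ResNet18 backbones and hyperparameters described above.

```python
import torch
import torch.nn as nn

class TinyCSP(nn.Module):
    """Toy stand-in for an anchor-free CSP-style detector: a backbone,
    a neck, and three heads predicting a center heat map, a scale map
    and an offset map. Real ResNet backbones and the CSP neck are
    simplified away; this only illustrates the training setup."""
    def __init__(self, width=16):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, width, 3, stride=4, padding=1), nn.ReLU())
        self.neck = nn.Sequential(
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU())
        self.center = nn.Conv2d(width, 1, 1)   # target center point heat map
        self.scale = nn.Conv2d(width, 1, 1)    # scale heat map
        self.offset = nn.Conv2d(width, 2, 1)   # offset map
    def forward(self, x):
        f = self.neck(self.backbone(x))
        return self.center(f), self.scale(f), self.offset(f)

teacher = TinyCSP(width=32)   # wider in practice (ResNet backbone)
student = TinyCSP(width=16)   # lighter (ResNet18 backbone in the patent)
# Both models are trained with the same kind of hyperparameters:
# a loss function, an optimizer and a learning rate, e.g.:
optimizer = torch.optim.SGD(student.parameters(), lr=1e-3)
```

Because both models share head shapes, their detection maps are directly comparable during distillation regardless of backbone size.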
S203: and aligning and matching the target teacher model and the target student model by adopting feature-based knowledge distillation and output-based knowledge distillation to obtain the pedestrian recognition model.
In this embodiment, output-based distillation and feature-based distillation, which are commonly used in knowledge distillation algorithms for object detection models, are transferred to the anchor-free pedestrian detection model CSP, obtaining a basic distillation (General Distillation, GD) architecture suited to anchor-free pedestrian detection models, as shown in fig. 3.
Specifically, the embodiment transfers the distillation based on output and the distillation based on characteristics to the pedestrian detection model without an anchor frame, and proposes a basic distillation architecture. And aligning and matching the teacher model and the student model by using feature-based knowledge distillation and output-based knowledge distillation, so that the student model can simulate the neck features and the detection diagrams of the teacher model, and the detection accuracy of the student model is improved.
In a specific optional implementation of this embodiment, the training data set is trained and recognized through the target teacher model and the target student model, and neck features, detection maps and a spliced map are obtained in sequence after the input passes through the backbone network;
The method for aligning and matching the target teacher model and the target student model by adopting feature-based knowledge distillation and output-based knowledge distillation comprises the following steps:
minimizing the distance between neck features of the target teacher model and the target student model by adopting a feature distillation mode;
and calculating to obtain an output-based knowledge distillation loss value according to the splice diagram of the target teacher model and the target student model.
Further, the embodiment realizes local area knowledge distillation on the basis of a basic distillation framework, designs various masks and fuses the masks to obtain a distillation weight map, so that the distillation process can pay more attention to an area effective for improving the detection capability of the student model, such as a target center point, a small target or a shielding target, and the like, thereby further improving the detection accuracy of the student model.
Specifically, in a specific implementation of this embodiment, the anchor-free pedestrian detection model CSP uses the residual network ResNet as the backbone network and processes the outputs of the top four layers of the backbone through the neck module. The neck module applies deconvolution to features from different layers to obtain features of the same size and channel number, then splices the processed four layers of features along the channel dimension, yielding a spliced neck feature of size 728×160×320. The size of the neck feature produced by the neck module of the anchor-free pedestrian detection model is therefore not affected by the specification of the backbone network; it has a fixed size of 728×160×320, i.e. the neck features of the teacher model and the student model are identical in shape. The neck feature of the anchor-free pedestrian detection model contains both low-level and high-level features from the backbone outputs, and thus carries rich information. Therefore, the basic knowledge distillation architecture proposed in this embodiment distills directly on the neck features, i.e. minimizes the distance between the neck features of the teacher model and the student model. Using the SmoothL1 distance as the evaluation function, the neck-feature-based loss value L_fea in training can be calculated from formula (1):
L_fea = (1 / (C·H·W)) · Σ_{c=1}^{C} Σ_{h=1}^{H} Σ_{w=1}^{W} SmoothL1( F^T_{c,h,w} − F^S_{c,h,w} )   (1)
wherein F^T and F^S represent the neck features of the teacher model and the student model respectively, C denotes the number of channels of the neck feature, which is 728, and H and W denote the height and width of the feature, which are 160 and 320 respectively.
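A minimal numpy sketch of this neck-feature distillation loss, assuming SmoothL1 is applied element-wise and averaged over all C×H×W positions; the averaging normalization is an assumption for illustration.

```python
import numpy as np

def smooth_l1(x):
    """Element-wise SmoothL1: 0.5*x^2 if |x| < 1, else |x| - 0.5."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

def neck_distill_loss(feat_t, feat_s):
    """Feature-based distillation loss between teacher and student neck
    features of identical shape (C, H, W), e.g. (728, 160, 320),
    averaged over all elements."""
    assert feat_t.shape == feat_s.shape
    return smooth_l1(feat_t - feat_s).mean()
```

Because the CSP neck emits a fixed 728×160×320 feature regardless of backbone, no adaptation layer is needed before comparing the two models.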
Furthermore, the detection head module of the anchor-free pedestrian detection model CSP has a simple structure: it consists of only a few convolution layers and produces three detection maps, namely a target center point heat map, a scale heat map and an offset map, used respectively to predict each target's center coordinates, scale, and center-point offset. The final target pedestrian prediction boxes can be obtained from these three maps.
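For illustration, a minimal sketch of decoding the three detection maps into prediction boxes. The patent does not state the decoding rule, so this assumes a confidence threshold, a downsampling stride of 4, a raw pedestrian height in the scale map, and the fixed 0.41 width-to-height ratio the CSP detector is known to use for pedestrians:

```python
import numpy as np

def decode_boxes(center_heat, scale_map, offset_map, threshold=0.5, stride=4):
    """Decode CSP-style detection maps into (x1, y1, x2, y2) pedestrian boxes.

    center_heat: (H, W) center-point confidence map.
    scale_map:   (H, W) predicted pedestrian height per position (assumed raw).
    offset_map:  (2, H, W) sub-pixel (dx, dy) refinement of the center point.
    Every heat-map position above `threshold` becomes one box."""
    boxes = []
    ys, xs = np.where(center_heat > threshold)
    for y, x in zip(ys, xs):
        h = scale_map[y, x]
        cx = (x + offset_map[0, y, x]) * stride   # map grid cell back to pixels
        cy = (y + offset_map[1, y, x]) * stride
        w = 0.41 * h                              # fixed pedestrian aspect ratio
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes
```

A production decoder would add non-maximum suppression; the sketch only shows how the three maps jointly determine a box.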
Even when the backbone sizes differ, the three detection maps produced by the anchor-free pedestrian detection models have the same dimensions, i.e. the detection maps of the teacher model and the student model are equal in size. To simplify the loss computation and make distillation training more convenient, the three detection maps are spliced together. Let $Y^{center}$, $Y^{scale}$ and $Y^{offset}$ denote the target center point heat map, the scale heat map and the offset map respectively; the spliced detection map $D$ can be obtained from formula (2):

$$D = \mathrm{concat}\big(Y^{center}, Y^{scale}, Y^{offset}\big) \tag{2}$$

Let $D^{T}$ denote the spliced detection map of the teacher model and $D^{S}$ the corresponding spliced detection map of the student model. In distillation training, the SmoothL1 function is still used as the loss function, and the output-based knowledge distillation loss value $L_{output}$ can be calculated according to formula (3):

$$L_{output} = \frac{1}{C' \cdot H' \cdot W'} \sum_{c=1}^{C'} \sum_{i=1}^{H'} \sum_{j=1}^{W'} \mathrm{SmoothL1}\big(D^{T}_{c,i,j} - D^{S}_{c,i,j}\big) \tag{3}$$

where $C'$, $H'$ and $W'$ denote the number of channels, the height and the width of the spliced detection map respectively.
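A sketch of the output-based distillation step: the three detection maps are spliced along the channel axis and compared with SmoothL1 (NumPy stand-ins for the model outputs; `beta = 1` is an assumed SmoothL1 threshold):

```python
import numpy as np

def smooth_l1(diff, beta=1.0):
    ad = np.abs(diff)
    return np.where(ad < beta, 0.5 * ad ** 2 / beta, ad - 0.5 * beta)

def splice_detection_maps(center, scale, offset):
    """Concatenate the three detection maps along the channel axis."""
    return np.concatenate([center, scale, offset], axis=0)

def output_distill_loss(maps_teacher, maps_student):
    """Mean SmoothL1 between spliced teacher and student detection maps."""
    d_t = splice_detection_maps(*maps_teacher)
    d_s = splice_detection_maps(*maps_student)
    return smooth_l1(d_t - d_s).mean()
```

Splicing first means a single loss call covers all three prediction branches, which is exactly the simplification the text motivates.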
In a specific optional implementation manner of this embodiment, the aligning and matching of the target teacher model and the target student model by adopting feature-based knowledge distillation and output-based knowledge distillation to obtain the pedestrian recognition model further includes:
the Gaussian mask is calculated using the following formula:

$$Mask^{gauss}_{i,j} = \max_{1 \le k \le K} \exp\!\left(-\left(\frac{(i - x_k)^2}{2 w_k^2} + \frac{(j - y_k)^2}{2 h_k^2}\right)\right)$$

where $Mask^{gauss}_{i,j}$ denotes the value at position $(i, j)$ in the Gaussian mask, $K$ is the number of target pedestrians in the picture corresponding to the Gaussian mask, $(x_k, y_k)$ denotes the coordinates of the center point of the $k$-th pedestrian, and $w_k$ and $h_k$ are the width and height of the ground-truth detection box corresponding to that pedestrian;
calculating a teacher truth mask by adopting the following formula:

$$Mask^{T}_{i,j} = 1 - \big|Y^{T}_{i,j} - GT^{center}_{i,j}\big|$$

where $Y^{T}$ denotes the target center point heat map of the teacher model and $GT^{center}$ denotes the target center point truth map;
calculating a student truth mask by adopting the following formula:

$$Mask^{S}_{i,j} = \big|Y^{S}_{i,j} - GT^{center}_{i,j}\big|$$

where $Y^{S}$ denotes the target center point heat map generated by the student model;
fusing the Gaussian mask, the teacher truth mask and the student truth mask to obtain a fused mask;
And calculating the local loss based on the fusion mask, and training based on the local loss to obtain the pedestrian recognition model.
Specifically, this embodiment proposes a local-area knowledge distillation algorithm, so named because it focuses distillation attention on regions effective for improving the student model's detection ability. The algorithm is implemented with multiple types of masks: several masks are designed and fused to obtain an overall distillation weight, further improving the knowledge distillation effect of the anchor-free pedestrian detection model CSP on top of the basic distillation framework.
Further, if the center point position map from the ground truth of the anchor-free pedestrian detection model CSP training process (value 1 at pedestrian center positions, 0 elsewhere) were used directly as the mask, distillation would attend only to target center-point information and lose target edge information. This embodiment therefore uses a Gaussian mask with a larger attention area, whose values gradually decrease from the center of each target toward its edge.
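A hedged NumPy sketch of such a Gaussian mask. The exact variance in the patent's formula image is not reproduced here, so half the box width/height is assumed as the standard deviation, and overlapping targets are combined with a per-pixel maximum:

```python
import numpy as np

def gaussian_mask(height, width, boxes):
    """Per-pixel mask: one 2-D Gaussian per target, combined by max.

    boxes: list of (cx, cy, w, h) ground-truth pedestrian boxes.
    Each Gaussian peaks at 1 at the target center and decays toward
    the box edge, so edge information still receives some weight."""
    ys, xs = np.mgrid[0:height, 0:width]
    mask = np.zeros((height, width))
    for cx, cy, w, h in boxes:
        g = np.exp(-(((xs - cx) ** 2) / (2 * (w / 2) ** 2)
                     + ((ys - cy) ** 2) / (2 * (h / 2) ** 2)))
        mask = np.maximum(mask, g)  # keep the strongest target at each pixel
    return mask
```

Compared with a hard 0/1 center-point map, this mask keeps a smooth, non-zero weight over the whole target region, which is the motivation given in the text.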
Further, the teacher model's detections are not necessarily correct: in many cases its predictions deviate from or contradict the ground truth. With the basic distillation architecture, it is therefore hard to avoid transferring some of the teacher model's noise, which harms pedestrian detection, into the student model. The invention designs a teacher truth mask to guide distillation so that information in the teacher model favorable to pedestrian detection is transferred to the student model, while the transfer of erroneous teacher information is reduced.
Furthermore, the invention provides a student truth mask calculated from the target center point heat map output by the student model and the target center point truth map. In the student truth mask, points where the student model predicts incorrectly have higher values and points where it predicts correctly have lower values. Adding the student truth mask to distillation thus guides distillation to focus more on regions where the student model mispredicts.
After the three masks are obtained, as shown in fig. 4, they are spliced and fed into a convolution layer with a 1×1 kernel to obtain a single-channel combined mask. The channel dimension is then removed from the combined mask to obtain the fusion mask actually used. The loss calculation is then performed.
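The fusion step can be illustrated as follows. A 1×1 convolution over the three stacked mask channels reduces, per pixel, to a weighted sum of the three channel values plus a bias; the equal weights below merely stand in for the learned kernel, which in the patent is trained:

```python
import numpy as np

def fuse_masks(masks, weights=None, bias=0.0):
    """Fuse a list of (H, W) masks with a 1x1-convolution-style weighted sum.

    `weights` plays the role of the learned 1x1 kernel; when omitted,
    equal weights are used purely for illustration. The channel axis is
    consumed by the weighted sum, leaving a single (H, W) fusion mask."""
    stack = np.stack(masks, axis=0)                      # (3, H, W)
    if weights is None:
        weights = np.full(stack.shape[0], 1.0 / stack.shape[0])
    fused = np.tensordot(weights, stack, axes=1) + bias  # (H, W)
    return fused
```

The design choice here is that fusion is learned rather than fixed: the 1×1 kernel lets training decide how much each mask (Gaussian, teacher truth, student truth) should contribute at every pixel.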
In the present embodiment, let $Mask^{fuse}$ denote the fused mask. Based on the $L_{fea}$ and $L_{output}$ of the basic distillation architecture, the loss function corresponding to the whole local-area knowledge distillation can be expressed as the following formula, in which each position's feature and output errors are weighted by the fused mask:

$$L_{local} = \frac{1}{C \cdot H \cdot W} \sum_{c,i,j} Mask^{fuse}_{i,j} \cdot \mathrm{SmoothL1}\big(F^{T}_{c,i,j} - F^{S}_{c,i,j}\big) + \frac{1}{C' \cdot H' \cdot W'} \sum_{c,i,j} Mask^{fuse}_{i,j} \cdot \mathrm{SmoothL1}\big(D^{T}_{c,i,j} - D^{S}_{c,i,j}\big)$$
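A sketch of a fused-mask-weighted distillation loss on the feature branch; the exact scheme for combining the mask with the SmoothL1 terms is an assumption, since the patent's formula image is not reproduced:

```python
import numpy as np

def smooth_l1(diff, beta=1.0):
    ad = np.abs(diff)
    return np.where(ad < beta, 0.5 * ad ** 2 / beta, ad - 0.5 * beta)

def local_area_loss(feat_t, feat_s, mask_fuse):
    """Fused-mask-weighted feature distillation loss.

    feat_t, feat_s: (C, H, W) teacher/student features.
    mask_fuse: (H, W) fusion mask. Each spatial position's SmoothL1
    error is scaled by the mask value there before averaging, so
    regions the mask emphasises dominate the loss."""
    per_pos = smooth_l1(feat_t - feat_s).mean(axis=0)  # (H, W): channel average
    return (mask_fuse * per_pos).mean()
```

With an all-ones mask this reduces to the unweighted feature loss; with a mask concentrated on target centers, small targets, or occluded targets, gradients concentrate there too.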
Further, the embodiment realizes knowledge distillation of information flow based on a basic distillation architecture, performs knowledge distillation on the generation process from the neck feature to the detection graph, and expands the types of information transferred between the teacher model and the student model.
In a specific optional implementation manner of this embodiment, the detection maps are a target center point heat map, a scale heat map and an offset map, and the aligning and matching of the target teacher model and the target student model by adopting feature-based knowledge distillation and output-based knowledge distillation to obtain the pedestrian recognition model further includes:
channel compression is respectively carried out on the neck feature and the detection graph to obtain a corresponding attention graph, and subtraction and splicing are carried out on the attention graph to obtain a bridging matrix;
Splicing the bridging matrixes of the three detection graphs to obtain a combined bridging matrix;
and calculating an information flow knowledge distillation loss based on the combined bridging matrix, and training according to the information flow knowledge distillation loss to obtain the pedestrian recognition model.
Specifically, the number of channels of the neck feature of the anchor-free pedestrian detection model CSP differs from those of the three detection maps: the neck feature has 728 channels, while the target center point heat map, the scale heat map and the offset map each have only a few channels. The two kinds of maps therefore need to be converted to the same size to facilitate the subsequent computation of the generation matrix from neck feature to detection map.
For channel alignment, this embodiment introduces the concept of an attention map. Computing an attention map yields the degree of activation of each pixel on a feature map. This embodiment therefore computes corresponding attention maps for both the neck feature map and each detection map, and uses the difference between the attention maps to represent the change from the neck feature to that detection map. Since this difference matrix serves to connect the neck feature and the detection map, the attention-difference matrix is called the bridging matrix. The generation of the bridging matrix is shown in fig. 5: first, channel compression is performed on the neck feature and the detection map respectively to obtain the corresponding attention maps; then the attention maps are subtracted and spliced to obtain the bridging matrix actually used.
The attention map of the neck feature of the anchor-free pedestrian detection model CSP is calculated as follows:

$$A^{F}_{i,j} = \frac{1}{C} \sum_{c=1}^{C} \big|F_{c,i,j}\big|$$

where $C$ is the number of channels of the neck feature $F$ and $A^{F}$ denotes the attention map corresponding to $F$. The attention maps of the detection maps are obtained in the same way; those of the target center point heat map, the scale heat map and the offset map are denoted $A^{center}$, $A^{scale}$ and $A^{offset}$ respectively.
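The channel-compression step can be sketched as a mean of absolute activations across channels, one common attention-map definition; the patent's exact normalisation may differ:

```python
import numpy as np

def attention_map(feature):
    """Compress a (C, H, W) feature into an (H, W) attention map.

    Uses the mean absolute activation across channels as a per-pixel
    measure of activation strength, so maps with different channel
    counts (neck feature vs. detection maps) become comparable."""
    return np.abs(feature).mean(axis=0)
```

Because the output is always (H, W) regardless of C, the 728-channel neck feature and the few-channel detection maps end up in the same space, which is exactly what the subtraction in the bridging matrix needs.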
Taking the target center point heat map as an example, the bridging matrix from the neck feature to the target center point heat map can be expressed as:

$$B^{center} = A^{center} - A^{F}$$

In the same way, the bridging matrices $B^{scale}$ and $B^{offset}$ corresponding to the scale heat map and the offset map can be calculated. Splicing the three bridging matrices gives the combined bridging matrix $B$.
Let $B^{T}$ denote the combined bridging matrix of the teacher model and $B^{S}$ that of the student model, and let $C_b$, $H_b$ and $W_b$ denote the number of channels, the height and the width of the bridging matrix. Since all bridging matrices have the same size, $C_b$, $H_b$ and $W_b$ are shared between the two models. The loss value corresponding to the information flow knowledge distillation algorithm is then calculated as follows:

$$L_{flow} = \frac{1}{C_b \cdot H_b \cdot W_b} \sum_{c=1}^{C_b} \sum_{i=1}^{H_b} \sum_{j=1}^{W_b} \mathrm{SmoothL1}\big(B^{T}_{c,i,j} - B^{S}_{c,i,j}\big)$$
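Putting the pieces together, a sketch of the bridging matrices and the information-flow loss (NumPy stand-ins; attention maps as mean absolute activation and SmoothL1 with `beta = 1` are assumptions):

```python
import numpy as np

def smooth_l1(diff, beta=1.0):
    ad = np.abs(diff)
    return np.where(ad < beta, 0.5 * ad ** 2 / beta, ad - 0.5 * beta)

def attention_map(feature):
    """(C, H, W) -> (H, W) mean absolute activation per pixel."""
    return np.abs(feature).mean(axis=0)

def combined_bridging_matrix(neck, center, scale, offset):
    """Stack the three attention-difference (bridging) matrices.

    Each bridging matrix is the detection map's attention map minus
    the neck feature's attention map, representing the change from
    neck feature to that detection map."""
    a_neck = attention_map(neck)
    bridges = [attention_map(m) - a_neck for m in (center, scale, offset)]
    return np.stack(bridges, axis=0)  # (3, H, W)

def info_flow_loss(teacher_maps, student_maps):
    """Mean SmoothL1 between teacher and student combined bridging matrices."""
    b_t = combined_bridging_matrix(*teacher_maps)
    b_s = combined_bridging_matrix(*student_maps)
    return smooth_l1(b_t - b_s).mean()
```

Matching bridging matrices rather than raw outputs transfers how the neck feature turns into each detection map, i.e. the "information flow" the embodiment distills.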
S204: pedestrian detection of the park scene is performed by adopting a pedestrian recognition model.
Specifically, the knowledge-distillation-based park scene pedestrian detection method was tested and verified on the CityPersons dataset. The experimental results show that the knowledge distillation algorithm based on the anchor-free pedestrian detection model CSP proposed in this embodiment effectively improves the detection performance of the anchor-free pedestrian detection model: the distilled model's detection performance is comparable to that of current mainstream models, while its parameter count is far smaller than that of common pedestrian detection models, making it well suited to park scenes with limited computing resources. After a sufficiently accurate pedestrian recognition model is obtained through training, it is used for pedestrian detection in the park scene with good detection efficiency.
In this embodiment, a training data set is obtained, the training data set comprising pedestrian detection data with pedestrian labels; a first model and a second model are trained with the training data set to obtain a target teacher model and a target student model, where the first model is an anchor-free pedestrian detection model with a ResNet50 backbone network and the second model is an anchor-free pedestrian detection model with a ResNet18 backbone network; the target teacher model and the target student model are aligned and matched using feature-based knowledge distillation and output-based knowledge distillation to obtain a pedestrian recognition model; and pedestrian detection in the park scene is performed with the pedestrian recognition model. Compressing the pedestrian detection model through knowledge distillation reduces the computational burden of the pedestrian detection task in the park scene and improves the efficiency of park pedestrian recognition.
It should be understood that the sequence numbers of the steps in the foregoing embodiment do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and should not limit the implementation of the embodiments of the present invention in any way.
Fig. 6 shows a schematic block diagram of a knowledge distillation-based campus scene pedestrian detection device in one-to-one correspondence with the knowledge distillation-based campus scene pedestrian detection method of the above embodiment. As shown in fig. 6, the knowledge distillation-based campus scene pedestrian detection device includes an acquisition module 31, a training module 32, an alignment module 33, and a detection module 34. The functional modules are described in detail as follows:
An obtaining module 31, configured to obtain a training data set, where the training data set includes pedestrian detection data of a pedestrian label;
The training module 32 is configured to train the first model and the second model with a training data set to obtain a target teacher model and a target student model, where the first model is an anchor-free pedestrian detection model with a ResNet50 backbone network and the second model is an anchor-free pedestrian detection model with a ResNet18 backbone network;
An alignment module 33, configured to align and match the target teacher model and the target student model by using feature-based knowledge distillation and output-based knowledge distillation, so as to obtain a pedestrian recognition model;
the detection module 34 is configured to perform pedestrian detection of the campus scene by using the pedestrian recognition model.
Optionally, the alignment module 33 includes:
a first alignment unit for minimizing a distance between neck features of the target teacher model and the target student model by means of feature distillation;
And the second alignment unit is used for calculating and obtaining an output-based knowledge distillation loss value according to the spliced graph of the target teacher model and the target student model.
Optionally, the aligning and matching the target teacher model and the target student model by using feature-based knowledge distillation and output-based knowledge distillation, and obtaining the pedestrian recognition model further includes:
a first calculation unit for calculating a Gaussian mask using the following formula:

$$Mask^{gauss}_{i,j} = \max_{1 \le k \le K} \exp\!\left(-\left(\frac{(i - x_k)^2}{2 w_k^2} + \frac{(j - y_k)^2}{2 h_k^2}\right)\right)$$

where $Mask^{gauss}_{i,j}$ denotes the value at position $(i, j)$ in the Gaussian mask, $K$ is the number of target pedestrians in the picture corresponding to the Gaussian mask, $(x_k, y_k)$ denotes the coordinates of the center point of the $k$-th pedestrian, and $w_k$ and $h_k$ are the width and height of the ground-truth detection box corresponding to that pedestrian;
a second calculation unit for calculating a teacher truth mask using the following formula:

$$Mask^{T}_{i,j} = 1 - \big|Y^{T}_{i,j} - GT^{center}_{i,j}\big|$$

where $Y^{T}$ denotes the target center point heat map of the teacher model and $GT^{center}$ denotes the target center point truth map;
a third calculation unit for calculating a student truth mask using the following formula:

$$Mask^{S}_{i,j} = \big|Y^{S}_{i,j} - GT^{center}_{i,j}\big|$$

where $Y^{S}$ denotes the target center point heat map generated by the student model;
the fusion unit is used for fusing the Gaussian mask, the teacher true mask and the student true mask to obtain a fusion mask;
And the first training unit is used for calculating the local loss based on the fusion mask and training based on the local loss to obtain the pedestrian recognition model.
Optionally, the alignment module 33 further includes:
The generating unit is used for respectively carrying out channel compression on the neck feature and the detection graph to obtain corresponding attention graph, and subtracting and splicing the attention graph to obtain a bridging matrix;
The splicing unit is used for splicing the bridging matrixes of the three detection graphs to obtain a combined bridging matrix;
and the second training unit is used for calculating an information flow knowledge distillation loss based on the combined bridging matrix and training according to the information flow knowledge distillation loss to obtain a pedestrian recognition model.
For specific limitations on the knowledge distillation-based campus scene pedestrian detection device, reference may be made to the above limitation on the knowledge distillation-based campus scene pedestrian detection method, and no further description is given here. The various modules in the knowledge distillation based campus scene pedestrian detection device described above may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 7, fig. 7 is a basic structural block diagram of a computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42 and a network interface 43 communicatively connected to each other via a system bus. It is noted that only a computer device 4 having the components memory 41, processor 42 and network interface 43 is shown in the figures, but it should be understood that not all of the illustrated components need be implemented, and that more or fewer components may be implemented instead. It will be appreciated by those skilled in the art that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
The memory 41 includes at least one type of readable storage medium, including flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Programmable Read-Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device 4. Of course, the memory 41 may also comprise both an internal storage unit of the computer device 4 and an external storage device. In this embodiment, the memory 41 is generally used to store the operating system and various application software installed on the computer device 4, such as the program code of the knowledge-distillation-based park scene pedestrian detection method. Further, the memory 41 may be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute the program code stored in the memory 41 or process data, such as program code for executing a method for detecting pedestrians in a campus scene based on knowledge distillation.
The network interface 43 may comprise a wireless network interface or a wired network interface, which network interface 43 is typically used for establishing a communication connection between the computer device 4 and other electronic devices.
The present application also provides another embodiment, namely a computer-readable storage medium storing a computer program executable by at least one processor, to cause the at least one processor to perform the steps of the knowledge-distillation-based park scene pedestrian detection method as described above.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present application.
It is apparent that the above-described embodiments are only some embodiments of the present application, not all of them; the preferred embodiments of the application are shown in the drawings, which do not limit the scope of the claims. This application may be embodied in many different forms; these embodiments are provided so that the disclosure of the application will be thorough and complete. Although the application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their technical features. Any equivalent structure made using the contents of the specification and drawings of the application, applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the application.

Claims (7)

1. The park scene pedestrian detection method based on knowledge distillation is characterized by comprising the following steps of:
Acquiring a training data set, wherein the training data set comprises pedestrian detection data of pedestrian labels;
Training a first model and a second model by adopting the training data set to obtain a target teacher model and a target student model, wherein the first model is an anchor-free pedestrian detection model with a backbone network of ResNet50 and the second model is an anchor-free pedestrian detection model with a backbone network of ResNet18; the training data set is processed through the target teacher model and the target student model for training and recognition, and neck features, a detection map and a splice map are obtained in sequence after the training data set passes through the backbone network;
Aligning and matching the target teacher model and the target student model by adopting feature-based knowledge distillation and output-based knowledge distillation to obtain a pedestrian recognition model;
Adopting the pedestrian recognition model to detect pedestrians in a park scene;
the step of aligning and matching the target teacher model and the target student model by adopting feature-based knowledge distillation and output-based knowledge distillation, and the step of obtaining the pedestrian recognition model comprises the following steps:
minimizing the distance between neck features of the target teacher model and the target student model by adopting a feature distillation mode;
calculating to obtain an output-based knowledge distillation loss value according to the spliced graph of the target teacher model and the target student model;
The detection map comprises a target center point heat map, a scale heat map and an offset map, and the aligning and matching of the target teacher model and the target student model by adopting feature-based knowledge distillation and output-based knowledge distillation to obtain the pedestrian recognition model further comprises:
channel compression is respectively carried out on the neck feature and the detection graph to obtain a corresponding attention graph, and subtraction and splicing are carried out on the attention graph to obtain a bridging matrix;
Splicing the bridging matrixes of the three detection graphs to obtain a combined bridging matrix;
and calculating information flow knowledge distillation loss based on the combined bridge matrix, and training according to the information flow knowledge distillation loss to obtain the pedestrian recognition model.
2. The knowledge distillation based campus scene pedestrian detection method of claim 1 wherein the training dataset is a pedestrian detection dataset CityPersons comprising street scenes at different locations, different time periods, different weather conditions.
3. The knowledge distillation based campus scene pedestrian detection method of claim 1 wherein training the first model and the second model with the training data set to obtain a target teacher model and a target student model, respectively, comprises:
Training a first model by adopting the training data set and the model super-parameters so that the detection performance of the first model reaches a preset condition to obtain the target teacher model, and pre-training a second model by adopting the training data set and the model super-parameters so that the second model reaches a convergence state to obtain the target student model, wherein the model super-parameters are a loss function, an optimizer and a learning rate.
4. The knowledge distillation based campus scene pedestrian detection method of claim 1 wherein the aligning and matching the target teacher model and the target student model using feature based knowledge distillation and output based knowledge distillation to obtain a pedestrian recognition model further comprises:
the Gaussian mask is calculated using the following formula:

$$Mask^{gauss}_{i,j} = \max_{1 \le k \le K} \exp\!\left(-\left(\frac{(i - x_k)^2}{2 w_k^2} + \frac{(j - y_k)^2}{2 h_k^2}\right)\right)$$

wherein $Mask^{gauss}_{i,j}$ represents the numerical value of the point at position $(i, j)$ in the Gaussian mask, $K$ is the number of target pedestrians in the picture corresponding to the Gaussian mask, $(x_k, y_k)$ represents the coordinates of the center point of the $k$-th pedestrian, and $w_k$ and $h_k$ are the width and height of the ground-truth detection box corresponding to that pedestrian;
calculating a teacher truth mask by adopting the following formula:

$$Mask^{T}_{i,j} = 1 - \big|Y^{T}_{i,j} - GT^{center}_{i,j}\big|$$

wherein $Y^{T}$ represents the target center point heat map of the teacher model, and $GT^{center}$ represents the target center point truth map;
the student truth mask is calculated using the following formula:

$$Mask^{S}_{i,j} = \big|Y^{S}_{i,j} - GT^{center}_{i,j}\big|$$

wherein $Y^{S}$ represents the target center point heat map generated by the student model;
fusing the Gaussian mask, the teacher truth mask and the student truth mask to obtain a fused mask;
and calculating local loss based on the fusion mask, and training based on the local loss to obtain the pedestrian recognition model.
5. Park scene pedestrian detection device based on knowledge distillation, its characterized in that, park scene pedestrian detection device based on knowledge distillation includes:
the acquisition module is used for acquiring a training data set, wherein the training data set comprises pedestrian detection data of pedestrian labels;
the training module is used for training a first model and a second model by adopting the training data set to obtain a target teacher model and a target student model, wherein the first model is an anchor-free pedestrian detection model with a backbone network of ResNet50 and the second model is an anchor-free pedestrian detection model with a backbone network of ResNet18;
the alignment module is used for aligning and matching the target teacher model and the target student model by adopting feature-based knowledge distillation and output-based knowledge distillation to obtain a pedestrian recognition model;
The detection module is used for detecting pedestrians in a park scene by adopting the pedestrian recognition model;
the training data set is trained and identified through the target teacher model and the target student model, and neck characteristics, a detection diagram and a splicing diagram are sequentially obtained after the training data set passes through a backbone network; the alignment module includes:
a first alignment unit for minimizing a distance between neck features of the target teacher model and the target student model by means of feature distillation;
The second alignment unit is used for calculating and obtaining an output-based knowledge distillation loss value according to the spliced graph of the target teacher model and the target student model;
The detection map is a target center point heat map, a scale heat map and an offset map, and the alignment module further comprises:
channel compression is respectively carried out on the neck feature and the detection graph to obtain a corresponding attention graph, and subtraction and splicing are carried out on the attention graph to obtain a bridging matrix;
Splicing the bridging matrixes of the three detection graphs to obtain a combined bridging matrix;
and calculating information flow knowledge distillation loss based on the combined bridge matrix, and training according to the information flow knowledge distillation loss to obtain the pedestrian recognition model.
6. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the knowledge distillation based campus scene pedestrian detection method of any one of claims 1 to 4.
7. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the knowledge distillation based campus scene pedestrian detection method of any one of claims 1 to 4.
CN202410036468.XA 2024-01-10 2024-01-10 Park scene pedestrian detection method, device and equipment based on knowledge distillation Active CN117542085B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410036468.XA CN117542085B (en) 2024-01-10 2024-01-10 Park scene pedestrian detection method, device and equipment based on knowledge distillation


Publications (2)

Publication Number Publication Date
CN117542085A CN117542085A (en) 2024-02-09
CN117542085B true CN117542085B (en) 2024-05-03

Family

ID=89792385

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410036468.XA Active CN117542085B (en) 2024-01-10 2024-01-10 Park scene pedestrian detection method, device and equipment based on knowledge distillation

Country Status (1)

Country Link
CN (1) CN117542085B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108664893A (en) * 2018-04-03 2018-10-16 福州海景科技开发有限公司 Face detection method and storage medium
WO2021023202A1 (en) * 2019-08-07 2021-02-11 交叉信息核心技术研究院(西安)有限公司 Self-distillation training method and device for convolutional neural network, and scalable dynamic prediction method
CN112529178A (en) * 2020-12-09 2021-03-19 中国科学院国家空间科学中心 Knowledge distillation method and system suitable for detection model without preselection frame
CN113095249A (en) * 2021-04-19 2021-07-09 大连理工大学 Robust multi-mode remote sensing image target detection method
CN113673533A (en) * 2020-05-15 2021-11-19 华为技术有限公司 Model training method and related equipment
CN113792713A (en) * 2021-11-16 2021-12-14 北京的卢深视科技有限公司 Model training method, face recognition model updating method, electronic device and storage medium
CN116612450A (en) * 2023-04-19 2023-08-18 中国人民解放军火箭军工程大学 Point cloud scene-oriented differential knowledge distillation 3D target detection method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Exponential elastic momentum convolutional neural network and its application in pedestrian detection; Yue Qi; Ma Caiwen; Journal of Harbin Institute of Technology; 2017-05-30; Vol. 49, No. 05; pp. 159-164 *

Also Published As

Publication number Publication date
CN117542085A (en) 2024-02-09

Similar Documents

Publication Publication Date Title
CN112685565B (en) Text classification method based on multi-mode information fusion and related equipment thereof
Chen et al. An edge traffic flow detection scheme based on deep learning in an intelligent transportation system
CN111898696B (en) Pseudo tag and tag prediction model generation method, device, medium and equipment
CN112395390B (en) Training corpus generation method of intention recognition model and related equipment thereof
US20230009547A1 (en) Method and apparatus for detecting object based on video, electronic device and storage medium
CN107832794A Convolutional neural network generation method, vehicle model recognition method, and computing device
CN114580794B (en) Data processing method, apparatus, program product, computer device and medium
CN112686243A (en) Method and device for intelligently identifying picture characters, computer equipment and storage medium
CN115082752A (en) Target detection model training method, device, equipment and medium based on weak supervision
CN111104941B (en) Image direction correction method and device and electronic equipment
CN114445684A (en) Method, device and equipment for training lane line segmentation model and storage medium
CN111178363A (en) Character recognition method and device, electronic equipment and readable storage medium
CN114359582A (en) Small sample feature extraction method based on neural network and related equipment
CN111310595B (en) Method and device for generating information
CN116186295B Attention-based knowledge graph link prediction method, device, equipment and medium
CN111539390A (en) Small target image identification method, equipment and system based on Yolov3
CN117542085B (en) Park scene pedestrian detection method, device and equipment based on knowledge distillation
CN115700845A (en) Face recognition model training method, face recognition device and related equipment
CN112016503B (en) Pavement detection method, device, computer equipment and storage medium
CN112395450A (en) Picture character detection method and device, computer equipment and storage medium
CN114510592A (en) Image classification method and device, electronic equipment and storage medium
CN112699263B (en) AI-based two-dimensional art image dynamic display method and device
CN117688193B (en) Picture and text unified coding method, device, computer equipment and medium
CN115719465B (en) Vehicle detection method, device, apparatus, storage medium, and program product
CN117912052A (en) Pedestrian detection model training method, pedestrian detection method, device and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant