CN117893873A - Active tracking method based on multi-mode information fusion - Google Patents

Active tracking method based on multi-mode information fusion

Info

Publication number
CN117893873A
Authority
CN
China
Prior art keywords
training
information
fusion
tracking
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410304634.XA
Other languages
Chinese (zh)
Other versions
CN117893873B (en)
Inventor
周云
吴巧云
谭春雨
伍煜强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui University
Original Assignee
Anhui University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University filed Critical Anhui University
Priority to CN202410304634.XA priority Critical patent/CN117893873B/en
Publication of CN117893873A publication Critical patent/CN117893873A/en
Application granted granted Critical
Publication of CN117893873B publication Critical patent/CN117893873B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to an active tracking method based on multi-mode information fusion, which comprises the following steps: acquiring three kinds of data information, namely a color image, a depth image, and a normal map; inputting the three kinds of data information into a multi-modal information preprocessing module to obtain their initial features; the multi-modal information fusion module adopts a two-stage training mode of pre-training and formal training to perform feature fusion on the initial features, and the resulting formal training feature output, subject to an information-fusion regularization constraint, is input into the reinforcement learning AC framework network RACNet, which outputs the corresponding predicted execution action. The invention uses the multi-modal information acquired by the agent to describe the current state more accurately, and adds constraints on the fused features to improve the training efficiency of the reinforcement learning algorithm, thereby achieving ideal training efficiency and tracking precision.

Description

Active tracking method based on multi-mode information fusion
Technical Field
The invention relates to the technical field of target tracking, in particular to an active tracking method based on multi-mode information fusion.
Background
In the field of computer vision research, object tracking is a very challenging task. Target tracking in the usual sense means that the target to be tracked is given in an initial frame and its position is continuously output in subsequent frames. In this setting it is generally assumed that the camera is fixed, and a moving object in the camera's field of view is then tracked. Under this assumption, the target easily moves out of the camera's field of view or becomes occluded by other objects, making it difficult for the tracker to accurately track and locate it. Vision-based active target tracking instead adjusts the position and focal length of the camera in real time according to the target position in the visual observation, controlling the camera to move with the target so that the target always stays in the camera's field of view; it therefore has important theoretical research significance and practical application value.
Prior active tracking work mainly feeds the visual color image perceived by the agent into a convolutional network to obtain a state expression at the current moment, which is then sent to a subsequent reinforcement learning network. However, extracting state expressions with a convolutional network alone is time consuming and inefficient. On the one hand, besides the color image, the visual information perceived by the agent contains information in multiple modalities such as depth images and normal maps; effectively fusing this multi-modal information provides a more informative state expression for the active tracking algorithm and describes the current state more accurately, thereby accelerating reinforcement learning training and improving its effect. On the other hand, the form of the fused multi-modal feature also matters greatly for the subsequent reinforcement learning network training, and imposing a regularization constraint on the fused feature can further improve the training efficiency and effect of reinforcement learning.
In computer vision and reinforcement learning research, active tracking is a challenging emerging task. Most previous active tracking algorithms neither fully exploit the multi-modal data information acquired by the agent nor further constrain the fused features, and therefore struggle to achieve a satisfactory tracking effect. To solve the above problems, a more efficient active tracking algorithm is needed to improve training efficiency and tracking performance.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an active tracking method based on multi-modal information fusion, which addresses the problems that traditional active tracking methods neither fully utilize the multi-modal data information acquired by the agent nor further constrain the fused features, leading to lower training efficiency and poorer tracking effect. The method fuses the multi-modal information acquired by the agent to describe the current state more accurately, and adds constraints on the fused features to improve the training efficiency of the reinforcement learning algorithm, thereby achieving ideal training efficiency and tracking precision.
In order to solve the technical problems, the invention provides the following technical scheme: an active tracking method based on multi-mode information fusion comprises the following steps:
S1, acquiring data information of multiple modalities under the view angle of a tracking agent in an active tracking virtual environment based on the UE framework, namely three kinds of data information: a color image I_c, a depth image I_d, and a normal map I_n;
s2, constructing a feature extraction fusion network FEFNet with a multi-mode information fusion mechanism, wherein the network comprises a multi-mode information preprocessing module and a multi-mode information fusion module;
S3, inputting the three kinds of data information in S1 into the multi-modal information preprocessing module to obtain their initial features F_c, F_d, and F_n;
S4, the multi-modal information fusion module adopts a two-stage training mode of pre-training and formal training to perform feature fusion on the initial features F_c, F_d, and F_n, obtaining the pre-training feature output F_pre and the formal training feature output F_formal;
S5, constructing a reinforcement learning AC framework network RACNet with information fusion regularization constraint, and outputting formal training characteristicsAnd outputting the corresponding predicted execution action by inputting the predicted execution action into the network.
Further, in step S1, the specific process includes the following steps:
s11, setting two intelligent agents with a moving function in an active tracking virtual environment based on a UE framework, namely a tracking intelligent agent and a target intelligent agent, wherein the two intelligent agents are controlled to move in the environment by a set program;
S12, acquiring data information of multiple modalities under the view angle of the tracking agent in real time through interactive code in the virtual environment, namely the three data formats: color image I_c, depth image I_d, and normal map I_n.
Further, in step S3, the multi-modal information preprocessing module is built by stacking CONV-MP-ReLU layers, so that the preprocessing module can perform initial feature extraction on the input data of each modality and regulate the color image I_c, depth image I_d, and normal map I_n to a uniform size; the initial features of the preprocessed data in each modality are F_c, F_d, and F_n, respectively.
Here CONV-MP-ReLU denotes the serial combination of a CONV convolution layer, a MaxPooling max-pooling layer, and a ReLU activation layer, and the preprocessing module is a superposition of several such CONV-MP-ReLU layers.
Further, in step S4, the multi-modal information fusion module comprises a fusion module used in the pre-training process and a fusion module used in the formal training process. The fusion module in the pre-training process is a direct weighted fusion network, i.e. the pre-training feature output is:
F_pre = w_c·F_c + w_d·F_d + w_n·F_n
where w_c, w_d, and w_n are the weights corresponding to each modality;
The fusion module in the formal training process is composed of a mapping-coding structure that produces the formal training feature output F_formal;
wherein F_dn denotes the feature fusing the depth-image and normal-map information, and the computation involves a matrix transpose operation, a matrix multiplication operation, intermediate network layers, a normalization operation, and a final linear mapping of the result.
Further, in step S4, the multi-modal information fusion module adopts a two-stage training mode of pre-training and formal training, specifically: in training stage 1, the training network is composed of the multi-modal information preprocessing module, the pre-training fusion module, and the reinforcement learning network RACNet, and a suitable multi-modal information preprocessing module is first trained in this combined manner; in training stage 2, the preprocessing-module parameters obtained in stage 1 are preloaded as the multi-modal information preprocessing module of stage 2, and the fusion module in the training network is replaced by the formal-training (mapping-coding) fusion module for stage-2 training.
Further, in step S5, the reinforcement learning AC framework network RACNet with an information-fusion regularization constraint is constructed as follows: a double regularization constraint is applied to the formal training feature output F_formal of the multi-modal information fusion module, i.e. its singular matrix and singular values are constrained.
In the corresponding singular-matrix constraint, σ_max and σ_min are the maximum and minimum singular values, respectively, and two weights balance the corresponding double regularization terms; by constraining the singular matrix and its singular values, the extracted feature F_formal can represent the current state better, thereby improving the performance of the model;
On the basis of the reinforcement learning AC framework, the reinforcement learning loss function after adding the double regularization constraint is expressed as:
L = λ_1·L_Actor + λ_2·L_Critic + λ_3·L_reg
where L_Actor and L_Critic are the loss functions corresponding to the Actor network and the Critic network under the AC framework, L_reg is the loss function corresponding to the double regularization constraint on F_formal, and λ_1, λ_2, and λ_3 are the weights corresponding to each loss.
Further, in step S5, the corresponding predicted execution action is output, specifically including the following steps:
s51, sequentially inputting data information acquired in real time into a feature extraction fusion network FEFNet and a reinforcement learning AC framework network RACNet with information fusion regularization constraint, and outputting a corresponding predicted execution action;
s52, according to the obtained action instruction, the action direction of the tracking intelligent agent is adjusted in real time, so that the tracking intelligent agent can perform action adjustment according to the position of the target intelligent agent under the current view angle, and accurate active target tracking is performed;
s53, repeating all the steps until the target intelligent agent is lost in the field of view of the tracking intelligent agent or the target intelligent agent is tracked to a preset maximum frame number.
By means of the technical scheme, the invention provides an active tracking method based on multi-mode information fusion, which has at least the following beneficial effects:
1. The invention performs active tracking by fusing multi-modal information; the feature extraction fusion network based on the multi-modal information fusion mechanism can exploit the multi-modal information acquired by the agent more comprehensively, which is more reasonable than previous plain convolutional networks and achieves ideal tracking performance. Compared with a traditional simple deep learning network model, the feature extraction fusion network under the multi-modal information fusion mechanism can more efficiently process and utilize input data in multiple modalities;
2. A multi-modal information extraction and fusion mechanism is introduced for feature extraction; the designed feature extraction fusion network can effectively fuse the feature information of modalities such as color images, depth images, and normal maps. The multi-modal information fusion module adopts a two-stage training mode of pre-training and formal training to fuse the initial features, further improving the extracted features' ability to describe and characterize the current state, improving active tracking performance, and achieving a more robust tracking effect;
3. In the reinforcement learning framework network, a further regularization constraint on the form of the input characterization features is imposed to improve the training efficiency and accuracy of the subsequent reinforcement learning network. Compared with characterization features without such a constraint, the method improves training convergence and training effect, and enables faster active target tracking.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a flow chart of an active tracking method based on multi-modal information fusion in the present invention;
FIG. 2 is a schematic view of a random environment selected during training in accordance with the present invention;
FIG. 3 is a schematic view of a simulation environment selected during testing in accordance with the present invention;
FIG. 4 is a schematic diagram of training length curves and rewards curves during a one-stage training process according to the present invention;
FIG. 5 is a schematic diagram of training length curves and rewards curves during a two-stage training process of the present invention;
fig. 6 is a graph of partial frame tracking results for city streetscapes and indoor scenes in accordance with the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description. Therefore, the implementation process of how to apply the technical means to solve the technical problems and achieve the technical effects can be fully understood and implemented.
Those of ordinary skill in the art will appreciate that all or a portion of the steps in a method of implementing an embodiment described above may be implemented by a program to instruct related hardware, and thus the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Referring to fig. 1-6, a specific implementation of the present embodiment is shown, and the method constructs a feature extraction fusion network with a multi-modal information fusion mechanism and a reinforcement learning framework network with information fusion regularization constraints for training and testing. The method utilizes the multi-mode information acquired by the intelligent agent to carry out fusion to describe the current state more accurately, and increases the constraint on the characteristics after fusion to improve the training efficiency of the reinforcement learning algorithm, thereby achieving ideal effects on the training efficiency and tracking precision.
Referring to fig. 1, the embodiment provides an active tracking method based on multi-mode information fusion, which includes the following steps:
S1, acquiring data information of multiple modalities under the view angle of a tracking agent in an active tracking virtual environment based on the UE framework, namely three kinds of data information: a color image I_c, a depth image I_d, and a normal map I_n;
as a preferred embodiment of step S1, the specific procedure comprises the steps of:
s11, setting two intelligent agents with a moving function in an active tracking virtual environment based on a UE framework, namely a tracking intelligent agent and a target intelligent agent, wherein the two intelligent agents are controlled to move in the environment by a set program;
S12, acquiring data information of multiple modalities under the view angle of the tracking agent in real time through interactive code in the virtual environment, namely the three data formats: color image I_c, depth image I_d, and normal map I_n.
S2, constructing a feature extraction fusion network FEFNet with a multi-modal information fusion mechanism, wherein the network comprises a multi-modal information preprocessing module and a multi-modal information fusion module;
S3, inputting the three kinds of data information in S1 into the multi-modal information preprocessing module to obtain their initial features F_c, F_d, and F_n;
As a preferred implementation of step S3, the multi-modal information preprocessing module is built by superposing CONV-MP-ReLU layers, so that the preprocessing module can perform initial feature extraction on the input data of each modality and regulate the color image I_c, depth image I_d, and normal map I_n to a uniform size; the initial features of the preprocessed data in each modality are F_c, F_d, and F_n, respectively.
Here CONV-MP-ReLU denotes the serial combination of a CONV convolution layer, a MaxPooling max-pooling layer, and a ReLU activation layer, and the preprocessing module is a superposition of several such CONV-MP-ReLU layers.
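By way of illustration only, the stacked CONV-MP-ReLU preprocessing described above might be sketched in PyTorch as follows; the channel width, kernel size, and number of stacked blocks are assumptions, since the patent does not fix them:

```python
import torch
import torch.nn as nn

class ModalityPreprocessor(nn.Module):
    """Stacked CONV-MP-ReLU blocks mapping one modality (color image,
    depth image, or normal map) to an initial feature of uniform size.
    All hyperparameters here are illustrative assumptions."""

    def __init__(self, in_channels: int = 3, width: int = 32, num_blocks: int = 4):
        super().__init__()
        layers = []
        c_in = in_channels
        for _ in range(num_blocks):
            layers += [
                nn.Conv2d(c_in, width, kernel_size=3, padding=1),  # CONV
                nn.MaxPool2d(kernel_size=2),                       # MP (max pooling)
                nn.ReLU(inplace=True),                             # ReLU activation
            ]
            c_in = width
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# One preprocessor per modality; each output F_c, F_d, F_n has the same shape.
pre_color, pre_depth, pre_normal = (ModalityPreprocessor() for _ in range(3))
```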
S4, the multi-modal information fusion module adopts a two-stage training mode of pre-training and formal training to perform feature fusion on the initial features F_c, F_d, and F_n, obtaining the pre-training feature output F_pre and the formal training feature output F_formal;
As a preferred embodiment of step S4, the multi-modal information fusion module comprises a fusion module used in the pre-training process and a fusion module used in the formal training process. The fusion module in the pre-training process is a direct weighted fusion network, i.e. the pre-training feature output is:
F_pre = w_c·F_c + w_d·F_d + w_n·F_n
where w_c, w_d, and w_n are the weights corresponding to each modality;
The fusion module in the formal training process is composed of a mapping-coding structure that produces the formal training feature output F_formal;
wherein F_dn denotes the feature fusing the depth-image and normal-map information, and the computation involves a matrix transpose operation, a matrix multiplication operation, intermediate network layers, a normalization operation, and a final linear mapping of the result.
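Since the exact mapping-coding formula appears only as an image in the original, the following is a minimal sketch of one plausible wiring of the operations named above (transpose, matrix multiplication, a softmax-style weighting, normalization, and a final linear mapping); it is an assumption, not the patented formula:

```python
import torch
import torch.nn as nn

class MappingCodingFusion(nn.Module):
    """Hypothetical attention-style arrangement of the named operations.
    f_c and f_dn are flattened feature maps of shape (batch, tokens, dim);
    f_dn stands for the fused depth + normal-map feature F_dn."""

    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)    # normalization operation
        self.proj = nn.Linear(dim, dim)  # linear mapping of the result

    def forward(self, f_c: torch.Tensor, f_dn: torch.Tensor) -> torch.Tensor:
        # matrix multiplication against the transposed D+N feature, then softmax
        weights = torch.softmax(
            f_c @ f_dn.transpose(1, 2) / f_c.shape[-1] ** 0.5, dim=-1)
        fused = weights @ f_dn
        return self.proj(self.norm(fused))
```

F_dn itself could be as simple as an elementwise sum of the depth and normal-map initial features; the patent only states that the two are fused.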
As another preferred embodiment of step S4, the multi-modal information fusion module adopts a two-stage training mode of pre-training and formal training, specifically: in training stage 1, the training network is composed of the multi-modal information preprocessing module, the pre-training fusion module, and the reinforcement learning network RACNet, and a suitable multi-modal information preprocessing module is first trained in this combined manner; in training stage 2, the preprocessing-module parameters obtained in stage 1 are preloaded as the multi-modal information preprocessing module of stage 2, and the fusion module in the training network is replaced by the formal-training (mapping-coding) fusion module for stage-2 training.
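A rough sketch of the two-stage procedure, reusing the illustrative classes above (the checkpoint filename and feature dimension are assumptions):

```python
import torch

# Stage 1: the preprocessors, the weighted fusion, and RACNet are trained
# jointly (training loop omitted); the preprocessor weights are then saved.
torch.save({"color": pre_color.state_dict(),
            "depth": pre_depth.state_dict(),
            "normal": pre_normal.state_dict()}, "preproc_stage1.pt")

# Stage 2: preload the stage-1 preprocessor parameters and swap the
# direct weighted fusion for the mapping-coding fusion module.
ckpt = torch.load("preproc_stage1.pt")
pre_color.load_state_dict(ckpt["color"])
pre_depth.load_state_dict(ckpt["depth"])
pre_normal.load_state_dict(ckpt["normal"])
fusion = MappingCodingFusion(dim=256)  # replaces the stage-1 weighted sum
# ... formal training then continues with the regularized RACNet loss ...
```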
S5, constructing a reinforcement learning AC framework network RACNet with an information-fusion regularization constraint, inputting the formal training feature output F_formal into the network, and outputting the corresponding predicted execution action.
As a preferred embodiment of step S5, the reinforcement learning AC framework network RACNet with an information-fusion regularization constraint is constructed as follows: a double regularization constraint is applied to the formal training feature output F_formal of the multi-modal information fusion module, i.e. its singular matrix and singular values are constrained.
In the corresponding singular-matrix constraint, σ_max and σ_min are the maximum and minimum singular values, respectively, and two weights balance the corresponding double regularization terms; by constraining the singular matrix and its singular values, the extracted feature F_formal can represent the current state better, thereby improving the performance of the model;
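The exact form of the singular-value constraint is given only as an image in the original; the sketch below is one plausible reading (keeping σ_max bounded and σ_min away from zero, toward a well-conditioned feature matrix), with the functional form and weights as assumptions. It plays the role of L_reg in the total loss below:

```python
import torch

def double_regularization(f_formal: torch.Tensor,
                          w_max: float = 1e-3, w_min: float = 1e-3,
                          eps: float = 1e-6) -> torch.Tensor:
    """Hypothetical double regularization term L_reg on the fused feature.
    f_formal: (batch, channels, height, width) feature map."""
    mat = f_formal.flatten(2)           # per-sample matrix of shape (B, C, H*W)
    s = torch.linalg.svdvals(mat)       # singular values in descending order
    sigma_max, sigma_min = s[..., 0], s[..., -1]
    # penalize a large sigma_max and a vanishing sigma_min
    return (w_max * sigma_max + w_min / (sigma_min + eps)).mean()
```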
On the basis of the reinforcement learning AC framework, the reinforcement learning loss function after adding the double regularization constraint is expressed as:
L = λ_1·L_Actor + λ_2·L_Critic + λ_3·L_reg
where L_Actor and L_Critic are the loss functions corresponding to the Actor network and the Critic network under the AC framework, L_reg is the loss function corresponding to the double regularization constraint on F_formal, and λ_1, λ_2, and λ_3 are the weights corresponding to each loss.
As another preferred embodiment of step S5, the corresponding predicted execution action is output, specifically including the following steps:
s51, sequentially inputting data information acquired in real time into a feature extraction fusion network FEFNet and a reinforcement learning AC framework network RACNet with information fusion regularization constraint, and outputting a corresponding predicted execution action;
s52, according to the obtained action instruction, the action direction of the tracking intelligent agent is adjusted in real time, so that the tracking intelligent agent can perform action adjustment according to the position of the target intelligent agent under the current view angle, and accurate active target tracking is performed;
s53, repeating all the steps until the target intelligent agent is lost in the field of view of the tracking intelligent agent or the target intelligent agent is tracked to a preset maximum frame number.
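A minimal sketch of this inference loop follows; the environment interface and method names are assumptions standing in for the UE interactive code:

```python
import torch

def run_episode(env, fefnet, racnet, max_frames: int = 500) -> int:
    """Steps S51-S53: feed each observation through FEFNet and RACNet,
    act, and stop when the target is lost or max_frames is reached."""
    color, depth, normal = env.reset()
    for frame in range(max_frames):
        with torch.no_grad():
            state = fefnet(color, depth, normal)  # fused state feature
            action = racnet.act(state)            # predicted execution action
        (color, depth, normal), target_lost = env.step(action)
        if target_lost:                           # target left the field of view
            return frame
    return max_frames
```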
In the actual implementation of the invention, the action selection of the tracking agent is shown in fig. 1; it can execute various actions such as moving forward, moving backward, and steering, while the target agent may act randomly or walk along a set path. The training and testing environments are shown in fig. 2 and fig. 3; fig. 2 shows the three data forms selected by the invention in the training environment, from left to right the color-image modality, the depth-image modality, and the normal-map modality. In addition, scene illumination, target texture, environment background and the like can be varied under script control in the training environment, which augments the environment and improves the transferability and robustness of the algorithm. For testing, the invention selects two simulation environments, an outdoor scene and an indoor scene, as shown in fig. 3. Many different obstacles in these scenes block the view, and lighting changes are present, which adds considerable difficulty to active tracking by the agent.
In the actual training process, rewards in the environment are calculated according to a preset reward function. The final reward of the invention is related to the distance and the angle between the target agent and the tracking agent: the closer the distance is to the set optimal distance and the closer the angle is to the set optimal angle, the larger the reward obtained by the algorithm. The invention adopts a two-stage training mode of pre-training and formal training. In training stage 1, the training network is composed of the multi-modal information preprocessing module, the pre-training fusion module, and the reinforcement learning network RACNet, and a suitable multi-modal information preprocessing module is first trained. In training stage 2, the fusion module in the training network is replaced by the formal-training (mapping-coding) fusion module, the preprocessing-module parameters obtained in stage 1 are preloaded, and training continues. This training arrangement helps the network converge to better results faster in the initial stage and remain more stable in the subsequent stage, thereby speeding up the overall training process.
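As an illustration only, a reward with the stated behavior could look like the following; the linear penalty form and all constants are assumptions, since the patent does not give the reward formula:

```python
def tracking_reward(dist: float, angle: float,
                    best_dist: float = 2.0, best_angle: float = 0.0,
                    a: float = 1.0, b: float = 0.5, c: float = 0.5) -> float:
    """Reward grows as the tracker-target distance approaches the set
    optimal distance and the relative angle approaches the optimal angle."""
    return a - b * abs(dist - best_dist) - c * abs(angle - best_angle)
```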
Fig. 4 and fig. 5 show the length curves and reward curves of the invention during training, where the abscissa is the number of interactions and the ordinate is the best tracking length the agent achieves at the current number of interactions and the corresponding reward value. Fig. 4 shows the training length curve and reward curve of stage-one training, and fig. 5 shows those of stage-two training. As can be seen from the figures, stage-two training converges quickly on the basis of stage one, and reaches a higher tracking length and tracking reward than the stage-one results.
Fig. 6 shows partial tracking results of the invention in indoor and outdoor scenes. The results show that the method can continuously and stably track the target agent under complex conditions such as snowflake interference in outdoor snow scenes and pillar occlusion in indoor garages, with high accuracy.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., a ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions.
The foregoing is a detailed description of the invention, in which specific examples are applied to explain its principles and embodiments; the above embodiments are merely intended to help understand the method of the invention and its core concepts. Meanwhile, those skilled in the art may make variations in the specific embodiments and application scope in accordance with the ideas of the invention; in view of the above, the contents of this description should not be construed as limiting the invention.

Claims (7)

1. An active tracking method based on multi-mode information fusion is characterized by comprising the following steps:
S1, acquiring data information of multiple modalities under the view angle of a tracking agent in an active tracking virtual environment based on the UE framework, namely three kinds of data information: a color image I_c, a depth image I_d, and a normal map I_n;
s2, constructing a feature extraction fusion network FEFNet with a multi-mode information fusion mechanism, wherein the network comprises a multi-mode information preprocessing module and a multi-mode information fusion module;
S3, inputting the three kinds of data information in S1 into the multi-modal information preprocessing module to obtain their initial features F_c, F_d, and F_n;
S4, the multi-modal information fusion module adopts a two-stage training mode of pre-training and formal training to perform feature fusion on the initial features F_c, F_d, and F_n, obtaining the pre-training feature output F_pre and the formal training feature output F_formal;
S5, constructing a reinforcement learning AC framework network RACNet with an information-fusion regularization constraint, inputting the formal training feature output F_formal into the network, and outputting the corresponding predicted execution action.
2. The method of claim 1, wherein in step S1, the specific process includes the following steps:
s11, setting two intelligent agents with a moving function in an active tracking virtual environment based on a UE framework, namely a tracking intelligent agent and a target intelligent agent, wherein the two intelligent agents are controlled to move in the environment by a set program;
S12, acquiring data information of multiple modalities under the view angle of the tracking agent in real time through interactive code in the virtual environment, namely the three data formats: color image I_c, depth image I_d, and normal map I_n.
3. The method of claim 1, wherein in step S3, the multi-modal information preprocessing module is built by stacking CONV-MP-ReLU layers, so that the preprocessing module can perform initial feature extraction on the input data of each modality and regulate the color image I_c, depth image I_d, and normal map I_n to a uniform size; the initial features of the preprocessed data in each modality are F_c, F_d, and F_n, respectively;
wherein CONV-MP-ReLU denotes the serial combination of a CONV convolution layer, a MaxPooling max-pooling layer, and a ReLU activation layer, and the preprocessing module is a superposition of several such CONV-MP-ReLU layers.
4. The method according to claim 1, wherein in step S4, the multi-modal information fusion module comprises a fusion module used in the pre-training process and a fusion module used in the formal training process; the fusion module in the pre-training process is a direct weighted fusion network, i.e. the pre-training feature output is:
F_pre = w_c·F_c + w_d·F_d + w_n·F_n
where w_c, w_d, and w_n are the weights corresponding to each modality;
the fusion module in the formal training process is composed of a mapping-coding structure that produces the formal training feature output F_formal;
wherein F_dn denotes the feature fusing the depth-image and normal-map information, and the computation involves a matrix transpose operation, a matrix multiplication operation, intermediate network layers, a normalization operation, and a final linear mapping of the result.
5. The method of claim 4, wherein in step S4, the multi-modal information fusion module adopts a two-stage training mode of pre-training and formal training, specifically: in training stage 1, the training network is composed of the multi-modal information preprocessing module, the pre-training fusion module, and the reinforcement learning network RACNet, and a suitable multi-modal information preprocessing module is first trained in this combined manner; in training stage 2, the preprocessing-module parameters obtained in stage 1 are preloaded as the multi-modal information preprocessing module of stage 2, and the fusion module in the training network is replaced by the formal-training (mapping-coding) fusion module for stage-2 training.
6. The active tracking method based on multi-modal information fusion according to claim 1, wherein in step S5, the reinforcement learning AC framework network RACNet with an information-fusion regularization constraint is constructed as follows: a double regularization constraint is applied to the formal training feature output F_formal of the multi-modal information fusion module, i.e. its singular matrix and singular values are constrained;
in the corresponding singular-matrix constraint, σ_max and σ_min are the maximum and minimum singular values, respectively, and two weights balance the corresponding double regularization terms; by constraining the singular matrix and its singular values, the extracted feature F_formal can represent the current state better, thereby improving the performance of the model;
on the basis of the reinforcement learning AC framework, the reinforcement learning loss function after adding the double regularization constraint is expressed as:
L = λ_1·L_Actor + λ_2·L_Critic + λ_3·L_reg
where L_Actor and L_Critic are the loss functions corresponding to the Actor network and the Critic network under the AC framework, L_reg is the loss function corresponding to the double regularization constraint on F_formal, and λ_1, λ_2, and λ_3 are the weights corresponding to each loss.
7. The method according to claim 1, wherein in step S5, the corresponding predicted execution action is output, specifically comprising the following steps:
s51, sequentially inputting data information acquired in real time into a feature extraction fusion network FEFNet and a reinforcement learning AC framework network RACNet with information fusion regularization constraint, and outputting a corresponding predicted execution action;
s52, according to the obtained action instruction, the action direction of the tracking intelligent agent is adjusted in real time, so that the tracking intelligent agent can perform action adjustment according to the position of the target intelligent agent under the current view angle, and accurate active target tracking is performed;
s53, repeating all the steps until the target intelligent agent is lost in the field of view of the tracking intelligent agent or the target intelligent agent is tracked to a preset maximum frame number.
CN202410304634.XA 2024-03-18 2024-03-18 Active tracking method based on multi-mode information fusion Active CN117893873B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410304634.XA CN117893873B (en) 2024-03-18 2024-03-18 Active tracking method based on multi-mode information fusion


Publications (2)

Publication Number Publication Date
CN117893873A 2024-04-16
CN117893873B CN117893873B (en) 2024-06-07

Family

ID=90644520

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410304634.XA Active CN117893873B (en) 2024-03-18 2024-03-18 Active tracking method based on multi-mode information fusion

Country Status (1)

Country Link
CN (1) CN117893873B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111968238A (en) * 2020-08-22 2020-11-20 晋江市博感电子科技有限公司 Human body color three-dimensional reconstruction method based on dynamic fusion algorithm
CN112862860A (en) * 2021-02-07 2021-05-28 天津大学 Object perception image fusion method for multi-modal target tracking
CN113158584A (en) * 2021-05-24 2021-07-23 北京邮电大学 Upper-bound substitution method for multi-modal feature embedding pre-training network collocation effect evaluation
CN114494354A (en) * 2022-02-15 2022-05-13 中国矿业大学 Unsupervised RGB-T target tracking method based on attention multimodal feature fusion
CN115100235A (en) * 2022-08-18 2022-09-23 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Target tracking method, system and storage medium
US20220335711A1 (en) * 2021-07-29 2022-10-20 Beijing Baidu Netcom Science Technology Co., Ltd. Method for generating pre-trained model, electronic device and storage medium
US20220381870A1 (en) * 2021-05-28 2022-12-01 Nec Laboratories America, Inc. Visual and rf sensor fusion for multi-agent tracking
CN115423847A (en) * 2022-11-04 2022-12-02 华东交通大学 Twin multi-modal target tracking method based on Transformer
CN116740480A (en) * 2023-07-11 2023-09-12 中国科学院长春光学精密机械与物理研究所 Multi-mode image fusion target tracking method
US20240013402A1 (en) * 2022-07-06 2024-01-11 Shanghai Maritime University Ship image trajectory tracking and prediction method based on ship heading recognition


Also Published As

Publication number Publication date
CN117893873B (en) 2024-06-07

Similar Documents

Publication Publication Date Title
Cheng et al. Cspn++: Learning context and resource aware convolutional spatial propagation networks for depth completion
CN110874578B (en) Unmanned aerial vehicle visual angle vehicle recognition tracking method based on reinforcement learning
US20190355126A1 (en) Image feature extraction method and saliency prediction method using the same
CN110930342B (en) Depth map super-resolution reconstruction network construction method based on color map guidance
CN111968123B (en) Semi-supervised video target segmentation method
CN112489164B (en) Image coloring method based on improved depth separable convolutional neural network
CN115908517B (en) Low-overlapping point cloud registration method based on optimization of corresponding point matching matrix
CN111696110A (en) Scene segmentation method and system
Chen et al. THFuse: An infrared and visible image fusion network using transformer and hybrid feature extractor
Kim et al. Acceleration of actor-critic deep reinforcement learning for visual grasping in clutter by state representation learning based on disentanglement of a raw input image
CN117893873B (en) Active tracking method based on multi-mode information fusion
Li et al. Flexicurve: Flexible piecewise curves estimation for photo retouching
CN112101113B (en) Lightweight unmanned aerial vehicle image small target detection method
CN113256496A (en) Lightweight progressive feature fusion image super-resolution system and method
Kim et al. Acceleration of actor-critic deep reinforcement learning for visual grasping by state representation learning based on a preprocessed input image
Lin et al. Dyspn: Learning dynamic affinity for image-guided depth completion
CN112887595B (en) Camera instant automatic planning method for dynamic multiple targets
Huang et al. Single image super-resolution reconstruction of enhanced loss function with multi-gpu training
Yu et al. Low light combining multiscale deep learning networks and image enhancement algorithm
Xiu et al. Keypoint heatmap guided self-supervised monocular visual odometry
Wang et al. AMNet: a new RGB-D instance segmentation network based on attention and multi-modality
ZiWen et al. Multi-objective Neural Architecture Search for Efficient and Fast Semantic Segmentation on Edge
Jian et al. MobileNet-SSD with adaptive expansion of receptive field
Chen et al. Research on warehouse object detection algorithm based on fused densenet and ssd
CN116309073B (en) Low-contrast stripe SIM reconstruction method and system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant