CN116109966A - Remote sensing scene-oriented video large model construction method - Google Patents

Remote sensing scene-oriented video large model construction method

Info

Publication number
CN116109966A
Authority
CN
China
Prior art keywords
model
remote sensing
neural network
video
network sub-model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211635612.9A
Other languages
Chinese (zh)
Other versions
CN116109966B (en)
Inventor
孙显
付琨
于泓峰
姚方龙
卢宛萱
邓楚博
杨和明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aerospace Information Research Institute of CAS
Original Assignee
Aerospace Information Research Institute of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aerospace Information Research Institute of CAS filed Critical Aerospace Information Research Institute of CAS
Priority to CN202211635612.9A priority Critical patent/CN116109966B/en
Publication of CN116109966A publication Critical patent/CN116109966A/en
Application granted granted Critical
Publication of CN116109966B publication Critical patent/CN116109966B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10032Satellite or aerial image; Remote sensing
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the technical field of computer model construction, and in particular to a method for constructing a large video model oriented to remote sensing scenes. The method comprises the following steps: acquiring a remote sensing image set A and a target video set B, where A = {a_1, a_2, …, a_N}, a_n is the n-th remote sensing image in A, n ranges from 1 to N, and N is the number of remote sensing images in A; B = {b_1, b_2, …, b_M}, b_m is the m-th target video in B, m ranges from 1 to M, M is the number of target videos in B, b_m = (b_{m,1}, b_{m,2}, …, b_{m,Q}), and b_{m,q} is the q-th frame target image of b_m; and training a neural network model using A and B, the neural network model comprising a first neural network sub-model and a second neural network sub-model. The invention constructs a large video model oriented to remote sensing scenes with strong feature extraction capability and feature rule discovery capability.

Description

Remote sensing scene-oriented video large model construction method
Technical Field
The invention relates to the technical field of computer model construction, in particular to a remote sensing scene-oriented video large model construction method.
Background
Because remote sensing video has dual temporal and spatial characteristics, and remote sensing scenes themselves have complex textured backgrounds, a model for video interpretation tasks in remote sensing scenes needs strong feature extraction capability and must also discover the spatial and temporal feature rules of the video. How to construct a large video model for remote sensing scenes with strong feature extraction capability and feature rule discovery capability is a problem that urgently needs to be solved.
Disclosure of Invention
The invention aims to provide a method for constructing a large video model oriented to remote sensing scenes, which builds a remote sensing scene-oriented large video model with strong feature extraction capability and feature rule discovery capability.
According to the invention, a method for constructing a large video model for remote sensing scenes is provided, which comprises the following steps:
acquiring a remote sensing image set A and a target video set B, where A = {a_1, a_2, …, a_N}, a_n is the n-th remote sensing image in A, n ranges from 1 to N, and N is the number of remote sensing images in A; B = {b_1, b_2, …, b_M}, b_m is the m-th target video in B, m ranges from 1 to M, and M is the number of target videos in B; b_m = (b_{m,1}, b_{m,2}, …, b_{m,Q}), b_{m,q} is the q-th frame target image of b_m, q ranges from 1 to Q, Q is the number of target images in each target video, and b_{m,1}, b_{m,2}, …, b_{m,Q} are Q consecutively captured target images; each target video in B is a video captured by satellite-mounted remote sensing equipment or by unmanned aerial vehicle-mounted remote sensing equipment, and each remote sensing image is an image captured by satellite-mounted remote sensing equipment.
Training a neural network model using A and B, the neural network model comprising a first neural network sub-model and a second neural network sub-model, the training comprising:
traversing A: partitioning each a_n into blocks and randomly masking k × C of the blocks, where C is the number of blocks obtained by partitioning a_n and k is a preset mask ratio; and training a first neural network sub-model with the masked a_n, the first neural network sub-model being a 2D swin-transformer structure comprising a first encoder and a first decoder.
Traversing B: masking frames [i_m, i_m + L] of each b_m, where i_m + L ≤ Q, i_m ≥ 1, L is the preset number of mask frames, and i_m is the starting mask frame of b_m; and training a second neural network sub-model with the masked b_m, the second sub-model being a 3D swin-transformer structure comprising a second encoder and a second decoder; the training of the first neural network sub-model is performed simultaneously with the training of the second neural network sub-model, and the second encoder and the first encoder share weights during training.
Compared with the prior art, the method provided by the invention has clear beneficial effects, achieves considerable technical progress and practicality, has wide industrial utility, and offers at least the following advantages:
the video large model facing the remote sensing scene comprises two branches, wherein the first branch corresponds to a first neural network sub-model, and a training sample corresponding to the branch is a remote sensing image set; the second branch corresponds to a second neural network sub-model, a training sample corresponding to the branch is a target video set, and the target video set comprises remote sensing videos (namely videos shot by satellite carried remote sensing equipment) and unmanned aerial vehicle videos (videos shot by unmanned aerial vehicle carried remote sensing equipment), and the number of remote sensing videos which can be used as training samples is small because the remote sensing videos are not easy to acquire; according to the invention, the number of video samples is expanded by introducing unmanned aerial vehicle video, and the expanded video samples are utilized to train the second neural network sub-model, so that the capability of feature extraction and rule mining of the second neural network sub-model is improved, the generalization capability of the trained second neural network sub-model is also improved, and the method can be applied to downstream tasks of different partial space-time predictions.
In addition, the masking strategy adopted for the remote sensing image samples corresponding to the first neural network sub-model is random masking of a portion of the image blocks, which improves the ability of the first neural network sub-model to extract spatial information from remote sensing images. The masking strategy adopted for the target video samples corresponding to the second neural network sub-model masks a randomly chosen frame in the target video as a starting frame together with a fixed number of subsequent frames; this increases the difficulty of video prediction and improves the ability of the second neural network sub-model to extract the spatio-temporally continuous information of objects in the video. The training processes of the first and second neural network sub-models are performed simultaneously, which accelerates the training of the large video model; and because weights are shared between the first encoder of the first neural network sub-model and the second encoder of the second neural network sub-model during training, the second neural network sub-model acquires the first sub-model's ability to extract spatial information from remote sensing images, which further improves its own spatial feature extraction and helps accelerate its training.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without inventive effort.
Fig. 1 is a flowchart of a method for constructing a large video model for a remote sensing scene according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The embodiments described are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without inventive effort fall within the scope of the invention.
According to the invention, a method for constructing a large video model for remote sensing scenes is provided, as shown in Fig. 1, comprising the following steps:
s100, acquiring a remote sensing image set A and a target video set B, wherein A= { a 1 ,a 2 ,…,a N },a n The method comprises the steps that N is the number of the remote sensing images in A, wherein the value range of N is 1 to N, and N is the number of the remote sensing images in A; b= { B 1 ,b 2 ,…,b M },b m For the mth target video in B, the value range of M is 1 to M, M is the number of the target videos in B, B m =(b m,1 ,b m,2 ,…,b m,Q ),b m,q B is m In the Q-th frame of target image, the value range of Q is 1 to Q, Q is the number of target images in the target video, b m,1 、b m,2 、…、b m,Q Q frames of target images are continuously shot; the target video in B isThe method comprises the steps that a video shot by a satellite-mounted remote sensing device or a video shot by an unmanned aerial vehicle-mounted remote sensing device is shot, and a remote sensing image is an image shot by the satellite-mounted remote sensing device.
The large video model for remote sensing scenes comprises two branches. The first branch corresponds to the first neural network sub-model, and its training samples are the remote sensing image set; the second branch corresponds to the second neural network sub-model, and its training samples are the target video set, which contains both remote sensing videos (videos captured by satellite-mounted remote sensing equipment) and unmanned aerial vehicle videos (videos captured by unmanned aerial vehicle-mounted remote sensing equipment).
Preferably, the number of videos in B captured by unmanned aerial vehicle-mounted remote sensing equipment is larger than the number captured by satellite-mounted remote sensing equipment. By using videos captured by unmanned aerial vehicle-mounted remote sensing equipment as target videos, the invention expands the number of target videos and overcomes the problem that remote sensing videos are hard to acquire, so that their number alone would be insufficient for the subsequent training of the neural network model. Since videos captured by unmanned aerial vehicle-mounted remote sensing equipment and videos captured by satellite-mounted remote sensing equipment are both shot from an overhead, top-down angle, using the former as target videos for the subsequent training of the neural network model achieves the desired training effect.
Preferably, both N and M are on the order of millions. With training sample sets of this size, the trained large video model for remote sensing scenes has strong feature extraction, rule mining, and generalization capabilities. Using the model parameters of the trained large video model as the initial parameters of the models for different downstream tasks both accelerates the training of those models and improves their accuracy. The downstream tasks may be video prediction, target detection, single target tracking, video segmentation, and the like.
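As a minimal, hypothetical sketch of this initialization step, assuming PyTorch (the patent gives no code; the checkpoint path and the name/shape filtering below are illustrative assumptions, not the patent's prescribed procedure):

```python
import torch

def init_from_pretrained(downstream: torch.nn.Module,
                         ckpt_path: str = "rs_video_large_model.pth") -> None:
    """Copy every name- and shape-compatible tensor from the pretrained
    large-model checkpoint into a downstream model (e.g. a detector or
    tracker backbone), leaving the rest (task heads) freshly initialized."""
    pretrained = torch.load(ckpt_path, map_location="cpu")  # assumed state_dict
    state = downstream.state_dict()
    state.update({k: v for k, v in pretrained.items()
                  if k in state and v.shape == state[k].shape})
    downstream.load_state_dict(state)
```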
S200, training a neural network model by using A and B, wherein the neural network model comprises a first neural network sub-model and a second neural network sub-model, and the training process comprises the following steps:
s210, traversing A, for a n Performing block processing, and randomly performing block processing on the a n The k x C blocks in (a) are subjected to mask processing; c is a pair a n The number of blocks obtained by partitioning is k, which is a preset mask proportion; a processed by mask n A first neural network sub-model is trained, the first neural network sub-model being a 2D swin-transformer structure, the first neural network sub-model comprising a first encoder and a first decoder.
The 2D swin-transformer structure used in the invention is prior art and is not described here. The first encoder extracts the features of the masked a_n, and the first decoder predicts the original pixel values of the masked blocks from the output of the first encoder.
The masking strategy adopted for the remote sensing image samples corresponding to the first neural network sub-model is random masking of a portion of the image blocks; this random masking strategy improves the ability of the first neural network sub-model to extract spatial information from remote sensing images. Preferably, 40% ≤ k ≤ 60%. Small-scale experiments show that with k set in the range of 40%–60%, the first neural network sub-model extracts the spatial information of remote sensing images well while keeping its training time acceptable. Optionally, k = 50%.
As an example, a_n is an image with a resolution of 224 × 224. Partitioning a_n yields 56 × 56 blocks, each containing 4 × 4 = 16 pixels. Half of the 56 × 56 blocks are randomly selected and masked, giving the masked a_n.
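The following is a minimal sketch of this block-masking step, assuming PyTorch; the patent gives no code, and zeroing the masked blocks here merely stands in for whatever mask-token scheme the actual sub-model uses:

```python
import torch

def random_block_mask(image: torch.Tensor, patch: int = 4, k: float = 0.5):
    """Partition a (C, H, W) image into patch x patch blocks and mask a
    random fraction k of them; returns the masked image and the block mask."""
    c, h, w = image.shape                        # e.g. (3, 224, 224)
    gh, gw = h // patch, w // patch              # 56 x 56 blocks for 224 / 4
    num_blocks = gh * gw                         # C in the patent's notation
    num_masked = int(k * num_blocks)             # k x C masked blocks
    keep = torch.ones(num_blocks)
    keep[torch.randperm(num_blocks)[:num_masked]] = 0.0
    # Expand the per-block mask back to pixel resolution and apply it.
    keep = keep.view(1, gh, gw)
    keep = keep.repeat_interleave(patch, 1).repeat_interleave(patch, 2)
    return image * keep, keep
```

For a 224 × 224 image with patch = 4 and k = 0.5, this masks 1568 of the 3136 blocks, matching the example above.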
S220, traversing B: masking frames [i_m, i_m + L] of each b_m, where i_m + L ≤ Q, i_m ≥ 1, L is the preset number of mask frames, and i_m is the starting mask frame of b_m; and training a second neural network sub-model with the masked b_m, the second sub-model being a 3D swin-transformer structure comprising a second encoder and a second decoder. The training of the first neural network sub-model is performed simultaneously with the training of the second neural network sub-model, and the second encoder and the first encoder share weights during training.
The main difference between the 3D swin-transformer and the 2D swin-transformer is the extension from 2D to 3D; the 3D swin-transformer structure is likewise prior art and is not described here. The second encoder extracts the features of the masked b_m, and the second decoder predicts the masked target images from the output of the second encoder.
The training processes of the first and second neural network sub-models are performed simultaneously, which accelerates the training of the large video model. During training, weights are shared between the first encoder of the first neural network sub-model and the second encoder of the second neural network sub-model: modules with the same structure in the two encoders have the same weights; for example, the attention module in the second encoder and the attention module in the first encoder share the same weights. The second neural network sub-model can thereby acquire the first sub-model's ability to extract spatial information from remote sensing images, which further improves its own spatial feature extraction and helps accelerate its training.
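A hedged sketch of such weight sharing, assuming PyTorch and assuming the two encoders expose identically named, identically shaped parameters (both assumptions go beyond what the patent specifies):

```python
import torch.nn as nn

def share_encoder_weights(encoder_2d: nn.Module, encoder_3d: nn.Module) -> None:
    """Tie every same-named, same-shaped parameter so both encoders train a
    single shared tensor (gradients then flow in from both branches)."""
    params_2d = dict(encoder_2d.named_parameters())
    for name, p3d in list(encoder_3d.named_parameters()):
        p2d = params_2d.get(name)
        if p2d is None or p2d.shape != p3d.shape:
            continue                              # 3D-only modules stay private
        module_path, _, attr = name.rpartition(".")
        owner = encoder_3d.get_submodule(module_path) if module_path else encoder_3d
        setattr(owner, attr, p2d)                 # re-register the shared Parameter
```

Tying the tensors (rather than copying them once) keeps the two encoders identical throughout training, which is one plausible reading of the weight sharing described above.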
The masking strategy adopted for the target video samples corresponding to the second neural network sub-model masks a randomly chosen frame in the target video as a starting frame together with a fixed number of subsequent frames. This strategy increases the difficulty of video prediction and improves the ability of the second neural network sub-model to extract the spatio-temporally continuous information of objects in the video.
Preferably, Q = 16 and 5 ≤ L ≤ 9. Small-scale experiments show that when Q = 16 and L is set in the range of 5–9, the second neural network sub-model extracts the spatio-temporally continuous information of objects in the video well while keeping its training time acceptable. Optionally, L = 7.
The invention uses a random contiguous-frame masking strategy for b_m: the starting mask frames of different target videos may differ or coincide, but the number of masked frames is the same. As an example, b_m consists of 16 consecutively captured target frames, each of 224 × 224; the preset number of mask frames is 7; a starting point is randomly chosen among the 16 frames, and the starting frame together with the 7 subsequent frames is completely masked, giving the masked b_m. It should be noted that the starting point must be chosen so that at least 7 frames follow it.
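A minimal sketch of this contiguous-frame masking, assuming PyTorch and 0-based frame indexing, with zeroed frames again standing in for mask tokens:

```python
import torch

def mask_frame_run(video: torch.Tensor, L: int = 7):
    """Mask frames [i_m, i_m + L] of a (T, C, H, W) clip: a randomly chosen
    start frame plus the L frames after it, keeping i_m + L within the clip."""
    T = video.shape[0]                           # Q, e.g. 16
    start = int(torch.randint(0, T - L, (1,)))   # guarantees start + L <= T - 1
    masked = video.clone()
    masked[start:start + L + 1] = 0.0            # start frame and L subsequent frames
    return masked, start
```

With T = 16 and L = 7, the interval [i_m, i_m + L] covers the start frame plus the 7 subsequent frames, consistent with the example above.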
According to the invention, the trained neural network model is the large video model for remote sensing scenes, and it has strong feature extraction capability and feature rule mining capability.
As a specific implementation, the remote sensing image set A contains more than 1.09 million remote sensing images, the target video set B contains more than 1.01 million target videos, and more than half of the target videos in B are videos captured by unmanned aerial vehicle-mounted remote sensing equipment. Each remote sensing image is partitioned into blocks, and half of the blocks are masked at random. Each target video consists of 16 consecutive target frames; a starting mask frame is chosen at random within the video, and the starting frame together with the 7 subsequent target frames is masked. The first neural network sub-model is trained with the masked remote sensing images and the second neural network sub-model with the masked target videos, with the encoder of the first sub-model and the encoder of the second sub-model sharing weights throughout training until training ends.
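A schematic joint training step, assuming PyTorch, the masking helpers sketched above, and an MSE reconstruction loss (the patent specifies neither the loss nor the model interfaces, so all of these are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def joint_step(model_2d: torch.nn.Module, model_3d: torch.nn.Module,
               images: torch.Tensor, videos: torch.Tensor,
               optimizer: torch.optim.Optimizer) -> float:
    """One simultaneous update of both sub-models: each branch reconstructs
    its masked input, and the summed loss sends gradients through the shared
    encoder weights from both branches at once."""
    masked_imgs = torch.stack([random_block_mask(x)[0] for x in images])
    masked_vids = torch.stack([mask_frame_run(v)[0] for v in videos])
    loss = (F.mse_loss(model_2d(masked_imgs), images)
            + F.mse_loss(model_3d(masked_vids), videos))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```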
Experiments show that, compared with randomly initialized model parameters, using the parameters of the trained neural network model as the initial parameters of the models for different downstream tasks yields higher accuracy for the same training duration: when the downstream task is target detection, the mean average precision (mAP) rises from 0.3629 to 0.3718; when the downstream task is video prediction, the structural similarity (SSIM) index rises from 0.7018 to 0.7152. The large video model for remote sensing scenes constructed by the method is therefore suitable for different downstream tasks, has strong generalization capability as well as strong feature extraction and feature rule mining capabilities, and improves the accuracy of the models for the different downstream tasks.
While certain specific embodiments of the invention have been described in detail by way of example, it will be appreciated by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the invention. Those skilled in the art will also appreciate that many modifications may be made to the embodiments without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.

Claims (7)

1. A method for constructing a large video model for remote sensing scenes, characterized by comprising the following steps:
acquiring a remote sensing image set A and a target video set B, where A = {a_1, a_2, …, a_N}, a_n is the n-th remote sensing image in A, n ranges from 1 to N, and N is the number of remote sensing images in A; B = {b_1, b_2, …, b_M}, b_m is the m-th target video in B, m ranges from 1 to M, and M is the number of target videos in B; b_m = (b_{m,1}, b_{m,2}, …, b_{m,Q}), b_{m,q} is the q-th frame target image of b_m, q ranges from 1 to Q, Q is the number of target images in each target video, and b_{m,1}, b_{m,2}, …, b_{m,Q} are Q consecutively captured target images; each target video in B is a video captured by satellite-mounted remote sensing equipment or by unmanned aerial vehicle-mounted remote sensing equipment, and each remote sensing image is an image captured by satellite-mounted remote sensing equipment;
training a neural network model using A and B, the neural network model comprising a first neural network sub-model and a second neural network sub-model, the training comprising:
traversing A: partitioning each a_n into blocks and randomly masking k × C of the blocks, where C is the number of blocks obtained by partitioning a_n and k is a preset mask ratio; and training a first neural network sub-model with the masked a_n, the first neural network sub-model being a 2D swin-transformer structure comprising a first encoder and a first decoder;
traversing B: masking frames [i_m, i_m + L] of each b_m, where i_m + L ≤ Q, i_m ≥ 1, L is the preset number of mask frames, and i_m is the starting mask frame of b_m; and training a second neural network sub-model with the masked b_m, the second sub-model being a 3D swin-transformer structure comprising a second encoder and a second decoder; the training of the first neural network sub-model is performed simultaneously with the training of the second neural network sub-model, and the second encoder and the first encoder share weights during training.
2. The method for constructing a large video model for a remote sensing scene according to claim 1, wherein 40% ≤ k ≤ 60%.
3. The method for constructing a large video model for a remote sensing scene according to claim 2, wherein k = 50%.
4. The method for constructing a large video model for a remote sensing scene according to claim 1, wherein Q = 16 and 5 ≤ L ≤ 9.
5. The method for constructing a large video model for a remote sensing scene according to claim 4, wherein L = 7.
6. The method for constructing a large video model for a remote sensing scene according to claim 1, wherein the number of videos in B captured by unmanned aerial vehicle-mounted remote sensing equipment is larger than the number of videos in B captured by satellite-mounted remote sensing equipment.
7. The method for constructing a large video model for a remote sensing scene according to claim 1, wherein N and M are each on the order of millions.
CN202211635612.9A 2022-12-19 2022-12-19 Remote sensing scene-oriented video large model construction method Active CN116109966B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211635612.9A CN116109966B (en) 2022-12-19 2022-12-19 Remote sensing scene-oriented video large model construction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211635612.9A CN116109966B (en) 2022-12-19 2022-12-19 Remote sensing scene-oriented video large model construction method

Publications (2)

Publication Number Publication Date
CN116109966A (en) 2023-05-12
CN116109966B CN116109966B (en) 2023-06-27

Family

ID=86266649

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211635612.9A Active CN116109966B (en) 2022-12-19 2022-12-19 Remote sensing scene-oriented video large model construction method

Country Status (1)

Country Link
CN (1) CN116109966B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019056845A1 (en) * 2017-09-19 2019-03-28 北京市商汤科技开发有限公司 Road map generating method and apparatus, electronic device, and computer storage medium
WO2020232905A1 (en) * 2019-05-20 2020-11-26 平安科技(深圳)有限公司 Superobject information-based remote sensing image target extraction method, device, electronic apparatus, and medium
WO2022247711A1 (en) * 2021-05-24 2022-12-01 广州智慧城市发展研究院 Target associated video tracking processing method and device
WO2022252557A1 (en) * 2021-05-31 2022-12-08 上海商汤智能科技有限公司 Neural network training method and apparatus, image processing method and apparatus, device, and storage medium
CN113706388A (en) * 2021-09-24 2021-11-26 上海壁仞智能科技有限公司 Image super-resolution reconstruction method and device
CN114220015A (en) * 2021-12-21 2022-03-22 一拓通信集团股份有限公司 Improved YOLOv 5-based satellite image small target detection method
CN114842351A (en) * 2022-04-11 2022-08-02 中国人民解放军战略支援部队航天工程大学 Remote sensing image semantic change detection method based on twin transforms
CN114937202A (en) * 2022-04-11 2022-08-23 青岛理工大学 Double-current Swin transform remote sensing scene classification method
CN115049921A (en) * 2022-04-27 2022-09-13 安徽大学 Method for detecting salient target of optical remote sensing image based on Transformer boundary sensing

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
FANGLONG YAO ET AL.: "Gated hierarchical multi-task learning network for judicial decision prediction", NEUROCOMPUTING, vol. 411, pages 313-326
LOU LIN; HUANG WEIGEN: "Research on red tide satellite remote sensing methods based on artificial neural networks", Journal of Remote Sensing, no. 02, pages 125-130
JIAO YUNQING; WANG SHIXIN; ZHOU YI; FU QINGHUA: "Ultra-high resolution target recognition in remote sensing images based on neural networks", Journal of System Simulation, no. 14, pages 3223-3225

Also Published As

Publication number Publication date
CN116109966B (en) 2023-06-27

Similar Documents

Publication Publication Date Title
US10924755B2 (en) Real time end-to-end learning system for a high frame rate video compressive sensing network
Liu et al. Mobile video object detection with temporally-aware feature maps
Wu et al. Compressed video action recognition
CN108960059A (en) A kind of video actions recognition methods and device
CN112084868A (en) Target counting method in remote sensing image based on attention mechanism
CN113592026B (en) Binocular vision stereo matching method based on cavity volume and cascade cost volume
CN110751018A (en) Group pedestrian re-identification method based on mixed attention mechanism
CN115457498A (en) Urban road semantic segmentation method based on double attention and dense connection
CN111860175B (en) Unmanned aerial vehicle image vehicle detection method and device based on lightweight network
CN110765841A (en) Group pedestrian re-identification system and terminal based on mixed attention mechanism
Löhdefink et al. On low-bitrate image compression for distributed automotive perception: Higher peak snr does not mean better semantic segmentation
CN116958687A (en) Unmanned aerial vehicle-oriented small target detection method and device based on improved DETR
Löhdefink et al. GAN-vs. JPEG2000 image compression for distributed automotive perception: Higher peak SNR does not mean better semantic segmentation
CN118097150A (en) Small sample camouflage target segmentation method
CN116109966B (en) Remote sensing scene-oriented video large model construction method
CN113160250A (en) Airport scene surveillance video target segmentation method based on ADS-B position prior
CN117097853A (en) Real-time image matting method and system based on deep learning
CN113887419B (en) Human behavior recognition method and system based on extracted video space-time information
CN116340568A (en) Online video abstract generation method based on cross-scene knowledge migration
CN112861698B (en) Compressed domain behavior identification method based on multi-scale time sequence receptive field
CN115346115A (en) Image target detection method, device, equipment and storage medium
Doan et al. Real-time Image Semantic Segmentation Networks with Residual Depth-wise Separable Blocks
Yue et al. A small target detection method for UAV aerial images based on improved YOLOv5
CN116703786B (en) Image deblurring method and system based on improved UNet network
CN115631115B (en) Dynamic image restoration method based on recursion transform

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant