CN116109966A - Remote sensing scene-oriented video large model construction method
- Publication number: CN116109966A
- Application number: CN202211635612.9A
- Authority: CN (China)
- Prior art keywords: model, remote sensing, neural network, video, network sub-model
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
- G06N3/02: Neural networks
- G06N3/08: Learning methods
- G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
- G06T7/11: Region-based segmentation
- G06V10/82: Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
- G06T2207/10032: Satellite or aerial image; Remote sensing
- Y02T10/40: Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- General Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Molecular Biology (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Computer Graphics (AREA)
- Geometry (AREA)
- Image Analysis (AREA)
Abstract
The application relates to the technical field of computer model construction, in particular to a remote sensing scene-oriented video large model construction method. The method comprises the following steps: acquiring a remote sensing image set A and a target video set B, wherein A = {a_1, a_2, …, a_N}, a_n is the n-th remote sensing image in A, n ranges from 1 to N, and N is the number of remote sensing images in A; B = {b_1, b_2, …, b_M}, b_m is the m-th target video in B, m ranges from 1 to M, and M is the number of target videos in B, with b_m = (b_{m,1}, b_{m,2}, …, b_{m,Q}), where b_{m,q} is the q-th frame target image of b_m; and training a neural network model using A and B, the neural network model comprising a first neural network sub-model and a second neural network sub-model. The invention constructs a remote sensing scene-oriented video large model with strong feature extraction capability and feature rule mining capability.
Description
Technical Field
The invention relates to the technical field of computer model construction, in particular to a remote sensing scene-oriented video large model construction method.
Background
Because remote sensing video carries both temporal and spatial characteristics, and remote sensing scenes themselves have complex textured backgrounds, a model for video interpretation tasks in remote sensing scenes needs strong feature extraction capability and must also mine the spatial and temporal feature rules of the video. How to construct, for remote sensing scenes, a large video model with strong feature extraction capability and feature rule mining capability is an urgent problem to be solved.
Disclosure of Invention
The invention aims to provide a method for constructing a large video model oriented to remote sensing scenes, yielding a remote sensing scene-oriented video large model with strong feature extraction capability and feature rule mining capability.
According to the invention, a method for constructing a video large model for a remote sensing scene is provided, which comprises the following steps:
acquiring a remote sensing image set A and a target video set B, wherein A = {a_1, a_2, …, a_N}, a_n is the n-th remote sensing image in A, n ranges from 1 to N, and N is the number of remote sensing images in A; B = {b_1, b_2, …, b_M}, b_m is the m-th target video in B, m ranges from 1 to M, and M is the number of target videos in B; b_m = (b_{m,1}, b_{m,2}, …, b_{m,Q}), b_{m,q} is the q-th frame target image of b_m, q ranges from 1 to Q, Q is the number of target images in a target video, and b_{m,1}, b_{m,2}, …, b_{m,Q} are Q consecutively captured target images; each target video in B is a video shot by satellite-mounted remote sensing equipment or by unmanned aerial vehicle-mounted remote sensing equipment, and each remote sensing image is an image shot by satellite-mounted remote sensing equipment.
Training a neural network model using A and B, the neural network model comprising a first neural network sub-model and a second neural network sub-model, the training comprising:
traversing A: partition a_n into blocks, and randomly mask k × C of the blocks, where C is the number of blocks obtained by partitioning a_n and k is a preset mask ratio; train the first neural network sub-model with the masked a_n, the first neural network sub-model being a 2D swin-transformer structure comprising a first encoder and a first decoder.
Traversing B: mask frames [i_m, i_m + L] of b_m, where i_m + L ≤ Q, i_m ≥ 1, L is a preset number of mask frames, and i_m is the starting mask frame of b_m; train the second neural network sub-model with the masked b_m, the second neural network sub-model being a 3D swin-transformer structure comprising a second encoder and a second decoder; the training of the first neural network sub-model is performed simultaneously with the training of the second neural network sub-model, and the second encoder and the first encoder share weights during training.
Compared with the prior art, the method provided by the invention has significant beneficial effects, represents a clear technical advance of practical value, has wide industrial applicability, and offers at least the following benefits:
the video large model facing the remote sensing scene comprises two branches, wherein the first branch corresponds to a first neural network sub-model, and a training sample corresponding to the branch is a remote sensing image set; the second branch corresponds to a second neural network sub-model, a training sample corresponding to the branch is a target video set, and the target video set comprises remote sensing videos (namely videos shot by satellite carried remote sensing equipment) and unmanned aerial vehicle videos (videos shot by unmanned aerial vehicle carried remote sensing equipment), and the number of remote sensing videos which can be used as training samples is small because the remote sensing videos are not easy to acquire; according to the invention, the number of video samples is expanded by introducing unmanned aerial vehicle video, and the expanded video samples are utilized to train the second neural network sub-model, so that the capability of feature extraction and rule mining of the second neural network sub-model is improved, the generalization capability of the trained second neural network sub-model is also improved, and the method can be applied to downstream tasks of different partial space-time predictions.
In addition, the masking strategy adopted for the remote sensing image samples corresponding to the first neural network sub-model randomly masks a portion of the pixel blocks, which improves the first neural network sub-model's ability to extract the spatial information of remote sensing images. The masking strategy adopted for the target video samples corresponding to the second neural network sub-model takes a certain frame of the target video as a starting frame and masks a fixed-length run of frames from it; this increases the difficulty of video prediction and improves the second neural network sub-model's ability to extract spatio-temporally continuous information about objects in the video. The training of the first neural network sub-model proceeds simultaneously with that of the second, which accelerates the training of the video large model; moreover, the weight sharing between the first encoder of the first sub-model and the second encoder of the second sub-model during training lets the second sub-model acquire the first sub-model's ability to extract the spatial information of remote sensing images, further improving that ability and helping to accelerate the training of the second neural network sub-model.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments are briefly introduced below. The drawings described below represent only some embodiments of the present invention; a person skilled in the art may obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of a method for constructing a large video model for a remote sensing scene according to an embodiment of the present invention.
Detailed Description
The following describes the embodiments of the present invention clearly and completely with reference to the accompanying drawings. The embodiments described are only some, not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
According to the invention, a method for constructing a large video model for a remote sensing scene is provided, as shown in fig. 1, and comprises the following steps:
s100, acquiring a remote sensing image set A and a target video set B, wherein A= { a 1 ,a 2 ,…,a N },a n The method comprises the steps that N is the number of the remote sensing images in A, wherein the value range of N is 1 to N, and N is the number of the remote sensing images in A; b= { B 1 ,b 2 ,…,b M },b m For the mth target video in B, the value range of M is 1 to M, M is the number of the target videos in B, B m =(b m,1 ,b m,2 ,…,b m,Q ),b m,q B is m In the Q-th frame of target image, the value range of Q is 1 to Q, Q is the number of target images in the target video, b m,1 、b m,2 、…、b m,Q Q frames of target images are continuously shot; the target video in B isThe method comprises the steps that a video shot by a satellite-mounted remote sensing device or a video shot by an unmanned aerial vehicle-mounted remote sensing device is shot, and a remote sensing image is an image shot by the satellite-mounted remote sensing device.
The remote sensing scene-oriented video large model comprises two branches: the first branch corresponds to the first neural network sub-model, whose training samples are the remote sensing image set; the second branch corresponds to the second neural network sub-model, whose training samples are the target video set, comprising remote sensing videos (i.e., videos shot by satellite-mounted remote sensing equipment) and unmanned aerial vehicle (UAV) videos (videos shot by UAV-mounted remote sensing equipment).
Preferably, the number of videos in B shot by UAV-mounted remote sensing equipment is larger than the number shot by satellite-mounted remote sensing equipment. By admitting UAV-shot video as target video, the invention expands the number of target videos and alleviates the problem that remote sensing videos are too scarce to meet the subsequent training requirements of the neural network model. Since both UAV-mounted and satellite-mounted remote sensing equipment capture video from an overhead, top-down viewing angle, UAV-shot video can serve as target video for the subsequent training of the neural network model with the same effect.
Preferably, both N and M are on the order of millions. With training sample sets of this size, the trained remote sensing scene-oriented video large model has strong feature extraction, rule mining, and generalization capability. Using the model parameters of the trained video large model as the initial parameters of models for different downstream tasks both accelerates the training of those models and improves their accuracy; the downstream tasks may be video prediction, target detection, single-target tracking, video segmentation, and the like.
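As a hedged sketch of this initialization step, the helper below copies every pretrained tensor whose name and shape match into a downstream model; the function name and the shape-matching policy are illustrative assumptions, since the patent does not specify the transfer mechanism.

```python
# Minimal sketch: initialize a downstream model from the pretrained large
# model's parameters; everything here is an assumption for illustration.
import torch.nn as nn

def init_from_pretrained(downstream: nn.Module, pretrained_state: dict) -> nn.Module:
    """Copy pretrained tensors whose names and shapes match the downstream model."""
    own = downstream.state_dict()
    matched = {k: v for k, v in pretrained_state.items()
               if k in own and own[k].shape == v.shape}
    own.update(matched)
    downstream.load_state_dict(own)
    return downstream

# Usage with stand-in modules: the backbone layers match, so they are copied.
backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 32))
downstream = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 32))
init_from_pretrained(downstream, backbone.state_dict())
```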
S200, training a neural network model by using A and B, wherein the neural network model comprises a first neural network sub-model and a second neural network sub-model, and the training process comprises the following steps:
s210, traversing A, for a n Performing block processing, and randomly performing block processing on the a n The k x C blocks in (a) are subjected to mask processing; c is a pair a n The number of blocks obtained by partitioning is k, which is a preset mask proportion; a processed by mask n A first neural network sub-model is trained, the first neural network sub-model being a 2D swin-transformer structure, the first neural network sub-model comprising a first encoder and a first decoder.
The structure of the 2D swin-transformer is prior art and is not described here. The first encoder extracts the features of the masked a_n, and the first decoder predicts the original pixel values of the masked blocks based on the output of the first encoder.
The masking strategy adopted for the remote sensing image samples corresponding to the first neural network sub-model randomly masks a portion of the pixel blocks, which improves the first neural network sub-model's ability to extract the spatial information of remote sensing images. Preferably, 40% ≤ k ≤ 60%. Small-scale experiments show that with k in the range 40%–60%, the first neural network sub-model extracts the spatial information of remote sensing images well while keeping its training time acceptable. Optionally, k = 50%.
As an example, let a_n be an image with a resolution of 224 × 224. Partitioning a_n yields 56 × 56 blocks, each of 4 × 4 = 16 pixels; half of the 56 × 56 blocks are randomly selected and masked, yielding the masked a_n.
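A minimal sketch of this blocking-and-masking step follows, assuming 4 × 4-pixel blocks and zero-filling as the mask; the real model presumably uses a learned mask token, which the patent does not detail.

```python
# Random block masking of a remote sensing image, as in the example above.
# Zero-filling the masked blocks is an illustrative assumption.
import torch

def mask_image_blocks(img: torch.Tensor, block: int = 4, k: float = 0.5) -> torch.Tensor:
    """img: (3, H, W) with H and W divisible by `block`; masks k*C random blocks."""
    _, H, W = img.shape
    gh, gw = H // block, W // block      # 56 x 56 blocks for a 224 x 224 image
    C = gh * gw                          # total number of blocks
    n_masked = int(k * C)                # k * C blocks get masked
    masked = img.clone()
    for i in torch.randperm(C)[:n_masked].tolist():
        r, c = (i // gw) * block, (i % gw) * block
        masked[:, r:r + block, c:c + block] = 0.0
    return masked

masked_a_n = mask_image_blocks(torch.rand(3, 224, 224))  # half the blocks masked
```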
S220, traversing B: mask frames [i_m, i_m + L] of b_m, where i_m + L ≤ Q, i_m ≥ 1, L is a preset number of mask frames, and i_m is the starting mask frame of b_m; train the second neural network sub-model with the masked b_m, the second neural network sub-model being a 3D swin-transformer structure comprising a second encoder and a second decoder. The training of the first neural network sub-model is performed simultaneously with the training of the second neural network sub-model, and the second encoder and the first encoder share weights during training.
The main difference between the 3D swin-transformer and the 2D swin-transformer is the change from 2D to 3D; the structure of the 3D swin-transformer is likewise prior art and is not described here. The second encoder extracts the features of the masked b_m, and the second decoder predicts the masked target images based on the output of the second encoder.
The training processes of the two sub-models proceed simultaneously, accelerating the training of the video large model. During training, weights are shared between the first encoder of the first neural network sub-model and the second encoder of the second neural network sub-model, meaning that structurally identical modules in the two encoders carry the same weights; for example, the attention module in the second encoder and the attention module in the first encoder have identical weights. The second neural network sub-model thereby acquires the first sub-model's ability to extract the spatial information of remote sensing images, which further improves that ability and helps accelerate the training of the second neural network sub-model.
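One plausible realization of this sharing, sketched below under the assumption that tying module references suffices, is to let both encoders hold the very same attention module, so a gradient step through either branch updates the weights that both see. The module sizes are toy values, not the patent's swin-transformer configuration.

```python
# Sketch of weight sharing: both encoders reference the same attention module.
import torch.nn as nn

shared_attn = nn.MultiheadAttention(embed_dim=96, num_heads=3, batch_first=True)

class Encoder2D(nn.Module):
    def __init__(self, attn: nn.Module):
        super().__init__()
        self.attn = attn                 # shared with the 3D encoder

class Encoder3D(nn.Module):
    def __init__(self, attn: nn.Module):
        super().__init__()
        self.attn = attn                 # shared with the 2D encoder

enc2d, enc3d = Encoder2D(shared_attn), Encoder3D(shared_attn)
# The two branches literally see the same parameter tensors:
assert enc2d.attn.in_proj_weight is enc3d.attn.in_proj_weight
```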
The masking strategy adopted for the target video samples corresponding to the second neural network sub-model takes a certain frame of the target video as a starting frame and masks a fixed-length run of frames from it; this increases the difficulty of video prediction and improves the second neural network sub-model's ability to extract spatio-temporally continuous information about objects in the video.
Preferably, Q = 16 and 5 ≤ L ≤ 9. Small-scale experiments show that with Q = 16 and L in the range 5–9, the second neural network sub-model extracts the spatio-temporally continuous information of objects in the video well while keeping its training time acceptable. Optionally, L = 7.
For b_m the invention uses a random contiguous-frame masking strategy: the starting mask frames of different target videos may differ or coincide, but the number of masked frames is the same for all. As an example, let b_m consist of 16 consecutively captured 224 × 224 target images, with the preset number of mask frames being 7. A starting point is drawn at random from the 16 frames, and the starting frame together with the 7 subsequent frames is masked completely, yielding the masked b_m. It should be noted that the starting point must be chosen so that at least 7 frames follow it.
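The sketch below implements this random contiguous-frame masking under the Q = 16, L = 7 example; drawing the start index from a range that leaves L frames after it enforces the constraint, and zero-filling again stands in for whatever mask token the real model uses.

```python
# Random contiguous-frame masking of a target video b_m.
import torch

def mask_frames(video: torch.Tensor, L: int = 7) -> torch.Tensor:
    """video: (Q, 3, H, W); masks a random start frame plus the L frames after it."""
    Q = video.shape[0]
    i_m = torch.randint(0, Q - L, (1,)).item()  # guarantees L frames follow the start
    masked = video.clone()
    masked[i_m:i_m + L + 1] = 0.0               # start frame + L subsequent frames
    return masked

masked_b_m = mask_frames(torch.rand(16, 3, 224, 224))  # 8 of 16 frames masked
```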
The trained neural network model is the remote sensing scene-oriented video large model, which has strong feature extraction capability and feature rule mining capability.
As a specific implementation, the remote sensing image set A comprises more than 1.09 million remote sensing images, the target video set B comprises more than 1.01 million target videos, and more than half of the target videos in B are videos shot by UAV-mounted remote sensing equipment. Each remote sensing image is partitioned into blocks, and half of the blocks are randomly masked; each target video is set to comprise 16 consecutive target images, a starting mask frame is selected at random within the video, and the starting frame and the 7 subsequent frames are masked. The first neural network sub-model is trained with the masked remote sensing images, the second neural network sub-model is trained with the masked target videos, and the encoder of the first sub-model and the encoder of the second sub-model share weights throughout training until training completes.
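The toy loop below sketches the simultaneous training of the two branches with a shared encoder; the linear stand-ins replace the swin-transformer encoders and decoders, and the random tensors replace real masked samples and reconstruction targets. It is a sketch of the joint-training idea, not the patent's implementation.

```python
# Joint training sketch: one optimizer step carries gradients from both
# branches into the shared encoder. All modules and data are stand-ins.
import torch
import torch.nn as nn

class Branch(nn.Module):
    """Stand-in for an encoder-decoder sub-model (really a swin-transformer)."""
    def __init__(self, encoder: nn.Module):
        super().__init__()
        self.encoder = encoder           # shared across branches
        self.decoder = nn.Linear(64, 64)

    def forward(self, x):
        return self.decoder(self.encoder(x))

shared_encoder = nn.Linear(64, 64)
branch_2d, branch_3d = Branch(shared_encoder), Branch(shared_encoder)
both = nn.ModuleList([branch_2d, branch_3d])    # .parameters() dedupes shared weights
opt = torch.optim.AdamW(both.parameters(), lr=1e-4)
mse = nn.MSELoss()

for step in range(3):                            # toy loop; real training runs longer
    img_in, img_target = torch.rand(8, 64), torch.rand(8, 64)   # masked image batch
    vid_in, vid_target = torch.rand(8, 64), torch.rand(8, 64)   # masked video batch
    loss = mse(branch_2d(img_in), img_target) + mse(branch_3d(vid_in), vid_target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```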
Experiments show that, compared with randomly initialized model parameters, using the model parameters of the trained neural network model as the initial parameters of models for different downstream tasks achieves higher accuracy for the same training duration: for a target detection downstream task, the mean average precision (mAP) rises from 0.3629 to 0.3718; for a video prediction downstream task, the structural similarity (SSIM) rises from 0.7018 to 0.7152. The video large model constructed by the method is therefore suitable for different downstream tasks, has strong generalization capability as well as strong feature extraction and feature rule mining capability, and improves the accuracy of the models for those downstream tasks.
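For reference, an SSIM figure like the one quoted above can be computed with scikit-image as sketched below; the random arrays are stand-ins for a real predicted/ground-truth frame pair.

```python
# Hedged example: computing SSIM for one predicted frame with scikit-image.
import numpy as np
from skimage.metrics import structural_similarity

pred = np.random.rand(224, 224).astype(np.float32)    # stand-in predicted frame
truth = np.random.rand(224, 224).astype(np.float32)   # stand-in ground-truth frame

score = structural_similarity(pred, truth, data_range=1.0)
print(f"SSIM = {score:.4f}")
```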
While certain specific embodiments of the invention have been described in detail by way of example, it will be appreciated by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the invention. Those skilled in the art will also appreciate that many modifications may be made to the embodiments without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.
Claims (7)
1. A method for constructing a remote sensing scene-oriented video large model, characterized by comprising the following steps:
acquiring a remote sensing image set A and a target video set B, wherein A = {a_1, a_2, …, a_N}, a_n is the n-th remote sensing image in A, n ranges from 1 to N, and N is the number of remote sensing images in A; B = {b_1, b_2, …, b_M}, b_m is the m-th target video in B, m ranges from 1 to M, and M is the number of target videos in B; b_m = (b_{m,1}, b_{m,2}, …, b_{m,Q}), b_{m,q} is the q-th frame target image of b_m, q ranges from 1 to Q, Q is the number of target images in a target video, and b_{m,1}, b_{m,2}, …, b_{m,Q} are Q consecutively captured target images; each target video in B is a video shot by satellite-mounted remote sensing equipment or by unmanned aerial vehicle-mounted remote sensing equipment, and each remote sensing image is an image shot by the satellite-mounted remote sensing equipment;
training a neural network model using A and B, the neural network model comprising a first neural network sub-model and a second neural network sub-model, the training comprising:
traversing A: partitioning a_n into blocks, and randomly masking k × C of the blocks, wherein C is the number of blocks obtained by partitioning a_n and k is a preset mask ratio; training the first neural network sub-model with the masked a_n, the first neural network sub-model being of a 2D swin-transformer structure and comprising a first encoder and a first decoder;
traversing B: masking frames [i_m, i_m + L] of b_m, wherein i_m + L ≤ Q, i_m ≥ 1, L is a preset number of mask frames, and i_m is the starting mask frame of b_m; training the second neural network sub-model with the masked b_m, the second neural network sub-model being of a 3D swin-transformer structure and comprising a second encoder and a second decoder; the training of the first neural network sub-model is performed simultaneously with the training of the second neural network sub-model, and the second encoder and the first encoder share weights during training.
2. The method for constructing a remote sensing scene-oriented video large model according to claim 1, wherein 40% ≤ k ≤ 60%.
3. The method for constructing a remote sensing scene-oriented video large model according to claim 2, wherein k = 50%.
4. The method for constructing a remote sensing scene-oriented video large model according to claim 1, wherein Q = 16 and 5 ≤ L ≤ 9.
5. The method for constructing a remote sensing scene-oriented video large model according to claim 4, wherein L = 7.
6. The method for constructing a remote sensing scene-oriented video large model according to claim 1, wherein the number of videos in B shot by unmanned aerial vehicle-mounted remote sensing equipment is larger than the number of videos in B shot by satellite-mounted remote sensing equipment.
7. The method for constructing a remote sensing scene-oriented video large model according to claim 1, wherein N and M are each on the order of millions.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202211635612.9A (granted as CN116109966B) | 2022-12-19 | 2022-12-19 | Remote sensing scene-oriented video large model construction method |
Publications (2)
| Publication Number | Publication Date |
| --- | --- |
| CN116109966A (application) | 2023-05-12 |
| CN116109966B (grant) | 2023-06-27 |
Family
ID=86266649
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| CN202211635612.9A (Active, granted as CN116109966B) | Remote sensing scene-oriented video large model construction method | 2022-12-19 | 2022-12-19 |
Country Status (1)
| Country | Link |
| --- | --- |
| CN | CN116109966B |
Citations (9)
| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| WO2019056845A1 | 2017-09-19 | 2019-03-28 | Beijing SenseTime Technology Development Co., Ltd. | Road map generating method and apparatus, electronic device, and computer storage medium |
| WO2020232905A1 | 2019-05-20 | 2020-11-26 | Ping An Technology (Shenzhen) Co., Ltd. | Superobject information-based remote sensing image target extraction method, device, electronic apparatus, and medium |
| CN113706388A | 2021-09-24 | 2021-11-26 | Shanghai Biren Intelligent Technology Co., Ltd. | Image super-resolution reconstruction method and device |
| CN114220015A | 2021-12-21 | 2022-03-22 | Yituo Communications Group Co., Ltd. | Improved YOLOv5-based satellite image small target detection method |
| CN114842351A | 2022-04-11 | 2022-08-02 | Space Engineering University of the PLA Strategic Support Force | Remote sensing image semantic change detection method based on Siamese Transformers |
| CN114937202A | 2022-04-11 | 2022-08-23 | Qingdao University of Technology | Dual-stream Swin Transformer remote sensing scene classification method |
| CN115049921A | 2022-04-27 | 2022-09-13 | Anhui University | Method for detecting salient targets in optical remote sensing images based on Transformer boundary awareness |
| WO2022247711A1 | 2021-05-24 | 2022-12-01 | Guangzhou Smart City Development Research Institute | Target-associated video tracking processing method and device |
| WO2022252557A1 | 2021-05-31 | 2022-12-08 | Shanghai SenseTime Intelligent Technology Co., Ltd. | Neural network training method and apparatus, image processing method and apparatus, device, and storage medium |
Non-Patent Citations (3)
- Fanglong Yao et al.: "Gated hierarchical multi-task learning network for judicial decision prediction", Neurocomputing, vol. 411, pp. 313-326.
- Lou Lin, Huang Weigen: "Research on satellite remote sensing of red tide based on artificial neural networks", Journal of Remote Sensing, no. 02, pp. 125-130.
- Jiao Yunqing; Wang Shixin; Zhou Yi; Fu Qinghua: "Ultra-high-resolution target recognition in remote sensing imagery based on neural networks", Journal of System Simulation, no. 14, pp. 3223-3225.
Also Published As
| Publication Number | Publication Date |
| --- | --- |
| CN116109966B | 2023-06-27 |
Similar Documents
Publication | Publication Date | Title
US10924755B2 (en) | Real time end-to-end learning system for a high frame rate video compressive sensing network | |
Liu et al. | Mobile video object detection with temporally-aware feature maps | |
Wu et al. | Compressed video action recognition | |
CN108960059A (en) | A kind of video actions recognition methods and device | |
CN112084868A (en) | Target counting method in remote sensing image based on attention mechanism | |
CN113592026B (en) | Binocular vision stereo matching method based on cavity volume and cascade cost volume | |
CN110751018A (en) | Group pedestrian re-identification method based on mixed attention mechanism | |
CN115457498A (en) | Urban road semantic segmentation method based on double attention and dense connection | |
CN111860175B (en) | Unmanned aerial vehicle image vehicle detection method and device based on lightweight network | |
CN110765841A (en) | Group pedestrian re-identification system and terminal based on mixed attention mechanism | |
Löhdefink et al. | On low-bitrate image compression for distributed automotive perception: Higher peak snr does not mean better semantic segmentation | |
CN116958687A (en) | Unmanned aerial vehicle-oriented small target detection method and device based on improved DETR | |
Löhdefink et al. | GAN-vs. JPEG2000 image compression for distributed automotive perception: Higher peak SNR does not mean better semantic segmentation | |
CN118097150A (en) | Small sample camouflage target segmentation method | |
CN116109966B (en) | Remote sensing scene-oriented video large model construction method | |
CN113160250A (en) | Airport scene surveillance video target segmentation method based on ADS-B position prior | |
CN117097853A (en) | Real-time image matting method and system based on deep learning | |
CN113887419B (en) | Human behavior recognition method and system based on extracted video space-time information | |
CN116340568A (en) | Online video abstract generation method based on cross-scene knowledge migration | |
CN112861698B (en) | Compressed domain behavior identification method based on multi-scale time sequence receptive field | |
CN115346115A (en) | Image target detection method, device, equipment and storage medium | |
Doan et al. | Real-time Image Semantic Segmentation Networks with Residual Depth-wise Separable Blocks | |
Yue et al. | A small target detection method for UAV aerial images based on improved YOLOv5 | |
CN116703786B (en) | Image deblurring method and system based on improved UNet network | |
CN115631115B (en) | Dynamic image restoration method based on recursion transform |
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant