CN113393495B - High-altitude parabolic track identification method based on reinforcement learning - Google Patents

High-altitude parabolic track identification method based on reinforcement learning

Info

Publication number
CN113393495B
CN113393495B (application CN202110685692.8A; published as CN113393495A)
Authority
CN
China
Prior art keywords
altitude parabolic
image
model
reinforcement learning
altitude
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110685692.8A
Other languages
Chinese (zh)
Other versions
CN113393495A (en)
Inventor
郭洪飞
马向东
曾云辉
陈柄赞
何智慧
任亚平
张锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan University
Original Assignee
Jinan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinan University filed Critical Jinan University
Priority to CN202110685692.8A priority Critical patent/CN113393495B/en
Publication of CN113393495A publication Critical patent/CN113393495A/en
Application granted granted Critical
Publication of CN113393495B publication Critical patent/CN113393495B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/02Affine transformations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/277Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20024Filtering details
    • G06T2207/20032Median filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20172Image enhancement details
    • G06T2207/20182Noise reduction or smoothing in the temporal domain; Spatio-temporal filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30241Trajectory

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a high-altitude parabolic track identification method based on reinforcement learning. The method comprises the following steps: acquiring a high-altitude parabolic track image of a monitored window area through an image sensor; preprocessing the high-altitude parabolic track image to obtain preprocessed image information; judging whether the image sensor is shielded or not according to the preprocessed image information; when the image sensor is judged not to be shielded, inputting the preprocessed image information into a processor, acquiring a pre-training target model after reinforcement learning by the processor, and performing high-altitude parabolic recognition on the preprocessed image information through the pre-training target model to obtain high-altitude parabolic recognition result information; and the processor stores the high-altitude parabolic recognition result information into a data storage unit, a cloud server and a storage so as to train and update the pre-training target model. According to the method, the high-altitude parabolic track is identified through the reinforcement learning model, and the identification accuracy is improved.

Description

High-altitude parabolic track identification method based on reinforcement learning
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a high-altitude parabolic track identification method based on reinforcement learning.
Background
With further economic development, urban populations are concentrating, and production and living environments are filled with uncertainties and risks. High-altitude throwing of objects has been called "the pain hanging over cities": once it occurs it cannot be controlled or stopped immediately, it can spread within a very short time, and it causes great damage to public safety. In recent years, civil and criminal cases concerning high-altitude throwing have been increasing, and media in many places have reported injuries caused by thrown objects one after another, prompting public calls for strict regulation of such behavior to ensure people's "overhead safety". Against this background, the Supreme People's Court issued the "Opinions on Properly Adjudicating High-Altitude Throwing and Falling-Object Cases in Accordance with the Law", under which a person who endangers public safety by throwing objects from height may be convicted and punished for the crime of endangering public safety by dangerous means even if no actual damage results.
The typical problem setting for traditional reinforcement learning is the Markov Decision Process (MDP). An MDP contains a set of states S and a set of actions A. State transitions are governed by the transition probability P, the reward R and a discount factor γ. The transition probability P relates state transitions to rewards, and both depend only on the state and action of the previous time step. Reinforcement learning defines an environment in which an Agent (a software or hardware system) takes actions to maximize its reward; the basis of the Agent's optimization is the Bellman equation, a method widely used to solve practical optimization problems. When all reachable states are tractable and can be stored in computer RAM (random access memory), standard reinforcement learning handles the environment well. However, when the number of states in the environment exceeds the capacity of modern computers, the standard approach becomes much less effective. Moreover, in real environments the Agent must cope with continuous states, continuous variables and continuous control (actions). The standard, tabular reinforcement learning Q-table is therefore replaced by a deep neural network, i.e., a Q-network, which maps environment states to Agent actions; the network architecture, the choice of hyper-parameters and the learning of the Q-network weights are all completed in the training phase. DQN (Deep Q-Network) allows the Agent to explore unstructured environments and accumulate knowledge so that, over time, it can imitate human behavior. The present method uses the DQN algorithm to handle the continuous-state (non-discrete), continuous-variable and continuous-control problems in the high-altitude parabolic trajectory recognition system, as the sketch below illustrates.
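The following is a minimal sketch, not the patent's exact network, of the idea of replacing a Q-table with a Q-network that maps an environment state to one value per action. The layer sizes and the flattened 84 × 84 input are illustrative assumptions.

```python
# Minimal Q-network sketch: state in, one value estimate per action out.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim: int, num_actions: int):
        super().__init__()
        # Hypothetical layer sizes, chosen for illustration only.
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, num_actions),  # one Q-value per action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

q_net = QNetwork(state_dim=84 * 84, num_actions=4)
state = torch.rand(1, 84 * 84)        # a flattened grayscale frame
action = q_net(state).argmax(dim=1)   # greedy action from the Q-values
```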
At present, there are already high-altitude parabolic trajectory prediction patents on the market, for example: "High-altitude parabolic detection method, device and storage medium" (publication number CN111931599A) and "High-altitude parabolic radar-vision fusion monitoring and early-warning system" (application number CN201922207460.2). The former computes the motion state of an object through image-processing algorithms to realize prediction, while the latter monitors the high-altitude parabolic trajectory with a radar system. Few approaches on the market, therefore, analyze and predict the high-altitude parabolic trajectory from the perspective of an intelligent prediction algorithm.
Disclosure of Invention
The invention aims to provide a high-altitude parabolic track identification method based on reinforcement learning so as to accurately identify a high-altitude parabolic track.
In order to achieve the purpose, the invention is realized by the following technical scheme:
a high-altitude parabolic track identification method based on reinforcement learning comprises the following steps:
s1, acquiring a high-altitude parabolic track image of the monitored window area through an image sensor;
s2, preprocessing the high-altitude parabolic track image to obtain preprocessed image information;
s3, judging whether the image sensor is blocked according to the preprocessed image information;
s4, when the image sensor is judged not to be shielded, the preprocessed image information is input to a processor, the processor obtains a pre-training target model after reinforcement learning, and high-altitude parabolic recognition is carried out on the preprocessed image information through the pre-training target model to obtain high-altitude parabolic recognition result information;
and S5, the processor stores the high altitude parabolic recognition result information into a data storage unit, a cloud server and a storage to train and update the pre-training target model.
Optionally, the S2 includes:
s2.1, converting the high-altitude parabolic image collected by the image sensor into a low-dimensional gray image;
s2.2, carrying out affine transformation on the gray level image;
s2.3, carrying out noise elimination on the gray image after affine transformation in a spatial filtering and time domain filtering mode;
and S2.4, acquiring a target detection frame of the moving object in each frame of image after noise elimination by adopting a background difference and inter-frame difference fusion method, and predicting the target detection frame of the moving object in the next frame of image according to the target detection frame in the previous frame of image through Kalman filtering to obtain the preprocessed image information.
Optionally, the S3 includes:
s3.1, acquiring pixel values and distribution characteristics in the preprocessed image information;
s3.2, judging whether the image sensor is shielded or not according to the size and the distribution characteristics of the pixel values in the preprocessed image information;
and S3.3, when the image sensor is judged to be shielded, storing the preprocessed image information into the cloud server and the storage.
Optionally, after the S2 and before the S3, the method further comprises: and storing the preprocessed image information into the cloud server and a storage.
Optionally, the step of obtaining the pre-trained target model in S4 includes:
s4.1, initializing an action model and a target model before pre-training;
s4.2, establishing a simulation environment, and transmitting the optimal action parameters to the simulation environment by the action model;
s4.3, the simulation environment simulates according to the optimal action parameters to obtain simulated action parameters, and stores the simulated action parameters to the data storage unit;
s4.4, the action model acquires the simulated action parameters from the data storage unit so as to train and update the action model;
and S4.5, copying the latest simulated action parameters to the target model after the action model is trained for C times to train and update the target model to obtain the pre-trained target model, wherein C is an integer greater than or equal to 2.
Optionally, the optimal action parameters in S4.2 include: the high-altitude parabolic track image, the high-altitude parabolic predicted track and the target model parameters.
Optionally, the simulated operation parameters in S4.3 include: the high altitude parabolic track image of the current state, the current high altitude parabolic predicted track, the current reward obtaining and the high altitude parabolic track image of the next state.
Optionally, the step of establishing a simulation environment in S4.2 includes:
s4.2.1, acquiring physical characteristics, dynamic characteristics and surrounding environment characteristics of a high-altitude parabolic moving object;
s4.2.2, analyzing the physical characteristics, dynamic characteristics and surrounding environment characteristics of the moving object of the high-altitude parabola according to the air resistance and wind speed variables of the environment of the high-altitude parabola to establish the simulation environment.
Optionally, the action model and the target model continuously obtain high-altitude parabolic track prediction error information in an updating process, so as to change a prediction strategy according to the error information and an error value of an adjacent frame high-altitude parabolic track image.
Optionally, the S5 further includes: and comparing the high-altitude parabolic recognition result information with an actual high-altitude parabolic track to obtain actual prediction error information, and feeding back the actual prediction error information to the data storage unit.
The invention has at least one of the following beneficial effects:
the method starts from the angle of an intelligent prediction algorithm and the idea of predicting the high-altitude parabolic track, and the high-altitude parabolic track is recognized through a reinforcement learning model, so that the recognition accuracy rate is improved. In the high-altitude parabolic track recognition method based on reinforcement learning, the processor acquires a pre-training target model after reinforcement learning, so that high-altitude parabolic recognition is performed on pre-processing image information through the pre-training target model, the pre-training target model does not need to train a data set labeled manually and can improve high-altitude parabolic track prediction accuracy, and the processor stores high-altitude parabolic recognition result information into a data storage unit, a cloud server and a storage, so that the pre-training target model is trained and updated, the high-altitude parabolic track prediction accuracy can be further improved, the data storage unit can improve the data utilization rate, samples participating in network training can meet the requirement of independent and same distribution, and the training stability is improved.
Furthermore, in the reinforcement-learning-based high-altitude parabolic track recognition method provided by the invention, the latest simulated action parameters are copied to the target model after every C updates of the action model to train and update it, ensuring the stability of model training. The action model and the target model also continuously acquire high-altitude parabolic track prediction error information during updating and change the prediction strategy according to this error information and the error values of adjacent frames of the track image, which effectively improves the accuracy of trajectory prediction.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
Fig. 1 is a flowchart of a high-altitude parabolic trajectory recognition method based on reinforcement learning according to this embodiment;
fig. 2 is a specific working schematic diagram of the high-altitude parabolic trajectory identification method based on reinforcement learning according to the present embodiment;
fig. 3 is a schematic diagram of a reinforcement learning model architecture provided in this embodiment.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
The invention relates to a high-altitude parabolic track identification method based on reinforcement learning. The method is applied to a high-altitude parabolic track recognition system based on reinforcement learning. The system mainly comprises a simulation environment module, a data storage unit, an action model, a target model, a DQN error function module, an image acquisition module, a preprocessing module, an image storage module, a shielding prediction module, a cloud server and a memory module.
The high-altitude parabolic trajectory recognition method based on reinforcement learning of the present embodiment is described below with reference to the drawings.
Referring to fig. 1, the high-altitude parabolic trajectory identification method based on reinforcement learning according to the present embodiment includes the following steps:
and S1, acquiring a high-altitude parabolic track image of the monitored window area through an image sensor.
Specifically, an image sensor is installed at a suitable position relative to the monitored window to collect its image information. To reduce monitoring blind spots as much as possible, a plurality of image sensors at different angles are arranged for the same window, reducing the probability that a malicious thrower evades the cameras during the throw.
And S2, preprocessing the high-altitude parabolic track image to obtain preprocessed image information.
Wherein the S2 includes:
s2.1, converting the high-altitude parabolic image collected by the image sensor into a low-dimensional gray image;
s2.2, carrying out affine transformation on the gray level image;
s2.3, carrying out noise elimination on the gray image after affine transformation in a spatial filtering and time domain filtering mode;
and S2.4, acquiring a target detection frame of the moving object in each frame of image after noise elimination by adopting a background difference and inter-frame difference fusion method, and predicting the target detection frame of the moving object in the next frame of image according to the target detection frame in the previous frame of image through Kalman filtering to obtain the preprocessed image information.
Specifically, referring to FIG. 2, the collected data information is passed to the preprocessing module. The color image collected by the image sensor is converted into a low-dimensional gray-scale image; the converted image retains the main information while reducing the data-processing burden. Affine transformation of the image is then performed (scaling, stretching, rotation and translation) to produce image information suitable for prediction with the training model. Salt-and-pepper noise and Gaussian noise are eliminated through spatial filtering and time-domain filtering, and the target detection frame of the moving object in each frame is then obtained through the fusion of background difference and inter-frame difference. The background difference method better preserves the whole foreground of the target, while the frame difference method has high detection sensitivity; retaining the background-difference foreground within a window around the frame-difference foreground therefore detects the target detection frame of the moving object more completely. Finally, Kalman filtering predicts the target frame of the moving object in the next frame from the target detection frame in the previous frame, yielding the preprocessed image information. A hedged sketch of this pipeline follows.
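The sketch below illustrates, with OpenCV, the fusion of background difference and inter-frame difference plus a constant-velocity Kalman prediction. The thresholds, the simple AND-fusion of the two masks, and the Kalman noise covariances are assumptions for illustration, not values from the patent.

```python
# Sketch: background difference + frame difference fusion, Kalman prediction.
import cv2
import numpy as np

back_sub = cv2.createBackgroundSubtractorMOG2()   # background difference

def detect_moving_object(prev_gray, cur_gray):
    bg_mask = back_sub.apply(cur_gray)             # background-difference mask
    frame_diff = cv2.absdiff(cur_gray, prev_gray)  # inter-frame difference
    _, fd_mask = cv2.threshold(frame_diff, 25, 255, cv2.THRESH_BINARY)
    fused = cv2.bitwise_and(bg_mask, fd_mask)      # simplified fusion of both
    contours, _ = cv2.findContours(fused, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    return cv2.boundingRect(max(contours, key=cv2.contourArea))

# Constant-velocity Kalman filter over the box centre: state (x, y, vx, vy).
kf = cv2.KalmanFilter(4, 2)
kf.transitionMatrix = np.array([[1, 0, 1, 0], [0, 1, 0, 1],
                                [0, 0, 1, 0], [0, 0, 0, 1]], np.float32)
kf.measurementMatrix = np.eye(2, 4, dtype=np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2   # assumed noise
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1
kf.errorCovPost = np.eye(4, dtype=np.float32)

def predict_next_centre(cx, cy):
    kf.correct(np.array([[cx], [cy]], np.float32))
    return kf.predict()[:2]   # predicted centre in the next frame
```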
As one example, information collected by the image sensor is passed to a pre-processing module, which performs image pre-processing on the collected image information. The native size of the collected image is 210 × 160, with 128 colors per pixel, which is converted to a grayscale image of 84 × 84 dimensions. The transformed image still retains the main information while reducing the burden of data processing.
It should be noted that, since the trajectory of the high-altitude parabola is continuous, the Agent can only obtain one frame of information from the environment at each moment, and such a static image can hardly represent the dynamic motion of the thrown object. To this end, the recognition algorithm collects the most recent N frames and combines them as the input to the model, as in the sketch below. With state information collected over a period of time, the reinforcement learning model can learn more accurate action values.
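A minimal sketch of the preprocessing and frame stacking just described: the 210 × 160 color frame becomes an 84 × 84 grayscale image, and the most recent N frames are stacked as the model input. N = 4 is an assumed value; the patent only says "the first N frames".

```python
# Sketch: grayscale downscaling and N-frame stacking for the model input.
from collections import deque

import cv2
import numpy as np

N = 4                                   # assumed stack depth
frame_stack = deque(maxlen=N)

def preprocess(bgr_frame: np.ndarray) -> np.ndarray:
    gray = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2GRAY)
    return cv2.resize(gray, (84, 84))   # low-dimensional grayscale image

def observe(bgr_frame: np.ndarray) -> np.ndarray:
    frame_stack.append(preprocess(bgr_frame))
    while len(frame_stack) < N:         # pad with the first frame at start-up
        frame_stack.append(frame_stack[-1])
    return np.stack(frame_stack)        # shape (N, 84, 84): the model input
```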
And S3, judging whether the image sensor is blocked according to the preprocessed image information.
Wherein the S3 includes:
s3.1, acquiring pixel values and distribution characteristics in the preprocessed image information;
s3.2, judging whether the image sensor is shielded or not according to the size and the distribution characteristics of the pixel values in the preprocessed image information;
and S3.3, when the image sensor is judged to be shielded, storing the preprocessed image information into the cloud server and the storage.
Specifically, referring to FIG. 2, after the trajectory information of the high-altitude parabola has been processed by the preprocessing module, occlusion prediction is required: whether the image sensor is shielded is judged from the size and distribution characteristics of the pixel values of the preprocessed image information (a hedged illustration follows), and the prediction result can be transmitted to the cloud server and the storage. The preprocessed image information is also transmitted to the image storage module, providing historical basis and experience for subsequent similar recognition tasks.
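One plausible reading of this judgment is sketched below: a shielded lens yields a nearly uniform, typically dark frame, i.e., low pixel-value spread and a concentrated distribution. Both thresholds are assumptions, not values from the patent.

```python
# Sketch: occlusion judgment from pixel-value size and distribution.
import numpy as np

def sensor_occluded(gray: np.ndarray,
                    var_thresh: float = 50.0,    # assumed spread threshold
                    mean_thresh: float = 40.0    # assumed darkness threshold
                    ) -> bool:
    # Low variance means the pixel values are concentrated; a low mean
    # suggests the concentration is dark, as with a covered lens.
    return gray.var() < var_thresh and gray.mean() < mean_thresh
```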
After the S2 and before the S3, the method further comprises: and storing the preprocessed image information into the cloud server and a storage.
And S4, when the image sensor is judged not to be shielded, inputting the preprocessed image information into a processor, acquiring a pre-training target model after reinforcement learning by the processor, and performing high-altitude parabolic recognition on the preprocessed image information through the pre-training target model to obtain high-altitude parabolic recognition result information.
Specifically, when it is judged that the image sensor is not shielded, the preprocessed image information obtained by the preprocessing module is transmitted to the processor for data processing; the main task of predicting and recognizing the trajectory of the falling object is realized by exchanging and processing data with the established pre-trained target model. The actually occurring track is judged and predicted through the prediction experience already embodied in the pre-trained target model.
The step of obtaining the pre-training target model in S4 includes:
s4.1, initializing an action model and a target model before pre-training;
s4.2, establishing a simulation environment, and transmitting the optimal action parameters to the simulation environment by the action model;
s4.3, the simulation environment simulates according to the optimal action parameters to obtain simulated action parameters, and stores the simulated action parameters to the data storage unit;
s4.4, the action model acquires the simulated action parameters from the data storage unit so as to train and update the action model;
and S4.5, copying the latest simulated action parameters to the target model after the action model is trained for C times to train and update the target model to obtain the pre-trained target model, wherein C is an integer greater than or equal to 2.
Specifically, when initializing the action model and the target model before pre-training, the parameters to be optimized need to be extracted from the model. Here s denotes the high-altitude parabolic track image to be identified, a denotes the high-altitude parabolic predicted track, r denotes the accuracy of the prediction result, i.e., the obtained reward, t denotes the t-th time step, G denotes the cumulative reward, γ denotes the decay (discount) factor of the reward, and k indexes the steps over which the cumulative reward is accumulated. The value function Q is defined as follows:
G_t = Σ_{k=0}^{∞} γ^k r_{t+k+1},  Q(s, a) = E[G_t | s_t = s, a_t = a]  (1)
state: S_t = f(H_t),  A_t = h(S_t)  (2)
The loss function is defined as follows, where θ denotes the model parameters:
L(θ) = E[(TargetQ − Q(s, a; θ))²]  (3)
The objective function is:
TargetQ = r + γ max_{a'} Q(s', a'; θ⁻)  (4)
The target model computes the target value according to the objective function as follows:
y_j = r_{j+1} + γ max_{a'} Q(s_{j+1}, a'; θ⁻)  (5)
where θ⁻ denotes the parameters of the Target Network and j denotes the state index. Expanding the formula further gives:
y_j = r_{j+1} + γ Q(s_{j+1}, argmax_{a'} Q(s_{j+1}, a'; θ⁻); θ⁻)  (6)
The updating method is:
θ ← θ + α (y_j − Q(s_j, a_j; θ)) ∇_θ Q(s_j, a_j; θ)  (7)
During subsequent training, the target model continuously interacts with the action model and feeds back to it:
a_t = argmax_a Q(φ(s_t), a; θ)  (8)
The main structure of the model is a Q-network Q(s; θ) whose output is a vector of length |A|; each value in the vector represents the value estimate of the corresponding action. Thus only one forward pass is needed to obtain the values of all actions, and the evaluation time is the same no matter how many actions there are. A PyTorch sketch of the target computation defined by equations (3)-(7) follows.
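The sketch below implements the loss of equation (3) with the target of equation (6), assuming `q_net` (the action model) and `target_net` (the target model) are Q-networks such as the earlier sketch; the discount value `GAMMA = 0.99` is an assumption. The gradient step of equation (7) is what `loss.backward()` plus an optimizer performs.

```python
# Sketch of equations (3)-(7): DQN target and loss in PyTorch.
import torch
import torch.nn.functional as F

GAMMA = 0.99  # decay factor γ (assumed value)

def dqn_loss(q_net, target_net, s, a, r, s_next):
    # Q(s, a; θ) for the actions actually taken.
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Equation (6): select and evaluate the next action with θ⁻.
        a_star = target_net(s_next).argmax(dim=1, keepdim=True)
        target_q = r + GAMMA * target_net(s_next).gather(1, a_star).squeeze(1)
    # Equation (3): squared error between target and estimate.
    return F.mse_loss(q_sa, target_q)
```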
The step of establishing the simulation environment in S4.2 includes:
s4.2.1, acquiring physical characteristics, dynamic characteristics and surrounding environment characteristics of a high-altitude parabolic moving object;
s4.2.2, analyzing the physical characteristics, dynamic characteristics and surrounding environment characteristics of the moving object of the high-altitude parabola according to the air resistance and wind speed variables of the environment of the high-altitude parabola to establish the simulation environment.
Specifically, a virtual environment can be constructed from the real environmental characteristics at the time of the high-altitude throw, providing material for model training. A motion trajectory model is established mainly from the physical characteristics of the object's motion in high-altitude throwing, together with its dynamic characteristics and the surrounding environment, and the simulation environment is built taking variables such as air resistance and wind speed into account (a simplified simulator sketch follows). In this way the simulation fits the real high-altitude parabolic scene as closely as possible and provides the most accurate material for training the model.
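A simplified trajectory simulator under the stated variables (air resistance and wind speed) might look like the sketch below; the linear-drag model and all constants are assumptions for illustration, not the patent's simulation.

```python
# Sketch: falling-object trajectory with linear air drag and horizontal wind.
def simulate_fall(h0, mass=0.5, drag_coef=0.05, wind_vx=1.0,
                  dt=1 / 30, g=9.81):
    """Yield (x, y) positions of an object dropped from height h0 (metres)."""
    x, y, vx, vy = 0.0, h0, 0.0, 0.0
    while y > 0:
        ax = -drag_coef * (vx - wind_vx) / mass   # drag relative to the air
        ay = -g - drag_coef * vy / mass           # gravity plus vertical drag
        vx, vy = vx + ax * dt, vy + ay * dt
        x, y = x + vx * dt, y + vy * dt
        yield x, max(y, 0.0)

trajectory = list(simulate_fall(h0=30.0))  # one simulated parabolic track
```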
When the action model interacts with the simulation environment, the action model transmits the optimal action argmaxQ (s, a, theta) to the simulation environment. Wherein s is a high-altitude parabolic track image, a is a high-altitude parabolic predicted track, and theta is a target model parameter.
And S4.3, the simulation environment simulates according to the optimal action parameters to obtain simulated action parameters, and stores the simulated action parameters to the data storage unit.
The simulated action parameters in S4.3 include: the high altitude parabolic track image of the current state, the current high altitude parabolic predicted track, the current reward obtaining and the high altitude parabolic track image of the next state.
Specifically, the simulation environment transmits the current state s to the action model, and stores the current state s, the current action a, the currently obtained reward r, and the next state s' in the data storage unit.
It should be noted that during training the recognition algorithm can make decisions starting from a random scene. If decisions always started from a fixed scene, the Agent would always decide on the same frames, which is clearly not conducive to exploring more frames for learning. To enhance exploration without degrading the model, the Agent performs random actions for a short period at the beginning, so that different scene samples are obtained to the greatest extent.
And S4.4, the action model acquires the simulated action parameters from the data storage unit so as to train and update the action model.
Specifically, the action model acquires (s, a, r, s') data from the data storage unit and updates the model. The data storage unit stores sample data information, simulation prediction information and results; it is set to store one million samples, so samples over a long period can be kept (see the sketch below). When the value function is trained, a certain number of samples are taken out and training is carried out according to the information they record. In general, the data storage unit covers both the collection of samples and the sampling of samples: collected samples are stored in the structure in chronological order, and when the data storage unit is full, new samples overwrite the oldest ones. The action model acquires information from the data storage unit, realizing information transfer with, and adaptive updating from, the simulation environment.
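A minimal replay buffer matching this description: time-ordered storage, overwrite-oldest when full, uniform random sampling. The one-million capacity follows the text; the batch size of 32 is an assumption.

```python
# Sketch: the data storage unit as a uniform-sampling replay buffer.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity: int = 1_000_000):
        self.buffer = deque(maxlen=capacity)  # oldest samples overwritten

    def push(self, s, a, r, s_next):
        self.buffer.append((s, a, r, s_next))  # stored in chronological order

    def sample(self, batch_size: int = 32):
        # Uniform random sampling across many interaction sequences.
        return random.sample(self.buffer, batch_size)
```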
And S4.5, copying the latest simulated action parameters to the target model after the action model is trained for C times to train and update the target model to obtain the pre-trained target model, wherein C is an integer greater than or equal to 2.
Specifically, the action model copies its parameters to the target model every C updates. If only the latest samples were used each time, the algorithm would amount to online learning; instead, the data storage unit uniformly and randomly samples a batch from the cache, because sequences obtained by interaction are correlated in the time dimension. The learned value function should represent the expectation of long-term return under the current state and action; however, the sequence obtained in a single interaction represents only one sampled trajectory under that state and action, not all possible trajectories, so the estimate differs from the expectation. This gap accumulates as the interaction lengthens, and the model becomes prone to large fluctuations. After uniform sampling is adopted, the samples of each training step usually come from multiple interaction sequences, which greatly reduces the fluctuation of any single sequence and substantially stabilizes training. Meanwhile, one sample can be used in training multiple times, improving sample utilization. Therefore, copying the model parameters to the target model every C updates reduces data instability and improves data utilization; a training-loop excerpt follows.
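The excerpt below, building on the earlier sketches (`dqn_loss`, `ReplayBuffer`), shows the parameter copy every C updates. C = 1000 is an illustrative value, and the stored transitions are assumed to already be tensors.

```python
# Sketch: one training step with target-model synchronization every C updates.
import torch

C = 1000  # copy period (assumed value)

def train_step(step, q_net, target_net, optimizer, buffer):
    # Unpack a uniformly sampled batch (tensor transitions assumed).
    s, a, r, s_next = map(torch.stack, zip(*buffer.sample()))
    loss = dqn_loss(q_net, target_net, s, a, r, s_next)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % C == 0:  # copy action-model parameters to the target model
        target_net.load_state_dict(q_net.state_dict())
```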
Optionally, the action model and the target model continuously obtain high-altitude parabolic track prediction error information in an updating process, so as to change a prediction strategy according to the error information and an error value of an adjacent frame high-altitude parabolic track image.
Specifically, the action model and the target model receive information from the DQN error module during the update process: they continuously draw its error information, change their optimization strategies according to that error value and the error value of adjacent frame pictures, and update further so as to improve prediction accuracy. The DQN error module also stores the reward information r in the data storage unit, providing data support for subsequent random, repeated training and target updating.
As an example, the obtaining step of the pre-training target model may specifically include:
At the beginning of training, the action model and the target model use the same parameters. During training, the action model is responsible for interacting with the simulation environment to obtain interaction samples. In the learning process, the target value obtained by Q-Learning is computed by the target model and compared with the estimate of the action model to update the action model. Each time training completes a certain number of iterations, the parameters of the action model are synchronized to the target model, and the next stage of learning proceeds. By using a target model, the model that computes the target value is held fixed for a period of time, which mitigates the volatility of training.
The S5 further includes: and comparing the high-altitude parabolic recognition result information with an actual high-altitude parabolic track to obtain actual prediction error information, and feeding back the actual prediction error information to the data storage unit.
Specifically, the system can judge whether an error is generated between the high-altitude parabolic recognition result information and the actual high-altitude parabolic track, and feeds back a comparison result to the data storage unit, so that an actual prediction effect is provided for a subsequent model training module, and further optimization and upgrading of the deep reinforcement learning system are promoted.
In order to make the construction process of the reinforcement learning model in the invention clear to those skilled in the art, the construction of the reinforcement learning model is described in detail below.
FIG. 3 is a schematic diagram of the reinforcement learning model architecture. The reinforcement learning model in this embodiment is a Q-network Q(s; θ) whose output is a vector of length |A|; each value in the vector represents the value estimate of the corresponding action.
The main body of the model adopts a four-layer convolutional neural network structure, where s denotes the high-altitude parabolic track image to be identified, a denotes the predicted high-altitude parabolic track, and r denotes the accuracy of the prediction result. The four layers are as follows:
The first convolutional layer outputs 32 channels, after which a ReLU nonlinearity is applied. The second convolutional layer outputs 64 channels, after which a ReLU nonlinearity is applied. The third convolutional layer outputs 64 channels, after which a ReLU nonlinearity is applied. The fourth layer is a fully connected layer with output dimension 512, after which a ReLU nonlinearity is applied. A final fully connected layer then produces the value estimates of the corresponding actions; a sketch follows, in which the kernel sizes and strides (not specified here) are assumptions.
The reinforcement learning model in this embodiment adopts an ε-greedy strategy: at first, actions are generated entirely at random (probability 100%), and this probability decays continuously as training proceeds, eventually reaching 10%; that is, the current optimal action is then executed with 90% probability. In this way the strategy shifts gradually from exploration-dominated to exploitation-dominated, combining the two well. A sketch of the schedule follows.
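The sketch below implements the decay from 1.0 to 0.1 just described; the linear schedule and its one-million-step horizon are assumptions.

```python
# Sketch: ε-greedy action selection with a linearly decaying ε.
import random

def epsilon(step, eps_start=1.0, eps_end=0.1, decay_steps=1_000_000):
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)  # 1.0 -> 0.1

def select_action(q_net, state, step, num_actions):
    if random.random() < epsilon(step):
        return random.randrange(num_actions)      # explore: random action
    return int(q_net(state).argmax(dim=1))        # exploit: optimal action
```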
It should be noted that if, during simulation training of the reinforcement learning model, testing always started from the same scene, the Agent would always make decisions on the same frames, which is clearly not conducive to exploring more frames for learning. To enhance exploration without degrading the model, the Agent can be set to perform random actions for a short period from the start, so that different scene samples are obtained to the greatest extent.
When processing frame images collected by the image sensor, the pictures of adjacent frames are highly similar, so the same action can generally be taken for very similar pictures. The algorithm therefore skips the judgment of a certain number of frames, reducing its space-time complexity and avoiding repeated processing of redundant data.
Meanwhile, because the variance of the reward value is large, the score must be compressed into a range the model handles well so that the reinforcement learning model can better fit the long-term return; the return obtained in each round is compressed to between -1 and 1 (see the combined sketch below).
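The two steps combine naturally, as in the sketch below; the skip of 4 frames and the `env.step` interface are assumptions for illustration.

```python
# Sketch: frame skipping plus reward compression into [-1, 1].
SKIP = 4  # assumed number of skipped frame judgments

def step_with_skip(env, action):
    total_reward = 0.0
    for _ in range(SKIP):                     # repeat the action, skip judgments
        obs, reward, done = env.step(action)  # hypothetical environment interface
        total_reward += reward
        if done:
            break
    clipped = max(-1.0, min(1.0, total_reward))  # compress return to [-1, 1]
    return obs, clipped, done
```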
In summary, as shown in fig. 2, the main steps of the high-altitude parabolic trajectory identification method based on reinforcement learning can be divided into two stages:
in the model training phase: initializing an action model and a target model; the action model interacts with the simulation environment, the action model transmits the optimal action argmaxQ (s, a, theta) to the simulation environment, the simulation environment transmits the current state s to the action model, and stores the current state s, the current action a, the currently obtained reward r and the next state s' in the data storage unit; the action model acquires (s, a, r, s') data from the data storage unit and updates the model; copying the model parameters to the target model for each C times of updating of the action model; the action model and the target model receive information from the DQN error module in the updating process; the DQN error module stores the reward information r to the data storage unit.
In the model application phase: the image acquisition module acquires image information in a real scene and transmits it to the preprocessing module; the preprocessing module performs image cropping and median filtering to extract the key areas of the images; after preprocessing, the image is stored in the cloud server and the storage through the image storage module; the preprocessed image is transmitted to the shielding prediction module, which judges whether the camera is shielded in practical application and transmits the result to the cloud server and the storage; and the processor judges the preprocessed image information with the model trained in the model training phase, predicts the high-altitude parabolic trajectory, and transmits the relevant results to the data storage unit to further train and update the target model.
In the high-altitude parabolic track recognition method based on reinforcement learning, the processor acquires a pre-trained target model after reinforcement learning and performs high-altitude parabolic recognition on the preprocessed image information through it; the pre-trained target model needs no manually labeled training data set and improves the accuracy of trajectory prediction. The processor stores the recognition result information in the data storage unit, the cloud server and the storage so that the pre-trained target model is trained and updated, which further improves prediction accuracy; the data storage unit also improves data utilization, lets the samples participating in network training satisfy the independent-and-identically-distributed requirement, and improves training stability.
Furthermore, in the reinforcement-learning-based high-altitude parabolic track recognition method provided by the invention, the latest simulated action parameters are copied to the target model after every C updates of the action model to train and update it, ensuring the stability of model training. The action model and the target model also continuously acquire high-altitude parabolic track prediction error information during updating and change the prediction strategy according to this error information and the error values of adjacent frames of the track image, which effectively improves the accuracy of trajectory prediction.
It should be noted that the logic and/or steps represented in the flowcharts or otherwise described herein, such as an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Furthermore, the terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (8)

1. A high-altitude parabolic track identification method based on reinforcement learning is characterized by comprising the following steps:
s1, acquiring a high-altitude parabolic track image of the monitored window area through an image sensor;
s2, preprocessing the high-altitude parabolic track image to obtain preprocessed image information;
s3, judging whether the image sensor is blocked according to the preprocessed image information;
s4, when the image sensor is judged not to be shielded, the preprocessed image information is input to a processor, the processor obtains a pre-training target model after reinforcement learning, and high-altitude parabolic recognition is carried out on the preprocessed image information through the pre-training target model to obtain high-altitude parabolic recognition result information;
s5, the processor stores the high altitude parabolic recognition result information into a data storage unit, a cloud server and a storage to train and update the pre-training target model;
the step of obtaining the pre-training target model in S4 includes:
s4.1, initializing an action model and a target model before pre-training;
s4.2, establishing a simulation environment, and transmitting the optimal action parameters to the simulation environment by the action model;
s4.3, the simulation environment simulates according to the optimal action parameters to obtain simulated action parameters, and stores the simulated action parameters to the data storage unit;
s4.4, the action model acquires the simulated action parameters from the data storage unit so as to train and update the action model;
s4.5, copying the latest simulated action parameters to the target model after the action model is trained for C times to train and update the target model to obtain the pre-trained target model, wherein C is an integer greater than or equal to 2;
the step of establishing the simulation environment in S4.2 includes:
s4.2.1, acquiring physical characteristics, dynamic characteristics and surrounding environment characteristics of a high-altitude parabolic moving object;
s4.2.2, analyzing the physical characteristics, dynamic characteristics and surrounding environment characteristics of the moving object of the high-altitude parabola according to the air resistance and wind speed variables of the environment of the high-altitude parabola to establish the simulation environment.
2. The reinforcement learning-based high-altitude parabolic trajectory recognition method according to claim 1, wherein the S2 includes:
s2.1, converting the high-altitude parabolic image collected by the image sensor into a low-dimensional gray image;
s2.2, carrying out affine transformation on the gray level image;
s2.3, carrying out noise elimination on the gray image after affine transformation in a spatial filtering and time domain filtering mode;
and S2.4, acquiring a target detection frame of the moving object in each frame of image after noise elimination by adopting a background difference and inter-frame difference fusion method, and predicting the target detection frame of the moving object in the next frame of image according to the target detection frame in the previous frame of image through Kalman filtering to obtain the preprocessed image information.
3. The reinforcement learning-based high-altitude parabolic trajectory recognition method according to claim 1, wherein the S3 includes:
s3.1, acquiring pixel values and distribution characteristics in the preprocessed image information;
s3.2, judging whether the image sensor is shielded or not according to the size and the distribution characteristics of the pixel values in the preprocessed image information;
and S3.3, when the image sensor is judged to be shielded, storing the preprocessed image information into the cloud server and the storage.
4. The reinforcement learning-based high-altitude parabolic trajectory recognition method according to claim 1, wherein after the S2 and before the S3, the method further comprises: and storing the preprocessed image information into the cloud server and a storage.
5. The reinforcement learning-based high-altitude parabolic track recognition method according to claim 1, wherein the optimal action parameters in S4.2 include: the high-altitude parabolic track image, the high-altitude parabolic predicted track and the target model parameters.
6. The reinforcement learning-based high-altitude parabolic track recognition method according to claim 1, wherein the simulated action parameters in S4.3 include: the high altitude parabolic track image of the current state, the current high altitude parabolic predicted track, the current reward obtaining and the high altitude parabolic track image of the next state.
7. The reinforcement learning-based high-altitude parabolic track recognition method as claimed in claim 1, wherein the action model and the target model continuously obtain high-altitude parabolic track prediction error information in an updating process, so as to change a prediction strategy according to the error information and an error value of an adjacent frame high-altitude parabolic track image.
8. The reinforcement learning-based high-altitude parabolic trajectory recognition method according to claim 1, wherein the S5 further includes: and comparing the high-altitude parabolic recognition result information with an actual high-altitude parabolic track to obtain actual prediction error information, and feeding back the actual prediction error information to the data storage unit.
CN202110685692.8A 2021-06-21 2021-06-21 High-altitude parabolic track identification method based on reinforcement learning Active CN113393495B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110685692.8A CN113393495B (en) 2021-06-21 2021-06-21 High-altitude parabolic track identification method based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110685692.8A CN113393495B (en) 2021-06-21 2021-06-21 High-altitude parabolic track identification method based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN113393495A CN113393495A (en) 2021-09-14
CN113393495B true CN113393495B (en) 2022-02-01

Family

ID=77623201

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110685692.8A Active CN113393495B (en) 2021-06-21 2021-06-21 High-altitude parabolic track identification method based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN113393495B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116597340B (en) * 2023-04-12 2023-10-10 深圳市明源云科技有限公司 High altitude parabolic position prediction method, electronic device and readable storage medium
CN116977931A (en) * 2023-07-31 2023-10-31 深圳市星河智善科技有限公司 High-altitude parabolic identification method based on deep learning

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10175697B1 (en) * 2017-12-21 2019-01-08 Luminar Technologies, Inc. Object identification and labeling tool for training autonomous vehicle controllers
CN110084414A (en) * 2019-04-18 2019-08-02 成都蓉奥科技有限公司 A kind of blank pipe anti-collision method based on the study of K secondary control deeply
CN111415389A (en) * 2020-03-18 2020-07-14 清华大学 Label-free six-dimensional object posture prediction method and device based on reinforcement learning
CN111618847A (en) * 2020-04-22 2020-09-04 南通大学 Mechanical arm autonomous grabbing method based on deep reinforcement learning and dynamic motion elements
CN112257557A (en) * 2020-10-20 2021-01-22 中国电子科技集团公司第五十八研究所 High-altitude parabolic detection and identification method and system based on machine vision
CN112269390A (en) * 2020-10-15 2021-01-26 北京理工大学 Small celestial body surface fixed-point attachment trajectory planning method considering bounce
CN112818599A (en) * 2021-01-29 2021-05-18 四川大学 Air control method based on reinforcement learning and four-dimensional track

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9679258B2 (en) * 2013-10-08 2017-06-13 Google Inc. Methods and apparatus for reinforcement learning
US10204097B2 (en) * 2016-08-16 2019-02-12 Microsoft Technology Licensing, Llc Efficient dialogue policy learning
US11295174B2 (en) * 2018-11-05 2022-04-05 Royal Bank Of Canada Opponent modeling with asynchronous methods in deep RL
KR20200080396A (en) * 2018-12-18 2020-07-07 삼성전자주식회사 Autonomous driving method and apparatus thereof
CN109521774B (en) * 2018-12-27 2023-04-07 南京芊玥机器人科技有限公司 Spraying robot track optimization method based on reinforcement learning
CN110458281B (en) * 2019-08-02 2021-09-03 中科新松有限公司 Method and system for predicting deep reinforcement learning rotation speed of table tennis robot
CN111263332A (en) * 2020-03-02 2020-06-09 湖北工业大学 Unmanned aerial vehicle track and power joint optimization method based on deep reinforcement learning

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10175697B1 (en) * 2017-12-21 2019-01-08 Luminar Technologies, Inc. Object identification and labeling tool for training autonomous vehicle controllers
CN110084414A (en) * 2019-04-18 2019-08-02 成都蓉奥科技有限公司 A kind of blank pipe anti-collision method based on the study of K secondary control deeply
CN111415389A (en) * 2020-03-18 2020-07-14 清华大学 Label-free six-dimensional object posture prediction method and device based on reinforcement learning
CN111618847A (en) * 2020-04-22 2020-09-04 南通大学 Mechanical arm autonomous grabbing method based on deep reinforcement learning and dynamic motion elements
CN112269390A (en) * 2020-10-15 2021-01-26 北京理工大学 Small celestial body surface fixed-point attachment trajectory planning method considering bounce
CN112257557A (en) * 2020-10-20 2021-01-22 中国电子科技集团公司第五十八研究所 High-altitude parabolic detection and identification method and system based on machine vision
CN112818599A (en) * 2021-01-29 2021-05-18 四川大学 Air control method based on reinforcement learning and four-dimensional track

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
"Playing Atari with Deep Reinforcement Learning";Volodymyr Mnih et al;《arXiv》;20131219;全文 *
"基于深度强化学习的机械臂抓捕控制研究";黄伟伟;《中国优秀硕士学位论文全文数据库 工程科技Ⅱ辑》;20210115;全文 *
"深度强化学习综述";刘全等;《计算机学报》;20180131;第41卷(第1期);全文 *

Also Published As

Publication number Publication date
CN113393495A (en) 2021-09-14

Similar Documents

Publication Publication Date Title
CN113392935B (en) Multi-agent deep reinforcement learning strategy optimization method based on attention mechanism
JP6877630B2 (en) How and system to detect actions
CN109344725B (en) Multi-pedestrian online tracking method based on space-time attention mechanism
CN108222749B (en) Intelligent automatic door control method based on image analysis
CN113393495B (en) High-altitude parabolic track identification method based on reinforcement learning
CN111178183B (en) Face detection method and related device
Leibfried et al. A deep learning approach for joint video frame and reward prediction in atari games
US20140143183A1 (en) Hierarchical model for human activity recognition
Gao et al. Object tracking using firefly algorithm
CN112037263B (en) Surgical tool tracking system based on convolutional neural network and long-term and short-term memory network
CN110413838A (en) A kind of unsupervised video frequency abstract model and its method for building up
CN114241511B (en) Weak supervision pedestrian detection method, system, medium, equipment and processing terminal
CN110009060A (en) A kind of robustness long-term follow method based on correlation filtering and target detection
CN110287829A (en) A kind of video face identification method of combination depth Q study and attention model
CN112184767A (en) Method, device, equipment and storage medium for tracking moving object track
CN111626198A (en) Pedestrian motion detection method based on Body Pix in automatic driving scene
CN109544584B (en) Method and system for realizing inspection image stabilization precision measurement
CN108898221B (en) Joint learning method of characteristics and strategies based on state characteristics and subsequent characteristics
CN111160170B (en) Self-learning human behavior recognition and anomaly detection method
CN111833375B (en) Method and system for tracking animal group track
CN112418149A (en) Abnormal behavior detection method based on deep convolutional neural network
CN113033582B (en) Model training method, feature extraction method and device
KR102563346B1 (en) System for monitoring of structural and method ithereof
CN115331162A (en) Cross-scale infrared pedestrian detection method, system, medium, equipment and terminal
CN114913098A (en) Image processing hyper-parameter optimization method, system, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant