CN115240056A - Sequence image scene recognition method based on deep learning - Google Patents

Sequence image scene recognition method based on deep learning

Info

Publication number
CN115240056A
CN115240056A
Authority
CN
China
Prior art keywords
current
image
probability distribution
sequence
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210692229.0A
Other languages
Chinese (zh)
Inventor
吴梦蝶
闫文耀
苗水清
张静
吴晓晖
柴荣军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Innovation College Of Yan'an University
Original Assignee
Xi'an Innovation College Of Yan'an University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an Innovation College Of Yan'an University filed Critical Xi'an Innovation College Of Yan'an University
Priority to CN202210692229.0A priority Critical patent/CN115240056A/en
Publication of CN115240056A publication Critical patent/CN115240056A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/35Categorising the entire scene, e.g. birthday party or wedding scene
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a sequence image scene recognition method based on deep learning, which comprises the following steps: acquiring a current observation image and the image sequence of a map, and extracting descriptors of the images; calculating a sequence cost value between the current observation image and the map according to the extracted descriptors; estimating a current speed value based on the sequence cost value, and determining a current likelihood probability distribution; determining the confidence of the current state based on the current likelihood probability distribution and the prediction confidence of the previous state; determining a probability distribution of state transitions based on the current speed value; and determining the prediction confidence of the next state based on the probability distribution of state transitions and the confidence of the current state. By matching sequence images instead of single pictures, the scene recognition scheme overcomes the large matching error of a single picture; even when outdoor scenes change with season, weather, and time, a robot can realize the positioning function through its visual sense alone.

Description

Sequence image scene recognition method based on deep learning
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a sequence image scene recognition method based on deep learning.
Background
Image scene recognition means automatically processing and analyzing an image by using its visual information, so as to judge and recognize the specific scene (such as a kitchen, a street, or a mountain) carried in the image. Judging the scene in an image not only helps in understanding the overall semantic content of the image, but also provides a basis for identifying specific targets and events in the image, so scene recognition plays an important role in automatic image understanding by computers. Scene recognition techniques can be applied to many practical problems, such as intelligent image management and retrieval.
For the recognition of a changing scene, matching on a single picture produces large errors, so many recognition methods adjust the matching strategy of the system to improve robustness. Many machine-learning-based methods are also used to learn how scenes change; such methods rest on the assumption that appearance changes over time are consistent across different locations, i.e. generic and learnable, an assumption that has been verified in some studies.
In the related art, current robot positioning methods mainly realize real-time positioning of unmanned vehicles by means of GPS + LiDAR; the cost is high, and the amount of information acquired is smaller than that of vision. The difficulty of positioning by scene recognition is that there is no mature solution for effectively recognizing a changing scene.
Disclosure of Invention
In order to overcome the problems in the related art at least to a certain extent, the application provides a sequence image scene recognition method based on deep learning.
According to a first aspect of embodiments of the present application, a method for identifying a sequence image scene based on deep learning is provided, including:
acquiring a current observation image and the image sequence of a map, and extracting descriptors of the images;
calculating a sequence cost value between the current observation image and the map according to the extracted descriptor;
estimating a current speed value based on the sequence cost value and determining a current likelihood probability distribution;
determining the confidence of the current state based on the current likelihood probability distribution and the prediction confidence of the previous state;
determining a probability distribution of state transitions based on a current speed value;
based on the probability distribution of the state transition and the confidence of the current state, a prediction confidence of the next state is determined.
Further, the extracting the descriptor of the image comprises:
processing the image through a convolutional neural network model, and taking the output of a specified layer as a descriptor; wherein the convolutional neural network model is a model trained from a scene-classified data set;
the descriptor is subjected to normalization processing.
Further, calculating a sequence cost value between the current observation image and the map, comprising:
the current set of images observed by the agent is recorded as: Q = [I_1, I_2, ..., I_j, ..., I_n]; the image sequence of the map is recorded as: M = [I_1, I_2, ..., I_i, ..., I_m];
the sequence cost value between image I_i and image I_j is the value of the cosine distance between the descriptor of image I_i and the descriptor of image I_j; wherein I_i ∈ M and I_j ∈ Q.
Further, estimating a current velocity value based on the sequence cost value includes:
determining a search sequence of a preset length by taking the current time as a reference;
summing the sequence cost values of the whole search sequence to serve as the matching degree between the current observation image and the map;
selecting the search sequence z_T with the highest matching degree by adjusting the slope v of the search sequence;
determining the slope v corresponding to the search sequence z_T as the current speed value of the agent.
Further, the determining a current likelihood probability distribution includes:
selecting N adjacent candidate sequences by taking the search sequence z_T as a reference; wherein N is a preset value;
calculating the probability distribution of the N candidate sequences;
assuming that the obtained probability distribution is a sample of gaussian distribution, estimating parameters of the gaussian distribution based on the obtained probability distribution, and taking the estimated gaussian distribution as the current likelihood probability distribution.
Further, determining the confidence of the current state based on the current likelihood probability distribution and the prediction confidence of the previous state specifically includes:
based on the current likelihood probability distribution, recursively updating the prediction confidence of the previous state by a Kalman filtering method to obtain the confidence of the current state.
Further, after determining the confidence of the current state, the method further includes:
and determining the recognition result of the current observation image according to the confidence of the current state.
Further, determining a probability distribution of the state transition based on the current velocity value includes:
assuming that the probability distribution of the state transition is Gaussian distribution;
and solving a kinetic equation describing the state transition and an equation of a prediction process based on the current speed value to obtain the probability distribution of the state transition.
Further, determining a confidence of the prediction of the next state based on the probability distribution of the state transition and the confidence of the current state, comprising:
based on the probability distribution of state transitions, recursively updating the confidence of the current state by a Kalman filtering method to obtain the prediction confidence of the next state.
According to a second aspect of the embodiments of the present application, there is provided a sequential image scene recognition apparatus based on deep learning, including:
the acquisition module is used for acquiring the current observation image and the image sequence of the map;
the extraction module is used for extracting the descriptor of the image;
the cost value calculation module is used for calculating the sequence cost value between the current observation image and the map according to the extracted descriptor;
the estimation module is used for estimating a current speed value based on the sequence cost value;
a first probability determination module for determining a current likelihood probability distribution;
the updating module is used for determining the confidence of the current state based on the current likelihood probability distribution and the prediction confidence of the previous state;
the second probability determination module is used for determining the probability distribution of the state transition based on the current speed value;
and the prediction module is used for determining the prediction confidence of the next state based on the probability distribution of the state transition and the confidence of the current state.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
the invention provides a sequence image scene recognition scheme based on deep learning, which solves the problem of large matching error of a single picture through matching between sequence images; under the condition that outdoor scenes change along with seasons, weather and time, the robot can sense the surrounding environment only through the visual sensor, the positioning function is realized, and a foundation is laid for positioning navigation based on scene recognition.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and, together with the description, serve to explain the principles of the application.
Fig. 1 is a flowchart illustrating a method for scene recognition of a sequence image based on deep learning according to an exemplary embodiment.
Fig. 2 is a flow chart of a system according to an embodiment of the present application.
Fig. 3 is a schematic diagram illustrating sequence cost matching according to an embodiment of the present application.
Fig. 4 is a schematic diagram of a hidden markov chain according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of methods and apparatus consistent with certain aspects of the present application, as detailed in the appended claims.
To further detail the technical solution of the present application, the state of the art is first introduced.
The scene recognition problem is very challenging. Whether for humans, animals, robots, or unmanned vehicles, realizing scene recognition requires some basic conditions. First, a 'map' that can be referred to is needed; this map is not the map of daily life, nor the point cloud map of unmanned driving, but a continuous picture sequence, or pictures of the sporadic positions of several landmarks, and the like. It represents prior information, facts known in advance, and is the basis for recognition. Second, the system should be able to convert the current picture into a corresponding positioning confidence that indicates whether the picture is in our map and, if so, further at which location.
Thus, the scene recognition problem can be divided into three parts: first, a good scene description method needs to be obtained to abstract the rich and huge two-dimensional image information of a scene into a simpler mathematical description, namely a descriptor; second, the scene information needs to be organized into map information, that is, a geometric and logical relationship between scenes needs to be established; third, an effective method is needed to convert the current scene state of the agent into a positioning confidence within the map.
Current scene recognition methods can be divided into those that extract local descriptors and those that extract global descriptors.
(1) Local descriptors. Many studies focus on traditional local feature descriptors, such as SIFT and SURF, to explore their robustness in changing scenes. Furgale, Barfoot et al. indicate in their studies that the conventional SURF descriptor is not robust under illumination variations. Valgren, Lilienthal et al. indicate that the various variants of SURF and SIFT are not robust under changing light, cloud, and seasonal conditions.
(2) Global descriptors. In addition to local descriptors, many studies investigate other forms of scene description. For example, SeqSLAM and its variants achieve robustness by using a description of the entire picture. Convolutional neural networks also play a significant role here. Many studies show that convolutional neural networks can learn generalizable features that transfer across many related visual tasks; that is, descriptors trained for object and scene classification can also be used in scene recognition tasks. The studies by Sunderhauf et al. indicate that descriptors generated by the middle layers of a convolutional neural network, such as convolutional layers, are robust to changes in scene appearance, while higher layers, such as fully connected layers, are highly robust to changes in viewing angle. Garg et al. propose that normalizing the descriptors of the fully connected layer can significantly improve robustness to appearance changes. Dai et al. compared CNN descriptors with conventional descriptors and found, through experiments on datasets with different viewing angles and lighting conditions, that CNN descriptors are more robust.
Fig. 1 is a flowchart illustrating a method for scene recognition of a sequence image based on deep learning according to an exemplary embodiment. The method may comprise the steps of:
S1, acquiring a current observation image and the image sequence of a map, and extracting descriptors of the images; wherein the 'map' comprises a continuous picture sequence or a plurality of pictures of landmark positions;
s2, calculating the sequence cost value between the current observation image and the map according to the extracted descriptors;
s3, estimating a current speed value based on the sequence cost value, and determining current likelihood probability distribution;
s4, determining the confidence coefficient of the current state based on the current likelihood probability distribution and the prediction confidence coefficient of the previous state;
s5, determining probability distribution of state transition based on the current speed value;
and S6, determining the prediction confidence of the next state based on the probability distribution of the state transition and the confidence of the current state.
The invention provides a sequence image scene recognition scheme based on deep learning, which solves the problem of large matching error of a single picture through matching between sequence images. Under conditions where outdoor scenes change with seasons, weather, and time, a robot can perceive the surrounding environment through its visual sensor alone and realize the positioning function, laying a foundation for positioning and navigation based on scene recognition.
It should be understood that, although the steps in the flowchart of fig. 1 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least a portion of the steps in fig. 1 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different times; their order of performance is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
The flow of the identification method is shown in FIG. 2. Here x_t represents the current state of the agent, v_t the current speed of the agent, and M the sequence of map pictures referenced by the agent; bel(x_t) and bel_pred(x_{t+1}) represent the agent's confidence at time t in its current state and its predicted confidence for time t+1. The latest recognition result is obtained continuously in a recursive manner.
1) First, the descriptor of the current image is obtained through a convolutional neural network; the sequence cost value is calculated from the observation images of the current agent and the images in the map, the current speed value v_esti is estimated using the sequence cost, and the observed likelihood probability distribution p(z_t | x_t, M) is obtained.
2) Second, the observed likelihood probability distribution p(z_t | x_t, M) and the confidence bel_pred(x_t) predicted in the previous stage are used to update the current probability distribution and obtain the confidence bel(x_t) of the current state (Algorithm 3-1, lines 18-20).
3) Third, the estimated current speed v_esti is used to obtain the probability distribution of state transitions p(x_{t+1} | x_t, v_esti, M), which, together with the confidence bel(x_t) of the current state, predicts the prediction confidence bel_pred(x_{t+1}) of the next-stage state.
Imitating the mechanism by which humans abstract and understand complex scenes to obtain the most basic scene information for navigation, the method designs a set of scene descriptions based on deep learning, completes robust scene recognition by combining reasoning and decision-making, and realizes positioning and navigation based on this scene recognition method.
The algorithm of the present invention is specified in Algorithm 3-1, which is reproduced only as an image in the original publication.
the following describes each part of the algorithm proposed by the present invention in detail with reference to specific embodiments.
1. Normalization of convolutional neural network image descriptions
Step S1 is first executed to acquire an image sequence of a current observation image and a map, and extract a descriptor of the image. In some embodiments, the step of extracting the descriptor comprises: processing the image through a convolutional neural network model, and taking the output of a specified layer as a descriptor; the descriptor is subjected to normalization processing. Wherein the convolutional neural network model is a model trained from a scene-classified data set.
A classical convolutional neural network model, such as AlexNet, is trained with a scene-classification dataset, such as Places365, and its output at a certain layer is used as the description of an image; for example, the fc6 layer contains 4096 variables. A descriptor normalization method is adopted, which significantly improves the robustness of the fully-connected-layer descriptor to appearance changes:

f̂_i = (f_i − μ) / σ    (1)

where f_i represents the fc6-layer feature descriptor of image i, and μ and σ represent the mean and standard deviation of the image descriptors over the entire map database. The result f̂_i is a normalized descriptor, called the Normalized Set of Descriptors (NSD). This normalized deep feature descriptor is used herein as the scene description method.
2. Sequence cost matching and related probability model construction
With a robust scene description in hand, it remains to analyze how to use the descriptor information effectively to achieve robust matching. To improve matching accuracy, the matching of only one picture should not be considered; for a given scene, the scenes at positions preceding the picture should also be considered comprehensively.
Step S2 is next performed to calculate a sequence cost value between the current observation image and the map from the extracted descriptor. The step S2 specifically includes:
The current set of images observed by the agent is recorded as: Q = [I_1, I_2, ..., I_j, ..., I_n]; the image sequence of the map is recorded as: M = [I_1, I_2, ..., I_i, ..., I_m].
The sequence cost value between image I_i and image I_j is the value of the cosine distance between the descriptor of image I_i and the descriptor of image I_j, where I_i ∈ M and I_j ∈ Q.
2.1 construction of cost matrix
Firstly, a cost matrix D needs to be constructed; the cost matrix represents the matching degree between the view of the agent and the map pictures.
The picture set in the view of the agent is recorded as Q = [I_1, I_2, ..., I_j, ..., I_n]; the picture set in the map is recorded as M = [I_1, I_2, ..., I_i, ..., I_m].
The cost matrix is constructed as shown in equation (2):

D = (D_{i,j}) ∈ R^{m×n}    (2)

Here, the element D_{i,j} of the matrix is the value of the cosine distance between the descriptor of image I_i ∈ M in the map and the descriptor of image I_j ∈ Q in the view of the agent. The cosine distance describes the similarity between two feature vectors and is not affected by differences in vector length; it is calculated as shown in equation (3):

D_{i,j} = 1 − (f̂_i · f̂_j) / (‖f̂_i‖ ‖f̂_j‖)    (3)

where f̂_i and f̂_j are the normalized fc6 descriptors of pictures I_i ∈ M and I_j ∈ Q.
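A minimal NumPy sketch of equations (2)-(3), assuming the descriptors are stacked row-wise (the function name is hypothetical):

```python
import numpy as np

def cost_matrix(map_desc: np.ndarray, query_desc: np.ndarray) -> np.ndarray:
    """D[i, j] = cosine distance between map image I_i and observed image I_j
    (equations (2)-(3)); map_desc is (m, d), query_desc is (n, d)."""
    Mn = map_desc / np.linalg.norm(map_desc, axis=1, keepdims=True)
    Qn = query_desc / np.linalg.norm(query_desc, axis=1, keepdims=True)
    return 1.0 - Mn @ Qn.T   # small D[i, j] means high similarity
```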
2.2 sequence matching and velocity estimation
Step S3 is performed next: the current speed value is estimated based on the sequence cost value. In some embodiments, estimating the current speed value based on the sequence cost value specifically includes: determining a search sequence of a preset length by taking the current time as a reference; summing the sequence cost values over the whole search sequence as the matching degree between the current observation image and the map; selecting the search sequence z_T with the highest matching degree by adjusting the slope v of the search sequence; and determining the slope v corresponding to the search sequence z_T as the current speed value of the agent.
Sequence matching requires searching within the sequence to obtain the best matching result. A search space is introduced here: taking the timestamp T of the current time as a reference, a search sequence of length d_s is considered, so that the matching degrees of the d_s pictures are integrated, thereby improving the robustness of recognition. The degree of matching is measured by a sequence cost value:

S_{T,j} = Σ_{t=T−d_s+1}^{T} D_{k,t}    (4)

where k = j + v(t − T).    (5)
This sequence cost is determined by three parameters: the current time point T, the selected frame j in the map, and the matching slope v, as shown in fig. 3.
Therefore, the cosine distance between descriptors represents the matching degree of two images, and summing the cosine distances over the time window represents the matching degree between the current picture and the map. By adjusting the slope v, the result with the minimum cosine distance, i.e. the highest matching degree, is selected. The slope v corresponding to the best result is the estimated current agent speed of the present scheme, as shown in equation (6). A range V = [v_min, v_max] is set for the speed v, imposing a limit on the optimal estimate to prevent unrealistic values:

(v_esti, z_T) = argmin_{v ∈ V, j} S_{T,j}    (6)

In the formula, D_{k,t} represents the cosine distance between the image obtained by the agent at time t and image k in the map, and S_{T,j} is the matching cost value between the image acquired by the agent at the current moment T and image j in the map together with its preceding sequence; it measures how similar the current location is to the corresponding location in the map. After the best matching result is obtained by varying v, the resulting v_esti is the present scheme's estimate of the current agent speed, and this estimate serves as the basis for predicting the agent's position at the next time T+1. The best matching result z_T is taken as the output of the observation at time T. The present scheme uses the probability distribution of state transitions p(x_{T+1} | x_T, v_esti, M) to represent the estimate of the next time instant, and makes an important assumption here: this probability distribution conforms to a Gaussian distribution, i.e. p(x_{T+1} | x_T, v_esti, M) ~ N(μ, σ).
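Continuing the sketch, the search of equations (4)-(6) might look as follows; the grid of candidate slopes, the out-of-map penalty, and the function names are illustrative assumptions, not part of the original disclosure.

```python
import numpy as np

def sequence_costs(D: np.ndarray, T: int, ds: int, v: float) -> np.ndarray:
    """S_{T,j} of equations (4)-(5) for every map frame j at slope v."""
    m = D.shape[0]
    S = np.zeros(m)
    for j in range(m):
        for t in range(T - ds + 1, T + 1):
            k = int(round(j + v * (t - T)))          # k = j + v(t - T)
            S[j] += D[k, t] if 0 <= k < m else 1.0   # out-of-map penalty (assumed)
    return S

def estimate_speed(D: np.ndarray, T: int, ds: int,
                   v_grid=np.linspace(0.5, 1.5, 11)):
    """Equation (6): pick map frame z_T and slope v_esti minimizing S_{T,j},
    with v restricted to the range V = [v_min, v_max] covered by v_grid."""
    best_cost, z_T, v_esti = np.inf, 0, float(v_grid[0])
    for v in v_grid:
        S = sequence_costs(D, T, ds, v)
        j = int(S.argmin())
        if S[j] < best_cost:
            best_cost, z_T, v_esti = S[j], j, float(v)
    return z_T, v_esti
```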
2.3 obtaining probability distributions
Step S3 is continued to determine the current likelihood probability distribution. In some embodiments, determining the current likelihood probability distribution specifically includes: selecting N adjacent candidate sequences by taking the search sequence z_T as a reference, where N is a preset value; calculating the probability distribution over the N candidate sequences; and, assuming the obtained probability distribution is a sample of a Gaussian distribution, estimating the parameters of the Gaussian distribution based on it and taking the estimated Gaussian distribution as the current likelihood probability distribution.
After the matching result of the sequence is obtained, this value has to be converted into a probability distribution, i.e. the likelihood distribution p(z_T | x_T, M), to facilitate the approach of probability-based Markov positioning. The SoftMax function is used here to map these values into the range [0, 1], as shown in equation (7):

p(z_T = j | x_T, M) = exp(−S_{T,j}) / Σ_k exp(−S_{T,k})    (7)

where S_{T,k} is the sequence cost value obtained by matching the image I_T acquired by the agent at time T against image I_k in the map. To simplify the computation, the scheme adopts a sliding window W = [z_T − N/2, z_T + N/2] of length N and calculates the probability distribution over the N candidate frames near the best matching result:

p(z_T = j | x_T, M) = exp(−S_{T,j}) / Σ_{k∈W} exp(−S_{T,k}),  j ∈ W    (8)
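Continuing the sketch, equations (7)-(8) restricted to the sliding window might be written as follows; negating the cost inside the exponent is an assumption consistent with "lower cost, higher probability":

```python
import numpy as np

def window_probabilities(S: np.ndarray, z_T: int, N: int):
    """SoftMax of equations (7)-(8) over the N candidate frames around z_T."""
    lo, hi = max(z_T - N // 2, 0), min(z_T + N // 2 + 1, len(S))
    idx = np.arange(lo, hi)
    w = np.exp(-(S[idx] - S[idx].min()))   # shift for numerical stability
    return idx, w / w.sum()
```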
after converting the cost value into a probability, the present solution further assumes that this probability conforms to a Gaussian distribution, i.e., p (z) T |x T M) to N (. Mu.o,. Sigma.o). This gaussian distribution is estimated using the most computationally efficient least squares method. Assuming that the resulting probability distribution is a sample of a Gaussian distribution, use (x) i ,y i ) (i =1,2, ·, n), then there is the following relationship:
Figure BDA0003700508360000122
here, gaussian distribution has a particularly good characteristic, that is, it is an exponential function of e, and it can be logarithmized to convert the form of exponent into linear relation, and then solved as a linear equation set, so that the estimation process of the scheme can be greatly simplified:
Figure BDA0003700508360000123
here, a variable substitution is made, let ln (y) i )=z i
Figure BDA0003700508360000124
Figure BDA0003700508360000125
Then publicThe equation becomes the following linear equation form:
Figure BDA0003700508360000126
this linear equation can obtain an estimate of B according to the least squares method:
B=(X T X) -1 X T Z (12)
thus, the mean μ and variance σ as well as the parameter θ can be obtained from the matrix B.
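A sketch of the log-linear least-squares fit of equations (9)-(12); the clipping guards are illustrative assumptions:

```python
import numpy as np

def fit_gaussian_loglinear(x: np.ndarray, p: np.ndarray):
    """Recover (mu, sigma) from samples (x_i, p_i) via equations (9)-(12)."""
    z = np.log(np.clip(p, 1e-12, None))             # z_i = ln(y_i)
    X = np.stack([np.ones_like(x), x, x * x], axis=1)
    b, *_ = np.linalg.lstsq(X, z, rcond=None)       # B = (X^T X)^{-1} X^T Z
    b2 = min(b[2], -1e-12)                          # parabola must open downward
    sigma2 = -0.5 / b2                              # from b_2 = -1/(2 sigma^2)
    mu = b[1] * sigma2                              # from b_1 = mu / sigma^2
    return float(mu), float(np.sqrt(sigma2))
```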
3. Kalman filtering - Markov positioning under Gaussian distribution
Markov positioning models some of the reasoning behaviors of humans in the positioning process. In navigation and positioning, a human estimates the current state not only from the current observation but also in combination with memory, i.e. prior information. The present scheme expresses this process in terms of a hidden Markov chain, as shown in fig. 4.
The Markov positioning process is divided into two sub-processes, prediction and updating. The prediction process estimates the agent's state at the next moment from the state at the previous moment and a state transition model; the update process uses the predicted state confidence and the observed likelihood probability distribution to update the agent's confidence in its own state at the current moment.
It is assumed here that both the state transition and the observation are linear relations with added Gaussian-distributed noise, i.e. a Gaussian linear system; the present scheme therefore adopts the Kalman filtering method to recursively calculate the agent's confidence in its own state.
3.1 prediction
Step S4 is performed next: the confidence of the current state is determined based on the current likelihood probability distribution and the prediction confidence of the previous state. In some embodiments, step S4 specifically includes: based on the current likelihood probability distribution, recursively updating the prediction confidence of the previous state by the Kalman filtering method to obtain the confidence of the current state.
First, the state x_i of the agent is represented by the index value i of the picture in the map M = {Image_i}, and the control command u_i of the agent is described by the estimated speed v_esti. The kinetic equation describing the state transition is then:

x_i = x_{i−1} + u_i + ε_i,  ε_i ~ N(0, σ_u)    (13)
This is a linear relationship with an added Gaussian noise contribution, and the prediction process is generally described by the following equation:

bel_pred(x_i) = ∫ p(x_i | x_{i−1}, u_i, M) bel(x_{i−1}) dx_{i−1}    (14)

where bel_pred(x_i) is the prediction confidence, corresponding to bel_pred(x_{t+1}) in FIG. 2. As previously mentioned, the probability distribution of state transitions p(x_i | x_{i−1}, u_i, M) is a Gaussian distribution, and the confidence bel(x_{i−1}) ~ N(μ_{i−1}, σ_{i−1}) is also a Gaussian distribution, so the scheme can be computed recursively by the Kalman filtering method; the predicted confidence bel_pred(x_i) ~ N(μ̄_i, σ̄_i) is likewise a Gaussian distribution, whose parameters can be calculated using the following equation:

μ̄_i = μ_{i−1} + u_i,  σ̄_i² = σ_{i−1}² + σ_u²    (15)
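In this scalar Gaussian linear system, the prediction step of equations (13)-(15) reduces to two lines; the motion-noise variance q is an assumed tuning parameter:

```python
def kf_predict(mu: float, sigma2: float, v_esti: float, q: float):
    """Equations (13)-(15): advance the map index by the estimated speed."""
    return mu + v_esti, sigma2 + q
```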
3.2 updating
Step S5 is performed next, and the probability distribution of the state transition is determined based on the current velocity value. In some embodiments, step S5 specifically includes: assuming that the probability distribution of the state transition is Gaussian distribution; and solving a kinetic equation describing the state transition and an equation describing the prediction process based on the current speed value to obtain the probability distribution of the state transition.
And finally, executing a step S6, and determining the prediction confidence of the next state based on the probability distribution of the state transition and the confidence of the current state. In some embodiments, step S6 specifically includes: and based on the probability distribution of state transition, carrying out recursive calculation on the confidence coefficient of the current state by a Kalman filtering method to obtain the prediction confidence coefficient of the next state.
The problem to be solved by the confidence update can be described by the following equation:

bel(x_i) = η p(z_i | x_i, M) bel_pred(x_i)    (16)

where η is a normalization constant. As introduced in the previous section, the likelihood probability distribution observed by the agent, p(z_i | x_i, M) ~ N(μ_o, σ_o), is also a Gaussian probability distribution; its variance is estimated by the least-squares method after the sequence cost values are mapped to probabilities by the SoftMax equation. The observation model can then be described by the following equation:

z_i = x_i + δ_i,  δ_i ~ N(0, σ_o)    (17)
Here, the observed state is also a Gaussian linear system, since the scheme defines the observation result as the best-matching index with Gaussian noise added. The scheme therefore cannot take the mean to which the observation result is fitted as the mean of the probability distribution; that would be meaningless. The fitted variance, however, is an important indicator, because the variance represents the trustworthiness of the observation: if the observation is very trustworthy, the probability distribution of the observation should be very concentrated and the variance smaller; if the observation is untrustworthy, the probability distribution of the observation is more dispersed and the variance larger. The variance resulting from this fit therefore largely describes the confidence level of the match. Relying on the observation result alone, however, is far from sufficient, because the scheme cannot consider only the situation near the best observation result but must also consider jumps in the observation; if an observation differs too much from the last observation or from the predicted result, the observation is considered unreliable. Based on the above discussion, the scheme designs the observed variance as follows:

σ_{o,i}² = σ_o² + (μ̄_i − z_i)²    (18)
Here, the square of the difference between the predicted result and the observed result is used as a penalty term (penalty factor) to evaluate whether the observation is reliable, so as to prevent observation jumps from influencing the final recognition result. Since the likelihood probability distribution obtained from the observation is assumed to be a Gaussian distribution, the updated confidence parameters μ_i and σ_i can be calculated according to Kalman filtering:

K_i = σ̄_i² / (σ̄_i² + σ_{o,i}²),  μ_i = μ̄_i + K_i (z_i − μ̄_i),  σ_i² = (1 − K_i) σ̄_i²    (19)
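Likewise, the update of equations (16)-(19), with the squared prediction-observation gap added as the penalty term of equation (18), might be sketched as:

```python
def kf_update(mu_pred: float, sigma2_pred: float, z: float, sigma2_fit: float):
    """Equations (16)-(19): fuse prediction and observation."""
    sigma2_obs = sigma2_fit + (mu_pred - z) ** 2      # equation (18) penalty
    K = sigma2_pred / (sigma2_pred + sigma2_obs)      # Kalman gain
    mu = mu_pred + K * (z - mu_pred)
    return mu, (1.0 - K) * sigma2_pred
```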
After step S4 is executed to determine the confidence of the current state, the following step is also executed: determining the recognition result of the current observation image according to the confidence of the current state. Thus, the current image of the agent is finally identified as I_{μ_i} ∈ M.
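Chaining the sketches above gives an illustrative end-to-end recursion over the agent's image stream (FIG. 2); all parameter values, the diffuse initialization, and the use of only the fitted variance are assumptions:

```python
import numpy as np

def recognize_stream(D: np.ndarray, ds: int = 10, N: int = 11, q: float = 1.0):
    """Recognized map index for each time step T, following FIG. 2."""
    mu_pred, sigma2_pred = 0.0, 1e6        # diffuse initial prediction (assumed)
    matches = []
    for T in range(ds - 1, D.shape[1]):
        z_T, v_esti = estimate_speed(D, T, ds)
        S = sequence_costs(D, T, ds, v_esti)
        idx, p = window_probabilities(S, z_T, N)
        # Only the fitted variance is kept; per the discussion above, the
        # best match z_T itself serves as the observation.
        _, sigma_o = fit_gaussian_loglinear(idx.astype(float), p)
        mu, sigma2 = kf_update(mu_pred, sigma2_pred, float(z_T), sigma_o ** 2)
        matches.append(int(round(mu)))     # recognized as I_{mu_i} in M
        mu_pred, sigma2_pred = kf_predict(mu, sigma2, v_esti, q)
    return matches
```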
In summary, the key points of the present invention are: 1. normalization of the deep learning descriptors of the continuous sequence images and the database images to obtain a cost matrix; 2. the definition of sequence cost matching; 3. a method of estimating robot speed based on the cosine distance; 4. a way of converting image feature similarities into probabilistic matches; 5. a Markov prediction and update process that imitates human memory and reasoning, based on the probabilistic statistical characteristics; 6. the design of a variance penalty term in the prediction, to deal with instability and other unexpected behavior that may occur in the descriptors.
The embodiment of the present application further provides a device for recognizing a sequence image scene based on deep learning, where the device includes:
the acquisition module is used for acquiring the current observation image and the image sequence of the map;
the extraction module is used for extracting the descriptor of the image;
the cost value calculation module is used for calculating the sequence cost value between the current observation image and the map according to the extracted descriptor;
the estimation module is used for estimating a current speed value based on the sequence cost value;
a first probability determination module for determining a current likelihood probability distribution;
the updating module is used for determining the confidence of the current state based on the current likelihood probability distribution and the prediction confidence of the previous state;
the second probability determination module is used for determining the probability distribution of the state transition based on the current speed value;
and the prediction module is used for determining the prediction confidence of the next state based on the probability distribution of the state transition and the confidence of the current state.
In some embodiments, the extraction module is specifically configured to: processing the image through a convolutional neural network model, and taking the output of a specified layer as a descriptor; the convolutional neural network model is a model trained through a scene classification data set; the descriptor is subjected to normalization processing.
In some embodiments, the cost value calculation module is specifically configured to:
record the current set of images observed by the agent as: Q = [I_1, I_2, ..., I_j, ..., I_n]; and record the image sequence of the map as: M = [I_1, I_2, ..., I_i, ..., I_m];
the sequence cost value between image I_i and image I_j is the value of the cosine distance between the descriptor of image I_i and the descriptor of image I_j, where I_i ∈ M and I_j ∈ Q.
In some embodiments, the estimation module is specifically configured to: determine a search sequence of a preset length by taking the current time as a reference; sum the sequence cost values over the whole search sequence as the matching degree between the current observation image and the map; select the search sequence z_T with the highest matching degree by adjusting the slope v of the search sequence; and determine the slope v corresponding to the search sequence z_T as the current speed value of the agent.
In some embodiments, the first probability determination module is specifically configured to: select N adjacent candidate sequences by taking the search sequence z_T as a reference, where N is a preset value; calculate the probability distribution over the N candidate sequences; and, assuming the obtained probability distribution is a sample of a Gaussian distribution, estimate the parameters of the Gaussian distribution based on it and take the estimated Gaussian distribution as the current likelihood probability distribution.
In some embodiments, the update module is specifically configured to: based on the current likelihood probability distribution, recursively update the prediction confidence of the previous state by the Kalman filtering method to obtain the confidence of the current state.
In some embodiments, the scene recognition apparatus further includes a determining module, specifically configured to: after the confidence of the current state is determined, the recognition result of the current observed image is determined according to the confidence of the current state.
In some embodiments, the second probability determination module is specifically configured to: assuming that the probability distribution of the state transition is Gaussian distribution; and solving a kinetic equation describing the state transition and an equation describing the prediction process based on the current speed value to obtain the probability distribution of the state transition.
In some embodiments, the prediction module is specifically configured to: based on the probability distribution of state transitions, recursively update the confidence of the current state by the Kalman filtering method to obtain the prediction confidence of the next state.
With regard to the apparatus in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and is not detailed again here. The modules in the scene recognition device can be wholly or partially implemented in software, hardware, or a combination thereof. The modules can be embedded, in hardware form, in a processor of the computer device or be independent of it, or be stored, in software form, in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to each module.
It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.
It should be noted that, in the description of the present application, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present application, the meaning of "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried out in the method of implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (10)

1. A sequence image scene recognition method based on deep learning is characterized by comprising the following steps:
acquiring an image sequence of a current observation image and a map, and extracting a descriptor of the image;
calculating a sequence cost value between the current observation image and the map according to the extracted descriptors;
estimating a current speed value based on the sequence cost value and determining a current likelihood probability distribution;
determining the confidence of the current state based on the current likelihood probability distribution and the prediction confidence of the previous state;
determining a probability distribution of state transitions based on the current velocity value;
based on the probability distribution of the state transition and the confidence of the current state, a prediction confidence of the next state is determined.
2. The method of claim 1, wherein extracting the descriptor of the image comprises:
processing the image through a convolutional neural network model, and taking the output of a specified layer as a descriptor; wherein the convolutional neural network model is a model trained from a scene-classified data set;
the descriptor is subjected to normalization processing.
3. The method of claim 1, wherein calculating a sequence cost value between a current observation image and a map comprises:
the current set of images observed by the agent is recorded as: Q = [I_1, I_2, ..., I_j, ..., I_n]; the image sequence of the map is recorded as: M = [I_1, I_2, ..., I_i, ..., I_m];
the sequence cost value between image I_i and image I_j is the value of the cosine distance between the descriptor of image I_i and the descriptor of image I_j; wherein I_i ∈ M and I_j ∈ Q.
4. A method according to any of claims 1-3, wherein estimating a current velocity value based on the sequence cost value comprises:
determining a search sequence with the length as the preset length by taking the current time as a reference;
summing the sequence cost values of the whole search sequence to serve as the matching degree between the current observation image and the map;
selecting the search sequence z_T with the highest matching degree by adjusting the slope v of the search sequence;
determining the slope v corresponding to the search sequence z_T as the current speed value of the agent.
5. The method of claim 4, wherein determining the current likelihood probability distribution comprises:
selecting N adjacent candidate sequences by taking the search sequence z_T as a reference; wherein N is a preset value;
calculating probability distribution of N candidate sequences;
assuming that the obtained probability distribution is a sample of gaussian distribution, estimating parameters of the gaussian distribution based on the obtained probability distribution, and taking the estimated gaussian distribution as the current likelihood probability distribution.
6. The method of claim 5, wherein determining the confidence level of the current state based on the current likelihood probability distribution and the prediction confidence level of the previous state comprises:
based on the current likelihood probability distribution, recursively updating the prediction confidence of the previous state by a Kalman filtering method to obtain the confidence of the current state.
7. The method of claim 6, wherein determining the confidence level for the current state further comprises:
and determining the recognition result of the current observed image according to the confidence coefficient of the current state.
8. A method according to any of claims 1-3, wherein determining a probability distribution for a state transition based on a current speed value comprises:
assuming that the probability distribution of the state transition is Gaussian distribution;
and solving a kinetic equation describing the state transition and an equation of a prediction process based on the current speed value to obtain the probability distribution of the state transition.
9. The method of claim 8, wherein determining the confidence of the prediction for the next state based on the probability distribution of the state transition and the confidence of the current state comprises:
based on the probability distribution of state transitions, recursively updating the confidence of the current state by a Kalman filtering method to obtain the prediction confidence of the next state.
10. A sequence image scene recognition apparatus based on deep learning, comprising:
the acquisition module is used for acquiring the current observation image and the image sequence of the map;
the extraction module is used for extracting the descriptor of the image;
the cost value calculation module is used for calculating the sequence cost value between the current observation image and the map according to the extracted descriptor;
the estimation module is used for estimating a current speed value based on the sequence cost value;
a first probability determination module for determining a current likelihood probability distribution;
the updating module is used for determining the confidence of the current state based on the current likelihood probability distribution and the prediction confidence of the previous state;
the second probability determination module is used for determining the probability distribution of the state transition based on the current speed value;
and the prediction module is used for determining the prediction confidence of the next state based on the probability distribution of the state transition and the confidence of the current state.
CN202210692229.0A 2022-06-17 2022-06-17 Sequence image scene recognition method based on deep learning Pending CN115240056A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210692229.0A CN115240056A (en) 2022-06-17 2022-06-17 Sequence image scene recognition method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210692229.0A CN115240056A (en) 2022-06-17 2022-06-17 Sequence image scene recognition method based on deep learning

Publications (1)

Publication Number Publication Date
CN115240056A true CN115240056A (en) 2022-10-25

Family

ID=83669599

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210692229.0A Pending CN115240056A (en) 2022-06-17 2022-06-17 Sequence image scene recognition method based on deep learning

Country Status (1)

Country Link
CN (1) CN115240056A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115744526A (en) * 2023-01-06 2023-03-07 常熟理工学院 Elevator brake action state detection method and system


Similar Documents

Publication Publication Date Title
Hausler et al. Multi-process fusion: Visual place recognition using multiple image processing methods
CN110555390B (en) Pedestrian re-identification method, device and medium based on semi-supervised training mode
US9619561B2 (en) Change invariant scene recognition by an agent
Kwon et al. Visual graph memory with unsupervised representation for visual navigation
JP7131994B2 (en) Self-position estimation device, self-position estimation method, self-position estimation program, learning device, learning method and learning program
US20090110236A1 (en) Method And System For Object Detection And Tracking
CN110969648B (en) 3D target tracking method and system based on point cloud sequence data
CN112489081B (en) Visual target tracking method and device
CN112596064B (en) Laser and vision integrated global positioning method for indoor robot
JP6867054B2 (en) A learning method and a learning device for improving segmentation performance used for detecting a road user event by utilizing a double embedding configuration in a multi-camera system, and a testing method and a testing device using the learning method and a learning device. {LEARNING METHOD AND LEARNING DEVICE FOR IMPROVING SEGMENTATION PERFORMANCE TO BE USED FOR DETECTING ROAD USER EVENTS USING DOUBLE EMBEDDING CONFIGURATION IN MULTI-CAMERA SYSTEM AND TESTING METHOD AND TESTING DEVICE USING THE SAME}
CN110009060B (en) Robustness long-term tracking method based on correlation filtering and target detection
JP6962605B2 (en) A method for providing an object detection system capable of updating the types of detectable classes in real time using continuous learning, and a device using the same.
CN108830171A (en) A kind of Intelligent logistics warehouse guide line visible detection method based on deep learning
KR20220074782A (en) Method and device for simultaneous localization and mapping (slam)
CN115240056A (en) Sequence image scene recognition method based on deep learning
CN111783716A (en) Pedestrian detection method, system and device based on attitude information
CN112733971B (en) Pose determination method, device and equipment of scanning equipment and storage medium
Tsintotas et al. The revisiting problem in simultaneous localization and mapping
Garcia-Fidalgo et al. Probabilistic appearance-based mapping and localization using visual features
CN116958057A (en) Strategy-guided visual loop detection method
Leung et al. Evaluating set measurement likelihoods in random-finite-set slam
CN114332716B (en) Clustering method and device for scenes in video, electronic equipment and storage medium
CN115880332A (en) Target tracking method for low-altitude aircraft visual angle
CN114926652A (en) Twin tracking method and system based on interactive and convergent feature optimization
CN115527083A (en) Image annotation method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination