CN115240056A - Sequence image scene recognition method based on deep learning - Google Patents

Sequence image scene recognition method based on deep learning

Info

Publication number
CN115240056A
CN115240056A
Authority
CN
China
Prior art keywords
current
image
probability distribution
sequence
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210692229.0A
Other languages
Chinese (zh)
Inventor
吴梦蝶
闫文耀
苗水清
张静
吴晓晖
柴荣军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Innovation College Of Yan'an University
Original Assignee
Xi'an Innovation College Of Yan'an University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an Innovation College Of Yan'an University filed Critical Xi'an Innovation College Of Yan'an University
Priority to CN202210692229.0A priority Critical patent/CN115240056A/en
Publication of CN115240056A publication Critical patent/CN115240056A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/35Categorising the entire scene, e.g. birthday party or wedding scene
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a sequence image scene recognition method based on deep learning, which comprises the following steps: acquiring a current observation image and the image sequence of a map, and extracting descriptors of the images; calculating a sequence cost value between the current observation image and the map according to the extracted descriptors; estimating a current speed value based on the sequence cost value, and determining a current likelihood probability distribution; determining the confidence of the current state based on the current likelihood probability distribution and the prediction confidence of the previous state; determining a probability distribution of state transitions based on the current speed value; and determining the prediction confidence of the next state based on the probability distribution of state transitions and the confidence of the current state. By matching sequence images instead of single pictures, the scene recognition scheme overcomes the large matching error of a single picture; even when outdoor scenes change with season, weather, and time, a robot can realize the positioning function through its visual sense alone.

Description

Sequence image scene recognition method based on deep learning
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a sequence image scene recognition method based on deep learning.
Background
Image scene recognition means automatically processing and analyzing an image by using its visual information, so as to judge and recognize the specific scene (such as a kitchen, a street, or a mountain) carried in the image. Judging the scene in an image not only helps in understanding the overall semantic content of the image, but also provides a basis for identifying specific targets and events in the image, so scene recognition plays an important role in automatic image understanding by computers. Scene recognition techniques can be applied to many practical problems, such as intelligent image management and retrieval.
For the recognition of a changing scene, matching on a single picture produces large errors, so many recognition methods adjust the matching strategy of the system to improve robustness. Many machine-learning-based methods are also used to learn how scenes change; such methods rest on the assumption that appearance changes over time are consistent across different locations, i.e. generic and learnable, an assumption that has been verified in some studies.
In the related art, current robot positioning methods mainly realize real-time positioning of unmanned vehicles by means of GPS + LiDAR; the cost is high, and the amount of information acquired is smaller than that of vision. The difficulty of positioning by scene recognition is that there is no mature solution for effectively recognizing a changing scene.
Disclosure of Invention
In order to overcome the problems in the related art at least to a certain extent, the application provides a sequence image scene recognition method based on deep learning.
According to a first aspect of embodiments of the present application, a method for identifying a sequence image scene based on deep learning is provided, including:
acquiring a current observation image and the image sequence of a map, and extracting descriptors of the images;
calculating a sequence cost value between the current observation image and the map according to the extracted descriptor;
estimating a current speed value based on the sequence cost value and determining a current likelihood probability distribution;
determining the confidence of the current state based on the current likelihood probability distribution and the prediction confidence of the previous state;
determining a probability distribution of state transitions based on a current speed value;
based on the probability distribution of the state transition and the confidence of the current state, a prediction confidence of the next state is determined.
Further, the extracting the descriptor of the image comprises:
processing the image through a convolutional neural network model, and taking the output of a specified layer as a descriptor; wherein the convolutional neural network model is a model trained from a scene-classified data set;
the descriptor is subjected to normalization processing.
Further, calculating a sequence cost value between the current observation image and the map, comprising:
the current set of images observed by the agent is recorded as: Q = [I_1, I_2, ..., I_j, ..., I_n]; the image sequence of the map is recorded as: M = [I_1, I_2, ..., I_i, ..., I_m];
the sequence cost value between image I_i and image I_j is the value of the cosine distance between the descriptor of image I_i and the descriptor of image I_j; wherein I_i ∈ M and I_j ∈ Q.
Further, estimating a current velocity value based on the sequence cost value includes:
determining a search sequence of a preset length by taking the current time as a reference;
summing the sequence cost values of the whole search sequence to serve as the matching degree between the current observation image and the map;
selecting the search sequence z_T with the highest matching degree by adjusting the slope v of the search sequence;
determining the slope v corresponding to the search sequence z_T as the current speed value of the agent.
Further, the determining a current likelihood probability distribution includes:
selecting N adjacent candidate sequences by taking the search sequence z_T as a reference; wherein N is a preset value;
calculating the probability distribution of the N candidate sequences;
assuming that the obtained probability distribution is a sample of gaussian distribution, estimating parameters of the gaussian distribution based on the obtained probability distribution, and taking the estimated gaussian distribution as the current likelihood probability distribution.
Further, determining the confidence of the current state based on the current likelihood probability distribution and the prediction confidence of the previous state specifically includes:
based on the current likelihood probability distribution, recursively updating the prediction confidence of the previous state by a Kalman filtering method to obtain the confidence of the current state.
Further, after determining the confidence of the current state, the method further includes:
and determining the recognition result of the current observation image according to the confidence of the current state.
Further, determining a probability distribution of the state transition based on the current velocity value includes:
assuming that the probability distribution of the state transition is Gaussian distribution;
and solving a kinetic equation describing the state transition and an equation of a prediction process based on the current speed value to obtain the probability distribution of the state transition.
Further, determining a confidence of the prediction of the next state based on the probability distribution of the state transition and the confidence of the current state, comprising:
based on the probability distribution of state transitions, recursively updating the confidence of the current state by a Kalman filtering method to obtain the prediction confidence of the next state.
According to a second aspect of the embodiments of the present application, there is provided a sequential image scene recognition apparatus based on deep learning, including:
the acquisition module is used for acquiring the current observation image and the image sequence of the map;
the extraction module is used for extracting the descriptor of the image;
the cost value calculation module is used for calculating the sequence cost value between the current observation image and the map according to the extracted descriptor;
the estimation module is used for estimating a current speed value based on the sequence cost value;
a first probability determination module for determining a current likelihood probability distribution;
the updating module is used for determining the confidence of the current state based on the current likelihood probability distribution and the prediction confidence of the previous state;
the second probability determination module is used for determining the probability distribution of the state transition based on the current speed value;
and the prediction module is used for determining the prediction confidence of the next state based on the probability distribution of the state transition and the confidence of the current state.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
the invention provides a sequence image scene recognition scheme based on deep learning, which solves the problem of large matching error of a single picture through matching between sequence images; under the condition that outdoor scenes change along with seasons, weather and time, the robot can sense the surrounding environment only through the visual sensor, the positioning function is realized, and a foundation is laid for positioning navigation based on scene recognition.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and, together with the description, serve to explain the principles of the application.
Fig. 1 is a flowchart illustrating a method for scene recognition of a sequence image based on deep learning according to an exemplary embodiment.
Fig. 2 is a flow chart of a system according to an embodiment of the present application.
Fig. 3 is a schematic diagram illustrating sequence cost matching according to an embodiment of the present application.
Fig. 4 is a schematic diagram of a hidden markov chain according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of methods and apparatus consistent with certain aspects of the present application, as detailed in the appended claims.
To further detail the technical solution of the present application, the state of the art is first introduced.
The scene recognition problem is very challenging. Whether for humans, animals, robots, or unmanned vehicles, realizing scene recognition requires some basic conditions. First, a 'map' that can be referred to is needed; this map is not the map of daily life, nor the point cloud map of unmanned driving, but a continuous picture sequence, or pictures of the sporadic positions of several landmarks, and the like. It represents prior information, facts known in advance, and is the basis for recognition. Second, the system should be able to convert the current picture into a corresponding positioning confidence that indicates whether the picture is in our map and, if so, further at which location.
Thus, the scene recognition problem can be divided into three parts: first, a good scene description method needs to be obtained to abstract the rich and huge two-dimensional image information of a scene into a simpler mathematical description, namely a descriptor; second, the scene information needs to be organized into map information, that is, a geometric and logical relationship between scenes needs to be established; third, an effective method is needed to convert the current scene state of the agent into a positioning confidence within the map.
Current scene recognition methods can be divided into those that extract local descriptors and those that extract global descriptors.
(1) Local descriptors. Many studies focus on traditional local feature descriptors, such as SIFT and SURF, to explore their robustness in changing scenes. Furgale, Barfoot et al. indicate in their studies that the conventional SURF descriptor is not robust under illumination variations. Valgren, Lilienthal et al. indicate that the various variants of SURF and SIFT are not robust under changing light, cloud, and seasonal conditions.
(2) Global descriptors. In addition to local descriptors, many studies investigate other forms of scene description. For example, SeqSLAM and its variants achieve robustness by using a description of the entire picture. Convolutional neural networks also play a significant role here. Many studies show that convolutional neural networks can learn generalizable features that transfer across many related visual tasks; that is, descriptors trained for object and scene classification can also be used in scene recognition tasks. The studies by Sunderhauf et al. indicate that descriptors generated by the middle layers of a convolutional neural network, such as convolutional layers, are robust to changes in scene appearance, while higher layers, such as fully connected layers, are highly robust to changes in viewing angle. Garg et al. propose that normalizing the descriptors of the fully connected layer can significantly improve robustness to appearance changes. Dai et al. compared CNN descriptors with conventional descriptors and found, through experiments on datasets with different viewing angles and lighting conditions, that CNN descriptors are more robust.
Fig. 1 is a flowchart illustrating a method for scene recognition of a sequence image based on deep learning according to an exemplary embodiment. The method may comprise the steps of:
S1, acquiring a current observation image and the image sequence of a map, and extracting descriptors of the images; wherein the 'map' comprises a continuous picture sequence or a plurality of pictures of landmark positions;
s2, calculating the sequence cost value between the current observation image and the map according to the extracted descriptors;
s3, estimating a current speed value based on the sequence cost value, and determining current likelihood probability distribution;
s4, determining the confidence coefficient of the current state based on the current likelihood probability distribution and the prediction confidence coefficient of the previous state;
s5, determining probability distribution of state transition based on the current speed value;
and S6, determining the prediction confidence of the next state based on the probability distribution of the state transition and the confidence of the current state.
The invention provides a sequence image scene recognition scheme based on deep learning, which solves the problem of large matching error of a single picture through matching between sequence images. Under conditions where outdoor scenes change with seasons, weather, and time, a robot can perceive the surrounding environment through its visual sensor alone and realize the positioning function, laying a foundation for positioning and navigation based on scene recognition.
It should be understood that, although the steps in the flowchart of fig. 1 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least a portion of the steps in fig. 1 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different times; their order of performance is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
The flow of the identification method is shown in FIG. 2. Here x_t represents the current state of the agent, v_t the current speed of the agent, and M the sequence of map pictures referenced by the agent; bel(x_t) and bel_pred(x_{t+1}) represent the agent's confidence at time t in its current state and its predicted confidence for time t+1. The latest recognition result is obtained continuously in a recursive manner.
1) First, the descriptor of the current image is obtained through a convolutional neural network; the sequence cost value is calculated from the observation images of the current agent and the images in the map, the current speed value v_esti is estimated using the sequence cost, and the observed likelihood probability distribution p(z_t | x_t, M) is obtained.
2) Second, the observed likelihood probability distribution p(z_t | x_t, M) and the confidence bel_pred(x_t) predicted in the previous stage are used to update the current probability distribution and obtain the confidence bel(x_t) of the current state (Algorithm 3-1, lines 18-20).
3) Third, the estimated current speed v_esti is used to obtain the probability distribution of state transitions p(x_{t+1} | x_t, v_esti, M), which, together with the confidence bel(x_t) of the current state, predicts the prediction confidence bel_pred(x_{t+1}) of the next-stage state.
Imitating the mechanism by which humans abstract and understand complex scenes to obtain the most basic scene information for navigation, the method designs a set of scene descriptions based on deep learning, completes robust scene recognition by combining reasoning and decision-making, and realizes positioning and navigation based on this scene recognition method.
The algorithm of the present invention is specified in Algorithm 3-1, which is reproduced only as an image in the original publication.
the following describes each part of the algorithm proposed by the present invention in detail with reference to specific embodiments.
1. Normalization of convolutional neural network image descriptions
Step S1 is first executed to acquire an image sequence of a current observation image and a map, and extract a descriptor of the image. In some embodiments, the step of extracting the descriptor comprises: processing the image through a convolutional neural network model, and taking the output of a specified layer as a descriptor; the descriptor is subjected to normalization processing. Wherein the convolutional neural network model is a model trained from a scene-classified data set.
A classical convolutional neural network model, such as AlexNet, is trained with a scene-classification dataset, such as Places365, and its output at a certain layer is used as the description of an image; for example, the fc6 layer contains 4096 variables. A descriptor normalization method is adopted, which significantly improves the robustness of the fully-connected-layer descriptor to appearance changes:

f̂_i = (f_i − μ) / σ    (1)

where f_i represents the fc6-layer feature descriptor of image i, and μ and σ represent the mean and standard deviation of the image descriptors over the entire map database. The result f̂_i is a normalized descriptor, called the Normalized Set of Descriptors (NSD). This normalized deep feature descriptor is used herein as the scene description method.
2. Sequence cost matching and related probability model construction
With a robust scene description in hand, it remains to analyze how to use the descriptor information effectively to achieve robust matching. To improve matching accuracy, the matching of only one picture should not be considered; for a given scene, the scenes at positions preceding the picture should also be considered comprehensively.
Step S2 is next performed to calculate a sequence cost value between the current observation image and the map from the extracted descriptor. The step S2 specifically includes:
The current set of images observed by the agent is recorded as: Q = [I_1, I_2, ..., I_j, ..., I_n]; the image sequence of the map is recorded as: M = [I_1, I_2, ..., I_i, ..., I_m].
The sequence cost value between image I_i and image I_j is the value of the cosine distance between the descriptor of image I_i and the descriptor of image I_j, where I_i ∈ M and I_j ∈ Q.
2.1 construction of cost matrix
Firstly, a cost matrix D needs to be constructed; the cost matrix represents the matching degree between the view of the agent and the map pictures.
The picture set in the view of the agent is recorded as Q = [I_1, I_2, ..., I_j, ..., I_n]; the picture set in the map is recorded as M = [I_1, I_2, ..., I_i, ..., I_m].
The cost matrix is constructed as shown in equation (2):

D = (D_{i,j}) ∈ R^{m×n}    (2)

Here, the element D_{i,j} of the matrix is the value of the cosine distance between the descriptor of image I_i ∈ M in the map and the descriptor of image I_j ∈ Q in the view of the agent. The cosine distance describes the similarity between two feature vectors and is not affected by differences in vector length; it is calculated as shown in equation (3):

D_{i,j} = 1 − (f̂_i · f̂_j) / (‖f̂_i‖ ‖f̂_j‖)    (3)

where f̂_i and f̂_j are the normalized fc6 descriptors of pictures I_i ∈ M and I_j ∈ Q.
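A minimal NumPy sketch of equations (2)-(3), assuming the descriptors are stacked row-wise (the function name is hypothetical):

```python
import numpy as np

def cost_matrix(map_desc: np.ndarray, query_desc: np.ndarray) -> np.ndarray:
    """D[i, j] = cosine distance between map image I_i and observed image I_j
    (equations (2)-(3)); map_desc is (m, d), query_desc is (n, d)."""
    Mn = map_desc / np.linalg.norm(map_desc, axis=1, keepdims=True)
    Qn = query_desc / np.linalg.norm(query_desc, axis=1, keepdims=True)
    return 1.0 - Mn @ Qn.T   # small D[i, j] means high similarity
```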
2.2 sequence matching and velocity estimation
Step S3 is performed next: the current speed value is estimated based on the sequence cost value. In some embodiments, estimating the current speed value based on the sequence cost value specifically includes: determining a search sequence of a preset length by taking the current time as a reference; summing the sequence cost values over the whole search sequence as the matching degree between the current observation image and the map; selecting the search sequence z_T with the highest matching degree by adjusting the slope v of the search sequence; and determining the slope v corresponding to the search sequence z_T as the current speed value of the agent.
Sequence matching requires searching within the sequence to obtain the best matching result. A search space is introduced here: taking the timestamp T of the current time as a reference, a search sequence of length d_s is considered, so that the matching degrees of the d_s pictures are integrated, thereby improving the robustness of recognition. The degree of matching is measured by a sequence cost value:

S_{T,j} = Σ_{t=T−d_s+1}^{T} D_{k,t}    (4)

where k = j + v(t − T).    (5)
This sequence cost is determined by three parameters: the current time point T, the selected frame j in the map, and the matching slope v, as shown in fig. 3.
Therefore, the cosine distance between descriptors represents the matching degree of two images, and summing the cosine distances over the time window represents the matching degree between the current picture and the map. By adjusting the slope v, the result with the minimum cosine distance, i.e. the highest matching degree, is selected. The slope v corresponding to the best result is the estimated current agent speed of the present scheme, as shown in equation (6). A range V = [v_min, v_max] is set for the speed v, imposing a limit on the optimal estimate to prevent unrealistic values:

(v_esti, z_T) = argmin_{v ∈ V, j} S_{T,j}    (6)

In the formula, D_{k,t} represents the cosine distance between the image obtained by the agent at time t and image k in the map, and S_{T,j} is the matching cost value between the image acquired by the agent at the current moment T and image j in the map together with its preceding sequence; it measures how similar the current location is to the corresponding location in the map. After the best matching result is obtained by varying v, the resulting v_esti is the present scheme's estimate of the current agent speed, and this estimate serves as the basis for predicting the agent's position at the next time T+1. The best matching result z_T is taken as the output of the observation at time T. The present scheme uses the probability distribution of state transitions p(x_{T+1} | x_T, v_esti, M) to represent the estimate of the next time instant, and makes an important assumption here: this probability distribution conforms to a Gaussian distribution, i.e. p(x_{T+1} | x_T, v_esti, M) ~ N(μ, σ).
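Continuing the sketch, the search of equations (4)-(6) might look as follows; the grid of candidate slopes, the out-of-map penalty, and the function names are illustrative assumptions, not part of the original disclosure.

```python
import numpy as np

def sequence_costs(D: np.ndarray, T: int, ds: int, v: float) -> np.ndarray:
    """S_{T,j} of equations (4)-(5) for every map frame j at slope v."""
    m = D.shape[0]
    S = np.zeros(m)
    for j in range(m):
        for t in range(T - ds + 1, T + 1):
            k = int(round(j + v * (t - T)))          # k = j + v(t - T)
            S[j] += D[k, t] if 0 <= k < m else 1.0   # out-of-map penalty (assumed)
    return S

def estimate_speed(D: np.ndarray, T: int, ds: int,
                   v_grid=np.linspace(0.5, 1.5, 11)):
    """Equation (6): pick map frame z_T and slope v_esti minimizing S_{T,j},
    with v restricted to the range V = [v_min, v_max] covered by v_grid."""
    best_cost, z_T, v_esti = np.inf, 0, float(v_grid[0])
    for v in v_grid:
        S = sequence_costs(D, T, ds, v)
        j = int(S.argmin())
        if S[j] < best_cost:
            best_cost, z_T, v_esti = S[j], j, float(v)
    return z_T, v_esti
```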
2.3 obtaining probability distributions
Step S3 is continued to determine the current likelihood probability distribution. In some embodiments, determining the current likelihood probability distribution specifically includes: selecting N adjacent candidate sequences by taking the search sequence z_T as a reference, where N is a preset value; calculating the probability distribution over the N candidate sequences; and, assuming the obtained probability distribution is a sample of a Gaussian distribution, estimating the parameters of the Gaussian distribution based on it and taking the estimated Gaussian distribution as the current likelihood probability distribution.
After the matching result of the sequence is obtained, this value has to be converted into a probability distribution, i.e. the likelihood distribution p(z_T | x_T, M), to facilitate the approach of probability-based Markov positioning. The SoftMax function is used here to map these values into the range [0, 1], as shown in equation (7):

p(z_T = j | x_T, M) = exp(−S_{T,j}) / Σ_k exp(−S_{T,k})    (7)

where S_{T,k} is the sequence cost value obtained by matching the image I_T acquired by the agent at time T against image I_k in the map. To simplify the computation, the scheme adopts a sliding window W = [z_T − N/2, z_T + N/2] of length N and calculates the probability distribution over the N candidate frames near the best matching result:

p(z_T = j | x_T, M) = exp(−S_{T,j}) / Σ_{k∈W} exp(−S_{T,k}),  j ∈ W    (8)
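Continuing the sketch, equations (7)-(8) restricted to the sliding window might be written as follows; negating the cost inside the exponent is an assumption consistent with "lower cost, higher probability":

```python
import numpy as np

def window_probabilities(S: np.ndarray, z_T: int, N: int):
    """SoftMax of equations (7)-(8) over the N candidate frames around z_T."""
    lo, hi = max(z_T - N // 2, 0), min(z_T + N // 2 + 1, len(S))
    idx = np.arange(lo, hi)
    w = np.exp(-(S[idx] - S[idx].min()))   # shift for numerical stability
    return idx, w / w.sum()
```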
after converting the cost value into a probability, the present solution further assumes that this probability conforms to a Gaussian distribution, i.e., p (z) T |x T M) to N (. Mu.o,. Sigma.o). This gaussian distribution is estimated using the most computationally efficient least squares method. Assuming that the resulting probability distribution is a sample of a Gaussian distribution, use (x) i ,y i ) (i =1,2, ·, n), then there is the following relationship:
Figure BDA0003700508360000122
here, gaussian distribution has a particularly good characteristic, that is, it is an exponential function of e, and it can be logarithmized to convert the form of exponent into linear relation, and then solved as a linear equation set, so that the estimation process of the scheme can be greatly simplified:
Figure BDA0003700508360000123
here, a variable substitution is made, let ln (y) i )=z i
Figure BDA0003700508360000124
Figure BDA0003700508360000125
Then publicThe equation becomes the following linear equation form:
Figure BDA0003700508360000126
this linear equation can obtain an estimate of B according to the least squares method:
B=(X T X) -1 X T Z (12)
thus, the mean μ and variance σ as well as the parameter θ can be obtained from the matrix B.
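A sketch of the log-linear least-squares fit of equations (9)-(12); the clipping guards are illustrative assumptions:

```python
import numpy as np

def fit_gaussian_loglinear(x: np.ndarray, p: np.ndarray):
    """Recover (mu, sigma) from samples (x_i, p_i) via equations (9)-(12)."""
    z = np.log(np.clip(p, 1e-12, None))             # z_i = ln(y_i)
    X = np.stack([np.ones_like(x), x, x * x], axis=1)
    b, *_ = np.linalg.lstsq(X, z, rcond=None)       # B = (X^T X)^{-1} X^T Z
    b2 = min(b[2], -1e-12)                          # parabola must open downward
    sigma2 = -0.5 / b2                              # from b_2 = -1/(2 sigma^2)
    mu = b[1] * sigma2                              # from b_1 = mu / sigma^2
    return float(mu), float(np.sqrt(sigma2))
```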
3. Kalman filtering - Markov positioning under Gaussian distribution
Markov positioning models some of the reasoning behaviors of humans in the positioning process. In navigation and positioning, a human estimates the current state not only from the current observation but also in combination with memory, i.e. prior information. The present scheme expresses this process in terms of a hidden Markov chain, as shown in fig. 4.
The Markov positioning process is divided into two sub-processes, prediction and updating. The prediction process estimates the agent's state at the next moment from the state at the previous moment and a state transition model; the update process uses the predicted state confidence and the observed likelihood probability distribution to update the agent's confidence in its own state at the current moment.
It is assumed here that both the state transition and the observation are linear relations with added Gaussian-distributed noise, i.e. a Gaussian linear system; the present scheme therefore adopts the Kalman filtering method to recursively calculate the agent's confidence in its own state.
3.1 prediction
Step S4 is performed next: the confidence of the current state is determined based on the current likelihood probability distribution and the prediction confidence of the previous state. In some embodiments, step S4 specifically includes: based on the current likelihood probability distribution, recursively updating the prediction confidence of the previous state by the Kalman filtering method to obtain the confidence of the current state.
First, the state x_i of the agent is represented by the index value i of the picture in the map M = {Image_i}, and the control command u_i of the agent is described by the estimated speed v_esti. The kinetic equation describing the state transition is then:

x_i = x_{i−1} + u_i + ε_i,  ε_i ~ N(0, σ_u)    (13)
This is a linear relationship with an added Gaussian noise contribution, and the prediction process is generally described by the following equation:

bel_pred(x_i) = ∫ p(x_i | x_{i−1}, u_i, M) bel(x_{i−1}) dx_{i−1}    (14)

where bel_pred(x_i) is the prediction confidence, corresponding to bel_pred(x_{t+1}) in FIG. 2. As previously mentioned, the probability distribution of state transitions p(x_i | x_{i−1}, u_i, M) is a Gaussian distribution, and the confidence bel(x_{i−1}) ~ N(μ_{i−1}, σ_{i−1}) is also a Gaussian distribution, so the scheme can be computed recursively by the Kalman filtering method; the predicted confidence bel_pred(x_i) ~ N(μ̄_i, σ̄_i) is likewise a Gaussian distribution, whose parameters can be calculated using the following equation:

μ̄_i = μ_{i−1} + u_i,  σ̄_i² = σ_{i−1}² + σ_u²    (15)
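In this scalar Gaussian linear system, the prediction step of equations (13)-(15) reduces to two lines; the motion-noise variance q is an assumed tuning parameter:

```python
def kf_predict(mu: float, sigma2: float, v_esti: float, q: float):
    """Equations (13)-(15): advance the map index by the estimated speed."""
    return mu + v_esti, sigma2 + q
```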
3.2 updating
Step S5 is performed next, and the probability distribution of the state transition is determined based on the current velocity value. In some embodiments, step S5 specifically includes: assuming that the probability distribution of the state transition is Gaussian distribution; and solving a kinetic equation describing the state transition and an equation describing the prediction process based on the current speed value to obtain the probability distribution of the state transition.
And finally, executing a step S6, and determining the prediction confidence of the next state based on the probability distribution of the state transition and the confidence of the current state. In some embodiments, step S6 specifically includes: and based on the probability distribution of state transition, carrying out recursive calculation on the confidence coefficient of the current state by a Kalman filtering method to obtain the prediction confidence coefficient of the next state.
The problem to be solved by the confidence update can be described by the following equation:

bel(x_i) = η p(z_i | x_i, M) bel_pred(x_i)    (16)

where η is a normalization constant. As introduced in the previous section, the likelihood probability distribution observed by the agent, p(z_i | x_i, M) ~ N(μ_o, σ_o), is also a Gaussian probability distribution; its variance is estimated by the least-squares method after the sequence cost values are mapped to probabilities by the SoftMax equation. The observation model can then be described by the following equation:

z_i = x_i + δ_i,  δ_i ~ N(0, σ_o)    (17)
Here, the observed state is also a Gaussian linear system, since the scheme defines the observation result as the best-matching index with Gaussian noise added. The scheme therefore cannot take the mean to which the observation result is fitted as the mean of the probability distribution; that would be meaningless. The fitted variance, however, is an important indicator, because the variance represents the trustworthiness of the observation: if the observation is very trustworthy, the probability distribution of the observation should be very concentrated and the variance smaller; if the observation is untrustworthy, the probability distribution of the observation is more dispersed and the variance larger. The variance resulting from this fit therefore largely describes the confidence level of the match. Relying on the observation result alone, however, is far from sufficient, because the scheme cannot consider only the situation near the best observation result but must also consider jumps in the observation; if an observation differs too much from the last observation or from the predicted result, the observation is considered unreliable. Based on the above discussion, the scheme designs the observed variance as follows:

σ_{o,i}² = σ_o² + (μ̄_i − z_i)²    (18)
Here, the square of the difference between the predicted result and the observed result is used as a penalty term (penalty factor) to evaluate whether the observation is reliable, so as to prevent observation jumps from influencing the final recognition result. Since the likelihood probability distribution obtained from the observation is assumed to be a Gaussian distribution, the updated confidence parameters μ_i and σ_i can be calculated according to Kalman filtering:

K_i = σ̄_i² / (σ̄_i² + σ_{o,i}²),  μ_i = μ̄_i + K_i (z_i − μ̄_i),  σ_i² = (1 − K_i) σ̄_i²    (19)
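Likewise, the update of equations (16)-(19), with the squared prediction-observation gap added as the penalty term of equation (18), might be sketched as:

```python
def kf_update(mu_pred: float, sigma2_pred: float, z: float, sigma2_fit: float):
    """Equations (16)-(19): fuse prediction and observation."""
    sigma2_obs = sigma2_fit + (mu_pred - z) ** 2      # equation (18) penalty
    K = sigma2_pred / (sigma2_pred + sigma2_obs)      # Kalman gain
    mu = mu_pred + K * (z - mu_pred)
    return mu, (1.0 - K) * sigma2_pred
```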
After step S4 is executed to determine the confidence of the current state, the following step is also executed: determining the recognition result of the current observation image according to the confidence of the current state. Thus, the current image of the agent is finally identified as I_{μ_i} ∈ M.
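Chaining the sketches above gives an illustrative end-to-end recursion over the agent's image stream (FIG. 2); all parameter values, the diffuse initialization, and the use of only the fitted variance are assumptions:

```python
import numpy as np

def recognize_stream(D: np.ndarray, ds: int = 10, N: int = 11, q: float = 1.0):
    """Recognized map index for each time step T, following FIG. 2."""
    mu_pred, sigma2_pred = 0.0, 1e6        # diffuse initial prediction (assumed)
    matches = []
    for T in range(ds - 1, D.shape[1]):
        z_T, v_esti = estimate_speed(D, T, ds)
        S = sequence_costs(D, T, ds, v_esti)
        idx, p = window_probabilities(S, z_T, N)
        # Only the fitted variance is kept; per the discussion above, the
        # best match z_T itself serves as the observation.
        _, sigma_o = fit_gaussian_loglinear(idx.astype(float), p)
        mu, sigma2 = kf_update(mu_pred, sigma2_pred, float(z_T), sigma_o ** 2)
        matches.append(int(round(mu)))     # recognized as I_{mu_i} in M
        mu_pred, sigma2_pred = kf_predict(mu, sigma2, v_esti, q)
    return matches
```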
In summary, the key points of the present invention are: 1. normalization of the deep learning descriptors of the continuous sequence images and the database images to obtain a cost matrix; 2. the definition of sequence cost matching; 3. a method of estimating robot speed based on the cosine distance; 4. a way of converting image feature similarities into probabilistic matches; 5. a Markov prediction and update process that imitates human memory and reasoning, based on the probabilistic statistical characteristics; 6. the design of a variance penalty term in the prediction, to deal with instability and other unexpected behavior that may occur in the descriptors.
The embodiment of the present application further provides a device for recognizing a sequence image scene based on deep learning, where the device includes:
the acquisition module is used for acquiring the current observation image and the image sequence of the map;
the extraction module is used for extracting the descriptor of the image;
the cost value calculation module is used for calculating the sequence cost value between the current observation image and the map according to the extracted descriptor;
the estimation module is used for estimating a current speed value based on the sequence cost value;
a first probability determination module for determining a current likelihood probability distribution;
the updating module is used for determining the confidence of the current state based on the current likelihood probability distribution and the prediction confidence of the previous state;
the second probability determination module is used for determining the probability distribution of the state transition based on the current speed value;
and the prediction module is used for determining the prediction confidence of the next state based on the probability distribution of the state transition and the confidence of the current state.
In some embodiments, the extraction module is specifically configured to: processing the image through a convolutional neural network model, and taking the output of a specified layer as a descriptor; the convolutional neural network model is a model trained through a scene classification data set; the descriptor is subjected to normalization processing.
In some embodiments, the cost value calculation module is specifically configured to:
record the current set of images observed by the agent as: Q = [I_1, I_2, ..., I_j, ..., I_n]; and record the image sequence of the map as: M = [I_1, I_2, ..., I_i, ..., I_m];
the sequence cost value between image I_i and image I_j is the value of the cosine distance between the descriptor of image I_i and the descriptor of image I_j, where I_i ∈ M and I_j ∈ Q.
In some embodiments, the estimation module is specifically configured to: determine a search sequence of a preset length by taking the current time as a reference; sum the sequence cost values over the whole search sequence as the matching degree between the current observation image and the map; select the search sequence z_T with the highest matching degree by adjusting the slope v of the search sequence; and determine the slope v corresponding to the search sequence z_T as the current speed value of the agent.
In some embodiments, the first probability determination module is specifically configured to: select N adjacent candidate sequences by taking the search sequence z_T as a reference, where N is a preset value; calculate the probability distribution over the N candidate sequences; and, assuming the obtained probability distribution is a sample of a Gaussian distribution, estimate the parameters of the Gaussian distribution based on it and take the estimated Gaussian distribution as the current likelihood probability distribution.
In some embodiments, the update module is specifically configured to: based on the current likelihood probability distribution, recursively update the prediction confidence of the previous state by the Kalman filtering method to obtain the confidence of the current state.
In some embodiments, the scene recognition apparatus further includes a determining module, specifically configured to: after the confidence of the current state is determined, the recognition result of the current observed image is determined according to the confidence of the current state.
In some embodiments, the second probability determination module is specifically configured to: assuming that the probability distribution of the state transition is Gaussian distribution; and solving a kinetic equation describing the state transition and an equation describing the prediction process based on the current speed value to obtain the probability distribution of the state transition.
In some embodiments, the prediction module is specifically configured to: based on the probability distribution of state transitions, recursively update the confidence of the current state by the Kalman filtering method to obtain the prediction confidence of the next state.
With regard to the apparatus in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and is not detailed again here. The modules in the scene recognition device can be wholly or partially implemented in software, hardware, or a combination thereof. The modules can be embedded, in hardware form, in a processor of the computer device or be independent of it, or be stored, in software form, in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to each module.
It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.
It should be noted that, in the description of the present application, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present application, the meaning of "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried out in the method of implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (10)

1. A sequence image scene recognition method based on deep learning is characterized by comprising the following steps:
acquiring an image sequence of a current observation image and a map, and extracting a descriptor of the image;
calculating a sequence cost value between the current observation image and the map according to the extracted descriptors;
estimating a current speed value based on the sequence cost value and determining a current likelihood probability distribution;
determining the confidence of the current state based on the current likelihood probability distribution and the prediction confidence of the previous state;
determining a probability distribution of state transitions based on the current velocity value;
based on the probability distribution of the state transition and the confidence of the current state, a prediction confidence of the next state is determined.
2. The method of claim 1, wherein extracting the descriptor of the image comprises:
processing the image through a convolutional neural network model, and taking the output of a specified layer as a descriptor; wherein the convolutional neural network model is a model trained from a scene-classified data set;
the descriptor is subjected to normalization processing.
3. The method of claim 1, wherein calculating a sequence cost value between a current observation image and a map comprises:
the current set of images observed by the agent is recorded as: Q = [I_1, I_2, ..., I_j, ..., I_n]; the image sequence of the map is recorded as: M = [I_1, I_2, ..., I_i, ..., I_m];
the sequence cost value between image I_i and image I_j is the value of the cosine distance between the descriptor of image I_i and the descriptor of image I_j; wherein I_i ∈ M and I_j ∈ Q.
4. A method according to any of claims 1-3, wherein estimating a current velocity value based on the sequence cost value comprises:
determining a search sequence with the length as the preset length by taking the current time as a reference;
summing the sequence cost values of the whole search sequence to serve as the matching degree between the current observation image and the map;
selecting the search sequence z_T with the highest matching degree by adjusting the slope v of the search sequence;
determining the slope v corresponding to the search sequence z_T as the current speed value of the agent.
5. The method of claim 4, wherein determining the current likelihood probability distribution comprises:
selecting N adjacent candidate sequences by taking the search sequence z_T as a reference; wherein N is a preset value;
calculating probability distribution of N candidate sequences;
assuming that the obtained probability distribution is a sample of gaussian distribution, estimating parameters of the gaussian distribution based on the obtained probability distribution, and taking the estimated gaussian distribution as the current likelihood probability distribution.
6. The method of claim 5, wherein determining the confidence level of the current state based on the current likelihood probability distribution and the prediction confidence level of the previous state comprises:
based on the current likelihood probability distribution, recursively updating the prediction confidence of the previous state by a Kalman filtering method to obtain the confidence of the current state.
7. The method of claim 6, wherein determining the confidence level for the current state further comprises:
and determining the recognition result of the current observed image according to the confidence coefficient of the current state.
8. A method according to any of claims 1-3, wherein determining a probability distribution for a state transition based on a current speed value comprises:
assuming that the probability distribution of the state transition is Gaussian distribution;
and solving a kinetic equation describing the state transition and an equation of a prediction process based on the current speed value to obtain the probability distribution of the state transition.
9. The method of claim 8, wherein determining the confidence of the prediction for the next state based on the probability distribution of the state transition and the confidence of the current state comprises:
based on the probability distribution of state transitions, recursively updating the confidence of the current state by a Kalman filtering method to obtain the prediction confidence of the next state.
10. A sequence image scene recognition apparatus based on deep learning, comprising:
the acquisition module is used for acquiring the current observation image and the image sequence of the map;
the extraction module is used for extracting the descriptor of the image;
the cost value calculation module is used for calculating the sequence cost value between the current observation image and the map according to the extracted descriptor;
the estimation module is used for estimating a current speed value based on the sequence cost value;
a first probability determination module for determining a current likelihood probability distribution;
the updating module is used for determining the confidence of the current state based on the current likelihood probability distribution and the prediction confidence of the previous state;
the second probability determination module is used for determining the probability distribution of the state transition based on the current speed value;
and the prediction module is used for determining the prediction confidence of the next state based on the probability distribution of the state transition and the confidence of the current state.
CN202210692229.0A 2022-06-17 2022-06-17 Sequence image scene recognition method based on deep learning Pending CN115240056A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210692229.0A CN115240056A (en) 2022-06-17 2022-06-17 Sequence image scene recognition method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210692229.0A CN115240056A (en) 2022-06-17 2022-06-17 Sequence image scene recognition method based on deep learning

Publications (1)

Publication Number Publication Date
CN115240056A true CN115240056A (en) 2022-10-25

Family

ID=83669599

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210692229.0A Pending CN115240056A (en) 2022-06-17 2022-06-17 Sequence image scene recognition method based on deep learning

Country Status (1)

Country Link
CN (1) CN115240056A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115744526A (en) * 2023-01-06 2023-03-07 常熟理工学院 Elevator brake action state detection method and system


Similar Documents

Publication Publication Date Title
Hausler et al. Multi-process fusion: Visual place recognition using multiple image processing methods
CN110555390B (en) Pedestrian re-identification method, device and medium based on semi-supervised training mode
US9619561B2 (en) Change invariant scene recognition by an agent
Kwon et al. Visual graph memory with unsupervised representation for visual navigation
JP7131994B2 (en) Self-position estimation device, self-position estimation method, self-position estimation program, learning device, learning method and learning program
US20090110236A1 (en) Method And System For Object Detection And Tracking
CN110969648B (en) 3D target tracking method and system based on point cloud sequence data
CN112489081B (en) Visual target tracking method and device
CN112596064B (en) Laser and vision integrated global positioning method for indoor robot
JP6867054B2 (en) A learning method and a learning device for improving segmentation performance used for detecting a road user event by utilizing a double embedding configuration in a multi-camera system, and a testing method and a testing device using the learning method and a learning device. {LEARNING METHOD AND LEARNING DEVICE FOR IMPROVING SEGMENTATION PERFORMANCE TO BE USED FOR DETECTING ROAD USER EVENTS USING DOUBLE EMBEDDING CONFIGURATION IN MULTI-CAMERA SYSTEM AND TESTING METHOD AND TESTING DEVICE USING THE SAME}
CN110009060B (en) Robustness long-term tracking method based on correlation filtering and target detection
JP6962605B2 (en) A method for providing an object detection system capable of updating the types of detectable classes in real time using continuous learning, and a device using the same.
CN108830171A (en) A kind of Intelligent logistics warehouse guide line visible detection method based on deep learning
KR20220074782A (en) Method and device for simultaneous localization and mapping (slam)
CN115240056A (en) Sequence image scene recognition method based on deep learning
CN111783716A (en) Pedestrian detection method, system and device based on attitude information
CN112733971B (en) Pose determination method, device and equipment of scanning equipment and storage medium
Tsintotas et al. The revisiting problem in simultaneous localization and mapping
Garcia-Fidalgo et al. Probabilistic appearance-based mapping and localization using visual features
CN116958057A (en) Strategy-guided visual loop detection method
Leung et al. Evaluating set measurement likelihoods in random-finite-set slam
CN114332716B (en) Clustering method and device for scenes in video, electronic equipment and storage medium
CN115880332A (en) Target tracking method for low-altitude aircraft visual angle
CN114926652A (en) Twin tracking method and system based on interactive and convergent feature optimization
CN115527083A (en) Image annotation method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination