CN109447096A - Saccade path prediction method and device based on machine learning - Google Patents
- Publication number
- CN109447096A CN201810332835.5A
- Authority
- CN
- China
- Prior art keywords
- information
- described image
- lstm network
- data set
- true value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Landscapes
- Engineering & Computer Science (AREA)
- Quality & Reliability (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The present invention provides a saccade path prediction method and device based on machine learning, relating to the field of computer technology. The method includes: obtaining an image data set to be processed, wherein each piece of image information in the image data set has corresponding ground-truth information; making training samples for the image data set according to the ground-truth information; obtaining an image feature representation of the image information according to the image information; constructing and training an LSTM network according to the image feature representation and the eye-movement data samples; and predicting the saccade path according to the LSTM network. This addresses two technical problems in the prior art: fixation-point prediction relies too heavily on static saliency maps, and saccade path prediction in natural scene pictures is inadequate. The model no longer depends on saliency maps, it takes the temporal order of fixation points into account, and it achieves good results on multiple public data sets.
Description
Technical field
The present invention relates to the technical field of image processing, and more particularly to a saccade path prediction method and device based on machine learning.
Background art
With the rapid development of information technology, mankind has entered an era of large-scale data growth. Digital images and video have become important carriers of information, and massive image data is an important source of information. How to effectively select the most valuable information from an image has increasingly become a focus of the image processing field.
In the prior art, fixation-point prediction relies too heavily on static saliency maps, and the prior art also has many shortcomings in predicting saccade paths in natural scene pictures.
Summary of the invention
The embodiments of the invention provide a saccade path prediction method and device based on machine learning. They solve the technical problems that, in the prior art, fixation-point prediction relies too heavily on static saliency maps and saccade path prediction in natural scene pictures is inadequate. The model no longer depends on saliency maps, it takes the temporal order of fixation points into account, and it achieves good results on multiple public data sets.
In view of the above problems, the embodiments of the present application are proposed to provide a saccade path prediction method and device based on machine learning.
In a first aspect, the present invention provides a saccade path prediction method based on machine learning. The method comprises: obtaining an image data set to be processed, wherein each piece of image information in the image data set has corresponding ground-truth information; making training samples for the image data set according to the ground-truth information; obtaining an image feature representation of the image information according to the image information; constructing and training an LSTM network according to the image feature representation and the eye-movement data samples; and predicting the saccade path according to the LSTM network.
Preferably, making the training samples for the image data set according to the ground-truth information specifically includes: processing the ground-truth information to obtain the eye-movement data of N observers; applying boundary processing to the eye-movement data of the N observers; normalizing the eye-movement data of the N observers after boundary processing; and merging the eye-movement data of the N observers to obtain the training samples, where N is a positive integer.
Preferably, obtaining the image feature representation of the image information specifically includes: obtaining a training set and a test set from the image data set; cropping the training-set images to a standard size; constructing a convolutional neural network and loading pretrained model parameters; and taking the image information as the input of the convolutional neural network and outputting the image feature representation of the image information.
Preferably, constructing and training the LSTM network specifically includes: obtaining the coordinates of the LSTM network and defining a corresponding weight matrix according to the coordinates; taking the image feature representation and the weight matrix corresponding to the coordinates as the input of the LSTM network; applying the input gate, forget gate, and output gate operations to the input by forward propagation; decoding the output of the LSTM network with a deep output layer; and feeding the image feature representation to the LSTM network and training the LSTM network with the back-propagation algorithm.
Preferably, the method further includes: loading the LSTM network and inputting the training samples into the LSTM network; obtaining the output feature vector of the LSTM network by forward propagation; and inputting the output feature vector and the ground-truth information into the LSTM network and obtaining the fixation-point coordinates by forward propagation.
In a second aspect, the present invention provides a saccade path prediction device based on machine learning. The device includes:
A first obtaining unit, configured to obtain an image data set to be processed, wherein each piece of image information in the image data set has corresponding ground-truth information;
A first making unit, configured to make training samples for the image data set according to the ground-truth information;
A second obtaining unit, configured to obtain an image feature representation of the image information according to the image information;
A first construction unit, configured to construct and train an LSTM network according to the image feature representation and the eye-movement data samples;
A first prediction unit, configured to predict the saccade path according to the LSTM network.
Preferably, the device further includes:
A third obtaining unit, configured to process the ground-truth information and obtain the eye-movement data of N observers;
A first processing unit, configured to apply boundary processing to the eye-movement data of the N observers;
A first normalization unit, configured to normalize the eye-movement data of the N observers after boundary processing;
A first merging unit, configured to merge the eye-movement data of the N observers to obtain the training samples.
Preferably, the device further includes:
A fourth obtaining unit, configured to obtain a training set and a test set from the image data set;
A first cropping unit, configured to crop the training-set images to a standard size;
A first construction unit, configured to construct a convolutional neural network and load pretrained model parameters;
A first output unit, configured to take the image information as the input of the convolutional neural network and output the image feature representation of the image information.
Preferably, the device further includes:
A fifth obtaining unit, configured to obtain the coordinates of the LSTM network and define a corresponding weight matrix according to the coordinates;
A first input unit, configured to take the image feature representation and the weight matrix corresponding to the coordinates as the input of the LSTM network;
A first operation unit, configured to apply the input gate, forget gate, and output gate operations to the input by forward propagation;
A first decoding unit, configured to decode the output of the LSTM network with a deep output layer;
A first training unit, configured to feed the image feature representation to the LSTM network and train the LSTM network with the back-propagation algorithm.
Preferably, the device further includes:
A second input unit, configured to load the LSTM network and input the training samples into the LSTM network;
A sixth obtaining unit, configured to obtain the output feature vector of the LSTM network by forward propagation;
A seventh obtaining unit, configured to input the output feature vector and the ground-truth information into the LSTM network and obtain the fixation-point coordinates by forward propagation.
In a third aspect, the present invention provides a saccade path prediction device based on machine learning, including a memory, a processor, and a computer program stored in the memory and runnable on the processor. When executing the program, the processor performs the following steps: obtaining an image data set to be processed, wherein each piece of image information in the image data set has corresponding ground-truth information; making training samples for the image data set according to the ground-truth information; obtaining an image feature representation of the image information according to the image information; constructing and training an LSTM network according to the image feature representation and the eye-movement data samples; and predicting the saccade path according to the LSTM network.
The one or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
1. The saccade path prediction method and device based on machine learning provided by the embodiments of the present application obtain an image data set to be processed, wherein each piece of image information in the image data set has corresponding ground-truth information; make training samples for the image data set according to the ground-truth information; obtain an image feature representation of the image information; construct and train an LSTM network according to the image feature representation and the eye-movement data samples; and predict the saccade path according to the LSTM network. This solves the technical problems that, in the prior art, fixation-point prediction relies too heavily on static saliency maps and saccade path prediction in natural scene pictures is inadequate. The model no longer depends on saliency maps, it takes the temporal order of fixation points into account, and it achieves good results on multiple public data sets.
2. The present invention extracts image features with a convolutional neural network. The strong representation-learning ability and layer-by-layer learning strategy of convolutional neural networks allow higher-level features to be learned, which overcomes the shortcomings of the hand-picked or joint multidimensional feature selection methods used in the prior art and provides good generality and scalability.
3. The present invention estimates the saccade path by constructing an LSTM network, whose structure is well suited to processing time series. The LSTM network is trained by combining the currently input image region with the fixation points generated so far, which simulates the saccade stage of human visual processing and the propagation and prediction of information in the visual cortex. This achieves consistency with the human saccade process at the level of biological mechanism and yields saccade paths consistent with human eye-movement data.
4. The present invention introduces an attention mechanism into the network, so that at each output step the decoder can attend to a different part of the image. The trained model learns which regions of the image to attend to, and this guides the output of the decoding network.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the invention are easier to understand, specific embodiments of the invention are set out below.
Brief description of the drawings
Fig. 1 is a flow diagram of a saccade path prediction method based on machine learning in an embodiment of the present invention;
Fig. 2 is a structure diagram of the convolutional neural network in an embodiment of the present invention;
Fig. 3 is a structure diagram of the LSTM network constructed in an embodiment of the present invention;
Fig. 4 is a structural schematic diagram of a saccade path prediction device based on machine learning in an embodiment of the present invention;
Fig. 5 is a structural schematic diagram of another saccade path prediction device based on machine learning in an embodiment of the present invention.
Explanation of reference numerals: bus 300, receiver 301, processor 302, transmitter 303, memory 304, bus interface 306.
Detailed description of the embodiments
The embodiments of the invention provide a saccade path prediction method and device based on machine learning, to solve the technical problems that, in the prior art, fixation-point prediction relies too heavily on static saliency maps and saccade path prediction in natural scene pictures is inadequate. The general idea of the technical solution provided by the invention is as follows:
In the technical solution of the embodiments of the present invention, an image data set to be processed is obtained, wherein each piece of image information in the image data set has corresponding ground-truth information; training samples for the image data set are made according to the ground-truth information; an image feature representation of the image information is obtained according to the image information; an LSTM network is constructed and trained according to the image feature representation and the eye-movement data samples; and the saccade path is predicted according to the LSTM network. The model no longer depends on saliency maps, it takes the temporal order of fixation points into account, and it achieves good results on multiple public data sets.
The technical solution of the present invention is described in detail below with reference to the drawings and specific embodiments. It should be understood that the embodiments of the present application and the specific features in them are a detailed description of the technical solution of the present application rather than a limitation of it, and that, where no conflict arises, the embodiments of the present application and the technical features in them can be combined with each other.
The term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
To disclose the saccade path prediction method based on machine learning provided by the embodiments of the present application more clearly, some terms are described below.
A convolutional neural network (Convolutional Neural Network, CNN) is a feedforward neural network whose artificial neurons respond to surrounding units within a limited coverage area, and it performs outstandingly in large-scale image processing. It includes convolutional layers and pooling layers.
LSTM (Long Short-Term Memory) is an improved recurrent neural network; the paper describing it was first published in 1997. Owing to its distinctive design, LSTM is suited to processing and predicting important events that are separated by very long intervals and delays in a time series.
TensorFlow is the second-generation artificial intelligence learning system developed by Google on the basis of DistBelief, and its name comes from its own operating principle. Tensor means an N-dimensional array, and Flow means computation based on a data flow graph; TensorFlow describes the process of tensors flowing from one end of the flow graph to the other. TensorFlow is a system that passes complex data structures into artificial neural networks for analysis and processing.
BasicLSTMCell is the basic LSTM recurrent network cell.
Embodiment 1
Fig. 1 is a flow diagram of a saccade path prediction method based on machine learning in an embodiment of the present invention. As shown in Fig. 1, the method comprises:
Step 110: obtaining an image data set to be processed, wherein each piece of image information in the image data set has corresponding ground-truth information;
Specifically, the image data set to be processed refers to a set of multiple pictures to be processed, and the corresponding ground-truth information refers to the fixation-point coordinates of the corresponding image, which serve as labels.
Step 120: making the training samples for the image data set according to the ground-truth information;
Further, making the training samples for the image data set according to the ground-truth information specifically includes: processing the ground-truth information to obtain the eye-movement data of N observers; applying boundary processing to the eye-movement data of the N observers; normalizing the eye-movement data of the N observers after boundary processing; and merging the eye-movement data of the N observers to obtain the training samples, where N is a positive integer.
Specifically, regarding the processing of the ground truth: each image data set has corresponding eye-movement data, and each picture has the eye-movement data of N observers. Boundary processing is applied to the eye-movement data so that every point outside the image is treated as a point on the image boundary. Each time, the corresponding eye-movement data are selected according to the dictionary and normalized, and the data of the N observers are merged to obtain the training sequences. Each sequence here consists of 8 fixation-point coordinates; expressed as one-dimensional data, a sequence contains 16 numbers.
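The boundary processing, normalization, and merging described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function names, the division by (width - 1) and (height - 1), and the choice to skip observers with fewer than 8 fixations are all assumptions.

```python
def clip_to_image(points, width, height):
    """Boundary processing: move fixations outside the image onto its boundary."""
    return [(min(max(x, 0.0), width - 1.0), min(max(y, 0.0), height - 1.0))
            for x, y in points]


def normalize(points, width, height):
    """Scale pixel coordinates to [0, 1] (assumed normalization scheme)."""
    return [(x / (width - 1.0), y / (height - 1.0)) for x, y in points]


def to_sequence(points, length=8):
    """Keep the first `length` fixations and flatten them to 2 * length numbers."""
    flat = []
    for x, y in points[:length]:
        flat.extend([x, y])
    return flat


def build_training_sequences(per_observer, width, height, length=8):
    """Merge the fixations of all observers of one picture into training sequences."""
    sequences = []
    for fixations in per_observer:
        if len(fixations) < length:
            continue  # assumption: observers with too few fixations are skipped
        points = normalize(clip_to_image(fixations, width, height), width, height)
        sequences.append(to_sequence(points, length))
    return sequences
```

With length=8, every returned sequence holds 16 numbers, matching the one-dimensional representation described in the text.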
Each data set is tested several times. For example, the MIT1003 data set has 1003 pictures, and each picture has 15 observers. The picture serial numbers are mapped to the numbers 0~1003 to form a dictionary. In each experiment, 900 pictures are selected in order from the dictionary, for example 0~900 for training and 900~1003 for testing. The ground truth is then selected from the ground-truth data in the same way as the pictures, so the eye-movement data used as labels amount to 900 × 15 = 13500 records. Every experiment makes its training samples and labels in this way.
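The dictionary-based split for MIT1003 can be sketched as follows. The file names and helper functions are hypothetical; only the counts (1003 pictures, 900 for training, 15 observers per picture) come from the text.

```python
def build_dictionary(picture_names):
    """Map consecutive integers to picture names."""
    return {index: name for index, name in enumerate(sorted(picture_names))}


def split_by_index(dictionary, n_train):
    """Take the first n_train pictures for training and the rest for testing."""
    indices = sorted(dictionary)
    train = [dictionary[i] for i in indices[:n_train]]
    test = [dictionary[i] for i in indices[n_train:]]
    return train, test


names = ["img_%04d.jpeg" % i for i in range(1003)]  # hypothetical file names
dictionary = build_dictionary(names)
train, test = split_by_index(dictionary, 900)
n_labels = len(train) * 15  # 15 observers per picture
print(len(train), len(test), n_labels)  # 900 103 13500
```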
Step 130: obtaining the image feature representation of the image information according to the image information;
Further, obtaining the image feature representation of the image information specifically includes: obtaining a training set and a test set from the image data set; cropping the training-set images to a standard size; constructing a convolutional neural network and loading pretrained model parameters; and taking the image information as the input of the convolutional neural network and outputting the image feature representation of the image information.
Specifically, the picture names in the image data set are mapped to numbers, and each time certain pictures are chosen by number as the training set and the test set. The structure of the convolutional neural network is described further below with reference to Fig. 2. The saccade path model established by the invention mainly comprises three parts: an encoding network, a decoding network, and an output layer. The encoding network is a convolutional neural network made up of five parts: the first part is two convolutional layers; the second part is two convolutional layers; the third part is four convolutional layers; the fourth part is four convolutional layers; and the fifth part is four convolutional layers. Each convolutional layer includes a convolution operation and a pooling operation, the convolution kernels are 3 × 3, and the activation function of all convolutional layers is the rectified linear function. The number of convolution kernels is 64 in the first convolutional layer, 128 in the second, 256 in the third, and 512 in the last two. Extracting image features with a convolutional neural network exploits its strong representation-learning ability and layer-by-layer learning strategy to learn higher-level features, which overcomes the shortcomings of the hand-picked or joint multidimensional feature selection methods used in the prior art and provides good generality and scalability.
The embodiment of the present application uses the pretrained VGG19 model parameters. Because the VGG19 network was trained on the large ImageNet data set, it can extract image features accurately; we write a function that loads the VGG19 model parameters to perform feature extraction. The image is the input of the convolutional neural network, with size 224 × 224 × 3, and the output is the set of feature vectors of the image, $\alpha = \{\alpha_1, \ldots, \alpha_L\},\ \alpha_i \in \mathbb{R}^D$, where L = 196 and D = 512. For each picture the network extracts L vectors, each corresponding to one region of the image.
Step 140: constructing and training the LSTM network according to the image feature representation and the eye-movement data samples;
Further, constructing and training the LSTM network specifically includes: obtaining the coordinates of the LSTM network and defining a corresponding weight matrix according to the coordinates; taking the image feature representation and the weight matrix corresponding to the coordinates as the input of the LSTM network; applying the input gate, forget gate, and output gate operations to the input by forward propagation; decoding the output of the LSTM network with a deep output layer; and feeding the image feature representation to the LSTM network and training the LSTM network with the back-propagation algorithm.
Specifically, the structure of the LSTM network is described further below with reference to Fig. 3. The LSTM network is composed of BasicLSTMCell units from TensorFlow, and the number of units H is 1024. We define the coordinates generated by the model as x, y; $y_i$ is a vector of dimension 1 × K, where K is the size of the coordinate library, and C is the length of the generated sequence. C = 8 in the experiments, that is, eight fixation points are generated for each image.
$$y = \{y_1, \ldots, y_C\},\quad y_i \in \mathbb{R}^K$$
In Fig. 2, $i_t$ is the input gate, $f_t$ is the forget gate, $o_t$ is the output gate, and $g_t$ is the candidate vector controlled by the input gate. $h_{t-1}$ is the hidden-layer state at the previous time step, $z_t$ is the context vector at time t, and $E y_{t-1}$ is the embedding vector obtained by passing the output at time t-1 through the embedding matrix E. The embedding matrix E is obtained by passing the total weight matrix corresponding to x, y and the ground-truth coordinates to the function embedding_lookup(params, ids), which returns the weight matrix corresponding to x, y. The present invention introduces an attention mechanism into the network, so that at each output step the decoder can attend to a different part of the image. The trained model learns which regions of the image to attend to, and this guides the output of the decoding network.
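The text does not spell out how the context vector $z_t$ is computed. One common choice, assumed here purely for illustration, is soft attention: a score per image region is turned into weights by a softmax, and $z_t$ is the weighted sum of the region features. In a trained model the scores would come from a learned function of the region features and the previous hidden state; here they are passed in directly.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_vector(features, scores):
    """Soft attention (assumed form): z = sum_i w_i * alpha_i, w = softmax(scores)."""
    weights = softmax(scores)
    dim = len(features[0])
    z = [0.0] * dim
    for w, feat in zip(weights, features):
        for d in range(dim):
            z[d] += w * feat[d]
    return z, weights
```

For example, two regions with equal scores receive weights of 0.5 each, so the context vector is the plain average of their features.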
The forward propagation of the LSTM network proceeds as follows:
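For reference, a standard LSTM step written with the symbols defined above ($i_t$, $f_t$, $o_t$, $g_t$, $h_{t-1}$, $z_t$, $E y_{t-1}$) takes the following form. This is the usual textbook formulation under the assumption that the gates act on the concatenation of the embedded previous output, the previous hidden state, and the context vector; the weight matrices $W_\ast$ and biases $b_\ast$ are names introduced here for illustration, and the patent's own gate formulas may differ in detail.

$$
\begin{aligned}
u_t &= \begin{pmatrix} E y_{t-1} \\ h_{t-1} \\ z_t \end{pmatrix} \\
i_t &= \sigma(W_i u_t + b_i) \\
f_t &= \sigma(W_f u_t + b_f) \\
o_t &= \sigma(W_o u_t + b_o) \\
g_t &= \tanh(W_g u_t + b_g) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

Here $\sigma$ is the logistic sigmoid and $\odot$ denotes element-wise multiplication.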
Given the hidden-layer state $h_t$, the cell state $c_t$, and the context vector $z_t$, the output of the LSTM network can be computed with the following formula:
$$p(y_t \mid \alpha, y_1, \ldots, y_{t-1}) \propto \exp\left(L_o\left(E y_{t-1} + L_h h_t + L_z z_t\right)\right)$$
This is implemented by a deep output layer consisting of two neural network layers. The first layer applies dropout to the hidden layer and then obtains the output h_logits by logistic regression; the context information and the previously generated coordinate information are both added to h_logits, the tanh activation function is applied, and dropout is applied again. The second layer takes the output of the first layer and obtains the output out_logits by logistic regression. Dropout means that, during the training of a neural network, certain neurons are temporarily dropped from the network with a certain probability in each iteration. The dropped nodes are temporarily not regarded as part of the network structure, but their weights are retained.
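A minimal sketch of dropout is given below. It uses the "inverted" variant, in which the kept units are rescaled by 1 / keep_prob so that the expected activation is unchanged; the rescaling is an assumption not stated in the text, and the function name and the explicit random generator argument are illustrative.

```python
import random


def dropout(values, keep_prob, rng):
    """Zero each unit with probability 1 - keep_prob and rescale the kept units."""
    out = []
    for v in values:
        if rng.random() < keep_prob:
            out.append(v / keep_prob)  # kept unit, rescaled (inverted dropout)
        else:
            out.append(0.0)  # dropped for this iteration; its weights are untouched
    return out


rng = random.Random(0)
hidden = [1.0, 2.0, 3.0, 4.0]
dropped = dropout(hidden, 0.5, rng)  # each entry is either 0.0 or twice the input
```

Dropout is applied only during training; at prediction time all units are kept.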
Further, before training of the LSTM neural network starts, all weight parameters of the LSTM network are randomly initialized to numbers close to 0 and all biases are initialized to 0. The initial hidden-layer state h and cell state c are obtained from two independent multi-layer perceptrons: the average of the features of all image regions is taken as the input of each perceptron, which yields the initial values of the hidden-layer state and the cell state:
$$h_0 = f_{\text{init},h}\left(\frac{1}{L}\sum_{i=1}^{L}\alpha_i\right),\qquad c_0 = f_{\text{init},c}\left(\frac{1}{L}\sum_{i=1}^{L}\alpha_i\right)$$
where $f_{\text{init},h}$ and $f_{\text{init},c}$ denote the two multi-layer perceptrons.
The training sample data are randomly divided into several smaller batches; the batch size chosen in the experiments of the invention is 25. In each training iteration a batch of image feature vectors and ground-truth data is input, and the cost of the LSTM network is calculated according to the following formula:
$$l_{ti} = -\left[y_{ti} \ln a_{ti} + (1 - y_{ti}) \ln(1 - a_{ti})\right]$$
where $y_{ti}$ is the ideal (target) output value, $a_{ti}$ is the actual output value of the network, and $l_{ti}$ is the loss value of the i-th sample at time t. When the network trains on N samples at a time, the loss values of the N samples are summed at each time t to obtain $loss_t$, the loss of all samples at time t. The loss values of all training time steps t are then summed to obtain the loss of the N samples.
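The per-sample loss and its summation over samples and time steps can be sketched as follows. The sketch assumes that y is the target value and a is the network's output in (0, 1); the clamping constant eps and the function names are illustrative additions that guard against taking the logarithm of 0.

```python
import math


def cross_entropy(y, a, eps=1e-12):
    """l = -[y ln a + (1 - y) ln(1 - a)] for target y and network output a."""
    a = min(max(a, eps), 1.0 - eps)  # keep a strictly inside (0, 1)
    return -(y * math.log(a) + (1.0 - y) * math.log(1.0 - a))


def total_loss(targets, outputs):
    """targets[t][i] and outputs[t][i] refer to time step t and sample i."""
    loss_per_step = [sum(cross_entropy(y, a) for y, a in zip(ys, preds))
                     for ys, preds in zip(targets, outputs)]
    return sum(loss_per_step), loss_per_step
```

The second return value holds the per-time-step sums, and the first is their total over all time steps.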
Based on this cost, the cost function of the LSTM network is optimized with RMSProp, a gradient-descent optimization algorithm, and the model parameters of the LSTM network are updated layer by layer with the back-propagation algorithm. The LSTM network is trained until it converges, and the network model and parameters of the LSTM network are then persisted. The saccade path is estimated by constructing an LSTM network, whose structure is well suited to processing time series. The LSTM network is trained by combining the currently input image region with the fixation points generated so far, which simulates the saccade stage of human visual processing and the propagation and prediction of information in the visual cortex. This achieves consistency with the human saccade process at the level of biological mechanism and yields saccade paths consistent with human eye-movement data.
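For illustration, one RMSProp update step can be written as below. The hyperparameter values (learning rate, decay, epsilon) are common defaults chosen here as assumptions; the text does not give the values used in the experiments.

```python
import math


def rmsprop_step(params, grads, cache, lr=0.001, decay=0.9, eps=1e-8):
    """One RMSProp update: scale each gradient by a running RMS of past gradients."""
    new_params, new_cache = [], []
    for p, g, c in zip(params, grads, cache):
        c = decay * c + (1.0 - decay) * g * g  # running average of squared gradients
        new_cache.append(c)
        new_params.append(p - lr * g / (math.sqrt(c) + eps))
    return new_params, new_cache


# Usage: minimize f(x) = x^2, whose gradient is 2x.
x, cache = [1.0], [0.0]
for _ in range(2000):
    x, cache = rmsprop_step(x, [2.0 * x[0]], cache, lr=0.01)
```

In the method itself the gradients would come from back-propagation through the LSTM network rather than from a closed-form expression.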
Step 150: predicting the saccade path according to the LSTM network.
Further, the LSTM network is loaded and the training samples are input into the LSTM network; the output feature vector of the LSTM network is obtained by forward propagation; and the output feature vector and the ground-truth information are input into the LSTM network, and the fixation-point coordinates are obtained by forward propagation.
Specifically, the LSTM network is loaded, the prepared training sample pictures are input into the LSTM network, and the output feature vector of the LSTM network is computed with the forward propagation algorithm. The feature vector and the sample ground truth are then input into the LSTM network, and the fixation-point coordinates are obtained with the forward propagation algorithm.
Embodiment 2
The effect of the present invention is described further below with reference to simulation experiments.
1. Simulation conditions:
In the simulation experiments of the invention, the operating system is Ubuntu 16.04, the machine learning framework is TensorFlow version 1.1.0, and the Python version is 2.7. The embedding matrix has dimensions V × M, where V is adjusted according to the data set and M is 512; C is 16, representing 8 fixation points.
2. Simulation content:
In the simulation experiments of the present invention, picture names are mapped to Arabic numerals to form a dictionary. An experiment is designed for each data set: training set pictures and test set pictures are chosen by number, and the corresponding eye movement data set is processed to obtain the labels. The LSTM network is trained on the samples using RMSProp, a gradient descent optimization algorithm, and training stops when the cost of the LSTM network converges. Using only the simulation method of the present invention, the trained LSTM network estimates the scanpath of a picture and is tested on the test set samples; each data set has about 100 test samples.
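The training rule described above (RMSProp updates, stopping once the cost converges) can be sketched in plain Python. This is only an illustrative sketch on a toy cost function, not the patent's TensorFlow training code; the function name `rmsprop_minimize` and the hyperparameters `lr`, `decay`, `eps`, `tol`, and `max_steps` are assumptions chosen for the example.

```python
import math

def rmsprop_minimize(grad_fn, cost_fn, w, lr=0.01, decay=0.9, eps=1e-8,
                     tol=1e-6, max_steps=10000):
    """Minimize cost_fn with RMSProp; stop when the cost change falls below tol.

    All hyperparameter values are illustrative assumptions, not patent values.
    """
    w = list(w)
    cache = [0.0] * len(w)          # running average of squared gradients
    prev_cost = cost_fn(w)
    cost = prev_cost
    for step in range(1, max_steps + 1):
        g = grad_fn(w)
        for i in range(len(w)):
            cache[i] = decay * cache[i] + (1.0 - decay) * g[i] ** 2
            # Scale each step by the root of the running squared gradient.
            w[i] -= lr * g[i] / (math.sqrt(cache[i]) + eps)
        cost = cost_fn(w)
        if abs(prev_cost - cost) < tol:   # cost has (approximately) converged
            return w, cost, step
        prev_cost = cost
    return w, cost, max_steps

# Toy usage: minimize (w - 3)^2 starting from w = 0.
w_opt, final_cost, steps = rmsprop_minimize(
    lambda w: [2.0 * (w[0] - 3.0)],
    lambda w: (w[0] - 3.0) ** 2,
    [0.0],
)
```

In a real setting the gradient would come from back-propagation through the LSTM network rather than from a closed-form expression.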
3. Analysis of simulation results:
The estimated scanpath contains 8 fixation point coordinates. The method is evaluated with three metrics: HD (Hausdorff distance), MMD (mean minimal distance), and SS (Sequence Score). The first two measure the similarity between two sequences; the smaller the distance, the more similar the two sequences. SS describes a sequence in terms of the fixation point positions, the direction and distance of fixation point movement, and the order of the saccades; the closer its value is to 1, the higher the similarity between the sequences.
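The two distance metrics can be illustrated with a short Python sketch. The Hausdorff distance below follows its standard definition; for MMD the text does not give a formula, so the symmetric average of per-point minimal distances used here is an assumed form. The function names are hypothetical, and scanpaths are assumed to be lists of (x, y) tuples.

```python
import math

def _dist(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def _min_dists(a, b):
    """For each point in a, the distance to its nearest point in b."""
    return [min(_dist(p, q) for q in b) for p in a]

def hausdorff_distance(a, b):
    """Standard symmetric Hausdorff distance between two point sequences."""
    return max(max(_min_dists(a, b)), max(_min_dists(b, a)))

def mean_minimal_distance(a, b):
    """Assumed MMD: average of the two directed mean nearest-point distances."""
    ab = _min_dists(a, b)
    ba = _min_dists(b, a)
    return 0.5 * (sum(ab) / len(ab) + sum(ba) / len(ba))

# Toy usage with two short scanpaths.
predicted = [(0.0, 0.0), (1.0, 0.0)]
human = [(0.0, 0.0), (1.0, 3.0)]
hd = hausdorff_distance(predicted, human)
mmd = mean_minimal_distance(predicted, human)
```

Both functions return 0 for identical scanpaths and grow as the two sets of fixation points drift apart, which matches the "smaller is more similar" reading in the text.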
On HD and MMD, the scanpaths estimated by the model score lower than those of classical scanpath algorithms and are closer to the curve computed from real human eye data; on SS, they score higher than the classical algorithms and are closer to the ground truth, so the effect is better.
Embodiment 3
Based on the same inventive concept as the machine-learning-based scanpath prediction method in the foregoing embodiments, the present invention further provides a machine-learning-based scanpath prediction device, as shown in Fig. 4, comprising:
a first obtaining unit, configured to obtain an image data set to be processed, wherein each piece of image information in the image data set has corresponding ground-truth information;
a first making unit, configured to make the training samples of the image data set according to the ground-truth information;
a second obtaining unit, configured to obtain the image feature representation information of the image information according to the image information;
a first construction unit, configured to construct and train an LSTM network according to the image feature representation information and the eye movement data samples;
a first prediction unit, configured to predict the scanpath according to the LSTM network.
Further, the device also comprises:
a third obtaining unit, configured to process the ground-truth information and obtain the eye movement data information of N observers;
a first processing unit, configured to perform boundary processing on the eye movement data of the N observers;
a first normalization unit, configured to normalize the eye movement data of the N observers after the boundary processing;
a first merging unit, configured to merge the eye movement data of the N observers to obtain the training samples.
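The preprocessing chain handled by these units (boundary processing, normalization, merging) might look like the following Python sketch. The text does not define "boundary processing", so clipping fixations to the image bounds is an assumption here, as are the function names, the `num_fixations` parameter, and the choice to drop observers with too few fixations.

```python
def clip_to_image(points, width, height):
    """Assumed boundary processing: clamp each fixation into the image area."""
    return [(min(max(x, 0.0), width - 1.0), min(max(y, 0.0), height - 1.0))
            for x, y in points]

def normalize(points, width, height):
    """Scale pixel coordinates into the range [0, 1]."""
    return [(x / (width - 1.0), y / (height - 1.0)) for x, y in points]

def build_training_samples(observers, width, height, num_fixations=8):
    """Merge the processed fixation sequences of N observers into one sample list.

    Observers with fewer than num_fixations fixations are skipped (an assumption).
    """
    samples = []
    for fixations in observers:
        pts = normalize(clip_to_image(fixations, width, height), width, height)
        if len(pts) >= num_fixations:
            samples.append(pts[:num_fixations])
    return samples

# Toy usage: two observers on a 101 x 51 image, keeping 3 fixations each.
observers = [
    [(-5.0, 10.0), (50.0, 25.0), (200.0, 60.0)],  # contains out-of-bounds points
    [(10.0, 5.0)],                                # too short, will be skipped
]
samples = build_training_samples(observers, 101, 51, num_fixations=3)
```

Dropping out-of-bounds fixations instead of clamping them would be an equally plausible reading of the boundary step.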
Further, the device also comprises:
a fourth obtaining unit, configured to obtain a training set and a test set according to the image data set;
a first cropping unit, configured to crop the training set image information to a standard size;
a first construction unit, configured to construct the LSTM network and load the trained model parameters;
a first output unit, configured to use the image information as the input of the LSTM network and output the image feature representation information of the image information.
Further, the device also comprises:
a fifth obtaining unit, configured to obtain the coordinates of the LSTM network and define the corresponding weight matrix according to the coordinates;
a first input unit, configured to use the image feature representation information and the weight matrix corresponding to the coordinates as the input of the LSTM network;
a first operation unit, configured to perform the input gate, forget gate, and output gate operations on the input using the forward propagation method;
a first decoding unit, configured to decode the output of the LSTM network according to the deep output layer;
a first training unit, configured to input the image feature representation information into the LSTM network and train the LSTM network using the back-propagation algorithm.
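The gate operations carried out during forward propagation can be sketched for a single time step in plain Python. This is the textbook LSTM cell, not necessarily the exact variant used in the patent; the parameter dictionary keys (`Wi`, `Wf`, `Wo`, `Wg` and the matching biases) and the function names are assumptions for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, b, v):
    """Return W @ v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(x, h_prev, c_prev, p):
    """One forward step of a standard LSTM cell.

    x: input vector, h_prev / c_prev: previous hidden and cell state,
    p: dict of weight matrices and biases (hypothetical layout).
    """
    z = list(x) + list(h_prev)                                # concatenate input and state
    i = [sigmoid(a) for a in affine(p["Wi"], p["bi"], z)]     # input gate
    f = [sigmoid(a) for a in affine(p["Wf"], p["bf"], z)]     # forget gate
    o = [sigmoid(a) for a in affine(p["Wo"], p["bo"], z)]     # output gate
    g = [math.tanh(a) for a in affine(p["Wg"], p["bg"], z)]   # candidate cell values
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

# Toy usage: 3-dimensional input, 2-dimensional state, all-zero parameters.
zeros = [[0.0] * 5 for _ in range(2)]
params = {k: zeros for k in ("Wi", "Wf", "Wo", "Wg")}
params.update({k: [0.0, 0.0] for k in ("bi", "bf", "bo", "bg")})
h, c = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -2.0], params)
```

With all-zero parameters every gate evaluates to 0.5, so the cell state is simply halved, which makes the step easy to check by hand.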
Further, the device also comprises:
a second input unit, configured to load the LSTM network and input the training sample into the LSTM network;
a sixth obtaining unit, configured to obtain the output feature vector of the LSTM network using the forward propagation algorithm;
a seventh obtaining unit, configured to input the output feature vector and the ground-truth information into the LSTM network and obtain the fixation point coordinates using the forward propagation algorithm.
The variations and specific examples of the machine-learning-based scanpath prediction method in Embodiment 1 (Fig. 1) apply equally to the machine-learning-based scanpath prediction device of this embodiment. From the foregoing detailed description of the method, those skilled in the art can clearly understand how the device of this embodiment is implemented, so for brevity of the specification it is not described in detail here.
Embodiment 4
Based on the same inventive concept as the machine-learning-based scanpath prediction method in the foregoing embodiments, the present invention further provides a machine-learning-based scanpath prediction device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of any one of the machine-learning-based scanpath prediction methods described above.
In Fig. 5, the bus architecture is represented by bus 300. Bus 300 may include any number of interconnected buses and bridges, and links together various circuits, including one or more processors represented by processor 302 and memory represented by memory 304. Bus 300 may also link various other circuits such as peripheral devices, voltage regulators, and power management circuits, which are well known in the art and are therefore not described further here. Bus interface 306 provides an interface between bus 300 and receiver 301 and transmitter 303. Receiver 301 and transmitter 303 may be the same element, i.e., a transceiver, providing a unit for communicating with various other devices over a transmission medium.
Processor 302 is responsible for managing bus 300 and general processing, and memory 304 may be used to store the information used by processor 302 when performing operations.
The one or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
1. The machine-learning-based scanpath prediction method and device provided by the embodiments of the present application obtain an image data set to be processed, wherein each piece of image information in the image data set has corresponding ground-truth information; make the training samples of the image data set according to the ground-truth information; obtain the image feature representation information of the image information according to the image information; construct and train an LSTM network according to the image feature representation information and the eye movement data samples; and predict the scanpath according to the LSTM network. This solves the technical problems in the prior art that fixation point prediction relies excessively on static saliency maps and that scanpath prediction in natural scene pictures is inadequate. The model's dependence on saliency maps is eliminated, the temporal order between fixation points is taken into account, and good results are obtained on multiple public data sets.
2. The present invention extracts image features with a convolutional neural network. The strong representation learning ability and layer-by-layer learning strategy of convolutional neural networks allow higher-level features to be learned, overcoming the shortcomings of the manually selected or jointly selected multi-dimensional features used in the prior art, and giving good generality and scalability.
3. The present invention estimates the scanpath by constructing an LSTM network, whose structure is well suited to processing time series. The LSTM network is trained by combining the currently input image region with the fixation points generated so far, simulating the saccade stage of human visual processing and the propagation and prediction of information in the visual cortex. This achieves consistency with the human scanpath process in terms of biological mechanism, and yields scanpath results consistent with human eye movement data.
4. By introducing an attention mechanism into the network, the present invention allows the decoder to attend to a different part of the image at each step of the network output. The trained model learns which regions of the image to attend to, and this guides the output of the decoding network.
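The attention idea in effect 4 can be illustrated with a minimal soft-attention sketch in plain Python. The text does not specify the scoring function, so dot-product scoring between the decoder state and each region feature is an assumption here (additive scoring would be another option); the names `attend`, `region_features`, and `h` are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(region_features, h):
    """Weight image regions by their relevance to the decoder state h.

    Returns the attention weights and the weighted sum of region features.
    Dot-product scoring is an illustrative assumption.
    """
    scores = [sum(f_k * h_k for f_k, h_k in zip(f, h)) for f in region_features]
    weights = softmax(scores)
    dim = len(region_features[0])
    context = [sum(w * f[k] for w, f in zip(weights, region_features))
               for k in range(dim)]
    return weights, context

# Toy usage: two image regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0]]
uniform_w, uniform_ctx = attend(regions, [0.0, 0.0])   # no preference
focused_w, focused_ctx = attend(regions, [10.0, 0.0])  # favors the first region
```

At each decoding step the context vector would be recomputed from the current decoder state, so the focus can shift from region to region as the scanpath unfolds.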
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable information processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable information processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable information processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable information processing device, such that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. Thus, if these modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalent technologies, the present invention is also intended to include them.
Claims (7)
1. A machine-learning-based scanpath prediction method, characterized in that the method comprises:
obtaining an image data set to be processed, wherein each piece of image information in the image data set has corresponding ground-truth information;
making the training samples of the image data set according to the ground-truth information;
obtaining the image feature representation information of the image information according to the image information;
constructing and training an LSTM network according to the image feature representation information and the eye movement data samples;
predicting the scanpath according to the LSTM network.
2. The method according to claim 1, characterized in that making the training samples of the image data set according to the ground-truth information specifically comprises:
processing the ground-truth information to obtain the eye movement data information of N observers;
performing boundary processing on the eye movement data of the N observers;
normalizing the eye movement data of the N observers after the boundary processing;
merging the eye movement data of the N observers to obtain the training samples, wherein N is a positive integer.
3. The method according to claim 1, characterized in that obtaining the image feature representation information of the image information specifically comprises:
obtaining a training set and a test set according to the image data set;
cropping the training set image information to a standard size;
constructing a convolutional neural network and loading the trained model parameters;
using the image information as the input of the convolutional neural network and outputting the image feature representation information of the image information.
4. The method according to claim 1, characterized in that constructing and training the LSTM network specifically comprises:
obtaining the coordinates of the LSTM network and defining the corresponding weight matrix according to the coordinates;
using the image feature representation information and the weight matrix corresponding to the coordinates as the input of the LSTM network;
performing the input gate, forget gate, and output gate operations on the input using the forward propagation method;
decoding the output of the LSTM network according to the deep output layer;
inputting the image feature representation information into the LSTM network and training the LSTM network using the back-propagation algorithm.
5. The method according to claim 1, characterized in that the method further comprises:
loading the LSTM network and inputting the training sample into the LSTM network;
obtaining the output feature vector of the LSTM network using the forward propagation algorithm;
inputting the output feature vector and the ground-truth information into the LSTM network and obtaining the fixation point coordinates using the forward propagation algorithm.
6. A machine-learning-based scanpath prediction device, characterized in that the device comprises:
a first obtaining unit, configured to obtain an image data set to be processed, wherein each piece of image information in the image data set has corresponding ground-truth information;
a first making unit, configured to make the training samples of the image data set according to the ground-truth information;
a second obtaining unit, configured to obtain the image feature representation information of the image information according to the image information;
a first construction unit, configured to construct and train an LSTM network according to the image feature representation information and the eye movement data samples;
a first prediction unit, configured to predict the scanpath according to the LSTM network.
7. A machine-learning-based scanpath prediction device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, performs the following steps:
obtaining an image data set to be processed, wherein each piece of image information in the image data set has corresponding ground-truth information;
making the training samples of the image data set according to the ground-truth information;
obtaining the image feature representation information of the image information according to the image information;
constructing and training an LSTM network according to the image feature representation information and the eye movement data samples;
predicting the scanpath according to the LSTM network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810332835.5A CN109447096B (en) | 2018-04-13 | 2018-04-13 | Glance path prediction method and device based on machine learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109447096A (en) | 2019-03-08 |
CN109447096B (en) | 2022-05-06 |
Family
ID=65530053
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810332835.5A Active CN109447096B (en) | 2018-04-13 | 2018-04-13 | Glance path prediction method and device based on machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109447096B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110150293A1 (en) * | 2008-11-26 | 2011-06-23 | Bower Bradley A | Methods, Systems and Computer Program Products for Biometric Identification by Tissue Imaging Using Optical Coherence Tomography (OCT) |
US20140201126A1 (en) * | 2012-09-15 | 2014-07-17 | Lotfi A. Zadeh | Methods and Systems for Applications for Z-numbers |
CN105678735A (en) * | 2015-10-13 | 2016-06-15 | 中国人民解放军陆军军官学院 | Target salience detection method for fog images |
CN106491129A (en) * | 2016-10-10 | 2017-03-15 | 安徽大学 | A kind of Human bodys' response system and method based on EOG |
CN106959749A (en) * | 2017-02-20 | 2017-07-18 | 浙江工业大学 | A kind of vision attention behavior cooperating type method for visualizing and system based on eye-tracking data |
CN106970615A (en) * | 2017-03-21 | 2017-07-21 | 西北工业大学 | A kind of real-time online paths planning method of deeply study |
CN107515466A (en) * | 2017-08-14 | 2017-12-26 | 华为技术有限公司 | A kind of eyeball tracking system and eyeball tracking method |
CN107644401A (en) * | 2017-08-11 | 2018-01-30 | 西安电子科技大学 | Multiplicative noise minimizing technology based on deep neural network |
CN107808132A (en) * | 2017-10-23 | 2018-03-16 | 重庆邮电大学 | A kind of scene image classification method for merging topic model |
CN107852521A (en) * | 2015-08-07 | 2018-03-27 | Smi创新传感技术有限公司 | System and method for display image stream |
Non-Patent Citations (4)
Title |
---|
D BAHDANAU ET AL: "Neural Machine Translation by Jointly Learning to Align and Translate", 《COMPUTER SCIENCE》 * |
DANIEL SIMON ET AL: "Automatic Scanpath Generation with Deep Recurrent Neural Networks", 《THE ACM SYMPOSIUM ACM》 * |
THUYEN NGO ET AL: "Saccade gaze prediction using a recurrent neural network", 《2017 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP)》 * |
严艳梅: "图片重复扫描路径的眼动研究", 《中国优秀硕士学位论文全文数据库 (哲学与人文科学辑)》 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110245660A (en) * | 2019-06-03 | 2019-09-17 | 西北工业大学 | Webpage glance path prediction method based on saliency feature fusion |
CN110245660B (en) * | 2019-06-03 | 2022-04-22 | 西北工业大学 | Webpage glance path prediction method based on saliency feature fusion |
CN110298303A (en) * | 2019-06-27 | 2019-10-01 | 西北工业大学 | Crowd identification method based on long short-term memory network glance path learning |
CN110298303B (en) * | 2019-06-27 | 2022-03-25 | 西北工业大学 | Crowd identification method based on long short-term memory network glance path learning |
CN111461974A (en) * | 2020-02-17 | 2020-07-28 | 天津大学 | Image scanning path control method based on LSTM model from coarse to fine |
CN111461974B (en) * | 2020-02-17 | 2023-04-25 | 天津大学 | Image scanning path control method based on LSTM model from coarse to fine |
CN111723707A (en) * | 2020-06-09 | 2020-09-29 | 天津大学 | Method and device for estimating fixation point based on visual saliency |
CN111723707B (en) * | 2020-06-09 | 2023-10-17 | 天津大学 | Gaze point estimation method and device based on visual saliency |
CN113313123A (en) * | 2021-06-11 | 2021-08-27 | 西北工业大学 | Semantic inference based glance path prediction method |
CN113313123B (en) * | 2021-06-11 | 2024-04-02 | 西北工业大学 | Glance path prediction method based on semantic inference |
Also Published As
Publication number | Publication date |
---|---|
CN109447096B (en) | 2022-05-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||