CN109447096A - Scanpath prediction method and device based on machine learning - Google Patents

Publication number
CN109447096A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810332835.5A
Other languages
Chinese (zh)
Other versions
CN109447096B (en)
Inventor
齐飞
高帅
石光明
夏朝辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present invention provides a machine-learning-based scanpath prediction method and device, relating to the field of computer technology. The method comprises: obtaining an image data set to be processed, wherein each image in the data set has corresponding ground-truth information; making training samples for the image data set according to the ground-truth information; obtaining an image feature representation of each image according to the image information; constructing and training an LSTM network according to the image feature representation and the eye-movement data samples; and predicting the scanpath according to the LSTM network. The invention addresses two technical problems of the prior art: fixation-point prediction relies excessively on static saliency maps, and scanpath prediction on natural scene images is inadequate. The model no longer depends on saliency maps, takes the temporal order of fixation points into account, and achieves good results on multiple public data sets.

Description

Scanpath prediction method and device based on machine learning
Technical field
The present invention relates to the technical field of image processing, and in particular to a machine-learning-based scanpath prediction method and device.
Background
With the rapid development of information technology, mankind has entered an era of large-scale data growth. Digital images and video have become important carriers of information, and massive image data is an important source of information. How to effectively select the most valuable information from an image has increasingly become a focus of the image processing field.
In the prior art, fixation-point prediction relies excessively on static saliency maps, and scanpath prediction on natural scene images still has many shortcomings.
Summary of the invention
The embodiments of the present invention provide a machine-learning-based scanpath prediction method and device. They address the technical problems that prior-art fixation-point prediction relies excessively on static saliency maps and that scanpath prediction on natural scene images is inadequate. The model no longer depends on saliency maps, takes the temporal order of fixation points into account, and achieves good results on multiple public data sets.
In view of the above problems, the embodiments of the present application are proposed to provide a machine-learning-based scanpath prediction method and device.
In a first aspect, the present invention provides a machine-learning-based scanpath prediction method, the method comprising: obtaining an image data set to be processed, wherein each image in the image data set has corresponding ground-truth information; making training samples for the image data set according to the ground-truth information; obtaining the image feature representation of each image according to the image information; constructing and training an LSTM network according to the image feature representation and the eye-movement data samples; and predicting the scanpath according to the LSTM network.
Preferably, making the training samples for the image data set according to the ground-truth information specifically comprises: processing the ground-truth information to obtain the eye-movement data of N observers; applying boundary processing to the eye-movement data of the N observers; normalizing the boundary-processed eye-movement data of the N observers; and merging the eye-movement data of the N observers to obtain the training samples, where N is a positive integer.
Preferably, obtaining the image feature representation of the image information specifically comprises: obtaining a training set and a test set from the image data set; cropping the training-set images to a standard size; constructing a convolutional neural network and loading trained model parameters; and taking the image information as the input of the convolutional neural network and outputting the image feature representation of the image information.
Preferably, constructing and training the LSTM network specifically comprises: obtaining the coordinates of the LSTM network and defining a corresponding weight matrix according to the coordinates; taking the image feature representation and the weight matrix corresponding to the coordinates as the input of the LSTM network; applying the input-gate, forget-gate and output-gate operations to the input by forward propagation; decoding the output of the LSTM network with a deep output layer; and inputting the image feature representation into the LSTM network and training the LSTM network with the back-propagation algorithm.
Preferably, the method further comprises: loading the LSTM network and inputting the training samples into the LSTM network; obtaining the output feature vector of the LSTM network by forward propagation; and inputting the output feature vector and the ground-truth information into the LSTM network and obtaining the fixation-point coordinates by forward propagation.
In a second aspect, the present invention provides a machine-learning-based scanpath prediction device, the device comprising:
a first obtaining unit, configured to obtain an image data set to be processed, wherein each image in the image data set has corresponding ground-truth information;
a first making unit, configured to make training samples for the image data set according to the ground-truth information;
a second obtaining unit, configured to obtain the image feature representation of the image information according to the image information;
a first construction unit, configured to construct and train an LSTM network according to the image feature representation and the eye-movement data samples;
a first prediction unit, configured to predict the scanpath according to the LSTM network.
Preferably, the device further comprises:
a third obtaining unit, configured to process the ground-truth information to obtain the eye-movement data of N observers;
a first processing unit, configured to apply boundary processing to the eye-movement data of the N observers;
a first normalization unit, configured to normalize the boundary-processed eye-movement data of the N observers;
a first merging unit, configured to merge the eye-movement data of the N observers to obtain the training samples.
Preferably, the device further comprises:
a fourth obtaining unit, configured to obtain a training set and a test set from the image data set;
a first cropping unit, configured to crop the training-set images to a standard size;
a first construction unit, configured to construct a convolutional neural network and load trained model parameters;
a first output unit, configured to take the image information as the input of the convolutional neural network and output the image feature representation of the image information.
Preferably, the device further comprises:
a fifth obtaining unit, configured to obtain the coordinates of the LSTM network and define a corresponding weight matrix according to the coordinates;
a first input unit, configured to take the image feature representation and the weight matrix corresponding to the coordinates as the input of the LSTM network;
a first operation unit, configured to apply the input-gate, forget-gate and output-gate operations to the input by forward propagation;
a first decoding unit, configured to decode the output of the LSTM network with a deep output layer;
a first training unit, configured to input the image feature representation into the LSTM network and train the LSTM network with the back-propagation algorithm.
Preferably, the device further comprises:
a second input unit, configured to load the LSTM network and input the training samples into the LSTM network;
a sixth obtaining unit, configured to obtain the output feature vector of the LSTM network by forward propagation;
a seventh obtaining unit, configured to input the output feature vector and the ground-truth information into the LSTM network and obtain the fixation-point coordinates by forward propagation.
In a third aspect, the present invention provides a machine-learning-based scanpath prediction device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, performs the following steps: obtaining an image data set to be processed, wherein each image in the image data set has corresponding ground-truth information; making training samples for the image data set according to the ground-truth information; obtaining the image feature representation of the image information according to the image information; constructing and training an LSTM network according to the image feature representation and the eye-movement data samples; and predicting the scanpath according to the LSTM network.
The one or more technical solutions in the embodiments of the present application have at least the following one or more technical effects:
1. The machine-learning-based scanpath prediction method and device provided by the embodiments of the present application obtain an image data set to be processed, wherein each image in the image data set has corresponding ground-truth information; make training samples for the image data set according to the ground-truth information; obtain the image feature representation of the image information; construct and train an LSTM network according to the image feature representation and the eye-movement data samples; and predict the scanpath according to the LSTM network. This addresses the technical problems that prior-art fixation-point prediction relies excessively on static saliency maps and that scanpath prediction on natural scene images is inadequate. The model no longer depends on saliency maps, takes the temporal order of fixation points into account, and achieves good results on multiple public data sets.
2. The invention extracts image features with a convolutional neural network. The strong representation-learning ability and layer-by-layer learning strategy of convolutional neural networks allow higher-level features to be learned, overcoming the shortcomings of prior-art methods that select features by hand or by joint multidimensional feature selection, and giving good generality and scalability.
3. The invention estimates the scanpath by constructing an LSTM network, whose structure is suited to processing time series. The LSTM network is trained by combining the currently input image region with the fixation points generated so far, simulating the saccade stage of the human visual process and the propagation and prediction of information in the visual cortex. This achieves consistency with the human saccade process in terms of biological mechanism and yields scanpath results consistent with human eye-movement data.
4. By introducing an attention mechanism into the network, the invention lets the decoder attend to a different part of the image at each output step. The trained model learns which regions of the image to attend to, thereby guiding the output of the decoding network.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the specification, and so that the above and other objects, features and advantages of the invention are more readily apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Fig. 1 is a flow diagram of a machine-learning-based scanpath prediction method in an embodiment of the present invention;
Fig. 2 is a structure diagram of the convolutional neural network in an embodiment of the present invention;
Fig. 3 is a structure diagram of the LSTM network constructed in an embodiment of the present invention;
Fig. 4 is a structural schematic diagram of a machine-learning-based scanpath prediction device in an embodiment of the present invention;
Fig. 5 is a structural schematic diagram of another machine-learning-based scanpath prediction device in an embodiment of the present invention.
Reference numerals: bus 300, receiver 301, processor 302, transmitter 303, memory 304, bus interface 306.
Detailed description of the embodiments
The embodiments of the present invention provide a machine-learning-based scanpath prediction method and device for addressing the technical problems that prior-art fixation-point prediction relies excessively on static saliency maps and that scanpath prediction on natural scene images is inadequate. The general idea of the technical solution provided by the invention is as follows:
In the technical solution of the embodiments of the present invention, an image data set to be processed is obtained, wherein each image in the image data set has corresponding ground-truth information; training samples for the image data set are made according to the ground-truth information; the image feature representation of the image information is obtained according to the image information; an LSTM network is constructed and trained according to the image feature representation and the eye-movement data samples; and the scanpath is predicted according to the LSTM network. The model no longer depends on saliency maps, takes the temporal order of fixation points into account, and achieves good results on multiple public data sets.
The technical solution of the present invention is described in detail below with reference to the drawings and specific embodiments. It should be understood that the embodiments of the present application and the specific features in them are a detailed description of the technical solution of the present application and not a limitation of it; where there is no conflict, the embodiments of the present application and the technical features in them may be combined with one another.
The term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
To present the machine-learning-based scanpath prediction method provided by the embodiments of the present application more clearly, some terms are introduced below.
A convolutional neural network (Convolutional Neural Network, CNN) is a feedforward neural network whose artificial neurons respond to surrounding units within a local coverage area, and it performs very well on large-scale image processing. It includes convolutional layers (alternating convolutional layer) and pooling layers (pooling layer).
LSTM (Long Short-Term Memory) is an improved recurrent neural network; the paper was first published in 1997. Owing to its distinctive design, LSTM is suited to processing and predicting important events in a time series that are separated by very long intervals and delays.
TensorFlow is the second-generation artificial-intelligence learning system developed by Google on the basis of DistBelief; its name derives from its own operating principle. Tensor means an N-dimensional array, and Flow means computation based on a data-flow graph: TensorFlow is the process of tensors flowing from one end of the flow graph to the other. TensorFlow is a system that passes complex data structures into artificial neural networks for analysis and processing.
BasicLSTMCell is the basic LSTM recurrent network cell.
Embodiment 1
Fig. 1 is a flow diagram of a machine-learning-based scanpath prediction method in an embodiment of the present invention. As shown in Fig. 1, the method comprises:
Step 110: obtain an image data set to be processed, wherein each image in the image data set has corresponding ground-truth information;
Specifically, the image data set to be processed is the set of pictures awaiting processing, and the corresponding ground-truth information is the fixation-point coordinates of each image, which serve as labels.
Step 120: make training samples for the image data set according to the ground-truth information;
Further, making the training samples for the image data set according to the ground-truth information specifically comprises: processing the ground-truth information to obtain the eye-movement data of N observers; applying boundary processing to the eye-movement data of the N observers; normalizing the boundary-processed eye-movement data of the N observers; and merging the eye-movement data of the N observers to obtain the training samples, where N is a positive integer.
Specifically, regarding the processing of the ground truth: each image data set has corresponding eye-movement data, and each picture has eye-movement data from N observers. Boundary processing is applied to the eye-movement data so that every point outside the image is moved onto the image boundary. Each time, the corresponding eye-movement data are selected according to the dictionary and normalized, and the data of the N observers are merged to obtain the training sequences. Here each sequence consists of 8 fixation-point coordinates; expressed as one-dimensional data, a sequence contains 16 numbers.
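The boundary processing, normalization and merging described above can be sketched as follows. This is a minimal illustration only: the function names, the choice of dividing by (width - 1) and (height - 1) to normalize into [0, 1], and the decisions to keep the first 8 fixations and skip observers with fewer than 8 are assumptions, not details given in the specification.

```python
def clip_to_image(x, y, width, height):
    """Move a fixation lying outside the image onto the nearest boundary point."""
    clipped_x = min(max(x, 0.0), width - 1.0)
    clipped_y = min(max(y, 0.0), height - 1.0)
    return clipped_x, clipped_y


def make_sequence(fixations, width, height, length=8):
    """Clip, normalize to [0, 1] and flatten the first `length` fixations."""
    flat = []
    for x, y in fixations[:length]:
        cx, cy = clip_to_image(x, y, width, height)
        # Assumed normalization: divide by the largest valid pixel index.
        flat.extend([cx / (width - 1.0), cy / (height - 1.0)])
    return flat


def merge_observers(per_observer, width, height, length=8):
    """Merge the sequences of all observers of one picture into one list."""
    return [make_sequence(f, width, height, length)
            for f in per_observer if len(f) >= length]


observers = [
    [(-5, 10), (2000, 300)] + [(100, 100)] * 6,  # two fixations fall outside
    [(50, 60)] * 8,
    [(1, 1)] * 3,                                # too short, skipped in this sketch
]
samples = merge_observers(observers, width=1024, height=768)
```

Each entry of `samples` is one 16-number sequence (8 fixation points), matching the one-dimensional representation described above.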
Each data set is tested several times. For example, the MIT1003 data set has 1003 pictures, and each picture has 15 observers. The picture names are mapped to the numbers 0~1003 to form a dictionary. In each experiment 900 pictures are selected in order from the dictionary, for example 0~900 for training and 900~1003 for testing. The ground truth is then selected from the ground-truth data in the same way as the pictures, giving 900 × 15 = 13500 eye-movement records as labels. In each experiment the training samples and labels are made in this way.
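The dictionary-based split for MIT1003 can be sketched as below. The file names are hypothetical placeholders, and the half-open ranges (numbers 0 to 899 for training, 900 to 1002 for testing) are one reading of "0~900" and "900~1003".

```python
def build_dictionary(picture_names):
    """Map consecutive numbers to picture names."""
    return {idx: name for idx, name in enumerate(sorted(picture_names))}


def split_by_number(dictionary, n_train=900):
    """Take the first n_train numbers for training and the rest for testing."""
    train = [dictionary[i] for i in range(n_train)]
    test = [dictionary[i] for i in range(n_train, len(dictionary))]
    return train, test


# Hypothetical file names standing in for the 1003 MIT1003 pictures.
names = ["img_%04d.jpeg" % i for i in range(1003)]
dictionary = build_dictionary(names)
train, test = split_by_number(dictionary)

n_observers = 15
n_labels = len(train) * n_observers  # 900 x 15 eye-movement records
```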
Step 130: obtain the image feature representation of the image information according to the image information;
Further, obtaining the image feature representation of the image information specifically comprises: obtaining a training set and a test set from the image data set; cropping the training-set images to a standard size; constructing a convolutional neural network and loading trained model parameters; and taking the image information as the input of the convolutional neural network and outputting the image feature representation of the image information.
Specifically, the image data set includes corresponding numbers: the picture names in the image data set are mapped to numbers, and each time certain pictures are selected by number as the training set and the test set. The structure of the convolutional neural network is described further below with reference to Fig. 2. The scanpath model established by the invention mainly comprises three parts: an encoding network, a decoding network and an output layer. The encoding network consists of a convolutional neural network, whose structure is as follows. The convolutional neural network comprises five parts: the first part has two convolutional layers; the second part has two convolutional layers; the third part has four convolutional layers; the fourth part has four convolutional layers; and the fifth part has four convolutional layers. Each convolutional layer comprises a convolution operation and a pooling operation, the convolution kernels are of size 3 × 3, and the activation function of all convolutional layers is the rectified linear unit. The convolutional layers of the first part have 64 kernels, those of the second part 128, those of the third part 256, and those of the last two parts 512. Image features are extracted with the convolutional neural network; its strong representation-learning ability and layer-by-layer learning strategy allow higher-level features to be learned, overcoming the shortcomings of prior-art methods that select features by hand or by joint multidimensional feature selection, and giving good generality and scalability.
The embodiments of the present application use pre-trained VGG19 model parameters. Because the VGG19 network was trained on the large-scale ImageNet data set, it can extract accurate image features; a function is written to load the VGG19 model parameters for feature extraction. The image, of size 224 × 224 × 3, is the input of the convolutional neural network, and the output is the set of image feature vectors $\alpha = \{\alpha_1, \ldots, \alpha_L\}$, $\alpha_i \in \mathbb{R}^D$, where L = 196 and D = 512. For each picture the network extracts L vectors, each corresponding to one region of the image.
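The relation between the 224 × 224 × 3 input and the L = 196, D = 512 output can be checked with a short shape calculation. The sketch assumes the standard VGG19 layout, in which 3 × 3 convolutions keep the spatial size and a 2 × 2 max pooling follows each part, and it assumes the features are read before the fifth pooling; neither detail is stated explicitly in the text.

```python
# (number of 3x3 convolutional layers, number of kernels) for each of the five parts
VGG19_PARTS = [(2, 64), (2, 128), (4, 256), (4, 512), (4, 512)]


def feature_shape(input_size=224, parts=VGG19_PARTS, pools_applied=4):
    """Return (number of regions L, feature dimension D) of the feature map."""
    size, channels = input_size, 3
    for index, (_n_convs, n_kernels) in enumerate(parts):
        channels = n_kernels       # 3x3 convs with stride 1, padding 1 keep `size`
        if index < pools_applied:  # assumed 2x2 max pooling halves the size
            size //= 2
    return size * size, channels


L, D = feature_shape()
n_conv_layers = sum(n for n, _ in VGG19_PARTS)
```

Under these assumptions the spatial size goes 224, 112, 56, 28, 14, so the final 14 × 14 × 512 map flattens to 196 region vectors of dimension 512.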
Step 140: construct and train an LSTM network according to the image feature representation and the eye-movement data samples;
Further, constructing and training the LSTM network specifically comprises: obtaining the coordinates of the LSTM network and defining a corresponding weight matrix according to the coordinates; taking the image feature representation and the weight matrix corresponding to the coordinates as the input of the LSTM network; applying the input-gate, forget-gate and output-gate operations to the input by forward propagation; decoding the output of the LSTM network with a deep output layer; and inputting the image feature representation into the LSTM network and training the LSTM network with the back-propagation algorithm.
Specifically, the structure of the LSTM network is described further below with reference to Fig. 3. The LSTM network is composed of BasicLSTMCell units from TensorFlow, with the number of units H set to 1024. The coordinates generated by the model are defined as x, y; $y_i$ is a 1 × K vector, where K is the size of the coordinate library and C is the length of the generated sequence. C = 8 in the experiments, i.e. eight fixation points are generated for each image.
$Y = \{y_1, \ldots, y_C\}$, $y_i \in \mathbb{R}^K$
In Fig. 2, $i_t$ is the input gate, $f_t$ is the forget gate, $o_t$ is the output gate, and $g_t$ is the candidate vector controlled by the input gate; $h_{t-1}$ is the hidden state at the previous time step, $z_t$ is the context vector at time t, and $E y_{t-1}$ is the embedding vector obtained by passing the output at time t-1 through the embedding matrix E. The embedding matrix E is applied by passing the overall weight matrix for x, y and the ground-truth coordinates to the function embedding_lookup(params, ids), which returns the weights corresponding to x, y. By introducing an attention mechanism into the network, the invention lets the decoder attend to a different part of the image at each output step; the trained model learns which regions of the image to attend to, thereby guiding the output of the decoding network.
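How an attention mechanism can turn the L region vectors into the context vector $z_t$ is sketched below. The specification does not say how the attention scores are computed, so they are taken as given inputs here, and the softmax-weighted sum (soft attention) is an assumption about the mechanism rather than a confirmed detail.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(features, scores):
    """Weight the region vectors by softmax(scores) and sum them into z_t."""
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features))
               for d in range(dim)]
    return weights, context


# Two toy regions with 2-dimensional features and equal scores.
weights, z_t = soft_attention([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

With equal scores every region contributes equally; a region with a much larger score dominates the context vector, which is how the decoder can focus on one part of the image per step.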
The forward propagation of the LSTM network proceeds as follows:
Given the hidden state $h_t$, the cell state $c_t$ and the context vector $z_t$, the output of the LSTM network can be computed by the following formula:
$p(y_t \mid \alpha, y_1, \ldots, y_{t-1}) \propto \exp\bigl(L_o(E y_{t-1} + L_h h_t + L_z z_t)\bigr)$
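One forward step producing $h_t$ and $c_t$ from the gates $i_t$, $f_t$, $o_t$ and the candidate $g_t$ can be sketched as below. It assumes a standard LSTM cell whose input is the concatenation of $E y_{t-1}$, $h_{t-1}$ and $z_t$; the parameter names (`W_i`, `b_i`, and so on) are hypothetical and the exact gate equations of the specification may differ.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(weights, bias, vector):
    """Compute weights @ vector + bias for plain Python lists."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def lstm_step(embedded_prev, h_prev, c_prev, z_t, params):
    """One assumed LSTM step on the concatenation [E y_{t-1}, h_{t-1}, z_t]."""
    joint = embedded_prev + h_prev + z_t  # list concatenation
    i_t = [sigmoid(v) for v in affine(params["W_i"], params["b_i"], joint)]
    f_t = [sigmoid(v) for v in affine(params["W_f"], params["b_f"], joint)]
    o_t = [sigmoid(v) for v in affine(params["W_o"], params["b_o"], joint)]
    g_t = [math.tanh(v) for v in affine(params["W_g"], params["b_g"], joint)]
    # Forget part of the old cell state and add the gated candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Toy sizes: one hidden unit, 1-dimensional embedding and context.
zero = {name: [[0.0, 0.0, 0.0]] for name in ("W_i", "W_f", "W_o", "W_g")}
zero.update({name: [0.0] for name in ("b_i", "b_f", "b_o", "b_g")})
h_t, c_t = lstm_step([1.0], [1.0], [2.0], [1.0], zero)
```

With all-zero parameters every gate equals 0.5 and the candidate is 0, so the cell state is simply halved; this makes the role of each gate easy to verify by hand.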
This is implemented by a deep output network comprising two neural-network layers. The first layer first applies dropout to the hidden state and then uses logistic regression to obtain the output h_logits; the context information and the previously generated coordinate information are both added to h_logits, a tanh activation function is applied, and dropout is applied again. The second layer applies logistic regression to the output of the first layer to obtain the output out_logits. Dropout means that during the training of a neural network, in each iteration some units are temporarily dropped from the network with a certain probability. The dropped nodes can temporarily be regarded as not being part of the network structure, but their weights must be retained.
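The two-layer deep output described above can be sketched as follows. The parameter names mirror $L_h$, $L_z$ and $L_o$ from the output formula, while the bias terms, the inverted-dropout scaling, the keep probability of 0.5 and the final softmax are assumptions introduced for illustration.

```python
import math
import random


def linear(weights, bias, vector):
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def dropout(vector, keep_prob, rng, training):
    """Assumed inverted dropout; identity when not training."""
    if not training:
        return list(vector)
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in vector]


def softmax(scores):
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def deep_output(h_t, z_t, embedded_prev, p, keep_prob=0.5, rng=None, training=False):
    rng = rng or random.Random(0)
    # First layer: dropout on the hidden state, then a linear map to h_logits.
    h_logits = linear(p["L_h"], p["b_h"], dropout(h_t, keep_prob, rng, training))
    # Add the projected context and the previously generated coordinate embedding.
    ctx = linear(p["L_z"], p["b_z"], z_t)
    h_logits = [a + b + c for a, b, c in zip(h_logits, ctx, embedded_prev)]
    h_logits = dropout([math.tanh(v) for v in h_logits], keep_prob, rng, training)
    # Second layer: linear map to out_logits over the K coordinate entries.
    out_logits = linear(p["L_o"], p["b_o"], h_logits)
    return softmax(out_logits)


# Toy sizes: H = 3, D = 2, embedding size M = 2, K = 4 coordinate entries.
p = {
    "L_h": [[0.0] * 3 for _ in range(2)], "b_h": [0.0, 0.0],
    "L_z": [[0.0] * 2 for _ in range(2)], "b_z": [0.0, 0.0],
    "L_o": [[0.0] * 2 for _ in range(4)], "b_o": [0.0] * 4,
}
probs = deep_output([1.0, 2.0, 3.0], [1.0, 1.0], [0.5, 0.5], p)
```

Note that h_logits must have the same dimension as the embedding $E y_{t-1}$ for the addition to be defined.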
Further, before training of the LSTM network begins, all weight parameters of the LSTM network are randomly initialized to values close to 0 and all biases are initialized to 0. The initial hidden state h and cell state c are obtained from two independent multilayer perceptrons: the mean of the feature vectors of the image regions is taken as the input of each perceptron, yielding the initial values of the hidden state and the cell state, that is, $h_0 = f_{\mathrm{init},h}\bigl(\frac{1}{L}\sum_{i=1}^{L}\alpha_i\bigr)$ and $c_0 = f_{\mathrm{init},c}\bigl(\frac{1}{L}\sum_{i=1}^{L}\alpha_i\bigr)$, where $f_{\mathrm{init},h}$ and $f_{\mathrm{init},c}$ denote the two perceptrons.
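The state initialization can be sketched as below. For brevity each perceptron is reduced to a single tanh layer, which is an assumption; the text only says that two independent multilayer perceptrons are fed the mean region feature.

```python
import math


def mean_feature(features):
    """Average the L region vectors into one D-dimensional vector."""
    n = len(features)
    return [sum(f[d] for f in features) / n for d in range(len(features[0]))]


def perceptron(weights, bias, vector):
    """Single tanh layer standing in for one of the two perceptrons."""
    return [math.tanh(sum(w * v for w, v in zip(row, vector)) + b)
            for row, b in zip(weights, bias)]


def initial_states(features, init_h, init_c):
    """init_h and init_c are (weights, bias) pairs of the two perceptrons."""
    mean = mean_feature(features)
    return perceptron(init_h[0], init_h[1], mean), perceptron(init_c[0], init_c[1], mean)


features = [[1.0, 3.0], [3.0, 5.0]]  # two toy regions, D = 2
h0, c0 = initial_states(features,
                        init_h=([[0.5, 0.0]], [0.0]),
                        init_c=([[0.0, 0.0]], [0.0]))
```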
The training sample data are randomly divided into several smaller batches; the batch size chosen in the experiments of the invention is 25. In each training iteration a batch of image feature vectors and ground-truth data is input.
The cost of the LSTM network is calculated according to the following formula:
$l_{ti} = -\bigl[y_{ti} \ln a_{ti} + (1 - y_{ti}) \ln(1 - a_{ti})\bigr]$
where $y_{ti}$ denotes the ideal (target) output, $a_{ti}$ denotes the actual output of the network, and $l_{ti}$ denotes the loss value of the i-th sample at time t. When the network trains on N samples at a time, the loss values of the N samples at each time t are summed to give the loss $loss_t$ of all samples at time t. Summing $loss_t$ over all time steps t gives the loss of the N samples.
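The cost above can be computed as in the following sketch. Each sample's output at a time step is treated as a single scalar for simplicity, and the small `eps` clamp that keeps the logarithm finite is an added assumption.

```python
import math


def cross_entropy(y, a, eps=1e-12):
    """l = -[y ln a + (1 - y) ln(1 - a)] with the output clamped away from 0 and 1."""
    a = min(max(a, eps), 1.0 - eps)
    return -(y * math.log(a) + (1.0 - y) * math.log(1.0 - a))


def batch_loss(targets, outputs):
    """targets[t][i] and outputs[t][i] refer to sample i at time step t."""
    loss_per_step = [
        sum(cross_entropy(y, a) for y, a in zip(step_targets, step_outputs))
        for step_targets, step_outputs in zip(targets, outputs)
    ]
    return loss_per_step, sum(loss_per_step)


# One time step, two samples, both predicted with output 0.5.
loss_per_step, total = batch_loss([[1.0, 0.0]], [[0.5, 0.5]])
```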
According to the cost of the LSTM network, the gradient-descent optimization algorithm RMSProp is used to optimize the cost function of the LSTM network, and the model parameters of the LSTM network are updated layer by layer by the back-propagation algorithm. The LSTM network is trained until it converges, and the network model and parameters of the LSTM network are saved. The scanpath is estimated by constructing an LSTM network, whose structure is suited to processing time series. The LSTM network is trained by combining the currently input image region with the fixation points generated so far, simulating the saccade stage of the human visual process and the propagation and prediction of information in the visual cortex. This achieves consistency with the human saccade process in terms of biological mechanism and yields scanpath results consistent with human eye-movement data.
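A single RMSProp parameter update can be sketched as follows. The learning rate, decay factor and epsilon are illustrative defaults, not values reported in the specification, and the gradients would come from back-propagation in practice.

```python
import math


def rmsprop_update(params, grads, cache, lr=0.001, decay=0.9, eps=1e-8):
    """Update params in place using a running average of squared gradients."""
    for k in range(len(params)):
        cache[k] = decay * cache[k] + (1.0 - decay) * grads[k] ** 2
        params[k] -= lr * grads[k] / (math.sqrt(cache[k]) + eps)
    return params, cache


params, cache = rmsprop_update([1.0], [2.0], [0.0])
```

Dividing by the root of the running average gives each parameter its own effective step size, which is the property that distinguishes RMSProp from plain gradient descent.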
Step 150: predict the scanpath according to the LSTM network.
Further, the LSTM network is loaded and the training samples are input into the LSTM network; the output feature vector of the LSTM network is obtained by forward propagation; and the output feature vector and the ground-truth information are input into the LSTM network, and the fixation-point coordinates are obtained by forward propagation.
Specifically, the LSTM network is loaded, the prepared training sample pictures are input into the LSTM network, and the output feature vector of the LSTM network is computed by forward propagation. The feature vector and the sample ground truth are then input into the LSTM network, and the fixation-point coordinates are obtained by forward propagation.
Embodiment 2
The effect of the invention is further described below with reference to simulation experiments.
1. Simulation conditions:
In the simulation experiments of the invention, the operating system is Ubuntu 16.04, the machine learning framework is TensorFlow version 1.1.0, and the Python version is 2.7. The embedding matrix has size V × M, where V is adjusted for each data set and M is 512; C is 16, representing 8 fixation points.
2. Simulation content:
In the simulation experiments of the invention, picture names are mapped to Arabic numerals to form a dictionary. An experiment is designed for each data set: training-set and test-set pictures are selected by number, and the corresponding eye-movement data set is processed to obtain the labels. The LSTM network is trained on the samples using the gradient-descent optimization algorithm RMSProp, and training stops when the cost of the LSTM network converges. Using only the simulation method of the invention, the trained LSTM network estimates the glance path of a picture and is tested on the test-set samples; each data set has about 100 test samples.
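The dictionary and the split by number could look like the sketch below. The file names, the sorted ordering, and the rule of taking the highest-numbered pictures as the test set are assumptions made for illustration; the text only states that pictures are chosen according to number.

```python
def build_index(picture_names):
    """Map each picture name to an integer index, in sorted order."""
    return {name: idx for idx, name in enumerate(sorted(picture_names))}


def split_by_number(name_to_idx, num_test):
    """Put the pictures with the highest indices in the test set.

    The split rule is an assumption; the text only says pictures are
    chosen "according to number".
    """
    names = sorted(name_to_idx, key=name_to_idx.get)
    if num_test <= 0:
        return names, []
    return names[:-num_test], names[-num_test:]


names = ["img_%03d.jpg" % i for i in range(1, 11)]  # hypothetical file names
index = build_index(names)
train, test = split_by_number(index, num_test=2)
print(index["img_001.jpg"], len(train), test)
```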
3. Analysis of simulation results:
The estimated glance path contains 8 fixation-point coordinates. The method is evaluated with three metrics: HD (Hausdorff distance), MMD (mean minimal distance), and SS (Sequence Score). The first two metrics measure the similarity between two sequences; the smaller the distance, the more similar the two sequences. SS describes a sequence in terms of fixation position, the direction and distance of fixation movement, and the order of the saccades; the closer its value is to 1, the more similar the sequences.
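The two distance metrics can be sketched as follows for sequences of (x, y) fixation points. The Hausdorff distance is the standard symmetric definition; the symmetric form used for MMD is an assumption, since the text names the metric without giving its formula. SS is omitted because its computation is not described here.

```python
import math


def _min_dist(p, path):
    """Smallest Euclidean distance from point p to any point of path."""
    return min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in path)


def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between two fixation sequences."""
    return max(max(_min_dist(p, b) for p in a),
               max(_min_dist(q, a) for q in b))


def mean_minimal_distance(a, b):
    """Average nearest-neighbour distance, taken over both sequences.

    This symmetric form is an assumption; the text names the metric
    but does not give its formula.
    """
    total = sum(_min_dist(p, b) for p in a) + sum(_min_dist(q, a) for q in b)
    return total / (len(a) + len(b))


predicted = [(0.0, 0.0), (3.0, 4.0)]
human = [(0.0, 0.0), (3.0, 0.0)]
print(hausdorff_distance(predicted, human),
      mean_minimal_distance(predicted, human))
```

Both functions ignore the order of the fixation points, which is why an order-aware score such as SS is reported alongside them.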
On HD and MMD, the glance paths estimated by the model score lower than those of classical glance path algorithms and are close to the values computed from real human eye-movement data. On SS, they score higher than the classical algorithms. The estimated paths are therefore closer to the ground truth, and the effect is better.
Embodiment 3
Based on the same inventive concept as the machine-learning-based glance path prediction method in the foregoing embodiment, the invention also provides a machine-learning-based glance path prediction device which, as shown in Fig. 4, comprises:
A first obtaining unit, configured to obtain an image data set to be processed, where each piece of image information in the image data set has corresponding ground-truth information;
A first production unit, configured to make the training samples of the image data set according to the ground-truth information;
A second obtaining unit, configured to obtain the image feature representation information of the image information according to the image information;
A first construction unit, configured to build and train an LSTM network according to the image feature representation information and the eye-movement data samples;
A first prediction unit, configured to predict the glance path according to the LSTM network.
Further, the device also comprises:
A third obtaining unit, configured to process the ground-truth information to obtain the eye-movement data information of N observers;
A first processing unit, configured to perform boundary processing on the eye-movement data of the N observers;
A first normalization unit, configured to normalize the eye-movement data of the N observers after boundary processing;
A first merging unit, configured to merge the eye-movement data of the N observers to obtain the training samples.
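The boundary processing, normalization, and merging performed by these units might be sketched as below. The text does not specify the boundary rule, so this sketch assumes that fixations outside the image are dropped; the function name `prepare_fixations` and the pixel-coordinate input format are likewise hypothetical.

```python
def prepare_fixations(observers, width, height):
    """Boundary-process, normalize, and merge fixations from N observers.

    observers: list of per-observer fixation lists [(x, y), ...] in pixels.
    Boundary processing is assumed here to mean dropping fixations that
    fall outside the image; the text does not specify the rule.
    """
    merged = []
    for fixations in observers:
        inside = [(x, y) for x, y in fixations
                  if 0 <= x < width and 0 <= y < height]
        # Normalize pixel coordinates to the range [0, 1).
        merged.append([(x / float(width), y / float(height))
                       for x, y in inside])
    return merged


observers = [[(100, 50), (-5, 20)], [(400, 300), (200, 150)]]
samples = prepare_fixations(observers, width=400, height=300)
print(samples)
```

Normalizing by image size keeps the coordinates comparable across pictures of different resolutions.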
Further, the device also comprises:
A fourth obtaining unit, configured to obtain a training set and a test set according to the image data set;
A first cropping unit, configured to crop the training-set image information to a standard size;
A first construction unit, configured to build an LSTM network and load trained model parameters;
A first output unit, configured to take the image information as the input of the LSTM network and output the image feature representation information of the image information.
Further, the device also comprises:
A fifth obtaining unit, configured to obtain the coordinates of the LSTM network and define the corresponding weight matrix according to the coordinates;
A first input unit, configured to take the image feature representation information and the weight matrix corresponding to the coordinates as the input of the LSTM network;
A first operation unit, configured to perform the input gate, forget gate, and output gate operations on the input using forward propagation;
A first decoding unit, configured to decode the output of the LSTM network according to a deep output layer;
A first training unit, configured to input the image feature representation information into the LSTM network and train the LSTM network using the back-propagation algorithm.
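The gate operations named above follow the standard LSTM cell, one forward step of which is sketched below in plain Python. The weight layout, the gate keys, and the toy sizes are assumptions chosen for readability; the sketch shows the generic cell only and does not reproduce the embedding, attention, or deep output layer of the invention.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, W, b):
    """One forward step of a standard LSTM cell.

    x, h_prev, c_prev are lists of floats. W[g] is a matrix with one row
    per hidden unit and len(x) + len(h_prev) columns; b[g] is a bias list.
    Gate keys: "i" input, "f" forget, "o" output, "g" candidate.
    """
    z = x + h_prev  # concatenate input and previous hidden state

    def affine(gate):
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(W[gate], b[gate])]

    i = [sigmoid(v) for v in affine("i")]
    f = [sigmoid(v) for v in affine("f")]
    o = [sigmoid(v) for v in affine("o")]
    g = [math.tanh(v) for v in affine("g")]
    # New cell state: keep part of the old state, add gated candidate.
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c


# Toy sizes: 2 inputs, 1 hidden unit, all weights 0.5, zero biases.
W = {k: [[0.5, 0.5, 0.5]] for k in "ifog"}
b = {k: [0.0] for k in "ifog"}
h, c = lstm_step([1.0, 1.0], [0.0], [0.0], W, b)
print(h, c)
```

Calling `lstm_step` repeatedly, feeding each returned `h` and `c` into the next call, unrolls the cell over a sequence of fixation steps.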
Further, the device also comprises:
A second input unit, configured to load the LSTM network and input the training samples into the LSTM network;
A sixth obtaining unit, configured to obtain the output feature vector of the LSTM network using the forward-propagation algorithm;
A seventh obtaining unit, configured to input the output feature vector and the ground-truth information into the LSTM network and obtain the fixation-point coordinates using the forward-propagation algorithm.
The variations and specific examples of the machine-learning-based glance path prediction method in Embodiment 1 of Fig. 1 apply equally to the machine-learning-based glance path prediction device of this embodiment. From the foregoing detailed description of the method, those skilled in the art can clearly understand how the device of this embodiment is implemented, so for brevity of the specification it is not described in detail here.
Embodiment 4
Based on the same inventive concept as the machine-learning-based glance path prediction method in the foregoing embodiment, the invention also provides a machine-learning-based glance path prediction device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, the steps of any of the machine-learning-based glance path prediction methods described above are implemented.
In Fig. 5, the bus architecture is represented by bus 300. Bus 300 may include any number of interconnected buses and bridges, and links together various circuits including one or more processors, represented by processor 302, and memory, represented by memory 304. Bus 300 may also link various other circuits such as peripheral devices, voltage regulators, and power management circuits; these are well known in the art and are therefore not described further here. Bus interface 306 provides an interface between bus 300 and receiver 301 and transmitter 303. Receiver 301 and transmitter 303 may be the same element, namely a transceiver, which provides a unit for communicating with various other devices over a transmission medium.
Processor 302 is responsible for managing bus 300 and for general processing, and memory 304 may be used to store the information used by processor 302 when performing operations.
One or more of the above technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
1. The machine-learning-based glance path prediction method and device provided by the embodiments of the present application obtain an image data set to be processed, where each piece of image information in the image data set has corresponding ground-truth information; make the training samples of the image data set according to the ground-truth information; obtain the image feature representation information of the image information according to the image information; build and train an LSTM network according to the image feature representation information and the eye-movement data samples; and predict the glance path according to the LSTM network. This solves the technical problems in the prior art that fixation-point prediction relies excessively on static saliency maps and that glance path prediction in natural scene pictures is inadequate. It removes the model's dependence on saliency maps, takes the temporal order between fixation points into account, and achieves good technical results on multiple public data sets.
2. The invention extracts image features with a convolutional neural network. The strong representation-learning ability and layer-by-layer learning strategy of convolutional neural networks allow higher-level features to be learned, which overcomes the shortcomings of the manual selection or joint multi-dimensional feature selection methods used in the prior art and gives good generality and extensibility.
3. The invention estimates the glance path by building an LSTM network, whose structure is well suited to processing time series. The LSTM network is trained on the currently input image region together with the fixation points generated so far, which simulates the saccade stage of human visual processing and the propagation and prediction of information in the visual cortex. This achieves consistency with the human glance path process at the level of biological mechanism, and yields glance path results consistent with human eye-movement data.
4. By introducing an attention mechanism into the network, the invention lets the decoder attend to a different part of the image at each output step. The trained model learns which regions of the image should be attended to, and this guides the output of the decoding network.
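The idea behind effect 4 can be illustrated with a generic soft-attention step, in which per-region relevance scores are turned into weights by a softmax and used to form a weighted sum of region features. This is a sketch of the general technique only; how the invention computes the scores is not stated in this section, and the function name `soft_attention` and the toy inputs are hypothetical.

```python
import math


def soft_attention(region_features, scores):
    """Weight image-region features by softmax-normalized scores.

    region_features: list of feature vectors, one per image region.
    scores: one relevance score per region for the current decoding step
    (how the scores are computed is not covered by this sketch).
    Returns (weights, context), where context is the weighted sum.
    """
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift for stability
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(region_features[0])
    context = [sum(w * feat[d] for w, feat in zip(weights, region_features))
               for d in range(dim)]
    return weights, context


features = [[1.0, 0.0], [0.0, 1.0]]
weights, context = soft_attention(features, scores=[0.0, 0.0])
print(weights, context)
```

Because the weights are recomputed at every decoding step, the context vector can shift from one image region to another as the fixation sequence is generated.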
Those skilled in the art should understand that embodiments of the invention may be provided as a method, a system, or a computer program product. Accordingly, the invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
The invention is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable information processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable information processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable information processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable information processing device, such that a series of operation steps is executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to include them as well.

Claims (7)

1. A machine-learning-based glance path prediction method, characterized in that the method comprises:
Obtaining an image data set to be processed, where each piece of image information in the image data set has corresponding ground-truth information;
Making the training samples of the image data set according to the ground-truth information;
Obtaining the image feature representation information of the image information according to the image information;
Building and training an LSTM network according to the image feature representation information and the eye-movement data samples;
Predicting the glance path according to the LSTM network.
2. The method according to claim 1, characterized in that making the training samples of the image data set according to the ground-truth information specifically comprises:
Processing the ground-truth information to obtain the eye-movement data information of N observers;
Performing boundary processing on the eye-movement data of the N observers;
Normalizing the eye-movement data of the N observers after boundary processing;
Merging the eye-movement data of the N observers to obtain the training samples, where N is a positive integer.
3. The method according to claim 1, characterized in that obtaining the image feature representation information of the image information specifically comprises:
Obtaining a training set and a test set according to the image data set;
Cropping the training-set image information to a standard size;
Building a convolutional neural network and loading trained model parameters;
Taking the image information as the input of the convolutional neural network and outputting the image feature representation information of the image information.
4. The method according to claim 1, characterized in that building and training the LSTM network specifically comprises:
Obtaining the coordinates of the LSTM network and defining the corresponding weight matrix according to the coordinates;
Taking the image feature representation information and the weight matrix corresponding to the coordinates as the input of the LSTM network;
Performing the input gate, forget gate, and output gate operations on the input using forward propagation;
Decoding the output of the LSTM network according to a deep output layer;
Inputting the image feature representation information into the LSTM network and training the LSTM network using the back-propagation algorithm.
5. The method according to claim 1, characterized in that the method further comprises:
Loading the LSTM network and inputting the training samples into the LSTM network;
Obtaining the output feature vector of the LSTM network using the forward-propagation algorithm;
Inputting the output feature vector and the ground-truth information into the LSTM network and obtaining the fixation-point coordinates using the forward-propagation algorithm.
6. A machine-learning-based glance path prediction device, characterized in that the device comprises:
A first obtaining unit, configured to obtain an image data set to be processed, where each piece of image information in the image data set has corresponding ground-truth information;
A first production unit, configured to make the training samples of the image data set according to the ground-truth information;
A second obtaining unit, configured to obtain the image feature representation information of the image information according to the image information;
A first construction unit, configured to build and train an LSTM network according to the image feature representation information and the eye-movement data samples;
A first prediction unit, configured to predict the glance path according to the LSTM network.
7. A machine-learning-based glance path prediction device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the following steps when executing the program:
Obtaining an image data set to be processed, where each piece of image information in the image data set has corresponding ground-truth information;
Making the training samples of the image data set according to the ground-truth information;
Obtaining the image feature representation information of the image information according to the image information;
Building and training an LSTM network according to the image feature representation information and the eye-movement data samples;
Predicting the glance path according to the LSTM network.
CN201810332835.5A 2018-04-13 2018-04-13 Glance path prediction method and device based on machine learning Active CN109447096B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810332835.5A CN109447096B (en) 2018-04-13 2018-04-13 Glance path prediction method and device based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810332835.5A CN109447096B (en) 2018-04-13 2018-04-13 Glance path prediction method and device based on machine learning

Publications (2)

Publication Number Publication Date
CN109447096A true CN109447096A (en) 2019-03-08
CN109447096B CN109447096B (en) 2022-05-06

Family

ID=65530053

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810332835.5A Active CN109447096B (en) 2018-04-13 2018-04-13 Glance path prediction method and device based on machine learning

Country Status (1)

Country Link
CN (1) CN109447096B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110150293A1 (en) * 2008-11-26 2011-06-23 Bower Bradley A Methods, Systems and Computer Program Products for Biometric Identification by Tissue Imaging Using Optical Coherence Tomography (OCT)
US20140201126A1 (en) * 2012-09-15 2014-07-17 Lotfi A. Zadeh Methods and Systems for Applications for Z-numbers
CN105678735A (en) * 2015-10-13 2016-06-15 中国人民解放军陆军军官学院 Target salience detection method for fog images
CN106491129A (en) * 2016-10-10 2017-03-15 安徽大学 A kind of Human bodys' response system and method based on EOG
CN106959749A (en) * 2017-02-20 2017-07-18 浙江工业大学 A kind of vision attention behavior cooperating type method for visualizing and system based on eye-tracking data
CN106970615A (en) * 2017-03-21 2017-07-21 西北工业大学 A kind of real-time online paths planning method of deeply study
CN107515466A (en) * 2017-08-14 2017-12-26 华为技术有限公司 A kind of eyeball tracking system and eyeball tracking method
CN107644401A (en) * 2017-08-11 2018-01-30 西安电子科技大学 Multiplicative noise minimizing technology based on deep neural network
CN107808132A (en) * 2017-10-23 2018-03-16 重庆邮电大学 A kind of scene image classification method for merging topic model
CN107852521A (en) * 2015-08-07 2018-03-27 Smi创新传感技术有限公司 System and method for display image stream

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
D BAHDANAU ET AL: "Neural Machine Translation by Jointly Learning to Align and Translate", 《COMPUTER SCIENCE》 *
DANIEL SIMON ET AL: "Automatic Scanpath Generation with Deep Recurrent Neural Networks", 《THE ACM SYMPOSIUM ACM》 *
THUYEN NGO ET AL: "Saccade gaze prediction using a recurrent neural network", 《2017 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP)》 *
严艳梅: "An eye-movement study of repeated scanpaths on pictures", 《China Master's Theses Full-text Database (Philosophy and Humanities)》 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110245660A (en) * 2019-06-03 2019-09-17 西北工业大学 Webpage based on significant characteristics fusion sweeps path prediction technique
CN110245660B (en) * 2019-06-03 2022-04-22 西北工业大学 Webpage glance path prediction method based on saliency feature fusion
CN110298303A (en) * 2019-06-27 2019-10-01 西北工业大学 A kind of crowd recognition method based on the long pan of memory network in short-term path learning
CN110298303B (en) * 2019-06-27 2022-03-25 西北工业大学 Crowd identification method based on long-time memory network glance path learning
CN111461974A (en) * 2020-02-17 2020-07-28 天津大学 Image scanning path control method based on L STM model from coarse to fine
CN111461974B (en) * 2020-02-17 2023-04-25 天津大学 Image scanning path control method based on LSTM model from coarse to fine
CN111723707A (en) * 2020-06-09 2020-09-29 天津大学 Method and device for estimating fixation point based on visual saliency
CN111723707B (en) * 2020-06-09 2023-10-17 天津大学 Gaze point estimation method and device based on visual saliency
CN113313123A (en) * 2021-06-11 2021-08-27 西北工业大学 Semantic inference based glance path prediction method
CN113313123B (en) * 2021-06-11 2024-04-02 西北工业大学 Glance path prediction method based on semantic inference

Also Published As

Publication number Publication date
CN109447096B (en) 2022-05-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant