CN109447096B - Glance path prediction method and device based on machine learning - Google Patents

Glance path prediction method and device based on machine learning

Info

Publication number
CN109447096B
CN109447096B (application CN201810332835.5A)
Authority
CN
China
Prior art keywords
LSTM network
information
image
training
obtaining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810332835.5A
Other languages
Chinese (zh)
Other versions
CN109447096A (en)
Inventor
齐飞 (Qi Fei)
高帅 (Gao Shuai)
石光明 (Shi Guangming)
夏朝辉 (Xia Zhaohui)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN201810332835.5A priority Critical patent/CN109447096B/en
Publication of CN109447096A publication Critical patent/CN109447096A/en
Application granted granted Critical
Publication of CN109447096B publication Critical patent/CN109447096B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: Physics
    • G06: Computing; calculating or counting
    • G06T: Image data processing or generation, in general
    • G06T 7/00: Image analysis
    • G06T 7/0002: Inspection of images, e.g. flaw detection
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20081: Training; learning
    • G06T 2207/20084: Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a saccade (glance) path prediction method and device based on machine learning, relating to the field of computer technology. The method comprises the following steps: obtaining an image data set to be processed, wherein each image information in the image data set has corresponding truth value information; making training samples of the image data set according to the truth value information; obtaining image feature representation information of the image information according to the image information; constructing and training an LSTM network according to the image feature representation information and the eye movement data samples; and predicting a saccade path according to the LSTM network. The method solves the technical problems that in the prior art the predicted fixation point depends too heavily on a static saliency map and the predicted saccade path in natural-scene pictures is deficient; it eliminates the model's dependence on the saliency map, takes the temporal order of the fixation points into account, and achieves good results on several public data sets.

Description

Glance path prediction method and device based on machine learning
Technical Field
The invention relates to the technical field of image processing, in particular to a method and a device for predicting a saccade path based on machine learning.
Background
With the rapid development of information technology, we have entered an era of large-scale data growth. Digital images and videos have become important carriers of information, and massive image data is a major source from which information is obtained; how to effectively select the most valuable information from images has gradually become a hot topic in the field of image processing.
In the prior art, fixation point prediction depends too heavily on a static saliency map, and existing methods also have many deficiencies when predicting saccade paths in natural-scene pictures.
Disclosure of Invention
The embodiment of the invention provides a saccade path prediction method and device based on machine learning, which solve the problems that the predicted fixation point depends too heavily on a static saliency map in the prior art and that the predicted saccade path in natural-scene pictures is deficient; they eliminate the model's dependence on the saliency map, take the temporal order of the fixation points into consideration, and achieve good results on several public data sets.
In view of the foregoing, embodiments of the present application are proposed to provide a method and apparatus for predicting a glance path based on machine learning.
In a first aspect, the present invention provides a saccade path prediction method based on machine learning, including: obtaining an image data set to be processed, wherein each image information in the image data set has corresponding truth value information; making a training sample of the image data set according to the truth value information; obtaining image feature representation information of the image information according to the image information; constructing and training an LSTM network according to the image feature representation information and the eye movement data samples; and predicting a saccade path according to the LSTM network.
Preferably, the making a training sample of the image data set according to the truth value information specifically includes: processing the truth value information to obtain eye movement data information of N observers; performing boundary processing on the eye movement data of the N observers; normalizing the eye movement data of the N observers after the boundary processing; and combining the eye movement data of the N observers to obtain the training sample, wherein N is a positive integer.
Preferably, the obtaining of the image feature representation information of the image information specifically includes: obtaining a training set and a testing set according to the image data set; cutting the image information of the training set into a standard size; constructing a convolutional neural network, and loading the trained model parameters; and taking the image information as the input of a convolutional neural network, and outputting image feature representation information of the image information.
Preferably, the constructing and training of the LSTM network specifically includes: obtaining the coordinates of the LSTM network, and defining a corresponding weight matrix according to the coordinates; taking the image feature representation information and the weight matrix corresponding to the coordinates as the input of the LSTM network; applying the input gate, forget gate, and output gate operations to the input using a forward propagation method; decoding the LSTM network output with a deep output layer; and inputting the image feature representation information into the LSTM network and training it with a back propagation algorithm.
Preferably, the method further comprises: loading the LSTM network and inputting the training samples into the LSTM network; obtaining an output feature vector of the LSTM network by using a forward propagation algorithm; and inputting the output characteristic vector and the truth value information into the LSTM network, and obtaining the fixation point coordinate by using a forward propagation algorithm.
In a second aspect, the present invention provides a machine learning based glance path prediction apparatus, comprising:
a first obtaining unit, configured to obtain an image dataset to be processed, where each image information in the image dataset has corresponding true value information;
a first production unit, configured to produce a training sample of the image data set according to the truth value information;
a second obtaining unit configured to obtain image feature representation information of the image information based on the image information;
a first constructing unit, configured to construct and train an LSTM network according to the image feature representation information and the eye movement data samples;
a first prediction unit to predict a saccade path according to the LSTM network.
Preferably, the apparatus further comprises:
a third obtaining unit, configured to process the truth value information to obtain eye movement data information of the N observers;
a first processing unit for performing boundary processing on the eye movement data of the N observers;
a first normalization unit, configured to normalize the eye movement data of the N observers after the boundary processing;
a first merging unit, configured to merge the eye movement data of the N observers to obtain the training sample.
Preferably, the apparatus further comprises:
a fourth obtaining unit, configured to obtain a training set and a test set according to the image data set;
a first cropping unit for cropping the training set image information to a standard size;
the first construction unit is used for constructing a convolutional neural network and loading the trained model parameters;
a first output unit configured to output image feature representation information of the image information using the image information as an input to a convolutional neural network.
Preferably, the apparatus further comprises:
a fifth obtaining unit, configured to obtain coordinates of the LSTM network, and define a corresponding weight matrix according to the coordinates;
a first input unit configured to take the image feature representation information and the weight matrix corresponding to the coordinates as the input of the LSTM network;
a first operation unit for applying the input gate, forget gate, and output gate operations to the input using a forward propagation method;
a first decoding unit to decode the LSTM network output according to a deep output layer;
a first training unit for inputting the image feature representation information to the LSTM network, and training the LSTM network using a back propagation algorithm.
Preferably, the apparatus further comprises:
a second input unit, configured to load the LSTM network and input the training samples into the LSTM network;
a sixth obtaining unit, configured to obtain an output feature vector of the LSTM network using a forward propagation algorithm;
a seventh obtaining unit, configured to input the output feature vector and the truth value information into the LSTM network, and obtain the fixation point coordinates using a forward propagation algorithm.
In a third aspect, the present invention provides a machine learning based saccade path prediction apparatus comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the program: obtaining an image data set to be processed, wherein each image information in the image data set has corresponding truth value information; making a training sample of the image data set according to the truth value information; obtaining image feature representation information of the image information according to the image information; constructing and training an LSTM network according to the image feature representation information and the eye movement data samples; and predicting a saccade path according to the LSTM network.
One or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
1. The machine learning based saccade path prediction method and device of the embodiments obtain an image data set to be processed, wherein each piece of image information in the data set has corresponding truth value information; make training samples of the image data set according to the truth value information; obtain image feature representation information of the image information; construct and train an LSTM network according to the image feature representation information and the eye movement data samples; and predict a saccade path according to the LSTM network. This solves the technical problems that in the prior art the predicted fixation point depends too heavily on a static saliency map and the predicted saccade path in natural-scene pictures is deficient; it eliminates the model's dependence on the saliency map, takes the temporal order of the fixation points into account, and achieves good results on several public data sets.
2. The image features are extracted by adopting the convolutional neural network, the convolutional neural network has strong capability of representing learning and can learn higher-level features by using a layer-by-layer learning strategy, the defects of a manual selection or combined multi-dimensional feature selection method in the prior art are overcome, and the method has better universality and expandability.
3. The invention estimates the saccade path by constructing an LSTM network, whose structure is well suited to processing time series. The LSTM network is trained on the currently attended image region together with the fixation points generated so far, simulating the saccade stage of human visual processing and the propagation and prediction of information in the visual cortex. This makes the model consistent with the human saccade process at the level of biological mechanism and yields saccade path results consistent with human eye movement data.
4. According to the invention, by introducing an attention mechanism into the network, each step of network output allows a decoder to pay attention to different parts of the image, and finally, the trained model can learn which part of the image should be paid attention to, so as to guide the decoding of the network output.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
FIG. 1 is a flow chart of a method for predicting a glance path based on machine learning according to an embodiment of the present invention;
FIG. 2 is a block diagram of a convolutional neural network in accordance with an embodiment of the present invention;
FIG. 3 is a schematic diagram of a glance path predicting device based on machine learning according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of another glance path predicting device based on machine learning according to an embodiment of the present invention.
The reference numbers illustrate: a bus 300, a receiver 301, a processor 302, a transmitter 303, a memory 304, a bus interface 306.
Detailed Description
The embodiment of the invention provides a saccade path prediction method and device based on machine learning, which are used to solve the problems that the predicted fixation point is overly dependent on a static saliency map in the prior art and that the predicted saccade path in natural-scene pictures is deficient. The general idea of the technical solution provided by the invention is as follows:
In the technical solution of the embodiment of the invention, an image data set to be processed is obtained, wherein each image information in the image data set has corresponding truth value information; training samples of the image data set are made according to the truth value information; image feature representation information is obtained from the image information; an LSTM network is constructed and trained according to the image feature representation information and the eye movement data samples; and a saccade path is predicted according to the LSTM network. This eliminates the model's dependence on the saliency map, takes the temporal order of the fixation points into account, and achieves good results on several public data sets.
The technical solutions of the present invention are described in detail below with reference to the drawings and specific embodiments, and it should be understood that the specific features in the embodiments and examples of the present invention are described in detail in the technical solutions of the present application, and are not limited to the technical solutions of the present application, and the technical features in the embodiments and examples of the present application may be combined with each other without conflict.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter associated objects are in an "or" relationship.
In order to more clearly disclose a glance path prediction method based on machine learning provided by the embodiments of the present application, some terms are described below.
A Convolutional Neural Network (CNN) is a feed-forward neural network whose artificial neurons respond to a local region of the input (their receptive field); it performs well for large-scale image processing. It consists of convolutional layers and pooling layers.
LSTM (Long Short-Term Memory) is an improved recurrent neural network, first published in 1997. Owing to its unique design, the LSTM is well suited to processing and predicting important events with very long intervals and delays in a time series.
TensorFlow is a second-generation machine learning system developed by Google on the basis of DistBelief, and its name comes from its operating principle: a tensor is an N-dimensional array, and flow denotes computation on a dataflow graph, so TensorFlow describes tensors flowing from one end of a dataflow graph to the other. TensorFlow feeds complex data structures into artificial neural networks for analysis and processing.
BasicLSTMCell is the basic LSTM recurrent network cell in TensorFlow.
Example 1
Fig. 1 is a flowchart illustrating a glance path prediction method based on machine learning according to an embodiment of the present invention. As shown in fig. 1, the method includes:
step 110: obtaining an image data set to be processed, wherein each image information in the image data set has corresponding true value information;
Specifically, the image data set to be processed is a set of pictures to be processed, and the corresponding truth value information is the fixation point coordinates of each image, used as its label.
Step 120: according to the truth value information, making a training sample of the image data set;
further, the making of the training sample of the image data set according to the truth information specifically includes: processing the truth value information to obtain eye movement data information of N observers; performing boundary processing on the eye movement data of the N observers; normalizing the eye movement data of the N observers after the boundary processing; and combining the eye movement data of the N observers to obtain the training sample, wherein N is a positive integer.
Specifically, regarding the processing of the truth values: each image data set has corresponding eye movement data, with eye movement data from N observers per picture. The eye movement data is first boundary-processed, mapping every point outside the image onto the image boundary. The corresponding eye movement data is then selected via a dictionary each time and normalized, and the data of the N observers is merged to obtain the training sequences. Each sequence consists of 8 fixation point coordinates; flattened to one dimension, a sequence contains 16 numbers.
Several experiments are performed on each data set. For example, the MIT1003 data set contains 1003 pictures, each viewed by 15 observers. The picture sequence numbers are mapped to the numbers 0-1003 to obtain a dictionary, and in each experiment 900 pictures are selected from the dictionary in order, with 0-900 used for training and 900-1003 for testing. Truth values are then selected from the truth data in the same way as the pictures, i.e., 900 × 15 = 13,500 eye movement sequences serve as labels. Training samples and labels are made in this way for each experiment.
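As an illustration of this sample-making step, the boundary processing and normalization described above can be sketched in NumPy (a minimal sketch; the function name and the per-image layout are assumptions, not taken from the patent):

```python
import numpy as np

def make_training_sequence(fixations, width, height):
    """Clip gaze points to the image boundary, normalize them to [0, 1],
    and flatten 8 (x, y) fixations into a 16-number sequence."""
    pts = np.asarray(fixations, dtype=float)       # shape (8, 2)
    pts[:, 0] = np.clip(pts[:, 0], 0, width - 1)   # boundary processing: points
    pts[:, 1] = np.clip(pts[:, 1], 0, height - 1)  # outside map to the boundary
    pts[:, 0] /= width - 1                         # normalization
    pts[:, 1] /= height - 1
    return pts.reshape(-1)                         # one sequence of 16 numbers

# A fixation at (-5, 10) is clipped to the boundary before normalizing.
seq = make_training_sequence([[-5, 10], [300, 40]] + [[100, 100]] * 6, 256, 192)
```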
Step 130: obtaining image characteristic representation information of the image information according to the image information;
further, the obtaining of the image feature representation information of the image information specifically includes: obtaining a training set and a test set according to the image data set; cutting the image information of the training set into a standard size; constructing a convolutional neural network, and loading the trained model parameters; and taking the image information as the input of a convolutional neural network, and outputting image feature representation information of the image information.
Specifically, the image data set has corresponding numbers: the picture names in the data set are mapped to numbers, and in each run pictures are selected by number as the training set and the test set. The structure of the convolutional neural network is described below with reference to Fig. 2. The saccade path model established by the invention mainly comprises an encoding network, a decoding network, and an output layer. The encoding network consists of a convolutional neural network with five parts: the first part has two convolutional layers; the second, two; the third, four; the fourth, four; and the fifth, four. Each part comprises convolution operations followed by a pooling operation; all convolution kernels are of size 3 × 3, and the activation function of every convolutional layer is the rectified linear unit. The convolutional layers have 64 kernels in the first part, 128 in the second, 256 in the third, and 512 in the last two. Extracting image features with a convolutional neural network exploits its strong representation learning ability: through a layer-by-layer learning strategy it learns progressively higher-level features, overcoming the shortcomings of the manually selected or combined multi-dimensional feature selection methods of the prior art, with better generality and extensibility.
The embodiment of the application adopts model parameters pre-trained with VGG19; since the VGG19 network is trained on the large ImageNet data set, it can extract accurate image features, so the VGG19 model parameters are loaded for feature extraction. The image, of size 224 × 224 × 3, is the input of the convolutional neural network, and the output is the set of image feature vectors a = {α_1, ..., α_L}, α_i ∈ R^D, where L = 196 and D = 512. For each picture, the network extracts L vectors, each corresponding to a region of the image.
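The shape bookkeeping here is easy to verify: a 224 × 224 input passed through a VGG-style encoder yields a 14 × 14 × 512 feature map, which flattens into L = 196 region vectors of dimension D = 512 (a sketch with a random stand-in for the real convolutional features):

```python
import numpy as np

# Stand-in for the feature map of a VGG-style encoder on a 224 x 224 x 3
# input: a 14 x 14 spatial grid with 512 channels.
feature_map = np.random.rand(14, 14, 512)

# Flatten into the region vectors a = {a_1, ..., a_L}, a_i in R^D.
alpha = feature_map.reshape(-1, 512)
L, D = alpha.shape
```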
Step 140: constructing and training an LSTM network according to the image feature representation information and the eye movement data sample;
Further, the constructing and training of the LSTM network specifically includes: obtaining the coordinates of the LSTM network, and defining a corresponding weight matrix according to the coordinates; taking the image feature representation information and the weight matrix corresponding to the coordinates as the input of the LSTM network; applying the input gate, forget gate, and output gate operations to the input using a forward propagation method; decoding the LSTM network output with a deep output layer; and inputting the image feature representation information into the LSTM network and training it with a back propagation algorithm.
Specifically, the LSTM network consists of BasicLSTMCell units in TensorFlow, where the number of units is H = 1024. We define the coordinate sequence produced by the generative model as

y = {y_1, ..., y_C}, y_i ∈ R^K,

where y_i is a 1 × K vector, K is the size of the coordinate base, and C is the length of the generated sequence; in the experiments C = 8, i.e., eight fixation points are generated per image.
In Fig. 2, i_t is the input gate, f_t the forget gate, o_t the output gate, and g_t a candidate vector controlled by the input gate; h_{t-1} denotes the hidden state at the previous time step, ẑ_t the context vector at time t, and E y_{t-1} the embedding vector of the output at time t-1, obtained through the embedding matrix E. The embedding matrix E takes the total weight matrix and the true coordinates corresponding to x and y as the input of the function embedding_lookup(ids), yielding the weight matrix corresponding to x and y. By introducing an attention mechanism into the network, each output step allows the decoder to attend to different parts of the image, so that the trained model finally learns which part of the image should be attended to, guiding the decoding of the network output.
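The gate operations named above amount to the standard LSTM forward step; a minimal NumPy sketch follows (the fused weight layout is an assumption for illustration — TensorFlow's BasicLSTMCell orders its gates differently):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One forward step with input gate i, forget gate f, output gate o, and
    candidate vector g.  W has shape (dim_x + H, 4H); b has shape (4H,)."""
    z = np.concatenate([x, h_prev]) @ W + b
    H = h_prev.size
    i = sigmoid(z[:H])            # input gate i_t
    f = sigmoid(z[H:2 * H])       # forget gate f_t
    o = sigmoid(z[2 * H:3 * H])   # output gate o_t
    g = np.tanh(z[3 * H:])        # candidate vector g_t
    c = f * c_prev + i * g        # new cell state
    h = o * np.tanh(c)            # new hidden state
    return h, c

rng = np.random.default_rng(0)
dim_x, H = 8, 4                   # toy sizes (the patent uses H = 1024)
W = 0.1 * rng.standard_normal((dim_x + H, 4 * H))
h, c = lstm_step(rng.standard_normal(dim_x), np.zeros(H), np.zeros(H),
                 W, np.zeros(4 * H))
```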
The output formula of the LSTM network is as follows:

p(y_t | a, y_{t-1}) ∝ exp( L_o ( E y_{t-1} + L_h h_t + L_z ẑ_t ) )
This is realized through a deep output layer network comprising two neural network layers. The first layer applies dropout to the hidden state, obtains an output h_logits by logistic regression, adds the context information and the previously generated coordinate information to h_logits, and applies a tanh activation followed by dropout; the second layer obtains the output out_logits from the first layer's output by logistic regression. Dropout means that during the training of a neural network, in each iteration some neurons are temporarily dropped from the network with a certain probability; the dropped nodes can temporarily be regarded as not being part of the network structure, but their weights are retained.
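A sketch of such a two-layer deep output (the names h_logits/out_logits follow the description above; the exact shapes and the inverted-dropout form are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def deep_output_layer(h, z_ctx, ey_prev, W_h, W_out, drop_p=0.5, train=False):
    """First layer: project the hidden state, add context and the embedded
    previous coordinate, tanh; second layer: project to coordinate logits."""
    h_logits = h @ W_h                          # logistic-regression projection
    h_logits = np.tanh(h_logits + z_ctx + ey_prev)
    if train:                                   # dropout only during training
        keep = rng.random(h_logits.shape) >= drop_p
        h_logits = h_logits * keep / (1.0 - drop_p)
    return h_logits @ W_out                     # out_logits over the K-way base

H, M, K = 8, 8, 10                              # toy sizes
logits = deep_output_layer(rng.standard_normal(H), np.zeros(M), np.zeros(M),
                           rng.standard_normal((H, M)),
                           rng.standard_normal((M, K)))
```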
Further, before training of the LSTM network begins, all weight parameters are randomly initialized to numbers close to 0 and all biases to 0. The initial hidden state h and cell state c are obtained through two separate multilayer perceptrons, which take the mean of the image region features as input:

c_0 = f_init,c( (1/L) Σ_{i=1}^{L} α_i )

h_0 = f_init,h( (1/L) Σ_{i=1}^{L} α_i )
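These two initializers can be sketched as perceptrons over the mean region feature (a single tanh layer per initializer is an assumption):

```python
import numpy as np

def init_states(alpha, W_c, b_c, W_h, b_h):
    """h_0 and c_0 from the mean of the L image-region features, each through
    its own perceptron f_init (one tanh layer assumed here)."""
    a_mean = alpha.mean(axis=0)        # (1/L) * sum_i alpha_i
    c0 = np.tanh(a_mean @ W_c + b_c)
    h0 = np.tanh(a_mean @ W_h + b_h)
    return h0, c0

rng = np.random.default_rng(2)
alpha = rng.random((196, 4))           # toy feature dim instead of D = 512
h0, c0 = init_states(alpha,
                     rng.standard_normal((4, 3)), np.zeros(3),
                     rng.standard_normal((4, 3)), np.zeros(3))
```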
The training samples are randomly divided into smaller batches; the batch size chosen in the experiment is 25. In each training iteration, one batch of image feature vectors and truth value data is fed in.
the cost of the LSTM network is calculated as follows:
l_ti = -[ y_ti ln a_ti + (1 - y_ti) ln(1 - a_ti) ]

loss_t = Σ_{i=1}^{N} l_ti

loss = Σ_t loss_t

where y_ti denotes the target (truth) value, a_ti the actual output of the network, and l_ti the value of the loss function for the i-th sample at time t. Each time the network is trained on N samples, the loss values of the N samples at each time t are summed to obtain the loss loss_t of all samples at time t; summing over all training time steps t then gives the total loss of the N samples.
According to this cost, the cost function of the LSTM network is optimized with the gradient descent optimization algorithm RMSProp, and the model parameters of the LSTM network are updated layer by layer through the back propagation algorithm. The LSTM network is trained to convergence, and its model and parameters are persisted. The invention estimates the saccade path by constructing an LSTM network, whose structure is suited to processing time series; the network is trained on the currently attended image region together with the fixation points generated so far, simulating the saccade stage of human visual processing and the propagation and prediction of information in the visual cortex, achieving consistency with the human saccade process at the level of biological mechanism and producing saccade path results consistent with human eye movement data.
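For reference, one RMSProp parameter update looks as follows (the learning rate and decay values are illustrative defaults, not taken from the patent):

```python
import numpy as np

def rmsprop_update(param, grad, cache, lr=1e-3, decay=0.9, eps=1e-8):
    """Scale the gradient by a running root-mean-square of past gradients."""
    cache = decay * cache + (1.0 - decay) * grad ** 2
    param = param - lr * grad / (np.sqrt(cache) + eps)
    return param, cache

p, cache = 1.0, 0.0
for _ in range(10):                    # constant positive gradient
    p, cache = rmsprop_update(p, 0.5, cache)
```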
Step 150: predicting a saccade path according to the LSTM network.
Further, loading the LSTM network, and inputting the training samples into the LSTM network; obtaining an output feature vector of the LSTM network by using a forward propagation algorithm; and inputting the output characteristic vector and the truth value information into the LSTM network, and obtaining the fixation point coordinate by using a forward propagation algorithm.
Specifically, the LSTM network is loaded, the prepared sample pictures are input into it, and the output feature vector of the LSTM network is computed with a forward propagation algorithm. The feature vector and the sample truth values are then input into the LSTM network, and the fixation point coordinates are obtained with a forward propagation algorithm.
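Prediction is thus a decoding loop: each predicted fixation is fed back as the next input until 8 points are produced. A sketch with the per-step forward pass abstracted into `step_fn` (an assumed interface wrapping the trained LSTM plus the deep output layer):

```python
import numpy as np

def predict_scanpath(alpha, step_fn, h0, c0, n_points=8):
    """Decode a saccade path of n_points (x, y) fixations, feeding each
    prediction back in as the previous coordinate."""
    h, c = h0, c0
    y_prev = np.zeros(2)               # initial coordinate (assumption)
    path = []
    for _ in range(n_points):
        y_prev, h, c = step_fn(alpha, y_prev, h, c)
        path.append(y_prev)
    return np.array(path)

def dummy_step(alpha, y_prev, h, c):   # stand-in for the trained network
    return y_prev + 0.1, h, c

path = predict_scanpath(None, dummy_step, 0.0, 0.0)
```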
Example 2
The effect of the present invention will be further described with reference to simulation experiments.
1. Simulation conditions are as follows:
In the simulation experiments of the invention, the computer system is Ubuntu 16.04, the machine learning framework is TensorFlow version 1.1.0, and the Python version is 2.7. The embedding matrix is of size V × M, where V is adjusted for each data set and M = 512; C = 16, i.e., the flattened coordinate sequence of 16 numbers represents 8 fixation points.
2. Simulation content:
In the simulation experiments, the picture names are mapped to Arabic numerals to form a dictionary; for each data set, the training set and test set pictures are selected by number, and the corresponding eye movement data is processed to obtain labels. The LSTM network is trained on these samples with the gradient descent optimization algorithm RMSProp, and training stops when the cost of the LSTM network converges. The trained LSTM network is then used to estimate the saccade path of an image and is tested on the test set samples, of which each data set has about 100.
3. Simulation result analysis:
the estimated saccade path includes 8 fixation point coordinates. The evaluation indexes of the method comprise three indexes: HD (Hausdorff distance), MMD (the mean minimum distance), SS (sequence score), wherein the first two indices are used to measure the similarity between two sequences, with smaller distances representing more similar sequences; the SS describes the sequences from several angles of the gaze point position, the direction and distance of gaze point movement, and the order of panning, the closer the value is to 1, the higher the degree of similarity of the sequences.
On HD and MMD, the saccade path estimated by the model scores lower than the classical algorithms and is close to the curve computed from the human-eye ground truth; on SS, it scores higher than the classical algorithms and is closer to the ground truth, indicating better performance.
Example 3
Based on the same inventive concept as the glance path prediction method based on machine learning in the foregoing embodiment, the present invention further provides a glance path prediction apparatus based on machine learning, as shown in fig. 3, including:
a first obtaining unit, configured to obtain an image dataset to be processed, where each image information in the image dataset has corresponding true value information;
a first production unit, configured to produce a training sample of the image data set according to the truth value information;
a second obtaining unit configured to obtain image feature representation information of the image information based on the image information;
the first construction unit is used for constructing and training an LSTM network according to the image feature representation information and the eye movement data samples;
a first prediction unit to predict a scan path according to the LSTM network.
Further, the apparatus further comprises:
a third obtaining unit, configured to process the truth value information to obtain eye movement data information of N observers;
a first processing unit for performing boundary processing on the eye movement data of the N observers;
a first normalization unit, configured to normalize the eye movement data of the N observers after the boundary processing;
a first merging unit, configured to merge the eye movement data of the N observers to obtain the training sample.
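The sample-preparation path implemented by these units — boundary processing, normalization, and merging — can be sketched as below, assuming pixel fixation coordinates and simple concatenation as the merging rule (the exact merging rule is not specified here):

```python
import numpy as np

def make_training_sample(observers, width, height):
    """Boundary-clamp each observer's fixation coordinates to the image,
    normalize them to [0, 1], and merge the N observers into one sample.
    Concatenation is an assumed merging rule for illustration."""
    processed = []
    for fixations in observers:                    # each: (num_points, 2) in pixels
        f = np.asarray(fixations, dtype=float)
        f[:, 0] = np.clip(f[:, 0], 0, width - 1)   # boundary processing
        f[:, 1] = np.clip(f[:, 1], 0, height - 1)
        f[:, 0] /= width - 1                       # normalization to [0, 1]
        f[:, 1] /= height - 1
        processed.append(f)
    return np.concatenate(processed)               # merge the N observers

# Two observers, with some fixations recorded outside the 640 x 480 image.
sample = make_training_sample(
    [[(10, 20), (700, 50)], [(-5, 30), (100, 900)]], width=640, height=480)
print(sample.shape)   # (4, 2), all values in [0, 1]
```

Normalizing after clamping keeps out-of-bounds recordings (a common eye-tracker artifact) from producing coordinates outside the unit square.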
Further, the apparatus further comprises:
a fourth obtaining unit, configured to obtain a training set and a test set according to the image data set;
a first cropping unit for cropping the training set image information to a standard size;
a second construction unit for constructing a convolutional neural network and loading the trained model parameters;
a first output unit configured to take the image information as the input of the convolutional neural network and output image feature representation information of the image information.
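The cropping and feature-extraction steps handled by these units can be sketched as follows. The center crop to an assumed standard size of 224 × 224 is illustrative, and the "CNN" here is a trivial random-projection stand-in for the pretrained convolutional network whose parameters would actually be loaded:

```python
import numpy as np

def center_crop(image, size=224):
    """Crop an H x W x 3 array to the size x size input expected by the
    feature extractor (224 is an assumed standard size)."""
    h, w = image.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return image[top:top + size, left:left + size]

rng = np.random.default_rng(0)
proj = rng.standard_normal((3, 64)) * 0.1   # stand-in "CNN" parameters

def extract_features(image):
    """Crude 64-dim feature: mean colour of the crop through a fixed
    projection. A real system would run a pretrained CNN here."""
    patch = center_crop(image).astype(float) / 255.0
    return patch.reshape(-1, 3).mean(axis=0) @ proj

img = rng.integers(0, 256, size=(480, 640, 3))
print(extract_features(img).shape)   # (64,)
```

Only the cropped training-set images pass through this step; the resulting feature vectors become the LSTM network's input.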
Further, the apparatus further comprises:
a fifth obtaining unit, configured to obtain coordinates of the LSTM network, and define a corresponding weight matrix according to the coordinates;
a first input unit configured to input the image feature representation information and a weight matrix corresponding to the coordinates as an LSTM network;
a first operation unit for performing operations of an input gate, a forgetting gate, and an output gate on the input using a forward propagation method;
a first decoding unit to decode the LSTM network output according to a deep output layer;
a first training unit for inputting the image feature representation information to the LSTM network, and training the LSTM network using a back propagation algorithm.
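The gate operations performed by the first operation unit follow the standard LSTM forward equations, which can be written out directly (illustrative sizes; `W`, `U`, `b` stand in for the weight matrices defined from the coordinates):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step: input gate i, forget gate f, output gate o, and
    candidate cell value g, computed from the current input x and the
    previous hidden state h (standard LSTM forward equations)."""
    z = x @ W + h @ U + b                 # all four gate pre-activations
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c_new = f * c + i * g                 # forget old memory, write new
    h_new = o * np.tanh(c_new)            # expose gated memory as output
    return h_new, c_new

D, H = 8, 4                               # invented input and hidden sizes
rng = np.random.default_rng(1)
W = rng.standard_normal((D, 4 * H)) * 0.1
U = rng.standard_normal((H, 4 * H)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.standard_normal((5, D)):     # run five timesteps
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape, c.shape)
```

In training, these same equations are unrolled over the timesteps and differentiated by the backpropagation algorithm referenced by the first training unit.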
Further, the apparatus further comprises:
a second input unit, configured to load the LSTM network and input the training samples into the LSTM network;
a sixth obtaining unit, configured to obtain an output feature vector of the LSTM network using a forward propagation algorithm;
a seventh obtaining unit, configured to input the output feature vector and the truth value information into the LSTM network, and obtain the fixation point coordinates using a forward propagation algorithm.
The various modifications and specific embodiments of the machine learning based glance path prediction method in Embodiment 1 of FIG. 1 are equally applicable to the machine learning based glance path prediction apparatus of this embodiment. From the foregoing detailed description of the method, those skilled in the art will clearly understand how the apparatus is implemented, so for brevity of description it is not described in detail here.
Example 4
Based on the same inventive concept as the machine learning based glance path prediction method in the previous embodiment, the present invention further provides a machine learning based glance path prediction apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the program, when executed by the processor, implementing the steps of any one of the above-mentioned machine learning based glance path prediction methods.
In FIG. 4, a bus architecture (represented by bus 300) is shown. Bus 300 may include any number of interconnected buses and bridges, linking together various circuits including one or more processors (represented by processor 302) and memory (represented by memory 304). Bus 300 may also link together various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore are not described further herein. A bus interface 306 provides an interface between bus 300 and a receiver 301 and a transmitter 303. The receiver 301 and the transmitter 303 may be the same element, i.e., a transceiver, providing a means for communicating with various other apparatus over a transmission medium.
The processor 302 is responsible for managing the bus 300 and general processing, and the memory 304 may be used for storing information used by the processor 302 in performing operations.
One or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
1. According to the method and device for predicting the saccade path based on machine learning in the embodiments of the present application, an image data set to be processed is obtained, wherein each piece of image information in the image data set has corresponding truth value information; training samples of the image data set are made according to the truth value information; image feature representation information of the image information is obtained according to the image information; an LSTM network is constructed and trained according to the image feature representation information and the eye movement data samples; and a scanning path is predicted according to the LSTM network. This solves the technical problems in the prior art that the predicted fixation points depend too heavily on a static saliency map and that saccade path prediction in natural scene pictures is insufficiently accurate; it eliminates the model's dependence on the saliency map, takes the temporal order among fixation points into account, and achieves good results on several public data sets.
2. The image features are extracted by a convolutional neural network. A convolutional neural network has a strong capability for representation learning and can learn higher-level features through a layer-by-layer learning strategy, which overcomes the shortcomings of the manual selection or combined multi-dimensional feature selection methods in the prior art and offers better universality and extensibility.
3. The invention estimates the saccade path by constructing an LSTM network, whose structure is well suited to processing time sequences. The LSTM network is trained on the combination of the currently input image region and the fixation points generated so far, simulating the saccade stage of human visual processing and the transmission and prediction of information on the visual cortex. This gives the model biological consistency with the human saccade process and yields saccade path results consistent with human eye movement data.
4. According to the invention, by introducing an attention mechanism into the network, each output step allows the decoder to attend to a different part of the image; the trained model can thus learn which part of the image should be attended to, guiding the decoding of the network output.
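The attention mechanism described above can be illustrated with a Bahdanau-style additive attention sketch (all shapes and parameters are invented for illustration; the exact attention form used is not specified here): at each decoding step, the image regions are scored against the decoder state and softmax-weighted into a context vector.

```python
import numpy as np

def additive_attention(regions, query, Wa, Ua, va):
    """Bahdanau-style additive attention: score each image region against
    the decoder state (query), softmax the scores over regions, and
    return the weighted context vector together with the weights."""
    scores = np.tanh(regions @ Wa + query @ Ua) @ va   # (num_regions,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                           # softmax over regions
    return weights @ regions, weights                  # context, attention map

R, D, A = 6, 8, 5            # 6 regions, 8-dim features, 5-dim attention space
rng = np.random.default_rng(2)
ctx, w = additive_attention(
    rng.standard_normal((R, D)), rng.standard_normal(D),
    rng.standard_normal((D, A)), rng.standard_normal((D, A)),
    rng.standard_normal(A))
print(ctx.shape, round(float(w.sum()), 6))   # (8,) 1.0
```

The attention weights `w` are exactly the "which part of the image to attend to" signal: inspecting them per output step shows where the model looked before emitting each fixation point.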
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable information processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable information processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable information processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable information processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (5)

1. A method for machine learning based glance path prediction, the method comprising:
obtaining an image data set to be processed, wherein each image information in the image data set has corresponding true value information;
according to the truth value information, making a training sample of the image data set;
according to the image information, obtaining image feature representation information of the image information;
constructing and training an LSTM network according to the image characteristic representation information and the eye movement data sample;
predicting a scanning path according to the LSTM network;
the constructing and training of the LSTM network specifically comprises the following steps:
obtaining the coordinates of the LSTM network, and defining a corresponding weight matrix according to a total weight matrix and a true value coordinate corresponding to the coordinates;
taking the image feature representation information and a weight matrix corresponding to the coordinates as the input of an LSTM network;
carrying out the operations of an input gate, a forgetting gate and an output gate on the input by using a forward propagation method;
decoding the LSTM network output according to a deep output layer;
inputting the image feature representation information into the LSTM network, and training the LSTM network by using a back propagation algorithm;
predicting a scanning path according to the LSTM network, which specifically comprises the following steps:
loading the LSTM network and inputting the training samples into the LSTM network;
obtaining an output feature vector of the LSTM network by using a forward propagation algorithm;
and inputting the output feature vector and the truth value information into the LSTM network, and obtaining the fixation point coordinates by using a forward propagation algorithm.
2. The method of claim 1, wherein the making training samples of the image dataset based on the truth information comprises:
processing the truth value information to obtain eye movement data information of N observers;
performing boundary processing on the eye movement data of the N observers;
normalizing the eye movement data of the N observers after the boundary processing;
and combining the eye movement data of the N observers to obtain the training sample, wherein N is a positive integer.
3. The method according to claim 1, wherein the obtaining image feature representation information of the image information specifically includes:
obtaining a training set and a test set according to the image data set;
cutting the image information of the training set into a standard size;
constructing a convolutional neural network, and loading the trained model parameters;
and taking the image information as the input of a convolutional neural network, and outputting image feature representation information of the image information.
4. A machine learning based glance path prediction apparatus, comprising:
a first obtaining unit, configured to obtain an image dataset to be processed, where each image information in the image dataset has corresponding true value information;
a first production unit, configured to produce a training sample of the image data set according to the truth value information;
a second obtaining unit configured to obtain image feature representation information of the image information based on the image information;
the first construction unit is used for constructing and training an LSTM network according to the image feature representation information and the eye movement data samples;
a first prediction unit for predicting a scan path according to the LSTM network;
wherein the apparatus further comprises:
a fifth obtaining unit, configured to obtain a coordinate of the LSTM network, and define a corresponding weight matrix according to a total weight matrix and a true value coordinate corresponding to the coordinate;
a first input unit configured to input the image feature representation information and a weight matrix corresponding to the coordinates as an LSTM network;
a first operation unit for performing operations of an input gate, a forgetting gate, and an output gate on the input using a forward propagation method;
a first decoding unit to decode the LSTM network output according to a deep output layer;
a first training unit, configured to input the image feature representation information to the LSTM network, and train the LSTM network using a back propagation algorithm;
a second input unit, configured to load the LSTM network and input the training samples into the LSTM network;
a sixth obtaining unit, configured to obtain an output feature vector of the LSTM network using a forward propagation algorithm;
a seventh obtaining unit, configured to input the output feature vector and the truth value information into the LSTM network, and obtain the fixation point coordinates using a forward propagation algorithm.
5. A machine learning based glance path prediction apparatus comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the steps of:
obtaining an image data set to be processed, wherein each image information in the image data set has corresponding true value information;
according to the truth value information, making a training sample of the image data set;
according to the image information, obtaining image feature representation information of the image information;
constructing and training an LSTM network according to the image characteristic representation information and the eye movement data sample;
predicting a scanning path according to the LSTM network;
the constructing and training of the LSTM network specifically comprises the following steps:
obtaining the coordinates of the LSTM network, and defining a corresponding weight matrix according to a total weight matrix and a true value coordinate corresponding to the coordinates;
taking the image feature representation information and a weight matrix corresponding to the coordinates as the input of an LSTM network;
carrying out the operations of an input gate, a forgetting gate and an output gate on the input by using a forward propagation method;
decoding the LSTM network output according to a deep output layer;
inputting the image feature representation information into the LSTM network, and training the LSTM network by using a back propagation algorithm;
predicting a scanning path according to the LSTM network, which specifically comprises the following steps:
loading the LSTM network and inputting the training samples into the LSTM network;
obtaining an output feature vector of the LSTM network by using a forward propagation algorithm;
and inputting the output feature vector and the truth value information into the LSTM network, and obtaining the fixation point coordinates by using a forward propagation algorithm.
CN201810332835.5A 2018-04-13 2018-04-13 Glance path prediction method and device based on machine learning Active CN109447096B (en)

Publications (2)

Publication Number Publication Date
CN109447096A CN109447096A (en) 2019-03-08
CN109447096B true CN109447096B (en) 2022-05-06


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110245660B (en) * 2019-06-03 2022-04-22 西北工业大学 Webpage glance path prediction method based on saliency feature fusion
CN110298303B (en) * 2019-06-27 2022-03-25 西北工业大学 Crowd identification method based on long-time memory network glance path learning
CN111461974B (en) * 2020-02-17 2023-04-25 天津大学 Image scanning path control method based on LSTM model from coarse to fine
CN111723707B (en) * 2020-06-09 2023-10-17 天津大学 Gaze point estimation method and device based on visual saliency
CN113313123B (en) * 2021-06-11 2024-04-02 西北工业大学 Glance path prediction method based on semantic inference

Citations (4)

CN106491129A (en) * 2016-10-10 2017-03-15 安徽大学 A kind of Human bodys' response system and method based on EOG
CN106959749A (en) * 2017-02-20 2017-07-18 浙江工业大学 A kind of vision attention behavior cooperating type method for visualizing and system based on eye-tracking data
CN107515466A (en) * 2017-08-14 2017-12-26 华为技术有限公司 A kind of eyeball tracking system and eyeball tracking method
CN107644401A (en) * 2017-08-11 2018-01-30 西安电子科技大学 Multiplicative noise minimizing technology based on deep neural network

Family Cites Families (6)

WO2010062883A1 (en) * 2008-11-26 2010-06-03 Bioptigen, Inc. Methods, systems and computer program products for biometric identification by tissue imaging using optical coherence tomography (oct)
US9916538B2 (en) * 2012-09-15 2018-03-13 Z Advanced Computing, Inc. Method and system for feature detection
WO2017025487A1 (en) * 2015-08-07 2017-02-16 SensoMotoric Instruments Gesellschaft für innovative Sensorik mbH System and method for displaying a stream of images
CN105678735A (en) * 2015-10-13 2016-06-15 中国人民解放军陆军军官学院 Target salience detection method for fog images
CN106970615B (en) * 2017-03-21 2019-10-22 西北工业大学 A kind of real-time online paths planning method of deeply study
CN107808132A (en) * 2017-10-23 2018-03-16 重庆邮电大学 A kind of scene image classification method for merging topic model

Non-Patent Citations (4)

Daniel Simon et al., "Automatic Scanpath Generation with Deep Recurrent Neural Networks," Proceedings of the ACM Symposium, Jul. 22, 2016, p. 130.
D. Bahdanau et al., "Neural Machine Translation by Jointly Learning to Align and Translate," Computer Science, May 19, 2016, pp. 1-15.
Thuyen Ngo et al., "Saccade gaze prediction using a recurrent neural network," 2017 IEEE International Conference on Image Processing (ICIP), Sep. 1, 2017, pp. 3436-3437 and FIG. 2.
Yan Yanmei, "An Eye-Movement Study of Repeated Scanning Paths for Pictures," China Masters' Theses Full-text Database (Philosophy and Humanities), No. 12, Dec. 15, 2006, F102-15.


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant