CN109886780B - Commodity target detection method and device based on eyeball tracking - Google Patents


Info

Publication number
CN109886780B
CN109886780B
Authority
CN
China
Prior art keywords
commodity
target
attention point
learning model
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910089990.3A
Other languages
Chinese (zh)
Other versions
CN109886780A (en)
Inventor
方武
宋志强
朱婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nova Electronics Co ltd
Original Assignee
Suzhou Institute of Trade and Commerce
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Institute of Trade and Commerce filed Critical Suzhou Institute of Trade and Commerce
Priority to CN201910089990.3A
Publication of CN109886780A
Application granted
Publication of CN109886780B


Landscapes

  • Image Analysis (AREA)
  • Eye Examination Apparatus (AREA)

Abstract

The invention provides a commodity target detection method and device based on eyeball tracking, wherein the method comprises the following steps: collecting a commodity image within the visual field of a user; determining, through an eyeball tracking platform, the position information of the user's attention point on the commodity image; inputting the commodity image and the attention point position information into a target learning model, the target learning model being used for determining a target commodity area in the commodity image according to the attention point position information; and acquiring commodity information in the target commodity area, the commodity information comprising the category of the commodity. The attention point position data are thus obtained from the eyeball tracking information, and a target learning model is adopted for commodity target detection, which solves the problems of low detection speed and low positioning accuracy in existing commodity image target detection methods; the method has the advantages of high detection speed and high detection accuracy and can automatically detect, in real time, the target of the commodity on which the user's attention point is focused.

Description

Commodity target detection method and device based on eyeball tracking
Technical Field
The invention relates to the technical field of computer vision, in particular to a commodity target detection method and device based on eyeball tracking.
Background
Eye tracking has long been used to study the visual attention of individuals, and the most common eye-tracking technique is pupil center corneal reflection (PCCR). The principle of PCCR is that a light source illuminates the eye, producing highly visible reflections on the pupil and cornea that are captured by the camera of a tracking device. These images are used to determine the reflections of the light source on the cornea and in the pupil, and the direction in which the human eye is gazing is finally obtained by calculating the vector angle formed between the corneal reflection and the pupil, together with other geometric features.
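As a rough, hypothetical illustration of the PCCR principle described above (the function names and coordinates below are invented, and a real tracker maps the pupil-glint vector to gaze coordinates through a per-user calibration rather than a raw angle), the pupil-glint vector and its direction can be sketched as:

```python
import numpy as np

def gaze_vector(pupil_center, glint_center):
    """Pupil-glint difference vector used by PCCR-style trackers,
    in eye-camera image coordinates."""
    return np.asarray(pupil_center, float) - np.asarray(glint_center, float)

def gaze_angle_deg(v):
    """Angle of the pupil-glint vector relative to the image x-axis."""
    return np.degrees(np.arctan2(v[1], v[0]))

# Invented example: pupil centre 10 px to the right of the corneal glint.
v = gaze_vector((320.0, 240.0), (310.0, 240.0))
print(gaze_angle_deg(v))    # 0.0, i.e. gaze deflected along +x
```

In practice this vector is fed through a calibrated polynomial or geometric eye model to obtain the point of regard on the scene image.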
Traditional target detection methods are usually based on hand-crafted feature extraction and are not robust to changes in the environment or to target diversity: their accuracy may drop when environmental or target conditions change. Target detection methods based on convolutional neural networks combine deep learning with computer vision and have the characteristics of local receptive fields, hierarchical structure, and globally trained feature extraction combined with classification, which improves the accuracy and robustness of target detection. Target detection based on deep convolutional neural networks has therefore become a research hotspot in recent years.
However, existing target detection methods based on deep learning have low detection speed, making it difficult to track a user's attention point in real time and to carry out rapid commodity detection automatically according to that attention point.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide a commodity target detection method and device based on eyeball tracking.
In a first aspect, an embodiment of the present invention provides a commodity target detection method based on eye tracking, including:
collecting commodity images in the visual field range of a user;
determining the attention point position information of the user on the commodity image through an eyeball tracking platform;
inputting the commodity image and the attention point position information into a target learning model; the target learning model is used for determining a target commodity area in the commodity image according to the attention point position information;
acquiring commodity information in the target commodity area, wherein the commodity information comprises: the category of the goods.
Optionally, before acquiring the commodity image of the user's field of view, the method further includes:
constructing an initial learning model;
and acquiring a training sample set, performing iterative training on the initial learning model through the training sample set until the output result of the initial learning model meets a preset judgment condition, and finishing the iterative training to obtain a corresponding target learning model.
Optionally, obtaining a training sample set comprises:
acquiring a commodity image;
marking all target commodity areas in the commodity image through a prediction frame;
giving each prediction frame a corresponding weight according to the attention point position information on the commodity image to obtain a labeled sample image; the set of all sample images forms a training sample set; wherein a prediction box closer to the attention point is given a larger weight value.
Optionally, the initial learning model is a neural network classification regression model; the neural network classification regression model comprises a 14-layer convolutional neural network.
Optionally, the preset determination condition includes: the probability that the output result of the initial learning model is the same as the actual result of the commodity image reaches a preset threshold value, and/or the error of the loss function of the initial learning model is within a preset error range.
Optionally, before inputting the commodity image and the attention point position information into the target learning model, the method further includes:
acquiring the coordinates of the attention point in the commodity image from the attention point position information;
taking the position corresponding to the coordinates as a central point, and cutting the commodity image to obtain a subimage with a preset size;
evenly dividing the sub-image of the preset size into N prediction frames of preset matrix sizes; the preset matrix sizes include (16x30), (33x12), (30x61), (62x35) and (59x119) pixels; wherein each prediction box is used for predicting a specified number of target commodity areas.
In a second aspect, an embodiment of the present invention provides a commodity target detection apparatus based on eye tracking, configured to execute the commodity target detection method based on eye tracking according to any one of the first aspect, where the apparatus includes:
the camera is used for acquiring a commodity image in a user visual field range;
the eyeball tracking platform is used for determining the attention point position information of the user on the commodity image;
the processor is used for inputting the commodity image and the attention point position information into a target learning model; the target learning model is used for determining a target commodity area in the commodity image according to the attention point position information;
the processor is further configured to acquire commodity information in the target commodity area, where the commodity information includes: the category of the goods.
Compared with the prior art, the invention has the following beneficial effects:
the commodity target detection method and device based on eyeball tracking, provided by the invention, are characterized in that commodity images in the visual field range of a user are collected; determining the attention point position information of the user on the commodity image through an eyeball tracking platform; inputting the commodity image and the attention point position information into a target learning model; the target learning model is used for determining a target commodity area in the commodity image according to the attention point position information; acquiring commodity information in the target commodity area, wherein the commodity information comprises: the category of the goods. Therefore, the attention point position data are obtained according to the eyeball tracking information, the target learning model is adopted for commodity target detection, the problems of low detection speed and low positioning accuracy in the existing commodity image target detection method are solved, the method has the advantages of high detection speed and high detection accuracy, and can be used for automatically detecting the target of the commodity which is concerned by the attention point of the user in real time.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
fig. 1 is a flowchart of a commodity target detection method based on eye tracking according to an embodiment of the present invention;
fig. 2 is a flowchart of model training according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to specific embodiments. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit it in any way. It should be noted that various changes and modifications, obvious to those skilled in the art, can be made without departing from the spirit of the invention, all of which fall within its scope.
Fig. 1 is a flowchart of a commodity target detection method based on eye tracking according to an embodiment of the present invention, and as shown in fig. 1, the method includes:
s101, acquiring a commodity image in the user visual field range.
S102, determining the attention point position information of the user on the commodity image through the eyeball tracking platform.
And S103, inputting the commodity image and the attention point position information into the target learning model.
In this embodiment, the target learning model is used to determine a target commodity region in the commodity image according to the information of the attention point position.
Optionally, before inputting the commodity image and the attention point position information into the target learning model, the method further includes: acquiring the coordinates of the attention point in the commodity image from the attention point position information; cropping the commodity image around the position corresponding to those coordinates as the center point to obtain a sub-image of a preset size; and evenly dividing the sub-image into N prediction frames of preset matrix sizes, where each prediction box is used for predicting a specified number of target commodity areas.
Specifically, according to the image frame information, the pupil eye-tracking platform is used to obtain the attention point data in the image; a sub-image of size 448 x 448 is cropped with the attention point as the center and evenly divided into a 7 x 7 matrix of 49 prediction cells, each of which predicts 5 target frames of different sizes. The five sizes may be set to 23 x 46, 48 x 100, 46 x 20, 12 x 48 and 56 x 112 pixels according to common commodity sizes.
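The cropping and grid-division step can be sketched as follows (a minimal illustration; the helper names are invented, and the clamping of the crop window at the image border is an assumption the patent does not spell out):

```python
import numpy as np

def crop_around_point(image, cx, cy, size=448):
    """Crop a size x size sub-image centred on the attention point (cx, cy),
    clamping the window so it stays inside the image."""
    h, w = image.shape[:2]
    x0 = int(np.clip(cx - size // 2, 0, max(w - size, 0)))
    y0 = int(np.clip(cy - size // 2, 0, max(h - size, 0)))
    return image[y0:y0 + size, x0:x0 + size]

def grid_cells(size=448, n=7):
    """Split the sub-image into an n x n grid of prediction cells
    (YOLO-style), each returned as (top, left, height, width)."""
    step = size // n
    return [(r * step, c * step, step, step)
            for r in range(n) for c in range(n)]

img = np.zeros((1200, 1600, 3), dtype=np.uint8)   # scene-camera frame
sub = crop_around_point(img, cx=800, cy=600)
cells = grid_cells()
print(sub.shape, len(cells))    # (448, 448, 3) 49
```

Each of the 49 cells would then propose 5 target frames drawn from the preset size list.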
And S104, acquiring the commodity information in the target commodity area.
In this embodiment, the commodity information includes: the category of the goods.
Fig. 2 is a flowchart of model training provided in the embodiment of the present invention, and as shown in fig. 2, the method includes:
s201, constructing an initial learning model.
In this embodiment, the initial learning model is a neural network classification regression model; the neural network classification regression model includes a 14-layer convolutional neural network.
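The exact 14-layer architecture is given only in Table 2, which survives only as an image, so the sketch below is a hypothetical YOLO-v1-style classification-regression network in PyTorch with the same 7 x 7 grid and 5 boxes per cell, not a reconstruction of the patented model (layer counts, channel widths and the class count of 20 are all invented):

```python
import torch
import torch.nn as nn

class DetectionNet(nn.Module):
    """Hypothetical classification-regression net: conv backbone plus a
    fully connected head predicting 5 boxes per cell on a 7 x 7 grid."""
    def __init__(self, n_classes=20, boxes_per_cell=5, grid=7):
        super().__init__()
        layers, ch_in = [], 3
        for ch_out in (16, 32, 64, 128, 256, 512):   # six conv stages
            layers += [nn.Conv2d(ch_in, ch_out, 3, padding=1),
                       nn.LeakyReLU(0.1),
                       nn.MaxPool2d(2)]              # halves 448 -> ... -> 7
            ch_in = ch_out
        self.backbone = nn.Sequential(*layers)
        out_dim = grid * grid * boxes_per_cell * (5 + n_classes)
        self.head = nn.Sequential(nn.Flatten(),
                                  nn.Linear(512 * grid * grid, out_dim))
        self.grid, self.bpc, self.ncls = grid, boxes_per_cell, n_classes

    def forward(self, x):                 # x: (B, 3, 448, 448)
        y = self.head(self.backbone(x))
        # per cell and box: 4 coords + 1 confidence + class scores
        return y.view(-1, self.grid, self.grid, self.bpc, 5 + self.ncls)

net = DetectionNet()
out = net(torch.zeros(1, 3, 448, 448))
print(out.shape)    # torch.Size([1, 7, 7, 5, 25])
```

The 448-pixel input halves through six pooling stages to the 7 x 7 grid, matching the 49 prediction cells described below.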
S202, a training sample set is obtained, iterative training is carried out on the initial learning model through the training sample set until the output result of the initial learning model meets the preset judgment condition, the iterative training is finished, and the corresponding target learning model is obtained.
In this embodiment, obtaining the training sample set includes: acquiring a commodity image; marking all target commodity areas in the commodity image with prediction frames; giving each prediction frame a corresponding weight according to the attention point position information on the commodity image to obtain a labeled sample image; the set of all sample images forms the training sample set, where a prediction box closer to the attention point is given a larger weight value.
Specifically, commodity images are collected through the pupil eye-tracking platform and the data set is augmented. The commodity image data set is then annotated and labelled to construct the training sample set: the annotation is a rectangular box recording the size and position coordinates of the commodity target within the whole image, and the label is the category mark of the annotated commodity target. Finally, a weight matrix is constructed with a Gaussian kernel; the 7 prediction boxes nearest the attention point are given large weight values, and prediction boxes far from the attention point are given small weight values.
Optionally, the preset determination condition includes: the probability that the output result of the initial learning model is the same as the actual result of the commodity image reaches a preset threshold value, and/or the error of the loss function of the initial learning model is within a preset error range.
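A minimal sketch of iterative training with these two stopping conditions follows (the threshold values, helper functions and dummy model are all invented for illustration and are not part of the patent):

```python
def train_until_converged(model, step_fn, eval_fn,
                          acc_threshold=0.95, loss_tol=1e-3, max_iters=10_000):
    """Iterate until either preset judgment condition holds: output accuracy
    reaches a threshold, and/or the loss error falls within a preset range."""
    acc, loss = 0.0, float("inf")
    for i in range(max_iters):
        loss = step_fn(model)    # one optimisation step; returns the loss
        acc = eval_fn(model)     # agreement with the labelled ground truth
        if acc >= acc_threshold or loss <= loss_tol:
            break
    return model, i, acc, loss

# Dummy stand-ins: the "loss" halves on every step.
state = {"loss": 1.0}
def step(m):
    m["loss"] *= 0.5
    return m["loss"]
def accuracy(m):
    return 1.0 - m["loss"]

_, iters, acc, loss = train_until_converged(state, step, accuracy)
print(iters, acc)    # 4 0.96875
```

With either condition met, iteration ends and the trained weights become the target learning model.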
Specifically, let
y = m(x)    (1)
be the input-to-output mapping function, where x is the input image together with the attention point coordinates, y is the target class together with its position coordinates, and ω, b are the learnable parameters of the function; ω̂ denotes the estimate of the parameter ω. W is a weight matrix that assigns different weights according to the distance from the attention point: the farther a prediction box is from the point, the smaller its weight, and the closer, the larger.
As shown in formula (2), the weight matrix W is constructed with a Gaussian kernel:
ω(i, i) = exp(-(x(i) - x)^2 / (2k^2))    (2)
The 7 prediction boxes x(i) closest to the attention point are given large weights, and the other prediction boxes, far from the attention point, are set to 0, which improves the efficiency and accuracy of the algorithm; that is, the closer a point x(i) is to x, the larger the value of ω(i, i). k is a tuning parameter: the smaller its value, the larger the weight of the prediction boxes near the attention point and the more easily the commodity target is detected. In this method, k takes the value 0.1.
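The Gaussian-kernel weighting, keeping only the 7 prediction boxes nearest the attention point, can be sketched as follows (grid coordinates normalised to [0, 1] are an assumption; the patent does not state the coordinate scale):

```python
import numpy as np

def gaussian_weights(centers, attention_pt, k=0.1, top=7):
    """Weight each prediction box by a Gaussian kernel on its squared
    distance to the attention point; only the `top` nearest boxes keep a
    non-zero weight, the rest are zeroed as in the description."""
    d2 = np.sum((centers - attention_pt) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * k ** 2))
    w[np.argsort(d2)[top:]] = 0.0      # zero out the far boxes
    return w

# Box centres of the 7 x 7 grid in normalised [0, 1] image coordinates,
# attention point at the image centre.
xs = (np.arange(7) + 0.5) / 7.0
centers = np.stack(np.meshgrid(xs, xs), axis=-1).reshape(-1, 2)
w = gaussian_weights(centers, np.array([0.5, 0.5]))
print(int((w > 0).sum()))    # 7
```

Zeroing the distant boxes means the loss is dominated by candidates near the gaze, which is the stated source of the efficiency and accuracy gains.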
Table 1 shows the parameters of the test platform of the system, and Table 2 shows the structural parameters of its convolutional network. As shown in Tables 1 and 2, a general target detection database is used to pre-train a classification regression model M, a customized 14-layer convolutional neural network (Table 2), by a "transfer learning" method, and the model M is further optimized with the labelled commodity target images. A target detection network model designed on this basis can effectively improve detection precision.
TABLE 1
Computer: CPU 2.6 GHz i5; Win10, 64-bit; 8 GB RAM; 256 GB SSD
Camera: scene camera, resolution 1600 x 1200, frame rate 30 fps (SVGA); eye-tracking camera, resolution 400 x 400, 200 Hz
TABLE 2
(Table 2, the structural parameters of the 14-layer convolutional network, appears only as an image in the original publication.)
Compared with the prior art, the invention has the following remarkable advantages: it solves the problems of low detection speed and low positioning accuracy in existing commodity image target detection methods, has high detection speed and high detection accuracy, and can automatically detect, in real time, the target of the commodity on which the user's attention point is focused.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (4)

1. A commodity target detection method based on eyeball tracking is characterized by comprising the following steps:
constructing an initial learning model; the initial learning model is a neural network classification regression model; the neural network classification regression model comprises 14 layers of convolutional neural networks;
acquiring a training sample set, performing iterative training on the initial learning model through the training sample set until the output result of the initial learning model meets a preset judgment condition, and ending the iterative training to obtain a corresponding target learning model; obtaining a training sample set includes: acquiring a commodity image; marking all target commodity areas in the commodity image through a prediction frame; giving a weight corresponding to the prediction frame according to the attention point position information on the commodity image to obtain a labeled sample image; the set of all sample images forms a training sample set; wherein, the weight value corresponding to the prediction box closer to the attention point is larger;
collecting commodity images in the visual field range of a user;
determining the attention point position information of the user on the commodity image through an eyeball tracking platform;
inputting the commodity image and the attention point position information into a target learning model; the target learning model is used for determining a target commodity area in the commodity image according to the attention point position information;
acquiring commodity information in the target commodity area, wherein the commodity information comprises: the category of the goods.
2. The eyeball tracking-based commodity target detection method according to claim 1, wherein the preset determination condition comprises: the probability that the output result of the initial learning model is the same as the actual result of the commodity image reaches a preset threshold value, and/or the error of the loss function of the initial learning model is within a preset error range.
3. The method for detecting a commodity target based on eye tracking according to claim 1 or 2, further comprising, before inputting the commodity image and the information on the position of the attention point into a target learning model:
acquiring the coordinates of the attention point in the commodity image from the attention point position information;
taking the position corresponding to the coordinates as a central point, and cutting the commodity image to obtain a subimage with a preset size;
averagely dividing the subimages with the preset size into N prediction frames with the preset matrix size, wherein the preset matrix size comprises: a (16x30) pixel, a (33x12) pixel, a (30x61) pixel, a (62x35) pixel, a (59x119) pixel; wherein each prediction box is used for predicting a specified number of target commodity areas.
4. An eye-tracking-based commodity object detection device for performing the eye-tracking-based commodity object detection method according to any one of claims 1 to 3, the device comprising:
the camera is used for acquiring a commodity image in a user visual field range;
the eyeball tracking platform is used for determining the attention point position information of the user on the commodity image;
the processor is used for inputting the commodity image and the attention point position information into a target learning model; the target learning model is used for determining a target commodity area in the commodity image according to the attention point position information;
the processor is further configured to acquire commodity information in the target commodity area, where the commodity information includes: the category of the goods.
CN201910089990.3A 2019-01-31 2019-01-31 Commodity target detection method and device based on eyeball tracking Active CN109886780B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910089990.3A CN109886780B (en) 2019-01-31 2019-01-31 Commodity target detection method and device based on eyeball tracking

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910089990.3A CN109886780B (en) 2019-01-31 2019-01-31 Commodity target detection method and device based on eyeball tracking

Publications (2)

Publication Number Publication Date
CN109886780A CN109886780A (en) 2019-06-14
CN109886780B true CN109886780B (en) 2022-04-08

Family

ID=66927538

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910089990.3A Active CN109886780B (en) 2019-01-31 2019-01-31 Commodity target detection method and device based on eyeball tracking

Country Status (1)

Country Link
CN (1) CN109886780B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112885435B (en) * 2019-11-29 2023-04-21 天津拓影科技有限公司 Method, device and system for determining image target area
CN111369506B (en) * 2020-02-26 2022-08-02 四川大学 Lens turbidity grading method based on eye B-ultrasonic image

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011067788A2 (en) * 2009-12-02 2011-06-09 Tata Consultancy Services Limited A cost effective and robust system and method for eye tracking and driver drowsiness identification
CN103455795A (en) * 2013-08-27 2013-12-18 西北工业大学 Method for determining area where traffic target is located based on traffic video data image
CN103645806A (en) * 2013-12-24 2014-03-19 惠州Tcl移动通信有限公司 Commodity browse method and system based on eyeball tracking
CN105975928A (en) * 2016-04-29 2016-09-28 广东顺德中山大学卡内基梅隆大学国际联合研究院 Wearable eye tracker first perspective video image analysis method
CN108985172A (en) * 2018-06-15 2018-12-11 北京七鑫易维信息技术有限公司 A kind of Eye-controlling focus method, apparatus, equipment and storage medium based on structure light


Also Published As

Publication number Publication date
CN109886780A (en) 2019-06-14

Similar Documents

Publication Publication Date Title
CN111401201B (en) Aerial image multi-scale target detection method based on spatial pyramid attention drive
US20240062369A1 (en) Detection model training method and apparatus, computer device and storage medium
CN109934121B (en) Orchard pedestrian detection method based on YOLOv3 algorithm
EP3755204B1 (en) Eye tracking method and system
CN108898047B (en) Pedestrian detection method and system based on blocking and shielding perception
CN107610087B (en) Tongue coating automatic segmentation method based on deep learning
CN110532970B (en) Age and gender attribute analysis method, system, equipment and medium for 2D images of human faces
US9639748B2 (en) Method for detecting persons using 1D depths and 2D texture
JP2016006626A (en) Detector, detection program, detection method, vehicle, parameter calculation device, parameter calculation program, and parameter calculation method
JP5227629B2 (en) Object detection method, object detection apparatus, and object detection program
JP6397379B2 (en) CHANGE AREA DETECTION DEVICE, METHOD, AND PROGRAM
CN115082815B (en) Tea bud picking point positioning method and device based on machine vision and picking system
CN112541424A (en) Real-time detection method for pedestrian falling under complex environment
CN109886780B (en) Commodity target detection method and device based on eyeball tracking
CN112200056A (en) Face living body detection method and device, electronic equipment and storage medium
WO2022257314A1 (en) Image detection method, related training method, related apparatus, device, and medium
CN112149665A (en) High-performance multi-scale target detection method based on deep learning
CN112149664A (en) Target detection method for optimizing classification and positioning tasks
CN110007764B (en) Gesture skeleton recognition method, device and system and storage medium
CN116977960A (en) Rice seedling row detection method based on example segmentation
CN115527050A (en) Image feature matching method, computer device and readable storage medium
CN109993107B (en) Mobile robot obstacle visual detection method based on non-iterative K-means algorithm
CN109993116B (en) Pedestrian re-identification method based on mutual learning of human bones
CN113008380B (en) Intelligent AI body temperature early warning method, system and storage medium
CN112991159B (en) Face illumination quality evaluation method, system, server and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240521

Address after: 215011 No. 870 Binhe Road, Suzhou high tech Zone, Suzhou, Jiangsu

Patentee after: NOVA ELECTRONICS Co.,Ltd.

Country or region after: China

Address before: No. 287, Xuefu Road, Suzhou, Jiangsu Province

Patentee before: SUZHOU INSTITUTE OF TRADE & COMMERCE

Country or region before: China