CN108694471A - A user preference prediction method based on a personalized attention network
- Publication number
- CN108694471A (application CN201810619393.2A)
- Authority
- CN
- China
- Prior art keywords
- saliency
- preference
- tensor
- channel
- prediction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Economics (AREA)
- Strategic Management (AREA)
- Marketing (AREA)
- Game Theory and Decision Science (AREA)
- Entrepreneurship & Innovation (AREA)
- Development Economics (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
Abstract
The present invention proposes a user preference prediction method based on a personalized attention network. Its main components are a saliency prediction stream, a preference fitting stream, and the merging of the two streams. Given an input image, the personalized attention network (PANet) extracts deep features at multiple scales and passes them to two streams: the saliency prediction stream generates a saliency map that is not influenced by user preference, while the preference fitting stream uses an object detection model architecture to generate a preference map according to the input preferences. The results of the two streams are combined and post-processed (including the addition of a center prior), and the prediction is output as a pixel-level saliency map suited to the specific user. The proposed personalized attention network adapts to different user preferences and predicts the attention of different users more accurately and quickly, which makes it better suited to practical application.
Description
Technical field
The present invention relates to the field of preference prediction, and more particularly to a user preference prediction method based on a personalized attention network.
Background
Attention is a personalized experience: even when facing the same scene, different people may focus on different regions or objects. Correctly predicting each user's attention is crucial for human-computer interaction (HCI) applications. With recent progress in deep learning and the growth of computing power, vision tasks such as object detection and saliency prediction have become both more accurate and faster. Accurately analyzing where users look helps merchants target sales at the people interested in a product, or adjust the placement of popular products, so that sales strategies can be adapted in real time. Analyzing children's attention reveals their points of interest, so that their interests and hobbies can be cultivated in good time. Traditional prediction methods are computationally complex, slow, and not very accurate, which makes them difficult to put into practical use.
The present invention proposes a user preference prediction method based on a personalized attention network. Given an input image, the personalized attention network (PANet) extracts deep features at multiple scales and passes them to two streams: the saliency prediction stream generates a saliency map that is not influenced by user preference, while the preference fitting stream uses an object detection model architecture to generate a preference map according to the input preferences. The results of the two streams are combined and post-processed (including the addition of a center prior), and the prediction is output as a pixel-level saliency map suited to the specific user. The proposed personalized attention network adapts to different user preferences and predicts the attention of different users more accurately and quickly, which makes it better suited to practical application.
Summary of the invention
To address the problems of traditional prediction methods, namely complex computation, low speed, and low accuracy, the object of the present invention is to provide a user preference prediction method based on a personalized attention network. Given an input image, the personalized attention network (PANet) extracts deep features at multiple scales and passes them to two streams: the saliency prediction stream generates a saliency map that is not influenced by user preference, while the preference fitting stream uses an object detection model architecture to generate a preference map according to the input preferences. The results of the two streams are combined and post-processed (including the addition of a center prior), and the prediction is output as a pixel-level saliency map suited to the specific user.
To solve the above problems, the present invention provides a user preference prediction method based on a personalized attention network, which mainly comprises:
(1) a saliency prediction stream;
(2) a preference fitting stream;
(3) merging the two streams.
Here, the personalized attention network (PANet) consists of two convolutional neural networks (CNNs) that share common feature extraction layers. The model takes three inputs: the original image to be processed, a user-defined mapping from detailed classes to superclasses, and a user preference vector over the superclasses. Given an input image, PANet extracts deep features at multiple scales and passes them to two streams: the saliency prediction stream generates a saliency map that is not influenced by user preference, while the preference fitting stream uses an object detection model architecture to generate a preference map according to the input preferences. The results of the two streams are combined and post-processed (including the addition of a center prior), and the prediction is output as a pixel-level saliency map suited to the specific user. To train the PANet model, pixel-wise regression is performed against the ground truth, which is generated dynamically in the training data generator for the given input preferences.
For the saliency prediction stream, in order to combine multi-scale features of the input image for saliency prediction, the model uses features extracted from different layers of VGG-16 and of the custom layers of the Single Shot Detector (SSD). Features at three scales are used, with sizes 38 × 38, 19 × 19, and 10 × 10; they are resampled to the size of the first. Incorporating the features of the second and third scales greatly improves the accuracy of saliency prediction. After rescaling, the feature maps are combined into a three-dimensional tensor of size 38 × 38 × 3 with 512 channels in total. The combined tensor is then passed through four convolutional layers with spatial kernels, having 64, 128, 4, and 1 feature channels respectively. Next, the network reshapes the result into a three-channel two-dimensional tensor of size 38 × 38 and passes it through further 1 × 1 convolutional layers, from which the network outputs the final result of the saliency prediction stream with a single feature channel.
The preference fitting stream is identical to the custom layers of the SSD model, including the selected anchor generation layers, and generates feature maps at multiple scales. The output of this part is a concatenation of object class, confidence, and coordinate information, which must be converted back into an image tensor by the non-maximum suppression (NMS) layer and the mapping layer for further processing.
Further, regarding non-maximum suppression: in the NMS layer, the network decides whether to keep a detection according to its confidence. The confidence threshold differs between datasets; here it is set to 0.5, which detects most small objects while keeping a reasonable false positive rate. The NMS layer converts these high-confidence predictions into a two-dimensional tensor in image space.
Further, regarding the tensor: so that it can later be merged with the saliency prediction stream, the tensor output by the NMS layer is given a size of 38 × 38, with a number of channels N equal to the number of detailed classes. Each channel represents the predictions for one particular class.
Further, regarding the prediction information: for an input image, if there is a prediction of an object in class Cat_i, the i-th channel of the tensor has non-zero pixels according to the predicted position and prediction confidence. The value at each pixel (x, y) is given by (Conf_1, …, Conf_k), where Conf_1, …, Conf_k are the confidences of the predictions whose bounding boxes enclose the pixel (x, y).
Further, regarding the output: the output channels of the NMS layer correspond to detailed classes, and the mapping layer combines them into channels that represent superclasses. The mapping layer requires two additional inputs: the user preference vector and the defined mapping from superclasses to detailed classes. Given such a mapping:
[formula not preserved in the source text]
the tensor channels of the detailed classes assigned to a superclass are merged into a single channel SCat_i. The pixel-wise value of the new channel SCat_i is:
[formula not preserved in the source text]
For merging the two streams, the model concatenates the tensors of the two streams and adds two 1 × 1 convolutional layers, with 8 and 1 channels respectively, to obtain more non-linearity. In addition, because attention generally concentrates at the center of the field of view, a center prior is added to the model before the final activation layer. The prior is obtained by summing all ground-truth saliency maps SAL_gt in the dataset and then normalizing the sum to [0, 1].
Further, regarding the ground-truth saliency data, this prior map is generated from the saliency annotations in the dataset:
$$\mathrm{Prior} = \sum \mathrm{SAL}_{gt} \qquad (2)$$
Finally, a softmax activation layer is added, so that the final prediction is output as a probability map.
Brief description of the drawings
Fig. 1 is the system flow chart of the user preference prediction method based on a personalized attention network according to the present invention.
Fig. 2 shows the non-maximum suppression layer of the user preference prediction method based on a personalized attention network according to the present invention.
Fig. 3 shows the mapping layer of the user preference prediction method based on a personalized attention network according to the present invention.
Detailed description
It should be noted that, provided there is no conflict, the embodiments of the present application and the features in those embodiments may be combined with one another. The invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is the system flow chart of the user preference prediction method based on a personalized attention network according to the present invention. The method mainly comprises a saliency prediction stream, a preference fitting stream, and the merging of the two streams.
The personalized attention network (PANet) consists of two convolutional neural networks (CNNs) that share common feature extraction layers. The model takes three inputs: the original image to be processed, a user-defined mapping from detailed classes to superclasses, and a user preference vector over the superclasses. Given an input image, PANet extracts deep features at multiple scales and passes them to two streams: the saliency prediction stream generates a saliency map that is not influenced by user preference, while the preference fitting stream uses an object detection model architecture to generate a preference map according to the input preferences. The results of the two streams are combined and post-processed (including the addition of a center prior), and the prediction is output as a pixel-level saliency map suited to the specific user. To train the PANet model, pixel-wise regression is performed against the ground truth, which is generated dynamically in the training data generator for the given input preferences.
Saliency prediction stream: in order to combine multi-scale features of the input image for saliency prediction, the model uses features extracted from different layers of VGG-16 and of the custom layers of the Single Shot Detector (SSD). Features at three scales are used, with sizes 38 × 38, 19 × 19, and 10 × 10; they are resampled to the size of the first. Incorporating the features of the second and third scales greatly improves the accuracy of saliency prediction. After rescaling, the feature maps are combined into a three-dimensional tensor of size 38 × 38 × 3 with 512 channels in total. The combined tensor is then passed through four convolutional layers with spatial kernels, having 64, 128, 4, and 1 feature channels respectively. Next, the network reshapes the result into a three-channel two-dimensional tensor of size 38 × 38 and passes it through further 1 × 1 convolutional layers, from which the network outputs the final result of the saliency prediction stream with a single feature channel.
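The resampling and stacking step described above can be sketched as follows. This is a minimal NumPy illustration rather than the patented implementation: the nearest-neighbour interpolation, the helper names `upsample_nearest` and `stack_scales`, and the assumption that each scale carries 512 channels are choices made here for illustration. The source specifies only the three spatial sizes and the combined 38 × 38 × 3 tensor with 512 channels.

```python
import numpy as np

def upsample_nearest(feat, size):
    """Resample an (H, W, C) feature map to (size, size, C) by nearest neighbour.

    Nearest-neighbour interpolation is an assumption; the source does not
    name the resampling method.
    """
    h, w, _ = feat.shape
    rows = np.arange(size) * h // size   # source row for each target row
    cols = np.arange(size) * w // size   # source column for each target column
    return feat[rows][:, cols]

def stack_scales(feats, size=38):
    """Bring every scale to size x size and stack along a new 'scale' axis."""
    return np.stack([upsample_nearest(f, size) for f in feats], axis=2)

# Hypothetical feature maps at the three scales named in the text.
rng = np.random.default_rng(0)
feats = [rng.standard_normal((s, s, 512)) for s in (38, 19, 10)]
combined = stack_scales(feats)
print(combined.shape)  # (38, 38, 3, 512)
```

The 38 × 38 map passes through unchanged, while the 19 × 19 and 10 × 10 maps are enlarged so that all three share one spatial grid before the convolutional layers are applied.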
The preference fitting stream is identical to the custom layers of the SSD model, including the selected anchor generation layers, and generates feature maps at multiple scales. The output of this part is a concatenation of object class, confidence, and coordinate information, which must be converted back into an image tensor by the non-maximum suppression (NMS) layer and the mapping layer for further processing.
Merging the two streams: the model concatenates the tensors of the two streams and adds two 1 × 1 convolutional layers, with 8 and 1 channels respectively, to obtain more non-linearity. In addition, because attention generally concentrates at the center of the field of view, a center prior is added to the model before the final activation layer. The prior is obtained by summing all ground-truth saliency maps SAL_gt in the dataset and then normalizing the sum to [0, 1].
This prior map is generated from the saliency annotations in the dataset:
$$\mathrm{Prior} = \sum \mathrm{SAL}_{gt} \qquad (1)$$
Finally, a softmax activation layer is added, so that the final prediction is output as a probability map.
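The merging step can be illustrated with a short NumPy sketch. Several details are assumptions made here, since the source does not state them: the ReLU between the two 1 × 1 convolutions, the min-max form of the normalization to [0, 1], the multiplicative way the prior is applied before the softmax, and all function names and weights, which are hypothetical.

```python
import numpy as np

def conv1x1(x, weight):
    """1 x 1 convolution on an (H, W, C_in) tensor: a per-pixel linear map."""
    return x @ weight  # weight has shape (C_in, C_out)

def center_prior(gt_maps):
    """Sum ground-truth saliency maps of shape (K, H, W), then scale to [0, 1]."""
    prior = gt_maps.sum(axis=0).astype(float)
    span = prior.max() - prior.min()
    return (prior - prior.min()) / span if span > 0 else np.zeros_like(prior)

def spatial_softmax(logits):
    """Softmax over all pixels, so the output map sums to 1."""
    z = np.exp(logits - logits.max())
    return z / z.sum()

def merge_streams(saliency, preference, w1, w2, prior):
    """Concatenate both streams, apply two 1 x 1 convolutions, prior and softmax."""
    fused = np.concatenate([saliency, preference], axis=2)
    hidden = np.maximum(conv1x1(fused, w1), 0.0)   # 8 channels; ReLU is assumed
    logits = conv1x1(hidden, w2)[:, :, 0]          # 1 channel
    return spatial_softmax(logits * prior)         # multiplicative prior is assumed

# Hypothetical inputs: 1 saliency channel, 4 superclass channels, 10 labelled maps.
rng = np.random.default_rng(0)
saliency = rng.random((38, 38, 1))
preference = rng.random((38, 38, 4))
prior = center_prior(rng.random((10, 38, 38)))
prob = merge_streams(saliency, preference,
                     rng.standard_normal((5, 8)), rng.standard_normal((8, 1)), prior)
```

Because the softmax runs over the whole 38 × 38 grid, the result can be read as a probability map over pixel locations.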
Fig. 2 shows the non-maximum suppression layer of the user preference prediction method based on a personalized attention network according to the present invention. The network decides whether to keep a detection according to its confidence. The confidence threshold differs between datasets; here it is set to 0.5, which detects most small objects while keeping a reasonable false positive rate. The NMS layer converts these high-confidence predictions into a two-dimensional tensor in image space.
So that it can later be merged with the saliency prediction stream, the tensor output by the NMS layer is given a size of 38 × 38, with a number of channels N equal to the number of detailed classes. Each channel represents the predictions for one particular class.
For an input image, if there is a prediction of an object in class Cat_i, the i-th channel of the tensor has non-zero pixels according to the predicted position and prediction confidence. The value at each pixel (x, y) is given by (Conf_1, …, Conf_k), where Conf_1, …, Conf_k are the confidences of the predictions whose bounding boxes enclose the pixel (x, y).
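The conversion from detections to a per-class image tensor might look like the sketch below. It is only an illustration under stated assumptions: the detection tuple layout, the normalized box coordinates, and the function name `detections_to_tensor` are hypothetical, and taking the maximum of the overlapping confidences at a pixel is an assumption, because the source lists the confidences Conf_1, …, Conf_k without stating how they are combined.

```python
import numpy as np

def detections_to_tensor(dets, num_classes, size=38, threshold=0.5):
    """Rasterize detections into a (size, size, num_classes) confidence tensor.

    dets: iterable of (class_index, confidence, x0, y0, x1, y1), with box
    corners given in normalized [0, 1] image coordinates (assumed layout).
    """
    out = np.zeros((size, size, num_classes))
    for cls, conf, x0, y0, x1, y1 in dets:
        if conf < threshold:          # drop low-confidence detections
            continue
        c0 = int(np.floor(np.clip(x0, 0.0, 1.0) * size))
        c1 = int(np.ceil(np.clip(x1, 0.0, 1.0) * size))
        r0 = int(np.floor(np.clip(y0, 0.0, 1.0) * size))
        r1 = int(np.ceil(np.clip(y1, 0.0, 1.0) * size))
        region = out[r0:r1, c0:c1, cls]       # view into the class channel
        np.maximum(region, conf, out=region)  # assumed rule: keep the max
    return out

# Hypothetical detections: two overlapping boxes of class 0, one weak box of class 1.
dets = [(0, 0.9, 0.0, 0.0, 0.5, 0.5),
        (0, 0.6, 0.25, 0.25, 1.0, 1.0),
        (1, 0.3, 0.0, 0.0, 1.0, 1.0)]
tensor = detections_to_tensor(dets, num_classes=2)
```

With these inputs, channel 0 holds 0.9 where the first box lies and 0.6 where only the second box lies, and channel 1 stays empty because its only detection falls below the 0.5 threshold.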
Fig. 3 shows the mapping layer of the user preference prediction method based on a personalized attention network according to the present invention. The output channels of the NMS layer correspond to detailed classes, and the mapping layer combines them into channels that represent superclasses. The mapping layer requires two additional inputs: the user preference vector and the defined mapping from superclasses to detailed classes. Given such a mapping:
[formula not preserved in the source text]
the tensor channels of the detailed classes assigned to a superclass are merged into a single channel SCat_i. The pixel-wise value of the new channel SCat_i is:
[formula not preserved in the source text]
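Since the merging formula itself is not available in this text, the following NumPy sketch shows only one plausible way a mapping layer could combine detailed-class channels into superclass channels. The rule used here, the pixel-wise maximum over the member channels scaled by the user's preference weight for that superclass, is an assumption and not the formula of the source; the function name and argument layout are likewise hypothetical.

```python
import numpy as np

def map_to_superclasses(det_tensor, mapping, preference):
    """Merge detailed-class channels into superclass channels.

    det_tensor: (H, W, N) output of the NMS layer, one channel per detailed class.
    mapping:    list with one entry per superclass, each a list of the
                detailed-class indices that belong to it.
    preference: one user preference weight per superclass.
    """
    h, w, _ = det_tensor.shape
    out = np.zeros((h, w, len(mapping)))
    for s, members in enumerate(mapping):
        if members:
            # Assumed rule: strongest member response, scaled by the preference.
            out[:, :, s] = preference[s] * det_tensor[:, :, members].max(axis=2)
    return out

# Hypothetical example: three detailed classes grouped into two superclasses.
det = np.zeros((2, 2, 3))
det[0, 0, 0], det[0, 0, 1], det[1, 1, 2] = 0.9, 0.7, 0.8
merged = map_to_superclasses(det, mapping=[[0, 1], [2]], preference=[1.0, 0.5])
```

A preference weight of zero removes a superclass from the preference map entirely, which matches the idea that the same detections yield different maps for different users.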
For those skilled in the art, the present invention is not limited to the details of the above embodiments, and it can be realized in other specific forms without departing from its spirit and scope. In addition, those skilled in the art may make various modifications and variations to the present invention without departing from its spirit and scope, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention. Therefore, the appended claims are intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of the invention.
Claims (10)
1. A user preference prediction method based on a personalized attention network, characterized in that it mainly comprises a saliency prediction stream (1); a preference fitting stream (2); and merging the two streams (3).
2. The personalized attention network (PANet) according to claim 1, characterized in that PANet consists of two convolutional neural networks (CNNs) that share common feature extraction layers; the model takes three inputs: the original image to be processed, a user-defined mapping from detailed classes to superclasses, and a user preference vector over the superclasses; given an input image, PANet extracts deep features at multiple scales and passes them to two streams: the saliency prediction stream generates a saliency map that is not influenced by user preference, while the preference fitting stream uses an object detection model architecture to generate a preference map according to the input preferences; the results of the two streams are combined and post-processed (including the addition of a center prior), and the prediction is output as a pixel-level saliency map suited to the specific user; to train the PANet model, pixel-wise regression is performed against the ground truth, which is generated dynamically in the training data generator for the given input preferences.
3. The saliency prediction stream (1) according to claim 1, characterized in that, in order to combine multi-scale features of the input image for saliency prediction, the model uses features extracted from different layers of VGG-16 and of the custom layers of the Single Shot Detector (SSD); features at three scales are used, with sizes 38 × 38, 19 × 19, and 10 × 10, resampled to the size of the first; incorporating the features of the second and third scales greatly improves the accuracy of saliency prediction; after rescaling, the feature maps are combined into a three-dimensional tensor of size 38 × 38 × 3 with 512 channels in total; the combined tensor is then passed through four convolutional layers with spatial kernels, having 64, 128, 4, and 1 feature channels respectively; next, the network reshapes the result into a three-channel two-dimensional tensor of size 38 × 38 and passes it through further 1 × 1 convolutional layers, from which the network outputs the final result of the saliency prediction stream with a single feature channel.
4. The preference fitting stream (2) according to claim 1, characterized in that the stream is identical to the custom layers of the SSD model, including the selected anchor generation layers, and generates feature maps at multiple scales; the output of this part is a concatenation of object class, confidence, and coordinate information, which must be converted back into an image tensor by the non-maximum suppression (NMS) layer and the mapping layer for further processing.
5. The non-maximum suppression according to claim 4, characterized in that, in the NMS layer, the network decides whether to keep a detection according to its confidence; the confidence threshold differs between datasets; the threshold is set to 0.5, which detects most small objects while keeping a reasonable false positive rate; the NMS layer converts these high-confidence predictions into a two-dimensional tensor in image space.
6. The tensor according to claim 5, characterized in that, so that it can later be merged with the saliency prediction stream, the tensor output by the NMS layer is given a size of 38 × 38, with a number of channels N equal to the number of detailed classes; each channel represents the predictions for one particular class.
7. The prediction information according to claim 5, characterized in that, for an input image, if there is a prediction of an object in class Cat_i, the i-th channel of the tensor has non-zero pixels according to the predicted position and prediction confidence; the value at each pixel (x, y) is given by (Conf_1, …, Conf_k), where Conf_1, …, Conf_k are the confidences of the predictions whose bounding boxes enclose the pixel (x, y).
8. The output according to claim 6, characterized in that the output channels of the NMS layer correspond to detailed classes, and the mapping layer combines them into channels that represent superclasses; the mapping layer requires two additional inputs: the user preference vector and the defined mapping from superclasses to detailed classes; given such a mapping:
[formula not preserved in the source text]
the tensor channels of the detailed classes assigned to a superclass are merged into a single channel SCat_i; the pixel-wise value of the new channel SCat_i is:
[formula not preserved in the source text]
9. The merging of the two streams (3) according to claim 1, characterized in that the model concatenates the tensors of the two streams and adds two 1 × 1 convolutional layers, with 8 and 1 channels respectively, to obtain more non-linearity; in addition, because attention generally concentrates at the center of the field of view, a center prior is added to the model before the final activation layer; the prior is obtained by summing all ground-truth saliency maps SAL_gt in the dataset and then normalizing the sum to [0, 1].
10. The ground-truth saliency data according to claim 9, characterized in that this prior map is generated from the saliency annotations in the dataset:
$$\mathrm{Prior} = \sum \mathrm{SAL}_{gt} \qquad (2)$$
Finally, a softmax activation layer is added, so that the final prediction is output as a probability map.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810619393.2A (CN108694471A) | 2018-06-11 | 2018-06-11 | A user preference prediction method based on a personalized attention network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810619393.2A (CN108694471A) | 2018-06-11 | 2018-06-11 | A user preference prediction method based on a personalized attention network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108694471A | 2018-10-23 |
Family
ID=63849724
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810619393.2A (CN108694471A, withdrawn) | A user preference prediction method based on a personalized attention network | 2018-06-11 | 2018-06-11 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108694471A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109636211A (en) * | 2018-12-19 | 2019-04-16 | 淄博职业学院 | Books automatic management system and its management method based on mobile Internet of Things |
CN110166850A (en) * | 2019-05-30 | 2019-08-23 | 上海交通大学 | The method and system of multiple CNN neural network forecast panoramic video viewing location |
CN111079739A (en) * | 2019-11-28 | 2020-04-28 | 长沙理工大学 | Multi-scale attention feature detection method |
-
2018
- 2018-06-11 CN CN201810619393.2A patent/CN108694471A/en not_active Withdrawn
Non-Patent Citations (1)
Title |
---|
SIKUN LIN et al.: "Where's YOUR focus: Personalized Attention", published online: HTTPS://ARXIV.ORG/PDF/1802.07931 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109636211A (en) * | 2018-12-19 | 2019-04-16 | 淄博职业学院 | Books automatic management system and its management method based on mobile Internet of Things |
CN110166850A (en) * | 2019-05-30 | 2019-08-23 | 上海交通大学 | The method and system of multiple CNN neural network forecast panoramic video viewing location |
CN110166850B (en) * | 2019-05-30 | 2020-11-06 | 上海交通大学 | Method and system for predicting panoramic video watching position by multiple CNN networks |
CN111079739A (en) * | 2019-11-28 | 2020-04-28 | 长沙理工大学 | Multi-scale attention feature detection method |
CN111079739B (en) * | 2019-11-28 | 2023-04-18 | 长沙理工大学 | Multi-scale attention feature detection method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110276316B (en) | Human body key point detection method based on deep learning | |
CN111754596B (en) | Editing model generation method, device, equipment and medium for editing face image | |
CN110176027A (en) | Video target tracking method, device, equipment and storage medium | |
CN110472627A (en) | One kind SAR image recognition methods end to end, device and storage medium | |
CN108460338A (en) | Estimation method of human posture and device, electronic equipment, storage medium, program | |
CN110310175A (en) | System and method for mobile augmented reality | |
CN106096542B (en) | Image video scene recognition method based on distance prediction information | |
CN108304761A (en) | Method for text detection, device, storage medium and computer equipment | |
CN108229303A (en) | Detection identification and the detection identification training method of network and device, equipment, medium | |
CN110246181A (en) | Attitude estimation model training method, Attitude estimation method and system based on anchor point | |
CN108694471A (en) | A user preference prediction method based on a personalized attention network | |
CN111160111B (en) | Human body key point detection method based on deep learning | |
CN107563350A (en) | A kind of method for detecting human face for suggesting network based on yardstick | |
Wang et al. | BANet: Small and multi-object detection with a bidirectional attention network for traffic scenes | |
CN109903339B (en) | Video group figure positioning detection method based on multi-dimensional fusion features | |
WO2022178833A1 (en) | Target detection network training method, target detection method, and apparatus | |
CN108492301A (en) | A kind of Scene Segmentation, terminal and storage medium | |
US20240312181A1 (en) | Video detection method and apparatus, device, and storage medium | |
CN106780701A (en) | The synthesis control method of non-homogeneous texture image, device, storage medium and equipment | |
CN108256400A (en) | The method for detecting human face of SSD based on deep learning | |
Qu et al. | Visual cross-image fusion using deep neural networks for image edge detection | |
CN115393179A (en) | Writing processing method and device, electronic equipment and readable storage medium | |
Li et al. | Real-time crowd density estimation based on convolutional neural networks | |
CN108985385A (en) | Based on the quick Weakly supervised object detection method for generating confrontation study | |
CN112712571A (en) | Video-based object plane mapping method, device and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | ||
Application publication date: 20181023 |