CN108694471A - User preference prediction method based on a personalized attention network - Google Patents

User preference prediction method based on a personalized attention network

Info

Publication number
CN108694471A
Authority
CN
China
Prior art keywords
saliency
preference
tensor
channel
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201810619393.2A
Other languages
Chinese (zh)
Inventor
夏春秋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Vision Technology Co Ltd
Original Assignee
Shenzhen Vision Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Vision Technology Co Ltd
Priority to CN201810619393.2A
Publication of CN108694471A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Abstract

The present invention proposes a user preference prediction method based on a personalized attention network. Its main components are a saliency prediction stream, a preference fitting stream, and the merging of the two streams. Given an input image, the personalized attention network (PANet) extracts deep features at multiple scales and passes them to two streams: the saliency prediction stream generates a saliency map that is not influenced by user preference, while the preference fitting stream uses an object detection model architecture to generate a preference map according to the input preferences. After the results of the two streams are combined, post-processing is applied (including the addition of a center prior), and the prediction is output as a pixel-level saliency map tailored to the specific user. The proposed personalized attention network adapts to different user preferences and predicts the attention of different users more accurately and quickly, which makes it better suited to practical applications.

Description

User preference prediction method based on a personalized attention network
Technical field
The present invention relates to the field of preference prediction, and more particularly to a user preference prediction method based on a personalized attention network.
Background
Attention is a personal experience: even when facing the same scene, different people may focus on different regions or targets. Correctly predicting each user's attention is crucial for human-computer interaction (HCI) applications. With recent progress in deep learning and increases in computing power, visual tasks such as object detection and saliency prediction have become both more accurate and faster. Accurately analyzing where users direct their attention helps merchants target sales at customers who are interested in particular goods, or adjust the placement of popular goods, so that sales strategies can be adapted in real time. Analyzing the attention of children reveals their points of interest, so that their interests and hobbies can be developed at the appropriate time. Traditional prediction methods are computationally complex, slow, and not very accurate, which makes them difficult to use in practice.
The present invention proposes a user preference prediction method based on a personalized attention network. Given an input image, the personalized attention network (PANet) extracts deep features at multiple scales and passes them to two streams: the saliency prediction stream generates a saliency map that is not influenced by user preference, while the preference fitting stream uses an object detection model architecture to generate a preference map according to the input preferences. After the results of the two streams are combined, post-processing is applied (including the addition of a center prior), and the prediction is output as a pixel-level saliency map tailored to the specific user. The proposed personalized attention network adapts to different user preferences and predicts the attention of different users more accurately and quickly, which makes it better suited to practical applications.
Summary of the invention
In view of the problems of traditional prediction methods, such as computational complexity, low speed, and low accuracy, the purpose of the present invention is to provide a user preference prediction method based on a personalized attention network. Given an input image, the personalized attention network (PANet) extracts deep features at multiple scales and passes them to two streams: the saliency prediction stream generates a saliency map that is not influenced by user preference, while the preference fitting stream uses an object detection model architecture to generate a preference map according to the input preferences. After the results of the two streams are combined, post-processing is applied (including the addition of a center prior), and the prediction is output as a pixel-level saliency map tailored to the specific user.
To solve the above problems, the present invention provides a user preference prediction method based on a personalized attention network, which mainly comprises:
(1) a saliency prediction stream;
(2) a preference fitting stream;
(3) merging the two streams.
The personalized attention network (PANet) consists of two convolutional neural networks (CNNs) that share common feature extraction layers. The model requires three inputs: the original image to be processed, a user-defined mapping from fine categories to super-categories, and a user preference vector over the super-categories. Given an input image, PANet extracts deep features at multiple scales and passes them to two streams: the saliency prediction stream generates a saliency map that is not influenced by user preference, while the preference fitting stream uses an object detection model architecture to generate a preference map according to the input preferences. After the results of the two streams are combined, post-processing is applied (including the addition of a center prior), and the prediction is output as a pixel-level saliency map tailored to the specific user. To train the PANet model, pixel-wise regression is performed against ground-truth annotations, which are generated dynamically in the training generator for the given input preferences.
In the saliency prediction stream, in order to combine multi-scale features of the input image for saliency prediction, the model uses features extracted from different layers of VGG-16 and of the custom layers of the Single Shot Detector (SSD). Features at three different scales are used, with sizes 38 × 38, 19 × 19, and 10 × 10; the latter two are resampled to the same size as the first. Combining the features of the second and third scales significantly improves the accuracy of saliency prediction. After rescaling, the feature maps are combined into a three-dimensional tensor of size 38 × 38 × 3 with 512 channels in total. The combined tensor is then passed through four convolutional layers with spatial kernels, which have 64, 128, 4, and 1 feature channels, respectively. The network then reshapes the result into a three-channel two-dimensional tensor of size 38 × 38 and passes it through further 1 × 1 convolutional layers, from which the network outputs the final single-channel result of the saliency prediction stream.
The preference fitting stream is identical to the custom layers in the SSD model, including the selected anchor generation layers, and generates feature maps at multiple scales. The output of this part is a concatenation of object category, confidence, and coordinate information, which must be converted back into an image tensor in the non-maximum suppression (NMS) and mapping layers for further processing.
Further, in the NMS layer, the network decides whether to keep a detection according to its confidence. The appropriate confidence threshold differs between datasets; here the threshold is set to 0.5, which detects most small objects while keeping a reasonable false-positive rate. The NMS layer converts the retained high-confidence predictions into two-dimensional tensors in image space.
Further, so that it can later be merged with the saliency prediction stream, the tensor created as the output of the NMS layer has size 38 × 38, with a channel count N equal to the number of fine categories. Each channel represents the predictions for one particular category.
Further, for an input image, if there is a prediction of an object in category $Cat_i$, then the i-th channel of the tensor has non-zero pixels according to the predicted position and the prediction confidence. The value at each pixel (x, y) is $(Conf_1, \ldots, Conf_k)$, where $Conf_1, \ldots, Conf_k$ are the confidences of the predictions whose bounding boxes enclose pixel (x, y).
Further, the output channels of the NMS layer correspond to fine categories, and the mapping layer combines them into channels that represent super-categories. The mapping layer requires two additional inputs: the user preference vector and the defined mapping between super-categories and fine categories. Given such a mapping:
[formula not reproduced in the source text]
which indicates that the tensor channels of the fine categories belonging to a super-category are merged into a single channel $SCat_i$, the pixel-wise value of the new channel $SCat_i$ is:
[formula (1) not reproduced in the source text]
To merge the two streams, the model joins them by tensor concatenation and adds two 1 × 1 convolutional layers with 8 and 1 channels, respectively, to obtain more non-linearity. In addition, since attention generally concentrates at the center of the field of view, a center prior is added to the model before the final activation layer. It is obtained by summing all ground-truth saliency maps $SAL_{gt}$ in the dataset and then normalizing the result to [0, 1].
Further, this prior map is generated from the saliency annotations in the dataset:
$Prior = \sum SAL_{gt}$ (2)
Finally, a Softmax activation layer is added, so that the final prediction is output as a probability map.
Brief description of the drawings
Fig. 1 is a system flow chart of the user preference prediction method based on a personalized attention network according to the present invention.
Fig. 2 shows the non-maximum suppression layer of the user preference prediction method based on a personalized attention network according to the present invention.
Fig. 3 shows the mapping layer of the user preference prediction method based on a personalized attention network according to the present invention.
Detailed description
It should be noted that, provided there is no conflict, the embodiments of the present application and the features in those embodiments may be combined with one another. The invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a system flow chart of the user preference prediction method based on a personalized attention network according to the present invention. It mainly comprises a saliency prediction stream, a preference fitting stream, and the merging of the two streams.
The personalized attention network (PANet) consists of two convolutional neural networks (CNNs) that share common feature extraction layers. The model requires three inputs: the original image to be processed, a user-defined mapping from fine categories to super-categories, and a user preference vector over the super-categories. Given an input image, PANet extracts deep features at multiple scales and passes them to two streams: the saliency prediction stream generates a saliency map that is not influenced by user preference, while the preference fitting stream uses an object detection model architecture to generate a preference map according to the input preferences. After the results of the two streams are combined, post-processing is applied (including the addition of a center prior), and the prediction is output as a pixel-level saliency map tailored to the specific user. To train the PANet model, pixel-wise regression is performed against ground-truth annotations, which are generated dynamically in the training generator for the given input preferences.
In the saliency prediction stream, in order to combine multi-scale features of the input image for saliency prediction, the model uses features extracted from different layers of VGG-16 and of the custom layers of the Single Shot Detector (SSD). Features at three different scales are used, with sizes 38 × 38, 19 × 19, and 10 × 10; the latter two are resampled to the same size as the first. Combining the features of the second and third scales significantly improves the accuracy of saliency prediction. After rescaling, the feature maps are combined into a three-dimensional tensor of size 38 × 38 × 3 with 512 channels in total. The combined tensor is then passed through four convolutional layers with spatial kernels, which have 64, 128, 4, and 1 feature channels, respectively. The network then reshapes the result into a three-channel two-dimensional tensor of size 38 × 38 and passes it through further 1 × 1 convolutional layers, from which the network outputs the final single-channel result of the saliency prediction stream.
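The rescaling step of the saliency prediction stream can be illustrated with a minimal pure-Python sketch. It is not the patented implementation: the text does not state which interpolation is used, so nearest-neighbour resampling is an assumption here, and the helper names `resize_nearest` and `stack_scales` are hypothetical. For brevity each scale carries a single channel rather than the 512 channels mentioned in the text.

```python
def resize_nearest(fmap, out_size):
    """Nearest-neighbour resize of a square 2-D map (list of rows).

    Assumption: the source only says the smaller maps are resampled to the
    size of the first one; the interpolation method is not specified.
    """
    in_size = len(fmap)
    # Map every output index back to the closest source index.
    idx = [min(in_size - 1, (i * in_size) // out_size) for i in range(out_size)]
    return [[fmap[r][c] for c in idx] for r in idx]


def stack_scales(maps, out_size=38):
    """Resize every map to out_size x out_size and stack them.

    result[y][x] is a list holding one value per input scale.
    """
    resized = [resize_nearest(m, out_size) for m in maps]
    return [[[m[y][x] for m in resized] for x in range(out_size)]
            for y in range(out_size)]


# Three single-channel maps at the sizes named in the text, each filled with
# a constant so the origin of every stacked value stays visible.
scales = [[[float(s)] * n for _ in range(n)] for s, n in ((1, 38), (2, 19), (3, 10))]
stacked = stack_scales(scales)
```

In this sketch `stacked` has shape 38 × 38 × 3, which matches the spatial layout described above for a single feature channel.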
The preference fitting stream is identical to the custom layers in the SSD model, including the selected anchor generation layers, and generates feature maps at multiple scales. The output of this part is a concatenation of object category, confidence, and coordinate information, which must be converted back into an image tensor in the non-maximum suppression (NMS) and mapping layers for further processing.
To merge the two streams, the model joins them by tensor concatenation and adds two 1 × 1 convolutional layers with 8 and 1 channels, respectively, to obtain more non-linearity. In addition, since attention generally concentrates at the center of the field of view, a center prior is added to the model before the final activation layer. It is obtained by summing all ground-truth saliency maps $SAL_{gt}$ in the dataset and then normalizing the result to [0, 1].
This prior map is generated from the saliency annotations in the dataset:
$Prior = \sum SAL_{gt}$ (1)
Finally, a Softmax activation layer is added, so that the final prediction is output as a probability map.
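The merging step can be sketched as follows in pure Python. Several details are assumptions, since the text does not fix them: ReLU is used as the non-linearity between the two 1 × 1 layers, the prior is min-max normalized, the prior is combined with the scores by addition, and the Softmax is taken over all pixels. All function names and the weight arguments `w8` and `w1` are hypothetical.

```python
import math


def concat_channels(a, b):
    """Concatenate two H x W x C tensors (nested lists) along the channel axis."""
    return [[pa + pb for pa, pb in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def conv1x1(tensor, weights):
    """1 x 1 convolution: apply weights[out][in] to the channel vector of every pixel."""
    return [[[sum(w * v for w, v in zip(w_out, pixel)) for w_out in weights]
             for pixel in row] for row in tensor]


def relu(tensor):
    """Element-wise ReLU (assumed non-linearity; the source does not name one)."""
    return [[[max(0.0, v) for v in pixel] for pixel in row] for row in tensor]


def center_prior(gt_maps):
    """Sum ground-truth saliency maps pixel-wise, then min-max normalize to [0, 1]."""
    h, w = len(gt_maps[0]), len(gt_maps[0][0])
    total = [[sum(m[y][x] for m in gt_maps) for x in range(w)] for y in range(h)]
    lo = min(min(row) for row in total)
    hi = max(max(row) for row in total)
    span = (hi - lo) or 1.0  # avoid dividing by zero for a constant map
    return [[(v - lo) / span for v in row] for row in total]


def spatial_softmax(score_map):
    """Softmax over all pixels, so the output sums to 1 (a probability map)."""
    peak = max(max(row) for row in score_map)  # subtract the max for stability
    exps = [[math.exp(v - peak) for v in row] for row in score_map]
    z = sum(sum(row) for row in exps)
    return [[e / z for e in row] for row in exps]


def merge_streams(saliency, preference, w8, w1, prior):
    """Concatenate both streams, apply the 8- and 1-channel layers, add the prior."""
    hidden = relu(conv1x1(concat_channels(saliency, preference), w8))
    scores = [[pixel[0] for pixel in row] for row in conv1x1(hidden, w1)]
    biased = [[s + p for s, p in zip(rs, rp)] for rs, rp in zip(scores, prior)]
    return spatial_softmax(biased)


# Tiny 2 x 2 example: two ground-truth maps give the prior.
prior = center_prior([[[0, 1], [0, 0]], [[0, 1], [1, 0]]])
```

In a trained model the weights would be learned; here they would simply be passed in as nested lists with 8 rows for the first layer and 1 row for the second.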
Fig. 2 shows the non-maximum suppression layer of the user preference prediction method based on a personalized attention network according to the present invention. The network decides whether to keep a detection according to its confidence. The appropriate confidence threshold differs between datasets; here the threshold is set to 0.5, which detects most small objects while keeping a reasonable false-positive rate. The NMS layer converts the retained high-confidence predictions into two-dimensional tensors in image space.
So that it can later be merged with the saliency prediction stream, the tensor created as the output of the NMS layer has size 38 × 38, with a channel count N equal to the number of fine categories. Each channel represents the predictions for one particular category.
For an input image, if there is a prediction of an object in category $Cat_i$, then the i-th channel of the tensor has non-zero pixels according to the predicted position and the prediction confidence. The value at each pixel (x, y) is $(Conf_1, \ldots, Conf_k)$, where $Conf_1, \ldots, Conf_k$ are the confidences of the predictions whose bounding boxes enclose pixel (x, y).
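The conversion from detections to a per-category tensor can be sketched as below. This is an illustration under stated assumptions, not the patented layer: the detection format (category index, confidence, normalized box) and the name `detections_to_tensor` are hypothetical, overlap-based suppression between boxes is omitted, and where several boxes cover the same pixel the maximum confidence is kept, which the text does not specify.

```python
def detections_to_tensor(detections, num_classes, size=38, threshold=0.5):
    """Rasterize detections into a size x size x num_classes confidence tensor.

    Each detection is (class_index, confidence, (x0, y0, x1, y1)) with box
    coordinates normalized to [0, 1]. Keeping the maximum confidence where
    boxes overlap is an assumption of this sketch.
    """
    tensor = [[[0.0] * num_classes for _ in range(size)] for _ in range(size)]
    for cls, conf, (x0, y0, x1, y1) in detections:
        if conf < threshold:
            continue  # detections below the confidence threshold are dropped
        # Convert the normalized box to half-open pixel ranges of at least one pixel.
        c0 = min(size - 1, int(x0 * size))
        r0 = min(size - 1, int(y0 * size))
        c1 = min(size, max(c0 + 1, int(x1 * size)))
        r1 = min(size, max(r0 + 1, int(y1 * size)))
        for r in range(r0, r1):
            for c in range(c0, c1):
                tensor[r][c][cls] = max(tensor[r][c][cls], conf)
    return tensor


# Two detections of category 0 and one low-confidence detection of category 1.
example = detections_to_tensor(
    [(0, 0.9, (0.0, 0.0, 0.5, 0.5)),
     (0, 0.6, (0.0, 0.0, 1.0, 1.0)),
     (1, 0.3, (0.0, 0.0, 1.0, 1.0))],
    num_classes=2,
)
```

The result is indexed as `example[y][x][category]`, so each channel holds the predictions for one fine category as described above.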
Fig. 3 shows the mapping layer of the user preference prediction method based on a personalized attention network according to the present invention. The output channels of the NMS layer correspond to fine categories, and the mapping layer combines them into channels that represent super-categories. The mapping layer requires two additional inputs: the user preference vector and the defined mapping between super-categories and fine categories. Given such a mapping:
[formula not reproduced in the source text]
which indicates that the tensor channels of the fine categories belonging to a super-category are merged into a single channel $SCat_i$, the pixel-wise value of the new channel $SCat_i$ is:
[formula not reproduced in the source text]
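A possible shape for the mapping layer is sketched below. Because the merging formula is not available in this text, the combination rule used here (take the maximum over the fine-category channels of a super-category, then scale by the user's preference for that super-category) is purely an assumption for illustration, and the name `map_to_super_categories` is hypothetical.

```python
def map_to_super_categories(tensor, mapping, preference):
    """Merge fine-category channels into preference-weighted super-category channels.

    tensor[y][x] is a list of per-fine-category confidences.
    mapping[i] lists the fine-category indices that belong to super-category i.
    preference[i] is the user's weight for super-category i.
    Assumption: max over member channels, then multiply by the preference;
    the rule actually used by the patent is not recoverable from the text.
    """
    return [[[preference[i] * max((pixel[j] for j in members), default=0.0)
              for i, members in enumerate(mapping)]
             for pixel in row]
            for row in tensor]


# A 1 x 2 image with three fine categories grouped into two super-categories.
fine = [[[0.9, 0.2, 0.0], [0.0, 0.7, 0.6]]]
merged = map_to_super_categories(fine, mapping=[[0, 1], [2]], preference=[1.0, 0.5])
```

The output has one channel per super-category, which is the form needed for concatenation with the single-channel saliency map.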
For those skilled in the art, the present invention is not limited to the details of the above embodiments, and it can be implemented in other specific forms without departing from its spirit and scope. In addition, those skilled in the art may make various modifications and variations to the present invention without departing from its spirit and scope, and such improvements and modifications shall also be regarded as falling within the scope of protection of the present invention. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the invention.

Claims (10)

1. A user preference prediction method based on a personalized attention network, characterized in that it mainly comprises: a saliency prediction stream (1); a preference fitting stream (2); and merging the two streams (3).
2. The personalized attention network (PANet) according to claim 1, characterized in that PANet consists of two convolutional neural networks (CNNs) that share common feature extraction layers; the model requires three inputs: the original image to be processed, a user-defined mapping from fine categories to super-categories, and a user preference vector over the super-categories; given an input image, PANet extracts deep features at multiple scales and passes them to two streams: the saliency prediction stream generates a saliency map that is not influenced by user preference, while the preference fitting stream uses an object detection model architecture to generate a preference map according to the input preferences; after the results of the two streams are combined, post-processing is applied (including the addition of a center prior), and the prediction is output as a pixel-level saliency map tailored to the specific user; to train the PANet model, pixel-wise regression is performed against ground-truth annotations, which are generated dynamically in the training generator for the given input preferences.
3. The saliency prediction stream (1) according to claim 1, characterized in that, in order to combine multi-scale features of the input image for saliency prediction, the model uses features extracted from different layers of VGG-16 and of the custom layers of the Single Shot Detector (SSD); features at three different scales are used, with sizes 38 × 38, 19 × 19, and 10 × 10, the latter two being resampled to the same size as the first; combining the features of the second and third scales significantly improves the accuracy of saliency prediction; after rescaling, the feature maps are combined into a three-dimensional tensor of size 38 × 38 × 3 with 512 channels in total; the combined tensor is then passed through four convolutional layers with spatial kernels, which have 64, 128, 4, and 1 feature channels, respectively; the network then reshapes the result into a three-channel two-dimensional tensor of size 38 × 38 and passes it through further 1 × 1 convolutional layers, from which the network outputs the final single-channel result of the saliency prediction stream.
4. The preference fitting stream (2) according to claim 1, characterized in that the stream is identical to the custom layers in the SSD model, including the selected anchor generation layers, and generates feature maps at multiple scales; the output of this part is a concatenation of object category, confidence, and coordinate information, which must be converted back into an image tensor in the non-maximum suppression (NMS) and mapping layers for further processing.
5. The non-maximum suppression according to claim 4, characterized in that, in the NMS layer, the network decides whether to keep a detection according to its confidence; the appropriate confidence threshold differs between datasets; the threshold is set to 0.5, which detects most small objects while keeping a reasonable false-positive rate; the NMS layer converts the retained high-confidence predictions into two-dimensional tensors in image space.
6. The tensor according to claim 5, characterized in that, so that it can later be merged with the saliency prediction stream, the tensor created as the output of the NMS layer has size 38 × 38, with a channel count N equal to the number of fine categories; each channel represents the predictions for one particular category.
7. The prediction information according to claim 5, characterized in that, for an input image, if there is a prediction of an object in category $Cat_i$, then the i-th channel of the tensor has non-zero pixels according to the predicted position and the prediction confidence; the value at each pixel (x, y) is $(Conf_1, \ldots, Conf_k)$, where $Conf_1, \ldots, Conf_k$ are the confidences of the predictions whose bounding boxes enclose pixel (x, y).
8. The output according to claim 6, characterized in that the output channels of the NMS layer correspond to fine categories, and the mapping layer combines them into channels that represent super-categories; the mapping layer requires two additional inputs: the user preference vector and the defined mapping between super-categories and fine categories; given such a mapping:
[formula not reproduced in the source text]
which indicates that the tensor channels of the fine categories belonging to a super-category are merged into a single channel $SCat_i$, the pixel-wise value of the new channel $SCat_i$ is:
[formula (1) not reproduced in the source text]
9. The merging of the two streams (3) according to claim 1, characterized in that the model merges the two streams by tensor concatenation and adds two 1 × 1 convolutional layers with 8 and 1 channels, respectively, to obtain more non-linearity; in addition, since attention generally concentrates at the center of the field of view, a center prior is added to the model before the final activation layer; it is obtained by summing all ground-truth saliency maps $SAL_{gt}$ in the dataset and then normalizing the result to [0, 1].
10. The ground-truth saliency data according to claim 9, characterized in that this prior map is generated from the saliency annotations in the dataset:
$Prior = \sum SAL_{gt}$ (2)
Finally, a Softmax activation layer is added, so that the final prediction is output as a probability map.
CN201810619393.2A 2018-06-11 2018-06-11 User preference prediction method based on a personalized attention network Withdrawn CN108694471A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810619393.2A CN108694471A (en) 2018-06-11 2018-06-11 User preference prediction method based on a personalized attention network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810619393.2A CN108694471A (en) 2018-06-11 2018-06-11 User preference prediction method based on a personalized attention network

Publications (1)

Publication Number Publication Date
CN108694471A (en) 2018-10-23

Family

ID=63849724

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810619393.2A Withdrawn CN108694471A (en) 2018-06-11 2018-06-11 User preference prediction method based on a personalized attention network

Country Status (1)

Country Link
CN (1) CN108694471A (en)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SIKUN LIN et al.: "Where's YOUR focus: Personalized Attention", published online: HTTPS://ARXIV.ORG/PDF/1802.07931 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109636211A (en) * 2018-12-19 2019-04-16 淄博职业学院 Books automatic management system and its management method based on mobile Internet of Things
CN110166850A (en) * 2019-05-30 2019-08-23 上海交通大学 The method and system of multiple CNN neural network forecast panoramic video viewing location
CN110166850B (en) * 2019-05-30 2020-11-06 上海交通大学 Method and system for predicting panoramic video watching position by multiple CNN networks
CN111079739A (en) * 2019-11-28 2020-04-28 长沙理工大学 Multi-scale attention feature detection method
CN111079739B (en) * 2019-11-28 2023-04-18 长沙理工大学 Multi-scale attention feature detection method

Similar Documents

Publication Publication Date Title
CN110276316B (en) Human body key point detection method based on deep learning
CN111754596B (en) Editing model generation method, device, equipment and medium for editing face image
CN110176027A (en) Video target tracking method, device, equipment and storage medium
CN110472627A (en) One kind SAR image recognition methods end to end, device and storage medium
CN108460338A (en) Estimation method of human posture and device, electronic equipment, storage medium, program
CN110310175A (en) System and method for mobile augmented reality
CN106096542B (en) Image video scene recognition method based on distance prediction information
CN108304761A (en) Method for text detection, device, storage medium and computer equipment
CN104049760B (en) The acquisition methods and system of a kind of man-machine interaction order
CN108694471A (en) A kind of user preference prediction technique based on personalized attention network
CN107563350A (en) A kind of method for detecting human face for suggesting network based on yardstick
CN109903339B (en) Video group figure positioning detection method based on multi-dimensional fusion features
Wang et al. BANet: Small and multi-object detection with a bidirectional attention network for traffic scenes
WO2022178833A1 (en) Target detection network training method, target detection method, and apparatus
CN108492301A (en) A kind of Scene Segmentation, terminal and storage medium
CN111160111A (en) Human body key point detection method based on deep learning
CN106780701A (en) The synthesis control method of non-homogeneous texture image, device, storage medium and equipment
CN114792359A (en) Rendering network training and virtual object rendering method, device, equipment and medium
CN108256400A (en) The method for detecting human face of SSD based on deep learning
CN111177811A (en) Automatic fire point location layout method applied to cloud platform
Wang et al. Image captioning using region-based attention joint with time-varying attention
CN115311403B (en) Training method of deep learning network, virtual image generation method and device
CN110363792A (en) A kind of method for detecting change of remote sensing image based on illumination invariant feature extraction
CN106845391B (en) Atmosphere field identification method and system in home environment
Kim et al. Adaptive surface splatting for facial rendering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20181023