CN110619344A - Microblog friend recommendation method based on SSD and time sequence model - Google Patents
- Publication number
- CN110619344A, CN201910635218.7A
- Authority
- CN
- China
- Prior art keywords
- user
- ssd
- prior frame
- follows
- interest
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention belongs to the field of image processing and analysis, and in particular relates to a microblog friend recommendation method based on SSD (Single Shot MultiBox Detector) and a time sequence model. The method collects and analyzes the pictures that microblog users post on the platform, uses a trained SSD target detection algorithm to extract the user interest information hidden in those pictures, and then applies a time sequence model to the extracted interests to obtain a friend recommendation list. The invention makes the friend recommendation strategy better grounded, improves the accuracy of the recommendation results, and adds picture-based interest analysis to the range of available friend recommendation strategies.
Description
Technical Field
The invention relates to the field of image processing and analysis, and in particular to a microblog friend recommendation method based on SSD and a time sequence model.
Background
A friend recommendation system should recommend according to each user's interests, both to improve the quality and interpretability of the user information it relies on and to raise the rate at which users accept its recommendations. To that end, useful information about a user can be gathered from several angles and used as the basis for measuring user similarity.
Existing microblog friend recommendation algorithms mainly take one of two approaches: recommending friends based on social relationships, and mining user interest preferences from microblog content.
Recommendation based on social relationships recommends users with similar social preferences, which greatly improves the acceptance rate. A social topology graph is built from the users' follow relationships, and social similarity is computed by a specific algorithm in order to recommend friends; for example, a Pareto-optimal genetic algorithm can be used to predict which users are likely to become friends. The drawback is that this approach is limited to the information in the user's own friend list.
In a large social network such as a microblog platform, users have not only social relationships but also a large volume of posts. Content-based recommendation mines a user's latent interest preferences from this text and recommends potential friends to the target user, enlarging the user's social circle. For example, the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm can extract the higher-weighted keywords as a user's feature vector, and each user can be given tags, so that friends with similar interests are recommended according to the similarity of their interest tags. Such algorithms recommend friends on the basis of shared interests. The means of processing the text that users generate on the platform keep growing richer and the technology keeps maturing, but the picture data that users generate is ignored, even though pictures often carry information that is valuable for analyzing user interests. It is therefore worthwhile to use the SSD target detection algorithm to extract information from user pictures and so obtain the user's interests, and to use a time sequence model, which reflects the fact that a user's interest fades over time, to improve the effectiveness and accuracy of the interest analysis.
Disclosure of Invention
To address the lack of effective use and processing of picture data in friend recommendation strategies, the invention provides a microblog friend recommendation method based on SSD (Single Shot MultiBox Detector) and a time sequence model.
In order to achieve the purpose, the specific technical scheme of the invention is as follows: a microblog friend recommendation method based on an SSD and a time sequence model comprises the following steps:
1) collecting data and constructing a data table per_info: picture data and the corresponding time data of users are acquired from the microblog platform and stored in the data table per_info, wherein each picture corresponds to one entry of the table and each entry comprises 4 data elements: name, date, picture and category; name is the user name, date is the time, picture is the user picture, and category is the category to which the picture belongs;
2) selecting a public data set as a training data set;
3) training a model;
4) taking picture data in the data table per_info as input of the trained SSD model, and constructing a user interest tag matrix;
5) calculating a potential interest distribution matrix, wherein the time sequence calculation formula of each interest is as follows:
taking each time interval as a time point, wherein n represents the total number of time points, and t represents the t-th time point;
6) calculating an interest similarity matrix of each time period between users p and q by using a probability distribution distance formula;
the n-th row of the matrix represents the interest similarity of the n-th time period, and the calculation formula is as follows:
wherein $D_{JS}$ represents the JS divergence and $D_{KL}$ represents the KL divergence;
7) sorting the other users by their interest similarity to the target user to obtain the top k friends, and recommending these top k friends to the corresponding target user.
Further, in the step 3), the model training includes the following steps:
3.1) preprocessing the training data set, processing the pictures of the training data set into images of 300 × 300 pixels;
3.2) building an SSD network architecture;
3.3) setting the prior frame scale and aspect ratio, and calculating the width $w_k$ and height $h_k$ of the prior frame;
3.4) calculating the intersection-over-union (IOU) value, wherein the calculation formula is as follows:
$$IOU = \frac{S_p \cap S_g}{S_p \cup S_g}$$
wherein $S_p \cap S_g$ represents the intersection of the area of the prior frame and the area of the real target frame, and $S_p \cup S_g$ represents the union of the area of the prior frame and the area of the real target frame;
3.5) matching prior frames, wherein the prior frame matching method is as follows: for each real target frame in the picture, the prior frame with the largest IOU value is taken as successfully matched; for each prior frame that is not yet matched, the IOU value is compared with a threshold, and if the IOU value is larger than the threshold, the prior frame is set as successfully matched;
3.6) calculating the SSD detection values, including the predicted position and the classification information; the classification information is obtained as follows: if a prior frame is successfully matched with a real target frame, the category information of the prior frame is considered to be consistent with that of the real target frame; the predicted position is denoted $l = (l^{cx}, l^{cy}, l^{w}, l^{h})$, and the calculation formula is as follows:
$$l^{cx} = (b^{cx} - d^{cx})/d^{w}, \quad l^{cy} = (b^{cy} - d^{cy})/d^{h}$$
$$l^{w} = \log(b^{w}/d^{w}), \quad l^{h} = \log(b^{h}/d^{h})$$
wherein $d = (d^{cx}, d^{cy}, d^{w}, d^{h})$ represents the prior frame position and $b = (b^{cx}, b^{cy}, b^{w}, b^{h})$ represents the position of the corresponding real target frame;
3.7) calculating the confidence error $L_{conf}(x, c)$ and the position error $L_{loc}(x, l, g)$;
3.8) calculating the loss function $L(x, c, l, g)$, which is calculated as follows:
$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$
wherein N is the number of positive samples of the prior frame; $\alpha$ is a weight (typically 1).
Further, the method for constructing the user interest tag matrix in the step 4) is as follows:
4.1) performing interest identification on the input pictures with the trained model; the model outputs the category of each picture, which is assigned to the category attribute in per_info;
4.2) data processing: according to the date attribute, dividing the user data into n time periods with T days as the time interval, and, within each time period, grouping the data with the same name attribute into a data set {name, date, picture, category};
4.3) constructing the user interest vector of a user in a time period $t_i$, $h = \{h_{y_1}, h_{y_2}, \ldots, h_{y_m}\}$, wherein $h_{y_i}$ represents the proportion of the number of occurrences of objects of the i-th interest tag to the total number of occurrences of objects among the user's interest tags in that time period;
4.4) integrating the user interest vectors of the n time periods to construct the user interest tag matrix.
Further, in the step 3.3), the linear formula for the prior frame scale is as follows:
$$S_k = S_{min} + \frac{S_{max} - S_{min}}{m - 1}(k - 1), \quad k \in [1, m]$$
wherein $S_k$ represents the scale of the k-th prior frame, $S_{min}$ represents the smallest prior frame scale, with the value 0.2, $S_{max}$ represents the largest prior frame scale, with the value 0.9, and m is the number of feature maps;
the aspect ratio of the prior frame satisfies
$$\alpha_r \in \{1, 2, 3, \tfrac{1}{2}, \tfrac{1}{3}\}$$
the width $w_k$ and height $h_k$ of the prior frame are calculated as follows:
$$w_k = W_{input} \, S_k \sqrt{\alpha_r}, \quad h_k = H_{input} \, S_k / \sqrt{\alpha_r}$$
wherein $W_{input}$ and $H_{input}$ are respectively the width and height of the input image.
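Further, in the step 3.7), the position error $L_{loc}(x, l, g)$ is calculated as follows:
$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{p} \, \mathrm{smooth}_{L1}\left(l_i^{m} - \hat{g}_j^{m}\right)$$
wherein the $\mathrm{smooth}_{L1}$ loss function is calculated as follows:
$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$
and the loss error $\hat{g}$ of the real target is calculated as:
$$\hat{g}_j^{cx} = (g_j^{cx} - d_i^{cx})/d_i^{w}, \quad \hat{g}_j^{cy} = (g_j^{cy} - d_i^{cy})/d_i^{h}, \quad \hat{g}_j^{w} = \log(g_j^{w}/d_i^{w}), \quad \hat{g}_j^{h} = \log(g_j^{h}/d_i^{h})$$
in the formula, $g_j^{cx}, g_j^{cy}, g_j^{w}, g_j^{h}$ respectively represent the abscissa and ordinate of the center position, the width and the height of the j-th real target frame; $\hat{g}_j^{cx}, \hat{g}_j^{cy}, \hat{g}_j^{w}, \hat{g}_j^{h}$ respectively represent the loss errors of the abscissa and ordinate of the center position, the width and the height of the j-th real target; $d_i^{cx}, d_i^{cy}, d_i^{w}, d_i^{h}$ respectively represent the abscissa and ordinate of the center position, the width and the height of the i-th prior frame.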
Further, in the step 3.7), the confidence error $L_{conf}(x, c)$ is calculated as follows:
$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\left(\hat{c}_i^{p}\right) - \sum_{i \in Neg} \log\left(\hat{c}_i^{0}\right), \quad \hat{c}_i^{p} = \frac{\exp(c_i^{p})}{\sum_{p} \exp(c_i^{p})}$$
wherein $x_{ij}^{p}$ taking the value 1 indicates that the i-th prior frame is matched with the j-th ground truth, whose category is p; c is the predicted category confidence; l is the predicted value of the corresponding bounding box position, and g is the position parameter of the ground truth.
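Further, the $D_{JS}$ in the above step 6) is calculated as follows:
$$D_{JS}(p \,\|\, q) = \frac{1}{2} D_{KL}\left(p \,\Big\|\, \frac{p + q}{2}\right) + \frac{1}{2} D_{KL}\left(q \,\Big\|\, \frac{p + q}{2}\right)$$
and the $D_{KL}$ is calculated as follows:
$$D_{KL}(p \,\|\, q) = \sum_{i} p_i \log \frac{p_i}{q_i}$$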
The beneficial effects of the invention are that the picture information generated by users is fully used and the way a user's interests change over time is taken into account, so that the recommendation strategy is better grounded, the accuracy of the recommendation results is improved, and the range of friend recommendation strategies is enriched.
Drawings
FIG. 1 is a data flow diagram of the SSD target detection algorithm.
FIG. 2 is a detailed flowchart of the microblog friend recommendation algorithm based on SSD and a time sequence model.
FIG. 3 shows the four main components of the SSD model.
FIG. 4 shows the main execution sequence of the SSD model.
Detailed Description
The present invention is further described below with reference to the accompanying drawings and a specific embodiment. Note that the technical solution and design principles of the invention are described in detail using only one optimized technical solution, but the scope of protection of the invention is not limited to it.
The present invention is not limited to the above-described embodiments, and any obvious improvements, substitutions or modifications can be made by those skilled in the art without departing from the spirit of the present invention.
FIGS. 1-4 illustrate a microblog friend recommendation algorithm based on SSD and a time sequence model. The method collects the data published by users on Sina microblog. As shown in FIG. 1, following the design of the algorithm, a training data set is used to train an SSD model; microblog user data is then input into the trained model to obtain the classification information of each user picture, and finally a user interest tag matrix is obtained for further processing and analysis.
As shown in FIG. 2, the microblog friend recommendation method based on SSD and a time sequence model mainly comprises the following stages: collecting data, constructing a data table, selecting a training data set, training a model, analyzing and processing the data, and performing the calculations.
1) collecting data and constructing a data table per_info: picture data and the corresponding time data of users are acquired from the microblog platform and stored in the data table per_info, wherein each picture corresponds to one entry of the table and each entry comprises data elements such as name, date, picture and category; name is the user name, date is the time, picture is the user picture, and category is the category to which the picture belongs;
2) selecting a public data set as the training data set: in the specific embodiment of the invention, the public PASCAL VOC data set (VOC 2007), which can be used for image classification, target detection and similar tasks, is downloaded; it comprises 9963 pictures in 20 categories, and each picture is annotated with the position information and category information of the object target frames it contains;
3) model training, which aims to make the results that the SSD target detection model produces for input user pictures more accurate; the four main components of the SSD model are shown in FIG. 3 and the execution steps are shown in FIG. 4; training comprises the following steps:
3.1) preprocessing a training data set, and processing pictures of the training data set into images of 300 × 300 pixels;
3.2) building the SSD network architecture: in the specific embodiment of the invention, the SSD network architecture is based on VGG16 and comprises six convolutional layers, Conv4_3, Conv7, Conv8_2, Conv9_2, Conv10_2 and Conv11_2, wherein Conv4_3 is a convolutional layer of VGG16 and Conv7 is a convolutional layer converted from a fully connected layer of VGG16;
3.3) setting and generating prior frames: Conv4_3, Conv7, Conv8_2, Conv9_2, Conv10_2 and Conv11_2 are used as the feature maps for detection; the sizes of the six feature maps are 38 × 38, 19 × 19, 10 × 10, 5 × 5, 3 × 3 and 1 × 1, and different feature maps generate different numbers of prior frames; as the feature map size decreases, the scale of the prior frame increases linearly;
the prior frames are obtained by setting the prior frame scale and aspect ratio and then calculating. First, the linear formula for the prior frame scale is as follows:
$$S_k = S_{min} + \frac{S_{max} - S_{min}}{m - 1}(k - 1), \quad k \in [1, m]$$
wherein $S_k$ represents the scale of the k-th prior frame, $S_{min}$ represents the smallest prior frame scale, with the value 0.2, $S_{max}$ represents the largest prior frame scale, with the value 0.9, and m is the number of feature maps;
second, the aspect ratio of the prior frame is generally chosen as $\alpha_r \in \{1, 2, 3, \tfrac{1}{2}, \tfrac{1}{3}\}$; when $\alpha_r = 1$, an additional prior frame with scale $S'_k = \sqrt{S_k S_{k+1}}$ is also used. The feature maps Conv4_3, Conv10_2 and Conv11_2 use only 4 prior frames each, omitting the aspect ratios 3 and $\tfrac{1}{3}$; all other feature maps use every aspect ratio;
from this, the width ($w_k$) and height ($h_k$) of each prior frame, relative to the input image, can be calculated as follows:
$$w_k = S_k \sqrt{\alpha_r}, \quad h_k = S_k / \sqrt{\alpha_r}$$
letting the width and height of the input image be $W_{input}$ and $H_{input}$ respectively, the width and height in pixels of the prior frame with aspect ratio $\alpha_r$ are calculated as follows:
$$w_k = W_{input} \, S_k \sqrt{\alpha_r}, \quad h_k = H_{input} \, S_k / \sqrt{\alpha_r}$$
in total, 38 × 38 × 4 + 19 × 19 × 6 + 10 × 10 × 6 + 5 × 5 × 6 + 3 × 3 × 4 + 1 × 1 × 4 = 8732 prior frames are generated;
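The prior frame arithmetic of step 3.3) can be checked with a short sketch. The function names below are hypothetical, and the per-cell counts assume, as the text states, 4 prior frames per cell on Conv4_3, Conv10_2 and Conv11_2 and 6 on the other three feature maps; the width and height formula assumes the standard SSD form (scale times or divided by the square root of the aspect ratio).

```python
import math

# Feature map sizes and prior frames per cell, in the order
# Conv4_3, Conv7, Conv8_2, Conv9_2, Conv10_2, Conv11_2 (from step 3.3).
FEATURE_MAPS = [38, 19, 10, 5, 3, 1]
BOXES_PER_CELL = [4, 6, 6, 6, 4, 4]


def prior_scales(m, s_min=0.2, s_max=0.9):
    """Linear scales S_k = s_min + (s_max - s_min) * (k - 1) / (m - 1), k = 1..m."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


def box_size(scale, aspect_ratio, input_size=300):
    """Width and height in pixels of one prior frame (assumed standard SSD form)."""
    w = input_size * scale * math.sqrt(aspect_ratio)
    h = input_size * scale / math.sqrt(aspect_ratio)
    return w, h


def total_prior_boxes():
    """Total number of prior frames over all six feature maps."""
    return sum(f * f * b for f, b in zip(FEATURE_MAPS, BOXES_PER_CELL))
```

With these counts the total is 8732, and an aspect ratio of 2 at scale 0.2 gives a frame twice as wide as it is tall.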
3.4) calculating the IOU value: the IOU (Intersection over Union) is the ratio of the intersection to the union of a prior frame (prior box) of the SSD and a real target frame (ground truth, i.e., the position of a target in the picture); the calculation formula is as follows:
$$IOU = \frac{S_p \cap S_g}{S_p \cup S_g}$$
wherein $S_p \cap S_g$ represents the intersection of the area of the prior frame and the area of the real target frame, and $S_p \cup S_g$ represents the union of the area of the prior frame and the area of the real target frame;
3.5) matching prior frames: matching is mainly carried out through the IOU value; the higher the IOU value, the higher the degree of correlation between the real target and the prediction;
prior frame matching follows two principles: first, for each real target frame in the picture, the prior frame with the largest IOU value is taken as successfully matched; second, for the remaining unmatched prior frames, a threshold (generally 0.5) is set and the IOU value is compared with it, and if the IOU value is larger than the threshold, the prior frame is also set as successfully matched;
prior frames that are successfully matched are positive samples, and prior frames that are not successfully matched are negative samples;
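The IOU computation and the two matching principles of steps 3.4) and 3.5) can be sketched as follows. This is an illustration only: the corner format (xmin, ymin, xmax, ymax) and the names `iou` and `match_priors` are assumptions, not part of the patent text.

```python
def iou(box_a, box_b):
    """IOU of two boxes given as (xmin, ymin, xmax, ymax) (assumed format)."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_priors(priors, targets, threshold=0.5):
    """For each prior frame, return the index of its matched target, or -1 (negative sample)."""
    matches = [-1] * len(priors)
    if not priors or not targets:
        return matches
    # Principle 1: each real target frame claims the prior frame with the largest IOU.
    for j, target in enumerate(targets):
        best = max(range(len(priors)), key=lambda i: iou(priors[i], target))
        matches[best] = j
    # Principle 2: each remaining prior frame is matched to its best target
    # if that IOU exceeds the threshold.
    for i, prior in enumerate(priors):
        if matches[i] != -1:
            continue
        j = max(range(len(targets)), key=lambda t: iou(prior, targets[t]))
        if iou(prior, targets[j]) > threshold:
            matches[i] = j
    return matches
```

Prior frames left at -1 are the negative samples of step 3.5).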
3.6) calculating the SSD detection values (predicted position and classification information);
the classification information is mainly obtained through the step 3.5): if a prior frame is successfully matched with a real target frame, the category information of the prior frame is considered to be consistent with that of the real target frame, and the classification confidence is obtained;
the predicted position is the predicted position of the bounding box, denoted $l = (l^{cx}, l^{cy}, l^{w}, l^{h})$, and is obtained through the following encoding formulas:
$$l^{cx} = (b^{cx} - d^{cx})/d^{w}, \quad l^{cy} = (b^{cy} - d^{cy})/d^{h}$$
$$l^{w} = \log(b^{w}/d^{w}), \quad l^{h} = \log(b^{h}/d^{h})$$
wherein $d = (d^{cx}, d^{cy}, d^{w}, d^{h})$ represents the prior frame position and $b = (b^{cx}, b^{cy}, b^{w}, b^{h})$ represents the position of the corresponding real target frame;
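The encoding of step 3.6) maps a real target frame b onto offsets relative to a prior frame d. A minimal sketch follows; the function names are hypothetical, and `decode` is simply the algebraic inverse, added here for illustration rather than taken from the text.

```python
import math


def encode(b, d):
    """Encode target frame b relative to prior frame d; both are (cx, cy, w, h)."""
    b_cx, b_cy, b_w, b_h = b
    d_cx, d_cy, d_w, d_h = d
    return (
        (b_cx - d_cx) / d_w,
        (b_cy - d_cy) / d_h,
        math.log(b_w / d_w),
        math.log(b_h / d_h),
    )


def decode(l, d):
    """Inverse of encode: recover (cx, cy, w, h) from offsets l and prior frame d."""
    l_cx, l_cy, l_w, l_h = l
    d_cx, d_cy, d_w, d_h = d
    return (
        d_cx + l_cx * d_w,
        d_cy + l_cy * d_h,
        d_w * math.exp(l_w),
        d_h * math.exp(l_h),
    )
```

A target frame that coincides with its prior frame encodes to all zeros, so the network only has to learn small corrections around each prior frame.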
3.7) calculating the confidence error (confidence loss) and the position error (localization loss);
for the position error $L_{loc}(x, l, g)$, the Smooth L1 loss is used (i.e., the loss used for the box regression of the RPN in Faster R-CNN); the calculation formula is as follows:
$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{p} \, \mathrm{smooth}_{L1}\left(l_i^{m} - \hat{g}_j^{m}\right)$$
wherein the $\mathrm{smooth}_{L1}$ used in the position error formula is calculated as follows:
$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$
and the loss error $\hat{g}$ of the real target used in the position error formula is calculated as follows:
$$\hat{g}_j^{cx} = (g_j^{cx} - d_i^{cx})/d_i^{w}, \quad \hat{g}_j^{cy} = (g_j^{cy} - d_i^{cy})/d_i^{h}, \quad \hat{g}_j^{w} = \log(g_j^{w}/d_i^{w}), \quad \hat{g}_j^{h} = \log(g_j^{h}/d_i^{h})$$
wherein $g_j^{cx}, g_j^{cy}, g_j^{w}, g_j^{h}$ respectively represent the abscissa and ordinate of the center position, the width and the height of the j-th real target frame; $\hat{g}_j^{cx}, \hat{g}_j^{cy}, \hat{g}_j^{w}, \hat{g}_j^{h}$ respectively represent the loss errors of the abscissa and ordinate of the center position, the width and the height of the j-th real target; $d_i^{cx}, d_i^{cy}, d_i^{w}, d_i^{h}$ respectively represent the abscissa and ordinate of the center position, the width and the height of the i-th prior frame;
for the confidence error $L_{conf}(x, c)$, the softmax loss is used; the formula is defined as follows:
$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\left(\hat{c}_i^{p}\right) - \sum_{i \in Neg} \log\left(\hat{c}_i^{0}\right), \quad \hat{c}_i^{p} = \frac{\exp(c_i^{p})}{\sum_{p} \exp(c_i^{p})}$$
wherein $x_{ij}^{p}$ taking the value 1 indicates that the i-th prior frame is matched with the j-th ground truth, whose category is p; c is the predicted category confidence; l is the predicted value of the corresponding bounding box position, and g is the position parameter of the ground truth;
3.8) calculating the loss function: from the confidence error $L_{conf}(x, c)$ and the position error $L_{loc}(x, l, g)$, the loss function $L(x, c, l, g)$ is calculated as follows:
$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$
wherein N is the number of positive samples of the prior frame; $\alpha$ is a weight (generally 1);
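The loss of steps 3.7) and 3.8) can be sketched in plain Python. The sketch assumes the standard SSD formulation: Smooth L1 summed over the four offsets of each positive prior frame, softmax cross-entropy with class 0 as background for negative samples, and a total of (L_conf + α·L_loc)/N, taken as 0 when N is 0. All function names and argument layouts are hypothetical.

```python
import math


def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear elsewhere."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5


def log_softmax_at(scores, index):
    """Log of the softmax probability of class `index`, computed stably."""
    peak = max(scores)
    log_sum = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return scores[index] - log_sum


def localization_loss(pred_offsets, target_offsets):
    """Sum of Smooth L1 over (cx, cy, w, h) offsets of the positive prior frames."""
    return sum(
        smooth_l1(p - g)
        for pred, target in zip(pred_offsets, target_offsets)
        for p, g in zip(pred, target)
    )


def confidence_loss(pos_scores, pos_labels, neg_scores):
    """Softmax loss: true class for positives, background (class 0) for negatives."""
    pos = -sum(log_softmax_at(s, label) for s, label in zip(pos_scores, pos_labels))
    neg = -sum(log_softmax_at(s, 0) for s in neg_scores)
    return pos + neg


def total_loss(l_conf, l_loc, num_pos, alpha=1.0):
    """L = (L_conf + alpha * L_loc) / N, defined as 0 when there are no positives."""
    if num_pos == 0:
        return 0.0
    return (l_conf + alpha * l_loc) / num_pos
```

Dividing by the number of positive samples keeps the loss comparable between pictures that contain few and many objects.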
4) taking picture data in the data table per_info as input of the trained SSD model in order to calculate the user interest tag matrix, with the following specific steps:
4.1) performing interest identification on the input pictures with the trained model; the model outputs the category of each picture, which is assigned to the category attribute in per_info;
4.2) data processing: according to the date attribute, dividing the user data into n time periods with T days as the time interval, and, within each time period, grouping the data with the same name attribute into a data set {name, date, picture, category};
4.3) constructing the user interest vector of a user in a time period $t_i$, $h = \{h_{y_1}, h_{y_2}, \ldots, h_{y_m}\}$, wherein $h_{y_i}$ represents the proportion of the number of occurrences of objects of the i-th interest tag to the total number of occurrences of objects among the user's interest tags in that time period;
4.4) integrating the user interest vectors of the n time periods to construct the user interest tag matrix;
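Steps 4.2) to 4.4) amount to bucketing a user's classified pictures into T-day periods and normalizing the category counts in each bucket. The sketch below assumes per_info entries reduced to hypothetical (name, date, category) tuples, with the picture itself omitted; `interest_matrix` and its parameters are illustrative names.

```python
from collections import Counter, defaultdict
from datetime import date


def interest_matrix(records, user, categories, start, period_days, n_periods):
    """Build an n_periods x len(categories) matrix of category proportions for one user.

    records: iterable of (name, date, category) tuples (assumed layout).
    """
    counts = defaultdict(Counter)
    for name, day, category in records:
        if name != user or category not in categories:
            continue
        period = (day - start).days // period_days
        if 0 <= period < n_periods:
            counts[period][category] += 1
    matrix = []
    for t in range(n_periods):
        total = sum(counts[t].values())
        # A period with no pictures yields an all-zero row.
        matrix.append([counts[t][c] / total if total else 0.0 for c in categories])
    return matrix
```

For example, a user who posts two cat pictures and one dog picture in the first period gets the row [2/3, 1/3] for the categories ["cat", "dog"].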
5) further processing the user interests following the idea of a time sequence model: the user interest tag matrix is processed with a time sequence attenuation function to calculate the potential interest distribution matrix; the time sequence calculation formula of each interest is as follows:
taking each time interval as a time point, wherein n represents the total number of time points, and t represents the t-th time point;
6) calculating the interest similarity matrix of each time period between users p and q by using a probability distribution distance formula;
the interest similarity of the n-th time period in the matrix is calculated as follows:
wherein $D_{JS}$ is defined as follows:
$$D_{JS}(p \,\|\, q) = \frac{1}{2} D_{KL}\left(p \,\Big\|\, \frac{p + q}{2}\right) + \frac{1}{2} D_{KL}\left(q \,\Big\|\, \frac{p + q}{2}\right)$$
wherein $D_{KL}$ is defined as follows:
$$D_{KL}(p \,\|\, q) = \sum_{i} p_i \log \frac{p_i}{q_i}$$
wherein $D_{KL}$ represents the KL divergence (Kullback-Leibler divergence), i.e., the relative entropy, and $D_{JS}$ represents the JS divergence (Jensen-Shannon divergence); the KL divergence is asymmetric, so the JS divergence is used to solve this problem; $Sim(p, q)_{interest}$ is used for calculating the interest similarity of user vector p and user vector q in each time period;
7) sorting the other users by their interest similarity to the target user to obtain the top k friends, and recommending these top k friends to the corresponding target user.
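Steps 6) and 7) can be illustrated with the standard definitions of the KL and JS divergences. How the text converts the divergence into a similarity, and how it combines the time periods, is not recoverable here, so the sketch makes two loudly labeled assumptions: similarity is taken as 1 − D_JS / ln 2, and the per-period similarities are averaged. All function names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i), skipping terms with p_i = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """D_JS(p || q) = 0.5 * D_KL(p || m) + 0.5 * D_KL(q || m), with m = (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def user_similarity(matrix_p, matrix_q):
    """ASSUMPTION: similarity = 1 - D_JS / ln 2 per period, averaged over periods."""
    sims = [1.0 - js_divergence(p, q) / math.log(2) for p, q in zip(matrix_p, matrix_q)]
    return sum(sims) / len(sims) if sims else 0.0


def top_k_friends(target_matrix, other_matrices, k):
    """Rank other users by similarity to the target and return the top k names."""
    ranked = sorted(
        other_matrices.items(),
        key=lambda item: user_similarity(target_matrix, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]
```

Because the JS divergence is symmetric and bounded by ln 2, the assumed similarity lies between 0 and 1 and does not depend on which user is treated as p.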
Claims (7)
1. A microblog friend recommendation method based on an SSD and a time sequence model is characterized by comprising the following steps:
1) collecting data and constructing a data table per_info: picture data and the corresponding time data of users are acquired from the microblog platform and stored in the data table per_info, wherein each picture corresponds to one entry of the table and each entry comprises 4 data elements: name, date, picture and category; name is the user name, date is the time, picture is the user picture, and category is the category to which the picture belongs;
2) selecting a public data set as a training data set;
3) training a model;
4) taking picture data in the data table per_info as input of the trained SSD model, and constructing a user interest tag matrix;
5) calculating a potential interest distribution matrix, wherein the time sequence calculation formula of each interest is as follows:
taking a time interval of T days as a time period, wherein n represents the total number of time periods, and t represents the t-th time period;
6) calculating an interest similarity matrix of each time period between users p and q by using a probability distribution distance formula;
the n-th row of the matrix represents the interest similarity of the n-th time period, and the calculation formula is as follows:
wherein $D_{JS}$ represents the JS divergence and $D_{KL}$ represents the KL divergence;
7) sorting the other users by their interest similarity to the target user to obtain the top k friends, and recommending these top k friends to the corresponding target user.
2. The microblog friend recommending method based on the SSD and the time sequence model according to claim 1, wherein in the step 3), the model training comprises the following steps:
3.1) preprocessing the training data set, processing the pictures of the training data set into images of 300 × 300 pixels;
3.2) building an SSD network architecture;
3.3) setting the prior frame size and the length-width ratio, and calculating the width w of the prior framekAnd a height hk;
3.4) calculating the intersection ratio IOU value, wherein the calculation formula is as follows:
wherein S isp∩SgRepresenting the intersection of the area of the prior frame and the area of the real target frame, Sp∪SgRepresenting the union of the area of the prior frame and the real target frame;
3.5) matching prior frames, wherein the prior frame matching method comprises the following steps: for each real target frame in the graph, taking the prior frame with the largest IOU value as an object which is successfully matched, for the prior frame which is not successfully matched, comparing the IOU value with a threshold value, and if the IOU value is larger than the threshold value, setting the prior frame as the object which is successfully matched;
3.6) calculating the SSD detection value, including the predicted position and the classification information; the classification information is determined as follows: if a prior frame is successfully matched with a real target frame, the prior frame is assigned the category of that real target frame; the predicted position is recorded as $l = (l^{cx}, l^{cy}, l^{w}, l^{h})$, and the calculation formula is as follows:
$l^{cx} = (b^{cx} - d^{cx}) / d^{w}, \quad l^{cy} = (b^{cy} - d^{cy}) / d^{h}$
$l^{w} = \log(b^{w} / d^{w}), \quad l^{h} = \log(b^{h} / d^{h})$
where $d = (d^{cx}, d^{cy}, d^{w}, d^{h})$ represents the position of the prior frame and $b = (b^{cx}, b^{cy}, b^{w}, b^{h})$ represents the position of the corresponding real target frame;
3.7) calculating the confidence error $L_{conf}(x, c)$ and the position error $L_{loc}(x, l, g)$;
3.8) calculating the loss function $L(x, c, l, g)$, which is calculated as follows:
$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$
where N is the number of positive prior-frame samples and $\alpha$ is a weight (typically 1).
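Steps 3.4) to 3.6) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the patent's implementation: boxes for IOU and matching are assumed to be `(xmin, ymin, xmax, ymax)` tuples, boxes for encoding are assumed to be `(cx, cy, w, h)` tuples, the default threshold of 0.5 is an assumption (the claim does not give a value), and the function names `iou`, `match_priors` and `encode` are hypothetical.

```python
import math

def iou(p, g):
    """Step 3.4: intersection over union of boxes (xmin, ymin, xmax, ymax)."""
    iw = max(0.0, min(p[2], g[2]) - max(p[0], g[0]))
    ih = max(0.0, min(p[3], g[3]) - max(p[1], g[1]))
    inter = iw * ih
    union = (p[2] - p[0]) * (p[3] - p[1]) + (g[2] - g[0]) * (g[3] - g[1]) - inter
    return inter / union if union > 0 else 0.0

def match_priors(priors, truths, threshold=0.5):
    """Step 3.5: return {prior index: truth index} for matched prior frames."""
    if not priors or not truths:
        return {}
    matches = {}
    # First pass: each real target frame takes the prior with the largest IOU.
    # Simplification: if two targets pick the same prior, the later one wins.
    for j, g in enumerate(truths):
        best = max(range(len(priors)), key=lambda i: iou(priors[i], g))
        matches[best] = j
    # Second pass: an unmatched prior is matched if its best IOU exceeds the threshold.
    for i, p in enumerate(priors):
        if i in matches:
            continue
        j = max(range(len(truths)), key=lambda t: iou(p, truths[t]))
        if iou(p, truths[j]) > threshold:
            matches[i] = j
    return matches

def encode(b, d):
    """Step 3.6: offsets l of real frame b relative to prior d, both (cx, cy, w, h)."""
    return (
        (b[0] - d[0]) / d[2],
        (b[1] - d[1]) / d[3],
        math.log(b[2] / d[2]),
        math.log(b[3] / d[3]),
    )

priors = [(0, 0, 2, 2), (1, 1, 3, 3), (4, 4, 6, 6)]
truths = [(0, 0, 2, 2)]
matched = match_priors(priors, truths)
offsets = encode((2, 1, 4, 2), (1, 1, 2, 2))
```

Lowering the threshold lets more prior frames count as positive samples, which increases N in the loss of step 3.8).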
3. The microblog friend recommending method based on the SSD and the time sequence model according to claim 1, wherein the user interest tag matrix constructing method in the step 4) is as follows:
4.1) inputting the pictures into the trained model for interest identification, obtaining the category of each picture from the model output, and assigning it to the category attribute in per_info;
4.2) data processing: dividing the user data into n time periods at intervals of T days according to the date attribute, and, within each time period, grouping the data with the same name attribute into a data set {name, date, picture, category};
4.3) constructing the user interest vector $H = \{hy_1, hy_2, \ldots, hy_m\}$ of the user for a time period $t_i$, where $hy_i$ represents the ratio of the number of occurrences of objects in the i-th class of interest tags to the total number of occurrences of objects in the user's interest tags in the t-th time period;
4.4) integrating the user interest vectors of the n time periods to construct the user interest tag matrix, recorded as
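Steps 4.2) to 4.4) can be sketched as follows for a single user. This is a minimal sketch under assumptions: records are dictionaries with `date` and `category` keys (following the attribute names in the claim), an empty time period yields an all-zero row, and the function name `interest_matrix` and its parameters are hypothetical.

```python
from collections import Counter
from datetime import date

def interest_matrix(records, categories, start, period_days, n_periods):
    """Build an n x m matrix; row t holds the category proportions of period t."""
    counts = [Counter() for _ in range(n_periods)]
    for rec in records:
        # Index of the T-day period this record falls into.
        t = (rec["date"] - start).days // period_days
        if 0 <= t < n_periods and rec["category"] in categories:
            counts[t][rec["category"]] += 1
    matrix = []
    for c in counts:
        total = sum(c.values())
        # Assumption: a period without pictures gives an all-zero row.
        matrix.append([c[cat] / total if total else 0.0 for cat in categories])
    return matrix

# Hypothetical data for one user, with T = 7 days and n = 2 periods.
records = [
    {"date": date(2019, 1, 1), "category": "cat"},
    {"date": date(2019, 1, 2), "category": "dog"},
    {"date": date(2019, 1, 3), "category": "cat"},
    {"date": date(2019, 1, 9), "category": "car"},
]
matrix = interest_matrix(records, ["cat", "dog", "car"], date(2019, 1, 1), 7, 2)
```

Each non-empty row sums to 1, so it can be treated as a probability distribution when a divergence between two users is computed.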
4. The microblog friend recommending method based on the SSD and the time sequence model according to claim 2, wherein in the step 3.3), the prior frame scale linear formula is as follows:
where $S_k$ represents the scale of the k-th prior frame; $S_{min}$ represents the minimum prior frame scale, taking the value 0.2; $S_{max}$ represents the maximum prior frame scale, taking the value 0.9; and m is the number of feature maps;
the aspect ratio of the prior frame satisfies
the width $w_k$ and height $h_k$ of the prior frame are calculated as follows:
where $W_{input}$ and $H_{input}$ are the width and height of the input image, respectively.
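The scale and size formulas of this claim are not reproduced in the text above. The sketch below assumes the linear scale rule and size rule commonly used with SSD, $S_k = S_{min} + \frac{S_{max} - S_{min}}{m - 1}(k - 1)$, $w_k = W_{input} S_k \sqrt{a_r}$ and $h_k = H_{input} S_k / \sqrt{a_r}$; the symbol $a_r$, the aspect ratio set and the function names are assumptions, not taken from the patent.

```python
import math

def prior_scales(m, s_min=0.2, s_max=0.9):
    """Linearly spaced scales S_k for k = 1..m (requires m >= 2)."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]

def prior_size(s_k, aspect_ratio, w_input=300, h_input=300):
    """Width and height in pixels for one scale and one aspect ratio."""
    w_k = w_input * s_k * math.sqrt(aspect_ratio)
    h_k = h_input * s_k / math.sqrt(aspect_ratio)
    return w_k, h_k

# Assumed aspect ratios; the claim's own set is not shown in the source.
scales = prior_scales(m=6)
sizes = [prior_size(scales[0], a) for a in (1, 2, 3, 1 / 2, 1 / 3)]
```

Under these assumptions every prior frame at a given scale has the same area; only its shape changes with the aspect ratio.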
5. The microblog friend recommending method based on the SSD and the time sequence model according to claim 2, wherein in the step 3.7), the position error $L_{loc}(x, l, g)$ is calculated as follows:
where the $smooth_{L1}$ loss function is calculated as follows:
and the loss error of the real target is calculated as follows:
where $g_j^{cx}, g_j^{cy}, g_j^{w}, g_j^{h}$ respectively represent the abscissa and ordinate of the center position, the width and the height of the j-th real target frame; $\hat{g}_j^{cx}, \hat{g}_j^{cy}, \hat{g}_j^{w}, \hat{g}_j^{h}$ respectively represent the loss errors of the abscissa and ordinate of the center position, the width and the height of the j-th real target; and $d_i^{cx}, d_i^{cy}, d_i^{w}, d_i^{h}$ respectively represent the abscissa and ordinate of the center position, the width and the height of the i-th prior frame.
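For orientation, the three formulas referred to in this claim can be written as below. This is the formulation commonly given for SSD, which the symbol descriptions in the claim appear to follow; the exact notation (the hat on $g$, the index set $Pos$, and the coordinate index $m$, which here is unrelated to the number of feature maps) is an assumption and is not confirmed by the text above.

$$
L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k} \, smooth_{L1}\left(l_i^{m} - \hat{g}_j^{m}\right)
$$

$$
smooth_{L1}(x) = \begin{cases} 0.5 x^{2}, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}
$$

$$
\hat{g}_j^{cx} = \frac{g_j^{cx} - d_i^{cx}}{d_i^{w}}, \quad
\hat{g}_j^{cy} = \frac{g_j^{cy} - d_i^{cy}}{d_i^{h}}, \quad
\hat{g}_j^{w} = \log\frac{g_j^{w}}{d_i^{w}}, \quad
\hat{g}_j^{h} = \log\frac{g_j^{h}}{d_i^{h}}
$$

The last line has the same form as the offsets in step 3.6), with the real target frame $g_j$ in place of $b$ and the prior frame $d_i$ in place of $d$.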
6. The microblog friend recommending method based on the SSD and the time sequence model according to claim 2, wherein in the step 3.7), the confidence error $L_{conf}(x, c)$ is calculated as follows:
where the indicator $x_{ij}^{p}$ taking the value 1 indicates that the i-th prior frame is matched with the j-th ground truth frame, whose category is p; c is the predicted category confidence; l is the predicted position of the corresponding bounding box; and g is the position parameter of the ground truth.
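The confidence error formula is not reproduced in the text above. A common form, the softmax loss used with SSD, is shown here as an assumption rather than as the patent's own formula; $Neg$ denotes the unmatched (negative) prior frames and class 0 denotes the background.

$$
L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\left(\hat{c}_i^{p}\right) - \sum_{i \in Neg} \log\left(\hat{c}_i^{0}\right),
\qquad
\hat{c}_i^{p} = \frac{\exp\left(c_i^{p}\right)}{\sum_{p} \exp\left(c_i^{p}\right)}
$$

The first sum penalizes a low confidence for the matched category of each positive prior frame; the second penalizes negative prior frames that are not confidently classified as background.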
7. The microblog friend recommendation method based on the SSD and the time sequence model according to claim 1, wherein the D in the step 6) isJSThe calculation formula is as follows:
said DKLThe calculation formula is as follows:
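The two divergences can be sketched with their standard definitions, $D_{KL}(P \| Q) = \sum_i p_i \log \frac{p_i}{q_i}$ and $D_{JS}(P \| Q) = \frac{1}{2} D_{KL}(P \| M) + \frac{1}{2} D_{KL}(Q \| M)$ with $M = \frac{1}{2}(P + Q)$. The sketch assumes each interest vector is a probability distribution and uses the natural logarithm; the function names and sample vectors are hypothetical, and how the patent turns the divergence into a similarity score is not shown in the text above.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q); terms with p_i == 0 contribute 0.

    Raises ZeroDivisionError if some q_i is 0 while p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """D_JS(P || Q) with M = (P + Q) / 2; symmetric and at most log(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    # m_i > 0 wherever p_i > 0 or q_i > 0, so the KL terms are always defined.
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical interest vectors of two users for the same time period.
u = [0.5, 0.5, 0.0]
v = [0.0, 0.5, 0.5]
d_same = js_divergence(u, u)
d_diff = js_divergence(u, v)
```

Unlike the KL divergence, the JS divergence is symmetric and stays finite when one user has no pictures in a category, which suits sparse interest vectors.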
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910635218.7A CN110619344B (en) | 2019-07-15 | 2019-07-15 | Microblog friend recommendation method based on SSD and time sequence model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110619344A (en) | 2019-12-27 |
CN110619344B (en) | 2023-04-18 |
Family
ID=68921405
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910635218.7A Active CN110619344B (en) | 2019-07-15 | 2019-07-15 | Microblog friend recommendation method based on SSD and time sequence model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110619344B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104281650A (en) * | 2014-09-15 | 2015-01-14 | 南京锐角信息科技有限公司 | Friend search recommendation method and friend search recommendation system based on interest analysis |
CN108460153A (en) * | 2018-03-27 | 2018-08-28 | 广西师范大学 | A kind of social media friend recommendation method of mixing blog article and customer relationship |
CN109101614A (en) * | 2018-08-06 | 2018-12-28 | 百度在线网络技术(北京)有限公司 | Friend recommendation method and apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||