CN110110130B - Personalized tag recommendation method and system based on convolution features and weighted random walk


Info

Publication number
CN110110130B
CN110110130B
Authority
CN
China
Prior art keywords
image
test image
convolution
random walk
visual
Prior art date
Legal status
Active
Application number
CN201910424549.6A
Other languages
Chinese (zh)
Other versions
CN110110130A (en)
Inventor
刘峥
赵天龙
袁韶璟
高珊珊
韩慧健
张彩明
Current Assignee
Shandong University of Finance and Economics
Original Assignee
Shandong University of Finance and Economics
Priority date
Filing date
Publication date
Application filed by Shandong University of Finance and Economics
Priority to CN201910424549.6A
Publication of CN110110130A
Application granted
Publication of CN110110130B

Classifications

    • G  PHYSICS
    • G06  COMPUTING; CALCULATING OR COUNTING
    • G06F  ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00  Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50  Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58  Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583  Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5866  Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information

Abstract

The invention discloses a personalized tag recommendation method and system based on convolution features and weighted random walk, comprising the following steps: inputting a given test image into a pre-trained convolutional neural network and taking the output of a convolutional layer of the network as the visual feature of the image; encoding the visual features and converting the image into a visual feature vector; searching the k neighbor images of the test image to serve as the data set from which labels are recommended to the test image; establishing a weighted image-label bipartite graph model and calculating the relevance of each label relative to the test image through an improved weighted random walk algorithm; and selecting the top N labels with the highest relevance and recommending them to the test image. The beneficial effects of the invention are: an improved weighted random walk algorithm is proposed to calculate the relevance of all labels relative to a designated image, and the recommended labels are ranked by relevance, which effectively improves the accuracy of label recommendation.

Description

Personalized tag recommendation method and system based on convolution features and weighted random walk
Technical Field
The invention belongs to the technical field of label recommendation of multimedia data, and particularly relates to a personalized label recommendation method and system based on convolution characteristics and weighted random walk.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
In recent years, with the explosive growth of multimedia data, adding keywords (tags) to multimedia data has become a popular way of managing internet resources, such as internet pages, academic publications, and multimedia objects (audio, images, video). Tags provide meaningful descriptors of the data and allow users to organize and index the corresponding content. Assigning labels or keywords to images, music, or video clips has changed the way users retrieve internet resources.
The inventors have found that, with the rapid growth in the number of social images uploaded to the internet, photo-sharing websites now carry much richer metadata, which lets users organize and access shared media content more conveniently; at the same time, retrieving images that meet a specific user's needs from the huge volume of uploaded photos has become a key difficulty. Image retrieval has been studied extensively in both content-based and label-based forms: the former relies on visual descriptors extracted from images to return the images that best match a user-specified sample image, while the latter mainly returns images carrying the same labels as those assigned to the query.
Obtaining high-quality image annotations, whether manual or automatic, has long been a major obstacle for label-based image retrieval. Nowadays, in image-sharing communities, the annotations attached by users when uploading images have become a valuable source of image labels, so label-based image retrieval technology (TagIR) is becoming increasingly important. A prerequisite for label-based image retrieval, however, is that images have already been labeled with relevant labels. Existing research has shown that many tags attached to uploaded images in picture-sharing communities are inaccurate; in fact, nearly 50% of the tags assigned to pictures are irrelevant. In addition, the order of the current tag list is not linked to tag importance: it simply reflects the order of input and is almost never ranked by importance or relevance. Current mainstream research mainly focuses on establishing the connection between text labels and the visual features of images, and does not consider the potential connection between the metadata attached to images on the internet and the image labels.
Disclosure of Invention
In order to solve the above problems, the invention provides a personalized label recommendation method and system based on convolution features and weighted random walk. The output of a convolutional layer of a convolutional neural network is used as the visual feature of the image, the visual features are encoded to convert the image into a visual feature vector, and visual neighbors are searched using the feature vector together with group metadata information; a weighted random walk algorithm is then executed on the image-label bipartite graph formed by the neighbor images and their labels, and personalized labels are recommended for the image from the neighbors' labels.
In some embodiments, the following technical scheme is adopted:
a personalized tag recommendation method based on convolution features and weighted random walks comprises the following steps:
inputting a set test image into a pre-trained convolutional neural network, and taking the output of a convolutional layer in the convolutional neural network as the visual characteristic of the image;
coding the visual features, and converting the image into a visual feature vector;
searching k adjacent images of the test image as a data set for recommending labels to the test image through the visual feature vector and the group metadata information;
establishing a weighted image-label bipartite model, and calculating the correlation of each label relative to a test image through an improved weighted random walk algorithm;
and selecting the top N labels with the highest correlation degree and recommending the labels to the test image.
Further, inputting a set test image into a pre-trained convolutional neural network, and taking the output of a convolutional layer in the convolutional neural network as the visual characteristic of the image; the method specifically comprises the following steps:
adjusting the size of the test image to n × n as required by the convolutional neural network, and inputting the test image into a pre-trained L-layer convolutional neural network;
propagating forward through the network; at the i-th convolutional layer L_i, after the features of the previous layer pass through the convolution kernels, the result of the convolution is a feature map M_l of size n_l × n_l × d_l, wherein d_l is the number of convolution kernels of L_i;
at each (i, j) position of the feature map M_l, a d_l-dimensional vector is obtained;
finally, the n_l × n_l local feature vectors of the test image at convolutional layer L_i are obtained.
Further, encoding the visual features and converting the image into a visual feature vector specifically comprises: encoding the local feature vectors into a single visual feature vector using VLAD encoding; the VLAD code aggregates the convolution features extracted from the test image at a given layer into k d_l-dimensional vectors, thereby converting the processing of the test image into the processing of k vectors.
Further, k neighbor images of the test image are searched as a data set for recommending labels to the test image, specifically:
computing the feature vectors of all images in the data set at layer L_i to obtain the feature vector table X of all images;
computing the feature vector x_p of the test image at L_i, and then computing the Euclidean distances between x_p and all feature vectors in the feature vector table X as the visual similarity data;
calculating the group co-occurrence coefficient normalization score of the test image and the image in the data set as group similarity data;
carrying out linear weighting on the visual similarity data and the group similarity data, and calculating the correlation of the pictures in the data set relative to the test image;
and sorting the correlation results from small to large, and selecting the first k images as adjacent images.
Further, performing minimum and maximum normalization on the correlation result of each neighboring image, and taking the correlation result as the voting weight of the neighboring image to the test image; and establishing a weighted image-label bipartite graph model according to the weight.
Further, calculating the relevance of each label relative to the test image through an improved weighted random walk algorithm, specifically:
PR(i) = (1-d)*r_i + d*Σ_{j∈In(i)} ω_j*PR(j)/|Out(j)|
where PR(i) is the relevance of node i relative to the test image, PR(j) is the relevance of node j relative to the test image, d is the probability of jump access of the user, Out(j) is the set of web pages hyperlinked from web page j, and In(i) is the set of all web pages hyperlinking to web page i;
r_i = 1 if i = u, and r_i = 0 otherwise
where u denotes the input image node, meaning that every walk starts from the input image node; when j is a label node, ω_j is 1, and when j is a neighbor image node, ω_j is assigned the voting weight of that neighbor image relative to the test image.
Further, the improved weighted random walk algorithm obtains the correlation degree of all labels and adjacent images relative to the input image, and selects a plurality of labels with the highest correlation degree to recommend to the input image.
In other embodiments, the following technical solutions are adopted:
a computer-readable storage medium having stored therein a plurality of instructions adapted to be loaded by a processor of a terminal device and to execute the above-mentioned personalized tag recommendation method based on convolution feature and weighted random walk.
In other embodiments, the following technical solutions are adopted:
a terminal device comprising a processor and a computer-readable storage medium, the processor being configured to implement instructions; the computer readable storage medium is used for storing a plurality of instructions, and the instructions are suitable for being loaded by a processor and executing the personalized tag recommendation method based on the convolution characteristics and the weighted random walk.
Compared with the prior art, the invention has the beneficial effects that:
1. a bipartite graph model composed of labels and images is provided and used for data modeling, a neighbor image and weight thereof are determined through group information and image visual characteristics by combining a neighbor selection method of the group information and the image visual characteristics, a weighted neighbor image-label bipartite graph model is provided, and potential relation between image metadata information and image labels is established.
2. An improved weighted random walk algorithm is provided to calculate the relevance of all labels relative to the designated image, and label recommendation is sequenced according to the relevance, so that the accuracy of label recommendation can be effectively improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
FIG. 1 is a schematic diagram illustrating obtaining visual feature vectors according to a first embodiment;
FIGS. 2(a) - (c) are schematic diagrams illustrating the influence of group information on image neighbor selection according to an embodiment;
FIG. 3 is a diagram illustrating the establishment of a bipartite graph of weighted neighbor labels according to an embodiment;
FIG. 4 is a graph showing a comparison of the properties of different layers in the first embodiment;
FIG. 5 is a diagram illustrating the correlation comparison between neighboring images at different λ values according to the first embodiment.
FIG. 6 is a diagram illustrating the influence of the recommended number of different tags on the NDCG value according to the first embodiment;
FIG. 7 is a diagram illustrating the influence of group information and weighted random walks on tag recommendation according to an embodiment;
FIG. 8 is a diagram illustrating comparison of tag results recommended by different methods according to the first embodiment.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
Example one
In one or more embodiments, disclosed is a personalized tag recommendation method based on convolution features and weighted random walks, comprising the following steps:
(1) inputting a set test image into a pre-trained convolutional neural network, and taking the output of a convolutional layer in the convolutional neural network as the visual characteristic of the image;
by training the multilayer convolution filter, the CNN can automatically learn complex features to perform object recognition, and the CNN trained for the image classification task can be used for extracting general features of other visual recognition tasks.
Given a test image, its size is first adjusted to the n × n required by the network, then it is input into a pre-trained L-layer convolutional neural network (CNN) and propagated forward through the network. As can be seen from FIG. 1, at the i-th convolutional layer L_i, after the features of the previous layer pass through the convolution kernels, the result of the convolution is a feature map M_l of size n_l × n_l × d_l, where d_l is the number of convolution kernels of L_i. At each (i, j) position of the feature map M_l, where 0 ≤ i ≤ n_l - 1 and 0 ≤ j ≤ n_l - 1, a d_l-dimensional vector is obtained. In this way, the n_l × n_l local visual features of the input image at convolutional layer L_i are obtained. In this work the convolution results of the different layers need to be examined to study the image feature extraction performance, so the features obtained from the convolution kernels of each layer are kept: {F_1, F_2, ..., F_L}.
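For illustration only, the following minimal sketch shows how such per-position convolutional features can be extracted; it assumes PyTorch and torchvision (the patent does not name a framework), and the function name and the layer index used to reach AlexNet's conv5 (the layer the experiments below end up selecting) are assumptions of this sketch.

```python
# Hedged sketch: per-position conv features with torchvision's AlexNet (assumed framework).
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

def extract_conv_features(image_path: str, conv_index: int = 10) -> torch.Tensor:
    """Return the n_l*n_l local feature vectors (each d_l-dimensional) of one conv layer."""
    # weights API requires torchvision >= 0.13; older versions use pretrained=True
    model = models.alexnet(weights="IMAGENET1K_V1").eval()
    layers = model.features[:conv_index + 1]   # features[10] is conv5 in torchvision's layout

    preprocess = T.Compose([
        T.Resize((224, 224)),                  # resize to the n x n the network expects
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)

    with torch.no_grad():
        fmap = layers(x)                       # shape (1, d_l, n_l, n_l)
    d_l = fmap.shape[1]
    # one d_l-dimensional vector for every (i, j) position of the feature map M_l
    return fmap.squeeze(0).permute(1, 2, 0).reshape(-1, d_l)
```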
(2) Coding the visual features, and converting the image into a visual feature vector;
Since CNNs are trained for classification tasks, features from the highest or second-highest layer are typically used for decision making, because they capture the semantic features needed at the classification level. Lower layers capture more local features of the target, while the local features of objects are not well preserved in higher layers of the network, so lower-layer features perform better in instance-level image retrieval than features extracted from higher layers. This indicates that directly applying the last or a higher layer designed for the classification task to instance-level image retrieval is not the best choice, because different objects from similar categories need to be distinguished. Therefore, which layer to extract features from is a question for the instance-level image retrieval task.
Since it is complicated to use extracted features in CNN networks directly for instance-level image retrieval, the features are encoded to achieve efficient retrieval, and since each image contains a set of low-dimensional feature vectors whose structure is similar to dense SIFT, these feature vectors are encoded as a single feature vector using VLAD coding. VLAD coding can efficiently code local features into a single descriptor while achieving a good balance between retrieval accuracy and memory footprint.
VLAD coding is similar to constructing a BoW histogram. The n_l × n_l convolution features of L_i are L2-normalized, and K-Means clustering is performed on the normalized features to obtain a vocabulary of K visual words C = {c_1, c_2, ..., c_K}. Each feature extracted at L_i is assigned to its nearest visual word by computing its distance to every word. For each visual word c_k, the vector residual between c_k and every feature f assigned to it is computed:
r = f - c_k
The VLAD algorithm aggregates the convolution features extracted from the image at a given layer into k d_l-dimensional vectors v_l, thereby converting the processing of the image into the processing of these k d_l-dimensional vectors. The VLAD code of the image at layer L_i is formulated as the concatenation:
v_l = [ Σ_{f:NN(f)=c_1} (f - c_1), ..., Σ_{f:NN(f)=c_K} (f - c_K) ]
where each block Σ_{f:NN(f)=c_k} (f - c_k) is the accumulated residual between the visual word c_k and all convolution features belonging to that word. Expanding v_l gives a long vector x of length k × d_l, which is taken as the feature vector of the input image, i.e., the image is represented by x.
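A hedged sketch of this VLAD step is given below; it assumes numpy and scikit-learn, the helper names fit_vocabulary and vlad_encode are illustrative, and the final L2 normalization of the long vector is a common VLAD convention rather than something stated above. K = 100 follows the experimental setting described later.

```python
# Sketch of VLAD encoding of local conv features (numpy + scikit-learn assumed).
import numpy as np
from sklearn.cluster import KMeans

def fit_vocabulary(all_features: np.ndarray, k: int = 100) -> KMeans:
    """Learn k visual words from the L2-normalized conv features of the whole dataset."""
    feats = all_features / (np.linalg.norm(all_features, axis=1, keepdims=True) + 1e-12)
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(feats)

def vlad_encode(features: np.ndarray, vocab: KMeans) -> np.ndarray:
    """Accumulate the residuals f - c_k per visual word and flatten into one k*d_l vector."""
    feats = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    assignments = vocab.predict(feats)                      # nearest visual word per feature
    k, d = vocab.cluster_centers_.shape
    v = np.zeros((k, d))
    for word in range(k):
        assigned = feats[assignments == word]
        if len(assigned):
            v[word] = (assigned - vocab.cluster_centers_[word]).sum(axis=0)
    x = v.reshape(-1)                                       # the k x d_l long vector
    return x / (np.linalg.norm(x) + 1e-12)                  # L2 normalization (assumption)
```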
(3) K adjacent images of the test image are searched to serve as a data set for recommending labels to the test image;
for the personalized recommendation of the image label, the nearest neighbor of the image does not select the most similar image in vision but expresses the related image of the same theme. Therefore, not only the visual similarity of the images but also the information of the metadata of the group to which the images belong should be considered in the selection process of the neighbors. The effect of groups on inter-image correlation is illustrated by fig. 2(a) - (c), where the tag list of fig. 2(a) includes: baikal, ice, lack, winter, frozen; the tag list of fig. 2(b) includes: baikal, shore, ice, lack, winter, rock, winter; the tag list of fig. 2(c) includes: cliff, sea, shore, wave, rock, cloud; it can be seen that although fig. 2(b) is more visually similar to fig. 2(c), fig. 2(b) belongs to the same group as fig. 2 (a). It can be seen from the tag list that the tags between fig. 2(a) and fig. 2(b) are more similar, and it is illustrated from the side that the same group contains more relevant subjects, and the pictures therein are also more relevant.
1) Visual similarity
The feature vectors of all images in the data set (obtained by crawling the Flickr website) at layer L_i are computed to obtain the feature vector table X. For the test image p, its feature vector x_p at L_i is computed first, and then the Euclidean distances between x_p and all feature vectors in X are computed:
ρ = sqrt( Σ_{i=1}^{d} (x_i - p_i)² )
where ρ is the visual distance between the two images, and x_i and p_i are the components of the d-dimensional feature vectors of the two images. The ρ values are mapped to the range 0-1.
2) Group similarity
The normalized group co-occurrence coefficient score of the test image and the visual neighbor images is computed:
J = |G_p ∩ G_q| / |G_p ∪ G_q|
where G_p and G_q are the sets of groups the two images belong to. The groups are known in advance, since the user information includes the group information; the co-occurrence coefficient is the number of groups shared by the two pictures divided by the number of groups in their union. This Jaccard coefficient is used to measure the similarity of two pictures over the group metadata.
3) Neighbor formalized representation
The relevance of the pictures in the data set relative to the test image is calculated by computing the visual similarity and the group similarity between the test image and the images in the data set and linearly weighting the two, with the following formula:
y=λ*(1-ρ)+(1-λ)*J (4)
where y represents the relevance score between the test image and an image in the data set, and λ is the weighting coefficient.
And sorting the relevance scores from small to large, selecting the first k images as adjacent images, and performing minimum and maximum normalization on the relevance scores of all the adjacent images as the voting weight of the adjacent images to the test image.
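This neighbor-selection step can be sketched as follows; numpy is assumed, the function name is illustrative, the combined score y is interpreted here as a similarity (so the k images with the highest y are kept), and the defaults λ = 0.4 and k = 15 simply echo the experimental settings reported below.

```python
# Sketch of neighbor selection combining visual and group similarity, Eq. (4): y = λ*(1-ρ) + (1-λ)*J.
import numpy as np

def neighbor_weights(x_p, X, groups_p, groups, k=15, lam=0.4):
    """Return the indices of the k neighbor images and their min-max-normalized voting weights.

    x_p      -- feature vector of the test image
    X        -- (N, D) feature vector table of the dataset images
    groups_p -- set of group ids of the test image
    groups   -- list of N sets of group ids, one per dataset image
    """
    rho = np.linalg.norm(X - x_p, axis=1)                      # visual distance
    rho = (rho - rho.min()) / (rho.max() - rho.min() + 1e-12)  # map to [0, 1]

    # group similarity: Jaccard coefficient over the group metadata
    J = np.array([len(groups_p & g) / max(len(groups_p | g), 1) for g in groups])

    y = lam * (1.0 - rho) + (1.0 - lam) * J                    # linear weighting

    idx = np.argsort(-y)[:k]                                   # k images with the highest score
    w = y[idx]
    w = (w - w.min()) / (w.max() - w.min() + 1e-12)            # min-max normalized voting weights
    return idx, w
```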
(4) Establishing a weighted image-label bipartite model, and calculating the correlation of each label relative to a test image through an improved weighted random walk algorithm;
(5) and selecting the top N labels with the highest correlation degree and recommending the labels to the test image.
The first k neighbor images are obtained according to the magnitude of the y value, and a weighted neighbor-label bipartite graph is built for them, in which the weight is the y value. Assuming the following set of neighbor images, the process is shown in FIG. 3.
The set photo = [A, B, C, D] is the neighbor image set, the value attached to each image is its relevance score y relative to the input image, and tag = [a, b, c, d, e] is the label set.
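The weighted neighbor-image/label bipartite graph can be represented with plain adjacency sets, as in the sketch below; the node naming, the data layout, and the example values standing in for FIG. 3 are illustrative assumptions, not the patent's own data structures.

```python
# Sketch of the weighted image-label bipartite graph built from the k neighbors.
from collections import defaultdict

def build_bipartite_graph(neighbors, neighbor_tags, weights):
    """neighbors: list of image ids; neighbor_tags: image id -> list of tags;
    weights: image id -> voting weight y. Returns undirected adjacency sets and node weights."""
    adj = defaultdict(set)                 # edges between image nodes and tag nodes
    node_weight = {}                       # ω attached to each neighbor-image node
    for img in neighbors:
        node_weight[("img", img)] = weights[img]
        for tag in neighbor_tags[img]:
            adj[("img", img)].add(("tag", tag))
            adj[("tag", tag)].add(("img", img))
    return adj, node_weight

# toy usage in the spirit of FIG. 3 (tag lists and y values are placeholders)
adj, node_weight = build_bipartite_graph(
    neighbors=["A", "B", "C", "D"],
    neighbor_tags={"A": ["a", "b"], "B": ["b", "c"], "C": ["c", "d"], "D": ["d", "e"]},
    weights={"A": 0.9, "B": 0.7, "C": 0.5, "D": 0.3},
)
```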
The PageRank random walk algorithm is used to compute the visiting popularity of every web page on the internet: it scores the pages and then ranks them. The basic idea is that the pages on the internet are connected to each other by hyperlinks, and a user can jump from one page to another through its hyperlinks, so the pages form the nodes of a graph. When a user visits a page there are two options: stay on the current page, or jump to another page through one of the hyperlinks the current page contains. If the probability of the user jumping is d, the probability of staying on the current page is 1-d. Assuming that the user follows the hyperlinks of the current page uniformly at random, a random walk process is formed; after a large number of users visit every page on the internet many times, the probability of each page being visited converges to a fixed value, by which the pages can be ranked. The random walk process is formulated as:
PR(i) = (1-d)/N + d*Σ_{j∈In(i)} PR(j)/|Out(j)|
where PR(i) is the probability that web page i is visited, d is the probability of jump access, N is the number of web pages on the internet, In(i) is the set of all web pages hyperlinking to web page i, and Out(j) is the set of web pages hyperlinked from web page j. The visiting probability of web page i consists of two parts. The first part is the probability that the user initially visits i and stays there:
(1-d)/N
The second part is the probability that the user reaches i through the hyperlinks of other web pages:
d*Σ_{j∈In(i)} PR(j)/|Out(j)|
These two parts together form the visiting probability of web page i.
In the PageRank algorithm, the relevance of every vertex in the graph relative to every other vertex is computed. In this work, however, what is needed is the relevance of all labels relative to the input image, and the relevance (neighbor weight) of the neighbor images relative to the input image must also be taken into account. Therefore, on the basis of the PageRank algorithm, the following weighted random walk formula is obtained:
PR(i) = (1-d)*r_i + d*Σ_{j∈In(i)} ω_j*PR(j)/|Out(j)|
r_i = 1 if i = u, and r_i = 0 otherwise
Compared with the PageRank algorithm there are two differences. The first is the value of r_i, where u denotes the input image node: every walk starts from the input image node. The second is the value of ω_j: when j is a label node the parameter is ignored (taken as 1), and when j is a neighbor image node, ω_j is assigned the weight y of that neighbor image relative to the input image. From the result of the algorithm, the relevance of all vertices (labels and images) relative to the input image is obtained, and the labels with the highest relevance are recommended to the input image.
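A sketch of this weighted walk is given below; numpy is assumed, the damping value, convergence threshold and function name are illustrative, and linking the input image node u to its k neighbor images (so that the restart mass can flow into the graph) is this sketch's interpretation rather than a detail spelled out above.

```python
# Sketch of the improved weighted random walk (a personalized-PageRank-style iteration).
import numpy as np

def weighted_random_walk(adj, node_weight, u, d=0.85, tol=1e-8, max_iter=1000):
    """adj: node -> set of linked nodes (every node present as a key, edges stored both ways);
    node_weight: node -> ω_j (voting weight y for neighbor-image nodes, 1 otherwise);
    u: input image node. Returns node -> PR(i), the relevance of each node to the input image."""
    nodes = list(adj)
    index = {n: k for k, n in enumerate(nodes)}
    pr = np.full(len(nodes), 1.0 / len(nodes))
    r = np.zeros(len(nodes))
    r[index[u]] = 1.0                                   # every walk restarts at the input image

    for _ in range(max_iter):
        new = (1.0 - d) * r
        for j in nodes:
            out = adj[j]
            if not out:
                continue
            share = d * node_weight.get(j, 1.0) * pr[index[j]] / len(out)
            for i in out:                               # j ∈ In(i) for every linked node i
                new[index[i]] += share
        if np.abs(new - pr).sum() < tol:
            pr = new
            break
        pr = new
    return {n: pr[index[n]] for n in nodes}

# usage sketch: link the input node u to its neighbors (interpretation), run the walk,
# then rank the tag nodes by PR and recommend the top-N labels.
# u = ("img", "input"); adj[u] = {("img", n) for n in "ABCD"}
# for n in "ABCD": adj[("img", n)].add(u)
# pr = weighted_random_walk(adj, node_weight, u)
```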
Experimental verification
The experimental data set contains the required group metadata information, i.e., an image may be shared to several groups, and the experiments are carried out on this expanded data set. First, the images in the data set need to be converted into visual feature vectors: the classic AlexNet convolutional neural network structure is selected, the network comprises five convolutional layers and two fully connected layers, its parameters are trained on the ImageNet data set, and the experiments are conducted on the trained network.
All pictures in the data set are input into the network to extract the features of each convolutional layer, and VLAD coding is performed: K-Means clustering is applied to all features of each convolutional layer with the cluster number K set to 100, which gives 100 visual words per convolutional layer; residuals are computed for each visual word to obtain the VLAD code of every image at every convolutional layer, and the codes are unfolded into long vectors to obtain the feature vector of each image at each convolutional layer.
A test image is input and its convolution features are extracted at each layer; residuals are computed against the visual words of the corresponding layer to perform VLAD coding, and the feature vector of the test image is obtained after expansion. By computing Euclidean distances, the first 15 images with the smallest distance are selected, and the relevance between the input image and the selected images is compared through MAP.
As can be seen from fig. 4, the image features extracted by the fifth convolutional layer (conv5) perform better in the selection of visual neighbors than other layers, so the neighbors are selected by extracting the image features through the AlexNet network fifth convolutional layer.
The parameter of formula (4) can be obtained from training data: for an input image I, the value of λ is increased from 0.1 to 0.9 in steps of 0.2, and the relevance between the obtained neighbor images and the training image is calculated for each value. The remaining images are input in the same way, the relevance of the images of the respective groups under the different parameter values is averaged and compared, and the comparison result is shown in FIG. 5.
As can be seen from fig. 5, when the λ value is set to 0.4, the resulting neighbor image is more correlated with the input image, while in comparison with fig. 4, it is found that the neighbor correlation obtained by combining the visual information and the group information is significantly higher than that obtained based on the visual information alone.
To verify the effect of the method of the present embodiment, the following verification was performed:
(1) the influence of the recommended number of the tags on the recommendation result is compared by setting different recommended numbers of the tags, as shown in fig. 6; as can be seen from fig. 6, when the recommended number of tags is 10, a better tag recommendation effect can be obtained.
(2) Four combinations are compared according to whether the group information is considered in neighbor selection and whether the neighbor weights are considered in the random walk. Considering and not considering the group information are denoted A and a, and considering and not considering the neighbor weights are denoted B and b, giving the four combinations AB, Ab, aB and ab. The effect of the four combinations on the recommended labels is shown in FIG. 7; it can be seen from FIG. 7 that the label recommendation combining the group information and the weights achieves the best effect. Meanwhile, the positive effect of the group information in label recommendation can be seen by comparing Ab with ab, and comparing aB with ab shows the positive role of the neighbor weights in label recommendation.
(3) The comparison between the convolution-feature-based weighted random walk algorithm of this embodiment and other methods is shown in FIG. 8, and it can be seen that the method of this embodiment performs better. After labels are recommended for the images by the personalized label recommendation algorithm, image search is performed, and P (precision), R (recall), F1 (the harmonic mean of precision and recall) and MAP (the mean of average precision) are calculated to evaluate the recommendation effect of each method, as shown in Table 1.
TABLE 1 comparison of image tag recommendation results using P, R, F1 and MAP
As can be seen from Table 1, after images are labeled with the tags recommended by the method of this embodiment, image retrieval performance is superior to that of the other methods on all four evaluation indexes (P, R, F1 and MAP), which demonstrates the superiority of this method: more relevant neighbor images are obtained by combining the group information with the visual features, and the accuracy of the labels is effectively improved by the weighted random walk algorithm.
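For reference, the four evaluation measures named above can be computed as in the sketch below; it treats each query as a ranked list of retrieved items against a set of relevant items, numpy is assumed, and the function names are not taken from the patent.

```python
# Sketch of the P, R, F1 and MAP evaluation measures used in Table 1.
import numpy as np

def precision_recall_f1(ranked, relevant):
    """ranked: ranked list of retrieved items; relevant: set of correct items."""
    hits = len(set(ranked) & relevant)
    p = hits / len(ranked) if ranked else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1

def mean_average_precision(all_ranked, all_relevant):
    """Average precision per query, averaged over all queries."""
    aps = []
    for ranked, relevant in zip(all_ranked, all_relevant):
        hits, precisions = 0, []
        for rank, item in enumerate(ranked, start=1):
            if item in relevant:
                hits += 1
                precisions.append(hits / rank)
        aps.append(np.mean(precisions) if precisions else 0.0)
    return float(np.mean(aps))
```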
Example two
In one or more embodiments, a terminal device is disclosed that includes a processor and a computer-readable storage medium, the processor to implement instructions; the computer readable storage medium is used for storing a plurality of instructions, and the instructions are suitable for being loaded by a processor and executing the personalized tag recommendation method based on convolution characteristics and weighted random walk in the first embodiment. For brevity, no further description is provided herein.
It should be understood that in this embodiment the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The computer readable storage medium may include a read-only memory and a random access memory and provide instructions and data to the processor, and a portion of the memory may also include a non-volatile random access memory. For example, the memory may also store device type information.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software.
The steps of a method described in connection with the embodiments may be embodied directly in a hardware processor, or in a combination of hardware and software modules in the processor. The software modules may be located in RAM, flash memory, ROM, PROM or EPROM, registers, or other storage media well known in the art. The storage medium is located in a memory, and a processor reads the information in the memory and completes the steps of the method in combination with its hardware. To avoid repetition, it is not described in detail here.
Those of ordinary skill in the art will appreciate that the various illustrative elements, i.e., algorithm steps, described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, it is not intended to limit the scope of the present invention, and it should be understood by those skilled in the art that various modifications and variations can be made without inventive efforts by those skilled in the art based on the technical solution of the present invention.

Claims (8)

1. A personalized tag recommendation method based on convolution features and weighted random walks is characterized by comprising the following steps:
inputting a set test image into a pre-trained convolutional neural network, and taking the output of a convolutional layer in the convolutional neural network as the visual characteristic of the image;
coding the visual features, and converting the image into a visual feature vector;
k adjacent images of the test image are searched through the visual feature vector and the group metadata information to serve as a data set for recommending labels to the test image;
establishing a weighted image-label bipartite model, and calculating the correlation of each label relative to a test image through an improved weighted random walk algorithm; the weighted random walk formula of the improved weighted random walk algorithm is as follows:
PR(i) = (1-d)*r_i + d*Σ_{j∈In(i)} ω_j*PR(j)/|Out(j)|
r_i = 1 if i = u, and r_i = 0 otherwise
wherein PR(i) is the relevance of node i relative to the test image, PR(j) is the relevance of node j relative to the test image, d is the probability of jump access of the user, Out(j) is the set of web pages hyperlinked from web page j, In(i) is the set of all web pages hyperlinking to web page i, and u denotes the input image node, meaning that every walk starts from the input image node; when j is a label node, ω_j is 1, and when j is a neighbor image node, ω_j is assigned the voting weight y of that neighbor image relative to the test image, wherein y represents the relevance score between the test image and an image in the data set;
and selecting the top N labels with the highest correlation degree and recommending the labels to the test image.
2. The personalized tag recommendation method based on the convolution feature and the weighted random walk according to claim 1, characterized in that a set test image is input into a pre-trained convolution neural network, and the output of convolution layers in the convolution neural network is used as the visual feature of the image; the method specifically comprises the following steps:
adjusting the size of the test image to n × n as required by the convolutional neural network, and inputting the test image into a pre-trained L-layer convolutional neural network;
propagating forward through the network; at the i-th convolutional layer L_i, after the features of the previous layer pass through the convolution kernels, the result of the convolution is a feature map M_l of size n_l × n_l × d_l, wherein d_l is the number of convolution kernels of layer L_i;
at each (i, j) position of the feature map M_l, a d_l-dimensional vector is obtained;
finally, the n_l × n_l local feature vectors of the test image at convolutional layer L_i are obtained.
3. The personalized tag recommendation method based on convolution features and weighted random walk according to claim 1, characterized in that the visual features are encoded and the image is converted into a visual feature vector, specifically: encoding the local feature vectors into a single visual feature vector using VLAD encoding; the VLAD code aggregates the convolution features extracted from the test image at a given layer into k d_l-dimensional vectors, thereby converting the processing of the test image into the processing of k vectors.
4. The personalized tag recommendation method based on convolution feature and weighted random walk as claimed in claim 1, wherein k neighbor images of the test image are searched as a data set for recommending a tag to the test image, specifically:
computing the feature vectors of all images in the data set at layer L_i to obtain the feature vector table X of all images, wherein L_i is the i-th convolutional layer;
computing the feature vector x_p of the test image at L_i, and then computing the Euclidean distances between x_p and all feature vectors in the feature vector table X as the visual similarity data;
calculating the group co-occurrence coefficient normalization score of the test image and the image in the data set as group similarity data;
carrying out linear weighting on the visual similarity data and the group similarity data, and calculating the correlation of the pictures in the data set relative to the test image;
and sorting the correlation results from small to large, and selecting the first k images as adjacent images.
5. The personalized tag recommendation method based on the convolution feature and the weighted random walk as claimed in claim 4, wherein the correlation result of each neighboring image is subjected to minimum and maximum normalization to be used as the voting weight given to the test image by the neighboring image; and establishing a weighted image-label bipartite graph model according to the weight.
6. The personalized tag recommendation method based on the convolution characteristic and the weighted random walk as claimed in claim 1, wherein the improved weighted random walk algorithm obtains the correlation degrees of all tags and neighboring images relative to the input image, and selects a plurality of tags with the highest correlation degrees to recommend to the input image.
7. A computer-readable storage medium having stored thereon a plurality of instructions adapted to be loaded by a processor of a terminal device and to perform the method for personalized tag recommendation based on convolution feature and weighted random walk of any of claims 1-6.
8. A terminal device comprising a processor and a computer-readable storage medium, the processor being configured to implement instructions; a computer readable storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the method for personalized tag recommendation based on convolution features and weighted random walks according to any one of claims 1 to 6.
CN201910424549.6A 2019-05-21 2019-05-21 Personalized tag recommendation method and system based on convolution features and weighted random walk Active CN110110130B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910424549.6A CN110110130B (en) 2019-05-21 2019-05-21 Personalized tag recommendation method and system based on convolution features and weighted random walk

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910424549.6A CN110110130B (en) 2019-05-21 2019-05-21 Personalized tag recommendation method and system based on convolution features and weighted random walk

Publications (2)

Publication Number Publication Date
CN110110130A CN110110130A (en) 2019-08-09
CN110110130B true CN110110130B (en) 2021-03-02

Family

ID=67491408

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910424549.6A Active CN110110130B (en) 2019-05-21 2019-05-21 Personalized tag recommendation method and system based on convolution features and weighted random walk

Country Status (1)

Country Link
CN (1) CN110110130B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116775849B (en) * 2023-08-23 2023-10-24 成都运荔枝科技有限公司 On-line problem processing system and method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108804689A (en) * 2018-06-14 2018-11-13 合肥工业大学 The label recommendation method of the fusion hidden connection relation of user towards answer platform
CN108920641A (en) * 2018-07-02 2018-11-30 北京理工大学 A kind of information fusion personalized recommendation method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9965863B2 (en) * 2016-08-26 2018-05-08 Elekta, Inc. System and methods for image segmentation using convolutional neural network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108804689A (en) * 2018-06-14 2018-11-13 合肥工业大学 The label recommendation method of the fusion hidden connection relation of user towards answer platform
CN108920641A (en) * 2018-07-02 2018-11-30 北京理工大学 A kind of information fusion personalized recommendation method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Personalized image tag recommendation algorithm based on bipartite graph"; Zhao Tianlong et al.; Journal of Nanjing University (Natural Science); 2018-11-30; Vol. 54, No. 6; pp. 1193-1205 *

Also Published As

Publication number Publication date
CN110110130A (en) 2019-08-09

Similar Documents

Publication Publication Date Title
Zhang et al. Query specific rank fusion for image retrieval
CN109829104B (en) Semantic similarity based pseudo-correlation feedback model information retrieval method and system
CN108733766B (en) Data query method and device and readable medium
CN111581510A (en) Shared content processing method and device, computer equipment and storage medium
CN110442777B (en) BERT-based pseudo-correlation feedback model information retrieval method and system
Xiao et al. Convolutional hierarchical attention network for query-focused video summarization
US8891908B2 (en) Semantic-aware co-indexing for near-duplicate image retrieval
CN111159485B (en) Tail entity linking method, device, server and storage medium
JP2013200885A (en) Annotating images
CN112417097B (en) Multi-modal data feature extraction and association method for public opinion analysis
CN110795527B (en) Candidate entity ordering method, training method and related device
CN112307182B (en) Question-answering system-based pseudo-correlation feedback extended query method
Liu et al. Learning socially embedded visual representation from scratch
Wang et al. Aspect-ratio-preserving multi-patch image aesthetics score prediction
Shah et al. Prompt: Personalized user tag recommendation for social media photos leveraging personal and social contexts
CN114756733A (en) Similar document searching method and device, electronic equipment and storage medium
CN110110130B (en) Personalized tag recommendation method and system based on convolution features and weighted random walk
CN114741599A (en) News recommendation method and system based on knowledge enhancement and attention mechanism
Abdel-Nabi et al. Content based image retrieval approach using deep learning
US20220058448A1 (en) Image selection from a database
CN111813888A (en) Training target model
CN113962228A (en) Long document retrieval method based on semantic fusion of memory network
Zheng et al. Personalized tag recommendation based on convolution feature and weighted random walk
Urban et al. Adaptive image retrieval using a graph model for semantic feature integration
CN113516094A (en) System and method for matching document with review experts

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant