CN113761305B

CN113761305B - Method and device for generating label hierarchical structure

Info

Publication number: CN113761305B
Application number: CN202010494685.5A
Authority: CN
Inventors: 陈希
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Priority date: 2020-06-03
Filing date: 2020-06-03
Publication date: 2024-07-16
Anticipated expiration: 2040-06-03
Also published as: CN113761305A

Abstract

The invention discloses a method and a device for generating a label hierarchical structure, and relates to the technical field of computers. One embodiment of the method comprises the following steps: screening out label pairs having an association relationship according to the number of occurrences of each label in each file object; generating a label relation graph from the label pairs, wherein the nodes of the graph are labels and the weight of each edge is the number of times the two labels co-occur in the same file object; and clustering the nodes of the label relation graph and calculating the membership degrees of adjacent nodes, so as to generate a label hierarchical structure. This embodiment solves the technical problem that each label can occupy only one position in the label hierarchical structure.

Description

Method and device for generating label hierarchical structure

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for generating a hierarchical label structure.

Background

In the Internet content field, many websites give users the ability to freely tag objects of interest (such as stamps, videos and pictures). The tags added by users are called social tags, and the resulting system is called a folksonomy.

Although labels are plentiful, each individual label covers little content; labels are scattered and flat, so their application value density is low. To overcome the lack of organization of social labels, the intrinsic relationships among labels need to be discovered and a label hierarchical structure constructed, so that the labels can be applied in business scenarios such as search, recommendation and advertisement delivery.

In the course of implementing the present invention, the inventor found at least the following problems in the prior art:

Each label occupies a unique position in the generated label hierarchical structure, which cannot fully meet actual requirements; and even if a label can appear at different positions in the same hierarchy, its weight proportion at each position cannot be measured.

Disclosure of Invention

In view of this, embodiments of the present invention provide a method and apparatus for generating a label hierarchical structure, so as to solve the technical problem that each label occupies only one position in the label hierarchical structure.

To achieve the above object, according to one aspect of an embodiment of the present invention, there is provided a method of generating a tag hierarchy, including:

screening out label pairs having an association relationship according to the number of occurrences of each label in each file object;

generating a label relation graph from the label pairs, wherein the nodes of the graph are labels and the weight of each edge is the number of times the two labels co-occur in the same file object;

and clustering the nodes in the label relation graph and calculating the membership degrees of adjacent nodes, so as to generate a label hierarchical structure.

Optionally, clustering the nodes in the label relation graph and calculating the membership degrees of adjacent nodes, so as to generate the label hierarchical structure, includes:

calculating the average centrality of each node in the label relation graph;

screening out at least one secondary root node according to the average centrality of each node and the association relationships between the nodes;

calculating the membership degree between each secondary root node and each of its adjacent nodes, thereby determining the candidate node set corresponding to each secondary root node, wherein each node in a candidate node set has a membership relationship with that secondary root node;

repeating the above steps, thereby generating the label hierarchical structure.

Optionally, calculating the average centrality of each node in the label relation graph includes:

for each node, calculating its degree centrality, betweenness centrality, closeness centrality and PageRank value;

normalizing the degree centrality, betweenness centrality, closeness centrality and PageRank value respectively;

and taking the arithmetic mean of the normalized degree centrality, betweenness centrality, closeness centrality and PageRank value as the average centrality of the node.

Optionally, screening out at least one secondary root node according to the average centrality of each node and the association relationships between the nodes includes:

sorting the nodes in descending order of average centrality and selecting the top N nodes, where N is an integer greater than zero;

among the N nodes, placing nodes that have an association relationship into the same group, thereby obtaining at least one node group;

for each node group, taking the node with the largest average centrality in the group as a secondary root node.

Optionally, the membership degree between a secondary root node and any adjacent node is calculated as follows:

the proportion of the weight of the edge between the adjacent node and the secondary root node to the sum of the weights of all edges of the adjacent node.

Optionally, determining the candidate node set corresponding to each secondary root node includes:

adding adjacent nodes whose membership degree is greater than or equal to a membership degree threshold to the candidate node set of the corresponding secondary root node, so that each adjacent node belongs to at least one secondary root node.

Optionally, screening out label pairs having an association relationship according to the number of occurrences of each label in each file object includes:

calculating, from the occurrences of each label in each file object, the number of times any two labels co-occur in the same file object;

for any two labels, determining whether an association relationship exists between them according to their co-occurrence count, the total number of file objects and the number of file objects containing one of the labels, so as to screen out label pairs having an association relationship.

Optionally, determining whether an association relationship exists between the two labels according to their co-occurrence count, the total number of file objects and the number of file objects containing one of the labels includes:

dividing the co-occurrence count of the two labels by the total number of file objects to obtain a support;

dividing the co-occurrence count of the two labels by the number of file objects containing one of the labels to obtain a confidence;

and if the support is greater than or equal to a support threshold and the confidence is greater than or equal to a confidence threshold, determining that an association relationship exists between the two labels.

Optionally, after generating the label hierarchical structure, the method further includes:

matching corresponding labels to each file object according to the label hierarchical structure.

In addition, according to another aspect of an embodiment of the present invention, there is provided an apparatus for generating a tag hierarchy, including:

the screening module, configured to screen out label pairs having an association relationship according to the number of occurrences of each label in each file object;

the association module, configured to generate a label relation graph from the label pairs, wherein the nodes of the graph are labels and the weight of each edge is the number of times the two labels co-occur in the same file object;

and the generation module, configured to cluster the nodes in the label relation graph and calculate the membership degrees of adjacent nodes, so as to generate a label hierarchical structure.

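Optionally, the generating module is further configured to:

calculate the average centrality of each node in the label relation graph;

screen out at least one secondary root node according to the average centrality of each node and the association relationships between the nodes;

calculate the membership degree between each secondary root node and each of its adjacent nodes, thereby determining the candidate node set corresponding to each secondary root node;

and repeat the above steps, thereby generating the label hierarchy.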

Optionally, the generating module is further configured to:

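Optionally, the generating module is further configured to calculate the membership degree between a secondary root node and any adjacent node as the proportion of the weight of the edge between the adjacent node and the secondary root node to the sum of the weights of all edges of the adjacent node.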

Optionally, the generating module is further configured to:

Optionally, the screening module is further configured to:

Optionally, the apparatus further comprises a matching module configured to:

match corresponding labels to each file object according to the label hierarchy after the label hierarchy is generated.

According to another aspect of an embodiment of the present invention, there is also provided an electronic device including:

One or more processors;

a storage means for storing one or more programs,

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method of any of the embodiments described above.

According to another aspect of an embodiment of the present invention, there is also provided a computer readable medium having stored thereon a computer program which, when executed by a processor, implements the method according to any of the embodiments described above.

One embodiment of the above invention has the following advantages or benefits: a label relation graph is generated from label pairs having an association relationship, the nodes in the graph are clustered, and the membership degrees of adjacent nodes are calculated to generate the label hierarchical structure, which solves the prior-art problem that each label occupies only one position in the hierarchy. Using fuzzy clustering, the embodiment resolves label ambiguity so that a label can appear at different positions, and calculates the probability (i.e., membership degree) of the label appearing at each position; moreover, because the recursive clustering can be flexibly controlled through membership degrees, the label hierarchical structure can be constructed automatically, saving labor costs.

Further effects of the above optional implementations are described below in connection with the embodiments.

Drawings

The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:

FIG. 1 is a schematic diagram of a prior art tag hierarchy;

FIG. 2 is a schematic diagram of the main flow of a method of generating a hierarchy of tags according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the main flow of a method of generating a hierarchy of tags according to a reference embodiment of the invention;

FIG. 4 is a schematic diagram of generating a hierarchy of tags according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of the major modules of an apparatus for generating a hierarchy of tags according to an embodiment of the present invention;

FIG. 6 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;

FIG. 7 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention.

Detailed Description

Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

A typical label hierarchical structure is shown in FIG. 1. Currently, label hierarchies are mainly constructed in three ways: manually, semi-automatically and automatically. Manual construction yields the highest quality, but it requires a great deal of manual effort, is subject to subjective factors, and has the highest construction cost. Semi-automatic construction relies on manual work assisted by a learning system; it still requires a large amount of manual effort and therefore cannot be scaled up. Automatic construction of label systems is the current mainstream research trend, and the process is generally divided into two steps: discovering relationships among labels based on label semantics, and building a hierarchy from those relationships. As can be seen from the figure, each label occupies a unique position in the generated hierarchy; and even if labels could appear at different positions in the same hierarchy, their corresponding weight proportions could not be measured.

To solve the above technical problems in the prior art, an embodiment of the present invention provides a method for generating a label hierarchical structure that not only allows labels to appear at different positions, but also calculates the membership degree (i.e., weight proportion) of each label at each position.

FIG. 2 is a schematic diagram of the main flow of a method of generating a label hierarchical structure according to an embodiment of the present invention. As an embodiment of the present invention, as shown in FIG. 2, the method may include:

Step 101: screening out label pairs having an association relationship according to the number of occurrences of each label in each file object.

In this embodiment, file objects may be texts, pictures, videos and the like. Authors and users can add labels to file objects, so the labels to be structured may be social labels or labels designed by authors, among which no explicit relationships have yet been established.

Before step 101, file objects need to be associated with (i.e., tagged with) labels. For example, a text about make-up may be tagged "lipstick", "Tom Ford" and "king". To increase the reliability of a label, it is necessary to ensure that the label occurs a sufficient number of times in the text; a label that occurs very rarely may be merged into an existing synonymous label.

Optionally, step 101 may include: calculating, from the occurrences of each label in each file object, the number of times any two labels co-occur in the same file object; and, for any two labels, determining whether an association relationship exists between them according to their co-occurrence count, the total number of file objects and the number of file objects containing one of the labels, so as to screen out label pairs having an association relationship. If an association relationship exists, the two labels form a label pair; in this way the relationships between labels are pruned and redundant relationships are filtered out. Note that one label may form label pairs with several other labels.

Assume that L = {l1, l2, …, ln} is the label set and that each text has a unique ID and is tagged with several labels. The number of times each label appears across all texts can be counted as the basis for judging how strongly labels are connected. Optionally, the Apriori algorithm may be used to mine frequent itemsets and generate association rules. Mining frequent itemsets yields the label sets whose support reaches a specified minimum support; generating association rules then additionally requires a minimum confidence on top of the minimum support. That is, if the probability that label l2 occurs given that label l1 occurs exceeds a specified threshold, the association rule l1 → l2 holds.

Optionally, determining whether an association relationship exists between the two labels according to their co-occurrence count, the total number of file objects and the number of file objects containing one of the labels includes: dividing the co-occurrence count of the two labels by the total number of file objects to obtain a support; dividing the co-occurrence count of the two labels by the number of file objects containing one of the labels to obtain a confidence; and if the support is greater than or equal to a support threshold and the confidence is greater than or equal to a confidence threshold, determining that an association relationship exists between the two labels.
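The pairwise co-occurrence counting described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes each file object is represented simply as the collection of labels attached to it, and the function name `count_cooccurrences` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(docs):
    """Count, for every unordered label pair, how many file objects contain both labels.

    docs: iterable of label collections, one per file object (hypothetical representation).
    Returns (pair_counts, label_counts); pair keys are alphabetically sorted 2-tuples.
    """
    pair_counts = Counter()
    label_counts = Counter()
    for labels in docs:
        unique = sorted(set(labels))  # a label counts at most once per file object
        label_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return pair_counts, label_counts


docs = [
    {"iphone", "apple", "huawei"},
    {"iphone", "apple"},
    {"iphone"},
]
pairs, singles = count_cooccurrences(docs)
print(pairs[("apple", "iphone")])  # 2
```

Sorting each label set before taking combinations makes every pair appear under one canonical key, so ("apple", "iphone") and ("iphone", "apple") are never counted separately.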

Examples are as follows:

ID	iPhone	Apple	Huawei	Android
1	1	1	1	0
2	1	1	0	0
3	1	0	0	0
4	1	0	1	0
5	0	1	1	1
6	1	1	0	0

The table above shows the text-label relationships of 6 texts, with item set I = {iPhone, Apple, Huawei, Android}. Consider the association rule iPhone → Apple: texts 1, 2, 3, 4 and 6 contain iPhone, and texts 1, 2 and 6 contain both iPhone and Apple. Hence |X ∩ Y| = 3 and |A| = 6, so support = |X ∩ Y| / |A| = 0.5; and |X| = 5, so confidence = |X ∩ Y| / |X| = 0.6. Given a minimum support α = 0.5 and a minimum confidence β = 0.6, a strong association relationship exists between the iPhone label and the Apple label, and the two labels form a label pair.

Step 102: generating a label relation graph from the label pairs, wherein the nodes of the graph are labels and the weight of each edge is the number of times the two labels co-occur in the same file object.

From the label pairs screened out in step 101, a label relation graph (an undirected graph) G = (V, E) is generated, where V is the set of nodes (labels) and E is the set of edges. The weight of an edge is the co-occurrence count of the two labels it connects. For example, if the iPhone label and the Apple label co-occur 3 times, the edge connecting the two nodes has a weight of 3.

Step 103: clustering the nodes in the label relation graph and calculating the membership degrees of adjacent nodes, so as to generate a label hierarchical structure.
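The worked example above can be checked with a short Python sketch. The table is copied from the text; the helper names `support_confidence` and `is_associated` are hypothetical, and the default thresholds are the α = 0.5 and β = 0.6 of the example. As in the text, the confidence of a rule X → Y divides by the number of texts containing X, so the measure is directional.

```python
# The 6-text example table from the text (1 = the text carries the label).
LABELS = ["iPhone", "Apple", "Huawei", "Android"]
TABLE = {
    1: [1, 1, 1, 0],
    2: [1, 1, 0, 0],
    3: [1, 0, 0, 0],
    4: [1, 0, 1, 0],
    5: [0, 1, 1, 1],
    6: [1, 1, 0, 0],
}


def support_confidence(table, labels, x, y):
    """Support and confidence of the rule x -> y over a 0/1 text-label table."""
    ix, iy = labels.index(x), labels.index(y)
    n_total = len(table)                                     # |A|: all texts
    n_x = sum(row[ix] for row in table.values())             # |X|: texts with x
    n_xy = sum(row[ix] * row[iy] for row in table.values())  # |X ∩ Y|: texts with both
    return n_xy / n_total, n_xy / n_x


def is_associated(table, labels, x, y, min_support=0.5, min_confidence=0.6):
    """Apply both thresholds (alpha and beta in the text)."""
    support, confidence = support_confidence(table, labels, x, y)
    return support >= min_support and confidence >= min_confidence


print(support_confidence(TABLE, LABELS, "iPhone", "Apple"))  # (0.5, 0.6)
print(is_associated(TABLE, LABELS, "iPhone", "Apple"))       # True
```

Swapping the arguments gives the rule Apple → iPhone, whose confidence is 3/4 = 0.75 because four texts carry the Apple label.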
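A minimal sketch of the graph construction follows, assuming the screened label pairs and the co-occurrence counts are already available and that the counts are keyed by alphabetically sorted label tuples; both this representation and the name `build_label_graph` are hypothetical. The undirected graph is stored as a nested dictionary in which `adj[u][v]` holds the co-occurrence weight.

```python
from collections import defaultdict


def build_label_graph(label_pairs, pair_counts):
    """Build an undirected weighted adjacency map: adj[u][v] = co-occurrence count."""
    adj = defaultdict(dict)
    for u, v in label_pairs:
        weight = pair_counts[tuple(sorted((u, v)))]
        adj[u][v] = weight
        adj[v][u] = weight  # undirected: store the edge in both directions
    return dict(adj)


# Hypothetical counts; only screened pairs become edges.
pair_counts = {("apple", "iphone"): 3, ("apple", "huawei"): 2}
graph = build_label_graph([("iphone", "apple"), ("apple", "huawei")], pair_counts)
print(graph["iphone"])  # {'apple': 3}
```

A nested dictionary keeps the neighbour lookup and the edge weight in one structure, which is all the later centrality and membership steps need.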

Drawing on the idea of the membership function in fuzzy clustering, this embodiment uses the structure of the graph to calculate how strongly each label is associated with each cluster center, and selects the center node representing each cluster with a centrality algorithm. Optionally, step 103 may include:

Generating one level of the label hierarchy as follows:

calculating the average centrality of each node in the label relation graph; screening out at least one secondary root node according to the average centrality of each node and the association relationships between the nodes; and calculating the membership degree between each secondary root node and each of its adjacent nodes, thereby determining the candidate node set corresponding to each secondary root node;

repeating the step of generating one level of the label hierarchy, thereby generating the label hierarchical structure.

In this embodiment, the root node may be specified by the user. The root node need not be a label in the label relation graph, but it may also be one (for example, the node with the highest average centrality).

Optionally, calculating the average centrality of each node in the label relation graph includes: for each node, calculating its degree centrality, betweenness centrality, closeness centrality and PageRank value; normalizing each of the four values; and taking their arithmetic mean as the average centrality of the node.

Degree centrality is the most direct measure of node centrality in network analysis: the larger a node's degree, the more central, and hence the more important, the node is in the network.

Betweenness centrality measures a node's importance by the number of shortest paths that pass through it.

Closeness centrality reflects how close a node is to the other nodes in the network and is given by the reciprocal of the sum of the shortest-path distances from the node to all other nodes; that is, the closer a node is to the others, the greater its closeness centrality.

PageRank, also known as page rank or Google PageRank (named after Larry Page), is computed from the hyperlinks between web pages and is one of the factors used to rank web pages, indicating their relevance and importance.

From the label relation graph generated in step 102, this embodiment calculates the degree centrality, betweenness centrality, closeness centrality and PageRank value of each node, normalizes each measure, and takes their arithmetic mean as the node's average centrality. Experiments show that each single centrality measure suits only certain data scenarios. This embodiment therefore combines them: degree centrality counts a node's connections (its total degree), betweenness centrality captures its role as a bridge, and closeness centrality and the PageRank value account for the shortest paths to other nodes and the interactions among nodes. After normalization removes the differences in scale between the measures, the final average centrality is calculated. Finally, the average centrality is used to select the most representative label (i.e., the secondary root node) from a group of labels (i.e., a node set in the label relation graph) to represent that group.
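For illustration, the four measures can be computed without external dependencies as below. This is a sketch under stated assumptions: the source does not say whether edge weights enter the path-based measures, so betweenness (Brandes' algorithm) and closeness here use hop counts, while PageRank passes each node's score to its neighbours in proportion to co-occurrence weight, with a damping factor of 0.85 and a fixed iteration count chosen arbitrarily. Closeness uses the common (reachable − 1) / sum form, which for a connected graph differs from the plain reciprocal only by a constant factor that later normalization removes. All function names are hypothetical; a graph library such as NetworkX provides `degree_centrality`, `betweenness_centrality`, `closeness_centrality` and `pagerank` for production use.

```python
from collections import deque


def degree_centrality(adj):
    """Number of neighbours divided by n - 1."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """(reachable nodes) / (sum of hop distances to them); edge weights ignored."""
    result = {}
    for s in adj:
        dist, queue = {s: 0}, deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[s] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness_centrality(adj):
    """Brandes' algorithm on hop counts, normalized for an undirected graph."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist, queue = {s: 0}, deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(adj, 0.0)  # dependency of s on each node
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    denom = (n - 1) * (n - 2)  # (b / 2) / ((n-1)(n-2)/2): halve double counts, then normalize
    return {v: (b / denom if denom else 0.0) for v, b in bc.items()}


def pagerank(adj, damping=0.85, iterations=100):
    """Power iteration; each node passes its score to neighbours in proportion to edge weight."""
    n = len(adj)
    rank = dict.fromkeys(adj, 1.0 / n)
    strength = {v: sum(nbrs.values()) for v, nbrs in adj.items()}
    for _ in range(iterations):
        new = dict.fromkeys(adj, (1 - damping) / n)
        for v, nbrs in adj.items():
            if strength[v] == 0:  # isolated node: spread its score uniformly
                for u in adj:
                    new[u] += damping * rank[v] / n
                continue
            for w, weight in nbrs.items():
                new[w] += damping * rank[v] * weight / strength[v]
        rank = new
    return rank


# A star graph: "phone" linked to four leaves with weight 1.
star = {"phone": {leaf: 1 for leaf in ("battery", "screen", "camera", "body")}}
for leaf in ("battery", "screen", "camera", "body"):
    star[leaf] = {"phone": 1}
print(degree_centrality(star)["phone"], betweenness_centrality(star)["phone"])  # 1.0 1.0
```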
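The normalization and averaging step can be sketched as follows. The source does not specify the normalization method, so min-max scaling to [0, 1] is assumed here; the raw scores in the usage example are invented for illustration, and the names `min_max` and `average_centrality` are hypothetical.

```python
def min_max(scores):
    """Rescale a {node: score} map to [0, 1]; a constant map becomes all zeros."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {v: (s - lo) / span if span else 0.0 for v, s in scores.items()}


def average_centrality(*measures):
    """Arithmetic mean of several min-max-normalized centrality maps over the same nodes."""
    normalized = [min_max(m) for m in measures]
    return {v: sum(m[v] for m in normalized) / len(normalized) for v in measures[0]}


# Hypothetical raw scores for three labels (not taken from the source).
degree = {"screen": 0.9, "pixel": 0.3, "battery": 0.6}
betweenness = {"screen": 0.5, "pixel": 0.0, "battery": 0.1}
closeness = {"screen": 0.8, "pixel": 0.4, "battery": 0.6}
pagerank_scores = {"screen": 0.5, "pixel": 0.2, "battery": 0.3}
avg = average_centrality(degree, betweenness, closeness, pagerank_scores)
print(max(avg, key=avg.get))  # screen
```

Normalizing first matters because the raw measures live on different scales; without it, whichever measure has the widest range would dominate the arithmetic mean.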

Optionally, screening out at least one secondary root node according to the average centrality of each node and the association relationships between the nodes includes: sorting the nodes in descending order of average centrality and selecting the top N nodes, where N is an integer greater than zero; among the N nodes, placing nodes that have an association relationship into the same group, thereby obtaining at least one node group; and, for each node group, taking the node with the largest average centrality in the group as a secondary root node. At the first level, all nodes of the label relation graph can be regarded as the candidate node set from which secondary root nodes are chosen. After the top N nodes are selected, it is determined whether they are associated with one another, and associated nodes are placed in the same group. Since step 101 has already determined which labels are associated, any two of the N nodes that are directly connected are considered associated and are placed in the same node group.

For example, in the label relation graph under the root node "mobile phone", the 9 node labels with the highest average centrality are: battery, screen, full screen, photographing, camera, performance, pixel, processor and body. Based on the associations among these 9 nodes, they are divided into five node groups: (photographing, camera, pixel), (processor, performance), (screen, full screen), (battery) and (body). Finally, the node with the highest average centrality in each of the five groups is selected as a secondary root node; for example, photographing, processor, screen, battery and body are the five secondary root nodes (categories) under mobile phone.
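The selection of secondary root nodes can be sketched as below. One assumption is made explicit: nodes are grouped transitively, i.e. each group is a connected component of the subgraph induced by the top N nodes, so photographing and pixel share a group through camera even without a direct edge. The graph and centrality values are hypothetical, chosen to mirror the five groups in the example; `select_secondary_roots` is an illustrative name.

```python
def select_secondary_roots(adj, avg, top_n):
    """Return one secondary root per group of associated top-N nodes.

    adj: {label: iterable of neighbour labels}; avg: {label: average centrality}.
    """
    top = sorted(avg, key=avg.get, reverse=True)[:top_n]  # descending centrality
    top_set, seen, roots = set(top), set(), []
    for start in top:
        if start in seen:
            continue
        # Gather the group: top-N nodes linked to `start` through edges among the top N.
        group, stack = [], [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            group.append(v)
            for w in adj[v]:
                if w in top_set and w not in seen:
                    seen.add(w)
                    stack.append(w)
        roots.append(max(group, key=avg.get))  # most central node of the group
    return roots


# Hypothetical graph and scores loosely following the "mobile phone" example.
edges = [("photographing", "camera"), ("camera", "pixel"), ("processor", "performance"),
         ("screen", "full screen"), ("battery", "charging"), ("body", "weight")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
avg = {"photographing": 0.9, "processor": 0.85, "screen": 0.8, "battery": 0.75, "body": 0.7,
       "camera": 0.65, "performance": 0.6, "pixel": 0.55, "full screen": 0.5,
       "charging": 0.1, "weight": 0.05}
print(select_secondary_roots(adj, avg, top_n=9))
# ['photographing', 'processor', 'screen', 'battery', 'body']
```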

Optionally, the membership degree between a secondary root node and any adjacent node is calculated as follows: the proportion of the weight of the edge between the adjacent node and the secondary root node to the sum of the weights of all edges of the adjacent node.

For example, if the Kirin-processor edge has weight 30, the Kirin-photographing edge 5 and the Kirin-battery edge 5, then Kirin belongs to processor with membership degree 0.75 (30/40), to photographing with 0.125 and to battery with 0.125. If the processor-Apple edge has weight 15, photographing-Apple 25, screen-Apple 30, battery-Apple 20 and body-Apple 10, then the membership degrees of the Apple label with respect to processor, photographing, screen, battery and body are 0.15, 0.25, 0.3, 0.2 and 0.1 respectively.

Optionally, determining the candidate node set corresponding to each secondary root node includes: adding adjacent nodes whose membership degree is greater than or equal to a membership degree threshold to the candidate node set of the corresponding secondary root node, so that each adjacent node belongs to at least one secondary root node. Setting a membership threshold prunes the graph: nodes whose membership degree is below the threshold are removed, while each node is still guaranteed to belong to at least one secondary root node. In this embodiment, a node may belong to several secondary root nodes, but belongs to at least one.

For example, if only memberships greater than or equal to 0.2 are retained as valid, Kirin belongs to processor with membership degree 0.75, and Apple belongs to photographing, screen and battery with membership degrees 0.25, 0.3 and 0.2.

Then, the member nodes of each secondary root node are taken as a new candidate node set, and the above steps are repeated to generate the next level of labels and their membership degrees until all hierarchical relationships have been established. Alternatively, a stop condition may be set: the number of adjacent nodes falls below a specified threshold. Note that membership degrees do not need to be calculated for nodes at the first and last levels.

For example, taking the candidate node set associated with processor, the nodes with the highest centrality are Qualcomm, performance, …; these are used as secondary root nodes, their membership degrees with respect to processor are obtained, and then the labels at the next level below them and their membership degrees are calculated.
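The membership formula can be checked against the example above with a few lines of Python. The sketch assumes, as the example implies, that the listed edges are all of the edges of the Kirin and Apple nodes; the function name `membership` is hypothetical.

```python
def membership(adj, node, root):
    """Weight of the node-root edge divided by the total weight of all of node's edges."""
    total = sum(adj[node].values())
    return adj[node].get(root, 0) / total if total else 0.0


# Edge weights from the Kirin / Apple example (assumed to be each node's complete edge list).
adj = {
    "Kirin": {"processor": 30, "photographing": 5, "battery": 5},
    "Apple": {"processor": 15, "photographing": 25, "screen": 30, "battery": 20, "body": 10},
}
roots = ["processor", "photographing", "screen", "battery", "body"]
print([membership(adj, "Kirin", r) for r in roots])  # [0.75, 0.125, 0.0, 0.125, 0.0]
print([membership(adj, "Apple", r) for r in roots])  # [0.15, 0.25, 0.3, 0.2, 0.1]
```

Because the denominator is the adjacent node's own total weight, a node's memberships across all of its neighbours sum to 1, which is what lets them be read as weight proportions.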
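The threshold pruning can be sketched as follows, reproducing the example with a threshold of 0.2. The source states that every adjacent node belongs to at least one secondary root node but does not say how this is guaranteed when all of a node's membership degrees fall below the threshold; the fallback used here (keeping the single strongest root) is an assumption, as is the name `assign_members`.

```python
def assign_members(memberships, threshold=0.2):
    """Keep (root, degree) pairs at or above the threshold for each adjacent node."""
    result = {}
    for node, per_root in memberships.items():
        kept = {r: m for r, m in per_root.items() if m >= threshold}
        if not kept:  # assumption: fall back to the strongest root so the node keeps a parent
            best = max(per_root, key=per_root.get)
            kept = {best: per_root[best]}
        result[node] = kept
    return result


memberships = {
    "Kirin": {"processor": 0.75, "photographing": 0.125, "battery": 0.125},
    "Apple": {"processor": 0.15, "photographing": 0.25, "screen": 0.3, "battery": 0.2, "body": 0.1},
}
print(assign_members(memberships))
# {'Kirin': {'processor': 0.75}, 'Apple': {'photographing': 0.25, 'screen': 0.3, 'battery': 0.2}}
```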
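The recursive procedure can be sketched as follows. To stay short, this sketch makes several simplifying assumptions that are not from the source: weighted degree stands in for the average centrality, the grouping of directly linked top-N nodes is omitted, membership at deeper levels is computed inside the subgraph induced by the candidate set, and a node whose memberships all fall below the threshold is kept under its strongest root. The function name and its parameters (`top_n`, `threshold`, `min_size`) are hypothetical; `min_size` plays the role of the stop condition on the number of adjacent nodes, and the example graph is invented.

```python
def build_hierarchy(adj, nodes, top_n=2, threshold=0.2, min_size=3):
    """Recursively build {root: {"members": {node: degree}, "children": {...}}}.

    adj: {label: {neighbour: co-occurrence weight}}; nodes: current candidate set.
    """
    if len(nodes) < min_size:  # stop condition: too few candidates left
        return {}
    # Restrict the graph to the current candidate node set.
    sub = {v: {w: x for w, x in adj[v].items() if w in nodes} for v in nodes}
    # Simplification: weighted degree stands in for the average centrality.
    score = {v: sum(sub[v].values()) for v in nodes}
    roots = sorted(nodes, key=lambda v: (-score[v], v))[:top_n]
    members = {r: {} for r in roots}
    for v in nodes:
        if v in members or not any(r in sub[v] for r in roots):
            continue  # skip roots and nodes not adjacent to any root
        total = sum(sub[v].values())
        degrees = {r: sub[v].get(r, 0) / total for r in roots}
        kept = {r: m for r, m in degrees.items() if m >= threshold}
        if not kept:  # assumption: keep the strongest root so v has a parent
            best = max(degrees, key=degrees.get)
            kept = {best: degrees[best]}
        for r, m in kept.items():
            members[r][v] = m
    return {
        r: {"members": members[r],
            "children": build_hierarchy(adj, set(members[r]), top_n, threshold, min_size)}
        for r in roots
    }


edges = [("processor", "Kirin", 30), ("processor", "Qualcomm", 25), ("processor", "performance", 20),
         ("Kirin", "Qualcomm", 5), ("Kirin", "photographing", 5),
         ("photographing", "camera", 30), ("photographing", "pixel", 20), ("camera", "pixel", 10)]
adj = {}
for u, v, w in edges:
    adj.setdefault(u, {})[v] = w
    adj.setdefault(v, {})[u] = w
tree = build_hierarchy(adj, set(adj), top_n=2, threshold=0.2, min_size=4)
print(sorted(tree))  # ['photographing', 'processor']
```

Each recursive call receives only the members of one root, which excludes the root itself, so the candidate sets shrink strictly and the recursion always terminates.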

As another example, the secondary root nodes calculated from mobile-phone-related texts are photographing, processor, screen, battery and body. The calculated membership degrees between the secondary root nodes and their adjacent nodes include:

Secondary root node	Adjacent node	Membership degree	Tag ID
Processor	2.4G	0.576000	0
Battery	2.4G	0.218667	4
Battery	360	0.313453	4
Processor	360	0.327013	0
Body	3D	0.456526	3

The label hierarchical structure with weight proportions (i.e., membership degrees) constructed in this embodiment can be used for precise matching between search and recommendation content and users, and also for discovering similar interests among users.

Optionally, after the label hierarchical structure is generated, corresponding labels may be matched to each file object according to it. Because the hierarchy constructed in this embodiment fully uncovers the intrinsic relationships among labels, the labels matched to each file object better characterize it, enabling accurate recommendation and delivery in business scenarios such as search, recommendation and advertisement delivery.

As the embodiments above show, generating a label relation graph from label pairs having an association relationship, clustering the nodes of the graph and calculating the membership degrees of adjacent nodes to generate the label hierarchical structure solves the prior-art problem that each label occupies only one position in the hierarchy. Using fuzzy clustering, the embodiment resolves label ambiguity so that a label can appear at different positions, and calculates the probability (i.e., membership degree) of the label appearing at each position; moreover, because the recursive clustering can be flexibly controlled through membership degrees, the label hierarchical structure can be constructed automatically, saving labor costs.

FIG. 3 is a schematic diagram of the main flow of a method of generating a label hierarchical structure according to a reference embodiment of the invention. As another embodiment of the present invention, as shown in FIG. 3, the method may include:

Step 301: preparing basic data.

As shown in FIG. 4, the basic data include the file objects and their corresponding labels. File objects may be texts, pictures, videos and the like, and authors and users can add labels to each file object, so the labels to be structured may be social labels or author-designed labels among which no explicit relationships have yet been established.

Step 302, according to the occurrence times of each label in each file object, the co-occurrence times of any two labels in the same file object are calculated respectively.

Step 303, for any two tags, determining whether an association relationship exists between the two tags according to the co-occurrence times of the two tags in the same file object, the total number of the file objects and the number of the file objects with one tag, so as to screen out a tag pair with the association relationship.

As shown in fig. 4, an apriori algorithm may be employed to mine frequent item sets and generate association rules. Specifically, dividing the co-occurrence times of the two labels in the same file object with the total number of the file objects to obtain a support degree; dividing the co-occurrence times of the two labels in the same file object with the number of the file objects with one label, and obtaining a confidence level; and if the support degree is greater than or equal to a support degree threshold value and the confidence degree is greater than or equal to a confidence degree threshold value, judging that an association relationship exists between the two labels.

Step 304, generating a label relation graph according to each label pair.

The nodes in the relation graph are labels, and the weight of each edge is the number of times the two labels it connects co-occur in the same file object.
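Step 304 only needs an undirected weighted graph. A minimal sketch, assuming the associated pairs are given as a hypothetical dictionary mapping `(label_a, label_b)` to their co-occurrence count, and using a plain adjacency dictionary rather than any particular graph library:

```python
from collections import defaultdict


def build_label_graph(pairs):
    """Build an undirected weighted graph as {label: {neighbor_label: weight}}.

    pairs: hypothetical {(label_a, label_b): co_occurrence_count} mapping.
    """
    graph = defaultdict(dict)
    for (a, b), co in pairs.items():
        # The edge weight is the co-occurrence count, stored in both directions.
        graph[a][b] = co
        graph[b][a] = co
    return dict(graph)
```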

Step 305, calculating the average centrality of each node in the label relation graph.

For each node, the degree centrality, betweenness centrality, closeness centrality and PageRank value of the node are calculated; each of these four values is normalized; and the four normalized values are arithmetically averaged to obtain the average centrality of the node.
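The average centrality of step 305 can be sketched with the Python standard library only. Several details are assumptions not fixed by the text: degree, betweenness (Brandes' algorithm) and closeness are computed on the unweighted graph, PageRank uses the co-occurrence counts as edge weights with a damping factor of 0.85, and "normalization" is taken to mean min-max scaling to [0, 1]. The graph is a hypothetical `{node: {neighbor: weight}}` dictionary in which every node has at least one edge, which holds when the graph is built from label pairs.

```python
from collections import deque


def _bfs(adj, s):
    """Unweighted BFS from s: visit order, hop distances, path counts, predecessors."""
    dist, sigma = {s: 0}, dict.fromkeys(adj, 0)
    sigma[s] = 1
    preds = {v: [] for v in adj}
    order, queue = [], deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                sigma[w] += sigma[v]
                preds[w].append(v)
    return order, dist, sigma, preds


def _minmax(scores):
    """Assumed normalization: min-max scaling to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    return {v: (x - lo) / (hi - lo) if hi > lo else 0.0 for v, x in scores.items()}


def average_centrality(adj, damping=0.85, iterations=100):
    nodes = list(adj)
    n = len(nodes)
    degree = {v: len(adj[v]) / (n - 1) if n > 1 else 0.0 for v in nodes}
    betweenness = dict.fromkeys(nodes, 0.0)
    closeness = {}
    for s in nodes:
        order, dist, sigma, preds = _bfs(adj, s)
        total = sum(dist.values())
        closeness[s] = (len(dist) - 1) / total if total else 0.0
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):  # Brandes back-propagation of dependencies
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                betweenness[w] += delta[w]
    # Undirected graph: every pair was counted from both endpoints.
    betweenness = {v: b / 2 for v, b in betweenness.items()}
    strength = {v: sum(adj[v].values()) for v in nodes}
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iterations):  # weighted PageRank by power iteration
        rank = {
            v: (1 - damping) / n
            + damping * sum(rank[u] * adj[u][v] / strength[u] for u in adj[v])
            for v in nodes
        }
    measures = [_minmax(m) for m in (degree, betweenness, closeness, rank)]
    return {v: sum(m[v] for m in measures) / len(measures) for v in nodes}
```

In a star-shaped graph, the centre has the maximum value of all four measures and the leaves the minimum, so after scaling the centre's average centrality is 1.0 and each leaf's is 0.0. A graph library such as NetworkX (`degree_centrality`, `betweenness_centrality`, `closeness_centrality`, `pagerank`) could replace the hand-written measures.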

Step 306, screening out at least one secondary root node according to the average centrality of each node and the association relationships between the nodes.

The nodes are sorted in descending order of average centrality, and the top N nodes are selected, where N is an integer greater than zero. Among these N nodes, nodes that have an association relationship with each other are placed in the same group, yielding at least one node group; in each node group, the node with the largest average centrality is taken as a secondary root node. In other words, every node in the label relation graph is a candidate secondary root node: after the N nodes with the highest average centrality are selected, the method determines whether these nodes are associated with one another, and associated nodes are divided into the same group.
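Step 306 can be sketched as follows. The text does not say exactly how "nodes with the association relation" are grouped; this sketch assumes a group is a connected component of the subgraph induced by the top-N nodes (nodes linked to each other directly or through other top-N nodes). The function name and the `centrality` dictionary input are illustrative.

```python
def secondary_roots(adj, centrality, top_n):
    """Select secondary root nodes from the top-N nodes by average centrality.

    adj: {node: {neighbor: weight}}; centrality: {node: average centrality}.
    """
    top = sorted(centrality, key=centrality.get, reverse=True)[:top_n]
    top_set, seen, roots = set(top), set(), []
    for start in top:
        if start in seen:
            continue
        # Assumed grouping: connected component among the top-N nodes only.
        group, stack = [], [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            group.append(v)
            for w in adj[v]:
                if w in top_set and w not in seen:
                    seen.add(w)
                    stack.append(w)
        # The most central node of each group becomes a secondary root node.
        roots.append(max(group, key=centrality.get))
    return roots
```

Because the nodes are visited in descending order of centrality, the first node reached in each group is already its most central member; the `max` call only makes the selection rule explicit.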

Step 307, calculating the membership degree between each secondary root node and each of its adjacent nodes, so as to determine the candidate node set corresponding to each secondary root node.

Optionally, the membership degree between a secondary root node and any adjacent node is calculated as the ratio of the weight of the edge between the adjacent node and the secondary root node to the sum of the weights of all edges of the adjacent node. Adjacent nodes whose membership degree is greater than or equal to a membership threshold are then added to the candidate node set of that secondary root node. Setting the membership threshold prunes the hierarchy: memberships below the threshold are discarded, while each adjacent node is still ensured to belong to at least one secondary root node. In the embodiment of the present invention, a node may belong to several secondary root nodes, and it belongs to at least one.
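The membership calculation and threshold pruning of step 307 might look like the sketch below. The membership formula follows the text; the fallback that attaches a node whose memberships all fall below the threshold to its single strongest secondary root is an assumption about how the "at least one secondary root node" guarantee is enforced. In this sketch, nodes with no edge to any secondary root are simply skipped at the current level.

```python
def candidate_sets(adj, roots, threshold):
    """Return {root: {node: membership}} for every secondary root node.

    adj: {node: {neighbor: weight}}; roots: secondary root nodes at this level.
    """
    sets = {r: {} for r in roots}
    for v in adj:
        if v in sets:
            continue  # secondary roots are not assigned to each other here
        strength = sum(adj[v].values())
        # Membership of v in root r: weight of edge (v, r) over total edge weight of v.
        member = {r: adj[v][r] / strength for r in roots if r in adj[v]}
        if not member:
            continue  # v is not adjacent to any secondary root at this level
        kept = {r: m for r, m in member.items() if m >= threshold}
        if not kept:
            # Assumption: fall back to the strongest root so v keeps at least one parent.
            best = max(member, key=member.get)
            kept = {best: member[best]}
        for r, m in kept.items():
            sets[r][v] = m  # v may belong to several secondary roots
    return sets
```

A node whose edge weight is split between two roots (for example 3 and 1) gets memberships 0.75 and 0.25; with a threshold of 0.2 it is placed under both roots, and with 0.3 only under the first.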

Step 308, determining whether a stop condition is satisfied; if yes, go to step 309; if not, return to step 305.

The stop condition may be that the entire hierarchical relationship has been established, or that the number of adjacent nodes is less than a specified threshold.
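Steps 305 to 309 form a recursion: each candidate node set is clustered again until a stop condition holds. The driver below is a hypothetical sketch rather than the patented procedure. Root selection and membership assignment are passed in as functions, so that any implementation of steps 305 to 307 can be plugged in; the stop conditions are assumed to be a minimum number of adjacent nodes (`min_children`) and a depth limit (`max_depth`); and nodes not covered by any secondary root at a level are kept as leaves at that level. The result is a nested dictionary in which one label may appear under several parents.

```python
def build_hierarchy(adj, pick_roots, assign, min_children=2, max_depth=5, depth=0):
    """Recursively cluster a weighted label graph into a nested {label: subtree} dict.

    adj: {node: {neighbor: weight}} (hypothetical format)
    pick_roots(adj) -> list of secondary root nodes (steps 305-306)
    assign(adj, roots) -> {root: {member: membership}} (step 307)
    """
    # Assumed stop condition: depth limit reached, so no further clustering.
    roots = pick_roots(adj) if depth < max_depth else []
    sets = assign(adj, roots) if roots else {}
    placed = set(sets)
    tree = {}
    for root, members in sets.items():
        placed.update(members)
        if len(members) < min_children:
            # Stop condition: too few adjacent nodes left to split further.
            tree[root] = {m: {} for m in members}
        else:
            # Recurse on the subgraph induced by this root's candidate node set.
            sub = {v: {w: x for w, x in adj[v].items() if w in members} for v in members}
            tree[root] = build_hierarchy(sub, pick_roots, assign, min_children, max_depth, depth + 1)
    for v in adj:
        if v not in placed:
            tree[v] = {}  # not covered by any root at this level: keep as a leaf here
    return tree
```

Passing the two sub-steps as functions keeps the recursion and stop conditions separate from the choice of centrality measure and membership threshold.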

Step 309, stopping the generation of the hierarchy to obtain the label hierarchical structure.

In addition, the remaining details of this embodiment have already been described in detail in the method for generating a tag hierarchy above, and are not repeated here.

FIG. 5 is a schematic diagram of the main modules of an apparatus for generating a tag hierarchy according to an embodiment of the present invention. As shown in FIG. 5, the apparatus 500 for generating a tag hierarchy includes a screening module 501, an association module 502 and a generating module 503. The screening module 501 is configured to screen out label pairs that have an association relationship according to the occurrences of each label in each file object; the association module 502 is configured to generate a label relation graph from the label pairs, in which the nodes are labels and the weight of each edge is the number of times the two labels co-occur in the same file object; and the generating module 503 is configured to cluster the nodes in the label relation graph and calculate the membership degrees of adjacent nodes, so as to generate a label hierarchical structure.

Optionally, the generating module 503 is further configured to:

Calculating the average centrality of each node in the label relation graph;

The above steps are repeatedly performed, thereby generating a tag hierarchy.

Optionally, the generating module 503 is further configured to:

Optionally, the generating module 503 is further configured to: the membership degree of the secondary root node and any adjacent node is calculated by adopting the following method:

Optionally, the generating module 503 is further configured to:

Optionally, the screening module 501 is further configured to:

Optionally, the method further comprises a matching module for:

The specific implementation of the apparatus for generating a tag hierarchy according to the present invention has been described in detail in the method for generating a tag hierarchy above, and is therefore not repeated here.

Fig. 6 illustrates an exemplary system architecture 600 to which a method of generating a tag hierarchy or an apparatus of generating a tag hierarchy of an embodiment of the present invention may be applied.

As shown in fig. 6, the system architecture 600 may include terminal devices 601, 602, 603, a network 604, and a server 605. The network 604 is used as a medium to provide communication links between the terminal devices 601, 602, 603 and the server 605. The network 604 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

A user may interact with the server 605 via the network 604 using the terminal devices 601, 602, 603 to receive or send messages, etc. Various communication client applications such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only) may be installed on the terminal devices 601, 602, 603.

The terminal devices 601, 602, 603 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.

The server 605 may be a server providing various services, for example a back-end management server (by way of example only) that supports shopping websites browsed by users on the terminal devices 601, 602, 603. The back-end management server may analyze and otherwise process received data, such as item information query requests, and feed the processing results (e.g., targeted push information or item information, by way of example only) back to the terminal devices.

It should be noted that the method for generating a tag hierarchy provided by the embodiment of the present invention is generally performed by the server 605, and accordingly the apparatus for generating a tag hierarchy is generally disposed in the server 605. The method may also be performed by the terminal devices 601, 602, 603, in which case the apparatus for generating a tag hierarchy may be disposed in the terminal devices 601, 602, 603.

It should be understood that the number of terminal devices, networks and servers in fig. 6 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Referring now to FIG. 7, there is illustrated a schematic diagram of a computer system 700 suitable for use in implementing an embodiment of the present invention. The terminal device shown in fig. 7 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present invention.

As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. The RAM 703 also stores various programs and data required for the operation of the system 700. The CPU 701, the ROM 702 and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 708 including a hard disk or the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it can be installed into the storage section 708 as needed.

In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 709, and/or installed from the removable medium 711. When the computer program is executed by the Central Processing Unit (CPU) 701, the above-described functions defined in the system of the present invention are performed.

The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer programs according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor; for example, a processor may be described as including a screening module, an association module and a generation module. In some cases, the names of these modules do not limit the modules themselves.

As another aspect, the present invention also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to: screen out label pairs that have an association relationship according to the occurrences of each label in each file object; generate a label relation graph from the label pairs, in which the nodes are labels and the weight of each edge is the number of times the two labels co-occur in the same file object; and cluster the nodes in the label relation graph and calculate the membership degrees of adjacent nodes, so as to generate a label hierarchical structure.

In the technical solution provided by the embodiments of the invention, a label relation graph is generated from the label pairs that have an association relationship, the nodes in the graph are clustered, and the membership degrees of adjacent nodes are calculated to generate the label hierarchical structure; this solves the prior-art problem that each label can occupy only one position in the label hierarchical structure. Fuzzy clustering addresses label ambiguity, so that a label can appear at different positions, and the probability (i.e., the membership degree) of the label appearing at each position is calculated. Because the recursive clustering can be flexibly controlled through the membership degrees, the label hierarchical structure can be constructed automatically, which saves labor costs.

The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims

1. A method of generating a hierarchy of tags, comprising:

clustering each node in the label relation graph and calculating the membership degree of adjacent nodes so as to generate a label hierarchical structure;

The step of screening out the label pairs with association relations according to the occurrence times of each label in each file object comprises the following steps:

2. The method of claim 1, wherein clustering each node in the label relationship graph and calculating membership of neighboring nodes to generate a label hierarchy comprises:

Calculating the average centrality of each node in the label relation graph;

The above steps are repeatedly performed, thereby generating a tag hierarchy.

3. The method of claim 2, wherein said calculating an average centrality of each node in the label relationship graph comprises:

4. The method according to claim 2, wherein the screening at least one secondary root node according to the average centrality of each node and the association relationship between each node comprises:

5. The method according to claim 2, wherein the membership degree of the secondary root node to any one of the neighboring nodes is calculated by:

6. The method of claim 2, wherein said determining the set of candidate nodes for each of the secondary root nodes comprises:

7. The method of claim 1, wherein determining whether an association exists between the two tags according to the number of co-occurrences of the two tags in the same file object, the total number of file objects, and the number of file objects in which one of the tags appears, comprises:

8. The method of claim 1, further comprising, after the generating the tag hierarchy:

9. An apparatus for generating a hierarchy of tags, comprising:

the generation module is used for clustering all the nodes in the label relation graph and calculating the membership degree of the adjacent nodes so as to generate a label hierarchical structure;

the screening module is further configured to:

10. An electronic device, comprising:

One or more processors;

storage means for storing one or more programs,

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method of any one of claims 1-8.

11. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-8.