CN110489649B - Method and device for associating content with tag

Info

Publication number: CN110489649B
Application number: CN201910764554.1A
Authority: CN
Inventors: 贺夏龙
Original assignee: Beijing Chuangxin Journey Network Technology Co ltd
Current assignee: Beijing Chuangxin Journey Network Technology Co ltd
Priority date: 2019-08-19
Filing date: 2019-08-19
Publication date: 2023-06-27
Anticipated expiration: 2039-08-19
Also published as: CN110489649A

Abstract

The embodiments of the disclosure relate to the technical field of the Internet and provide a method and a device for associating content with a tag. The method comprises the following steps: determining a label to be associated; determining circled (selected) content in the data content according to the result of semantically matching the label to be associated against the word labels with which the data content is marked; and associating the circled content with the label to be associated. The embodiments of the disclosure improve the efficiency and accuracy of associating content with tags.

Description

Method and device for associating content with tag

Technical Field

The disclosure relates to the technical field of internet, and in particular relates to a method and a device for associating content with tags.

Background

With the continuous rapid development of internet technology, more and more content is available from the internet. In the course of operating a website, operation labels are set to guide users. After an operation label is determined, the stored data content is manually selected at the website server according to the operation label and is then associated with that label, which is inefficient.

The same website may hold multiple types of data content. Different types of data content belong to different services, and their forms may differ considerably. Different people have different subjective judgments, so it is difficult to sort content and describe different types of data content in a unified way, and it is also difficult to give a single piece of data content enough operation labels.

Disclosure of Invention

In order to solve the above-mentioned problems in the prior art, the present disclosure provides a solution for associating content with tags.

According to one aspect of the disclosed embodiments, there is provided a method for associating content with a tag, including: a label determining step of determining a label to be associated; a content circling step, namely determining circling content in the data content according to a semantic matching result of the label to be associated and the word label marked by the data content; and a content association step of associating the circled content to the label to be associated.

In one example, prior to the content circling step, the method further comprises: a content analysis step of analyzing the data content to obtain keywords matching the data content; a label matching step of selecting, from all word labels, the word labels that match the keywords; and labeling the data content with the selected word labels.

In one example, the content circling step includes: a label analyzing step of performing semantic analysis on the label to be associated to obtain semantic segmentation; a label recommending step of determining the word labels whose similarity to the semantic segmentation is greater than or equal to a preset threshold; a label screening step of processing the determined word labels according to a received operation instruction, wherein the operation instruction includes a selection instruction and/or a logical operation instruction; and a content determining step of taking the data content marked by the word labels obtained after processing as the circled content.

In one example, the method further comprises: a label changing step of changing the existing word labels based on the keywords matching the data content, wherein the changing includes a deletion operation and an addition operation.

In one example, the data content includes image-text content, and the content analysis step includes: an image-text content disassembling step of splitting the image-text content into text content and image content; a text keyword obtaining step of performing semantic analysis and/or position importance analysis on the text content to determine text keywords; and an image feature extraction step of extracting features from the image content based on corpus data in an image corpus database to obtain image feature keywords.

In one example, the label matching step includes: a label selection step of separately selecting the word labels that match the text keywords and the word labels that match the image feature keywords; and a label combining step of taking the word labels common to both selections as the word labels for labeling the image-text content.

In one example, the method further comprises: a content pushing step of pushing candidate content that matches the label to be associated to a user, based on how frequently the user uses that label within a preset time range.

In one example, the method further comprises: a content delivery step of delivering the circled content to a content position that matches the label to be associated.
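The label combining step amounts to keeping only the word labels on which both modalities agree. The following is a minimal sketch, assuming each modality has already produced a set of matched word labels; the function name and the sample labels are illustrative and do not come from the disclosure.

```python
def combine_labels(text_labels, image_labels):
    """Keep only the word labels matched by both the text and the image keywords."""
    return set(text_labels) & set(image_labels)


# Hypothetical word labels matched from each modality of one image-text item.
text_labels = {"food", "ancient town", "Zhangzhou"}
image_labels = {"ancient town", "Zhangzhou", "architecture"}

combined = combine_labels(text_labels, image_labels)
print(sorted(combined))
```

Taking the intersection favors precision: a word label is kept only when the text and the image both support it.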

According to another aspect of the embodiments of the present disclosure, there is provided an apparatus for tag-associating content, including: the label determining unit is used for determining labels to be associated; the content circling unit is used for determining circling content in the data content according to the semantic matching result of the label to be associated and the word label marked by the data content; and the content association unit is used for associating the circled content to the label to be associated.

In one example, the apparatus further comprises: the content analysis unit is used for analyzing the data content to obtain keywords matched with the data content; the label matching unit is used for selecting word labels matched with the keywords from all the word labels; and the content labeling unit is used for labeling the data content by using the selected word label.

In one example, the content circling unit includes: the label analysis module is used for carrying out semantic analysis on the labels to be associated to obtain semantic segmentation; the label recommending module is used for determining word labels with similarity to semantic word segmentation being greater than or equal to a preset threshold value; the label screening module is used for carrying out matching processing on the word labels based on the labels to be associated according to the received operation instructions, wherein the operation instructions comprise selection instructions and/or logic operation instructions; and the content determining module is used for taking the data content marked by the word label obtained after processing as the circled content.

In one example, the apparatus further comprises: and the label changing unit is used for changing the existing word label based on the keyword matched with the data content, wherein the changing comprises deleting operation and adding operation.

In one example, the data content includes image-text content, and the content analysis unit includes: an image-text content disassembling module for splitting the image-text content into text content and image content; a text keyword acquisition module for performing semantic analysis and/or position importance analysis on the text content to determine text keywords; and an image feature extraction module for extracting features from the image content based on corpus data in an image corpus database to obtain image feature keywords.

In one example, the tag matching unit includes: the label selection module is used for respectively selecting word labels matched with the text keywords and the image characteristic keywords; the label combining module is used for taking the same word labels in word labels matched with text keywords and word labels matched with image characteristic keywords as word labels for labeling image-text contents.

In one example, the apparatus further comprises: and the content pushing unit is used for pushing the to-be-selected content matched with the to-be-associated tag to the user based on the frequency of using the to-be-associated tag by the user in the preset time range.

In one example, the device further comprises a content delivery unit, which is used for delivering the circled content to the content position matched with the label to be associated.

According to another aspect of the embodiments of the present disclosure, there is provided an electronic device, including:

a memory for storing a computer program;

and a processor for executing the computer program stored in the memory, wherein the computer program, when executed, implements the method for associating content with a tag of any of the embodiments described above.

According to another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method for associating content with a tag of any of the above embodiments.

The method, the device, the electronic device, and the computer-readable storage medium for associating content with tags of the present disclosure label the various types of data content with unified word labels, and associate the data content with an operation label once the word labels have been matched with that operation label, thereby improving the efficiency and accuracy of associating data content with operation labels.

Drawings

The above, as well as additional purposes, features, and advantages of embodiments of the present disclosure will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which:

FIG. 1 illustrates a flow diagram of one embodiment of the method for associating content with a tag of the present disclosure;

FIG. 2 illustrates a flow diagram of another embodiment of the method for associating content with a tag of the present disclosure;

FIG. 3 illustrates a flow diagram of another embodiment of the method for associating content with a tag of the present disclosure;

FIG. 4 illustrates a flow diagram of another embodiment of the method for associating content with a tag of the present disclosure;

FIG. 5 illustrates a flow diagram of another embodiment of the method for associating content with a tag of the present disclosure;

FIG. 6 illustrates a flow diagram of another embodiment of the method for associating content with a tag of the present disclosure;

FIG. 7 illustrates a schematic structural diagram of one embodiment of the device for associating content with a tag of the present disclosure;

FIG. 8 illustrates a schematic structural diagram of another embodiment of the device for associating content with a tag of the present disclosure;

FIG. 9 illustrates a schematic structural diagram of one embodiment of the content circling unit of the device for associating content with a tag of the present disclosure;

FIG. 10 illustrates a schematic structural diagram of another embodiment of the device for associating content with a tag of the present disclosure;

FIG. 11 illustrates a schematic structural diagram of one embodiment of the content analysis unit of the device for associating content with a tag of the present disclosure;

FIG. 12 illustrates a schematic structural diagram of one embodiment of the label matching unit of the device for associating content with a tag of the present disclosure;

FIG. 13 illustrates a schematic structural diagram of another embodiment of the device for associating content with a tag of the present disclosure;

FIG. 14 illustrates a schematic structural diagram of one embodiment of an electronic device of the present disclosure.

In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.

Detailed Description

The principles and spirit of the present disclosure will be described below with reference to several exemplary embodiments. It should be understood that these embodiments are presented merely to enable one skilled in the art to better understand and practice the present disclosure and are not intended to limit the scope of the present disclosure in any way.

It should be noted that, although the terms "first", "second", etc. are used herein to describe various modules, steps, data, etc. of the embodiments of the present disclosure, the terms "first", "second", etc. are merely for distinguishing between different modules, steps, data, etc. and not to indicate a particular order or importance. Indeed, the expressions "first", "second", etc. may be used entirely interchangeably.

Embodiments of the present disclosure may be applicable to electronic devices such as terminal devices, computer systems, and servers, which may operate with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with electronic devices, such as terminal devices, computer systems, and servers, include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, small computer systems, mainframe computer systems, and distributed cloud computing technology environments that include any of the foregoing, and the like.

Electronic devices such as terminal devices, computer systems, and servers may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc., that perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computing system storage media including memory storage devices.

In the mobile internet era, many companies use content as a traffic entry point and monetize it through transactions. A content-and-transaction platform therefore generally contains various UGC, OGC, and PGC content, as well as various goods, services, and transactions. On some travel websites, content takes many forms, such as notes, travel notes, guides, and questions and answers, and commodity services also come in many types, such as independent-travel products, hotels, and air tickets. On such a large platform, how to achieve good mutual conversion between different types of content, and between content and goods, becomes an important issue.

Because content types are numerous and belong to different business lines, they differ considerably. Each business line operates on only one type of content, and the business lines have no unified way of describing content. Manually labeling content is inefficient, and as the number of operators grows, subjective operation labels keep multiplying. It is hard to label all content manually, and hard to give any one piece of content enough labels. To solve these problems, the embodiments of the disclosure provide a scheme for associating content with tags.

A first aspect of embodiments of the present disclosure provides a method for associating content with a tag. FIG. 1 is a flow chart of one embodiment of this method. As shown in FIG. 1, the method of this embodiment includes steps 100 to 300, which are described in detail below with reference to FIG. 1.

Step 100, determining a label to be associated. In this embodiment, the label to be associated may be a manually set label intended to attract users; the title of a note, travel note, guide, or question and answer; or a label that names a product or briefly describes what the product does. This embodiment is not limited in this respect.

Step 200, determining the circled content in the data content according to the semantic matching result of the label to be associated and the word labels with which the data content is marked. A word label may be a label stored in a label database for labeling data content, and may take the form of a single character, or of a word or phrase with a definite meaning.

In this embodiment, the word labels stored in the label database may be classified level by level according to category before being stored. For example, in the label database of a travel website, the first-level word labels may include "travel time", "travel crowd", "travel mode", "travel preparation", "food", "accommodation", "traffic", "travel scene", "shopping", "entertainment items", and the like; second-level word labels such as "dining time", "dining service", and "restaurant" may be included under the first-level word label "food"; and third-level word labels such as "Michelin", "Chinese restaurant", "theme restaurant", "roadside stall", and "tea restaurant" may be included under the second-level word label "restaurant".
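The level-by-level storage described above can be pictured as a tree. The following is a minimal sketch, assuming a nested dictionary as the in-memory form of the label library; the structure, the function name `find_path`, and the English renderings of the labels are assumptions for illustration only.

```python
# Hypothetical in-memory form of a multi-level word label library.
# Each key is a word label; its value holds the labels one level below it.
LABEL_TREE = {
    "food": {
        "dining time": {},
        "dining service": {},
        "restaurant": {
            "Michelin": {},
            "Chinese restaurant": {},
            "theme restaurant": {},
            "roadside stall": {},
            "tea restaurant": {},
        },
    },
    "accommodation": {},
    "traffic": {},
}


def find_path(tree, target, prefix=()):
    """Return the chain of labels from the first level down to `target`, or None."""
    for label, children in tree.items():
        path = prefix + (label,)
        if label == target:
            return path
        found = find_path(children, target, path)
        if found is not None:
            return found
    return None


path = find_path(LABEL_TREE, "Michelin")
print(" > ".join(path))
```

Looking up the full path of a label also yields its upper-level labels, which is useful when a matched label should be reported together with the levels above it.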

The word labels at all levels are travel-related words. From coarse to fine granularity, they run from broad categories to subcategories, and then to topics, entity words, and so on. The word labels in the label library can describe content at a fine granularity, so that different types of data content, such as notes, travel notes, guides, and questions and answers, are converted into a representation composed of multi-level labels. Because the word labels are managed by classification level, the corresponding word labels are easier to find when operating on the label library. When a label to be associated is matched against the word labels, matching is performed indiscriminately against all word labels in the label library, and the word labels whose similarity to the label to be associated falls within a preset threshold range are found among them. This avoids having to classify the label to be associated according to the classification levels of the word labels, and improves the efficiency of associating word labels with the label to be associated.

Using the same word labels in this way, different types of data content can be labeled, which effectively avoids the problem that data content on different business lines is difficult to unify and relate. For example, the word label "adult", which names a destination, can be used to label notes, travel notes, guides, and questions and answers about food, entertainment, and the like at that destination, and can also be used to label independent-travel products on a travel website with that destination, as well as air ticket and train ticket orders with that destination.

The same data content may be marked with one or more word labels. The purpose of labeling the data content with the established unified word labels is to unify the various types of data content through those labels.

Because a label to be associated is associated with data content through word labels, the data content associated with the same label to be associated can come from different business lines. This increases the variety of content types associated with that label and improves the utilization of the data content. Users can obtain more types of data content through the same label to be associated, which improves the user experience.

Step 300, associating the circled content with the label to be associated.

After semantic analysis, the label to be associated is matched against the word labels in the label library, which determines the word labels that represent the semantics of the label to be associated. The data content marked by those word labels can then be determined. With the established word labels as a medium, the label to be associated is associated with the data content, establishing the association relationship between them.

In the method for associating content with a tag provided by this embodiment, the word labels used for labeling data content are stored in the label library, and all types of data content can be labeled with the word labels in the library, so the same word label can mark different types of data content. The label to be associated is then associated with the corresponding data content through the word labels, which increases the diversity of the data content associated with that label. This reduces labor cost and improves the efficiency and accuracy with which labels to be associated are matched to data content.
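Steps 100 to 300 can be sketched end to end as follows. This is a simplified illustration, not the disclosed implementation: the semantic matching of step 200 is replaced by a set of matched word labels supplied directly, and all names and sample data are hypothetical.

```python
# Hypothetical data: each content item carries the word labels it was marked with.
CONTENT = [
    {"id": "note-1", "type": "travel note", "labels": {"Zhangzhou", "food"}},
    {"id": "qa-7", "type": "question and answer", "labels": {"Zhangzhou", "traffic"}},
    {"id": "prod-3", "type": "product", "labels": {"Shanghai", "accommodation"}},
]


def circle_content(matched_labels, content):
    """Step 200 (simplified): select content marked with any matched word label."""
    return [item for item in content if item["labels"] & matched_labels]


def associate(tag, matched_labels, content, associations):
    """Step 300: record the circled content under the label to be associated."""
    circled = circle_content(matched_labels, content)
    associations[tag] = [item["id"] for item in circled]
    return associations


# Step 100: the label to be associated is given. Its matched word labels are
# passed in directly here, standing in for the semantic matching of step 200.
associations = associate("Zhangzhou trip ideas", {"Zhangzhou"}, CONTENT, {})
print(associations)
```

Because the word label is the only link between the label to be associated and the content, items of different types (here a travel note and a question and answer) end up under the same label.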

In one example, data content such as a travel note may be labeled with hierarchical word labels. Consider the following travel note: "Zhangzhou ancient city gives off a strong old-time flavor; once you have been to Zhangzhou, Fujian, you can pretend you have been to Taiwan. It has large red-brick old houses, arcade-style shopfronts, and buildings that combine Chinese and Western styles, and it fully preserves the memorial archways left from the Ming and Qing dynasties. The ancient city has a rich cultural atmosphere and dozens of old-time southern Fujian snacks, such as Zhangzhou braised noodles, triangular rice cake, four-fruit soup, and Zhangzhou fruit juice; many Taiwanese snacks originate from southern Fujian. Zhangzhou ancient city really can meet all your needs in one stop, and the low prices let you eat your fill for 100 yuan, so it is also a paradise for food lovers!" The label system of this embodiment is applied to such text as follows. First, word segmentation is performed on the data content; semantic analysis, position importance analysis, and the like are then performed on the segmented content to extract its keywords; and the keywords are matched against and screened with all word labels to determine the word labels for labeling the data content. Since the word labels are managed by classification level, the word labels determined after matching may also include the upper-level word labels of the matched word labels. For the travel note above, the word labels finally determined by matching the keywords against the word labels are "food", "ancient town", "Fujian", "Zhangzhou", "four-fruit soup", "Zhangzhou braised noodles", "triangular rice cake", and so on. When the final word labels are determined, the upper-level word label of each of these may be added, so that the finally extracted word labels are: in-trip activity: food; scenery: ancient town; destination: Fujian; destination: Zhangzhou; topic: tp_548; topic: tp_153; POI: Zhangzhou ancient city; entity word: four-fruit soup; entity word: Zhangzhou braised noodles; entity word: triangular rice cake; and so on. This solves the problem that different types of data content are difficult to describe with unified labels.

In some embodiments, FIG. 2 shows a flow diagram of another embodiment of the method for associating content with a tag of the present disclosure. As shown in FIG. 2, the method further comprises steps 400 to 600 before step 200, wherein:

Step 400, analyzing the data content to obtain keywords matching the data content.

After the data content is submitted to the server, it can be analyzed in real time, so that the word labels matching the data content are found in the label database and the data content is labeled. The data content can then be found promptly through the label to be associated or the operation label with which it is associated, which improves the utilization of the data content.

In one example, the result of parsing the data content may be a keyword that can summarize the data content or the product.

Step 500, selecting word labels matched with the keywords from all word labels.

After the keywords of the data content are acquired, similarity matching is performed between the keywords and the words or phrases of the word labels, and the words or phrases identical to the keywords are selected from all word labels. Keywords and word labels can also be matched by setting a similarity threshold. For example, word labels whose similarity to a keyword reaches 0.8 may be used for labeling, while word labels whose similarity is below the preset threshold are not used to label the data content. The similarity between a keyword and a word label may be determined from the number of characters they have in common. For example, suppose the keyword is the five-character Chinese name of Mount Everest ("珠穆朗玛峰") and the similar word found is its two-character abbreviation ("珠峰"). The number of characters of the keyword that also appear in the word label is same = 2, the number of differing characters is diff = 3, and the length of the keyword is lena = 5. The similarity between the keyword and the word label may then be calculated as same / lena = 2 / 5 = 0.4. This embodiment does not limit how the similarity between a keyword and a word label is determined. Matching the word labels used for labeling through the keywords of the data content improves the efficiency of matching word labels to data content.
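The character-overlap similarity described above can be sketched as follows. This is one possible reading of the same / lena formula, not a definitive implementation; the function names are hypothetical, and the Chinese strings are an assumed reconstruction of the five-character keyword and two-character word label in the example.

```python
def char_overlap_similarity(keyword, label):
    """Share of the keyword's characters that also occur in the label (same / lena)."""
    if not keyword:
        return 0.0
    label_chars = set(label)
    same = sum(1 for ch in keyword if ch in label_chars)
    return same / len(keyword)


def labels_above_threshold(keyword, labels, threshold=0.8):
    """Keep the word labels whose similarity to the keyword reaches the threshold."""
    return [
        label for label in labels
        if char_overlap_similarity(keyword, label) >= threshold
    ]


# Assumed original strings: a 5-character keyword and its 2-character abbreviation.
score = char_overlap_similarity("珠穆朗玛峰", "珠峰")
print(score)
```

With a threshold of 0.8, the abbreviation in this example would not be used for labeling, since its score of 2 / 5 falls below the threshold. Note that the measure is asymmetric because it divides by the keyword length only.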

Step 600, annotating the data content with the selected word label.

The word labels selected for labeling the data content are obtained based on the similarity between the keywords of the data content and the word labels, so they can adequately summarize the data content. Moreover, data content of the same type and data content of different types may be labeled with the same word label, which increases the variety of data content types that a word label can mark.

In some embodiments, FIG. 3 shows a flow diagram of another embodiment of the method for associating content with a tag of the present disclosure. As shown in FIG. 3, step 200 includes steps 201 to 204, wherein:

Step 201, performing semantic analysis on the label to be associated to obtain semantic segmentation.

Labels to be associated are added manually by operators in order to promote the corresponding data content, so they are somewhat subjective, and the subjective factors become more pronounced as the number of operators grows. To match accurate data content to a label to be associated without being influenced by subjective factors, this embodiment performs semantic analysis on the label to be associated and determines its semantic segmentation. Semantic analysis unifies the meaning expressed by labels to be associated. For example, for the labels to be associated "Shanghai sign" and "Oriental Pearl", semantic analysis yields the unified semantic segmentation "destination: Shanghai" and "scenic spot: Oriental Pearl Tower".

In one example, semantic analysis of labels to be associated may be performed by a trained neural network. Labels to be associated that have been associated with word labels over a period of time are used as training samples for machine learning on operation labels. When a new label to be associated is input, it can then be analyzed to obtain its semantic segmentation, and word labels are automatically recommended according to that segmentation, which greatly simplifies the operators' work when associating labels.

Step 202, determining the word labels whose similarity to the semantic segmentation is greater than or equal to a preset threshold. After the label to be associated is analyzed, similarity matching is performed between the semantic segmentation and the word labels, and the word labels that are identical to the semantic segmentation, or whose similarity to it is greater than or equal to the preset threshold, are selected.
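The threshold-based recommendation of step 202 can be sketched as below. The disclosure does not fix a similarity measure, so this sketch swaps in `difflib.SequenceMatcher` from the Python standard library purely as a stand-in; the function name, the threshold value, and the sample data are assumptions.

```python
from difflib import SequenceMatcher


def recommend_labels(segments, word_labels, threshold=0.8):
    """Return word labels whose best similarity to any segment meets the threshold."""
    recommended = []
    for label in word_labels:
        best = max(
            (SequenceMatcher(None, seg, label).ratio() for seg in segments),
            default=0.0,
        )
        if best >= threshold:
            recommended.append(label)
    return recommended


# Hypothetical output of step 201 and a small hypothetical label library.
segments = ["Shanghai", "Oriental Pearl Tower"]
word_labels = ["Shanghai", "Oriental Pearl Tower", "Zhangzhou", "restaurant"]

recommended = recommend_labels(segments, word_labels)
print(recommended)
```

Identical strings score 1.0 and are always recommended, which mirrors the "identical or above the threshold" rule in the text.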

The semantic segmentation is obtained after the semantic analysis of the labels to be associated, and the word labels for expressing the labels to be associated are determined through the semantic segmentation, so that the accuracy of matching the labels to be associated with the word labels is improved.

Step 203, processing, according to a received operation instruction, the word labels matched on the basis of the label to be associated, wherein the operation instruction includes a selection instruction and/or a logical operation instruction.

The word labels in this embodiment are obtained by matching against the label to be associated, and are then offered to the creator of the label to be associated for selection, which greatly simplifies the operators' work during label association. A selection may indicate the word labels among the recommended ones that are finally kept for use, or the word labels that need to be deleted from the recommended ones. The logical operation instructions may include intersection, union, and difference. Taking two word labels A and B as an example: the intersection circles the data content that is marked with both A and B; the union circles the data content that is marked with A alone, with B alone, or with both A and B; and the difference circles the data content that is marked with A but not B, or with B but not A.
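The three logical operations map directly onto set operations over the content circled by each word label. The following is a minimal sketch with hypothetical content IDs and a hypothetical helper `circled_by`.

```python
# Hypothetical mapping from content ID to the word labels marking that content.
CONTENT_LABELS = {
    "note-1": {"A", "B"},
    "note-2": {"A"},
    "guide-3": {"B"},
    "qa-4": {"C"},
}


def circled_by(label):
    """IDs of all content marked with the given word label."""
    return {cid for cid, labels in CONTENT_LABELS.items() if label in labels}


a, b = circled_by("A"), circled_by("B")

intersection = a & b  # marked with both A and B
union = a | b         # marked with A, with B, or with both
a_not_b = a - b       # marked with A but not B
b_not_a = b - a       # marked with B but not A

print(sorted(intersection), sorted(union), sorted(a_not_b), sorted(b_not_a))
```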

Taking the travel note above as an example, for an operation label such as "foodborne snack gathering", an operator associates data content with the operation label by designating the rule corresponding to the label as the word labels "destination: Fujian" and "in-trip activity: food", and the association between the operation label and the word labels is then saved. This step solves the problem of associating word labels with operation labels, so that technical word labels can be tied to operational business through the system and put to specific business use.

After the word labels are matched with the label to be associated, operators can operate on the matched word labels to formulate such rules. This reduces the work of associating data content with the same or similar labels to be associated, and improves the efficiency of associating data content with labels to be associated.

And 204, taking the data content marked by the word label obtained after processing as the circled content.

For data content which has been parsed in real time, all types of data content are unified over the same dimensions, including word tags of various levels as well as tags to be associated. In this way, operators can select the desired data content by word tag according to the dimension of the operation topic.

According to the method and the device, the word label matched with the label to be associated is selected and subjected to logical operation, so that the selection of the data content can be more targeted.

Besides selecting word labels matched with the labels to be associated and selecting data contents after logic operation, the method also supports selecting according to conditions such as release time, data content length, picture number and the like, and also supports screening conditions of various combinations of certain word labels, so that required data contents with pertinence can be screened out.
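As a rough sketch of such combined screening, the conditions can be applied as successive filters. The record type `Content`, its fields, and the function `screen` are assumptions made for illustration; the source does not specify a data model.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Content:
    content_id: int
    word_labels: frozenset
    published: date
    length: int          # e.g. number of characters (assumed unit)
    picture_count: int

def screen(items, required_labels=frozenset(), published_after=None,
           min_length=0, min_pictures=0):
    """Keep only the items that satisfy every supplied condition."""
    kept = []
    for item in items:
        if not required_labels <= item.word_labels:
            continue  # a required word label is missing
        if published_after is not None and item.published < published_after:
            continue  # released too early
        if item.length < min_length or item.picture_count < min_pictures:
            continue  # too short or too few pictures
        kept.append(item)
    return kept

items = [
    Content(1, frozenset({"Beijing", "food"}), date(2019, 5, 1), 1200, 8),
    Content(2, frozenset({"Beijing", "food"}), date(2018, 1, 1), 300, 0),
    Content(3, frozenset({"Beijing"}), date(2019, 6, 1), 2000, 12),
]
selected = screen(items, required_labels=frozenset({"Beijing", "food"}),
                  published_after=date(2019, 1, 1), min_pictures=1)
```

Here only the first item passes: the second is too old and has no pictures, and the third lacks the "food" label.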

Fig. 4 shows a flow diagram of another embodiment of a method of tag association content of the present disclosure. As shown in fig. 4, the method of tag-related content of the present disclosure further includes a step 700 of modifying an existing word tag based on keywords matching the data content. Wherein, the modification comprises deleting operation and adding operation.

The word labels stored in the label library for labeling the data content can be modified according to factors such as the writing mode and wording of the data content issued by users. For example, with the development of network technology, people often use internet slang when writing: Shanghai may be called "magic city" and Beijing may be called "blocked city". The word labels in the label library may therefore be changed along with the keywords matched with the data content. The change comprises a deletion operation and an addition operation. A modification operation can also be performed, specifically by deleting a word label in the label library and then adding a new word label.

The word labels in the label library can be changed according to a preset period, for example, every month. They can also be changed according to the usage frequency of the keywords obtained by parsing the data content. For example, if 48000 out of 50000 randomly sampled travel notes use "blocked city" to refer to Beijing, the word label "Beijing" is deleted from the label library and the word label "blocked city" is added; alternatively, the word label "blocked city" is added directly while "Beijing" is retained.
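The frequency-based change can be sketched as follows. The source gives only the example counts, so the 0.9 ratio threshold, the function name `update_tag_library`, and the `keep_old` switch are assumptions made for illustration.

```python
def update_tag_library(library: set[str], old: str, new: str,
                       uses_new: int, sample_size: int,
                       threshold: float = 0.9,   # assumed cut-off ratio
                       keep_old: bool = False) -> set[str]:
    """Return a changed copy of the label library.

    If the share of sampled content using `new` reaches `threshold`,
    `new` is added; `old` is deleted unless `keep_old` is set.
    """
    updated = set(library)
    if sample_size > 0 and uses_new / sample_size >= threshold:
        updated.add(new)              # addition operation
        if not keep_old:
            updated.discard(old)      # deletion operation
    return updated

library = {"Beijing", "Shanghai"}
replaced = update_tag_library(library, "Beijing", "blocked city", 48000, 50000)
retained = update_tag_library(library, "Beijing", "blocked city", 48000, 50000,
                              keep_old=True)
unchanged = update_tag_library(library, "Beijing", "blocked city", 100, 50000)
```

Returning a copy rather than mutating the library in place makes it easy to review a periodic change before it takes effect.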

By changing the word label, the accurate labeling of the word label can be carried out on the data content of various writing habits and writing styles.

Fig. 5 shows a flow diagram of another embodiment of a method of tag association content of the present disclosure. As shown in fig. 5, in the method of associating content with a tag of the present disclosure, when the data content includes image-text content, step 400 may include steps 401-403 and step 500 may include steps 501 and 502, where:

Step 401, disassembling the image-text content to obtain text content and image content. Keyword extraction needs to be performed in different ways for different forms of data content in order to determine the word labels for labeling the image-text content.

Step 402, performing semantic analysis and/or location importance analysis on the text content to determine text keywords.

A model to be trained is trained on travel notes to obtain a semantic model. Each segmented sentence of a travel note is used as input, with "important" and "not important" as the classification labels: a word sequence (a text or a sentence) is input, and the probability that the word sequence belongs to each classification label is output. The words and phrases in the sentence form feature vectors, which are mapped to a middle layer by a linear transformation, and the middle layer is then mapped to the labels. A nonlinear activation function is used when predicting the label, but not in the middle layer. For example, 10000 travel notes are selected as training samples, and each segmented sentence in the samples is manually marked as important or not. The model to be trained is then trained on these samples, learning from the context and semantic content, so that a semantic model is obtained which judges, for an input segmented sentence, whether that sentence is important.
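The forward pass of such a classifier can be sketched in a few lines. This is a toy illustration under stated assumptions: the vocabulary, the weight values, and the function names are hypothetical and untrained, and softmax is assumed as the nonlinear function applied when predicting the label.

```python
import math

def softmax(scores):
    """Nonlinear function applied only at the output (label) layer."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(tokens, vocabulary, hidden_weights, output_weights):
    """Return [P(important), P(not important)] for one segmented sentence."""
    # Feature vector: normalized counts of known vocabulary words.
    counts = [tokens.count(word) for word in vocabulary]
    total = sum(counts)
    features = [c / total if total else 0.0 for c in counts]
    # Middle layer: linear transformation, no activation function.
    hidden = [sum(w * x for w, x in zip(row, features)) for row in hidden_weights]
    # Output layer: linear map to the two labels, then softmax.
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in output_weights]
    return softmax(scores)

vocabulary = ["temple", "ticket", "weather"]        # hypothetical
hidden_weights = [[1.0, 0.5, -1.0], [-0.5, 0.2, 1.0]]  # toy values, untrained
output_weights = [[1.0, -1.0], [-1.0, 1.0]]            # toy values, untrained

p_pos, p_neg = predict(["temple", "ticket", "queue"], vocabulary,
                       hidden_weights, output_weights)
```

In a real system the weights would be learned from the manually marked sentences; the sketch only shows where the linear and nonlinear parts sit.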

The semantic model obtained after training is used to predict, for each segmented sentence, the probability that it is important and the probability that it is unimportant. Each segmented sentence of the travel note to be analyzed is input into the semantic model, which outputs the important probability P_{fast_pos} and the unimportant probability P_{fast_neg} of that sentence. The semantic importance score of the sentence is then calculated from P_{fast_pos} and P_{fast_neg}. The calculation may be a division, a subtraction, or another operation, and is not specifically limited here; for example, the score may be P_{fast_pos} / P_{fast_neg}, or P_{fast_pos} + P_{fast_neg}, and so on.

Since a travel note expresses complete meaning through sentences, determining the semantic importance of the segmented sentences narrows the range from which keywords are extracted: keywords can be extracted from the segmented sentences with higher semantic importance scores.
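A minimal sketch of the ratio form of the score, and of narrowing keyword extraction to the highest-scoring sentences, is given below. The probabilities are made-up sample values, and the small constant `eps` guarding against division by zero is an assumption not found in the source.

```python
def semantic_importance_score(p_pos: float, p_neg: float, eps: float = 1e-9) -> float:
    """Ratio form of the score: P_{fast_pos} / P_{fast_neg}."""
    return p_pos / (p_neg + eps)

# Hypothetical model outputs: sentence -> (P_{fast_pos}, P_{fast_neg}).
predictions = {
    "We arrived at the Forbidden City at nine.": (0.90, 0.10),
    "The weather was fine.": (0.30, 0.70),
    "The roast duck near Qianmen was the highlight.": (0.80, 0.20),
}

ranked = sorted(predictions,
                key=lambda s: semantic_importance_score(*predictions[s]),
                reverse=True)
top_sentences = ranked[:2]  # keywords would be extracted from these only
```

How many sentences to keep is a tuning choice; a score threshold could be used instead of a fixed count.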

A position importance model is obtained by training with the following input features of the data content: the segmented sentence, its semantic importance score, and its position marks, namely the chapter number, the paragraph number, and the sequence number of the segmented sentence. Using this model to predict the importance of the sentences in a travel note yields the important probability P_{xgb_pos} and the unimportant probability P_{xgb_neg}, and P_{xgb_pos} / P_{xgb_neg} (or another calculation, not limited here) is used as the final importance score of the segmented sentence.

In this embodiment, the text keywords obtained after semantic analysis and/or position importance analysis of the text content improve the summarization of the important information of the article, so that the main content of the article is expressed more accurately.

Step 403, extracting features of the image content based on the corpus data in the image corpus database to obtain image feature keywords.

In order to summarize the image content in text and determine the keywords matched with the image content, the pixel features of the picture can be used during image feature extraction, and the objects or scenes in the picture, such as restaurants, lakes, dishes, puppies and the like, can be obtained through an ImageNet pre-trained model. The results for the several pictures of one piece of content are obtained separately to produce text matched with the image content, and keywords are extracted from that text in the manner of step 402.

Step 501, word labels matched with text keywords and image feature keywords are selected respectively. Keywords matching the image-text content are obtained by steps 402 and 403. This embodiment determines the matching word labels for the text content part and the image content part respectively, based on the obtained keywords.

Step 502, the word labels that appear both among the word labels matched with the text keywords and among the word labels matched with the image feature keywords are used as the word labels for labeling the image-text content. In one example, the word labels matched with the text content may be stored in a set a and the word labels matched with the image content in a set b, both sets taking word labels as elements; the word labels in the intersection of set a and set b are then used as the word labels of the image-text content.
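The set-based example of step 502 can be written directly as a set intersection. The label values below are hypothetical.

```python
# Set a: word labels matched with the text keywords (hypothetical values).
set_a = {"Beijing", "food", "temple"}
# Set b: word labels matched with the image feature keywords (hypothetical values).
set_b = {"food", "restaurant", "Beijing"}

# Only labels supported by both the text and the images are kept.
image_text_labels = sorted(set_a & set_b)
```

Requiring agreement between the two sources trades recall for precision: a label found only in the text or only in the pictures is dropped.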

Fig. 6 shows a flow diagram of another embodiment of a method of tag association content of the present disclosure. As shown in fig. 6, the method for associating content with a tag of the present disclosure may further include step 800 and step 900, where step 800 pushes, to a user, the circled content matching the tag to be associated based on the frequency with which the user uses the tag to be associated within a preset time range.

In one example, a user frequently browses, within a certain time period, strategy guides whose tag to be associated is "how to tour Beijing". After a certain frequency is reached, the server can push other types of data content associated with that tag to the user, catering to user preferences and improving the user's experience of the travel platform.
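The frequency condition of step 800 can be sketched as a count of uses inside a sliding time window. The seven-day window, the minimum of three views, and the function name `should_push` are assumptions; the source leaves the preset time range and frequency unspecified.

```python
from datetime import datetime, timedelta

def should_push(view_times, now, window=timedelta(days=7), min_views=3):
    """True if the tag was used at least `min_views` times within `window`."""
    recent = [t for t in view_times if now - window <= t <= now]
    return len(recent) >= min_views

# Hypothetical times at which one user browsed content under one tag.
views = [
    datetime(2019, 8, 1), datetime(2019, 8, 3),
    datetime(2019, 8, 5), datetime(2019, 6, 1),
]
now = datetime(2019, 8, 6)

push = should_push(views, now)                     # three views in the last week
no_push = should_push(views, now, min_views=4)     # the June view is too old
```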

Step 900, delivering the circled content to a content position matched with the label to be associated. The content delivery task is released, the adjusted data content list is recorded, and the data content is sorted in real time according to different service requirements, including sorting by time, sorting by content quality, sorting by the number of likes, favorites and replies, and the like. For the same task, different service lines can output the data content in different orders. There are two output modes: interface integration for technical teams and Excel export for operators. In the technical interface mode, content delivery can be completed automatically, and the content is output and delivered directly to a specific content position.
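Letting each service line choose its own ordering over the same task result can be sketched with a table of sort keys. The field names, the sample values, and the simple sum used for engagement are assumptions for illustration.

```python
from datetime import date

# Hypothetical task result shared by several service lines.
contents = [
    {"id": 1, "published": date(2019, 3, 1), "quality": 0.95,
     "likes": 10, "favorites": 2, "replies": 1},
    {"id": 2, "published": date(2019, 6, 1), "quality": 0.90,
     "likes": 3, "favorites": 0, "replies": 0},
    {"id": 3, "published": date(2019, 1, 1), "quality": 0.50,
     "likes": 40, "favorites": 9, "replies": 6},
]

SORT_KEYS = {
    "time": lambda c: c["published"],
    "quality": lambda c: c["quality"],
    "engagement": lambda c: c["likes"] + c["favorites"] + c["replies"],
}

def order_for(sort_by, items):
    """Return content IDs in descending order of the chosen criterion."""
    return [c["id"] for c in sorted(items, key=SORT_KEYS[sort_by], reverse=True)]

by_time = order_for("time", contents)
by_quality = order_for("quality", contents)
by_engagement = order_for("engagement", contents)
```

The same list yields three different output orders, which matches the idea that one task can serve several service lines.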

The method and the device solve the problems that the various types of data content associated with the tags to be associated cannot be unified and that operation labeling is incomplete and inefficient, and greatly improve the efficiency of content circling and delivery.

The embodiment of the disclosure can also compile statistics on the various content types in the data content associated with the same label to be associated, for example, how many notes and how many questions and answers are included, and how much content each word label covers. A unified pivot analysis is carried out over the labels to be associated, the release time, the length, and the like, so that the overall situation of the selected content can be understood intuitively and it can be evaluated whether the content meets the requirements of the position where it is to be delivered.
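Such statistics can be sketched with simple counters. The record layout and the type names below are hypothetical.

```python
from collections import Counter

# Hypothetical circled content for one label to be associated.
circled = [
    {"type": "note", "labels": ["Beijing", "food"]},
    {"type": "note", "labels": ["Beijing"]},
    {"type": "question_answer", "labels": ["food"]},
]

# How many pieces of each content type were circled.
type_counts = Counter(c["type"] for c in circled)

# How many pieces of content each word label covers.
label_coverage = Counter(label for c in circled for label in set(c["labels"]))
```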

If the selected content is determined to basically match the label to be associated, creation of the task can be confirmed. If not, the temporary task is deleted and circling is carried out again. For a confirmed task, the selection conditions of the task are stored in the background, the service line that created the task and the creator information are recorded, and a task number is generated. The results of the task are updated according to the update frequency set by the creator, which may be a one-off task, an hourly update, a daily update, and the like.

For an established task, operators can intervene in the task content through the content auditing platform, including deleting a piece of content from the task, adjusting the word labels or operation labels of the content, and adjusting the title and body of the content. Meanwhile, updated content is marked as unreviewed so that it does not enter the subsequent stages. The results of a manually adjusted task are stored, and if the same task is needed later, the results can be used directly without intervening again.

Based on the same inventive concept, a second aspect of the present disclosure provides an apparatus for tag-related content, which is used to implement each step in the method for tag-related content according to the first aspect and embodiments.

FIG. 7 illustrates a schematic diagram of one embodiment of an apparatus for tag-related content of the present disclosure. As shown in fig. 7, an apparatus for tag-related content of the present disclosure includes: a tag determination unit 10 for determining a tag to be associated; a content circling unit 20 for determining circled content in the data content according to a semantic matching result of the tag to be associated and the word tags marked on the data content; and a content association unit 30 for associating the circled content to the tag to be associated.

In this embodiment, the label to be associated may be a manually set label for attracting users; a title of a note, travel note, strategy guide, or question and answer; or a label naming a product or briefly describing what the product does. This embodiment is not limited thereto. A word label may be a label stored in a label database for labeling data content, and may take the form of a single character or of a meaningful unit such as a word or phrase. The same data content may be annotated with one or more word labels. The purpose of marking the data content with the established unified data labels is to unify the various types of data content through those labels.

The content association unit 30 associates the data content with the tags to be associated through the word tags, so that sources of the data content associated with the same tag to be associated come from different service lines, the variety of types of the data content associated with the tag to be associated is improved, and the utilization rate of the data content is improved. The user can acquire more types of data content through the same label to be associated, so that the use experience level of the user is improved.

By establishing a tag library to store word tags for marking data contents, each type of data contents can be marked by the word tags in the tag library. So that the same word label can mark different types of data contents. The tags to be associated are further associated with corresponding data contents through word tags, so that the diversity of the data contents associated with the tags to be associated is improved. The efficiency and the accuracy of the data content matching of the tags to be associated are improved.

Fig. 8 is a schematic structural diagram of another embodiment of an apparatus for tag-related content of the present disclosure, and as shown in fig. 8, the apparatus for tag-related content of the present embodiment further includes: a content parsing unit 40, configured to parse the data content to obtain keywords matched with the data content; a tag matching unit 50 for selecting a word tag matching the keyword among all the word tags; the content labeling unit 60 is configured to label the data content with the selected word label.

The content parsing unit 40 may parse the data content uploaded to the server in real time so as to find word tags matching the data content in the tag database and tag the data content. The result obtained by the content analysis unit 40 analyzing the data content may be a keyword that can summarize the data content or the product, so that the data content can be conveniently found in time through the associated label to be associated or the operation label, and the data content utilization rate is improved.

After acquiring the keywords of the data content, the tag matching unit 50 matches the keywords against the words or phrases of the word tags by similarity and selects, from all the word tags, those whose words or phrases are the same as the keywords. Matching of keywords and word tags can also be performed by setting a similarity threshold. For example, word tags whose similarity to a keyword reaches 0.8 may be used for tagging, while word tags whose similarity is below the preset threshold are not used for tagging the data content. The similarity between a keyword and a word tag may be determined based on the number of words they have in common.
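One possible reading of this threshold matching is sketched below. The source says only that similarity depends on the number of shared words, so the specific measure used here (shared words divided by all distinct words, i.e. Jaccard similarity over whitespace-separated tokens) is an assumption, as are the function names and sample labels.

```python
def similarity(keyword: str, word_tag: str) -> float:
    """Assumed measure: shared words divided by all distinct words."""
    a = set(keyword.lower().split())
    b = set(word_tag.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def match_tags(keyword: str, word_tags: list[str], threshold: float = 0.8) -> list[str]:
    """Select the word tags whose similarity reaches the threshold."""
    return [tag for tag in word_tags if similarity(keyword, tag) >= threshold]

tags = ["Beijing roast duck restaurant", "Beijing hotel", "roast duck"]
matched = match_tags("Beijing roast duck restaurant guide", tags)
```

With this measure the first tag shares four of five distinct words with the keyword and is selected, while the other two fall below 0.8. For Chinese text, which has no whitespace between words, a word segmenter or character-level comparison would be needed instead of `split()`.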

The content tagging unit 60 selects word tags for tagging data content based on similarity of keywords of the data content to the word tags. Tags to label the data content may adequately summarize the data content. And, the same type of data content and different types of data content may be labeled with the same word label. The variety of the content types of the word label marking data can be improved.

Fig. 9 is a schematic diagram showing the structure of an embodiment of a content circling unit of the apparatus for tag-related content of the present disclosure. As shown in fig. 9, the content circling unit 20 includes: a tag analysis module 21 for carrying out semantic analysis on the tag to be associated to obtain semantic word segments; a tag recommendation module 22 for determining the word tags whose similarity to the semantic word segments is greater than or equal to a preset threshold; a tag screening module 23 for processing the word tags, based on the tag to be associated, in accordance with a received operation instruction, where the operation instruction includes a selection instruction and/or a logical operation instruction; and a content determining module 24 for using the data content marked by the word tags obtained after processing as the circled content.

In order to match accurate data content with the tag to be associated and avoid being influenced by subjective factors, the tag analysis module 21 of the embodiment performs semantic analysis on the tag to be associated and determines semantic segmentation. After semantic analysis, the meaning expressed by the labels to be associated can be unified.

The tag recommendation module 22 takes the semantic word segments obtained by analyzing the tag to be associated, matches them against the word tags by similarity, and selects the word tags that are the same as the semantic word segments or whose similarity is greater than or equal to a preset threshold. Because the semantic word segments are obtained by semantic analysis of the tag to be associated, and the word tags expressing that tag are determined through the semantic word segments, the accuracy of matching the tag to be associated with word tags is improved.

The word labels selected by the label screening module 23 may be the word labels finally retained for use from among the recommended word labels, or they may be the word labels that need to be deleted from the recommended word labels. The logical operation instructions may include intersection, union, and difference. Taking two word labels A and B as an example: the intersection is the circled data content marked with both A and B; the union is the circled data content marked with A alone, with B alone, or with both A and B; the difference is the circled data content marked with A but not B, or marked with B but not A.

For data content that has been parsed in real time, the content determination module 24 unifies all types of data content over the same dimensions, including word tags of various levels as well as tags to be associated. In this way, operators can select the desired data content by word tag according to the dimension of the operation topic.

FIG. 10 illustrates a schematic diagram of another embodiment of an apparatus of the present disclosure tag associated content; as shown in fig. 10, the apparatus for tag-related content of the present embodiment further includes: the tag changing unit 70 is configured to change an existing word tag based on a keyword matched with the data content, where the change includes a deletion operation and an addition operation.

In this embodiment, the word labels stored in the label library for labeling the data content may be modified according to factors such as the writing mode and wording of the data content issued by users. The change includes a deletion operation and an addition operation; a modification operation can also be performed, specifically by deleting a word label in the label library and then adding a new word label. The change can be performed according to a preset period, for example, changing the word labels in the label library every month; it can also be performed according to the usage frequency of the keywords obtained by parsing the data content. By changing the word labels, data content written with various writing habits and styles can be labeled accurately.

In some embodiments, the data content includes image-text content, and fig. 11 shows a schematic structural diagram of an embodiment of a content parsing unit of the apparatus for tag-related content of the present disclosure. As shown in fig. 11, the content parsing unit 40 includes: an image-text content disassembling module 41 for disassembling the image-text content to obtain text content and image content; a text keyword obtaining module 42 for performing semantic analysis and/or position importance analysis on the text content to determine text keywords; and an image feature extraction module 43 for performing feature extraction on the image content based on corpus data in the image corpus database to obtain image feature keywords.

In order to ensure the accuracy and completeness of keyword extraction from the data content, keyword extraction needs to be performed in different ways for different forms of data content so as to determine the word labels for labeling the image-text content. A model to be trained is trained on travel notes to obtain a trained semantic model. The trained semantic model is used to perform semantic analysis and/or position importance analysis on the text content and to determine the text keywords. Each segmented sentence of a travel note is used as input, with "important" and "not important" as the classification labels: a word sequence (a text or a sentence) is input, and the probability that the word sequence belongs to each classification label is output. The words and phrases in the sentence form feature vectors, which are mapped to a middle layer by a linear transformation, and the middle layer is then mapped to the labels. A nonlinear activation function is used when predicting the label, but not in the middle layer. In this embodiment, the text keywords obtained after semantic analysis and/or position importance analysis of the text content improve the summarization of the important information of the article, so that the main content of the article is expressed more accurately.

In order to summarize the image content in text and determine the keywords matched with the image content, the pixel features of the picture can be used during image feature extraction, and the objects or scenes in the picture, such as restaurants, lakes, dishes, puppies and the like, can be obtained through an ImageNet pre-trained model. The results for the several pictures of one piece of content are obtained separately to produce text matched with the image content, and keywords are extracted from that text through the text keyword obtaining module 42.

FIG. 12 is a schematic diagram illustrating a configuration of an embodiment of a tag matching unit of an apparatus for tag-related content of the present disclosure. As shown in fig. 12, the tag matching unit of the present embodiment includes: a tag selection module 51 for selecting the word tags that match the text keywords and the image feature keywords, respectively; and a label combining module 52 for using the word labels common to the word labels matched with the text keywords and the word labels matched with the image feature keywords as the word labels for labeling the image-text content.

The text keyword obtaining module 42 determines text keywords from the data content and the image feature extraction module 43 obtains image feature keywords; based on these keywords, the tag selection module 51 determines the matching word tags for the text content part and the image content part respectively. In one example, the word labels matched with the text content may be stored in a set a and the word labels matched with the image content in a set b, both sets taking word labels as elements; the word labels in the intersection of set a and set b are then used as the word labels of the image-text content.

FIG. 13 illustrates a schematic diagram of another embodiment of an apparatus for tag-related content of the present disclosure. As shown in fig. 13, the apparatus for tag-related content of the present embodiment further includes a content pushing unit 80 for pushing the circled content matching the tag to be associated to the user based on the frequency with which the user uses the tag to be associated within a preset time range. After the user has browsed the same or similar tags to be associated frequently within a certain time period and a certain frequency is reached, the content pushing unit 80 can push other types of data content associated with those tags to the user, catering to user preferences and improving the user's experience of the travel platform.

With continued reference to fig. 13, the apparatus for tag-related content of the present embodiment further includes a content delivery unit 90 for delivering the circled content to a content position matching the tag to be associated. The content delivery task is released, the adjusted data content list is recorded, and the data content is sorted in real time according to different service requirements, including sorting by time, sorting by content quality, sorting by the number of likes, favorites and replies, and the like. For the same task, different service lines can output the data content in different orders. There are two output modes: interface integration for technical teams and Excel export for operators.

The apparatus for tag-related content according to any of the above embodiments solves the problems that the various types of data content associated with the tag to be associated cannot be unified and that operation labeling is incomplete and inefficient, and greatly improves the efficiency of content circling and delivery.

Fig. 14 shows a schematic structural diagram of one embodiment of an electronic device of the present disclosure. Referring now to fig. 14, a schematic diagram of an electronic device suitable for use in implementing a terminal device or server of an embodiment of the present application is shown. As shown in fig. 14, the electronic device includes a processor and a memory. The electronic device may also include input-output means. The memory and the input and output device are connected with the processor through buses. The memory is used for storing instructions executed by the processor; and the processor is used for calling the instructions stored in the memory and executing the method for associating the content with the tag according to the embodiment.

The processor in the embodiment of the disclosure can call the instruction stored in the memory to determine the label to be associated; determining the circled content in the data content according to the semantic matching result of the word label marked by the label to be associated and the data content; and associating the circled content to the label to be associated. The process of executing the tag-related content by the electronic device may refer to the method implementation process of the tag-related content described in the foregoing embodiment, which is not described herein.

The disclosed embodiments also provide a computer-readable storage medium storing computer-executable instructions that, when run on a computer, perform the method of tag-related content related to the above embodiments.

The disclosed embodiments also provide a computer program product containing instructions which, when executed on a computer, cause the computer to perform the method of tag-related content related to the above embodiments.

In one or more alternative implementations, the disclosed embodiments also provide a computer-readable storage medium storing computer-readable instructions that, when executed, cause a computer to perform the method of tag-related content in any of the possible implementations described above. In another alternative example, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.

Although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous.

The methods and apparatus of the present disclosure can be implemented using standard programming techniques with various method steps being performed using rule-based logic or other logic. It should also be noted that the words "apparatus" and "module" as used herein and in the claims are intended to include implementations using one or more lines of software code and/or hardware implementations and/or equipment for receiving inputs.

Any of the steps, operations, or procedures described herein may be performed or implemented using one or more hardware or software modules alone or in combination with other devices. In one embodiment, the software modules are implemented using a computer program product comprising a computer readable medium containing computer program code capable of being executed by a computer processor for performing any or all of the described steps, operations, or programs.

The foregoing description of implementations of the present disclosure has been presented for purposes of illustration and description. The foregoing description is not intended to be exhaustive or to limit the disclosure to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of the disclosure. The embodiments were chosen and described in order to explain the principles of the present disclosure and its practical application to enable one skilled in the art to utilize the present disclosure in various embodiments and with various modifications as are suited to the particular use contemplated.

Claims

1. A method of tag-related content, comprising:

a label determining step of determining a label to be associated;

a content analysis step of analyzing the data content to obtain keywords matched with the data content;

a step of label matching, namely selecting word labels matched with the keywords from all word labels;

a content labeling step, namely labeling the data content by using the selected word label;

a content circling step, namely determining circled content in the data content according to a semantic matching result between the label to be associated and the word labels with which the data content is labeled;

a content association step of associating the circled content with the label to be associated;

a content pushing step of pushing, to a user, the circled content matched with the label to be associated, based on the frequency with which the user uses the label to be associated within a preset time range;

wherein the data content comprises image-text content, the keywords comprise text keywords and image feature keywords, and the content analysis step comprises:

an image-text content disassembling step, namely disassembling the image-text content to obtain text content and image content;

a text keyword obtaining step, namely carrying out semantic analysis and/or position importance analysis on the text content to determine text keywords;

and an image feature extraction step, namely carrying out feature extraction on the image content based on the corpus data in an image corpus database to obtain image feature keywords.
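
By way of non-limiting illustration, the text keyword obtaining, label matching, and content labeling steps of claim 1 might be sketched as follows. The title and body weights, the whitespace tokenization, and all function names are assumptions introduced for this example and do not appear in the disclosure. Image feature keywords are assumed to be supplied by a separate extractor, and the sketch simply pools them with the text keywords.

```python
from __future__ import annotations

from collections import Counter

# Assumed weights for position importance: title words count more than body words.
TITLE_WEIGHT = 3
BODY_WEIGHT = 1


def text_keywords(title: str, body: str, top_n: int = 3) -> list[str]:
    """Return the top_n words ranked by position-weighted frequency."""
    scores = Counter()
    for word in title.lower().split():
        scores[word] += TITLE_WEIGHT
    for word in body.lower().split():
        scores[word] += BODY_WEIGHT
    # Highest score first; ties broken alphabetically so the result is stable.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_n]]


def match_word_labels(keywords: list[str], all_word_labels: set[str]) -> set[str]:
    """Label matching: keep only the keywords that exist as word labels."""
    return {kw for kw in keywords if kw in all_word_labels}


def label_content(title, body, image_keywords, all_word_labels):
    """Content labeling: label one image-text item with the matched word labels."""
    keywords = text_keywords(title, body) + list(image_keywords)
    return match_word_labels(keywords, all_word_labels)
```

A production system would replace the frequency ranking with a real semantic analysis; the sketch only shows how position importance can raise the rank of a keyword.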

2. The method of claim 1, wherein the content circling step comprises:

a label analyzing step, namely carrying out semantic analysis on the label to be associated to obtain a semantic word segmentation;

a label recommending step of determining word labels with similarity to the semantic word segmentation greater than or equal to a preset threshold;

a label screening step, namely performing, according to a received operation instruction and based on the label to be associated, processing on the word labels that matches the operation instruction, wherein the operation instruction comprises a selection instruction and/or a logic operation instruction;

and a content determining step, namely taking the data content labeled with the word labels obtained after the processing as the circled content.
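
By way of non-limiting illustration, the label recommending, screening, and content determining steps of claim 2 might be sketched as follows. The use of `difflib.SequenceMatcher` as the similarity measure, the default threshold of 0.8, the `"and"`/`"or"` encoding of the logic operation instruction, and all function names are assumptions made for this example; the disclosure does not prescribe a particular similarity measure. The semantic word segmentation is assumed to be already available as a list of tokens.

```python
from difflib import SequenceMatcher


def recommend_labels(tokens, word_labels, threshold=0.8):
    """Return the word labels whose similarity to any token is >= threshold.

    SequenceMatcher is a string-similarity stand-in for semantic similarity.
    """
    return {
        label
        for label in word_labels
        if any(SequenceMatcher(None, token, label).ratio() >= threshold
               for token in tokens)
    }


def circle_content(labelled_content, selected_labels, logic="or"):
    """Return the ids of content items that satisfy the logic operation.

    labelled_content maps a content id to the set of word labels on that item.
    logic="and" requires every selected label; logic="or" requires at least one.
    """
    if not selected_labels:
        return set()
    combine = all if logic == "and" else any
    return {
        content_id
        for content_id, labels in labelled_content.items()
        if combine(label in labels for label in selected_labels)
    }
```

Here an operator would first review the output of `recommend_labels`, keep some of the labels (the selection instruction), and then call `circle_content` with the chosen logic operation.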

3. The method according to claim 1 or 2, wherein the method further comprises:

and a label changing step, namely changing the existing word labels based on the keywords matched with the data content, wherein the changing comprises a deleting operation and an adding operation.

4. The method of claim 1, wherein the label matching step comprises:

a label selection step of selecting word labels matched with the text keywords and the image feature keywords respectively;

and a label combining step, namely taking the word labels that are common to the word labels matched with the text keywords and the word labels matched with the image feature keywords as the word labels for labeling the image-text content.
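
By way of non-limiting illustration, the label selection and label combining steps of claim 4 amount to a set intersection and might be sketched as follows. The function names and the exact-match rule between keywords and word labels are assumptions made for this example.

```python
def select_labels(keywords, all_word_labels):
    """Label selection: the word labels that match one of the given keywords."""
    return set(all_word_labels) & set(keywords)


def labels_for_image_text(text_keywords, image_keywords, all_word_labels):
    """Label combining: keep only the labels matched by both text and image."""
    text_labels = select_labels(text_keywords, all_word_labels)
    image_labels = select_labels(image_keywords, all_word_labels)
    return text_labels & image_labels
```

Keeping only the labels supported by both the text and the image trades coverage for precision: a label that appears in only one modality is dropped.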

5. The method of claim 1, wherein the method further comprises:

and a content delivery step of delivering the circled content to a content position matched with the label to be associated.

6. An apparatus for associating content with a tag, comprising:

a label determining unit, configured to determine a label to be associated;

a content analysis unit, configured to analyze data content to obtain keywords matched with the data content;

a label matching unit, configured to select, from all word labels, the word labels matched with the keywords;

a content labeling unit, configured to label the data content with the selected word labels;

a content circling unit, configured to determine circled content in the data content according to a semantic matching result between the label to be associated and the word labels with which the data content is labeled;

a content association unit, configured to associate the circled content with the label to be associated;

a content pushing unit, configured to push, to a user, the circled content matched with the label to be associated, based on the frequency with which the user uses the label to be associated within a preset time range;

the data content comprises image-text content, the keywords comprise text keywords and image feature keywords, and the content analysis unit analyzes the data content in the following manner to obtain keywords matched with the data content:

7. An electronic device, comprising:

a memory for storing a computer program;

a processor for executing the computer program stored in the memory, wherein the computer program, when executed, implements the method of any one of claims 1-5.

8. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method of any of claims 1-2.