CN105893349B - Classification tag match mapping method and device - Google Patents

Info

Publication number
CN105893349B
Authority
CN
China
Prior art keywords
label
similarity
target
information
source classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610195707.1A
Other languages
Chinese (zh)
Other versions
CN105893349A (en
Inventor
方庆安
范羽
崔世起
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sina Technology China Co Ltd
Original Assignee
Sina Technology China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sina Technology China Co Ltd
Priority to CN201610195707.1A
Publication of CN105893349A
Application granted
Publication of CN105893349B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The embodiments of the present invention provide a classification label matching and mapping method and device. The method comprises: obtaining the label information of source classification labels and the label information of target classification labels; determining, according to the label character strings, the literal similarity between each source classification label and each target classification label; obtaining vectorization information of the labels according to the label information and, in combination with the label path information, determining the semantic similarity between each source classification label and each target classification label; determining, according to the label path information, the structural similarity between each source classification label and each target classification label; and selecting, according to at least one of the literal similarity, semantic similarity, and structural similarity between each source classification label and each target classification label, the source classification labels and target classification labels whose similarity meets a set condition, and establishing mapping relations. The method achieves fast and accurate label similarity matching and label mapping, matches and maps efficiently, requires no manual participation, and saves manpower, material, and financial resources.

Description

Classification tag match mapping method and device
Technical field
The present invention relates to the field of Internet data processing technology, and in particular to a classification label matching and mapping method and device for a data management platform (Data Management Platform, DMP).
Background
In the big data era, the data management platform (DMP) has become an indispensable part of Internet advertising and personalized recommendation. It is mainly used to store user browsing behavior, user interests, product attributes, and similar data in order to provide better personalized services. However, because DMP technology is complex, many websites and enterprises that need to process such user data hand it to a third-party data management platform for processing, which makes the data easier to use.
A third-party data management platform therefore receives user data from different websites and enterprises and provides a unified data processing service. Because the user data comes from different websites and enterprises, the labels may differ even for user data of the same nature or category, so label normalization becomes a problem to be solved. When a third-party data management platform receives user data from an enterprise or website, it processes the data and maps it uniformly into the same classification system in order to provide more accurate services.
Existing solutions for label normalization include:
1) mapping labels by literal similarity or synonym expansion;
2) given two classification tree structures, mapping the labels manually one by one.
The existing label normalization solutions have the following problems:
1) Mapping labels by literal similarity or a synonym table gives relatively low recall and, because semantic information is not considered, can produce matching errors. For example, when the two "apple" labels in "mobile phone brand-apple" and "fruit-apple" are mapped to each other, an error occurs.
2) Manual mapping consumes manpower. For example, for two label trees of 1000 nodes each, 1,000,000 (1000 × 1000) label pairs must be mapped manually.
It can be seen that existing label normalization solutions are prone to matching errors, the accuracy of matching and mapping is low, and the process is time-consuming and laborious, so matching and mapping are slow and inefficient.
Summary of the invention
The embodiments of the present invention provide a classification label matching and mapping method and device to solve the problems of the prior art, in which label normalization has low matching accuracy, is time-consuming and laborious, and matches and maps slowly and inefficiently. The method and device achieve fast and accurate label similarity matching and label mapping and save manpower, material, and financial resources.
In one aspect, an embodiment of the present invention provides a classification label matching and mapping method, comprising:
obtaining the label information of source classification labels and the label information of target classification labels;
determining, according to the label character strings included in the label information, the literal similarity between each source classification label and each target classification label;
obtaining vectorization information of the labels according to the label information, and determining, according to the vectorization information of the labels and the label path information included in the label information, the semantic similarity between each source classification label and each target classification label;
determining, according to the label path information included in the label information and in combination with the literal similarity and the semantic similarity, the structural similarity between each source classification label and each target classification label;
selecting, according to at least one of the literal similarity, the semantic similarity, and the structural similarity between each source classification label and each target classification label, the source classification labels and target classification labels whose similarity meets a set condition, and establishing mapping relations.
In some optional embodiments, the literal similarity between a source classification label and a target classification label is determined in at least one of the following ways:
determining the literal similarity of the two labels according to whether the label character strings included in the label information are identical or similar;
determining the literal similarity of the two labels according to whether the segmented words in the label character strings included in the label information are synonyms;
determining the literal similarity of the two labels according to the proportion of the label character string prefixes included in the label information that is similar;
calculating the N-gram similarity of the two label character strings to obtain the literal similarity of the two labels;
calculating the edit distance similarity of the two labels to obtain the literal similarity of the two labels;
calculating the longest common subsequence (LCS) similarity of the two labels according to the longest common subsequence of the label character strings included in the label information.
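As an illustration of two of the literal similarity options listed above, the following minimal Python sketch computes a character N-gram similarity and a prefix similarity. The patent does not give formulas for either, so the Dice coefficient over N-gram sets, the normalization by the longer string, and all function names here are assumptions made for the example.

```python
def char_ngrams(text: str, n: int = 2) -> set:
    """Return the set of character n-grams of text (the whole string if it is shorter than n)."""
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over the two n-gram sets, in [0, 1] (assumed formula)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def prefix_similarity(a: str, b: str) -> float:
    """Length of the shared prefix divided by the length of the longer string (assumed formula)."""
    if not a or not b:
        return 0.0
    shared = 0
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            break
        shared += 1
    return shared / max(len(a), len(b))
```

Both functions return 1.0 for identical non-empty strings and 0.0 when nothing is shared, so their outputs can be compared or combined with the other literal measures on a common scale.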
In some optional embodiments, the semantic similarity between a source classification label and a target classification label is determined in at least one of the following ways:
calculating the Jaccard similarity of the source classification label and the target classification label: obtaining the vectorization information of the source classification label and the vectorization information of the target classification label, and calculating the Jaccard similarity of the two vectors as the semantic similarity;
calculating the cosine similarity of the source classification label and the target classification label: obtaining the vectorization information of the source classification label and the vectorization information of the target classification label, and calculating the cosine similarity of the two vectors as the semantic similarity;
calculating the vector pointwise mutual information (PMI) similarity of the source classification label and the target classification label as the semantic similarity;
calculating the semantic similarity of the source classification label and the target classification label based on their word vectors;
calculating the semantic similarity of the source classification label and the target classification label based on a topic model;
determining the semantic similarity of the source classification label and the target classification label based on a machine learning algorithm.
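The Jaccard and cosine options above can be sketched as follows. The patent does not say how labels are vectorized, so this example assumes a sparse bag-of-words vector (a dict from token to weight) built from a label and its path; the vectors and function names are illustrative only.

```python
import math


def cosine_similarity(u: dict, v: dict) -> float:
    """Cosine of the angle between two sparse vectors given as {dimension: weight}."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard_similarity(u: dict, v: dict) -> float:
    """Jaccard index over the sets of nonzero dimensions of the two vectors."""
    keys_u = {k for k, w in u.items() if w}
    keys_v = {k for k, w in v.items() if w}
    union = keys_u | keys_v
    return len(keys_u & keys_v) / len(union) if union else 0.0


# Hypothetical vectors: each label's tokens plus the tokens of its path.
phone_apple = {"phone": 1.0, "apple": 1.0}
fruit_apple = {"fruit": 1.0, "apple": 1.0}
```

With these assumed vectors the two "apple" labels share only one of three dimensions, so including path tokens in the vector already lowers their semantic similarity even though the label strings are identical.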
In some optional embodiments, the process of determining the structural similarity between a source classification label and a target classification label specifically comprises:
obtaining, according to the label path information of the source classification label and the label path information of the target classification label, the parent node information, child node information, and sibling node information in the label path information; and determining a base similarity according to the literal similarity and the semantic similarity;
calculating, based on the parent node information and according to the base similarity, the ancestor node similarity of the source classification label and the target classification label;
calculating, based on the child node information and according to the base similarity, the descendant node similarity of the source classification label and the target classification label;
calculating, based on the sibling node information and according to the base similarity, the sibling node similarity of the source classification label and the target classification label;
determining the structural similarity of the source classification label and the target classification label from the ancestor node similarity, descendant node similarity, and sibling node similarity, using a set weighting rule or selection rule.
In some optional embodiments, selecting, according to at least one of the literal similarity, the semantic similarity, and the structural similarity between each source classification label and each target classification label, the target classification labels whose similarity meets the set condition, and establishing mapping relations, specifically comprises:
for each source classification label, obtaining a first set number of target classification labels with the greatest literal similarity to the source classification label; obtaining, from the target classification labels thus obtained, a second set number of target classification labels with the greatest semantic similarity to the source classification label, the second set number being smaller than the first set number; and obtaining, from the target classification labels thus obtained, the target classification label with the greatest structural similarity to the source classification label, and establishing a mapping relation; or
for each source classification label, obtaining the target classification label with the greatest structural similarity to the source classification label and establishing a mapping relation; or
obtaining the label pairs whose literal similarity is greater than a first similarity threshold and/or whose semantic similarity is greater than a second similarity threshold, and whose structural similarity is greater than a third similarity threshold, and establishing a mapping relation between the source classification label and the target classification label included in each label pair; or
obtaining the label pairs whose structural similarity is greater than the third similarity threshold, and establishing a mapping relation between the source classification label and the target classification label included in each label pair.
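The first selection option above (narrowing candidates by literal, then semantic, then structural similarity) can be sketched as a three-stage filter. The scoring functions are passed in as parameters; the function name `cascade_match` and the default values of `n1` and `n2` are assumptions for illustration, not values taken from the patent.

```python
from typing import Callable, Optional, Sequence

Score = Callable[[str, str], float]


def cascade_match(
    source: str,
    targets: Sequence[str],
    literal: Score,
    semantic: Score,
    structural: Score,
    n1: int = 10,  # first set number (assumed default)
    n2: int = 3,   # second set number, smaller than n1 (assumed default)
) -> Optional[str]:
    """Return the target label mapped to source, or None if there are no targets."""
    # Stage 1: keep the n1 targets with the greatest literal similarity.
    by_literal = sorted(targets, key=lambda t: literal(source, t), reverse=True)[:n1]
    # Stage 2: of those, keep the n2 targets with the greatest semantic similarity.
    by_semantic = sorted(by_literal, key=lambda t: semantic(source, t), reverse=True)[:n2]
    if not by_semantic:
        return None
    # Stage 3: map to the remaining target with the greatest structural similarity.
    return max(by_semantic, key=lambda t: structural(source, t))
```

One consequence of this ordering is that the cheaper literal comparison prunes the candidate set before the semantic and structural scores are evaluated, so those later scores are computed for at most `n1` and `n2` targets per source label.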
In some optional embodiments, after obtaining the label information of the source classification labels and the label information of the target classification labels, the method further comprises:
performing word segmentation on the obtained label information of the source classification labels and of the target classification labels, and filtering out stop words.
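A minimal sketch of this preprocessing step is shown below. It assumes labels written with whitespace or punctuation between words and uses a small hypothetical stop-word set; Chinese labels would need a dedicated word segmenter, which the patent does not name and which is not shown here.

```python
import re

# Hypothetical stop-word list; a real deployment would supply its own.
STOP_WORDS = frozenset({"and", "of", "the", "other"})


def tokenize_label(label: str, stop_words=STOP_WORDS) -> list:
    """Lowercase the label, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"\w+", label.lower())
    return [token for token in tokens if token not in stop_words]
```

For example, `tokenize_label("Phones and Accessories")` keeps only the two content words, which are then the units compared by the synonym-based and vector-based similarity measures.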
In another aspect, an embodiment of the present invention provides a classification label matching and mapping device, comprising:
an information obtaining module, configured to obtain the label information of source classification labels and the label information of target classification labels;
a first determining module, configured to determine, according to the label character strings included in the label information, the literal similarity between each source classification label and each target classification label;
a second determining module, configured to obtain vectorization information of the labels according to the label information, and to determine, according to the vectorization information of the labels and the label path information included in the label information, the semantic similarity between each source classification label and each target classification label;
a third determining module, configured to determine, according to the label path information included in the label information and in combination with the literal similarity and the semantic similarity, the structural similarity between each source classification label and each target classification label;
a matching and mapping module, configured to select, according to at least one of the literal similarity, the semantic similarity, and the structural similarity between each source classification label and each target classification label, the source classification labels and target classification labels whose similarity meets a set condition, and to establish mapping relations.
In some optional embodiments, the first determining module is specifically configured to determine the literal similarity between a source classification label and a target classification label in at least one of the following ways:
determining the literal similarity of the two labels according to whether the label character strings included in the label information are identical or similar;
determining the literal similarity of the two labels according to whether the segmented words in the label character strings included in the label information are synonyms;
determining the literal similarity of the two labels according to the proportion of the label character string prefixes included in the label information that is similar;
calculating the N-gram similarity of the two label character strings to obtain the literal similarity of the two labels;
calculating the edit distance similarity of the two labels to obtain the literal similarity of the two labels;
calculating the longest common subsequence (LCS) similarity of the two labels according to the longest common subsequence of the label character strings included in the label information.
In some optional embodiments, the second determining module is specifically configured to determine the semantic similarity between a source classification label and a target classification label in at least one of the following ways:
calculating the Jaccard similarity of the source classification label and the target classification label: obtaining the vectorization information of the source classification label and the vectorization information of the target classification label, and calculating the Jaccard similarity of the two vectors as the semantic similarity;
calculating the cosine similarity of the source classification label and the target classification label: obtaining the vectorization information of the source classification label and the vectorization information of the target classification label, and calculating the cosine similarity of the two vectors as the semantic similarity;
calculating the vector pointwise mutual information (PMI) similarity of the source classification label and the target classification label as the semantic similarity;
calculating the semantic similarity of the source classification label and the target classification label based on their word vectors;
calculating the semantic similarity of the source classification label and the target classification label based on a topic model;
determining the semantic similarity of the source classification label and the target classification label based on a machine learning algorithm.
In some optional embodiments, the third determining module is specifically configured to:
obtain, according to the label path information of the source classification label and the label path information of the target classification label, the parent node information, child node information, and sibling node information in the label path information; and determine a base similarity according to the literal similarity and the semantic similarity;
calculate, based on the parent node information and according to the base similarity, the ancestor node similarity of the source classification label and the target classification label;
calculate, based on the child node information and according to the base similarity, the descendant node similarity of the source classification label and the target classification label;
calculate, based on the sibling node information and according to the base similarity, the sibling node similarity of the source classification label and the target classification label;
determine the structural similarity of the source classification label and the target classification label from the ancestor node similarity, descendant node similarity, and sibling node similarity, using a set weighting rule or selection rule.
In some optional embodiments, the matching and mapping module is specifically configured to:
for each source classification label, obtain a first set number of target classification labels with the greatest literal similarity to the source classification label; obtain, from the target classification labels thus obtained, a second set number of target classification labels with the greatest semantic similarity to the source classification label, the second set number being smaller than the first set number; and obtain, from the target classification labels thus obtained, the target classification label with the greatest structural similarity to the source classification label, and establish a mapping relation; or
for each source classification label, obtain the target classification label with the greatest structural similarity to the source classification label and establish a mapping relation; or
obtain the label pairs whose literal similarity is greater than a first similarity threshold and/or whose semantic similarity is greater than a second similarity threshold, and whose structural similarity is greater than a third similarity threshold, and establish a mapping relation between the source classification label and the target classification label included in each label pair; or
obtain the label pairs whose structural similarity is greater than the third similarity threshold, and establish a mapping relation between the source classification label and the target classification label included in each label pair.
In some optional embodiments, the information obtaining module is further configured to:
after obtaining the label information of the source classification labels and the label information of the target classification labels, perform word segmentation on the obtained label information of the source classification labels and of the target classification labels, and filter out stop words.
The above technical solution has the following beneficial effects: the literal similarity, semantic similarity, and structural similarity between the source classification labels and the target classification labels are each determined from the label information of the source classification labels and the label information of the target classification labels, and the three similarities are considered together to select the best-matching source and target classification labels for normalized mapping. This improves the accuracy of matching and mapping and effectively removes ambiguity. In addition, the method performs label matching and mapping automatically without manual processing, which saves time and labor and gives relatively high processing speed and efficiency.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is an example tree diagram of the mapping between source classification labels and target classification labels in Embodiment One of the present invention;
Fig. 2 is a flowchart of the classification label matching and mapping method in Embodiment One of the present invention;
Fig. 3 is a flowchart of the classification label matching and mapping method in Embodiment Two of the present invention;
Fig. 4 is an optional flowchart for determining semantic similarity in an embodiment of the present invention;
Fig. 5 is an optional flowchart for determining structural similarity in an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of the classification label matching and mapping device in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
To solve the prior-art problems that label normalization has low matching accuracy and is time-consuming and laborious, an embodiment of the present invention provides a classification label matching and mapping method. The method performs label normalization automatically and achieves fast and accurate label similarity matching and label mapping. It is described in detail below through specific embodiments.
The reference system framework of classification labels is introduced first. Fig. 1 shows an example tree diagram of the mapping between source classification labels and target classification labels.
Fig. 1 shows the structure of the source classification label system and the structure of the target classification label system as trees. One label system, assumed to be the source classification label system, has two labels under the root node, "Mobile" and "vegetables"; under the "Mobile" label there are two labels, "Iphone" and "XiaoMi"; under the "vegetables" label there is one label, "apple". The other label system, assumed to be the target classification label system, has two labels under the root node, "mobile phone" and "fruit"; under the "mobile phone" label there are two labels, "apple" and "millet"; under the "fruit" label there is one label, "apple".
The output of label normalization mapping is a one-to-one or one-to-many mapping from the labels in the source classification label system onto the labels in the target classification label system. As shown in Fig. 1, "Mobile" is mapped to "mobile phone", "Iphone" under "Mobile" is mapped to "apple" under "mobile phone", "XiaoMi" under "Mobile" is mapped to "millet" under "mobile phone", "apple" under "vegetables" is mapped to "apple" under "fruit", and so on.
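To make the Fig. 1 example concrete, the sketch below encodes the two label trees described above and derives, for each node, the path, parent, child, and sibling information that the later similarity steps rely on. The `LabelNode` class, `build_tree`, and the nested-dict input format are hypothetical choices for this illustration; the patent does not prescribe a data structure.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass(eq=False)
class LabelNode:
    name: str
    path: Tuple[str, ...]  # labels from just below the root down to this node
    parent: Optional["LabelNode"] = field(default=None, repr=False)
    children: List["LabelNode"] = field(default_factory=list, repr=False)

    def siblings(self) -> List["LabelNode"]:
        """Other nodes that share this node's parent."""
        if self.parent is None:
            return []
        return [c for c in self.parent.children if c is not self]


def build_tree(spec: dict, name: str = "root", parent: Optional[LabelNode] = None) -> LabelNode:
    """Build a label tree from a nested dict of {label: {child label: ...}}."""
    path = () if parent is None else parent.path + (name,)
    node = LabelNode(name, path, parent)
    node.children = [build_tree(sub, child, node) for child, sub in spec.items()]
    return node


def iter_nodes(node: LabelNode) -> Iterator[LabelNode]:
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


# The two label systems of Fig. 1 as described in the text.
SOURCE = {"Mobile": {"Iphone": {}, "XiaoMi": {}}, "vegetables": {"apple": {}}}
TARGET = {"mobile phone": {"apple": {}, "millet": {}}, "fruit": {"apple": {}}}
```

Keying nodes by their full path rather than by name keeps the two target "apple" labels distinct, which is what allows path and structure information to separate them later.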
Embodiment one
This embodiment of the present invention provides a classification label similarity matching method. Its flow is shown in Fig. 2 and includes the following steps:
Step S101: obtain the label information of the source classification labels and the label information of the target classification labels.
The label information of each label in the source classification label system and in the target classification label system is obtained. The label information includes at least one of the following: the label character string, the vectorization information of the label, the label path information, and the label node information. The label node information may include one or more of child node information, parent node information, and sibling node information.
Step S102: determine, according to the label character strings included in the label information, the literal similarity between each source classification label and each target classification label.
This step performs the primary, or first-layer, label similarity calculation on the label data. It is based mainly on a literal similarity algorithm for labels and outputs the literal similarity of each source classification label to each target classification label. It is the first-level similarity determination.
The literal similarity between a source classification label and a target classification label is determined in at least one of the following ways:
determining the literal similarity of the two labels according to whether the label character strings included in the label information are identical or similar;
determining the literal similarity of the two labels according to whether the segmented words in the label character strings included in the label information are synonyms;
determining the literal similarity of the two labels according to the proportion of the label character string prefixes included in the label information that is similar;
calculating the N-gram similarity of the two label character strings to obtain the literal similarity of the two labels;
calculating the edit distance similarity of the two labels to obtain the literal similarity of the two labels;
calculating the longest common subsequence (Longest Common Subsequence, LCS) similarity of the two labels according to the longest common subsequence of the label character strings included in the label information.
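The edit distance and LCS options in the list above can be sketched with two standard dynamic programs. The patent does not state how a distance or a subsequence length is turned into a similarity, so dividing by the length of the longer string is an assumption of this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, 1):
        curr = [i]
        for j, ch_b in enumerate(b, 1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """1 minus the edit distance normalized by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if ch_a == ch_b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """LCS length normalized by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else lcs_length(a, b) / longest
```

Both measures are purely character based, so they score the "apple" under "mobile phone" and the "apple" under "fruit" as identical; this is the ambiguity that the semantic and structural steps are meant to resolve.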
Step S103: obtain vectorization information of the labels according to the label information, and determine, according to the vectorization information of the labels and the label path information included in the label information, the semantic similarity between each source classification label and each target classification label.
This step performs the intermediate, or second-layer, label similarity calculation on the label data. It is based mainly on a semantic similarity algorithm for labels and outputs the semantic similarity of each source classification label to each target classification label. It is the second-level similarity determination.
The semantic similarity between a source classification label and a target classification label is determined in at least one of the following ways:
calculating the Jaccard similarity of the source classification label and the target classification label, which specifically comprises: obtaining the vectorization information of the source classification label and the vectorization information of the target classification label, and calculating the Jaccard similarity of the two vectors as the semantic similarity; in general, the Jaccard similarity of the two label vectors can be calculated directly here;
calculating the cosine similarity of the source classification label and the target classification label, which specifically comprises: obtaining the vectorization information of the source classification label and the vectorization information of the target classification label, and calculating the cosine similarity of the two vectors as the semantic similarity; in general, the cosine similarity of the two label vectors can be calculated directly here;
calculating the vector pointwise mutual information (Pointwise Mutual Information, PMI) similarity of the source classification label and the target classification label as the semantic similarity;
calculating the semantic similarity of the source classification label and the target classification label based on their word vectors;
calculating the semantic similarity of the source classification label and the target classification label based on a topic model;
determining the semantic similarity of the source classification label and the target classification label based on a machine learning algorithm.
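Of the options listed above, pointwise mutual information is the one whose definition is least obvious from its name. A minimal sketch follows, using the standard definition PMI(x, y) = log2(p(x, y) / (p(x) p(y))). The patent does not say where the probabilities come from, so the example assumes they are estimated from hypothetical co-occurrence counts over some corpus of records.

```python
import math


def pmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Pointwise mutual information in bits, from co-occurrence counts.

    count_xy: records containing both labels; count_x, count_y: records
    containing each label; total: all records. All counts are assumed inputs.
    """
    if min(count_xy, count_x, count_y, total) <= 0:
        return float("-inf")  # the labels never co-occur (or there is no data)
    # p(x, y) / (p(x) * p(y)) simplifies to count_xy * total / (count_x * count_y).
    return math.log2(count_xy * total / (count_x * count_y))
```

A value near zero means the two labels co-occur about as often as chance would predict, while a large positive value means they appear together far more often than chance, which is the signal used as semantic similarity.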
Step S104: determine, according to the label path information included in the label information and in combination with the literal similarity and the semantic similarity, the structural similarity between each source classification label and each target classification label.
This step performs the advanced, or third-layer, label similarity calculation on the label data. It is based mainly on a structural similarity algorithm for labels and outputs the structural similarity of each source classification label to each target classification label. It is the third-level similarity determination. Optionally, the structural similarity can be determined from at least one of the following similarities: ancestor node similarity, descendant node similarity, and sibling node similarity.
One optional scheme for determining the structural similarity is as follows:
obtaining, according to the label path information of the source classification label and the label path information of the target classification label, the parent node information, child node information, and sibling node information in the label path information; and determining a base similarity according to the literal similarity and the semantic similarity;
calculating, based on the parent node information and according to the base similarity, the ancestor node similarity of the source classification label and the target classification label;
calculating, based on the child node information and according to the base similarity, the descendant node similarity of the source classification label and the target classification label;
calculating, based on the sibling node information and according to the base similarity, the sibling node similarity of the source classification label and the target classification label;
determining the structural similarity of the source classification label and the target classification label from the ancestor node similarity, descendant node similarity, and sibling node similarity, using a set weighting rule or selection rule.
The base similarity may be selected from the semantic similarity and the literal similarity, for example by taking the larger of the two; it may also be obtained by weighting both, for example by multiplying each by its weighting coefficient and summing.
Weighting by parent node similarity means that the greater the ancestor node similarity of a label node pair, the greater the similarity of that pair. Weighting by child node similarity means that the greater the descendant node similarity of a label node pair, the greater the similarity of that pair. Weighting by sibling node similarity means that the greater the sibling node similarity of a label node pair, the greater the similarity of that pair.
In this optional approach, the ancestor node similarity, descendant node similarity, and sibling node similarity are weighted: a weight ratio can be set for each similarity to determine a combined structural similarity, or the largest of the three can be chosen as the structural similarity. When the weight ratios are set, a ratio may be 0. For example, if the weight ratio of the sibling node similarity is 0, the structural similarity of the labels is in effect determined by weighting only the ancestor node similarity and the descendant node similarity.
Alternatively, the largest of the ancestor node similarity, descendant node similarity, and sibling node similarity can be chosen as the structural similarity according to a selection rule.
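The two combination steps described above can be sketched as follows. This is a minimal illustration rather than the patented implementation; the function names, the default weighting coefficients, and the convention that `weights=None` means "apply the selection rule" are all assumptions made for the example.

```python
def base_similarity(literal, semantic, mode="max", w_literal=0.5, w_semantic=0.5):
    """Combine literal and semantic similarity into a base similarity.

    mode="max" takes the larger value; any other mode takes the weighted sum.
    The 0.5/0.5 coefficients are placeholder assumptions.
    """
    if mode == "max":
        return max(literal, semantic)
    return w_literal * literal + w_semantic * semantic


def structural_similarity(ancestor, descendant, sibling, weights=None):
    """Combine ancestor, descendant, and sibling node similarity.

    weights=None applies the selection rule (largest value wins);
    otherwise weights is a 3-tuple of weight ratios for a weighted sum.
    A ratio of 0 simply drops that component.
    """
    scores = (ancestor, descendant, sibling)
    if weights is None:
        return max(scores)
    return sum(w * s for w, s in zip(weights, scores))
```

Calling `structural_similarity(0.8, 0.6, 0.4, weights=(0.5, 0.5, 0.0))` reproduces the case in which the sibling weight ratio is 0, so only the ancestor and descendant similarities contribute.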
Step S105: according to at least one of the literal similarity, semantic similarity, and structural similarity between each source classification label and each target classification label, select the source classification labels and target classification labels that satisfy a set condition, and establish mapping relations.
In this step, qualifying source and target classification labels can be selected by a set selection rule based on one or more of the literal similarity, semantic similarity, and structural similarity. Preferably, the source and target classification labels whose similarity satisfies the set condition are selected according to the structural similarity between each source classification label and each target classification label, or according to the structural similarity combined with at least one of the literal similarity and the semantic similarity, and mapping relations are established.
When label mapping relations are established from the literal similarity, semantic similarity, and structural similarity between a source classification label and each target classification label, the mapping can follow set rules. A rule can define, as needed, a convergence condition for screening out the two labels with the best similarity; when the convergence condition is met, the mapping relation between the two labels is determined. For example, the literal, semantic, and structural similarity can be weighted to find the label pair with the greatest combined similarity, or a screening rule can be set to select the label pair that is largest in one particular similarity. This step is not limited to these approaches; different rules can be set as needed to establish the mapping relations.
When the mapping relations between source classification labels and target classification labels are established, one or more of the following filtering principles may be used to output the label mapping relations:
Label pairs can be filtered according to expert knowledge to obtain qualifying label pairs, for which mapping relations are established;
Label pairs can be filtered according to rules to obtain qualifying label pairs, for which mapping relations are established;
Label pairs can be filtered according to thresholds to obtain qualifying label pairs, for which mapping relations are established;
The label pair with the best similarity can also be chosen as the final mapping output.
In a specific implementation, some optional ways of establishing mapping relations are as follows:
For each source classification label, obtain a first set quantity of target classification labels having the greatest literal similarity to that source classification label; from the target classification labels so obtained, obtain a second set quantity of target classification labels having the greatest semantic similarity to the source classification label, the second set quantity being smaller than the first set quantity; from these, obtain the target classification label with the greatest structural similarity to the source classification label, and establish a mapping relation; or
For each source classification label, obtain the target classification label with the greatest structural similarity to that source classification label and establish a mapping relation; or
Obtain the label pairs whose literal similarity is greater than a first similarity threshold and/or whose semantic similarity is greater than a second similarity threshold, and whose structural similarity is greater than a third similarity threshold, and establish mapping relations for the source classification label and target classification label in each such pair; or
Obtain the label pairs whose structural similarity is greater than the third similarity threshold, and establish mapping relations for the source classification label and target classification label in each such pair.
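The first and last of these options can be sketched as below. The sketch assumes that the three similarities have already been computed and stored in dictionaries keyed by (source, target) pairs; the function names, the parameter names `n1` and `n2` (standing for the first and second set quantities), and their default values are hypothetical.

```python
def cascade_match(source, targets, literal, semantic, structural, n1=10, n2=3):
    """Pick one target label for `source` with the three-stage cascade.

    Stage 1 keeps the n1 targets with the greatest literal similarity,
    stage 2 keeps the n2 of those with the greatest semantic similarity,
    stage 3 returns the survivor with the greatest structural similarity.
    """
    if n2 >= n1:
        raise ValueError("the second set quantity must be smaller than the first")
    stage1 = sorted(targets, key=lambda t: literal[(source, t)], reverse=True)[:n1]
    stage2 = sorted(stage1, key=lambda t: semantic[(source, t)], reverse=True)[:n2]
    return max(stage2, key=lambda t: structural[(source, t)])


def threshold_match(pairs, structural, threshold):
    """Keep every (source, target) pair whose structural similarity exceeds the threshold."""
    return [pair for pair in pairs if structural[pair] > threshold]
```

The cascade narrows the candidate set with the cheaper literal score before the semantic and structural scores are consulted, which is one plausible reason for requiring the second set quantity to be smaller than the first.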
Embodiment two
The flow of the classification label matching and mapping method provided by Embodiment 2 of the present invention is shown in Figure 3 and includes the following steps.
Step S201: obtain the label information of the source classification labels and the label information of the target classification labels.
Step S202: perform word segmentation on the obtained label information of the source classification labels and of the target classification labels, and filter out stop words.
The obtained label information of the source classification labels and of the target classification labels is preprocessed to avoid unnecessary work later and to speed up subsequent processing. Word segmentation breaks a complex phrase structure down to the level of single words; for example, "mobile phone brand" is split into the two words "mobile phone" and "brand". The segmented words are then filtered to remove useless words, that is, junk or meaningless words such as "地" (an adverbial particle) and "我" ("I");
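A minimal sketch of this preprocessing is shown below. The source does not say which segmenter is used, so forward maximum matching against a small dictionary stands in for it here; the dictionary, the stop word list, and the function names are hypothetical.

```python
def segment(text, vocabulary, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary word.

    A single character is emitted as its own token when no dictionary word matches.
    """
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens


def remove_stop_words(tokens, stop_words):
    """Drop every token that appears in the stop word list."""
    return [token for token in tokens if token not in stop_words]


VOCABULARY = {"手机", "品牌"}    # hypothetical dictionary: "mobile phone", "brand"
STOP_WORDS = {"的", "地", "我"}  # hypothetical stop word list

tokens = remove_stop_words(segment("手机的品牌", VOCABULARY), STOP_WORDS)
```

Here the label text is first split into "手机", "的", and "品牌", and the particle "的" is then removed as a stop word.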
Optionally, the node information of each label can also be extracted in this step, for example by determining the parent node information, sibling node information, and child node information of the label. For instance, the child nodes of the label "mobile phone" in Figure 2 include "Apple" and "Xiaomi".
Optionally, other initialization operations can also be carried out in this step, such as loading the word vector dictionary and the topic model.
Step S203: determine the literal similarity of the source classification labels and the target classification labels.
With reference to step S102, determine the literal similarity between each source classification label in the source classification label system and each target classification label in the target classification label system.
Step S204: determine the semantic similarity of the source classification labels and the target classification labels.
With reference to step S103, determine the semantic similarity between each source classification label in the source classification label system and each target classification label in the target classification label system.
Step S205: determine the structural similarity of the source classification labels and the target classification labels.
With reference to step S104, determine the structural similarity between each source classification label in the source classification label system and each target classification label in the target classification label system.
When the structural similarity of the source and target classification labels is determined, a correspondence table of similarities between labels can be built from the base similarities, and the structural similarity of each source and target classification label is then determined from the base similarities in this table. If, while the structural similarity is being calculated, a result satisfying the convergence condition cannot be obtained because the similarity of a parent node, child node, or sibling node does not yet exist, then once a round of structural similarity calculation is finished, the base similarities in the table are updated with the structural similarities already obtained, and the next round of structural similarity calculation is carried out, until a result satisfying the convergence condition is obtained.
For example, as shown in Table 1 below:
Table 1
In Table 1, "/" indicates an unknown value, and a number indicates a base similarity.
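One possible reading of this round-by-round update is sketched below. It is an assumption-heavy illustration: only the parent pair is used as structural context, unknown ("/") entries are stored as `None` and contribute 0 as their own base similarity, and the weights, tolerance, and function name are hypothetical.

```python
def iterate_structural(base, parent_pair, w_self=0.7, w_parent=0.3,
                       tolerance=1e-9, max_rounds=50):
    """Repeat structural similarity rounds until the table stops changing.

    base: dict mapping (source, target) -> base similarity, or None for "/".
    parent_pair: dict mapping a pair to the pair formed by the two parent labels.
    """
    table = dict(base)
    for _ in range(max_rounds):
        new_table = dict(table)
        for pair, own in base.items():
            parent_value = table.get(parent_pair.get(pair))
            if parent_value is None:
                continue  # parent similarity not available yet; retry next round
            own = 0.0 if own is None else own
            new_table[pair] = w_self * own + w_parent * parent_value
        changed = any(
            new_table[p] is not None
            and (table[p] is None or abs(new_table[p] - table[p]) > tolerance)
            for p in new_table
        )
        table = new_table
        if not changed:  # convergence condition: no entry moved in this round
            break
    return table


BASE = {("S0", "T0"): 0.9, ("S1", "T1"): None, ("S2", "T2"): 0.5}
PARENT_PAIR = {("S1", "T1"): ("S0", "T0"), ("S2", "T2"): ("S1", "T1")}
result = iterate_structural(BASE, PARENT_PAIR)
```

In this example the pair (S2, T2) cannot be scored in the first round because its parent pair is still unknown; it is filled in during the second round, after (S1, T1) has been written back to the table.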
Step S206: for each source classification label, perform the following steps:
Step S207: select a first set quantity of target classification labels having the greatest literal similarity to the source classification label.
Step S208: from the target classification labels obtained, obtain a second set quantity of target classification labels having the greatest semantic similarity to the source classification label.
Step S209: from the target classification labels obtained, obtain the target classification label with the greatest structural similarity to the source classification label.
Step S210: establish the mapping relation between the source classification label and the target classification label.
Through the above process, one-to-one or one-to-many mapping relations can be established between labels in the source classification label system and labels in the target classification label system, forming a number of label pairs with mapping relations.
The embodiment of the present invention also provides an optional method for computing a distributed representation of labels that takes into account both label semantics and label structure information. Its flow is shown in Figure 4, and the resulting distributed representation of the labels can be used to determine semantic similarity. The process of obtaining the distributed representation includes the following steps:
Step S301: obtain each label in the source classification system and the target classification system, and represent each label as a vector to obtain the vectorization information of the labels.
This step obtains the input data, namely each label in the two label classification systems, and ultimately produces a vectorized representation of these labels. For example, "mobile phone" is represented as the vector (0.1, 0.3, 0.25, 0.25, 0.1). When label semantic similarity is calculated with, for example, cosine similarity, computing the semantic similarity of two labels reduces to computing the cosine similarity of two vectors.
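The cosine similarity of two label vectors can be computed as in the following sketch. The first vector is the example from the text; the second vector and the handling of all-zero vectors are assumptions made for the illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: an all-zero vector is treated as dissimilar
    return dot / (norm_u * norm_v)


phone = (0.1, 0.3, 0.25, 0.25, 0.1)   # the example vector from the text
handset = (0.2, 0.6, 0.5, 0.5, 0.2)   # hypothetical vector pointing the same way

score = cosine_similarity(phone, handset)
```

Because cosine similarity depends only on direction, the two vectors above score as fully similar even though their magnitudes differ.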
Step S302: load the base word vector dictionary and obtain the base word vector representation of each label.
The dictionary may be obtained in one of the following ways:
Training a word vector model based on a neural network, i.e. the word2vector model;
Obtaining a word vector model based on global word statistics, i.e. Global2Vector;
Obtaining, based on a topic model, the distribution of a word over topics as a vectorized representation; the topic model may be based on one of latent semantic indexing (Latent Semantic Indexing, LSI), probabilistic latent semantic indexing (Probabilistic Latent Semantic Indexing, PLSI), latent Dirichlet allocation (Latent Dirichlet Allocation, LDA), deep learning, and the like.
Step S303: generate the node information of the labels.
According to the label information in the classification labels, obtain all parent nodes of each label node; these can be obtained with a depth-first or breadth-first traversal algorithm. That is, a label node is represented by node information such as "[root node, mobile phone, Apple]".
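A depth-first traversal that records the root-to-node path of every label might look like the sketch below. The tree is assumed to be given as a mapping from each label to its child labels; that input format, the sample tree, and the function name are hypothetical.

```python
def node_paths(children_of, root):
    """Depth-first traversal that records the root-to-node path of every label."""
    paths = {}
    stack = [(root, [root])]
    while stack:
        node, path = stack.pop()
        paths[node] = path
        for child in children_of.get(node, []):
            stack.append((child, path + [child]))
    return paths


# Hypothetical fragment of a category tree.
CHILDREN_OF = {"root node": ["mobile phone"], "mobile phone": ["Apple", "Xiaomi"]}
paths = node_paths(CHILDREN_OF, "root node")
```

Each entry of `paths` lists all ancestors of a label followed by the label itself, which is the node information used by the later steps.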
Step S304: calculate the distributed representation of the labels.
Taking the ancestor-node weighting scheme as an example, the distributed representation of a label is calculated using the following equation:
where X_tag is the vector representation of the target classification label;
p denotes a node in the path information;
V is the base word vector representation of a label;
π is the path information of the label node;
W is the ancestor node weight.
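One form that is consistent with these symbol definitions is a weighted sum over the nodes on the label's path, X_tag = Σ_{p ∈ π} W_p · V_p. The sketch below implements that form; the weighted-sum shape itself, the sample vectors, and the weights are assumptions and are not taken from the source.

```python
def distributed_representation(path, base_vectors, weights):
    """Weighted sum of the base word vectors of the nodes on a label's path.

    path: labels from the topmost ancestor down to the label itself (pi).
    base_vectors: label -> base word vector (V).
    weights: one weight per path node (W), aligned with `path`.
    """
    if len(path) != len(weights):
        raise ValueError("need one weight per path node")
    result = [0.0] * len(base_vectors[path[0]])
    for node, weight in zip(path, weights):
        for i, component in enumerate(base_vectors[node]):
            result[i] += weight * component
    return result


# Hypothetical 3-dimensional base word vectors.
BASE_VECTORS = {
    "mobile phone": [1.0, 0.0, 0.0],
    "fruit": [0.0, 1.0, 0.0],
    "Apple": [0.0, 0.0, 1.0],
}
phone_apple = distributed_representation(["mobile phone", "Apple"], BASE_VECTORS, [0.3, 0.7])
fruit_apple = distributed_representation(["fruit", "Apple"], BASE_VECTORS, [0.3, 0.7])
```

The two "Apple" labels share the same base word vector but end up with different representations because their ancestors differ, which illustrates how path weighting can separate ambiguous labels.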
Step S305: use the distributed representation of the label nodes for semantic similarity calculation.
The above steps finally output a distributed representation of each label node for use in semantic similarity calculation. The advantage of this distributed representation is that it combines semantic similarity and structural similarity in a simple way and can effectively resolve label ambiguity.
An optional implementation flow for determining the structural similarity in the above embodiments is shown in Figure 5 and includes the following steps:
Step S401: obtain the determined literal similarity and semantic similarity, and obtain the base similarity between each source classification label and each target classification label.
See Table 1 above.
Step S402: according to the parent node information of the labels and the base similarity, calculate the ancestor node similarity of the source classification label and the target classification label.
An optional calculation approach is as follows: starting from the label's own node on the label path, trace back toward the root, calculate the similarity of the node labels pairwise, and take the weighted sum. At least one layer of ancestor nodes is traced, and the ancestor node similarity is obtained by weighting the base similarity of at least one pair of ancestor nodes of the source and target classification labels together with the base similarity of the source and target classification labels themselves.
Taking the source classification label S1 and the target classification label T2 as an example, the similarity of the two labels is calculated as follows:
where: Sim(S1, T2) is the similarity between the source classification label S1 and the target classification label T2;
sim(p_s, p_t) is the similarity between the source classification label path node p_s and the target classification label path node p_t in the path information;
w is the weighting coefficient of the base similarity between nodes;
p is a node in the intersection of the source classification label path and the target classification label path;
S1 is the source classification label;
T2 is the target classification label;
π(S1) denotes the path information of the source classification label;
π(T2) denotes the path information of the target classification label;
s is the subscript of a source classification label node, denoting the s-th source classification label node;
t is the subscript of a target classification label node, denoting the t-th target classification label node.
For example, if the paths of two label nodes are <A1, B1, C1> and <A2, D2, C2> respectively, the similarity Sim(C1, C2) of labels C1 and C2 is:
Sim(C1, C2) = 0.7*base_sim(C1, C2) + 0.2*base_sim(B1, D2) + 0.1*base_sim(A1, A2)
where: base_sim(C1, C2) is the base similarity of the label pair (C1, C2);
base_sim(B1, D2) is the base similarity of the label pair (B1, D2);
base_sim(A1, A2) is the base similarity of the label pair (A1, A2).
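The worked example can be reproduced with the sketch below, which pairs path nodes level by level starting from the labels themselves. The base similarity values are made up for illustration, and the function name is hypothetical; only the weights 0.7, 0.2, and 0.1 come from the example in the text.

```python
def ancestor_similarity(source_path, target_path, base_sim, weights):
    """Weighted sum of base similarities of nodes paired from the label upward.

    source_path, target_path: root-to-label paths.
    weights[0] applies to the labels themselves, weights[1] to their parents, etc.
    Pairing stops at the shortest of the two paths and the weight list.
    """
    pairs = zip(reversed(source_path), reversed(target_path), weights)
    return sum(w * base_sim[(s, t)] for s, t, w in pairs)


# Hypothetical base similarities for the three node pairs in the example.
BASE_SIM = {("C1", "C2"): 0.9, ("B1", "D2"): 0.5, ("A1", "A2"): 1.0}

sim_c1_c2 = ancestor_similarity(
    ["A1", "B1", "C1"], ["A2", "D2", "C2"], BASE_SIM, (0.7, 0.2, 0.1)
)
```

With these values the result is 0.7*0.9 + 0.2*0.5 + 0.1*1.0 = 0.83. Giving the largest weight to the labels themselves keeps the ancestors as supporting evidence rather than the deciding factor.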
Step S403: according to the descendant node information of the labels and the base similarity, calculate the descendant node similarity of the source classification label and the target classification label.
An optional calculation approach is as follows: for each descendant node of the source label, calculate its similarity to each descendant node of the target label, take the maximum as that node's similarity to the target descendants, and take the weighted sum.
Taking the source classification label S1 and the target classification label T2 as an example, the similarity of the two labels is calculated as follows:
where: Sim(S1, T2) is the similarity between the source classification label S1 and the target classification label T2;
sim(p_s, p_t) is the similarity between the source classification label path node p_s and the target classification label path node p_t in the path information;
the maximum is taken by traversing each target classification label node;
w is the weighting coefficient of the base similarity between nodes;
p is a node in the set of path nodes from the source classification label node to the root node;
S1 is the source classification label to be solved;
T2 is the target classification label to be solved;
π(S1) denotes the path information of the source classification label;
s is the path subscript of a source classification label node, denoting the s-th source classification label node;
t is the path subscript of a target classification label node, denoting the t-th target classification label node.
For example, if the descendant nodes of two label nodes C1 and C2 are <A1, B1> and <A2, D2> respectively, the similarity of labels C1 and C2 is:
Sim(C1, C2) = 0.7*base_sim(C1, C2) + 0.2*Max(base_sim(A1, A2), base_sim(A1, D2)) + 0.1*Max(base_sim(B1, A2), base_sim(B1, D2))
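This best-match-then-weight calculation can be sketched as follows. The base similarity values and the function name are hypothetical, and treating a source descendant with no target candidates as contributing 0 is an assumption.

```python
def descendant_similarity(source, target, source_desc, target_desc,
                          base_sim, w_self, w_desc):
    """Base similarity of the labels plus weighted best matches of the descendants.

    For each source descendant, the largest base similarity to any target
    descendant is taken, then multiplied by the matching entry of w_desc.
    """
    total = w_self * base_sim[(source, target)]
    for weight, s in zip(w_desc, source_desc):
        best = max((base_sim[(s, t)] for t in target_desc), default=0.0)
        total += weight * best
    return total


# Hypothetical base similarities for the example with descendants <A1, B1> and <A2, D2>.
BASE_SIM = {
    ("C1", "C2"): 0.8,
    ("A1", "A2"): 0.9, ("A1", "D2"): 0.1,
    ("B1", "A2"): 0.2, ("B1", "D2"): 0.6,
}

sim_c1_c2 = descendant_similarity(
    "C1", "C2", ["A1", "B1"], ["A2", "D2"], BASE_SIM, 0.7, [0.2, 0.1]
)
```

With these values the result is 0.7*0.8 + 0.2*0.9 + 0.1*0.6 = 0.80. Passing sibling lists instead of descendant lists gives the analogous sibling node calculation.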
Step S404: according to the sibling node information of the labels and the base similarity, calculate the sibling node similarity of the source classification label and the target classification label.
An optional calculation approach is as follows: for each sibling node of the source label, calculate its similarity to each sibling node of the target label, take the maximum as that node's similarity to the target siblings, and take the weighted sum. The calculation is similar to step S403.
Step S405: according to the ancestor node similarity, descendant node similarity, and sibling node similarity, determine the structural similarity of the source classification label and the target classification label using a set weighting rule or selection rule.
Optional selection rule strategy: choose the largest of the similarity values as the structural similarity.
Optional weighting rule strategy: according to the set weight ratios, take the weighted sum of the ancestor node similarity, descendant node similarity, and sibling node similarity; that is, multiply each of the three by its weight ratio and sum them, or average them after summing, to obtain the structural similarity.
Based on the same inventive concept, an embodiment of the present invention also provides a classification label matching and mapping device. The device can be deployed on a server that performs third-party data processing, or on the data server of another website or enterprise that supplies data to the third-party data processing server. The structure of the device is shown in Figure 6 and comprises: an information obtaining module 101, a first determining module 102, a second determining module 103, a third determining module 104, and a matching mapping module 105.
The information obtaining module 101 is configured to obtain the label information of the source classification labels and the label information of the target classification labels.
The first determining module 103 is configured to determine, according to the label character strings included in the label information, the literal similarity between each source classification label and each target classification label.
The second determining module 104 is configured to obtain the vectorization information of the labels from the label information, and to determine, according to the vectorization information of the labels and the label path information included in the label information, the semantic similarity between each source classification label and each target classification label.
The third determining module 105 is configured to determine, according to the label path information included in the label information and in combination with the literal similarity and the semantic similarity, the structural similarity between each source classification label and each target classification label.
The matching mapping module 106 is configured to select, according to at least one of the literal similarity, semantic similarity, and structural similarity between each source classification label and each target classification label, the source classification labels and target classification labels whose similarity satisfies a set condition, and to establish mapping relations.
Preferably, the first determining module 103 is specifically configured to determine the literal similarity of the source classification label and the target classification label in at least one of the following ways:
Determining the literal similarity of two labels according to whether the label character strings included in the label information are identical or similar;
Determining the literal similarity of two labels according to whether the segmented words in the label character strings included in the label information are synonyms;
Determining the literal similarity of two labels according to the proportion of matching prefix in the label character strings included in the label information;
Calculating the N-gram similarity of two label character strings to obtain the literal similarity of the two labels;
Calculating the edit distance similarity of two labels to obtain the literal similarity of the two labels;
Calculating the longest common substring (LCS) similarity of two labels from the longest common subsequence of the label character strings included in the label information.
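Two of the literal similarity measures listed above can be sketched as follows. The source names the measures but does not fix how they are normalized, so dividing the edit distance by the longer string length and scoring N-grams with Jaccard overlap are assumptions, as are the function names.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def edit_similarity(a, b):
    """Map edit distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def ngram_similarity(a, b, n=2):
    """Jaccard overlap of the character n-gram sets of the two strings."""
    grams_a = {a[i:i + n] for i in range(len(a) - n + 1)}
    grams_b = {b[i:i + n] for i in range(len(b) - n + 1)}
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0
```

Both functions work on any Python strings, so they apply equally to segmented Chinese label text and to the English examples used here.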
Preferably, the second determining module 104 is specifically configured to determine the semantic similarity of the source classification label and the target classification label in at least one of the following ways:
Calculating the Jaccard similarity of the source classification label and the target classification label: obtain the vectorization information of the source classification label and of the target classification label, and calculate the Jaccard similarity of the two vectors as the semantic similarity;
Calculating the cosine similarity of the source classification label and the target classification label: obtain the vectorization information of the source classification label and of the target classification label, and calculate the cosine similarity of the two vectors as the semantic similarity;
Calculating the vector pointwise mutual information similarity of the source classification label and the target classification label as the semantic similarity;
Calculating the semantic similarity of the source classification label and the target classification label based on their word vectors;
Calculating the semantic similarity of the source classification label and the target classification label based on a topic model;
Determining the semantic similarity of the source classification label and the target classification label based on a machine learning algorithm.
Preferably, the third determining module 105 is specifically configured to:
Obtain the parent node information, child node information, and sibling node information from the label path information of the source classification label and of the target classification label, and determine a base similarity from the literal similarity and the semantic similarity;
Based on the parent node information, calculate the ancestor node similarity of the source classification label and the target classification label from the base similarity;
Based on the child node information, calculate the descendant node similarity of the source classification label and the target classification label from the base similarity;
Based on the sibling node information, calculate the sibling node similarity of the source classification label and the target classification label from the base similarity;
According to the ancestor node similarity, descendant node similarity, and sibling node similarity, determine the structural similarity of the source classification label and the target classification label using a set weighting rule or selection rule.
Preferably, the matching mapping module 106 is specifically configured to: for each source classification label, obtain a first set quantity of target classification labels having the greatest literal similarity to that source classification label; from the target classification labels so obtained, obtain a second set quantity of target classification labels having the greatest semantic similarity to the source classification label, the second set quantity being smaller than the first set quantity; from these, obtain the target classification label with the greatest structural similarity to the source classification label, and establish a mapping relation; or
For each source classification label, obtain the target classification label with the greatest structural similarity to that source classification label and establish a mapping relation; or
Obtain the label pairs whose literal similarity is greater than a first similarity threshold and/or whose semantic similarity is greater than a second similarity threshold, and whose structural similarity is greater than a third similarity threshold, and establish mapping relations for the source classification label and target classification label in each such pair; or
Obtain the label pairs whose structural similarity is greater than the third similarity threshold, and establish mapping relations for the source classification label and target classification label in each such pair.
Preferably, the information obtaining module 101 is further configured, after obtaining the label information of the source classification labels and of the target classification labels, to perform word segmentation on the obtained label information and filter out stop words.
The classification label matching and mapping method and device provided by the embodiments of the present invention can, based on ontology alignment techniques, build distributed representations of classification labels and perform normalized mapping. The method is an automated label normalization technique that completes classification label mapping fully automatically. The distributed representation of label semantics can be computed with semantic models such as word vector models and topic models, and classification labels are normalized at the semantic level with ontology alignment techniques based on label semantic similarity, structural similarity, and the like. Through information preprocessing and multi-level label similarity calculation, the method fuses several ontology alignment techniques, including literal similarity, semantic similarity, and structural similarity, to solve for label similarity. Because it takes label structure and semantic information into account, it can effectively remove ambiguity and safeguard accuracy, ultimately yielding more accurate similarity matching results and a better normalized mapping. The method can be automated, which frees up manpower, saves human, material, and financial resources, and improves processing speed and efficiency.
Those skilled in the art will also appreciate that the various illustrative logical blocks, units, and steps listed in the embodiments of the present invention can be implemented by electronic hardware, computer software, or a combination of both. To clearly show the interchangeability of hardware and software, the various illustrative components, units, and steps above have been described generally in terms of their functions. Whether such functions are implemented in hardware or software depends on the specific application and the design requirements of the overall system. Those skilled in the art may implement the described functions in various ways for each specific application, but such implementations should not be understood as going beyond the scope of protection of the embodiments of the present invention.
The various illustrative logical blocks or units described in the embodiments of the present invention can implement or perform the described functions by means of a general-purpose processor, a digital signal processor, an application-specific integrated circuit (ASIC), a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of the above. The general-purpose processor may be a microprocessor; alternatively, it may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, such as a digital signal processor and a microprocessor, multiple microprocessors, one or more microprocessors together with a digital signal processor core, or any other similar configuration.
The steps of the method or algorithm described in the embodiments of the present invention can be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module can be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium in the art. Illustratively, the storage medium can be connected to the processor so that the processor can read information from, and write information to, the storage medium. Optionally, the storage medium can also be integrated into the processor. The processor and the storage medium can be located in an ASIC, and the ASIC can be located in a user terminal. Optionally, the processor and the storage medium can also be located in different components of the user terminal.
In one or more exemplary designs, the functions described in the embodiments of the present invention can be implemented in hardware, software, firmware, or any combination of the three. If implemented in software, these functions can be stored on a computer-readable medium, or transmitted over a computer-readable medium in the form of one or more instructions or code. Computer-readable media include computer storage media and communication media that facilitate transfer of a computer program from one place to another. A storage medium can be any available medium that a general-purpose or special-purpose computer can access. For example, such computer-readable media may include, but are not limited to, RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store program code in the form of instructions or data structures readable by a general-purpose or special-purpose computer or a general-purpose or special-purpose processor. In addition, any connection can properly be termed a computer-readable medium. For example, if software is transmitted from a website, server, or other remote source over a coaxial cable, fiber optic cable, twisted pair, or digital subscriber line (DSL), or by wireless means such as infrared, radio, and microwave, these are also included in the definition of computer-readable medium. Disk and disc, as used here, include compact discs, laser discs, optical discs, DVDs, floppy disks, and Blu-ray discs; disks usually reproduce data magnetically, while discs usually reproduce data optically with lasers. Combinations of the above may also be included in computer-readable media.
The specific embodiments described above further explain the purpose, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the foregoing is only a specific embodiment of the present invention and is not intended to limit its scope of protection; any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall be included within its scope of protection.

Claims (6)

1. A classification tag match mapping method, characterized by comprising:
Acquiring label information of source classification labels and label information of target classification labels, performing word segmentation on the acquired label information of the source classification labels and of the target classification labels, and filtering out stop words;
Determining, according to the label character strings included in the label information, the literal similarity between each source classification label and each target classification label;
Obtaining vectorized information of the labels according to the label information, and determining, according to the vectorized information of the labels and the label path information included in the label information, the semantic similarity between each source classification label and each target classification label;
Determining, according to the label path information included in the label information and in combination with the literal similarity and the semantic similarity, the structural similarity between each source classification label and each target classification label;
Selecting, according to at least one of the literal similarity, the semantic similarity, and the structural similarity between each source classification label and each target classification label, a source classification label and a target classification label whose similarity satisfies a set condition, and establishing a mapping relationship between them;
Wherein the process of determining the structural similarity between a source classification label and a target classification label specifically comprises: obtaining, according to the label path information of the source classification label and the label path information of the target classification label, the parent node information, child node information, and sibling node information in the label path information; determining a base similarity according to the literal similarity and the semantic similarity; calculating, based on the parent node information and according to the base similarity, the ancestor node similarity between the source classification label and the target classification label; calculating, based on the child node information and according to the base similarity, the descendant node similarity between the source classification label and the target classification label; calculating, based on the sibling node information and according to the base similarity, the sibling node similarity between the source classification label and the target classification label; and determining the structural similarity between the source classification label and the target classification label from the ancestor node similarity, the descendant node similarity, and the sibling node similarity, using a set weighting rule or selection rule.
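The structural similarity step of claim 1 can be illustrated with a minimal Python sketch. The claim does not fix how the ancestor, descendant, and sibling node similarities are aggregated, so everything concrete here is an assumption made for illustration: the best-match averaging in group_similarity, the dictionary keys "ancestors", "descendants", and "siblings", and the default weights (0.5, 0.3, 0.2). The base argument stands for the base similarity derived from literal and semantic similarity.

```python
from typing import Callable, Dict, Sequence, Tuple

# Base similarity between two label strings, assumed to lie in [0, 1].
BaseSim = Callable[[str, str], float]


def group_similarity(xs: Sequence[str], ys: Sequence[str], base: BaseSim) -> float:
    """Average, over the labels in xs, of the best base similarity found in ys.

    This best-match average is an assumed aggregation, not taken from the claim.
    """
    if not xs or not ys:
        return 0.0
    return sum(max(base(x, y) for y in ys) for x in xs) / len(xs)


def structural_similarity(
    src: Dict[str, Sequence[str]],
    tgt: Dict[str, Sequence[str]],
    base: BaseSim,
    weights: Tuple[float, float, float] = (0.5, 0.3, 0.2),  # hypothetical weights
) -> float:
    """Weighted combination of ancestor, descendant, and sibling similarity."""
    keys = ("ancestors", "descendants", "siblings")  # hypothetical key names
    parts = [group_similarity(src[k], tgt[k], base) for k in keys]
    return sum(w * p for w, p in zip(weights, parts))
```

A selection rule, the alternative named in the claim, could be sketched by replacing the weighted sum with, for example, the maximum of the three partial similarities.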
2. The method according to claim 1, characterized in that the literal similarity between a source classification label and a target classification label is determined in at least one of the following ways:
Determining the literal similarity of the two labels according to whether the label character strings included in the label information are identical or similar;
Determining the literal similarity of the two labels according to whether the segmented words in the label character strings included in the label information are synonyms;
Determining the literal similarity of the two labels according to the proportion of matching prefix in the label character strings included in the label information;
Calculating the N-gram similarity of the two label character strings to obtain the literal similarity of the two labels;
Calculating the edit distance similarity of the two labels to obtain the literal similarity of the two labels;
Calculating the longest common substring (LCS) similarity of the two labels according to the longest common subsequence of the label character strings included in the label information.
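Three of the literal similarity measures listed in claim 2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the claim names the measures but not their normalization, so the Jaccard ratio over character n-grams, the division of edit distance by the longer string length, and the Dice-style LCS ratio are all assumptions. The LCS function computes the longest common subsequence, following the wording of the last item.

```python
def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Jaccard overlap of character n-gram sets (assumed normalization)."""
    if not a and not b:
        return 1.0

    def grams(s: str) -> set:
        # Strings shorter than n contribute themselves as a single gram.
        return {s[i:i + n] for i in range(len(s) - n + 1)} or {s}

    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """1 minus edit distance divided by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    """Dice-style ratio 2 * LCS / (len(a) + len(b)) (assumed normalization)."""
    total = len(a) + len(b)
    return 1.0 if total == 0 else 2.0 * lcs_length(a, b) / total
```

All three functions work on any Python string, so they apply equally to Chinese label strings, where each character acts as one unit.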
3. The method according to claim 1, characterized in that the semantic similarity between a source classification label and a target classification label is determined in at least one of the following ways:
Calculating the Jaccard similarity between the source classification label and the target classification label: obtaining the vectorized information of the source classification label and the vectorized information of the target classification label, and calculating the Jaccard similarity of the two vectors as the semantic similarity;
Calculating the cosine similarity between the source classification label and the target classification label: obtaining the vectorized information of the source classification label and the vectorized information of the target classification label, and calculating the cosine similarity of the two vectors as the semantic similarity;
Calculating the vector pointwise mutual information similarity between the source classification label and the target classification label as the semantic similarity;
Calculating the semantic similarity between the source classification label and the target classification label based on the word vectors of the source classification label and the target classification label;
Calculating the semantic similarity between the source classification label and the target classification label based on a topic model;
Determining the semantic similarity between the source classification label and the target classification label based on a machine learning algorithm.
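The first two semantic measures of claim 3 can be sketched in a few lines of Python. The representation of the vectorized information as a sparse dictionary mapping terms to weights is an assumption made for illustration; the claim does not prescribe a vector format. Here Jaccard similarity is taken over the sets of terms with an entry in each vector, and cosine similarity over the weights.

```python
import math
from typing import Dict

Vector = Dict[str, float]  # hypothetical sparse representation: term -> weight


def jaccard_similarity(u: Vector, v: Vector) -> float:
    """Size of the shared term set divided by the size of the combined term set."""
    terms_u, terms_v = set(u), set(v)
    union = terms_u | terms_v
    return len(terms_u & terms_v) / len(union) if union else 0.0


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Dot product of the two vectors divided by the product of their norms."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

With term weights drawn from the segmented and stop-word-filtered label information, either function returns a value in [0, 1] for non-negative weights.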
4. The method according to claim 1, characterized in that selecting, according to at least one of the literal similarity, the semantic similarity, and the structural similarity between each source classification label and each target classification label, a target classification label whose similarity satisfies the set condition and establishing a mapping relationship specifically comprises:
For each source classification label, obtaining a first set quantity of target classification labels having the greatest literal similarity to the source classification label; obtaining, from the target classification labels so obtained, a second set quantity of target classification labels having the greatest semantic similarity to the source classification label, the second set quantity being less than the first set quantity; and obtaining, from the target classification labels so obtained, the target classification label having the greatest structural similarity to the source classification label, and establishing a mapping relationship; or
For each source classification label, obtaining the target classification label having the greatest structural similarity to the source classification label, and establishing a mapping relationship; or
Obtaining label pairs whose literal similarity is greater than a first similarity threshold and/or whose semantic similarity is greater than a second similarity threshold, and whose structural similarity is greater than a third similarity threshold, and establishing a mapping relationship between the source classification label and the target classification label included in each label pair; or
Obtaining label pairs whose structural similarity is greater than the third similarity threshold, and establishing a mapping relationship between the source classification label and the target classification label included in each label pair.
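The first alternative of claim 4, a three-stage narrowing by literal, then semantic, then structural similarity, can be sketched as below. The function name cascade_match, the parameter names, and the default quantities n1=10 and n2=3 are hypothetical; the three scoring functions are passed in as callables so the sketch stays independent of how each similarity is computed.

```python
import heapq
from typing import Callable, Optional, Sequence

Score = Callable[[str, str], float]


def cascade_match(
    source: str,
    targets: Sequence[str],
    literal: Score,
    semantic: Score,
    structural: Score,
    n1: int = 10,  # first set quantity (hypothetical default)
    n2: int = 3,   # second set quantity, must be smaller than n1
) -> Optional[str]:
    """Return the target label to map the source label to, or None if none exist."""
    if n2 >= n1:
        raise ValueError("the second set quantity must be less than the first")
    # Stage 1: keep the n1 targets with the greatest literal similarity.
    by_literal = heapq.nlargest(n1, targets, key=lambda t: literal(source, t))
    # Stage 2: of those, keep the n2 with the greatest semantic similarity.
    by_semantic = heapq.nlargest(n2, by_literal, key=lambda t: semantic(source, t))
    # Stage 3: choose the survivor with the greatest structural similarity.
    return max(by_semantic, key=lambda t: structural(source, t), default=None)
```

Ordering the stages from the cheapest measure to the most expensive one means the structural comparison only runs on a handful of candidates per source label, which is one plausible motivation for requiring the second quantity to be smaller than the first.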
5. A classification tag match mapping device, characterized by comprising:
An information acquisition module, configured to acquire label information of source classification labels and label information of target classification labels, perform word segmentation on the acquired label information of the source classification labels and of the target classification labels, and filter out stop words;
A first determining module, configured to determine, according to the label character strings included in the label information, the literal similarity between each source classification label and each target classification label;
A second determining module, configured to obtain vectorized information of the labels according to the label information, and to determine, according to the vectorized information of the labels and the label path information included in the label information, the semantic similarity between each source classification label and each target classification label;
A third determining module, configured to determine, according to the label path information included in the label information and in combination with the literal similarity and the semantic similarity, the structural similarity between each source classification label and each target classification label;
A matching and mapping module, configured to select, according to at least one of the literal similarity, the semantic similarity, and the structural similarity between each source classification label and each target classification label, a source classification label and a target classification label whose similarity satisfies a set condition, and to establish a mapping relationship between them;
Wherein the third determining module is specifically configured to: obtain, according to the label path information of the source classification label and the label path information of the target classification label, the parent node information, child node information, and sibling node information in the label path information; determine a base similarity according to the literal similarity and the semantic similarity; calculate, based on the parent node information and according to the base similarity, the ancestor node similarity between the source classification label and the target classification label; calculate, based on the child node information and according to the base similarity, the descendant node similarity between the source classification label and the target classification label; calculate, based on the sibling node information and according to the base similarity, the sibling node similarity between the source classification label and the target classification label; and determine the structural similarity between the source classification label and the target classification label from the ancestor node similarity, the descendant node similarity, and the sibling node similarity, using a set weighting rule or selection rule.
6. The device according to claim 5, characterized in that the matching and mapping module is specifically configured to:
For each source classification label, obtain a first set quantity of target classification labels having the greatest literal similarity to the source classification label; obtain, from the target classification labels so obtained, a second set quantity of target classification labels having the greatest semantic similarity to the source classification label, the second set quantity being less than the first set quantity; and obtain, from the target classification labels so obtained, the target classification label having the greatest structural similarity to the source classification label, and establish a mapping relationship; or
For each source classification label, obtain the target classification label having the greatest structural similarity to the source classification label, and establish a mapping relationship; or
Obtain label pairs whose literal similarity is greater than a first similarity threshold and/or whose semantic similarity is greater than a second similarity threshold, and whose structural similarity is greater than a third similarity threshold, and establish a mapping relationship between the source classification label and the target classification label included in each label pair; or
Obtain label pairs whose structural similarity is greater than the third similarity threshold, and establish a mapping relationship between the source classification label and the target classification label included in each label pair.
CN201610195707.1A 2016-03-31 2016-03-31 Classification tag match mapping method and device Active CN105893349B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610195707.1A CN105893349B (en) 2016-03-31 2016-03-31 Classification tag match mapping method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610195707.1A CN105893349B (en) 2016-03-31 2016-03-31 Classification tag match mapping method and device

Publications (2)

Publication Number Publication Date
CN105893349A CN105893349A (en) 2016-08-24
CN105893349B true CN105893349B (en) 2019-06-04

Family

ID=57014519

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610195707.1A Active CN105893349B (en) 2016-03-31 2016-03-31 Classification tag match mapping method and device

Country Status (1)

Country Link
CN (1) CN105893349B (en)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106528599B (en) * 2016-09-23 2019-05-14 深圳凡豆信息科技有限公司 A kind of character string Fast Fuzzy matching algorithm in magnanimity audio data
CN107958008B (en) * 2016-10-18 2020-10-27 中国移动通信有限公司研究院 Method and device for updating unified tag library
CN108509458B (en) * 2017-02-28 2022-12-16 阿里巴巴集团控股有限公司 Business object identification method and device
CN106970912A (en) * 2017-04-21 2017-07-21 北京慧闻科技发展有限公司 Chinese sentence similarity calculating method, computing device and computer-readable storage medium
CN108595476B (en) * 2018-03-12 2021-01-15 广东睿江云计算股份有限公司 Intelligent parameter matching and converting method
CN108920458A (en) * 2018-06-21 2018-11-30 武汉斗鱼网络科技有限公司 A kind of label method for normalizing, device, server and storage medium
CN108876470B (en) * 2018-06-29 2022-03-01 腾讯科技(深圳)有限公司 Tag user expansion method, computer device, and storage medium
CN110766486A (en) * 2018-07-09 2020-02-07 北京京东尚科信息技术有限公司 Method and device for determining item category
CN109165382B (en) * 2018-08-03 2022-08-23 南京工业大学 Similar defect report recommendation method combining weighted word vector and potential semantic analysis
CN109857957B (en) * 2019-01-29 2021-06-15 掌阅科技股份有限公司 Method for establishing label library, electronic equipment and computer storage medium
CN109992583A (en) * 2019-03-15 2019-07-09 上海益普索信息技术有限公司 A kind of management platform and method based on DMP label
CN110008341B (en) * 2019-03-29 2023-01-17 电子科技大学 Indonesia news text classification method capable of adaptively misword and new word
CN109977319A (en) * 2019-04-04 2019-07-05 睿驰达新能源汽车科技(北京)有限公司 A kind of method and device of generation behavior label
CN110362741B (en) * 2019-06-11 2022-02-25 新浪网技术(中国)有限公司 Intelligent issuing method and system of Feed stream information
CN110530872B (en) * 2019-07-26 2021-02-26 华中科技大学 Multi-channel plane information detection method, system and device
CN110569289B (en) * 2019-09-11 2020-06-02 星环信息科技(上海)有限公司 Column data processing method, equipment and medium based on big data
CN110795607A (en) * 2019-10-29 2020-02-14 中国人民解放军32181部队 Equipment guarantee data matching method and system based on multi-stage similarity calculation
CN111382255B (en) * 2020-03-17 2023-08-01 北京百度网讯科技有限公司 Method, apparatus, device and medium for question-answering processing
CN112818117A (en) * 2021-01-19 2021-05-18 新华智云科技有限公司 Label mapping method, system and computer readable storage medium
CN112949277A (en) * 2021-02-19 2021-06-11 中国科学院计算机网络信息中心 Subject classification system alignment method, system and medium based on fusion characterization learning
CN112711666B (en) * 2021-03-26 2021-08-06 武汉优品楚鼎科技有限公司 Futures label extraction method and device
CN113239276A (en) * 2021-05-31 2021-08-10 上海明略人工智能(集团)有限公司 Method and device for determining recommended materials based on session information
CN117216688B (en) * 2023-11-07 2024-01-23 西南科技大学 Enterprise industry identification method and system based on hierarchical label tree and neural network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101859311A (en) * 2009-04-10 2010-10-13 索尼公司 Contents processing apparatus and method, program and recording medium
CN101930462A (en) * 2010-08-20 2010-12-29 华中科技大学 Comprehensive body similarity detection method
CN102937994A (en) * 2012-11-15 2013-02-20 北京锐安科技有限公司 Similar document query method based on stop words
CN103092943A (en) * 2013-01-10 2013-05-08 北京亿赞普网络技术有限公司 Method of advertisement dispatch and advertisement dispatch server

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9130972B2 (en) * 2009-05-26 2015-09-08 Websense, Inc. Systems and methods for efficient detection of fingerprinted data and information

Also Published As

Publication number Publication date
CN105893349A (en) 2016-08-24

Similar Documents

Publication Publication Date Title
CN105893349B (en) Classification tag match mapping method and device
CN110609902B (en) Text processing method and device based on fusion knowledge graph
US10310812B2 (en) Matrix ordering for cache efficiency in performing large sparse matrix operations
US10394956B2 (en) Methods, devices, and systems for constructing intelligent knowledge base
CN107391677B (en) Method and device for generating Chinese general knowledge graph with entity relation attributes
US20210018332A1 (en) Poi name matching method, apparatus, device and storage medium
CN104516910A (en) Method and system for recommending content in client-side server environment
CN109582799A (en) The determination method, apparatus and electronic equipment of knowledge sample data set
CN110489558A (en) Polymerizable clc method and apparatus, medium and calculating equipment
CN106844341A (en) News in brief extracting method and device based on artificial intelligence
US9940354B2 (en) Providing answers to questions having both rankable and probabilistic components
CN110162637B (en) Information map construction method, device and equipment
CN113535977B (en) Knowledge graph fusion method, device and equipment
CN109783484A (en) The construction method and system of the data service platform of knowledge based map
CN107679035A (en) A kind of information intent detection method, device, equipment and storage medium
CN113033194B (en) Training method, device, equipment and storage medium for semantic representation graph model
CN110119410A (en) Processing method and processing device, computer equipment and the storage medium of reference book data
CN107704538A (en) A kind of rubbish text processing method, device, equipment and storage medium
US10296585B2 (en) Assisted free form decision definition using rules vocabulary
CN113807102B (en) Method, device, equipment and computer storage medium for establishing semantic representation model
CN113139110B (en) Regional characteristic processing method, regional characteristic processing device, regional characteristic processing equipment, storage medium and program product
CN113344214B (en) Training method and device of data processing model, electronic equipment and storage medium
CN109471969A (en) A kind of application searches method, device and equipment
CN115129885A (en) Entity chain pointing method, device, equipment and storage medium
CN113377739A (en) Knowledge graph application method, knowledge graph application platform, electronic equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20230407

Address after: Room 501-502, 5/F, Sina Headquarters Scientific Research Building, Block N-1 and N-2, Zhongguancun Software Park, Dongbei Wangxi Road, Haidian District, Beijing, 100193

Patentee after: Sina Technology (China) Co.,Ltd.

Address before: 20th Floor, International Building, No. 58 West Fourth Ring Road, Haidian District, Beijing, 100080

Patentee before: Sina.com Technology (China) Co.,Ltd.