CN105893349B - Classification tag match mapping method and device - Google Patents
- Publication number: CN105893349B
- Application number: CN201610195707.1A
- Authority: CN (China)
- Prior art keywords: label, similarity, target, information, source classification
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
Embodiments of the present invention provide a classification label matching and mapping method and device. The method comprises: obtaining label information of source classification labels and label information of target classification labels; determining, according to the label character strings, the literal similarity between each source classification label and each target classification label; obtaining vectorization information of the labels according to the label information and, in combination with label path information, determining the semantic similarity between each source classification label and each target classification label; determining, according to the label path information, the structural similarity between each source classification label and each target classification label; and, according to at least one of the literal similarity, semantic similarity and structural similarity between each source classification label and each target classification label, selecting source classification labels and target classification labels whose similarity meets a set condition and establishing mapping relationships between them. Label similarity matching and label mapping can thus be performed quickly, accurately and efficiently, without manual involvement, saving labor, material and financial resources.
Description
Technical field
The present invention relates to the field of Internet data processing technology, and in particular to a classification label matching and mapping method and device for a data management platform (Data Management Platform, DMP).
Background
In the era of big data, the data management platform (DMP) has become an indispensable part of Internet advertising and personalized recommendation. It is mainly used to store user browsing behavior, user interests, product attributes and the like, so as to provide better personalized services. However, because DMP technology is complex, many websites and enterprises that need to process such user data hand the data to a third-party data management platform for processing, which makes the data easier to use.
A third-party data management platform therefore receives user data from different websites or enterprises and provides a unified data processing service. Because the user data come from different websites and enterprises, the labels may differ even for user data of the same nature or category, so label normalization becomes a problem to be solved. When a third-party data management platform receives user data from an enterprise or website, it processes the data and maps them uniformly into the same category system, so as to provide more accurate services.
Current solutions for label normalization include:
1) mapping labels by literal similarity or near-synonym expansion;
2) given two category tree structures, mapping the labels manually one by one.
The existing label normalization solutions have the following problems:
1) Label mapping by literal similarity or a near-synonym table has a relatively low recall rate and does not take semantic information into account, which may lead to matching errors. For example, when the two "apple" labels in "mobile phone brand - apple" and "fruit - apple" are mapped, an error will occur.
2) The drawback of manual mapping is that it consumes manpower. For example, two label trees of 1000 nodes each would require 1,000,000 (1000 × 1000) manual mapping operations.
It can be seen that existing label normalization solutions are prone to matching errors, so the accuracy of matching and mapping is low; they are also time-consuming and labor-intensive, so the speed and efficiency of matching and mapping are low.
Summary of the invention
Embodiments of the present invention provide a classification label matching and mapping method and device to solve the problems in the prior art that, during label normalization, matching accuracy is low, the process is time-consuming and labor-intensive, and matching and mapping are slow and inefficient. The method and device enable fast and accurate label similarity matching and label mapping, and save labor, material and financial resources.
In one aspect, an embodiment of the present invention provides a classification label matching and mapping method, comprising:
obtaining label information of source classification labels and label information of target classification labels;
determining, according to the label character strings included in the label information, the literal similarity between each source classification label and each target classification label;
obtaining vectorization information of the labels according to the label information, and determining, according to the vectorization information of the labels and the label path information included in the label information, the semantic similarity between each source classification label and each target classification label;
determining, according to the label path information included in the label information and in combination with the literal similarity and the semantic similarity, the structural similarity between each source classification label and each target classification label;
selecting, according to at least one of the literal similarity, the semantic similarity and the structural similarity between each source classification label and each target classification label, source classification labels and target classification labels whose similarity meets a set condition, and establishing mapping relationships.
In some optional embodiments, the literal similarity between a source classification label and a target classification label is determined in at least one of the following ways:
determining the literal similarity of the two labels according to whether the label character strings included in the label information are identical or similar;
determining the literal similarity of the two labels according to whether the segmented words in the label character strings included in the label information are synonyms;
determining the literal similarity of the two labels according to the proportion of the label character string prefixes included in the label information that is similar;
calculating the N-gram similarity of the two label character strings to obtain the literal similarity of the two labels;
calculating the edit distance similarity of the two labels to obtain the literal similarity of the two labels;
calculating the longest common subsequence (LCS) similarity of the two labels according to the longest common subsequence of the label character strings included in the label information.
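Two of the literal measures listed above, the prefix proportion and the N-gram similarity, can be sketched as follows. This is a minimal illustration rather than the patented computation: the source does not fix a formula, so the use of character bigrams, the Dice coefficient, and normalization by the longer string are assumptions, as are the function names.

```python
def char_ngrams(text: str, n: int = 2) -> set[str]:
    """Return the set of character n-grams of text (the whole string if shorter than n)."""
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over character n-gram sets (assumed formula)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two empty strings are treated as identical
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def prefix_similarity(a: str, b: str) -> float:
    """Length of the common prefix divided by the length of the longer string."""
    if not a or not b:
        return 0.0
    common = 0
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            break
        common += 1
    return common / max(len(a), len(b))
```

Both functions return a value between 0 and 1, which makes them easy to compare against a threshold or to combine with the other similarity measures.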
In some optional embodiments, the semantic similarity between a source classification label and a target classification label is determined in at least one of the following ways:
calculating the Jaccard similarity between the source classification label and the target classification label: obtaining the vectorization information of the source classification label and the vectorization information of the target classification label, and calculating the Jaccard similarity of the two vectors as the semantic similarity;
calculating the cosine similarity between the source classification label and the target classification label: obtaining the vectorization information of the source classification label and the vectorization information of the target classification label, and calculating the cosine similarity of the two vectors as the semantic similarity;
calculating the vector pointwise mutual information similarity between the source classification label and the target classification label as the semantic similarity;
calculating the semantic similarity between the source classification label and the target classification label based on the word vectors of the source classification label and the target classification label;
calculating the semantic similarity between the source classification label and the target classification label based on a topic model;
determining the semantic similarity between the source classification label and the target classification label based on a machine learning algorithm.
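The Jaccard and cosine measures over label vectors can be sketched as below. The source does not say how the vectorization information is built, so the sparse representation (a dict from a feature name to a weight) is an assumption made for illustration, and the function names are hypothetical.

```python
import math


def jaccard_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Jaccard similarity over the sets of features with non-zero weight."""
    keys_a = {k for k, w in a.items() if w}
    keys_b = {k for k, w in b.items() if w}
    union = keys_a | keys_b
    if not union:
        return 0.0
    return len(keys_a & keys_b) / len(union)


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm_a * norm_b)
```

Jaccard ignores the weights and only compares which features are present, while cosine uses the weights, so the two can rank the same candidates differently.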
In some optional embodiments, the process of determining the structural similarity between a source classification label and a target classification label specifically includes:
obtaining, according to the label path information of the source classification label and the label path information of the target classification label, the parent node information, child node information and sibling node information in the label path information, and determining a base similarity according to the literal similarity and the semantic similarity;
calculating, based on the parent node information and according to the base similarity, the ancestor node similarity between the source classification label and the target classification label;
calculating, based on the child node information and according to the base similarity, the descendant node similarity between the source classification label and the target classification label;
calculating, based on the sibling node information and according to the base similarity, the sibling node similarity between the source classification label and the target classification label;
determining the structural similarity between the source classification label and the target classification label according to the ancestor node similarity, descendant node similarity and sibling node similarity, using a set weighting rule or selection rule.
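One way to turn a base similarity into an ancestor, descendant or sibling node similarity is sketched below. The source does not specify how the neighbouring nodes of the two labels are compared, so the best-match average used here is an assumption, and `neighbor_similarity` and `base` are hypothetical names.

```python
from typing import Callable

# A base similarity takes two label strings and returns a score in [0, 1].
BaseSimilarity = Callable[[str, str], float]


def neighbor_similarity(
    source_nodes: list[str],
    target_nodes: list[str],
    base: BaseSimilarity,
) -> float:
    """Average, over the source neighbours, of the best base similarity to any target neighbour."""
    if not source_nodes or not target_nodes:
        return 0.0  # nothing to compare, e.g. a leaf label has no children
    best_scores = [max(base(s, t) for t in target_nodes) for s in source_nodes]
    return sum(best_scores) / len(best_scores)
```

Calling the same function with the parents, the children and the siblings of a label pair gives the three neighbour similarities that are then combined into the structural similarity.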
In some optional embodiments, selecting, according to at least one of the literal similarity, semantic similarity and structural similarity between each source classification label and each target classification label, the target classification labels whose similarity meets a set condition and establishing mapping relationships specifically includes:
for each source classification label, obtaining a first set number of target classification labels having the greatest literal similarity to the source classification label; obtaining, from the target classification labels thus obtained, a second set number of target classification labels having the greatest semantic similarity to the source classification label, the second set number being smaller than the first set number; and obtaining, from the target classification labels thus obtained, the target classification label having the greatest structural similarity to the source classification label, and establishing a mapping relationship; or
for each source classification label, obtaining the target classification label having the greatest structural similarity to the source classification label and establishing a mapping relationship; or
obtaining label pairs whose literal similarity is greater than a first similarity threshold and/or whose semantic similarity is greater than a second similarity threshold, and whose structural similarity is greater than a third similarity threshold, and establishing a mapping relationship between the source classification label and the target classification label included in each label pair; or
obtaining label pairs whose structural similarity is greater than the third similarity threshold, and establishing a mapping relationship between the source classification label and the target classification label included in each label pair.
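The first option above, a cascade that narrows the candidates by literal, then semantic, then structural similarity, can be sketched as follows. The three similarity functions are passed in as parameters, and the default set numbers (10 and 3) and the function names are illustrative assumptions only.

```python
from typing import Callable, Optional

Similarity = Callable[[str, str], float]


def top_k(source: str, candidates: list[str], sim: Similarity, k: int) -> list[str]:
    """Return the k candidates most similar to source under sim."""
    return sorted(candidates, key=lambda t: sim(source, t), reverse=True)[:k]


def cascade_match(
    source: str,
    targets: list[str],
    literal: Similarity,
    semantic: Similarity,
    structural: Similarity,
    first_n: int = 10,
    second_n: int = 3,
) -> Optional[str]:
    """Pick one target label for source by three successive filters."""
    if second_n >= first_n:
        raise ValueError("the second set number must be smaller than the first")
    by_literal = top_k(source, targets, literal, first_n)
    by_semantic = top_k(source, by_literal, semantic, second_n)
    # The structurally closest survivor becomes the mapping target.
    return max(by_semantic, key=lambda t: structural(source, t), default=None)
```

Because the cheaper literal measure runs first, the more expensive structural comparison is only evaluated on a handful of candidates per source label.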
In some optional embodiments, after the label information of the source classification labels and the label information of the target classification labels is obtained, the method further includes:
performing word segmentation on the obtained label information of the source classification labels and of the target classification labels, and filtering out stop words.
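The preprocessing step can be sketched as below for labels whose words are separated by whitespace or punctuation. Chinese labels would need a dedicated word segmenter, which the source does not name; the regular expression and the stop word list here are hypothetical stand-ins.

```python
import re

# Hypothetical stop word list; a real deployment would supply its own.
STOP_WORDS = frozenset({"and", "of", "the", "for", "other"})


def tokenize_label(label: str, stop_words: frozenset[str] = STOP_WORDS) -> list[str]:
    """Lower-case the label, split it into word tokens and drop stop words."""
    tokens = re.findall(r"\w+", label.lower())
    return [token for token in tokens if token not in stop_words]
```

For example, `tokenize_label("Phones and Accessories")` keeps only the two content words, so that the later similarity measures are not inflated by shared function words.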
In another aspect, an embodiment of the present invention provides a classification label matching and mapping device, comprising:
an information obtaining module, configured to obtain label information of source classification labels and label information of target classification labels;
a first determining module, configured to determine, according to the label character strings included in the label information, the literal similarity between each source classification label and each target classification label;
a second determining module, configured to obtain vectorization information of the labels according to the label information, and to determine, according to the vectorization information of the labels and the label path information included in the label information, the semantic similarity between each source classification label and each target classification label;
a third determining module, configured to determine, according to the label path information included in the label information and in combination with the literal similarity and the semantic similarity, the structural similarity between each source classification label and each target classification label;
a matching and mapping module, configured to select, according to at least one of the literal similarity, the semantic similarity and the structural similarity between each source classification label and each target classification label, source classification labels and target classification labels whose similarity meets a set condition, and to establish mapping relationships.
In some optional embodiments, the first determining module is specifically configured to determine the literal similarity between a source classification label and a target classification label in at least one of the following ways:
determining the literal similarity of the two labels according to whether the label character strings included in the label information are identical or similar;
determining the literal similarity of the two labels according to whether the segmented words in the label character strings included in the label information are synonyms;
determining the literal similarity of the two labels according to the proportion of the label character string prefixes included in the label information that is similar;
calculating the N-gram similarity of the two label character strings to obtain the literal similarity of the two labels;
calculating the edit distance similarity of the two labels to obtain the literal similarity of the two labels;
calculating the longest common subsequence (LCS) similarity of the two labels according to the longest common subsequence of the label character strings included in the label information.
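The edit distance similarity that the first determining module may compute can be sketched as follows. The Levenshtein distance itself is standard; converting it to a similarity by dividing by the length of the longer string is an assumption, since the source does not give the normalization.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the rolling row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def edit_similarity(a: str, b: str) -> float:
    """1 minus the edit distance normalized by the longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest
```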
In some optional embodiments, the second determining module is specifically configured to determine the semantic similarity between a source classification label and a target classification label in at least one of the following ways:
calculating the Jaccard similarity between the source classification label and the target classification label: obtaining the vectorization information of the source classification label and the vectorization information of the target classification label, and calculating the Jaccard similarity of the two vectors as the semantic similarity;
calculating the cosine similarity between the source classification label and the target classification label: obtaining the vectorization information of the source classification label and the vectorization information of the target classification label, and calculating the cosine similarity of the two vectors as the semantic similarity;
calculating the vector pointwise mutual information similarity between the source classification label and the target classification label as the semantic similarity;
calculating the semantic similarity between the source classification label and the target classification label based on the word vectors of the source classification label and the target classification label;
calculating the semantic similarity between the source classification label and the target classification label based on a topic model;
determining the semantic similarity between the source classification label and the target classification label based on a machine learning algorithm.
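Pointwise mutual information can be sketched from co-occurrence counts as below. The source does not say which statistics the measure is computed from, so the assumption here is that a source label and a target label are counted over some shared set of records (for example, users or items carrying both labels); the function and parameter names are hypothetical.

```python
import math


def pmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Pointwise mutual information log(p(x, y) / (p(x) * p(y))) from raw counts."""
    if count_x <= 0 or count_y <= 0 or total <= 0:
        raise ValueError("marginal counts and total must be positive")
    if count_xy == 0:
        return float("-inf")  # the two labels never co-occur
    # p(x, y) / (p(x) * p(y)) simplifies to count_xy * total / (count_x * count_y).
    return math.log(count_xy * total / (count_x * count_y))
```

A value of 0 means the two labels co-occur exactly as often as independence would predict, and larger positive values indicate a stronger association. The raw score is unbounded, so it would need rescaling before being compared with similarities that lie between 0 and 1.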
In some optional embodiments, the third determining module is specifically configured to:
obtain, according to the label path information of the source classification label and the label path information of the target classification label, the parent node information, child node information and sibling node information in the label path information, and determine a base similarity according to the literal similarity and the semantic similarity;
calculate, based on the parent node information and according to the base similarity, the ancestor node similarity between the source classification label and the target classification label;
calculate, based on the child node information and according to the base similarity, the descendant node similarity between the source classification label and the target classification label;
calculate, based on the sibling node information and according to the base similarity, the sibling node similarity between the source classification label and the target classification label;
determine the structural similarity between the source classification label and the target classification label according to the ancestor node similarity, descendant node similarity and sibling node similarity, using a set weighting rule or selection rule.
In some optional embodiments, the matching and mapping module is specifically configured to:
for each source classification label, obtain a first set number of target classification labels having the greatest literal similarity to the source classification label; obtain, from the target classification labels thus obtained, a second set number of target classification labels having the greatest semantic similarity to the source classification label, the second set number being smaller than the first set number; and obtain, from the target classification labels thus obtained, the target classification label having the greatest structural similarity to the source classification label, and establish a mapping relationship; or
for each source classification label, obtain the target classification label having the greatest structural similarity to the source classification label and establish a mapping relationship; or
obtain label pairs whose literal similarity is greater than a first similarity threshold and/or whose semantic similarity is greater than a second similarity threshold, and whose structural similarity is greater than a third similarity threshold, and establish a mapping relationship between the source classification label and the target classification label included in each label pair; or
obtain label pairs whose structural similarity is greater than the third similarity threshold, and establish a mapping relationship between the source classification label and the target classification label included in each label pair.
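The threshold-based options can be sketched as a filter over scored label pairs. The `PairScores` record, the `require_both` flag (which switches between the "and" and the "or" reading of the first two thresholds) and the function name are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairScores:
    source: str
    target: str
    literal: float
    semantic: float
    structural: float


def select_pairs(
    pairs: list[PairScores],
    literal_min: float,
    semantic_min: float,
    structural_min: float,
    require_both: bool = False,
) -> list[tuple[str, str]]:
    """Keep pairs that pass the literal and/or semantic threshold and the structural threshold."""
    selected = []
    for pair in pairs:
        literal_ok = pair.literal > literal_min
        semantic_ok = pair.semantic > semantic_min
        first_stage = (literal_ok and semantic_ok) if require_both else (literal_ok or semantic_ok)
        if first_stage and pair.structural > structural_min:
            selected.append((pair.source, pair.target))
    return selected
```

Setting `literal_min` and `semantic_min` below any possible score reduces this to the last option, in which only the structural threshold decides.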
In some optional embodiments, the information obtaining module is further configured to:
after obtaining the label information of the source classification labels and the label information of the target classification labels, perform word segmentation on the obtained label information of the source classification labels and of the target classification labels, and filter out stop words.
The above technical solution has the following beneficial effects: the literal similarity, semantic similarity and structural similarity between source classification labels and target classification labels are determined from the label information of the source classification labels and of the target classification labels, and the source and target classification labels with the best similarity match are selected for normalized mapping by considering the literal, semantic and structural similarity together. This makes matching and mapping more accurate and effectively removes ambiguity, ensuring accuracy. In addition, the method performs label matching and mapping automatically, without manual processing, which saves time and labor, and its processing speed and efficiency are relatively high.
Brief description of the drawings
In order to explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is an example tree diagram of the mapping between source classification labels and target classification labels in Embodiment one of the present invention;
Fig. 2 is a flowchart of the classification label matching and mapping method in Embodiment one of the present invention;
Fig. 3 is a flowchart of the classification label matching and mapping method in Embodiment two of the present invention;
Fig. 4 is an optional flowchart of semantic similarity determination in an embodiment of the present invention;
Fig. 5 is an optional flowchart of structural similarity determination in an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of the classification label matching and mapping device in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
To solve the problems in the prior art that matching accuracy is low and the process is time-consuming and labor-intensive during label normalization, an embodiment of the present invention provides a classification label matching and mapping method that performs automated label normalization and achieves label similarity matching and label mapping quickly and accurately. It is described in detail below through specific embodiments.
The system framework of classification labels is introduced first. Fig. 1 shows an example tree diagram of the mapping between source classification labels and target classification labels.
In Fig. 1, the structure of the source classification label system and the structure of the target classification label system are each shown as a tree. For example, in one label system, assumed to be the source classification label system, there are two labels under the root node, "Mobile" and "vegetables"; under the "Mobile" label there are two labels, "Iphone" and "XiaoMi"; and under the "vegetables" label there is the label "apple". In the other label system, assumed to be the target classification label system, there are two labels under the root node, "mobile phone" and "fruit"; under the "mobile phone" label there are two labels, "apple" and "millet"; and under the "fruit" label there is the label "apple".
The output of label normalization mapping is a one-to-one or one-to-many mapping from the labels in the source classification label system to the labels in the target classification label system. As shown in Fig. 1, "Mobile" is mapped to "mobile phone", "Iphone" under "Mobile" is mapped to "apple" under "mobile phone", "XiaoMi" under "Mobile" is mapped to "millet" under "mobile phone", "apple" under "vegetables" is mapped to "apple" under "fruit", and so on.
Embodiment one
An embodiment of the present invention provides a classification label similarity matching method, whose flow is shown in Fig. 2 and includes the following steps:
Step S101: obtain the label information of the source classification labels and the label information of the target classification labels.
The label information of each label in the source classification label system and in the target classification label system is obtained. The label information includes at least one of the following: the label character string, the vectorization information of the label, the label path information, and the label node information. The label node information may include one or more of child node information, parent node information, sibling node information and the like.
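The label information described above can be held in a small record per label. The sketch below builds such records for the source label system of Fig. 1 from its label paths; the `LabelInfo` class, its field names, the explicit "root" entry and the helper `build_label_infos` are all assumptions made for illustration, and the vector field is left empty because the source does not say how it is computed.

```python
from dataclasses import dataclass, field
from typing import Optional

Path = tuple[str, ...]


@dataclass
class LabelInfo:
    text: str                                                # label character string
    path: Path                                               # label path information
    vector: dict[str, float] = field(default_factory=dict)   # vectorization information
    parent: Optional[Path] = None                            # parent node information
    children: list[Path] = field(default_factory=list)       # child node information
    siblings: list[Path] = field(default_factory=list)       # sibling node information


def build_label_infos(paths: list[Path]) -> dict[Path, LabelInfo]:
    """Derive parent, child and sibling node information from the label paths."""
    infos = {path: LabelInfo(text=path[-1], path=path) for path in paths}
    for path, info in infos.items():
        parent = path[:-1]
        if parent in infos:
            info.parent = parent
            infos[parent].children.append(path)
    for info in infos.values():
        if info.parent is not None:
            info.siblings = [c for c in infos[info.parent].children if c != info.path]
    return infos


# Source label system of Fig. 1, one path per label.
SOURCE_PATHS: list[Path] = [
    ("root",),
    ("root", "Mobile"), ("root", "Mobile", "Iphone"), ("root", "Mobile", "XiaoMi"),
    ("root", "vegetables"), ("root", "vegetables", "apple"),
]
source_infos = build_label_infos(SOURCE_PATHS)
```

Keying the records by full path rather than by label text keeps labels with the same text in different branches apart, which is what allows the two "apple" labels of the target system to be told apart later.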
Step S102: determine, according to the label character strings included in the label information, the literal similarity between each source classification label and each target classification label.
This step performs the primary, or first-layer, label similarity calculation on the label data. It is mainly based on a literal similarity algorithm for labels and outputs the literal similarity of each source classification label to the target classification labels. It is the first level of similarity determination.
The literal similarity between a source classification label and a target classification label is determined in at least one of the following ways:
determining the literal similarity of the two labels according to whether the label character strings included in the label information are identical or similar;
determining the literal similarity of the two labels according to whether the segmented words in the label character strings included in the label information are synonyms;
determining the literal similarity of the two labels according to the proportion of the label character string prefixes included in the label information that is similar;
calculating the N-gram similarity of the two label character strings to obtain the literal similarity of the two labels;
calculating the edit distance similarity of the two labels to obtain the literal similarity of the two labels;
calculating the longest common subsequence (Longest Common Subsequence, LCS) similarity of the two labels according to the longest common subsequence of the label character strings included in the label information.
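The LCS measure in the last item can be sketched as follows. The sketch follows the expansion "Longest Common Subsequence" given in the text, so characters need not be contiguous; dividing by the length of the longer string is an assumed normalization.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming, rolling row)."""
    previous = [0] * (len(b) + 1)
    for ch_a in a:
        current = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def lcs_similarity(a: str, b: str) -> float:
    """LCS length divided by the length of the longer string (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return lcs_length(a, b) / longest
```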
Step S103: obtain the vectorization information of the labels according to the label information, and determine, according to the vectorization information of the labels and the label path information included in the label information, the semantic similarity between each source classification label and each target classification label.
This step performs the intermediate, or second-layer, label similarity calculation on the label data. It is mainly based on a semantic similarity algorithm for labels and outputs the semantic similarity of each source classification label to the target classification labels. It is the second level of similarity determination.
The semantic similarity between a source classification label and a target classification label is determined in at least one of the following ways:
calculating the Jaccard similarity between the source classification label and the target classification label, which specifically includes: obtaining the vectorization information of the source classification label and the vectorization information of the target classification label, and calculating the Jaccard similarity of the two vectors as the semantic similarity; typically, the Jaccard similarity can be calculated directly between the two label vectors;
calculating the cosine similarity between the source classification label and the target classification label, which specifically includes: obtaining the vectorization information of the source classification label and the vectorization information of the target classification label, and calculating the cosine similarity of the two vectors as the semantic similarity; typically, the cosine similarity can be calculated directly between the two label vectors;
calculating the vector pointwise mutual information (Pointwise Mutual Information, PMI) similarity between the source classification label and the target classification label as the semantic similarity;
calculating the semantic similarity between the source classification label and the target classification label based on the word vectors of the source classification label and the target classification label;
calculating the semantic similarity between the source classification label and the target classification label based on a topic model;
determining the semantic similarity between the source classification label and the target classification label based on a machine learning algorithm.
Step S104: determine, according to the label path information included in the label information and in combination with the literal similarity and the semantic similarity, the structural similarity between each source classification label and each target classification label.
This step performs the advanced, or third-layer, label similarity calculation on the label data. It is mainly based on a structural similarity algorithm for labels and outputs the structural similarity of each source classification label to the target classification labels. It is the third level of similarity determination. Optionally, the structural similarity may be determined from at least one of the following: ancestor node similarity, descendant node similarity and sibling node similarity.
One optional scheme for determining the structural similarity is as follows:
obtaining, according to the label path information of the source classification label and the label path information of the target classification label, the parent node information, child node information and sibling node information in the label path information, and determining a base similarity according to the literal similarity and the semantic similarity;
calculating, based on the parent node information and according to the base similarity, the ancestor node similarity between the source classification label and the target classification label;
calculating, based on the child node information and according to the base similarity, the descendant node similarity between the source classification label and the target classification label;
calculating, based on the sibling node information and according to the base similarity, the sibling node similarity between the source classification label and the target classification label;
determining the structural similarity between the source classification label and the target classification label according to the ancestor node similarity, descendant node similarity and sibling node similarity, using a set weighting rule or selection rule.
The base similarity may be chosen as one of the semantic similarity and the literal similarity, for example the larger of the two; it may also be obtained by weighting the two, for example by multiplying each by a weighting coefficient and summing the results.
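The two ways of forming the base similarity can be sketched in a few lines. The `mode` switch, the default coefficients of 0.5 and the function name are assumptions; the source only says that either the larger value or a weighted sum may be used.

```python
def base_similarity(
    literal: float,
    semantic: float,
    mode: str = "max",
    literal_weight: float = 0.5,
    semantic_weight: float = 0.5,
) -> float:
    """Combine literal and semantic similarity into a single base similarity."""
    if mode == "max":
        return max(literal, semantic)
    if mode == "weighted":
        return literal_weight * literal + semantic_weight * semantic
    raise ValueError(f"unknown mode: {mode!r}")
```

If the two coefficients sum to 1, the weighted result stays in the same range as its inputs.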
Weighting is applied based on the similarity of the labels' parent nodes: the greater the ancestor node similarity of a label node pair, the greater the similarity of that pair. Weighting is applied based on the similarity of the labels' child nodes: the greater the descendant node similarity of a label node pair, the greater the similarity of that pair. Weighting is applied based on the similarity of the labels' sibling nodes: the greater the sibling node similarity of a label node pair, the greater the similarity of that pair.
In the above optional scheme, the ancestor node similarity, descendant node similarity and sibling node similarity are weighted: a weight ratio can be set for each similarity to determine one comprehensive structural similarity, or the largest of the similarities can be chosen as the structural similarity. When the weight ratios are set, a ratio may be 0. For example, if the weight ratio of the sibling node similarity is 0, the structural similarity of the label is in effect determined by weighting only the ancestor node similarity and the descendant node similarity.
The above optional scheme may also, according to a selection rule, choose the largest of the ancestor node similarity, descendant node similarity and sibling node similarity as the structural similarity.
Step S105: according to at least one of the literal similarity, semantic similarity and structural similarity of each source classification label and each target classification label, select the source classification labels and target classification labels that meet the set condition, and establish mapping relations.
In this step, qualifying source classification labels and target classification labels may be selected according to a set selection rule, based on one or several of the literal similarity, semantic similarity and structural similarity. Preferably, the selection is based on the structural similarity of each source classification label and each target classification label, or on the structural similarity combined with at least one of the literal similarity and the semantic similarity; the source classification labels and target classification labels whose similarity meets the set condition are selected, and mapping relations are established.
When label mapping relations are established according to the literal similarity, semantic similarity and structural similarity of a source classification label and each target classification label, the mapping can follow a set rule. The rule can define, as needed, a convergence condition for screening the two labels with the best similarity; when the convergence condition is met, the mapping relation between the two labels is determined. For example, the literal similarity, semantic similarity and structural similarity can be weighted to determine the label pair with the largest comprehensive similarity, or a screening rule can be set that selects the label pair for which one particular similarity is largest. The step is of course not limited to these ways; different rules can be set as needed to establish the mapping relations.
When the mapping relations between source classification labels and target classification labels are established, one or more of the following filtering principles may be used to output the label mapping relations:
Label pairs may be filtered according to expert knowledge to obtain qualifying label pairs, for which mapping relations are established;
Label pairs may be filtered according to rules to obtain qualifying label pairs, for which mapping relations are established;
Label pairs may be filtered according to thresholds to obtain qualifying label pairs, for which mapping relations are established;
The label pair with the best similarity may also be chosen as the output of the final mapping relation.
In a specific implementation, some optional ways of establishing the mapping relations are as follows:
For each source classification label, obtain a first set quantity of target classification labels with the largest literal similarity to the source classification label; from the target classification labels obtained, obtain a second set quantity of target classification labels with the largest semantic similarity to the source classification label, the second set quantity being less than the first set quantity; from the target classification labels then obtained, obtain the target classification label with the largest structural similarity to the source classification label, and establish the mapping relation; or
For each source classification label, obtain the target classification label with the largest structural similarity to the source classification label, and establish the mapping relation; or
Obtain the label pairs whose literal similarity is greater than a first similarity threshold and/or whose semantic similarity is greater than a second similarity threshold, and whose structural similarity is greater than a third similarity threshold, and establish mapping relations for the source classification label and target classification label included in each label pair; or
Obtain the label pairs whose structural similarity is greater than the third similarity threshold, and establish mapping relations for the source classification label and target classification label included in each label pair.
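The threshold-based option can be illustrated with a short sketch. It reads "and/or" as a logical OR between the literal and semantic tests, which also admits pairs that pass both. The function name, the dictionary layout and the three default threshold values are assumptions made for illustration only.

```python
def select_by_thresholds(pairs, t_literal=0.8, t_semantic=0.7, t_structural=0.6):
    """Return the (source, target) pairs that pass the threshold filter.

    pairs maps (source, target) to a dict with the keys
    'literal', 'semantic' and 'structural'. A pair is kept when its
    literal OR semantic similarity exceeds its threshold AND its
    structural similarity exceeds the third threshold.
    """
    mappings = []
    for (source, target), sims in pairs.items():
        surface_ok = sims["literal"] > t_literal or sims["semantic"] > t_semantic
        if surface_ok and sims["structural"] > t_structural:
            mappings.append((source, target))
    return sorted(mappings)
```

Setting `t_literal` and `t_semantic` below zero reduces this to the last option in the list, where only the structural threshold decides.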
Embodiment two
The classification label matching and mapping method provided by Embodiment 2 of the present invention has the process shown in Figure 3 and includes the following steps.
Step S201: obtain the label information of the source classification labels and the label information of the target classification labels.
Step S202: perform a word segmentation operation on the obtained label information of the source classification labels and of the target classification labels, and filter out stop words.
Data preprocessing is performed on the obtained label information of the source classification labels and of the target classification labels, in order to reduce unnecessary later processing and further increase the speed of subsequent processing. A word segmentation operation is performed on the label information to decompose complex phrase structures to the level of single words; for example, "mobile phone brand" is decomposed into the two words "mobile phone" and "brand". The segments are then filtered to remove useless words, such as junk words or meaningless words, for example the Chinese particle "地" or the pronoun "我" ("I").
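The preprocessing described above can be sketched as segmentation followed by stop-word filtering. The sketch uses forward maximum matching against a tiny dictionary as a stand-in for whatever segmenter is actually used; the dictionary `VOCAB`, the stop-word list `STOP_WORDS` and both function names are hypothetical. The label 手机品牌 means "mobile phone brand".

```python
# Hypothetical segmentation dictionary and stop-word list (assumptions).
VOCAB = {"手机", "品牌"}          # "mobile phone", "brand"
STOP_WORDS = {"的", "地", "我"}   # function words and a pronoun


def segment(text, vocab=VOCAB, max_len=4):
    """Forward maximum matching: at each position take the longest
    dictionary word, falling back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def preprocess(text):
    """Segment a label string and drop stop words."""
    return [token for token in segment(text) if token not in STOP_WORDS]
```

A production system would more likely rely on an established segmentation library; the dictionary lookup here only keeps the example self-contained.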
Optionally, the node information of each label may also be extracted in this step; for example, the parent node information, sibling node information and child node information of a label are computed and determined. For instance, the child nodes of the label "mobile phone" in Figure 2 include "Apple" and "Xiaomi".
Optionally, other initialization operations may also be carried out in this step, such as loading a word vector dictionary, a topic model, and so on.
Step S203: determine the literal similarity of the source classification labels and the target classification labels.
With reference to step S103, the literal similarity of each source classification label in the source classification label system and each target classification label in the target classification label system is determined respectively.
Step S204: determine the semantic similarity of the source classification labels and the target classification labels.
With reference to step S104, the semantic similarity of each source classification label in the source classification label system and each target classification label in the target classification label system is determined respectively.
Step S205: determine the structural similarity of the source classification labels and the target classification labels.
With reference to step S105, the structural similarity of each source classification label in the source classification label system and each target classification label in the target classification label system is determined respectively.
When the structural similarity of the source classification labels and the target classification labels is determined, a similarity correspondence table between labels can be built from the basic similarities, and the structural similarity of each source classification label and target classification label is determined from the basic similarities in that table. If, while the structural similarity is being calculated, a result meeting the convergence condition cannot be obtained because the similarity of a parent node, child node or sibling node does not yet exist, then after one round of structural similarity calculation has finished, the structural similarities already obtained for the source classification labels and target classification labels are used to update the basic similarities in the correspondence table. The next round of structural similarity calculation is then carried out, until a result meeting the convergence condition is obtained.
For example, as shown in Table 1 below:
Table 1
In Table 1, "/" indicates unknown, and a numerical value indicates the basic similarity.
Step S206: for each source classification label, execute the following steps:
Step S207: filter out a first set quantity of target classification labels with the largest literal similarity to the source classification label.
Step S208: from the target classification labels obtained, obtain a second set quantity of target classification labels with the largest semantic similarity to the source classification label.
Step S209: from the target classification labels then obtained, obtain the target classification label with the largest structural similarity to the source classification label.
Step S210: establish the mapping relation between the source classification label and the target classification label.
Through the above process, one-to-one or one-to-many mapping relations can be set up between the labels in the source classification label system and the labels in the target classification label system, forming a number of label pairs that have mapping relations.
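Steps S207 to S209 form a narrowing cascade, which can be sketched for a single source label as follows. The function name, the dictionary layout keyed by (source, target), and the default quantities `n1=3` and `n2=2` are assumptions for illustration; the text only requires the second quantity to be smaller than the first.

```python
def cascade_match(source, targets, literal, semantic, structural, n1=3, n2=2):
    """Pick one target label for `source` by successive narrowing.

    literal, semantic and structural map (source, target) to a score.
    n1 and n2 are the first and second set quantities (n2 < n1).
    """
    if n2 >= n1:
        raise ValueError("the second set quantity must be smaller than the first")
    # S207: keep the n1 targets with the largest literal similarity.
    by_literal = sorted(targets, key=lambda t: literal[(source, t)], reverse=True)[:n1]
    # S208: of those, keep the n2 targets with the largest semantic similarity.
    by_semantic = sorted(by_literal, key=lambda t: semantic[(source, t)], reverse=True)[:n2]
    # S209: of those, return the target with the largest structural similarity.
    return max(by_semantic, key=lambda t: structural[(source, t)])
```

Running the function once per source label and recording each (source, result) pair corresponds to step S210. The cheap literal comparison prunes the candidate set before the costlier semantic and structural comparisons are consulted.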
The embodiment of the present invention uses an optional method of computing a distributed representation of labels that takes into account both label semantics and label structural information. The process is shown in Figure 4, and the distributed representation of a label can be used to determine the semantic similarity. The process of computing the distributed representation of labels includes the following steps:
Step S301: obtain each label in the source classification system and the target classification system, and express each label as a vector to obtain the vectorization information of the label.
This step obtains the input data, which is each label in the two classification systems, and finally provides a vectorized representation of these labels; for example, "mobile phone" is expressed as the vector (0.1, 0.3, 0.25, 0.25, 0.1). When the semantic similarity of labels is calculated, taking cosine similarity as an example, calculating the semantic similarity of two labels reduces to calculating the cosine similarity of two vectors.
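Cosine similarity between two label vectors can be computed with a few lines of standard-library Python. The function name is an assumption, as is the choice to return 0.0 when either vector has zero length.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 when either vector is all zeros (a convention chosen
    here, since the cosine is undefined in that case).
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Applied to the example vector (0.1, 0.3, 0.25, 0.25, 0.1) and itself, the function returns a value equal to 1 up to floating-point rounding.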
Step S302: load the basic word vector dictionary and obtain the basic word vector representation of each label.
The dictionary may be obtained in one of the following ways:
Training a word vector model based on a neural network, i.e. a word2vector model;
Obtaining a word vector model based on global word statistics, i.e. Global2Vector;
Obtaining, based on a topic model, the distribution of a word over topics as a vectorized representation; the topic model may be based on one of latent semantic indexing (Latent Semantic Indexing, LSI), probabilistic latent semantic indexing (Probabilistic Latent Semantic Indexing, PLSI), latent Dirichlet allocation (Latent Dirichlet Allocation, LDA), deep learning, and so on.
Step S303: generate the node information of the labels.
According to the label information in the classification labels, all parent nodes of each label node are obtained; a depth-first or breadth-first traversal algorithm can be used. That is, a label node is expressed as node information of the form "[root node, mobile phone, Apple]".
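A depth-first traversal that records the root-to-node path of every label might look like the sketch below. The `children_of` dictionary layout and the function name are assumptions; the label names follow the "[root node, mobile phone, Apple]" example, and "computer" is added only to give the tree a second branch.

```python
def all_paths(children_of, root):
    """Depth-first traversal of a label tree.

    children_of maps a label to the list of its child labels.
    Returns a dict mapping every reachable label to its path from the
    root, e.g. ['root', 'mobile phone', 'Apple'].
    Assumes the structure is a tree (no cycles, one parent per node).
    """
    paths = {}
    stack = [(root, [root])]
    while stack:
        node, path = stack.pop()
        paths[node] = path
        for child in children_of.get(node, []):
            stack.append((child, path + [child]))
    return paths


# Hypothetical label tree used only for illustration.
label_tree = {"root": ["mobile phone", "computer"], "mobile phone": ["Apple"]}
label_paths = all_paths(label_tree, "root")
```

Replacing the stack with a queue (`collections.deque` with `popleft`) gives the breadth-first variant and produces the same paths.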
Step S304: calculate the distributed representation of each label.
Taking the calculation of the distributed representation of a label based on ancestor node weighting as an example, the following equation is used:
Wherein, X_tag is the vector representation of the target classification label;
p denotes a node in the path information;
v is the basic word vector representation of a label;
π is the path information of the label node;
w is the ancestor node weight.
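The equation itself does not appear in this text. One form that is consistent with the symbol definitions is a weighted sum of the basic word vectors along the label's path, X_tag = sum over p in π of w_p * v_p; this form is an assumption, not a quotation of the patent. Under that assumption, a minimal sketch is given below. The function name, the two-dimensional vectors and the weights 0.3 and 0.7 are hypothetical.

```python
def distributed_representation(path, base_vectors, weights):
    """Weighted sum of the basic word vectors of the nodes on a path.

    path         : labels from the outermost ancestor down to the label itself
    base_vectors : maps each label to its basic word vector (equal lengths)
    weights      : one weight per path node, in the same order as `path`
    Implements the ASSUMED form X_tag = sum_p w_p * v_p.
    """
    if len(path) != len(weights):
        raise ValueError("need exactly one weight per path node")
    dim = len(base_vectors[path[0]])
    result = [0.0] * dim
    for node, weight in zip(path, weights):
        for i, component in enumerate(base_vectors[node]):
            result[i] += weight * component
    return result
```

Because the ancestors contribute to the result, the same word under different parents (for example "Apple" under "mobile phone" and under "fruit") receives different vectors, which is the disambiguation effect that the description attributes to the distributed representation.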
Step S305: use the distributed representation of each label node for the semantic similarity calculation.
The above steps finally output the distributed representation of each label node, which is used for the semantic similarity calculation. The advantage of this distributed representation is that it combines semantic similarity and structural similarity in a simple way and can effectively resolve label ambiguity.
An optional implementation process for determining the structural similarity in the above embodiment is shown in Figure 5 and includes the following steps:
Step S401: obtain the literal similarity and semantic similarity that have been determined, and obtain the basic similarity of each source classification label and each target classification label.
See Table 1 above.
Step S402: according to the parent node information of the labels and the basic similarity, calculate the ancestor node similarity of the source classification label and the target classification label.
One optional way of calculating is as follows: trace back along the label path from the label's own node, calculate the similarity of the node labels pair by pair, and take the weighted sum. At least one layer of ancestor nodes is traced; the ancestor node similarity is obtained by weighting the basic similarity of at least one pair of ancestor nodes of the source classification label and the target classification label together with the basic similarity of the source classification label and the target classification label themselves.
Taking source classification label S1 and target classification label T2 as an example, the similarity of the two labels is calculated by the following formula:
Wherein: Sim(S1, T2) is the similarity between source classification label S1 and target classification label T2;
sim(p_s, p_t) is the similarity between source classification label path node p_s and target classification label path node p_t in the path information;
w is the basic similarity weighting coefficient between nodes;
p is a node in the intersection of the source classification label path and the target classification label path;
S1 is the source classification label;
T2 is the target classification label;
π(s1) denotes the path information of the source classification label;
π(t2) denotes the path information of the target classification label;
s is the subscript of a source classification label node, denoting the s-th source classification label node;
t is the subscript of a target classification label node, denoting the t-th target classification label node.
For example, if the paths of two labels are <A1, B1, C1> and <A2, D2, C2> respectively, the similarity Sim(C1, C2) of labels C1 and C2 is:
Sim(C1, C2) = 0.7*base_sim(C1, C2) + 0.2*base_sim(B1, D2) + 0.1*base_sim(A1, A2)
Wherein: base_sim(C1, C2) is the basic similarity of label pair (C1, C2);
base_sim(B1, D2) is the basic similarity of label pair (B1, D2);
base_sim(A1, A2) is the basic similarity of label pair (A1, A2).
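The worked example can be reproduced in code by walking both paths backwards from the labels themselves and pairing nodes at the same distance. The weights 0.7, 0.2 and 0.1 come from the example; the function name, the `base_sim` dictionary layout and the sample basic similarity values are assumptions.

```python
def ancestor_similarity(source_path, target_path, base_sim, weights=(0.7, 0.2, 0.1)):
    """Weighted sum of basic similarities along two label paths.

    Paths are listed from the outermost ancestor to the label itself,
    e.g. ['A1', 'B1', 'C1']. Nodes are paired from the label backwards:
    (C1, C2), then the parents, then the grandparents. Pairing stops at
    the shortest of the two paths and the weight list.
    base_sim maps (source_node, target_node) to a basic similarity.
    """
    total = 0.0
    levels = zip(reversed(source_path), reversed(target_path), weights)
    for source_node, target_node, weight in levels:
        total += weight * base_sim[(source_node, target_node)]
    return total


# Hypothetical basic similarities for the example paths.
example_base_sim = {("C1", "C2"): 0.9, ("B1", "D2"): 0.5, ("A1", "A2"): 1.0}
example_score = ancestor_similarity(["A1", "B1", "C1"], ["A2", "D2", "C2"], example_base_sim)
```

Giving the largest weight to the labels themselves and smaller weights to more distant ancestors matches the decreasing coefficients in the example.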
Step S403: according to the descendant node information of the labels and the basic similarity, calculate the descendant node similarity of the source classification label and the target classification label.
One optional way of calculating is as follows: for each descendant node of the source label, calculate its similarity to each descendant node of the target label, take the maximum as the similarity of that node to the target descendant nodes, and take the weighted sum.
Taking source classification label S1 and target classification label T2 as an example, the similarity of the two labels is calculated by the following formula:
Wherein: Sim(S1, T2) is the similarity of source classification label S1 and target classification label T2;
sim(p_s, p_t) is the similarity of source classification label path node p_s and target classification label path node p_t in the path information;
The traversal operator in the formula indicates traversing each target classification label node;
w is the basic similarity weighting coefficient between nodes;
p ranges over the set of path nodes from the source classification label node to the root node;
S1 is the source classification label to be solved;
T2 is the target classification label to be solved;
π(s1) denotes the path information of the source classification label;
s is the path subscript of a source classification label node, denoting the s-th source classification label node;
t is the path subscript of a target classification label node, denoting the t-th target classification label node.
For example, if the descendant nodes of two labels C1 and C2 are <A1, B1> and <A2, D2> respectively, the similarity of labels C1 and C2 is:
Sim(C1, C2) = 0.7*base_sim(C1, C2) + 0.2*Max(base_sim(A1, A2), base_sim(A1, D2)) + 0.1*Max(base_sim(B1, A2), base_sim(B1, D2))
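The descendant example translates into a best-match search: each source descendant is scored against every target descendant, the maximum is kept, and the maxima are weighted and added to the weighted similarity of the labels themselves. The coefficients 0.7, 0.2 and 0.1 are taken from the example; the function name, parameter names and sample values are assumptions.

```python
def descendant_similarity(source, target, source_desc, target_desc, base_sim,
                          w_self=0.7, w_desc=(0.2, 0.1)):
    """Descendant node similarity of two labels.

    source_desc / target_desc : lists of descendant labels
    base_sim                  : maps (source_node, target_node) to a score
    For each source descendant the best-matching target descendant is
    used; a source descendant with no candidates contributes 0.
    """
    total = w_self * base_sim[(source, target)]
    for source_node, weight in zip(source_desc, w_desc):
        best = max((base_sim[(source_node, t)] for t in target_desc), default=0.0)
        total += weight * best
    return total


# Hypothetical basic similarities for the example.
desc_base_sim = {
    ("C1", "C2"): 0.8,
    ("A1", "A2"): 0.9, ("A1", "D2"): 0.3,
    ("B1", "A2"): 0.2, ("B1", "D2"): 0.6,
}
desc_score = descendant_similarity("C1", "C2", ["A1", "B1"], ["A2", "D2"], desc_base_sim)
```

Passing sibling lists instead of descendant lists gives a sibling node similarity of the same shape.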
Step S404: according to the sibling node information of the labels and the basic similarity, calculate the sibling node similarity of the source classification label and the target classification label.
One optional way of calculating is as follows: for each sibling node of the source label, calculate its similarity to each sibling node of the target label, take the maximum as the similarity of that node to the target sibling nodes, and take the weighted sum. The calculation is similar to that of step S403.
Step S405: according to the ancestor node similarity, descendant node similarity and sibling node similarity, determine the structural similarity of the source classification label and the target classification label using a set weighting rule or selection rule.
Optional selection rule strategy: choose the largest of these similarity values as the structural similarity.
Optional weighting rule strategy: according to the set weight ratios, take the weighted sum of the ancestor node similarity, descendant node similarity and sibling node similarity; that is, each of the three similarities is multiplied by its corresponding weight ratio and the products are summed, or the sum is then averaged, to obtain the structural similarity.
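Both strategies fit in one small function. The function name, the `rule` and `average` parameters and the default weight ratios are assumptions; the text does not prescribe particular values.

```python
def structural_similarity(ancestor, descendant, sibling, rule="weighted",
                          weights=(0.5, 0.3, 0.2), average=False):
    """Combine the three node similarities into a structural similarity.

    rule="select"   -> largest of the three values
    rule="weighted" -> sum of value * weight ratio; with average=True the
                       sum is divided by the number of components
    A weight ratio of 0 simply removes that component.
    """
    scores = (ancestor, descendant, sibling)
    if rule == "select":
        return max(scores)
    if rule == "weighted":
        total = sum(w * s for w, s in zip(weights, scores))
        return total / len(scores) if average else total
    raise ValueError(f"unknown rule: {rule}")
```

Calling it with `weights=(0.6, 0.4, 0.0)` reproduces the case in which the sibling node similarity is ignored and only the ancestor and descendant similarities are weighted.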
Based on the same inventive concept, an embodiment of the present invention also provides a classification label matching and mapping device. The device can be arranged on a server that implements third-party data processing, or on other websites or enterprise data servers that provide data to the third-party data processing server. The structure of the classification label matching and mapping device is shown in Figure 6 and includes: an information obtaining module 101, a first determining module 102, a second determining module 103, a third determining module 104 and a matching and mapping module 105.
The information obtaining module 101 is configured to obtain the label information of the source classification labels and the label information of the target classification labels.
The first determining module 103 is configured to determine, according to the label character strings included in the label information, the literal similarity of each source classification label and each target classification label respectively.
The second determining module 104 is configured to obtain the vectorization information of the labels according to the label information, and to determine, according to the vectorization information of the labels and the label path information included in the label information, the semantic similarity of each source classification label and each target classification label respectively.
The third determining module 105 is configured to determine, according to the label path information included in the label information and in combination with the literal similarity and the semantic similarity, the structural similarity of each source classification label and each target classification label respectively.
The matching and mapping module 106 is configured to select, according to at least one of the literal similarity, semantic similarity and structural similarity of each source classification label and each target classification label, the source classification labels and target classification labels whose similarity meets the set condition, and to establish mapping relations.
Preferably, the first determining module 103 is specifically configured to determine the literal similarity of a source classification label and a target classification label in at least one of the following ways:
Determining the literal similarity of the two labels according to whether the label character strings included in the label information are the same or similar;
Determining the literal similarity of the two labels according to whether the segmented words in the label character strings included in the label information are synonyms;
Determining the literal similarity of the two labels according to the proportion of similarity between the prefixes of the label character strings included in the label information;
Calculating the N-gram similarity of the two label character strings to obtain the literal similarity of the two labels;
Calculating the edit distance similarity of the two labels to obtain the literal similarity of the two labels;
Calculating the longest common substring (LCS) similarity of the two labels according to the longest common subsequence of the label character strings included in the label information.
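Three of the listed measures can be sketched with the standard library alone. The normalizations used here (dividing by the longer string length, and Jaccard overlap of character bigrams for the N-gram case) are common choices but are assumptions; the text does not fix how each raw measure is turned into a similarity. All function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    """1 minus the edit distance divided by the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def ngram_similarity(a, b, n=2):
    """Jaccard overlap of the character n-gram sets of two strings."""
    grams_a = {a[i:i + n] for i in range(len(a) - n + 1)}
    grams_b = {b[i:i + n] for i in range(len(b) - n + 1)}
    if not grams_a and not grams_b:
        return 1.0 if a == b else 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def longest_common_substring(a, b):
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb else 0)
            best = max(best, curr[-1])
        prev = curr
    return best


def lcs_similarity(a, b):
    """Longest common substring length divided by the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else longest_common_substring(a, b) / longest
```

For Chinese labels the same functions operate on characters, so no separate tokenization is needed for these three measures.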
Preferably, the second determining module 104 is specifically configured to determine the semantic similarity of a source classification label and a target classification label in at least one of the following ways:
Calculating the Jaccard similarity of the source classification label and the target classification label: obtaining the vectorization information of the source classification label and the vectorization information of the target classification label, and calculating the Jaccard similarity of the two vectors as the semantic similarity;
Calculating the cosine similarity of the source classification label and the target classification label: obtaining the vectorization information of the source classification label and the vectorization information of the target classification label, and calculating the cosine similarity of the two vectors as the semantic similarity;
Calculating the vector pointwise mutual information similarity of the source classification label and the target classification label as the semantic similarity;
Calculating the semantic similarity of the source classification label and the target classification label based on their word vectors;
Calculating the semantic similarity of the source classification label and the target classification label based on a topic model;
Determining the semantic similarity of the source classification label and the target classification label based on a machine learning algorithm.
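The text does not say how the Jaccard similarity is applied to real-valued vectors. One common generalization, used here purely as an assumption, is the weighted Jaccard measure for non-negative vectors: the sum of component-wise minima divided by the sum of component-wise maxima. The function name is hypothetical.

```python
def weighted_jaccard(u, v):
    """Weighted Jaccard similarity of two non-negative vectors.

    Computes sum(min(u_i, v_i)) / sum(max(u_i, v_i)). For 0/1 vectors
    this equals the ordinary set-based Jaccard index. Two all-zero
    vectors are treated as identical (similarity 1.0).
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    if any(x < 0 for x in u) or any(x < 0 for x in v):
        raise ValueError("components must be non-negative")
    denominator = sum(max(a, b) for a, b in zip(u, v))
    if denominator == 0:
        return 1.0
    return sum(min(a, b) for a, b in zip(u, v)) / denominator
```

Topic distributions, whose components are non-negative, fit this measure directly; word embeddings with negative components would need a different similarity such as the cosine.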
Preferably, the third determining module 105 is specifically configured to:
According to the label path information of the source classification label and the label path information of the target classification label, obtain the parent node information, child node information and sibling node information in the label path information, and determine the basic similarity according to the literal similarity and the semantic similarity;
Based on the parent node information, calculate the ancestor node similarity of the source classification label and the target classification label according to the basic similarity;
Based on the child node information, calculate the descendant node similarity of the source classification label and the target classification label according to the basic similarity;
Based on the sibling node information, calculate the sibling node similarity of the source classification label and the target classification label according to the basic similarity;
According to the ancestor node similarity, descendant node similarity and sibling node similarity, determine the structural similarity of the source classification label and the target classification label using a set weighting rule or selection rule.
Preferably, the matching and mapping module 106 is specifically configured to: for each source classification label, obtain a first set quantity of target classification labels with the largest literal similarity to the source classification label; from the target classification labels obtained, obtain a second set quantity of target classification labels with the largest semantic similarity to the source classification label, the second set quantity being less than the first set quantity; from the target classification labels then obtained, obtain the target classification label with the largest structural similarity to the source classification label, and establish the mapping relation; or
For each source classification label, obtain the target classification label with the largest structural similarity to the source classification label, and establish the mapping relation; or
Obtain the label pairs whose literal similarity is greater than a first similarity threshold and/or whose semantic similarity is greater than a second similarity threshold, and whose structural similarity is greater than a third similarity threshold, and establish mapping relations for the source classification label and target classification label included in each label pair; or
Obtain the label pairs whose structural similarity is greater than the third similarity threshold, and establish mapping relations for the source classification label and target classification label included in each label pair.
Preferably, the information obtaining module 101 is further configured to, after obtaining the label information of the source classification labels and the label information of the target classification labels, perform a word segmentation operation on the obtained label information of the source classification labels and of the target classification labels, and filter out stop words.
The classification label matching and mapping method and device provided by the embodiments of the present invention can, based on ontology alignment techniques, compute distributed representations of classification labels and carry out normalized mapping. The method is an automated label normalization technique that completes classification label mapping fully automatically. The distributed representation of label semantics can be calculated with semantic models such as word vector models and topic models, and classification labels are normalized at the semantic level using ontology alignment techniques such as label semantic similarity and structural similarity. Through information preprocessing and multi-level label similarity calculation, several ontology alignment techniques, namely literal similarity, semantic similarity and structural similarity, are fused to solve for label similarity. Because label structure and semantic information are both considered, ambiguity can be removed effectively and accuracy is ensured, so that a more accurate similarity matching result is obtained and a better normalized mapping is achieved. The method can be automated, which frees up manpower, saves human, material and financial resources, and improves processing speed and efficiency.
Those skilled in the art will also appreciate that the various illustrative logical blocks, units and steps listed in the embodiments of the present invention may be implemented by electronic hardware, computer software, or a combination of both. To clearly show the interchangeability of hardware and software, the various illustrative components, units and steps above have been described generally in terms of their functions. Whether such functions are implemented in hardware or in software depends on the specific application and on the design requirements of the whole system. Those skilled in the art may use various methods to implement the described functions for each specific application, but such implementation should not be understood as going beyond the protection scope of the embodiments of the present invention.
The various illustrative logical blocks or units described in the embodiments of the present invention may be implemented or operated by a general-purpose processor, a digital signal processor, an application-specific integrated circuit (ASIC), a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of the above designed to perform the described functions. The general-purpose processor may be a microprocessor; alternatively, it may be any conventional processor, controller, microcontroller or state machine. A processor may also be implemented as a combination of computing devices, such as a digital signal processor and a microprocessor, multiple microprocessors, one or more microprocessors combined with a digital signal processor core, or any other similar configuration.
The steps of the method or algorithm described in the embodiments of the present invention may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium in this field. Illustratively, the storage medium may be connected to the processor so that the processor can read information from the storage medium and write information to it. Alternatively, the storage medium may be integrated into the processor. The processor and the storage medium may be arranged in an ASIC, and the ASIC may be arranged in a user terminal. Alternatively, the processor and the storage medium may be arranged in different components of the user terminal.
In one or more exemplary designs, the functions described in the embodiments of the present invention may be implemented in hardware, software, firmware, or any combination of the three. If implemented in software, the functions may be stored on a computer-readable medium, or transmitted over a computer-readable medium, in the form of one or more instructions or code. Computer-readable media include computer storage media and communication media that facilitate the transfer of a computer program from one place to another. A storage medium may be any available medium that a general-purpose or special-purpose computer can access. For example, such computer-readable media may include, but are not limited to, RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage or other magnetic storage devices, or any other medium that can carry or store program code in the form of instructions or data structures readable by a general-purpose or special-purpose computer or processor. In addition, any connection may properly be termed a computer-readable medium; for example, if software is transmitted from a website, server or other remote source through coaxial cable, fiber-optic cable, twisted pair or digital subscriber line (DSL), or by wireless means such as infrared, radio and microwave, these are also included in the definition of computer-readable medium. Disk and disc include compact disc, laser disc, optical disc, DVD, floppy disk and Blu-ray disc; disks usually reproduce data magnetically, while discs usually reproduce data optically with lasers. Combinations of the above may also be included in computer-readable media.
The specific embodiments described above explain the purpose, technical solutions and beneficial effects of the present invention in further detail. It should be understood that the foregoing is only a specific embodiment of the present invention and is not intended to limit the protection scope of the present invention; any modification, equivalent substitution, improvement and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (6)
1. A classification label matching and mapping method, characterized by comprising:
Obtaining the label information of source classification labels and the label information of target classification labels, performing a word segmentation operation on the obtained label information of the source classification labels and of the target classification labels, and filtering out stop words;
Determining, according to the label character strings included in the label information, the literal similarity of each source classification label and each target classification label respectively;
Obtaining the vectorization information of the labels according to the label information, and determining, according to the vectorization information of the labels and the label path information included in the label information, the semantic similarity of each source classification label and each target classification label respectively;
Determining, according to the label path information included in the label information and in combination with the literal similarity and the semantic similarity, the structural similarity of each source classification label and each target classification label respectively;
Selecting, according to at least one of the literal similarity, semantic similarity and structural similarity of each source classification label and each target classification label, the source classification labels and target classification labels whose similarity meets the set condition, and establishing mapping relations;
Wherein, the process of the structural similarity of the determining source classification label and target class target label, specifically includes: according to source class
The tag path information of target label and the tag path information of target class target label obtain the father node letter in tag path information
Breath, child node information and brotgher of node information;And basic similarity is determined according to literal similarity and semantic similarity;It is based on
Parent information, according to the ancestor node similarity of basic similarity calculation source classification label and target class target label;Based on son
Nodal information, according to the descendant nodes similarity of basic similarity calculation source classification label and target class target label;Based on brother
Nodal information, according to the brotgher of node similarity of basic similarity calculation source classification label and target class target label;According to ancestors
Node similarity, descendant nodes similarity and brotgher of node similarity determine source using the Weighted Rule or selection rule of setting
The structural similarity of classification label and target class target label.
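By way of non-limiting illustration, the structural-similarity step recited in claim 1 might be sketched in Python as follows. The claim does not fix how the ancestor, descendant, and sibling similarities are derived from the basic similarity, so the best-match averaging used here, the dictionary keys `ancestors`, `children`, and `siblings`, the default weights, and the function names are all assumptions made for the example. The `basic` callable stands in for whatever combination of literal and semantic similarity an implementation chooses.

```python
from typing import Callable, Dict, List, Sequence

BasicSim = Callable[[str, str], float]


def best_match_average(src_nodes: List[str], tgt_nodes: List[str],
                       basic: BasicSim) -> float:
    """Average, over source nodes, of the best basic similarity to any target node.

    This aggregation is an assumption; the claim only says the value is
    calculated "according to the basic similarity".
    """
    if not src_nodes or not tgt_nodes:
        return 0.0
    total = sum(max(basic(s, t) for t in tgt_nodes) for s in src_nodes)
    return total / len(src_nodes)


def structural_similarity(src: Dict[str, List[str]], tgt: Dict[str, List[str]],
                          basic: BasicSim,
                          weights: Sequence[float] = (0.5, 0.3, 0.2),
                          rule: str = "weighted") -> float:
    """Combine ancestor, descendant, and sibling similarity into one score."""
    parts = [best_match_average(src[key], tgt[key], basic)
             for key in ("ancestors", "children", "siblings")]
    if rule == "max":  # one possible "selection rule": keep the largest part
        return max(parts)
    # "weighting rule": fixed linear combination of the three parts
    return sum(w * p for w, p in zip(weights, parts))
```

A caller would build the three node lists for each label from its path in the category tree and pass in a `basic` function, for example a weighted mean of the literal and semantic scores.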
2. The method according to claim 1, characterized in that the literal similarity between a source category label and a target category label is determined in at least one of the following ways:
determining the literal similarity of the two labels according to whether the label character strings included in the label information are identical or similar;
determining the literal similarity of the two labels according to whether the segmented words in the label character strings included in the label information are synonyms;
determining the literal similarity of the two labels according to the proportion of matching prefix in the label character strings included in the label information;
calculating the N-gram similarity of the two label character strings to obtain the literal similarity of the two labels;
calculating the edit distance similarity of the two labels to obtain the literal similarity of the two labels;
calculating the LCS similarity of the two labels according to the longest common subsequence of the label character strings included in the label information.
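By way of non-limiting illustration, three of the literal-similarity measures listed in claim 2 might be sketched in Python as follows. The claim names the measures but not their normalization, so the choices here (Jaccard overlap of character bigrams, and division by the longer string length for edit distance and LCS) are assumptions, as are all function names.

```python
def ngrams(s: str, n: int = 2) -> set:
    """Set of character n-grams; a string shorter than n is its own single gram."""
    return {s[i:i + n] for i in range(len(s) - n + 1)} if len(s) >= n else {s}


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Jaccard overlap of the two n-gram sets."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    return len(ga & gb) / len(ga | gb)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def lcs_similarity(a: str, b: str) -> float:
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else lcs_length(a, b) / longest
```

Each function returns a value in the range 0 to 1, so the measures can be compared against one threshold or averaged into a single literal score.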
3. The method according to claim 1, characterized in that the semantic similarity between a source category label and a target category label is determined in at least one of the following ways:
calculating the Jaccard similarity between the source category label and the target category label: obtaining the vectorized information of the source category label and the vectorized information of the target category label, and calculating the Jaccard similarity of the two vectors as the semantic similarity;
calculating the cosine similarity between the source category label and the target category label: obtaining the vectorized information of the source category label and the vectorized information of the target category label, and calculating the cosine similarity of the two vectors as the semantic similarity;
calculating the vector pointwise mutual information similarity between the source category label and the target category label as the semantic similarity;
calculating the semantic similarity between the source category label and the target category label based on the word vectors of the source category label and the target category label;
calculating the semantic similarity between the source category label and the target category label based on a topic model;
determining the semantic similarity between the source category label and the target category label based on a machine learning algorithm.
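By way of non-limiting illustration, the Jaccard and cosine measures of claim 3 might be sketched in Python as follows. The claim does not specify how a label is vectorized, so the example assumes that each label (optionally together with the labels on its path) has already been segmented into a token list, and treats the bag of tokens as the vector. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def jaccard_similarity(a_tokens: List[str], b_tokens: List[str]) -> float:
    """Size of the token intersection divided by the size of the token union."""
    a, b = set(a_tokens), set(b_tokens)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine_similarity(a_tokens: List[str], b_tokens: List[str]) -> float:
    """Cosine of the angle between the two term-frequency vectors."""
    va, vb = Counter(a_tokens), Counter(b_tokens)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Dense word vectors, where available, could replace the term-frequency vectors in `cosine_similarity` without changing the formula.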
4. The method according to claim 1, characterized in that selecting, according to at least one of the literal similarity, the semantic similarity, and the structural similarity between each source category label and each target category label, the target category labels whose similarity satisfies the set condition, and establishing mapping relationships, specifically comprises:
for each source category label, obtaining a first set quantity of target category labels having the greatest literal similarity to the source category label; obtaining, from the target category labels so obtained, a second set quantity of target category labels having the greatest semantic similarity to the source category label, the second set quantity being less than the first set quantity; and obtaining, from the target category labels so obtained, the target category label having the greatest structural similarity to the source category label, and establishing a mapping relationship; or
for each source category label, obtaining the target category label having the greatest structural similarity to the source category label, and establishing a mapping relationship; or
obtaining label pairs whose literal similarity is greater than a first similarity threshold and/or whose semantic similarity is greater than a second similarity threshold, and whose structural similarity is greater than a third similarity threshold, and establishing mapping relationships for the source category label and the target category label included in each label pair; or
obtaining label pairs whose structural similarity is greater than the third similarity threshold, and establishing mapping relationships for the source category label and the target category label included in each label pair.
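By way of non-limiting illustration, the first alternative of claim 4 (narrowing candidates by literal similarity, then by semantic similarity, then choosing by structural similarity) might be sketched in Python as follows. The function name `map_label`, the parameter names `k1` and `k2` for the first and second set quantities, and their default values are assumptions for the example; the three similarity callables are supplied by the caller.

```python
import heapq
from typing import Callable, List, Optional

Sim = Callable[[str, str], float]


def map_label(source: str, targets: List[str],
              literal: Sim, semantic: Sim, structural: Sim,
              k1: int = 10, k2: int = 3) -> Optional[str]:
    """Return the target label mapped to `source`, or None if there are no targets."""
    if k2 >= k1:
        raise ValueError("the second set quantity must be less than the first")
    # Stage 1: keep the k1 targets with the greatest literal similarity.
    stage1 = heapq.nlargest(k1, targets, key=lambda t: literal(source, t))
    # Stage 2: of those, keep the k2 with the greatest semantic similarity.
    stage2 = heapq.nlargest(k2, stage1, key=lambda t: semantic(source, t))
    # Stage 3: map to the survivor with the greatest structural similarity.
    return max(stage2, key=lambda t: structural(source, t), default=None)
```

Running the cheap literal comparison first means the more expensive semantic and structural scores are only computed for a shrinking candidate set.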
5. A category label matching and mapping device, characterized by comprising:
an information obtaining module, configured to obtain label information of source category labels and label information of target category labels, perform word segmentation on the obtained label information of the source category labels and of the target category labels, and filter out stop words;
a first determining module, configured to determine, according to the label character strings included in the label information, the literal similarity between each source category label and each target category label;
a second determining module, configured to obtain vectorized information of the labels according to the label information, and to determine, according to the vectorized information of the labels and the label path information included in the label information, the semantic similarity between each source category label and each target category label;
a third determining module, configured to determine, according to the label path information included in the label information and in combination with the literal similarity and the semantic similarity, the structural similarity between each source category label and each target category label;
a matching and mapping module, configured to select, according to at least one of the literal similarity, the semantic similarity, and the structural similarity between each source category label and each target category label, the source category labels and target category labels whose similarity satisfies a set condition, and to establish mapping relationships;
wherein the third determining module is specifically configured to: obtain, from the label path information of the source category label and the label path information of the target category label, the parent node information, child node information, and sibling node information in the label path information; determine a basic similarity according to the literal similarity and the semantic similarity; calculate, based on the parent node information and according to the basic similarity, the ancestor node similarity between the source category label and the target category label; calculate, based on the child node information and according to the basic similarity, the descendant node similarity between the source category label and the target category label; calculate, based on the sibling node information and according to the basic similarity, the sibling node similarity between the source category label and the target category label; and determine the structural similarity between the source category label and the target category label from the ancestor node similarity, the descendant node similarity, and the sibling node similarity, using a set weighting rule or selection rule.
6. The device according to claim 5, characterized in that the matching and mapping module is specifically configured to:
for each source category label, obtain a first set quantity of target category labels having the greatest literal similarity to the source category label; obtain, from the target category labels so obtained, a second set quantity of target category labels having the greatest semantic similarity to the source category label, the second set quantity being less than the first set quantity; and obtain, from the target category labels so obtained, the target category label having the greatest structural similarity to the source category label, and establish a mapping relationship; or
for each source category label, obtain the target category label having the greatest structural similarity to the source category label, and establish a mapping relationship; or
obtain label pairs whose literal similarity is greater than a first similarity threshold and/or whose semantic similarity is greater than a second similarity threshold, and whose structural similarity is greater than a third similarity threshold, and establish mapping relationships for the source category label and the target category label included in each label pair; or
obtain label pairs whose structural similarity is greater than the third similarity threshold, and establish mapping relationships for the source category label and the target category label included in each label pair.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610195707.1A CN105893349B (en) | 2016-03-31 | 2016-03-31 | Classification tag match mapping method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105893349A (en) | 2016-08-24 |
CN105893349B (en) | 2019-06-04 |
Family
ID=57014519
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610195707.1A Active CN105893349B (en) | 2016-03-31 | 2016-03-31 | Classification tag match mapping method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105893349B (en) |
Families Citing this family (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106528599B (en) * | 2016-09-23 | 2019-05-14 | 深圳凡豆信息科技有限公司 | A kind of character string Fast Fuzzy matching algorithm in magnanimity audio data |
CN107958008B (en) * | 2016-10-18 | 2020-10-27 | 中国移动通信有限公司研究院 | Method and device for updating unified tag library |
CN108509458B (en) * | 2017-02-28 | 2022-12-16 | 阿里巴巴集团控股有限公司 | Business object identification method and device |
CN106970912A (en) * | 2017-04-21 | 2017-07-21 | 北京慧闻科技发展有限公司 | Chinese sentence similarity calculating method, computing device and computer-readable storage medium |
CN108595476B (en) * | 2018-03-12 | 2021-01-15 | 广东睿江云计算股份有限公司 | Intelligent parameter matching and converting method |
CN108920458A (en) * | 2018-06-21 | 2018-11-30 | 武汉斗鱼网络科技有限公司 | A kind of label method for normalizing, device, server and storage medium |
CN108876470B (en) * | 2018-06-29 | 2022-03-01 | 腾讯科技(深圳)有限公司 | Tag user expansion method, computer device, and storage medium |
CN110766486A (en) * | 2018-07-09 | 2020-02-07 | 北京京东尚科信息技术有限公司 | Method and device for determining item category |
CN109165382B (en) * | 2018-08-03 | 2022-08-23 | 南京工业大学 | Similar defect report recommendation method combining weighted word vector and potential semantic analysis |
CN109857957B (en) * | 2019-01-29 | 2021-06-15 | 掌阅科技股份有限公司 | Method for establishing label library, electronic equipment and computer storage medium |
CN109992583A (en) * | 2019-03-15 | 2019-07-09 | 上海益普索信息技术有限公司 | A kind of management platform and method based on DMP label |
CN110008341B (en) * | 2019-03-29 | 2023-01-17 | 电子科技大学 | Indonesia news text classification method capable of adaptively misword and new word |
CN109977319A (en) * | 2019-04-04 | 2019-07-05 | 睿驰达新能源汽车科技(北京)有限公司 | A kind of method and device of generation behavior label |
CN110362741B (en) * | 2019-06-11 | 2022-02-25 | 新浪网技术(中国)有限公司 | Intelligent issuing method and system of Feed stream information |
CN110530872B (en) * | 2019-07-26 | 2021-02-26 | 华中科技大学 | Multi-channel plane information detection method, system and device |
CN110569289B (en) * | 2019-09-11 | 2020-06-02 | 星环信息科技(上海)有限公司 | Column data processing method, equipment and medium based on big data |
CN110795607A (en) * | 2019-10-29 | 2020-02-14 | 中国人民解放军32181部队 | Equipment guarantee data matching method and system based on multi-stage similarity calculation |
CN111382255B (en) * | 2020-03-17 | 2023-08-01 | 北京百度网讯科技有限公司 | Method, apparatus, device and medium for question-answering processing |
CN112818117A (en) * | 2021-01-19 | 2021-05-18 | 新华智云科技有限公司 | Label mapping method, system and computer readable storage medium |
CN112949277A (en) * | 2021-02-19 | 2021-06-11 | 中国科学院计算机网络信息中心 | Subject classification system alignment method, system and medium based on fusion characterization learning |
CN112711666B (en) * | 2021-03-26 | 2021-08-06 | 武汉优品楚鼎科技有限公司 | Futures label extraction method and device |
CN113239276A (en) * | 2021-05-31 | 2021-08-10 | 上海明略人工智能(集团)有限公司 | Method and device for determining recommended materials based on session information |
CN117216688B (en) * | 2023-11-07 | 2024-01-23 | 西南科技大学 | Enterprise industry identification method and system based on hierarchical label tree and neural network |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101859311A (en) * | 2009-04-10 | 2010-10-13 | 索尼公司 | Contents processing apparatus and method, program and recording medium |
CN101930462A (en) * | 2010-08-20 | 2010-12-29 | 华中科技大学 | Comprehensive body similarity detection method |
CN102937994A (en) * | 2012-11-15 | 2013-02-20 | 北京锐安科技有限公司 | Similar document query method based on stop words |
CN103092943A (en) * | 2013-01-10 | 2013-05-08 | 北京亿赞普网络技术有限公司 | Method of advertisement dispatch and advertisement dispatch server |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9130972B2 (en) * | 2009-05-26 | 2015-09-08 | Websense, Inc. | Systems and methods for efficient detection of fingerprinted data and information |
Also Published As
Publication number | Publication date |
---|---|
CN105893349A (en) | 2016-08-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105893349B (en) | Classification tag match mapping method and device | |
CN110609902B (en) | Text processing method and device based on fusion knowledge graph | |
US10310812B2 (en) | Matrix ordering for cache efficiency in performing large sparse matrix operations | |
US10394956B2 (en) | Methods, devices, and systems for constructing intelligent knowledge base | |
CN107391677B (en) | Method and device for generating Chinese general knowledge graph with entity relation attributes | |
US20210018332A1 (en) | Poi name matching method, apparatus, device and storage medium | |
CN104516910A (en) | Method and system for recommending content in client-side server environment | |
CN109582799A (en) | The determination method, apparatus and electronic equipment of knowledge sample data set | |
CN110489558A (en) | Polymerizable clc method and apparatus, medium and calculating equipment | |
CN106844341A (en) | News in brief extracting method and device based on artificial intelligence | |
US9940354B2 (en) | Providing answers to questions having both rankable and probabilistic components | |
CN110162637B (en) | Information map construction method, device and equipment | |
CN113535977B (en) | Knowledge graph fusion method, device and equipment | |
CN109783484A (en) | The construction method and system of the data service platform of knowledge based map | |
CN107679035A (en) | A kind of information intent detection method, device, equipment and storage medium | |
CN113033194B (en) | Training method, device, equipment and storage medium for semantic representation graph model | |
CN110119410A (en) | Processing method and processing device, computer equipment and the storage medium of reference book data | |
CN107704538A (en) | A kind of rubbish text processing method, device, equipment and storage medium | |
US10296585B2 (en) | Assisted free form decision definition using rules vocabulary | |
CN113807102B (en) | Method, device, equipment and computer storage medium for establishing semantic representation model | |
CN113139110B (en) | Regional characteristic processing method, regional characteristic processing device, regional characteristic processing equipment, storage medium and program product | |
CN113344214B (en) | Training method and device of data processing model, electronic equipment and storage medium | |
CN109471969A (en) | A kind of application searches method, device and equipment | |
CN115129885A (en) | Entity chain pointing method, device, equipment and storage medium | |
CN113377739A (en) | Knowledge graph application method, knowledge graph application platform, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | | Effective date of registration: 20230407. Address after: Room 501-502, 5/F, Sina Headquarters Scientific Research Building, Block N-1 and N-2, Zhongguancun Software Park, Dongbei Wangxi Road, Haidian District, Beijing, 100193. Patentee after: Sina Technology (China) Co.,Ltd. Address before: 100080, International Building, No. 58 West Fourth Ring Road, Haidian District, Beijing, 20 floor. Patentee before: Sina.com Technology (China) Co.,Ltd. |