CN113468885A - Chinese trademark similarity calculation method - Google Patents

Publication number
CN113468885A
Authority
CN
China
Prior art keywords
word
similarity
sim
trademark
forest
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110790797.XA
Other languages
Chinese (zh)
Inventor
李学俊
高仕锦
廖伟伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Green Industry Innovation Research Institute of Anhui University
Original Assignee
Green Industry Innovation Research Institute of Anhui University
Application filed by Green Industry Innovation Research Institute of Anhui University
Priority to CN202110790797.XA
Publication of CN113468885A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a Chinese trademark similarity calculation method, which belongs to the technical field of trademark retrieval and comprises the following steps: acquiring the names of a first trademark and a second trademark, and segmenting each name into words to obtain two word lists; for every pairwise combination of words from the two lists, calculating a comprehensive similarity from the Cilin (synonym forest) word similarity, the HowNet word similarity and the Word2Vec word similarity and taking it as the word similarity, then taking the maximum value for each word as its local similarity, which gives two local similarity lists; and calculating the meaning similarity of the two trademarks from the two local similarity lists, and finally judging whether the two trademarks are similar. The method addresses the problems that arise when ontology-knowledge methods calculate trademark meaning similarity from a semantic dictionary: inaccurate synonym recognition, a limited knowledge base and inaccurate similarity results.

Description

Chinese trademark similarity calculation method
Technical Field
The invention relates to the technical field of trademark retrieval, in particular to a Chinese trademark similarity calculation method.
Background
A review of the relevant literature shows that current methods for judging whether text trademarks are similar in meaning have some shortcomings. For example, among conventional ontology-based methods for calculating short-text semantic similarity, methods based on the synonym forest (Cilin) and methods based on HowNet are commonly used. These methods essentially calculate the similarity of trademark meanings from different semantic dictionaries, depend heavily on the dictionary ontology, and cannot be updated in time. As a result, the synonyms in the semantic dictionary do not match how the trademark field judges similar trademarks, the knowledge base is limited, and the similarity results are inaccurate.
Disclosure of Invention
The invention aims to overcome the defects in the background art and solve the problem of inaccurate synonym identification.
To achieve this purpose, a Chinese trademark similarity calculation method is adopted, comprising the following steps:
acquiring the names of a first trademark and a second trademark to be compared, and performing word segmentation on the two names to obtain a first word segmentation list $W_a$ and a second word segmentation list $W_b$, respectively;
calculating the Cilin word similarity, the HowNet word similarity and the Word2Vec word similarity for every pairwise combination of words from the two word segmentation lists;
using a dynamic weighting strategy, calculating for each word in the first word segmentation list the comprehensive similarity of its Cilin, HowNet and Word2Vec word similarities with each word in the second word segmentation list, these values forming the word similarity group of that word; taking the maximum value in each word's similarity group as the local similarity of that word; and forming a first local similarity list from the local similarities of all words in the first word segmentation list;
using the dynamic weighting strategy, calculating for each word in the second word segmentation list the comprehensive similarity of its Cilin, HowNet and Word2Vec word similarities with each word in the first word segmentation list, these values forming the word similarity group of that word; taking the maximum value in each word's similarity group as the local similarity of that word; and forming a second local similarity list from the local similarities of all words in the second word segmentation list;
and calculating the meaning similarity of the first trademark name and the second trademark name from the first local similarity list and the second local similarity list.
Further, the step of calculating the Cilin word similarity $Sim_{Cilin}(W_{ar}, W_{bm})$ between the $r$-th word $W_{ar}$ in the first word segmentation list and the $m$-th word $W_{bm}$ in the second word segmentation list comprises:
constructing a trademark Cilin synonym library traCilinFile from the dictionary file cilinFile of the extended edition of the synonym forest;
converting the words $W_{ar}$ and $W_{bm}$ into Cilin codes according to traCilinFile, and obtaining all combinations of the Cilin codes of $W_{ar}$ and $W_{bm}$;
judging, over all the code combinations, whether any combination consists of two equal Cilin codes;
if so, reading the word group on the line of that code and judging from the word marks whether $W_{ar}$ and $W_{bm}$ are similar: if they are not similar, $Sim_{Cilin}(W_{ar}, W_{bm}) = 0$; if they are similar, $Sim_{Cilin}(W_{ar}, W_{bm}) = 1$;
if not, calculating the similarity of every Cilin code combination with the information-content-based Cilin similarity method, and taking the maximum value as $Sim_{Cilin}(W_{ar}, W_{bm})$.
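The branching above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the in-memory layout `TraCilin` (word to {Cilin code: mark}), the function name `sim_cilin`, the pluggable `code_sim` callback, and the handling of identical or missing words are all hypothetical, since the source does not say how traCilinFile is parsed.

```python
from itertools import product
from typing import Callable, Dict

# Hypothetical in-memory view of traCilinFile: word -> {Cilin code: mark}.
# Mark 0 means "not similar to the other words on that line"; words on the
# same line that share a non-zero mark are treated as similar.
TraCilin = Dict[str, Dict[str, int]]


def sim_cilin(w_a: str, w_b: str, tra_cilin: TraCilin,
              code_sim: Callable[[str, str], float]) -> float:
    """Cilin word similarity following the equal-code / unequal-code branches."""
    if w_a == w_b:
        return 1.0  # assumption: identical words are fully similar
    codes_a = tra_cilin.get(w_a, {})
    codes_b = tra_cilin.get(w_b, {})
    if not codes_a or not codes_b:
        return 0.0  # assumption: a word missing from the library scores 0
    shared = set(codes_a) & set(codes_b)
    if shared:
        # Equal codes exist: the marks on that line decide between 0 and 1.
        for code in shared:
            if codes_a[code] != 0 and codes_a[code] == codes_b[code]:
                return 1.0
        return 0.0
    # No equal codes: take the maximum over all code combinations.
    return max(code_sim(ca, cb) for ca, cb in product(codes_a, codes_b))


# Toy library; the code and mark for these two words follow the embodiment.
toy = {"property": {"Dj03A09#": 1}, "real estate": {"Dj03A09#": 1}}
print(sim_cilin("property", "real estate", toy, lambda a, b: 0.0))  # 1.0
```

Passing the code-level similarity in as a callback keeps the lookup logic independent of whichever information-content formula is used.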
Further, the similarity over the Cilin code combinations is calculated as follows:
$$Sim_{Cilin}(W_{ar}, W_{bm}) = \max_{1 \le i \le N_1,\ 1 \le j \le N_2} Sim'_{Cilin}(C_{ai}, C_{bj})$$
where $Sim'_{Cilin}(C_{ai}, C_{bj})$ denotes the similarity between the $i$-th Cilin code $C_{ai}$ of the word $W_{ar}$ and the $j$-th Cilin code $C_{bj}$ of the word $W_{bm}$; $N_1$ and $N_2$ are positive integers;
the Cilin similarity $Sim'_{Cilin}(C_a, C_b)$ of two Cilin codes is calculated as follows:
Figure BDA0003160798530000031
where $LCS(C_a, C_b)$ denotes the nearest common parent node of the Cilin codes $C_a$ and $C_b$; $IC(C)$ denotes the information content of the Cilin code $C$, calculated as follows:
Figure BDA0003160798530000032
where $hypo(C)$ is the number of hyponym (lower-level) nodes of $C$ in the ontology, $C$ being $C_a$ or $C_b$; $maxnodes$ is the total number of nodes in the ontology.
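The two formulas above survive only as image placeholders in this text, so their exact form cannot be confirmed here. As a hedged illustration, the sketch below substitutes two widely used expressions that fit the symbols defined above: an intrinsic information content of the form $1 - \log(hypo(C)+1)/\log(maxnodes)$ and a Lin-style ratio $2 \cdot IC(LCS)/(IC(C_a)+IC(C_b))$. Both forms, and the function names, are assumptions rather than the source's stated formulas.

```python
import math


def information_content(hypo: int, maxnodes: int) -> float:
    """Intrinsic IC (assumed form): a leaf scores 1, the root approaches 0."""
    return 1.0 - math.log(hypo + 1) / math.log(maxnodes)


def code_similarity(ic_a: float, ic_b: float, ic_lcs: float) -> float:
    """Lin-style ratio (assumed form) of shared to total information content."""
    total = ic_a + ic_b
    if total == 0:
        return 0.0  # both codes carry no information; avoid division by zero
    return 2.0 * ic_lcs / total
```

Under these assumed forms, two codes whose nearest common parent is as specific as the codes themselves score 1, and the score falls as the common parent moves toward the root.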
Further, constructing the trademark Cilin synonym library traCilinFile from the dictionary file cilinFile of the extended edition of the synonym forest comprises:
marking with the number 0 those words on the same line of cilinFile (words sharing the same Cilin code) that are not similar to one another;
and marking with the same non-zero number those words on the same line of cilinFile that are similar to one another, thereby constructing the trademark Cilin synonym library traCilinFile.
Further, the step of calculating the HowNet word similarity $Sim_{HowNet}(W_{ar}, W_{bm})$ between the $r$-th word $W_{ar}$ in the first word segmentation list and the $m$-th word $W_{bm}$ in the second word segmentation list comprises:
constructing a trademark HowNet synonym library traHownetFile from the HowNet dictionary file;
obtaining the senses of the words $W_{ar}$ and $W_{bm}$ according to traHownetFile, and obtaining all combinations of the senses of $W_{ar}$ and $W_{bm}$;
judging, over all the sense combinations, whether any combination consists of two equal senses;
if so, reading the word group on the line of that sense and judging from the word marks whether $W_{ar}$ and $W_{bm}$ are similar: if they are not similar, $Sim_{HowNet}(W_{ar}, W_{bm}) = 0$; if they are similar, $Sim_{HowNet}(W_{ar}, W_{bm}) = 1$;
if not, calculating the similarity of every sense combination with the HowNet-based similarity method, and taking the maximum value as $Sim_{HowNet}(W_{ar}, W_{bm})$.
Further, the similarity over all the sense combinations is calculated as follows:
$$Sim_{HowNet}(W_{ar}, W_{bm}) = \max_{1 \le i \le N_1,\ 1 \le j \le N_2} Sim'_{HowNet}(S_{ai}, S_{bj})$$
where $Sim'_{HowNet}(S_{ai}, S_{bj})$ denotes the similarity between the $i$-th sense $S_{ai}$ of $W_{ar}$ and the $j$-th sense $S_{bj}$ of $W_{bm}$; $N_1$ and $N_2$ are positive integers; the similarity $Sim'_{HowNet}(S_a, S_b)$ of two senses is calculated as follows:
Figure BDA0003160798530000042
where $Sim'_1(S_a, S_b)$ denotes the similarity of the first independent sememe descriptions of the two senses $S_a$ and $S_b$; $Sim'_2(S_a, S_b)$ denotes the similarity of the other independent sememe descriptions; $Sim'_3(S_a, S_b)$ denotes the similarity of the relational sememe descriptions; $Sim'_4(S_a, S_b)$ denotes the similarity of the symbolic sememe descriptions; $\beta_k$ ($1 \le k \le 4$) are adjustable parameters satisfying $\beta_1 + \beta_2 + \beta_3 + \beta_4 = 1$ and $\beta_1 \ge \beta_2 \ge \beta_3 \ge \beta_4$, with the values $\beta_1 = 0.5$, $\beta_2 = 0.2$, $\beta_3 = 0.17$, $\beta_4 = 0.13$.
Further, the first independent sememe similarity $Sim'_1(S_a, S_b)$ of the two senses $S_a$ and $S_b$ is calculated as follows:
$$Sim(p_a, p_b) = \frac{\alpha \cdot \min(dep(p_a), dep(p_b))}{\alpha \cdot \min(dep(p_a), dep(p_b)) + dist(p_a, p_b)}$$
where $p_a$ and $p_b$ denote sememes; $\alpha$ is an adjustable parameter, $\alpha = 1.6$; $dep(p_a)$ and $dep(p_b)$ denote the depths of $p_a$ and $p_b$ in the sememe hierarchy tree (the sememe depth), and $\min(dep(p_a), dep(p_b))$ denotes the smaller of the two depths; $dist(p_a, p_b)$ denotes the path length between $p_a$ and $p_b$ in the sememe hierarchy tree (the sememe distance); when $p_a$ and $p_b$ are not in the same sememe hierarchy tree, the sememe distance is uniformly set to 20.
Further, $Sim'_2(S_a, S_b)$, $Sim'_3(S_a, S_b)$ and $Sim'_4(S_a, S_b)$ are calculated as follows:
if neither of the two sememe descriptions contains any sememe, the similarity is 1;
if exactly one of the sememe descriptions contains no sememe, the similarity takes the default value 0.2;
if both sememe descriptions contain one or more sememes, the similarity of every pairwise combination of sememes is calculated in the same way as for the first independent sememe description, and the maximum value is taken as the similarity.
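The three cases above can be sketched as a single function. The name `description_similarity`, the list-of-strings representation of a sememe description and the `sememe_sim` callback are hypothetical choices made for illustration; only the three-way rule and the default value 0.2 come from the text.

```python
from typing import Callable, Sequence


def description_similarity(sememes_a: Sequence[str], sememes_b: Sequence[str],
                           sememe_sim: Callable[[str, str], float],
                           default: float = 0.2) -> float:
    """Similarity of two sememe descriptions under the three-case rule."""
    if not sememes_a and not sememes_b:
        return 1.0  # case 1: neither description contains a sememe
    if not sememes_a or not sememes_b:
        return default  # case 2: exactly one description is empty
    # Case 3: best match over every pairwise combination of sememes.
    return max(sememe_sim(pa, pb) for pa in sememes_a for pb in sememes_b)
```

The same function would serve for the other independent, relational and symbolic descriptions, since the text applies one rule to all three.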
Further, constructing the trademark HowNet synonym library traHownetFile from the HowNet dictionary file hownetFile comprises:
marking with the number 0 those words on the same line of hownetFile (words sharing the same sense) that are not similar to one another;
and marking with the same non-zero number those words on the same line of hownetFile that are similar to one another, thereby constructing the trademark HowNet synonym library traHownetFile.
Further, the step of calculating the Word2Vec word similarity $Sim_{Word2Vec}(W_{ar}, W_{bm})$ between the $r$-th word $W_{ar}$ in the first word segmentation list and the $m$-th word $W_{bm}$ in the second word segmentation list comprises:
training a Word2Vec deep learning model on the Chinese Wikipedia corpus to obtain a word vector file word2vecFile;
converting the words $W_{ar}$ and $W_{bm}$ into word vectors according to word2vecFile, and calculating $Sim_{Word2Vec}(W_{ar}, W_{bm})$ with the cosine formula.
Further, $Sim_{Word2Vec}(W_{ar}, W_{bm})$ is calculated from the word vectors of the two words as follows:
$$Sim_{Word2Vec}(W_{ar}, W_{bm}) = \frac{\sum_{n=1}^{N} V_{arn} V_{bmn}}{\sqrt{\sum_{n=1}^{N} V_{arn}^2}\ \sqrt{\sum_{n=1}^{N} V_{bmn}^2}}$$
where $V_{ar}$ and $V_{bm}$ denote the word vectors of the $r$-th word $W_{ar}$ in $W_a$ and the $m$-th word $W_{bm}$ in $W_b$, respectively; $N$ denotes the dimension of the word vectors set when training the Word2Vec model; $V_{arn}$ denotes the $n$-th value of $V_{ar}$, and $V_{bmn}$ denotes the $n$-th value of $V_{bm}$.
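The cosine formula can be written directly in plain Python. The function name and the choice to return 0 for a zero vector are assumptions made for this sketch; the source does not discuss that edge case.

```python
import math
from typing import Sequence


def cosine_similarity(v_a: Sequence[float], v_b: Sequence[float]) -> float:
    """Cosine of the angle between two word vectors of equal dimension."""
    if len(v_a) != len(v_b):
        raise ValueError("word vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(v_a, v_b))
    norm_a = math.sqrt(sum(x * x for x in v_a))
    norm_b = math.sqrt(sum(y * y for y in v_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: an all-zero vector is similar to nothing
    return dot / (norm_a * norm_b)
```

Vectors pointing in the same direction score 1 regardless of length, which is why the measure suits embeddings whose norms vary with word frequency.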
Further, training the Word2Vec deep learning model on the Chinese Wikipedia corpus to obtain the word vector file word2vecFile comprises:
acquiring the Chinese Wikipedia corpus, and cleaning and preprocessing it to obtain the corpus to be trained;
constructing a sentence iterator with the LineSentence() method;
setting the model parameters, inputting them into the Word2Vec model, and starting training;
and saving the trained Word2Vec deep learning model, and saving the word vector file word2vecFile in non-binary form.
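The mention of LineSentence() suggests the gensim library, although the source does not name it. The sketch below assumes gensim 4.x and a corpus file that is already cleaned and segmented, with one sentence per line and words separated by spaces. The function name, the path arguments and every hyperparameter value are illustrative and not taken from the source.

```python
def train_word2vec(corpus_path: str, model_path: str, vector_path: str) -> None:
    """Train Word2Vec on a pre-segmented corpus and save model and vectors."""
    # Imported lazily so this module still loads where gensim is not installed.
    from gensim.models import Word2Vec
    from gensim.models.word2vec import LineSentence

    sentences = LineSentence(corpus_path)  # sentence iterator over the corpus file
    model = Word2Vec(
        sentences=sentences,
        vector_size=100,  # illustrative values, not taken from the source
        window=5,
        min_count=5,
        workers=4,
    )
    model.save(model_path)  # full model, which can be trained further later
    # Plain-text (non-binary) word vector file, one word per line.
    model.wv.save_word2vec_format(vector_path, binary=False)
```

Saving the vectors in text form makes the file larger than the binary format but lets it be inspected and loaded by other tools without gensim.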
Further, the comprehensive similarity $Sim_W(W_{ar}, W_{bm})$ is calculated as follows:
$$Sim_W(W_{ar}, W_{bm}) = \lambda_1 Sim_{Cilin}(W_{ar}, W_{bm}) + \lambda_2 Sim_{HowNet}(W_{ar}, W_{bm}) + \lambda_3 Sim_{Word2Vec}(W_{ar}, W_{bm})$$
where $\lambda_1$, $\lambda_2$ and $\lambda_3$ denote the weights of $Sim_{Cilin}(W_{ar}, W_{bm})$, $Sim_{HowNet}(W_{ar}, W_{bm})$ and $Sim_{Word2Vec}(W_{ar}, W_{bm})$, respectively, and satisfy $\lambda_1 + \lambda_2 + \lambda_3 = 1$.
Further, $\lambda_1$, $\lambda_2$ and $\lambda_3$ take the following values, where each pair gives the sets to which $W_{ar}$ and $W_{bm}$ respectively belong:
(1) for $(D, D)$, that is, when $W_{ar}$ and $W_{bm}$ are both present in traCilinFile, traHownetFile and word2vecFile at the same time: $\lambda_1 = \lambda_2 = \lambda_3 = 1/3$;
(2) for $(E, E)$, $(D, E)$ or $(E, D)$, that is, when one of the words is present in both traCilinFile and word2vecFile and the other is present either in those two files or in all three files: $\lambda_1 = \lambda_3 = 1/2$, $\lambda_2 = 0$;
(3) for $(F, F)$, $(D, F)$ or $(F, D)$: $\lambda_2 = \lambda_3 = 1/2$, $\lambda_1 = 0$;
(4) for $(G, G)$, $(D, G)$ or $(G, D)$: $\lambda_1 = \lambda_2 = 1/2$, $\lambda_3 = 0$;
(5) for $(A, A)$, $(A, D)$, $(D, A)$, $(A, E)$, $(E, A)$, $(A, G)$, $(G, A)$, $(E, G)$ or $(G, E)$: $\lambda_1 = 1$, $\lambda_2 = \lambda_3 = 0$;
(6) for $(B, B)$, $(B, D)$, $(D, B)$, $(B, F)$, $(F, B)$, $(B, G)$, $(G, B)$, $(F, G)$ or $(G, F)$: $\lambda_2 = 1$, $\lambda_1 = \lambda_3 = 0$;
(7) for $(C, C)$, $(C, D)$, $(D, C)$, $(C, E)$, $(E, C)$, $(C, F)$, $(F, C)$, $(E, F)$ or $(F, E)$: $\lambda_3 = 1$, $\lambda_1 = \lambda_2 = 0$;
(8) when the two words have no file in common among traCilinFile, traHownetFile and word2vecFile, the similarity weights $\lambda_1$, $\lambda_2$ and $\lambda_3$ are meaningless;
where $A$ denotes the set of words contained only in traCilinFile; $B$ denotes the set of words contained only in traHownetFile; $C$ denotes the set of words contained only in word2vecFile; $D$ denotes the set of words contained in traCilinFile, traHownetFile and word2vecFile at the same time; $G$ denotes the set of words contained in both traCilinFile and traHownetFile; $E$ denotes the set of words contained in both traCilinFile and word2vecFile; $F$ denotes the set of words contained in both traHownetFile and word2vecFile.
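The eight cases can be read as one rule: the weight is shared equally among the resources that contain both words, and resources missing either word get weight 0. That reading is an interpretation of the case list, not a statement made in the source, and the names `RESOURCES`, `dynamic_weights` and `comprehensive_similarity` are hypothetical. Returning all zeros for case (8) is also an assumption, since the text only says the weights are meaningless there.

```python
from typing import Dict, Set

# Order matches lambda_1, lambda_2, lambda_3.
RESOURCES = ("cilin", "hownet", "word2vec")


def dynamic_weights(sources_a: Set[str], sources_b: Set[str]) -> Dict[str, float]:
    """Equal weights over the resources that contain both words."""
    common = [r for r in RESOURCES if r in sources_a and r in sources_b]
    if not common:
        # Case (8): no shared resource; zeros are an assumed placeholder.
        return {r: 0.0 for r in RESOURCES}
    share = 1.0 / len(common)
    return {r: (share if r in common else 0.0) for r in RESOURCES}


def comprehensive_similarity(sims: Dict[str, float],
                             weights: Dict[str, float]) -> float:
    """Weighted sum of the three word similarities."""
    return sum(weights[r] * sims[r] for r in RESOURCES)


# A word in set E (Cilin and Word2Vec) against a word in set G (Cilin and HowNet).
print(dynamic_weights({"cilin", "word2vec"}, {"cilin", "hownet"}))
# {'cilin': 1.0, 'hownet': 0.0, 'word2vec': 0.0}
```

The printed result corresponds to case (5): only the Cilin library contains both words, so $\lambda_1 = 1$.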
Further, calculating the meaning similarity of the first trademark name and the second trademark name from the first local similarity list and the second local similarity list comprises:
calculating the meaning similarity $Sim(a, b)$ of the first trademark name and the second trademark name from the first and second local similarity lists using the following formula:
Figure BDA0003160798530000081
where $sim_{ar}$ denotes the local similarity between the $r$-th word of $W_a$ and $W_b$; $sim_{bm}$ denotes the local similarity between the $m$-th word of $W_b$ and $W_a$; the first local similarity list is $[sim_{a1}, sim_{a2}, \dots, sim_{as}]$; the second local similarity list is $[sim_{b1}, sim_{b2}, \dots, sim_{bt}]$.
Compared with the prior art, the invention has the following technical effects. With reference to the Trademark Examination and Trial Standards revised by the Trademark Office and the Trademark Review and Adjudication Board in December 2016, the invention constructs synonym libraries for the synonym forest (Cilin) and HowNet that meet the trademark examination standards, which solves the problem of inaccurate synonym recognition. It trains a high-quality word vector model with the Word2Vec deep learning model, which greatly expands the range of words whose similarity can be calculated and solves the problem of a limited knowledge base. It calculates the final trademark meaning similarity with a dynamic weighting strategy, so that the result is more balanced and reasonable and the accuracy of detecting trademarks with similar meanings is improved.
Drawings
The following detailed description of embodiments of the invention refers to the accompanying drawings in which:
FIG. 1 is a flow chart of a method for calculating the similarity of Chinese trademarks;
FIG. 2 is an overall flow chart of a method for calculating the similarity of Chinese trademarks;
FIG. 3 is a text structure diagram of the trademark Cilin synonym library;
FIG. 4 is a text structure diagram of the trademark HowNet synonym library;
FIG. 5 is a diagram of a sememe hierarchy tree;
FIG. 6 is a schematic diagram of a distribution of words.
Detailed Description
To further illustrate the features of the present invention, refer to the following detailed description of the invention and the accompanying drawings. The drawings are for reference and illustration purposes only and are not intended to limit the scope of the present disclosure.
As shown in FIGS. 1 and 2, this embodiment discloses a Chinese trademark similarity calculation method. Taking the calculation of the meaning similarity between a first trademark a = "Sumitomo property" and a second trademark b = "Sumitomo real estate" as an example, the method comprises the following steps S1 to S5:
S1, acquiring the name $str_a$ of the first trademark a and the name $str_b$ of the second trademark b to be compared, and performing word segmentation on the two names to obtain a first word segmentation list $W_a$ and a second word segmentation list $W_b$, respectively;
It should be noted that the ansj word segmentation tool is used to segment $str_a$ and $str_b$, which gives the word lists of a and b as $W_a$ = {'Sumitomo', 'property'} and $W_b$ = {'Sumitomo', 'real estate'}, respectively.
S2, calculating the Cilin word similarity, the HowNet word similarity and the Word2Vec word similarity for every pairwise combination of words from the two word segmentation lists;
Specifically: traversing each word of $W_a$ in order and calculating, with the Cilin, HowNet and Word2Vec word similarity methods, the Cilin similarity, HowNet similarity and Word2Vec similarity between the currently traversed word and each word of $W_b$; then traversing each word of $W_b$ in order and calculating, with the same three methods, the Cilin similarity, HowNet similarity and Word2Vec similarity between the currently traversed word and each word of $W_a$.
S3, using a dynamic weighting strategy, calculating for each word in the first word segmentation list the comprehensive similarity of its Cilin, HowNet and Word2Vec word similarities with each word in the second word segmentation list, these values forming the word similarity group of that word; taking the maximum value in each word's similarity group as the local similarity of that word; and forming a first local similarity list from the local similarities of all words in the first word segmentation list;
S4, using the dynamic weighting strategy, calculating for each word in the second word segmentation list the comprehensive similarity of its Cilin, HowNet and Word2Vec word similarities with each word in the first word segmentation list, these values forming the word similarity group of that word; taking the maximum value in each word's similarity group as the local similarity of that word; and forming a second local similarity list from the local similarities of all words in the second word segmentation list;
Specifically: using the dynamic weighting strategy, the comprehensive similarity of the three similarities is calculated as the word similarity between the traversed word of $W_a$ and each word of $W_b$, and the maximum of these word similarities is taken as the local similarity between the traversed word of $W_a$ and $W_b$; once all words of $W_a$ have been traversed, a first local similarity list $[sim_{a1}, sim_{a2}, \dots, sim_{as}]$ of length $s$ is obtained. In the same way, traversing each word of $W_b$ gives the local similarity between each traversed word of $W_b$ and $W_a$, and finally a second local similarity list $[sim_{b1}, sim_{b2}, \dots, sim_{bt}]$ of length $t$ is obtained.
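Steps S3 and S4 can be sketched as two passes over the word lists. The function name `local_similarity_lists` and the `word_sim` callback (standing in for the comprehensive similarity) are hypothetical, and the sketch assumes both word lists are non-empty.

```python
from typing import Callable, List, Sequence, Tuple


def local_similarity_lists(
    words_a: Sequence[str], words_b: Sequence[str],
    word_sim: Callable[[str, str], float],
) -> Tuple[List[float], List[float]]:
    """Best-match similarity of every word against the other word list."""
    # S3: for each word of W_a, the maximum similarity to any word of W_b.
    local_a = [max(word_sim(wa, wb) for wb in words_b) for wa in words_a]
    # S4: for each word of W_b, the maximum similarity to any word of W_a.
    local_b = [max(word_sim(wb, wa) for wa in words_a) for wb in words_b]
    return local_a, local_b
```

The two lists have lengths $s$ and $t$ and need not be equal, which is why the final meaning similarity is computed from both rather than from a single matrix maximum.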
S5, calculating the meaning similarity of the first trademark name and the second trademark name from the first local similarity list and the second local similarity list.
In this embodiment, a dynamic weighting strategy is used to calculate the comprehensive similarity of the Cilin word similarity, the HowNet word similarity and the Word2Vec word similarity as the word similarity, from which the trademark meaning similarity is calculated, so that the result is more balanced and reasonable and the accuracy of judging trademarks with similar meanings is improved.
As a further preferred technical solution, in step S2, the step of calculating the Cilin word similarity $Sim_{Cilin}(W_{ar}, W_{bm})$ between the $r$-th word $W_{ar}$ in the first word segmentation list and the $m$-th word $W_{bm}$ in the second word segmentation list comprises:
(1) constructing a trademark Cilin synonym library traCilinFile from the dictionary file cilinFile of the extended edition of the synonym forest;
Specifically: words on the same line of cilinFile (words sharing the same Cilin code) that are not similar to one another are marked with the number 0; words on the same line of cilinFile that are similar to one another are marked with the same non-zero number. FIG. 3 is a text structure diagram of traCilinFile. Taking "property" and "real estate" as an example, both words have the code "Dj03A09#" and are both followed by the mark 1, which indicates that the two are likely to be confused in the trademark field and are therefore treated as similar.
(2) calculating the Cilin word similarity $Sim_{Cilin}(W_{ar}, W_{bm})$ according to the trademark Cilin synonym library traCilinFile.
Specifically: converting the words $W_{ar}$ and $W_{bm}$ into Cilin codes according to traCilinFile, and obtaining all combinations of the Cilin codes of $W_{ar}$ and $W_{bm}$;
judging, over all the code combinations, whether any combination consists of two equal Cilin codes;
if so, reading the word group on the line of that code and judging from the word marks whether $W_{ar}$ and $W_{bm}$ are similar: if they are not similar, $Sim_{Cilin}(W_{ar}, W_{bm}) = 0$; if they are similar, $Sim_{Cilin}(W_{ar}, W_{bm}) = 1$;
if not, calculating the similarity of every Cilin code combination with the information-content-based Cilin similarity method and taking the maximum value as $Sim_{Cilin}(W_{ar}, W_{bm})$, calculated as follows:
$$Sim_{Cilin}(W_{ar}, W_{bm}) = \max_{1 \le i \le N_1,\ 1 \le j \le N_2} Sim'_{Cilin}(C_{ai}, C_{bj})$$
where $Sim'_{Cilin}(C_{ai}, C_{bj})$ denotes the similarity between the $i$-th Cilin code $C_{ai}$ of the word $W_{ar}$ and the $j$-th Cilin code $C_{bj}$ of the word $W_{bm}$; $N_1$ and $N_2$ are positive integers;
the Cilin similarity $Sim'_{Cilin}(C_a, C_b)$ of two Cilin codes is calculated as follows:
Figure BDA0003160798530000122
where $LCS(C_a, C_b)$ denotes the nearest common parent node of the Cilin codes $C_a$ and $C_b$; $IC(C)$ denotes the information content of the Cilin code $C$, calculated as follows:
Figure BDA0003160798530000123
where $hypo(C)$ is the number of hyponym (lower-level) nodes of $C$ in the ontology, $C$ being $C_a$ or $C_b$; $maxnodes$ is the total number of nodes in the ontology.
In this example, "Sumitomo" is not recorded in traCilinFile, and therefore $Sim_{Cilin}$('Sumitomo', 'real estate') = 0 and $Sim_{Cilin}$('property', 'Sumitomo') = 0;
in summary, $Sim_{Cilin}$('Sumitomo', 'Sumitomo') = 1, $Sim_{Cilin}$('Sumitomo', 'real estate') = 0, $Sim_{Cilin}$('property', 'Sumitomo') = 0 and $Sim_{Cilin}$('property', 'real estate') = 1.
It should be noted that, in this embodiment, constructing a trademark Cilin synonym library that meets the trademark examination standards and calculating the Cilin word similarity on that basis reflects the influence of information content on word meaning.
As a further preferred technical solution, in step S2, the step of calculating the HowNet word similarity $Sim_{HowNet}(W_{ar}, W_{bm})$ between the $r$-th word $W_{ar}$ in the first word segmentation list and the $m$-th word $W_{bm}$ in the second word segmentation list comprises:
(1) constructing a trademark HowNet synonym library traHownetFile from the HowNet dictionary file hownetFile;
Specifically: for hownetFile, the word group of each sense is first collected, and the words on the same line for each sense are then marked to obtain traHownetFile. Constructing traHownetFile requires judging synonyms with reference to the Trademark Examination and Trial Standards, and the construction steps include: marking with the number 0 those words on the same line of hownetFile (words sharing the same sense) that are not similar to one another; and marking with the same non-zero number those words on the same line of hownetFile that are similar to one another. FIG. 4 is a text structure diagram of traHownetFile. Taking "property" and "real estate" as an example, the two words have different sense codes and are therefore not on the same line; the marks following the two words are as shown in FIG. 4.
(2) calculating the HowNet word similarity $Sim_{HowNet}(W_{ar}, W_{bm})$ according to the trademark HowNet synonym library traHownetFile.
Specifically: obtaining the senses of the words $W_{ar}$ and $W_{bm}$ according to traHownetFile, and obtaining all combinations of the senses of $W_{ar}$ and $W_{bm}$;
judging, over all the sense combinations, whether any combination consists of two equal senses;
if so, reading the word group on the line of that sense and judging from the word marks whether $W_{ar}$ and $W_{bm}$ are similar: if they are not similar, $Sim_{HowNet}(W_{ar}, W_{bm}) = 0$; if they are similar, $Sim_{HowNet}(W_{ar}, W_{bm}) = 1$;
if not, calculating the similarity of every sense combination with the HowNet-based similarity method and taking the maximum value as $Sim_{HowNet}(W_{ar}, W_{bm})$, calculated as follows:
$$Sim_{HowNet}(W_{ar}, W_{bm}) = \max_{1 \le i \le N_1,\ 1 \le j \le N_2} Sim'_{HowNet}(S_{ai}, S_{bj})$$
where $Sim'_{HowNet}(S_{ai}, S_{bj})$ denotes the similarity between the $i$-th sense $S_{ai}$ of $W_{ar}$ and the $j$-th sense $S_{bj}$ of $W_{bm}$; $N_1$ and $N_2$ are positive integers;
the similarity $Sim'_{HowNet}(S_a, S_b)$ of two senses is calculated as follows:
Figure BDA0003160798530000142
in the formula (II), Sim'1(Sa,Sb) Representing two items of significance SaAnd SbA first degree of similarity to an independent sense; sim'2(Sa,Sb) Representing similarity of other independent senses; sim'3(Sa,Sb) Representing similarity of the relation senses; sim'4(Sa,Sb) Representing the similarity of the symbol senses; beta is akK is more than or equal to 1 and less than or equal to 4 and has beta for adjustable parameters1234=1,β1≥β2≥β3≥β4,βkThe values of (A) are as follows: beta is a1=0.5,β2=0.2,β3=0.17,β4=0.13。
In the present embodiment, since there is no equality between the meaning item codes of "property" and "real property", the similarity of all the meaning item combinations is calculated by directly using the similarity calculation method based on the knowledge network.
Obtaining a semantic description formula corresponding to each word according to the semantic expression of the word: in the semantic expression of 'property', the 'welth | money and money' is a first independent semantic description formula, and has two symbolic semantic description formulas of '# earth | earth' and '# building | building', and no other independent semantic description formula and relational semantic description formula exist; in the semantic expression of real estate, physical substance is a first independent semantic description formula, and has two symbolic semantic description formulas of # welth money and ^ TakeAway removal, and no other independent semantic description formula and relational semantic description formula.
Calculating Sim'_1(S_a, S_b): because the first independent sememe description contains only one sememe, the sememe similarity formula can be applied directly:
$$\mathrm{Sim}(p_a, p_b) = \frac{\alpha \cdot \min(dep(p_a), dep(p_b))}{\alpha \cdot \min(dep(p_a), dep(p_b)) + dist(p_a, p_b)}$$
where p_a and p_b denote sememes; α is an adjustable parameter, here α = 1.6; dep(p_a) and dep(p_b) denote the depths of p_a and p_b in the sememe hierarchy tree (the sememe depth), and min(dep(p_a), dep(p_b)) is the smaller of the two depths; dist(p_a, p_b) denotes the path length between p_a and p_b in the sememe hierarchy tree (the sememe distance); when p_a and p_b are not in the same sememe hierarchy tree, the sememe distance is set to 20;
FIG. 5 shows the sememe hierarchy tree containing p_a = "wealth|money" and p_b = "physical|substance", from which dist(p_a, p_b) = 3, dep(p_a) = 5, dep(p_b) = 2 and min(dep(p_a), dep(p_b)) = 2, so the formula gives Sim(p_a, p_b) = Sim'_1(S_a, S_b) = 0.5161;
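The sememe similarity step can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: it assumes the formula has the form α·min(dep) / (α·min(dep) + dist), which is consistent with the symbol definitions above and reproduces the worked value 0.5161. The function name `sememe_similarity` and the `same_tree` flag are hypothetical.

```python
ALPHA = 1.6            # adjustable parameter alpha used in this embodiment
CROSS_TREE_DIST = 20   # sememe distance when the sememes lie in different trees


def sememe_similarity(dep_a: int, dep_b: int, dist: int,
                      same_tree: bool = True, alpha: float = ALPHA) -> float:
    """Similarity of two sememes from their depths and path length (assumed form)."""
    if not same_tree:
        dist = CROSS_TREE_DIST
    weighted_depth = alpha * min(dep_a, dep_b)
    return weighted_depth / (weighted_depth + dist)


# Worked example: wealth|money (depth 5), physical|substance (depth 2), distance 3
print(round(sememe_similarity(5, 2, 3), 4))  # 0.5161
```

With this form, deeper (more specific) sememes that are close together score higher than shallow ones at the same distance, which matches the stated intent of taking sememe depth into account.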
Calculating Sim'_2(S_a, S_b), Sim'_3(S_a, S_b) and Sim'_4(S_a, S_b): there are three cases. If neither of the two sememe descriptions contains any sememe, the similarity is 1. If exactly one of the two sememe descriptions contains no sememe, the similarity takes the default value 0.2. If both sememe descriptions contain one or more sememes, the similarity of every pair of sememes is calculated in the same way as for the first independent sememe description, and the maximum value is taken as the similarity.
In this embodiment, neither "property" nor "real property" has other independent sememes or relational sememes, which matches the first case, so Sim'_2(S_a, S_b) = 1 and Sim'_3(S_a, S_b) = 1. Both "property" and "real property" have several symbolic sememes, "#earth|ground, #building|building" and "#wealth|money, ^TakeAway|moving" respectively, which matches the third case, and the calculation gives Sim'_4(S_a, S_b) = 0.2.
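The three cases above can be sketched as a small Python function. The names `description_similarity` and `pair_sim` are hypothetical, and the pairwise values in the usage lines are made-up placeholders rather than HowNet data; only the three-case logic and the default value 0.2 come from the text.

```python
from itertools import product
from typing import Callable, Sequence

DEFAULT_ONE_EMPTY = 0.2  # default when only one description contains sememes


def description_similarity(sememes_a: Sequence[str], sememes_b: Sequence[str],
                           pair_sim: Callable[[str, str], float]) -> float:
    """Similarity of two sememe descriptions following the three cases."""
    if not sememes_a and not sememes_b:
        return 1.0                      # case 1: both descriptions are empty
    if not sememes_a or not sememes_b:
        return DEFAULT_ONE_EMPTY        # case 2: exactly one is empty
    # case 3: best-matching pair over all sememe combinations
    return max(pair_sim(p, q) for p, q in product(sememes_a, sememes_b))


# Placeholder pairwise scores, for illustration only
TOY_SCORES = {("p1", "q1"): 0.3, ("p2", "q2"): 0.6}


def toy_pair_sim(p: str, q: str) -> float:
    return TOY_SCORES.get((p, q), 0.1)


print(description_similarity([], [], toy_pair_sim))                      # 1.0
print(description_similarity(["p1"], [], toy_pair_sim))                  # 0.2
print(description_similarity(["p1", "p2"], ["q1", "q2"], toy_pair_sim))  # 0.6
```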
Finally, from Sim'_1(S_a, S_b), Sim'_2(S_a, S_b), Sim'_3(S_a, S_b) and Sim'_4(S_a, S_b), the sense similarity of "property" and "real property" is calculated as Sim'_HowNet(S_a, S_b) = 0.4625. Since both words have only one sense, Sim_HowNet('property', 'real property') = 0.4625. In this embodiment, "Sumitomo" is not included in traHownetFile, so Sim_HowNet('Sumitomo', 'real property') = 0 and Sim_HowNet('property', 'Sumitomo') = 0.
In summary, Sim_HowNet('Sumitomo', 'Sumitomo') = 1, Sim_HowNet('Sumitomo', 'real property') = 0, Sim_HowNet('property', 'Sumitomo') = 0, and Sim_HowNet('property', 'real property') = 0.4625.
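As a check on the arithmetic, the combination of the four partial similarities can be sketched in Python. This assumes the weighted form in which the k-th term is β_k multiplied by the product of Sim'_1 through Sim'_k, and it uses the unrounded first value 3.2/6.2 (approximately 0.5161); both are assumptions that happen to reproduce the reported 0.4625. The function names are hypothetical.

```python
from math import prod
from typing import Callable, Sequence

BETAS = (0.5, 0.2, 0.17, 0.13)  # beta_1..beta_4 from this embodiment


def sense_similarity(parts: Sequence[float],
                     betas: Sequence[float] = BETAS) -> float:
    """Combine (Sim'_1, Sim'_2, Sim'_3, Sim'_4) into one sense similarity."""
    return sum(beta * prod(parts[:k + 1]) for k, beta in enumerate(betas))


def hownet_word_similarity(senses_a: Sequence, senses_b: Sequence,
                           sense_sim: Callable) -> float:
    """Maximum sense similarity over all sense combinations of two words."""
    return max(sense_sim(sa, sb) for sa in senses_a for sb in senses_b)


sim1 = 3.2 / 6.2  # unrounded Sim'_1 for "property" vs "real property"
print(round(sense_similarity((sim1, 1.0, 1.0, 0.2)), 4))  # 0.4625
```

Because each term carries the product of the earlier partial similarities, a low first-sememe similarity caps the contribution of the later, less important descriptions.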
In this embodiment, a trademark HowNet synonym library that conforms to the trademark examination standard is constructed on the basis of the HowNet sense dictionary, the HowNet similarity of words is calculated on the basis of this library, and the influence of sememe depth and sememe distance on word-meaning similarity is taken into account.
It should be noted that, in this embodiment, synonym libraries conforming to the trademark examination standard are constructed separately for the two ontology knowledge bases, Cilin (the synonym forest) and HowNet, in combination with the trademark examination and trial standard, so that words with the same Cilin code or the same sense better match how synonyms are judged in the trademark field.
As a further preferred technical solution, in step S2, the Word2Vec word similarity Sim_Word2Vec(W_ar, W_bm) between the r-th word W_ar in the first word segmentation list and the m-th word W_bm in the second word segmentation list is calculated as follows:
(1) A Word2Vec deep learning model is trained on the Chinese Wikipedia corpus to obtain the word vector file word2vecFile, specifically:
1-1) downloading the Chinese Wikipedia corpus of December 2020, and cleaning and preprocessing it to obtain the corpus to be trained;
1-2) constructing a sentence iterator using the LineSentence() method;
1-3) setting the model parameters: the word vector dimension size is set to 100; the maximum distance window between the current center word and the predicted context word is set to 5; the minimum word frequency min_count allowed in the corpus is set to 5; the training model sg is set to 1; the number of iterations iter is set to 5;
1-4) inputting the model parameters into the Word2Vec model and starting training;
1-5) saving the trained Word2Vec deep learning model, and saving the word vector file word2vecFile in non-binary form.
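Steps 1-2) to 1-5) can be sketched with the gensim library, which provides `LineSentence` and `Word2Vec`. This is a sketch under assumptions: the text does not name the library or its version, the parameter names below follow gensim 4.x (where `size` and `iter` became `vector_size` and `epochs`), and the function name and file path arguments are hypothetical.

```python
# Parameter names follow gensim 4.x (assumed); gensim 3.x used size= and iter=.
W2V_PARAMS = {
    "vector_size": 100,  # word vector dimension ("size" in the text)
    "window": 5,         # max distance between center word and context word
    "min_count": 5,      # minimum word frequency kept in the vocabulary
    "sg": 1,             # in gensim, sg=1 selects the skip-gram algorithm
    "epochs": 5,         # number of training iterations ("iter" in the text)
}


def train_word2vec(corpus_path: str, model_path: str, vectors_path: str) -> None:
    """Train on a pre-segmented corpus (one sentence per line) and save results."""
    # Imported here so the parameter table can be inspected without gensim.
    from gensim.models import Word2Vec
    from gensim.models.word2vec import LineSentence

    sentences = LineSentence(corpus_path)                      # step 1-2
    model = Word2Vec(sentences=sentences, **W2V_PARAMS)        # steps 1-3, 1-4
    model.save(model_path)                                     # step 1-5: model
    model.wv.save_word2vec_format(vectors_path, binary=False)  # text vectors
```

Saving the vectors with `binary=False` yields a plain-text file, matching the "non-binary form" described for word2vecFile.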
(2) Using the word vector file word2vecFile, the words W_ar and W_bm are converted into word vectors, and Sim_Word2Vec(W_ar, W_bm) is calculated using the cosine formula:
$$\mathrm{Sim}_{Word2Vec}(W_{ar}, W_{bm}) = \frac{\sum_{n=1}^{N} V_{arn} V_{bmn}}{\sqrt{\sum_{n=1}^{N} V_{arn}^{2}}\ \sqrt{\sum_{n=1}^{N} V_{bmn}^{2}}}$$
where V_ar and V_bm denote the word vector of the r-th word W_ar in W_a and the word vector of the m-th word W_bm in W_b, respectively; N denotes the word vector dimension used when training the Word2Vec model; V_arn denotes the n-th value of V_ar, and V_bmn denotes the n-th value of V_bm. In this embodiment, the calculation gives Sim_Word2Vec('Sumitomo', 'Sumitomo') = 1, Sim_Word2Vec('Sumitomo', 'real property') = 0.5816, Sim_Word2Vec('property', 'Sumitomo') = 0.4892, and Sim_Word2Vec('property', 'real property') = 0.6853.
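The cosine formula can be sketched in plain Python. The function name is hypothetical, and returning 0.0 for an all-zero vector is an assumption added for robustness; the text does not discuss that case.

```python
from math import sqrt
from typing import Sequence


def cosine_similarity(v_a: Sequence[float], v_b: Sequence[float]) -> float:
    """Cosine of the angle between two word vectors of equal dimension."""
    if len(v_a) != len(v_b):
        raise ValueError("word vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(v_a, v_b))
    norm_a = sqrt(sum(x * x for x in v_a))
    norm_b = sqrt(sum(y * y for y in v_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


print(cosine_similarity([3.0, 4.0], [3.0, 4.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```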
It should be noted that, in this embodiment, a Word2Vec deep learning model is trained on the vocabulary-rich Chinese Wikipedia corpus to obtain a higher-quality word vector model, and the word vectors are then used to find more similar words, which addresses the problem of the limited coverage of the knowledge bases in the ontology semantic dictionaries.
As a further preferred technical solution, in step S3, the first local similarity list [sim_a1, sim_a2, …, sim_as] is constructed as follows:
using the dynamic weighting strategy, for each word traversed in the first word segmentation list W_a, the comprehensive similarity of the Cilin word similarity, the HowNet word similarity and the Word2Vec word similarity between that word and each word in the second word segmentation list W_b is calculated and taken as the word similarity between the traversed word and each word in W_b; the maximum of these word similarities is taken as the local similarity between the traversed word and W_b; after all words in W_a have been traversed, a local similarity list of length s, [sim_a1, sim_a2, …, sim_as], is obtained;
In step S4, the second local similarity list [sim_b1, sim_b2, …, sim_bt] is constructed as follows:
using the dynamic weighting strategy, for each word traversed in the second word segmentation list W_b, the comprehensive similarity of the Cilin word similarity, the HowNet word similarity and the Word2Vec word similarity between that word and each word in the first word segmentation list W_a is calculated and taken as the word similarity between the traversed word and each word in W_a; the maximum of these word similarities is taken as the local similarity between the traversed word and W_a; after all words in W_b have been traversed, a local similarity list of length t, [sim_b1, sim_b2, …, sim_bt], is obtained.
Further, the comprehensive similarity Sim_W(W_ar, W_bm) is calculated as follows:
$$\mathrm{Sim}_{W}(W_{ar}, W_{bm}) = \lambda_1 \mathrm{Sim}_{Cilin}(W_{ar}, W_{bm}) + \lambda_2 \mathrm{Sim}_{HowNet}(W_{ar}, W_{bm}) + \lambda_3 \mathrm{Sim}_{Word2Vec}(W_{ar}, W_{bm})$$
where λ1, λ2 and λ3 denote the weights of Sim_Cilin(W_ar, W_bm), Sim_HowNet(W_ar, W_bm) and Sim_Word2Vec(W_ar, W_bm), respectively, and satisfy λ1 + λ2 + λ3 = 1. As shown in FIG. 6, the values of λ1, λ2 and λ3 are determined by the distribution of W_ar and W_bm across traCilinFile, traHownetFile and word2vecFile. In FIG. 6, U denotes the set of all words; A denotes words included only in traCilinFile; B denotes words included only in traHownetFile; C denotes words present only in word2vecFile; D denotes words present in traCilinFile, traHownetFile and word2vecFile at the same time; G denotes words included in both traCilinFile and traHownetFile; E denotes words present in both traCilinFile and word2vecFile; F denotes words present in both traHownetFile and word2vecFile. The values fall into the following 8 cases:
(1) when W_ar ∈ D and W_bm ∈ D, i.e., when W_ar and W_bm are both present in traCilinFile, traHownetFile and word2vecFile at the same time, λ1 = λ2 = λ3 = 1/3;
(2) when W_ar ∈ E, W_bm ∈ E; W_ar ∈ D, W_bm ∈ E; or W_ar ∈ E, W_bm ∈ D, i.e., when one of W_ar and W_bm is present in both traCilinFile and word2vecFile and the other is present either in both traCilinFile and word2vecFile or in all of traCilinFile, traHownetFile and word2vecFile, λ1 = λ3 = 1/2, λ2 = 0;
It should be noted that W_ar ∈ D, W_bm ∈ E or W_ar ∈ E, W_bm ∈ D means that one word is present in Cilin, HowNet and Word2Vec at the same time while the other is present in Cilin and Word2Vec, i.e., one of the words is not in HowNet, so the HowNet similarity is necessarily 0; therefore λ1 = λ3 = 1/2, λ2 = 0.
(3) when W_ar ∈ F, W_bm ∈ F; W_ar ∈ D, W_bm ∈ F; or W_ar ∈ F, W_bm ∈ D, λ2 = λ3 = 1/2, λ1 = 0;
(4) when W_ar ∈ G, W_bm ∈ G; W_ar ∈ D, W_bm ∈ G; or W_ar ∈ G, W_bm ∈ D, λ1 = λ2 = 1/2, λ3 = 0;
(5) when W_ar ∈ A, W_bm ∈ A; W_ar ∈ A, W_bm ∈ D; W_ar ∈ D, W_bm ∈ A; W_ar ∈ A, W_bm ∈ E; W_ar ∈ E, W_bm ∈ A; W_ar ∈ A, W_bm ∈ G; W_ar ∈ G, W_bm ∈ A; W_ar ∈ E, W_bm ∈ G; or W_ar ∈ G, W_bm ∈ E, λ1 = 1, λ2 = λ3 = 0;
It should be noted that W_ar ∈ A, W_bm ∈ D or W_ar ∈ D, W_bm ∈ A means that one word is present in traCilinFile, traHownetFile and word2vecFile at the same time while the other is present only in traCilinFile, i.e., one of the words is absent from both traHownetFile and word2vecFile, so the HowNet similarity and the Word2Vec similarity are necessarily 0; therefore λ1 = 1, λ2 = λ3 = 0.
(6) when W_ar ∈ B, W_bm ∈ B; W_ar ∈ B, W_bm ∈ D; W_ar ∈ D, W_bm ∈ B; W_ar ∈ B, W_bm ∈ F; W_ar ∈ F, W_bm ∈ B; W_ar ∈ B, W_bm ∈ G; W_ar ∈ G, W_bm ∈ B; W_ar ∈ F, W_bm ∈ G; or W_ar ∈ G, W_bm ∈ F, λ2 = 1, λ1 = λ3 = 0;
(7) when W_ar ∈ C, W_bm ∈ C; W_ar ∈ C, W_bm ∈ D; W_ar ∈ D, W_bm ∈ C; W_ar ∈ C, W_bm ∈ E; W_ar ∈ E, W_bm ∈ C; W_ar ∈ C, W_bm ∈ F; W_ar ∈ F, W_bm ∈ C; W_ar ∈ E, W_bm ∈ F; or W_ar ∈ F, W_bm ∈ E, λ3 = 1, λ1 = λ2 = 0;
(8) in all other cases, the weights are considered meaningless;
It should be noted that, in this embodiment, the other cases are those in which the two words share none of traCilinFile, traHownetFile and word2vecFile; all three similarity values are then necessarily 0, and the similarity weights are considered meaningless. An example is W_ar ∈ B, W_bm ∈ C or W_ar ∈ C, W_bm ∈ B, i.e., one word is present only in traHownetFile and the other only in word2vecFile.
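Read together, the eight cases amount to giving equal weight to every resource that contains both words and zero weight to the rest. A minimal Python sketch of that reading follows; the function names and the string labels for the three resources are hypothetical, and the Cilin similarity of 1.0 used in the usage line is an assumption (it is the value consistent with the comprehensive similarity 0.7159 reported for this embodiment).

```python
from typing import Optional, Sequence, Set, Tuple

# Order matches (lambda_1, lambda_2, lambda_3)
SOURCES = ("cilin", "hownet", "word2vec")


def dynamic_weights(sources_a: Set[str],
                    sources_b: Set[str]) -> Optional[Tuple[float, ...]]:
    """Equal weights over the resources that contain both words.

    Returns None for case (8), where the words share no resource.
    """
    shared = sources_a & sources_b
    if not shared:
        return None
    weight = 1.0 / len(shared)
    return tuple(weight if s in shared else 0.0 for s in SOURCES)


def combined_similarity(sims: Sequence[float],
                        weights: Sequence[float]) -> float:
    """Weighted sum of (Sim_Cilin, Sim_HowNet, Sim_Word2Vec)."""
    return sum(w * s for w, s in zip(weights, sims))


ALL = {"cilin", "hownet", "word2vec"}  # a word in region D
weights = dynamic_weights(ALL, ALL)    # case (1): 1/3 each
# Sim_Cilin = 1.0 is assumed; 0.4625 and 0.6853 are taken from the embodiment.
print(round(combined_similarity((1.0, 0.4625, 0.6853), weights), 4))  # 0.7159
```

For example, `dynamic_weights({"cilin", "word2vec"}, {"cilin", "hownet"})` returns `(1.0, 0.0, 0.0)`, which corresponds to the E and G pairing in case (5).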
In this embodiment, the calculation gives Sim_W('Sumitomo', 'Sumitomo') = 1, Sim_W('Sumitomo', 'real property') = 0.5816, Sim_W('property', 'Sumitomo') = 0.4892 and Sim_W('property', 'real property') = 0.7159, so the two local similarity lists are both [1.0, 0.7159].
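The construction of the two local similarity lists from these values can be sketched as follows. The helper name `local_similarities` is hypothetical; the four comprehensive similarity values are the ones reported for this embodiment.

```python
from typing import Callable, List, Sequence

WORDS_A = ["Sumitomo", "property"]       # first word segmentation list W_a
WORDS_B = ["Sumitomo", "real property"]  # second word segmentation list W_b

# Comprehensive similarities Sim_W(W_ar, W_bm) from this embodiment
SIM_W = {
    ("Sumitomo", "Sumitomo"): 1.0,
    ("Sumitomo", "real property"): 0.5816,
    ("property", "Sumitomo"): 0.4892,
    ("property", "real property"): 0.7159,
}


def local_similarities(words: Sequence[str], others: Sequence[str],
                       sim: Callable[[str, str], float]) -> List[float]:
    """For each word, keep its best similarity against the other list."""
    return [max(sim(w, o) for o in others) for w in words]


first = local_similarities(WORDS_A, WORDS_B, lambda a, b: SIM_W[(a, b)])
second = local_similarities(WORDS_B, WORDS_A, lambda b, a: SIM_W[(a, b)])
print(first)   # [1.0, 0.7159]
print(second)  # [1.0, 0.7159]
```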
It should be noted that, in this embodiment, a dynamic weighting strategy is adopted, and the comprehensive similarity of the Cilin word similarity, the HowNet word similarity and the Word2Vec word similarity is used as the word similarity from which the trademark meaning similarity is calculated; this makes the calculation result more uniform and reasonable and improves the accuracy of judging whether trademarks are similar in meaning.
As a further preferred technical solution, in step S5, calculating the meaning similarity of the name of the first trademark and the name of the second trademark according to the first local similarity list and the second local similarity list comprises:
according to the first local similarity list [sim_a1, sim_a2, …, sim_as] and the second local similarity list [sim_b1, sim_b2, …, sim_bt], calculating the meaning similarity Sim(a, b) of the first trademark name and the second trademark name using the following formula:
Figure BDA0003160798530000211
where sim_ar denotes the local similarity between the r-th word in W_a and W_b, and sim_bm denotes the local similarity between the m-th word in W_b and W_a.
In this embodiment, applying the meaning similarity formula to the first and second local similarity lists obtained for the first trademark name and the second trademark name gives Sim(a, b) = 0.858.
As a further preferred technical solution, in this embodiment Sim(a, b) is compared with the infringement threshold θ for trademark meaning similarity, where θ = 0.75; if Sim(a, b) is greater than or equal to the infringement threshold, the trademarks are judged to be similar trademarks. Since 0.858 ≥ 0.75, "Sumitomo property" and "Sumitomo real property" are judged to be similar trademarks.
It should be understood that the specific value of the infringement threshold in this embodiment is an example, and those skilled in the art may set the infringement threshold according to the actual situation.
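The final step can be sketched in Python under a stated assumption. The meaning similarity formula is only available as an image in this text, so the sketch assumes a pooled mean of both local similarity lists, (Σ sim_ar + Σ sim_bm) / (s + t), which reproduces the reported 0.858; averaging the two list means would give the same value here because both lists have the same length. The function names are hypothetical.

```python
from typing import Sequence

THETA = 0.75  # example infringement threshold from this embodiment


def meaning_similarity(local_a: Sequence[float],
                       local_b: Sequence[float]) -> float:
    """Pooled mean of both local similarity lists (assumed form)."""
    return (sum(local_a) + sum(local_b)) / (len(local_a) + len(local_b))


def is_similar_trademark(sim: float, theta: float = THETA) -> bool:
    """A pair is judged similar when the similarity reaches the threshold."""
    return sim >= theta


sim_ab = meaning_similarity([1.0, 0.7159], [1.0, 0.7159])
print(round(sim_ab, 3), is_similar_trademark(sim_ab))  # 0.858 True
```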
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A Chinese trademark similarity calculation method is characterized by comprising the following steps:
acquiring names of a first trademark and a second trademark to be compared, and performing word segmentation processing on the first trademark name and the second trademark name to respectively obtain a first word segmentation list and a second word segmentation list;
calculating the Cilin word similarity, the HowNet word similarity and the Word2Vec word similarity for each pairwise combination of words from the two word segmentation lists;
using a dynamic weighting strategy, calculating, for each word in the first word segmentation list, the comprehensive similarity of the Cilin word similarity, the HowNet word similarity and the Word2Vec word similarity between that word and each word in the second word segmentation list, to form the word similarity group corresponding to that word; taking the maximum value in the word similarity group corresponding to each word as the local similarity of that word; and forming a first local similarity list from the local similarities of all words in the first word segmentation list;
using the dynamic weighting strategy, calculating, for each word in the second word segmentation list, the comprehensive similarity of the Cilin word similarity, the HowNet word similarity and the Word2Vec word similarity between that word and each word in the first word segmentation list, to form the word similarity group corresponding to that word; taking the maximum value in the word similarity group corresponding to each word as the local similarity of that word; and forming a second local similarity list from the local similarities of all words in the second word segmentation list;
and calculating the meaning similarity of the name of the first trademark and the name of the second trademark according to the first local similarity list and the second local similarity list.
2. The Chinese trademark similarity calculation method according to claim 1, wherein the Cilin word similarity Sim_Cilin(W_ar, W_bm) between the r-th word W_ar in the first word segmentation list and the m-th word W_bm in the second word segmentation list is calculated by:
constructing a trademark Cilin synonym library traCilinFile from the "Synonym Forest (Extended Edition)" dictionary file cilinFile;
using the trademark Cilin synonym library traCilinFile, converting the words W_ar and W_bm into Cilin codes and obtaining all combinations of the Cilin codes corresponding to W_ar and W_bm;
judging, over all the Cilin code combinations, whether any combination has equal Cilin codes;
if so, reading the word group in the row of the current code and judging whether W_ar and W_bm are similar: if they are not similar, Sim_Cilin(W_ar, W_bm) = 0; if they are similar, Sim_Cilin(W_ar, W_bm) = 1;
if not, calculating the similarity of all Cilin code combinations using the information-content-based Cilin similarity calculation method, and taking the maximum value as Sim_Cilin(W_ar, W_bm).
3. The Chinese trademark similarity calculation method according to claim 2, wherein constructing the trademark Cilin synonym library traCilinFile from the "Synonym Forest (Extended Edition)" dictionary file cilinFile comprises:
marking with the number 0 those words that lie in the same row with the same Cilin code in the dictionary file cilinFile but are not similar to one another;
and marking with the same non-zero number those words that lie in the same row with the same Cilin code in the dictionary file cilinFile and are similar to one another, thereby constructing the trademark Cilin synonym library traCilinFile.
4. The Chinese trademark similarity calculation method according to claim 1, wherein the HowNet word similarity Sim_HowNet(W_ar, W_bm) between the r-th word W_ar in the first word segmentation list and the m-th word W_bm in the second word segmentation list is calculated by:
constructing a trademark HowNet synonym library traHownetFile from the HowNet dictionary file;
obtaining the senses of the words W_ar and W_bm from the trademark HowNet synonym library traHownetFile, and obtaining all combinations of the senses corresponding to W_ar and W_bm;
judging, over all the sense combinations, whether any combination has equal senses;
if so, reading the word group in the row of the current sense and judging from the word marks whether W_ar and W_bm are similar: if they are not similar, Sim_HowNet(W_ar, W_bm) = 0; if they are similar, Sim_HowNet(W_ar, W_bm) = 1;
if not, calculating the similarity of all sense combinations using the HowNet-based similarity calculation method, and taking the maximum value as Sim_HowNet(W_ar, W_bm).
5. The Chinese trademark similarity calculation method according to claim 4, wherein constructing the trademark HowNet synonym library traHownetFile from the HowNet dictionary file comprises:
marking with the number 0 those words that lie in the same row with the same sense in the HowNet dictionary file but are not similar to one another;
and marking with the same non-zero number those words that lie in the same row with the same sense in the HowNet dictionary file hownetFile and are similar to one another, thereby constructing the trademark HowNet synonym library traHownetFile.
6. The Chinese trademark similarity calculation method according to claim 1, wherein the Word2Vec word similarity Sim_Word2Vec(W_ar, W_bm) between the r-th word W_ar in the first word segmentation list and the m-th word W_bm in the second word segmentation list is calculated by:
training a Word2Vec deep learning model on the Chinese Wikipedia corpus to obtain a word vector file word2vecFile;
using the word vector file word2vecFile, converting the words W_ar and W_bm into word vectors, and calculating Sim_Word2Vec(W_ar, W_bm) using the cosine formula.
7. The Chinese trademark similarity calculation method according to claim 6, wherein training the Word2Vec deep learning model on the Chinese Wikipedia corpus to obtain the word vector file word2vecFile comprises:
acquiring the Chinese Wikipedia corpus, and cleaning and preprocessing it to obtain the corpus to be trained;
constructing a sentence iterator using the LineSentence() method;
setting the model parameters, inputting the model parameters into the Word2Vec model, and starting training;
and saving the trained Word2Vec deep learning model, and saving the word vector file word2vecFile in non-binary form.
8. The Chinese trademark similarity calculation method according to claim 1, wherein the comprehensive similarity Sim_W(W_ar, W_bm) is calculated as follows:
$$\mathrm{Sim}_{W}(W_{ar}, W_{bm}) = \lambda_1 \mathrm{Sim}_{Cilin}(W_{ar}, W_{bm}) + \lambda_2 \mathrm{Sim}_{HowNet}(W_{ar}, W_{bm}) + \lambda_3 \mathrm{Sim}_{Word2Vec}(W_{ar}, W_{bm})$$
where λ1, λ2 and λ3 denote the weights of Sim_Cilin(W_ar, W_bm), Sim_HowNet(W_ar, W_bm) and Sim_Word2Vec(W_ar, W_bm), respectively, and satisfy λ1 + λ2 + λ3 = 1.
9. The Chinese trademark similarity calculation method according to claim 8, wherein the values of λ1, λ2 and λ3 are as follows:
(1) when W_ar ∈ D and W_bm ∈ D, i.e., when W_ar and W_bm are both present in traCilinFile, traHownetFile and word2vecFile at the same time, λ1 = λ2 = λ3 = 1/3;
(2) when W_ar ∈ E, W_bm ∈ E; W_ar ∈ D, W_bm ∈ E; or W_ar ∈ E, W_bm ∈ D, i.e., when one of W_ar and W_bm is present in both traCilinFile and word2vecFile and the other is present either in both traCilinFile and word2vecFile or in all of traCilinFile, traHownetFile and word2vecFile, λ1 = λ3 = 1/2, λ2 = 0;
(3) when W_ar ∈ F, W_bm ∈ F; W_ar ∈ D, W_bm ∈ F; or W_ar ∈ F, W_bm ∈ D, λ2 = λ3 = 1/2, λ1 = 0;
(4) when W_ar ∈ G, W_bm ∈ G; W_ar ∈ D, W_bm ∈ G; or W_ar ∈ G, W_bm ∈ D, λ1 = λ2 = 1/2, λ3 = 0;
(5) when W_ar ∈ A, W_bm ∈ A; W_ar ∈ A, W_bm ∈ D; W_ar ∈ D, W_bm ∈ A; W_ar ∈ A, W_bm ∈ E; W_ar ∈ E, W_bm ∈ A; W_ar ∈ A, W_bm ∈ G; W_ar ∈ G, W_bm ∈ A; W_ar ∈ E, W_bm ∈ G; or W_ar ∈ G, W_bm ∈ E, λ1 = 1, λ2 = λ3 = 0;
(6) when W_ar ∈ B, W_bm ∈ B; W_ar ∈ B, W_bm ∈ D; W_ar ∈ D, W_bm ∈ B; W_ar ∈ B, W_bm ∈ F; W_ar ∈ F, W_bm ∈ B; W_ar ∈ B, W_bm ∈ G; W_ar ∈ G, W_bm ∈ B; W_ar ∈ F, W_bm ∈ G; or W_ar ∈ G, W_bm ∈ F, λ2 = 1, λ1 = λ3 = 0;
(7) when W_ar ∈ C, W_bm ∈ C; W_ar ∈ C, W_bm ∈ D; W_ar ∈ D, W_bm ∈ C; W_ar ∈ C, W_bm ∈ E; W_ar ∈ E, W_bm ∈ C; W_ar ∈ C, W_bm ∈ F; W_ar ∈ F, W_bm ∈ C; W_ar ∈ E, W_bm ∈ F; or W_ar ∈ F, W_bm ∈ E, λ3 = 1, λ1 = λ2 = 0;
(8) when the two words share none of traCilinFile, traHownetFile and word2vecFile, the similarity weights λ1, λ2 and λ3 are meaningless;
wherein A denotes the set of words included only in traCilinFile; B denotes the set of words included only in traHownetFile; C denotes the set of words present only in word2vecFile; D denotes the set of words present in traCilinFile, traHownetFile and word2vecFile at the same time; G denotes the set of words included in both traCilinFile and traHownetFile; E denotes the set of words present in both traCilinFile and word2vecFile; F denotes the set of words present in both traHownetFile and word2vecFile.
10. The Chinese trademark similarity calculation method according to any one of claims 1 to 9, wherein calculating the meaning similarity of the name of the first trademark and the name of the second trademark according to the first local similarity list and the second local similarity list comprises:
according to the first local similarity list and the second local similarity list, calculating the meaning similarity Sim(a, b) of the first trademark name and the second trademark name using the following formula:
Figure FDA0003160798520000061
where sim_ar denotes the local similarity between the r-th word in W_a and W_b; sim_bm denotes the local similarity between the m-th word in W_b and W_a; the first local similarity list is [sim_a1, sim_a2, ..., sim_as]; and the second local similarity list is [sim_b1, sim_b2, ..., sim_bt].
CN202110790797.XA 2021-07-13 2021-07-13 Chinese trademark similarity calculation method Pending CN113468885A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110790797.XA CN113468885A (en) 2021-07-13 2021-07-13 Chinese trademark similarity calculation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110790797.XA CN113468885A (en) 2021-07-13 2021-07-13 Chinese trademark similarity calculation method

Publications (1)

Publication Number Publication Date
CN113468885A true CN113468885A (en) 2021-10-01

Family

ID=77880056

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110790797.XA Pending CN113468885A (en) 2021-07-13 2021-07-13 Chinese trademark similarity calculation method

Country Status (1)

Country Link
CN (1) CN113468885A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116628245A (en) * 2023-07-24 2023-08-22 四海良田(天津)智能科技有限公司 Intelligent trademark recommendation method and system based on artificial intelligence
CN116628245B (en) * 2023-07-24 2023-10-27 四海良田(天津)智能科技有限公司 Intelligent trademark recommendation method and system based on artificial intelligence
CN117521116A (en) * 2024-01-04 2024-02-06 卓世科技(海南)有限公司 Large language model privacy information protection method
CN117521116B (en) * 2024-01-04 2024-04-19 卓世科技(海南)有限公司 Large language model privacy information protection method

Similar Documents

Publication Publication Date Title
CN106407333B (en) Spoken language query identification method and device based on artificial intelligence
CN111444726B (en) Chinese semantic information extraction method and device based on long-short-term memory network of bidirectional lattice structure
CN108897857B (en) Chinese text subject sentence generating method facing field
CN110825881B (en) Method for establishing electric power knowledge graph
CN110019839B (en) Medical knowledge graph construction method and system based on neural network and remote supervision
CN111931506B (en) Entity relationship extraction method based on graph information enhancement
CN106599032B (en) Text event extraction method combining sparse coding and structure sensing machine
CN106970910B (en) Keyword extraction method and device based on graph model
CN107967257A (en) A kind of tandem type composition generation method
CN107527073A (en) The recognition methods of entity is named in electronic health record
CN110134954B (en) Named entity recognition method based on Attention mechanism
CN110287323B (en) Target-oriented emotion classification method
CN108388651A (en) A kind of file classification method based on the kernel of graph and convolutional neural networks
CN113468885A (en) Chinese trademark similarity calculation method
CN110879831A (en) Chinese medicine sentence word segmentation method based on entity recognition technology
CN111950283B (en) Chinese word segmentation and named entity recognition system for large-scale medical text mining
Gao et al. Named entity recognition method of Chinese EMR based on BERT-BiLSTM-CRF
CN105893481B (en) Relationship digestion procedure between a kind of entity based on Markov clustering
CN111144119A (en) Entity identification method for improving knowledge migration
CN114528919A (en) Natural language processing method and device and computer equipment
CN113515632A (en) Text classification method based on graph path knowledge extraction
CN106886565A (en) A kind of basic house type auto-polymerization method
CN113836306B (en) Composition automatic evaluation method, device and storage medium based on chapter component identification
CN115238040A (en) Steel material science knowledge graph construction method and system
CN112215007B (en) Organization named entity normalization method and system based on LEAM model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination