CN112507176A - Automatic determination method and device for domain name infringement, electronic equipment and storage medium - Google Patents


Info

Publication number
CN112507176A
Authority
CN
China
Prior art keywords
domain name
length
character string
similarity
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011393629.9A
Other languages
Chinese (zh)
Inventor
张师琲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202011393629.9A
Publication of CN112507176A
Priority to PCT/CN2021/082729 (WO2022116419A1)
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the technical field of artificial intelligence. It discloses an automatic determination method and device for domain name infringement, an electronic device and a storage medium. The automatic determination method comprises the following steps:
- acquiring feature information of a domain name to be maintained, the feature information comprising domain name information consisting of English letters and Chinese character information;
- screening domain names that match the feature information from a preset domain name library as candidate domain names suspected of infringement;
- comparing the domain name to be maintained with the candidate domain name to obtain the similarity between them;
- determining, according to the similarity, whether the candidate domain name is infringing.

With this method, similar domain names can be quickly located for comparison and domain name infringement is handled automatically. This saves a large amount of labor cost while ensuring comparison accuracy.

Description

Automatic determination method and device for domain name infringement, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence. In particular, it relates to a high-precision automatic method and device for determining domain name infringement, an electronic device and a storage medium.
Background
With the spread of networks, network communication technology has become irreplaceable in many fields. As a foundation of the Internet, the domain name system is self-evidently important. At present, the Internet is full of domain names that counterfeit those of large or well-known enterprises. Such counterfeit domain names not only threaten network security but also damage the reputation of those enterprises.
However, domain name counterfeiting takes complex forms, and whether a counterfeit domain name actually constitutes infringement must be settled through domain name dispute resolution. Domain name disputes usually arise from the registration or use of Internet domain names.
Currently, domain name disputes are assessed manually, by checking the information of the rights holder against that of the alleged infringer item by item. This approach is inefficient, demands highly qualified examiners and incurs high labor costs. Moreover, its results are easily swayed by individual subjective opinion and may therefore not be fair.
Disclosure of Invention
In order to solve the above problems in the prior art, embodiments of the present application provide an automatic determination method and apparatus for domain name infringement, an electronic device, and a storage medium, which can quickly locate similar domain names for comparison, and save a large amount of labor cost while achieving high-accuracy automatic determination of domain name infringement.
In a first aspect, an embodiment of the present application provides an automatic determination method for domain name infringement, including:
acquiring feature information of a domain name to be maintained, wherein the feature information comprises: domain name information consisting of English letters, and Chinese character information;
screening a domain name matched with the characteristic information in a preset domain name library to serve as a suspected infringement candidate domain name;
comparing the domain name to be maintained with the candidate domain name to obtain the similarity between the domain name to be maintained and the candidate domain name;
and determining, according to the similarity, whether the candidate domain name is infringing.
In some embodiments of the present application, comparing the domain name to be maintained with the candidate domain name to obtain the similarity between the domain name to be maintained and the candidate domain name includes:
extracting a first characteristic character string of a domain name to be maintained;
extracting a second characteristic character string of the candidate domain name;
acquiring the length of the longest common substring of the first characteristic character string and the second characteristic character string;
and determining the similarity between the domain name to be maintained and the candidate domain name according to the length of the longest common substring.
In some embodiments of the present application, obtaining the length of the longest common substring of the first and second characteristic strings comprises:
acquiring the number of characters of a first characteristic character string and the number of characters of a second characteristic character string;
if the number of characters of the first characteristic character string and/or of the second characteristic character string is 0, setting the length of the longest common substring to 0;
if the numbers of characters of the first and second characteristic character strings are both greater than 0, taking the tail character of the first characteristic character string as a first character and the tail character of the second characteristic character string as a second character;
if the first character and the second character are the same, setting the length of the longest common substring to the length of the longest common substring of the first characteristic character string without the first character and the second characteristic character string without the second character, plus 1;
and if the first character and the second character are different:
- taking the length of the longest common substring of the first characteristic character string without the first character and the second characteristic character string as a first length;
- taking the length of the longest common substring of the first characteristic character string and the second characteristic character string without the second character as a second length;
- setting the length of the longest common substring to the larger of the first length and the second length.
In some embodiments of the present application, determining the similarity between the domain name to be maintained and the candidate domain name according to the length of the longest common substring includes:
acquiring a first length of a first characteristic character string;
acquiring a second length of the second characteristic character string;
obtaining a weight according to the first length and the second length;
and weighting the length of the longest common substring according to the weight to obtain the similarity.
In some embodiments of the present application, obtaining the weight according to the first length and the second length includes:
acquiring a difference value between the first length and the second length and a sum of the first length and the second length;
acquiring a first coefficient according to the difference, wherein the smaller the difference is, the larger the first coefficient is;
and acquiring a weight according to the first coefficient and the sum of the first length and the second length.
In some embodiments of the present application, when the comparison processing is first comparison processing in the English dimension, the first characteristic character string is an English character string of the domain name to be maintained, and the similarity is an English similarity;
when the comparison processing is second comparison processing in the Chinese dimension, the first characteristic character string is a pinyin character string of the Chinese keyword of the domain name to be maintained, and the similarity is a Chinese similarity.
In some embodiments of the present application, determining whether the candidate domain name is infringing according to the result of the comparison processing includes:
if the English similarity is greater than a first threshold and/or the Chinese similarity is greater than a second threshold, determining that the candidate domain name is infringing.
In a second aspect, an embodiment of the present application provides an automatic determination apparatus for domain name infringement, including:
the feature extraction module is used for acquiring feature information of the domain name to be maintained, wherein the feature information comprises: domain name information consisting of English letters, and Chinese character information;
the candidate domain name determining module is used for screening the domain name matched with the characteristic information from a preset domain name library to serve as a suspected infringement candidate domain name;
the comparison module is used for comparing the domain name to be maintained with the candidate domain name to obtain the similarity between the domain name to be maintained and the candidate domain name;
and the determination module is used for determining, according to the similarity, whether the candidate domain name is infringing.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor coupled to a memory, the memory configured to store a computer program, the processor configured to execute the computer program stored in the memory to cause the electronic device to perform the method of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program, the computer program causing a computer to execute the method according to the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform the method according to the first aspect.
The implementation of the embodiment of the application has the following beneficial effects:
In the embodiments of the application, feature extraction and matching are first used to quickly locate candidate domain names suspected of infringement. The domain name to be maintained is then compared with each candidate domain name, and infringement is determined from the comparison result. Domain name infringement is thus handled automatically, saving a large amount of labor cost.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application. Those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of an automatic method for determining domain name infringement according to an embodiment of the present disclosure;
Fig. 2 is a schematic flowchart of a comparison process according to an embodiment of the present disclosure;
Fig. 3 is a schematic diagram of substrings in a character string according to an embodiment of the present disclosure;
Fig. 4 is a schematic flowchart of a method for obtaining the length of the longest common substring according to an embodiment of the present disclosure;
Fig. 5 is a schematic flowchart of a method for determining the similarity between a domain name to be maintained and a candidate domain name according to an embodiment of the present disclosure;
Fig. 6 is a schematic flowchart of a method for determining a weight according to an embodiment of the present disclosure;
Fig. 7 is a block diagram of the functional modules of an automatic domain name infringement determination apparatus according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art without any inventive work based on the embodiments in the present application are within the scope of protection of the present application.
The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, result, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
Referring to fig. 1, fig. 1 is a schematic flowchart of an automatic determination method for domain name infringement according to an embodiment of the present disclosure. The automatic judgment method for domain name infringement comprises the following steps:
101: Acquire the feature information of the domain name to be maintained.
In this embodiment, the domain name to be maintained is the domain name that is suspected of having been infringed and whose rights are to be protected.
An infringing domain name is generally derived from the domain name to be maintained by operations such as extension, deletion, replacement and splitting. For example, for the domain name to be maintained www.abcd.com:
- extension may yield the infringing domain name www.efabcdhi.com;
- deletion may yield www.abc.com;
- splitting may yield www.asqbcd.com.
In addition, some less obvious infringing domain names use, as the domain name body, text related to the main characteristics of the domain name to be maintained. In this embodiment, the main characteristics are the business scope, topic and the like of the website corresponding to the domain name to be maintained. For example, if that website handles the trading of unused goods, its main characteristic may be "second-hand transaction". Infringing domain names may then include www.ershou.com, www.2jiaoyi.com and www.zjiaoyi.com ("ershou" and "jiaoyi" are the pinyin for "second-hand" and "transaction").
Therefore, in this embodiment, the feature information may comprise two parts: the domain name information, consisting of English letters, of the domain name to be maintained itself, and Chinese character information corresponding to its main characteristics. In this way, candidate domain names suspected of infringing the domain name to be maintained can be captured more comprehensively.
102: Screen domain names matching the feature information from a preset domain name library as candidate domain names suspected of infringement.
In this embodiment, the domain names stored in the domain name library may be screened according to the feature information, so as to obtain the candidate domain names suspected of infringement. The domain name library is a database used for storing domain names existing in the Internet.
For example, when a domain name in the domain name library contains more than a preset number of characters of the feature information, that domain name is determined to match the feature information. It is then taken as a candidate domain name suspected of infringement, and whether it actually infringes is determined through the comparison processing. Alternatively, candidate domain names may be screened by other means, such as recognition with a neural network model. The present application does not limit the screening method.
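The character-count screening rule can be illustrated with a minimal Python sketch. The application does not specify how the characters are counted. This sketch therefore assumes an order-insensitive count, namely the multiset intersection of the characters of the domain body and of a feature string. The function names `matched_char_count` and `screen_candidates`, the sample library and the preset value are all hypothetical.

```python
from collections import Counter


def matched_char_count(domain_body: str, feature: str) -> int:
    """Count characters of `feature` that also occur in `domain_body`.

    ASSUMPTION: order is ignored (multiset intersection); the source does not
    say how the count is taken.
    """
    overlap = Counter(domain_body) & Counter(feature)
    return sum(overlap.values())


def screen_candidates(domain_library, features, min_chars):
    """Keep domain bodies sharing more than `min_chars` characters with any feature string."""
    return [
        body
        for body in domain_library
        if any(matched_char_count(body, f) > min_chars for f in features)
    ]


# Hypothetical library of feature bodies (prefix and suffix already stripped).
library = ["efabcdhi", "abc", "asqbcd", "ershou", "qqmail"]
features = ["abcd", "ershou"]
print(screen_candidates(library, features, min_chars=2))
# ['efabcdhi', 'abc', 'asqbcd', 'ershou']
```

Because the assumed count ignores character order, this screen is deliberately loose. It only narrows the library; the longest-common-substring comparison of step 103 makes the actual determination.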
103: Compare the domain name to be maintained with the candidate domain name to obtain the similarity between them.
In this embodiment, refer to Fig. 2, which is a schematic flowchart of the comparison processing provided in the embodiments of the present application. The comparison processing comprises the following steps:
201: Extract a first characteristic character string of the domain name to be maintained.
202: Extract a second characteristic character string of the candidate domain name.
203: Obtain the length of the longest common substring of the first characteristic character string and the second characteristic character string.
In this embodiment, the longest common substring is the longest of all common substrings of the two character strings. A common substring is a run of consecutive characters that appears in both strings.
Illustratively, referring to fig. 3, fig. 3 is a schematic diagram illustrating a sub-string in a character string according to an embodiment of the present application. For a given string { a, b, c, d, e, f, g, h }, an example of its substring may be { c, d, e, f }, i.e., a string of consecutive elements c, d, e, f in the string { a, b, c, d, e, f, g, h }. For another example, the strings { a, b, c, d }, { g, h } etc. composed of consecutive elements are also substrings thereof.
Now take a second string { b, c, e, f, g, i, e, w }. The common substrings of the two strings are { b }, { c }, { b, c }, { e }, { f }, { g }, { e, f }, { f, g } and { e, f, g }. Since { e, f, g } is the longest of these, it is the longest common substring of the two strings, and its length is 3.
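The example above can be checked with a short brute-force Python sketch. It enumerates every substring of both strings and intersects the two sets. The sketch is illustrative only. A string of length n has on the order of n² substrings, so this approach becomes expensive for longer strings.

```python
def substrings(s: str) -> set[str]:
    """Return every non-empty run of consecutive characters in s."""
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}


x = "abcdefgh"  # the string of Fig. 3
y = "bcefgiew"  # the second example string
common = substrings(x) & substrings(y)
longest = max(common, key=len)

print(sorted(common, key=lambda t: (len(t), t)))
# ['b', 'c', 'e', 'f', 'g', 'bc', 'ef', 'fg', 'efg']
print(longest, len(longest))  # efg 3
```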
However, for longer and more complex pairs of strings, the number of common substrings can be very large. This makes it difficult to find the longest common substring and its length by enumeration. To address this, the present embodiment uses a dynamic programming algorithm.
Dynamic programming is generally used to solve optimization problems. Such problems may have many feasible solutions, each corresponding to a value, and the goal is to find the solution with the optimal value.

Dynamic programming resembles the divide-and-conquer method. The problem is decomposed into several sub-problems, the sub-problems are solved first, and the solution of the original problem is then built from their solutions. Unlike divide and conquer, however, the sub-problems of a problem suited to dynamic programming are often not independent of each other. If such a problem were solved by divide and conquer, the decomposition would produce too many sub-problems, and some of them would be recomputed many times.

If the answers to solved sub-problems are instead saved and looked up when needed, much repeated computation is avoided and time is saved. To this end, a table can be used to record the answers to all solved sub-problems. Every sub-problem's result is entered into the table as soon as it is computed, regardless of whether it will be used later. This is the basic idea of dynamic programming.
Exemplarily, referring to fig. 4, fig. 4 is a schematic flowchart of a method for obtaining the length of the longest common substring according to an embodiment of the present application. The method adopts the idea of the dynamic programming algorithm, and can comprise the following steps:
401: Obtain the number of characters of the first characteristic character string and the number of characters of the second characteristic character string.
402: If the number of characters of the first characteristic character string and/or of the second characteristic character string is 0, set the length of the longest common substring to 0.
403: If the numbers of characters of both characteristic character strings are greater than 0, take the tail character of the first characteristic character string as a first character and the tail character of the second characteristic character string as a second character.
404: If the first character and the second character are the same, set the length of the longest common substring to the length of the longest common substring of the first characteristic character string without the first character and the second characteristic character string without the second character, plus 1.
405: If the first character and the second character are different:
- take the length of the longest common substring of the first characteristic character string without the first character and the second characteristic character string as a first length;
- take the length of the longest common substring of the first characteristic character string and the second characteristic character string without the second character as a second length;
- set the length of the longest common substring to the larger of the first length and the second length.
Dynamic programming avoids a large amount of repeated computation, so the length of the longest common substring can be obtained quickly. This improves the efficiency of the similarity calculation and, ultimately, of the infringement determination.
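Steps 401 to 405 can be sketched in Python as a bottom-up dynamic programming table. Each cell stores the answer for one pair of prefixes, so no sub-problem is solved twice.

As worded, steps 404 and 405 recurse on the tail characters: equal tails add 1, and different tails take the larger result after dropping either tail. This is the standard recurrence for the longest common subsequence, which does not require the shared characters to be consecutive. For the contiguous longest common substring defined in step 203, the standard recurrence instead resets to 0 on a mismatch.

Both variants are shown below. The function names are illustrative.

```python
def lcs_subsequence_length(a: str, b: str) -> int:
    """Steps 401-405 as a bottom-up table.

    dp[i][j] holds the answer for the prefixes a[:i] and b[:j]; row 0 and
    column 0 stay 0, covering the empty-string case of step 402.
    """
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:  # step 404: equal tail characters
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:  # step 405: drop either tail and keep the larger result
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]


def lcs_substring_length(a: str, b: str) -> int:
    """Contiguous variant: dp[i][j] is the length of the common run ending at a[i-1] and b[j-1]."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
                best = max(best, dp[i][j])
            # On a mismatch the run is broken, so dp[i][j] stays 0.
    return best


print(lcs_subsequence_length("abcdefgh", "bcefgiew"))  # 5 ("bcefg")
print(lcs_substring_length("abcdefgh", "bcefgiew"))    # 3 ("efg")
```

For the example strings above, the two readings differ. The tail-character recurrence returns 5 ("bcefg"), while the contiguous variant returns 3 ("efg"), matching the longest common substring identified earlier. Both tables take O(mn) time and space for strings of lengths m and n.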
204: Determine the similarity between the domain name to be maintained and the candidate domain name according to the length of the longest common substring.
In this embodiment, referring to fig. 5, fig. 5 is a flowchart illustrating a method for determining similarity between a domain name to be maintained and a candidate domain name according to an embodiment of the present application. The method comprises the following steps:
501: Obtain a first length of the first characteristic character string.
502: Obtain a second length of the second characteristic character string.
503: Obtain a weight according to the first length and the second length.
504: Weight the length of the longest common substring by the weight to obtain the similarity.
In infringement comparison, one party's domain name may be very long while the other's is very short, so that the short domain name matches only a small fragment of the long one. For example, the domain name of company A is www.abcdefghiskuhdusagsa.com and that of company B is www.bcd.com. The common part covers the whole body of B's domain name, yet B clearly should not be regarded as infringing.
To reduce such unfair infringement determinations, this embodiment determines the weight as shown in Fig. 6, a flowchart of a method for determining a weight according to an embodiment of the present application. The method comprises the following steps:
601: Obtain the difference between the first length and the second length, and the sum of the first length and the second length.
602: Obtain a first coefficient according to the difference.
In the present embodiment, the smaller the difference between the first length and the second length, the larger the first coefficient.
603: Obtain the weight according to the first coefficient and the sum of the first length and the second length.
In this way, the ratio of the overlapping length to the length of the whole character strings is constrained. The larger the length difference between the two strings, the smaller the weight, which further improves the accuracy of the comparison result.
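The application states only two things about the weight. The first coefficient grows as the length difference shrinks, and the weight depends on that coefficient and on the sum of the two lengths. No formulas are given. The following Python sketch therefore assumes one possible choice, which keeps the similarity between 0 and 1:
- first coefficient = 1 - |L1 - L2| / (L1 + L2);
- weight = first coefficient × 2 / (L1 + L2).

All function names and both formulas are hypothetical.

```python
def longest_common_substring_length(a: str, b: str) -> int:
    """Length of the longest contiguous run shared by a and b (rolling-row DP)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def length_weight(len1: int, len2: int) -> float:
    """Steps 601-603 under ASSUMED formulas (the source gives none)."""
    total = len1 + len2  # step 601: the sum
    if total == 0:
        return 0.0
    # Step 602: the coefficient shrinks as the difference |len1 - len2| grows.
    coefficient = 1 - abs(len1 - len2) / total
    # Step 603: combine the coefficient with the sum.
    return coefficient * 2 / total


def weighted_similarity(a: str, b: str) -> float:
    """Steps 501-504: weight the longest-common-substring length; the result lies in [0, 1]."""
    return length_weight(len(a), len(b)) * longest_common_substring_length(a, b)


print(round(weighted_similarity("abcdefghiskuhdusagsa", "bcd"), 3))  # 0.068
print(weighted_similarity("xsdjf", "xsdjf"))  # 1.0
```

Under these assumed formulas, the company A / company B example scores about 0.068 even though "bcd" is matched in full, while two identical feature strings score 1.0.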
In this embodiment, the feature information may include the domain name information of the domain name to be maintained itself and Chinese text information that reflects its main characteristics. Accordingly, the comparison processing may include first comparison processing in the English dimension and second comparison processing in the Chinese dimension, corresponding to these two kinds of feature information respectively.
When the comparison processing is the first comparison processing in the English dimension, the first characteristic character string may be the English character string of the domain name to be maintained. The resulting similarity is called the English similarity. In the present embodiment, the English character string may also include special characters such as '/', '?', '%', ' ' and the like. Such special characters are treated as ordinary English characters.
When the comparison processing is the second comparison processing in the Chinese dimension, the first characteristic character string may be the pinyin character string of the Chinese keyword of the domain name to be maintained. The resulting similarity is called the Chinese similarity.
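The Chinese-dimension string can be illustrated with a small Python sketch. It converts a Chinese keyword into a toneless pinyin string and measures its overlap with candidate domain bodies. The following are assumptions for illustration:
- the keyword 二手交易 ("second-hand transaction");
- the four-entry lookup table;
- the helper names.

A real system would rely on a complete pinyin conversion library.

```python
# Hypothetical lookup table covering only this example (tones omitted).
PINYIN = {"二": "er", "手": "shou", "交": "jiao", "易": "yi"}


def keyword_to_pinyin(keyword: str) -> str:
    """Concatenate the toneless pinyin of each character; unmapped characters are kept as-is."""
    return "".join(PINYIN.get(ch, ch) for ch in keyword)


def longest_shared_run(a: str, b: str) -> int:
    """Brute-force longest common substring length; adequate for short domain bodies."""
    return max(
        (j - i for i in range(len(a)) for j in range(i + 1, len(a) + 1) if a[i:j] in b),
        default=0,
    )


keyword_pinyin = keyword_to_pinyin("二手交易")
print(keyword_pinyin)  # ershoujiaoyi
for body in ["ershou", "2jiaoyi", "zjiaoyi"]:
    print(body, longest_shared_run(keyword_pinyin, body))
```

Each example body from the description shares a six-letter run ("ershou" or "jiaoyi") with the keyword's pinyin. The Chinese-dimension comparison can then weight that run in the same way as the English dimension.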
104: Determine whether the candidate domain name is infringing according to the similarity.
The comparison processing may include the first comparison processing in the English dimension and the second comparison processing in the Chinese dimension. The processing result may therefore include both an English similarity and a Chinese similarity. In the present embodiment, the infringement determination for the candidate domain name may be performed as follows:
If the English similarity is greater than a first threshold and/or the Chinese similarity is greater than a second threshold, the candidate domain name is determined to be infringing.
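The decision rule can be written as a one-line Python predicate. Reading "and/or" as an inclusive OR, a candidate is flagged when either similarity exceeds its threshold. The application does not disclose threshold values. The thresholds are therefore required parameters here, and the values in the usage lines are placeholders.

```python
def is_infringing(english_similarity: float, chinese_similarity: float,
                  *, first_threshold: float, second_threshold: float) -> bool:
    """Flag the candidate if either dimension strictly exceeds its threshold."""
    return english_similarity > first_threshold or chinese_similarity > second_threshold


# Placeholder thresholds for illustration only.
print(is_infringing(0.9, 0.1, first_threshold=0.6, second_threshold=0.7))  # True
print(is_infringing(0.2, 0.3, first_threshold=0.6, second_threshold=0.7))  # False
```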
In summary, the automatic determination method for domain name infringement provided by the invention can quickly locate similar domain names through feature extraction and matching. Comparing both the English domain name information and the Chinese keyword information makes the comparison result more accurate. Because the whole similarity calculation requires no manual participation, it is not affected by subjective human judgment, so the determination result can serve as a basis for rights protection. Constraining the ratio of the overlapping length to the length of the whole character strings further improves the accuracy of the comparison result. In addition, domain name infringement is handled automatically, saving a large amount of labor cost.
Hereinafter, the method for automatically determining domain name infringement provided by the present invention will be described with reference to the specific embodiments.
In this embodiment, feature information, specifically English domain name features and Chinese keyword features, is first extracted from the domain name to be maintained. A search engine then searches the domain name library by matching this feature information, screens candidate domain names from the library, and performs infringement comparison on them.
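The screening step could be prototyped as below. The text does not say how the search engine matches feature information, so the rule used here (a library entry qualifies if it shares at least one three-character run with the protected feature string) and the names `char_ngrams` and `screen_candidates` are hypothetical. A production search engine would typically answer such queries from an inverted n-gram index rather than a linear scan.

```python
def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Return all contiguous n-character slices of text (text itself if shorter)."""
    if len(text) < n:
        return {text} if text else set()
    return {text[k:k + n] for k in range(len(text) - n + 1)}


def screen_candidates(feature: str, library: list[str], n: int = 3) -> list[str]:
    """Keep library feature strings that share at least one n-gram with feature.

    Hypothetical matching rule: the source only says candidates are found by
    matching feature information through a search engine.
    """
    target = char_ngrams(feature, n)
    return [entry for entry in library if target & char_ngrams(entry, n)]


library = ["bcde", "xyz", "zabc"]          # toy, already-extracted feature strings
print(screen_candidates("abcd", library))  # ['bcde', 'zabc']
```

The library is assumed to hold characteristic strings that were already stripped of "www." and suffixes; matching on raw domain names would make every entry share the "www" and "com" runs.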
Based on the different dimensions of the feature information, the infringement comparison in this embodiment is divided into domain name comparison processing in the English dimension and keyword comparison processing in the Chinese dimension, described below:
(1) Domain name comparison processing:
In this embodiment, the main features of both domain names being compared are first extracted: a characteristic string containing the main feature is obtained, and invalid comparison elements are removed. For example, in the domain name www.xsdjf.com, parts shared by most domain names, such as "www." and ".com", do not characterize the domain name, and comparing their similarity is meaningless. These elements are therefore removed first, and the body "xsdjf" is retained as the characteristic string to be compared. Removing invalid comparison elements simplifies the subsequent comparison flow and improves comparison efficiency.
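A minimal sketch of the invalid-element removal described above, assuming a simple "[www.]body.suffix" layout. The function name `extract_feature_string` and the suffix list are illustrative assumptions; a robust implementation would consult the Public Suffix List instead of a hard-coded tuple.

```python
# Illustrative suffixes only; longer suffixes are listed before their tails
# (".com.cn" before ".cn") so the most specific one is stripped.
COMMON_SUFFIXES = (".com.cn", ".com", ".cn", ".net", ".org")


def extract_feature_string(domain: str) -> str:
    """Drop the "www." prefix and one common suffix, keeping the feature body."""
    name = domain.strip().lower()
    if name.startswith("www."):
        name = name[len("www."):]
    for suffix in COMMON_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return name


print(extract_feature_string("www.xsdjf.com"))  # xsdjf
```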
When comparing the similarity of English parts, the principle generally adopted is that similar strings share overlapping runs of consecutive letters, and the more they overlap, the higher the similarity score. In this embodiment, the length of the longest common substring of the first characteristic string (extracted from the domain name to be maintained) and the second characteristic string (extracted from the candidate domain name) is determined, and the similarity is derived from that length.
Illustratively, the length of the longest common substring in the present application can be expressed by formula (1):
$$C[i,j]=\begin{cases}0, & i=0\ \text{or}\ j=0\\ C[i-1,j-1]+1, & i,j>0\ \text{and}\ x_i=y_j\\ \max\{C[i,j-1],\ C[i-1,j]\}, & i,j>0\ \text{and}\ x_i\neq y_j\end{cases}\qquad(1)$$
where {x1, x2, …, xi} denotes the first characteristic string and i its length, {y1, y2, …, yj} denotes the second characteristic string and j its length, and C[i, j] denotes the length of the longest common substring of the first and second characteristic strings.
The following practical example shows how the length of the longest common substring is obtained.
For example, for the first domain name www.abcd.com and the second domain name www.bcde.com, the characteristic strings are the first characteristic string {a, b, c, d} and the second characteristic string {b, c, d, e}, respectively. The length i of the first characteristic string is therefore 4, and the length j of the second characteristic string is 4.
Since i, j > 0 and x4 = d differs from y4 = e, substituting into formula (1) gives:
C[4,4] = max{C[4,3], C[3,4]}
In the first branch the tail characters match three times in a row (x4 = y3 = d, x3 = y2 = c, x2 = y1 = b), so:
C[4,3] = C[3,2] + 1 = C[2,1] + 1 + 1 = C[1,0] + 1 + 1 + 1 = 0 + 3 = 3
Expanding the second branch in the same way, no path matches more than the common run "bc":
C[3,4] = max{C[3,3], C[2,4]} = max{2, 1} = 2
Therefore:
C[4,4] = max{3, 2} = 3
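Expanding formula (1) by hand recomputes the same sub-problems many times (C[2,3] and C[1,3], for instance, recur in several branches). The sketch below evaluates formula (1) bottom-up in O(i·j) time; the function names are illustrative. Note that this recurrence, with its max{C[i, j-1], C[i-1, j]} branch, is the classic longest common subsequence recurrence, so the matched characters need not be contiguous. For comparison, the sketch also includes the contiguous-substring variant, which resets to 0 on a mismatch. Both give 3 for the example strings, but they differ on inputs such as "axbyc" versus "abc".

```python
def lcs_length(x: str, y: str) -> int:
    """Evaluate formula (1) bottom-up; C[i][j] covers x[:i] and y[:j]."""
    rows, cols = len(x), len(y)
    C = [[0] * (cols + 1) for _ in range(rows + 1)]  # C[0][*] = C[*][0] = 0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if x[i - 1] == y[j - 1]:                  # tail characters match
                C[i][j] = C[i - 1][j - 1] + 1
            else:                                     # drop one tail character
                C[i][j] = max(C[i][j - 1], C[i - 1][j])
    return C[rows][cols]


def contiguous_common_length(x: str, y: str) -> int:
    """Length of the longest run of consecutive characters shared by x and y."""
    best = 0
    prev = [0] * (len(y) + 1)
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                cur[j] = prev[j - 1] + 1              # extend the diagonal run
                best = max(best, cur[j])
        prev = cur
    return best


print(lcs_length("abcd", "bcde"))                # 3, as in the worked example
print(contiguous_common_length("abcd", "bcde"))  # 3 ("bcd")
print(lcs_length("axbyc", "abc"), contiguous_common_length("axbyc", "abc"))  # 3 1
```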
In this embodiment, once the length of the longest common substring is obtained, the English similarity can be computed. Illustratively, the English similarity in the present application can be expressed by formula (2):
[Formula (2) appears only as an image in the original publication and is not reproduced here. It defines the English similarity Sim(E) by weighting C[i, j] with a weight derived from G(|i-j|) and the sum i + j.]
where i represents the length of the first characteristic string, j represents the length of the second characteristic string, and G(|i-j|) is a function inversely related to the length difference |i-j|: the smaller |i-j|, the larger G(|i-j|). Here |·| denotes the absolute value.
Formula (2) shows that when the two strings differ greatly in length, G(|i-j|) lowers the English similarity Sim(E). This prevents an unfair outcome in which a very short string is matched inside an overlong domain name and the two are nevertheless scored as highly similar overall, merely because one domain name is much longer than the other. In effect, the comparison constrains the ratio of the length of the overlapping part to the length of the whole string, which reduces such unfair judgments and further improves the accuracy of the comparison result.
(2) Keyword comparison processing:
To compare the keyword of the domain name to be maintained with the candidate domain name in the Chinese dimension, this embodiment converts the Chinese keyword into a pinyin string and then obtains the Chinese similarity in the same way as in the domain name comparison processing; the details are not repeated here.
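The pinyin conversion step might look like the sketch below. The lookup table is a hypothetical four-entry stand-in that only keeps the example self-contained; a real system needs a full dictionary-based converter (the third-party `pypinyin` package, for instance, provides `lazy_pinyin`). Toneless concatenation also discards tones and cannot by itself resolve characters with several readings. The resulting string would then be compared with the candidate's characteristic string exactly as in the English dimension.

```python
# Hypothetical, deliberately tiny character-to-pinyin table (toneless).
TOY_PINYIN = {"平": "ping", "安": "an", "科": "ke", "技": "ji"}


def keyword_to_pinyin(keyword: str) -> str:
    """Concatenate toneless pinyin; characters missing from the table pass through."""
    return "".join(TOY_PINYIN.get(ch, ch) for ch in keyword)


print(keyword_to_pinyin("平安"))      # pingan
print(keyword_to_pinyin("平安科技"))  # pingankeji
```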
In addition, in this embodiment, the English similarity and the Chinese similarity may be computed at the same time; as soon as either one exceeds its corresponding threshold, the candidate domain name can be judged as infringing and the infringement determination result can be pushed.
Fig. 7 is a block diagram of the functional modules of an automatic determination apparatus for domain name infringement according to an embodiment of the present disclosure. The apparatus comprises:
a feature extraction module 11, configured to acquire feature information of a domain name to be maintained, the feature information including domain name information consisting of English letters and Chinese character information;
a candidate domain name determining module 12, configured to screen, from a preset domain name library, domain names matching the feature information as candidate domain names suspected of infringement;
a comparison module 13, configured to compare the domain name to be maintained with the candidate domain name to obtain the similarity between them; and
a judging module 14, configured to judge, according to the similarity, whether the candidate domain name infringes.
In an embodiment of the present invention, the comparison module 13 is specifically configured to extract a first characteristic character string from the domain name to be maintained and a second characteristic character string from the candidate domain name, obtain the length of the longest common substring of the two strings, and determine the similarity between the domain name to be maintained and the candidate domain name from that length.
In an embodiment of the present invention, to obtain the length of the longest common substring of the first characteristic character string and the second characteristic character string, the comparison module 13 is specifically configured for:
acquiring the number of characters of a first characteristic character string and the number of characters of a second characteristic character string;
if the number of characters of the first characteristic character string and/or the number of characters of the second characteristic character string is 0, setting the length of the longest common substring to 0;
if the numbers of characters of the first and second characteristic character strings are both greater than 0, taking the tail character of the first characteristic character string as a first character and the tail character of the second characteristic character string as a second character;
if the first character and the second character are the same, setting the length of the longest common substring to the length of the longest common substring of the first characteristic character string without the first character and the second characteristic character string without the second character, plus 1;
and if the first character and the second character are different, taking the length of the longest common substring of the first characteristic character string without the first character and the whole second characteristic character string as a first length, taking the length of the longest common substring of the whole first characteristic character string and the second characteristic character string without the second character as a second length, and setting the length of the longest common substring to the maximum of the first length and the second length.
In an embodiment of the present invention, to determine the similarity between the domain name to be maintained and the candidate domain name according to the length of the longest common substring, the comparison module 13 is specifically configured for:
acquiring a first length of a first characteristic character string;
acquiring a second length of the second characteristic character string;
obtaining a weight according to the first length and the second length;
and weighting the length of the longest common substring according to the weight to obtain the similarity.
In an embodiment of the present invention, to obtain the weight according to the first length and the second length, the comparison module 13 is specifically configured for:
obtaining a difference between the first length and the second length;
acquiring a first coefficient according to the difference, wherein the smaller the difference is, the larger the first coefficient is;
and acquiring a weight according to the first coefficient and the sum of the first length and the second length.
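The weighting steps above can be sketched as follows. The source only requires that the first coefficient grow as the length difference shrinks and that the weight combine it with the sum of the lengths; the concrete choices here, G(d) = 1 / (1 + d) and weight = 2·G / (i + j), and both function names, are assumptions picked so that identical strings score 1 and every score lies in [0, 1] (the common length never exceeds (i + j) / 2). They are not the patent's formula (2).

```python
def first_coefficient(diff: int) -> float:
    """Hypothetical G(|i - j|): 1 for equal lengths, shrinking as the gap grows."""
    return 1.0 / (1.0 + diff)


def weighted_similarity(common_len: int, first_len: int, second_len: int) -> float:
    """Weight the longest-common-substring length by a length-based weight."""
    total = first_len + second_len              # sum of the two lengths
    if total == 0:
        return 0.0
    diff = abs(first_len - second_len)          # difference of the two lengths
    weight = 2.0 * first_coefficient(diff) / total
    return weight * common_len


print(weighted_similarity(3, 4, 4))   # 0.75 (abcd vs bcde)
print(weighted_similarity(3, 3, 20))  # much smaller: 3-letter match inside a 20-letter name
```

Without the coefficient, the second case would score 2·3 / 23 ≈ 0.26; the length-difference factor is what pushes it down, which is the length-ratio constraint discussed for formula (2).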
In an embodiment of the present invention, when the comparison processing is first comparison processing in the English dimension, the first characteristic character string is the domain name string of the domain name to be maintained, and the similarity is the English similarity. When the comparison processing is second comparison processing in the Chinese dimension, the first characteristic character string is the pinyin string of the Chinese keyword of the domain name to be maintained, and the similarity is the Chinese similarity.
On this basis, in an embodiment of the present invention, the judging module 14 is specifically configured to judge that the candidate domain name infringes if the English similarity is greater than a first threshold and/or the Chinese similarity is greater than a second threshold.
It should be understood that the automatic determination apparatus for domain name infringement in the present application may be a smartphone (e.g., an Android phone, an iOS phone, or a Windows Phone), a tablet computer, a palmtop computer, a notebook computer, a mobile Internet device (MID), a wearable device, or the like. These devices are examples only and are not exhaustive; the apparatus includes but is not limited to them. In practical applications, the apparatus may also be an intelligent vehicle-mounted terminal, computer equipment, and the like.
An embodiment of the present application also provides an electronic device comprising a processor, a memory, a communication interface, and one or more programs. The one or more programs are stored in the memory and configured to be executed by the processor to implement the automatic determination method for domain name infringement provided by the foregoing embodiments or implementations of the invention.
From the above description of the embodiments, those skilled in the art will clearly understand that the present invention can be implemented by software combined with a hardware platform. Based on this understanding, all or part of the technical solution of the present invention that contributes over the prior art can be embodied in the form of a software product. The software product can be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and includes instructions that cause a computer device (a personal computer, a server, a network device, etc.) to execute the methods of the embodiments or of parts of the embodiments.
Accordingly, the present application also provides a computer storage medium, wherein the computer storage medium stores a computer program, and the computer program is executed by a processor to implement part or all of the steps of any one of the automatic domain name infringement determination methods described in the above method embodiments. For example, the storage medium may include a hard disk, a floppy disk, an optical disk, a magnetic tape, a magnetic disk, a flash memory, and the like.
An embodiment of the present application also provides a computer program product comprising a non-transitory computer-readable storage medium that stores a computer program, the computer program being operable to cause a computer to perform some or all of the steps of any automatic domain name infringement determination method described in the above method embodiments.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are all alternative embodiments and that the acts and modules referred to are not necessarily required by the application.
In the above embodiments, the description of each embodiment has its own emphasis, and for parts not described in detail in a certain embodiment, reference may be made to the description of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a logical division, and other divisions are possible in practice; a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software program module.
If the integrated units are implemented as software program modules and sold or used as stand-alone products, they may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a memory and includes several instructions that cause a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media capable of storing program code.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable memory, which may include a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the methods and their core ideas of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (10)

1. An automated method for determining domain name infringement, the method comprising:
acquiring feature information of a domain name to be maintained, wherein the feature information comprises: domain name information consisting of English letters, and Chinese character information;
screening the domain name matched with the characteristic information from a preset domain name library to serve as a candidate domain name suspected of infringing;
comparing the domain name to be maintained with the candidate domain name to obtain the similarity between the domain name to be maintained and the candidate domain name;
and judging the infringement of the candidate domain name according to the similarity.
2. The method according to claim 1, wherein the comparing the domain name to be maintained and the candidate domain name to obtain the similarity between the domain name to be maintained and the candidate domain name comprises:
extracting a first characteristic character string of the domain name to be maintained;
extracting a second characteristic character string of the candidate domain name;
acquiring the length of the longest common substring of the first characteristic character string and the second characteristic character string;
and determining the similarity between the domain name to be maintained and the candidate domain name according to the length of the longest common substring.
3. The method of claim 2, wherein obtaining the length of the longest common substring of the first and second characteristic strings comprises:
acquiring the number of characters of the first characteristic character string and the number of characters of the second characteristic character string;
if the number of characters of the first characteristic character string and/or the number of characters of the second characteristic character string is 0, setting the length of the longest common substring to 0;
if the numbers of characters of the first and second characteristic character strings are both greater than 0, taking the tail character of the first characteristic character string as a first character and the tail character of the second characteristic character string as a second character;
if the first character and the second character are the same, setting the length of the longest common substring to the length of the longest common substring of the first characteristic character string without the first character and the second characteristic character string without the second character, plus 1;
and if the first character and the second character are different, taking the length of the longest common substring of the first characteristic character string without the first character and the second characteristic character string as a first length, taking the length of the longest common substring of the first characteristic character string and the second characteristic character string without the second character as a second length, and setting the length of the longest common substring to the maximum of the first length and the second length.
4. The method according to claim 3, wherein the determining the similarity between the domain name to be maintained and the candidate domain name according to the length of the longest common substring comprises:
acquiring a first length of the first characteristic character string;
acquiring a second length of the second characteristic character string;
obtaining a weight according to the first length and the second length;
and carrying out weighting processing on the length of the longest common substring according to the weight value to obtain the similarity.
5. The method of claim 4, wherein obtaining the weight according to the first length and the second length comprises:
obtaining a difference value between the first length and the second length and a sum of the first length and the second length;
obtaining a first coefficient according to the difference, wherein the smaller the difference is, the larger the first coefficient is;
and acquiring the weight according to the first coefficient and the sum of the first length and the second length.
6. The method according to any one of claims 2 to 5, wherein:
when the comparison processing is first comparison processing in the English dimension, the first characteristic character string is the English character string of the domain name to be maintained, and the similarity is the English similarity;
when the comparison processing is second comparison processing in the Chinese dimension, the first characteristic character string is the pinyin string of the Chinese keyword of the domain name to be maintained, and the similarity is the Chinese similarity.
7. The method of claim 6, wherein the determining infringement of the candidate domain name according to the similarity comprises:
and if the English similarity is greater than a first threshold and/or the Chinese similarity is greater than a second threshold, judging that the candidate domain name infringes.
8. An apparatus for automated determination of domain name infringement, the apparatus comprising:
the feature extraction module is configured to acquire feature information of a domain name to be maintained, where the feature information includes: domain name information consisting of English letters, and Chinese character information;
the candidate domain name determining module is used for screening the domain name matched with the characteristic information from a preset domain name library to serve as a suspected infringement candidate domain name;
the comparison module is used for comparing the domain name to be maintained with the candidate domain name to obtain the similarity between the domain name to be maintained and the candidate domain name;
and the judging module is used for judging the infringement of the candidate domain name according to the similarity.
9. An electronic device comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, the one or more programs including instructions for performing the steps in the method of any of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which is executed by a processor to implement the method according to any one of claims 1-7.
CN202011393629.9A 2020-12-03 2020-12-03 Automatic determination method and device for domain name infringement, electronic equipment and storage medium Pending CN112507176A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011393629.9A CN112507176A (en) 2020-12-03 2020-12-03 Automatic determination method and device for domain name infringement, electronic equipment and storage medium
PCT/CN2021/082729 WO2022116419A1 (en) 2020-12-03 2021-03-24 Automatic determination method and apparatus for domain name infringement, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011393629.9A CN112507176A (en) 2020-12-03 2020-12-03 Automatic determination method and device for domain name infringement, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112507176A true CN112507176A (en) 2021-03-16

Family

ID=74969271

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011393629.9A Pending CN112507176A (en) 2020-12-03 2020-12-03 Automatic determination method and device for domain name infringement, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112507176A (en)
WO (1) WO2022116419A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022116419A1 (en) * 2020-12-03 2022-06-09 平安科技(深圳)有限公司 Automatic determination method and apparatus for domain name infringement, electronic device, and storage medium
CN114710468A (en) * 2022-03-31 2022-07-05 绿盟科技集团股份有限公司 Domain name generation and identification method, device, equipment and medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115841113B (en) * 2023-02-24 2023-05-12 山东云天安全技术有限公司 Domain name label detection method, storage medium and electronic equipment
CN117271499A (en) * 2023-11-17 2023-12-22 威海市驰云网络科技有限公司 Wi-Fi geographic positioning datum point data cleaning method in IP technology

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103428307B (en) * 2013-08-09 2016-07-20 中国科学院计算机网络信息中心 Counterfeit domain name detection method and equipment
CN106330811A (en) * 2015-06-15 2017-01-11 中兴通讯股份有限公司 Domain name credibility determination method and device
GB2555801A (en) * 2016-11-09 2018-05-16 F Secure Corp Identifying fraudulent and malicious websites, domain and subdomain names
CN110958244A (en) * 2019-11-29 2020-04-03 北京邮电大学 Method and device for detecting counterfeit domain name based on deep learning
CN112507176A (en) * 2020-12-03 2021-03-16 平安科技(深圳)有限公司 Automatic determination method and device for domain name infringement, electronic equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022116419A1 (en) * 2020-12-03 2022-06-09 平安科技(深圳)有限公司 Automatic determination method and apparatus for domain name infringement, electronic device, and storage medium
CN114710468A (en) * 2022-03-31 2022-07-05 绿盟科技集团股份有限公司 Domain name generation and identification method, device, equipment and medium
CN114710468B (en) * 2022-03-31 2024-05-14 绿盟科技集团股份有限公司 Domain name generation and identification method, device, equipment and medium

Also Published As

Publication number Publication date
WO2022116419A1 (en) 2022-06-09

Similar Documents

Publication Publication Date Title
US11194965B2 (en) Keyword extraction method and apparatus, storage medium, and electronic apparatus
CN112507176A (en) Automatic determination method and device for domain name infringement, electronic equipment and storage medium
CN105138652B (en) A kind of enterprise's incidence relation recognition methods and system
CN110929125B (en) Search recall method, device, equipment and storage medium thereof
CN111737499B (en) Data searching method based on natural language processing and related equipment
WO2022116418A1 (en) Method and apparatus for automatically determining trademark infringement, electronic device, and storage medium
US20110106805A1 (en) Method and system for searching multilingual documents
CN111831804B (en) Method and device for extracting key phrase, terminal equipment and storage medium
CN103885937A (en) Method for judging repetition of enterprise Chinese names on basis of core word similarity
CN110569350B (en) Legal recommendation method, equipment and storage medium
CN108304377B (en) Extraction method of long-tail words and related device
CN110737821B (en) Similar event query method, device, storage medium and terminal equipment
CN110765760B (en) Legal case distribution method and device, storage medium and server
CN109446299B (en) Method and system for searching e-mail content based on event recognition
CN116882372A (en) Text generation method, device, electronic equipment and storage medium
TW202123026A (en) Data archiving method, device, computer device and storage medium
CN107862016A (en) A kind of collocation method of the thematic page
CN112148837A (en) Maintenance scheme acquisition method, device, equipment and storage medium
CN115374793B (en) Voice data processing method based on service scene recognition and related device
CN114817518B (en) License handling method, system and medium based on big data archive identification
CN107992501B (en) Social network information identification method, processing method and device
CN115905885A (en) Data identification method, device, storage medium and program product
CN114706948A (en) News processing method and device, storage medium and electronic equipment
CN106649367B (en) Method and device for detecting keyword popularization degree
CN113129057A (en) Software cost information processing method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40041506; Country of ref document: HK)

SE01 Entry into force of request for substantive examination