CN107609059B - Chinese domain name similarity measurement method based on J-W distance - Google Patents


Publication number
CN107609059B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710749659.0A
Other languages
Chinese (zh)
Other versions
CN107609059A (en)
Inventor
龙华
祁俊辉
邵玉斌
杜庆治
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-08-28
Filing date
2017-08-28
Publication date
2020-10-20
Application filed by Kunming University of Science and Technology
Priority to CN201710749659.0A
Publication of CN107609059A
Application granted
Publication of CN107609059B

Abstract

The invention relates to a Chinese domain name similarity measurement method based on J-W distance, belonging to the technical field of network security. The method maps each Chinese character to a string of digits through a Unicode Chinese character stroke order table, and introduces the Jaro-Winkler Distance algorithm from the field of machine learning, combined with the longest common substring, to measure the similarity of Chinese domain names. First, the domain name to be detected and the target domain name are acquired and initialized to obtain their domain name bodies. Second, each domain name body is encoded according to the Unicode Chinese character stroke order table into a digit string, which serves as input to the Jaro-Winkler Distance algorithm and is used to generate a detection matrix. Then, the similarity of the digit strings is calculated according to the relevant rules in combination with their longest common substring; this similarity effectively represents the similarity between the Chinese characters.

Description

Chinese domain name similarity measurement method based on J-W distance
Technical Field
The invention relates to a Chinese domain name similarity measurement method based on J-W distance, belonging to the technical field of network security.
Background
With the development and popularization of the Internet, Chinese domain names have gradually become an important component of internationalized domain names. At the same time, counterfeiting attacks against Chinese domain names are increasing, and the forms of counterfeiting are increasingly complex. Because many Chinese characters have similar shapes, and people tend to read quickly, some degree of visual misjudgment is inevitable.
Traditional domain name similarity measurement methods are suited only to English domain names and are not effective for Chinese domain names. Moreover, domestic research on Chinese domain name similarity measurement is currently scarce, with few published results.
At present, most Chinese domain name similarity measurement methods compute similarity from single-character similarity and overall similarity. These methods have shortcomings in time complexity or accuracy, and they give no specific algorithm for calculating either the single-character similarity or the overall similarity.
Disclosure of Invention
The technical problem to be solved by the invention is to overcome the limitations and deficiencies of the prior art by providing a Chinese domain name similarity measurement method based on J-W distance. Compared with existing Chinese domain name similarity measurement methods, it mainly addresses their insufficient accuracy and poor efficiency, and aims to improve the accuracy and timeliness of Chinese domain name similarity measurement.
The technical scheme of the invention is as follows: a Chinese domain name similarity measurement method based on J-W distance comprises the following specific steps:
Step 1: Acquire the domain name X to be detected and the target domain name Y.
Step 2: Split the domain name X to be detected and the target domain name Y at the dot "." or the Chinese full stop "。", ignore the network name and the domain name suffix, retain the domain name body, and generate the Chinese character sets $x: \{x_1, x_2, \ldots, x_p\}$ and $y: \{y_1, y_2, \ldots, y_q\}$.
Step 3: According to the Unicode Chinese character stroke order table, traverse the domain name body character sets $x: \{x_1, x_2, \ldots, x_p\}$ and $y: \{y_1, y_2, \ldots, y_q\}$ obtained in Step 2. For each Chinese character $x_i, i \in [1, p]$ or $y_i, i \in [1, q]$, taken in set order, look up its stroke order and convert it according to the encoding rule. This generates the encoded string $str_x$ of the domain name body of X and the encoded string $str_y$ of the domain name body of Y, whose lengths are $len_x$ and $len_y$.
Step 4.1: Take the encoded strings $str_x$ and $str_y$ as input to the J-W algorithm and generate the detection matrix $I(X,Y)_{len_x \times len_y}$.
Step 4.2: Calculate the matching window value MW according to formula (1):
[Formula (1): image not reproduced in the source text]
Step 4.3: From the detection matrix $I(X,Y)_{len_x \times len_y}$ and the matching window value MW, calculate the number m of matched characters and the number n of transpositions of matched characters according to the relevant rules.
Step 4.4: From the values m and n obtained in Step 4.3, calculate the Jaro Distance $Dis_j$ of the encoded strings $str_x$ and $str_y$ according to formula (2):
[Formula (2): image not reproduced in the source text]
Step 4.5: Obtain the longest common substring $str_{xy}$ of the encoded strings $str_x$ and $str_y$, and its length $len_{xy}$.
Step 4.6: Further calculate the Jaro-Winkler Distance $Dis_{jw}$ of the encoded strings $str_x$ and $str_y$ according to formula (3):
[Formula (3): image not reproduced in the source text]
where $b_t$ is the threshold that determines whether the further calculation is required, and p is the scaling factor.
The domain name X to be detected and the target domain name Y in Step1 may be a primary domain name or a secondary domain name.
In Step 2, if the domain name X to be detected and the target domain name Y are primary domain names, only the domain name suffix needs to be ignored. In addition, because Chinese domain names are not yet widespread, some domain names are registered without the "www" network name; in that case the initialization in Step 2 can be adjusted accordingly. In short, the next step can proceed as soon as the domain name body has been extracted.
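The initialization in Step 2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `extract_domain_body` is hypothetical, the suffix is assumed to be a single final label, and for a secondary domain the sketch simply keeps the label immediately before the suffix.

```python
from typing import List


def extract_domain_body(domain: str) -> List[str]:
    """Return the characters of the domain name body (hypothetical helper)."""
    # Treat the Chinese full stop as an ordinary dot, then split into labels.
    labels = [part for part in domain.replace("。", ".").split(".") if part]
    # Drop a leading "www" network name when one is present.
    if labels and labels[0].lower() == "www":
        labels = labels[1:]
    # Assumption: the domain suffix is the single last label.
    if len(labels) > 1:
        labels = labels[:-1]
    # Assumption: the body is the last remaining label.
    return list(labels[-1]) if labels else []
```

Multi-label suffixes would need a suffix list, which this sketch deliberately leaves out.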
The domain name X to be detected and the target domain name Y in Step 1 must conform to conventional domain names; that is, the domain name body character sets $x: \{x_1, x_2, \ldots, x_p\}$ and $y: \{y_1, y_2, \ldots, y_q\}$ generated by the initialization in Step 2 must satisfy:
$p, q \in N^+$
Similarly, the lengths $len_x$ and $len_y$ of the encoded strings $str_x$ and $str_y$ generated by the encoding in Step 3 must satisfy:
$len_x, len_y \in N^+$
The Unicode Chinese character stroke order table in Step 3 encodes the stroke types horizontal, vertical, left-falling, right-falling and turning as the digits 1, 2, 3, 4 and 5, and all Chinese characters are encoded accordingly.
The encoding rule in Step 3 is as follows. First generate an empty string. Then traverse the domain name body character set $x: \{x_1, x_2, \ldots, x_p\}$ of the domain name X to be detected in order; for each Chinese character $x_i, i \in [1, p]$, look up its stroke order in the Unicode Chinese character stroke order table and append it to the end of the string. After all elements of the set have been processed, the resulting string is the encoded string $str_x$ of the domain name body of X. The target domain name Y is processed in the same way to generate the encoded string $str_y$ of its domain name body.
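The encoding rule can be illustrated with a tiny table. The three digit codes below are the ones quoted in Example 2; assigning them to the characters 治, 冶 and 病 is an assumption inferred from the stroke sequences, and `STROKE_TABLE` and `encode_body` are hypothetical names. A real system would load a complete stroke order table instead.

```python
# Hypothetical excerpt of a stroke order table.
# 1 = horizontal, 2 = vertical, 3 = left-falling, 4 = right-falling, 5 = turning.
STROKE_TABLE = {
    "治": "44154251",    # assumed character for the code given in Example 2
    "冶": "4154251",     # assumed character for the code given in Example 2
    "病": "4134112534",  # assumed character for the code given in Example 2
}


def encode_body(chars, table=STROKE_TABLE):
    """Concatenate the stroke codes of the characters, in order."""
    encoded = ""  # start from an empty string, as the rule describes
    for ch in chars:
        # A character missing from the table raises KeyError in this sketch.
        encoded += table[ch]
    return encoded
```

With this excerpt, `encode_body("治病")` and `encode_body("冶病")` reproduce the two encoded strings of Example 2.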
For the calculation of the number m of matched characters in Step 4.3: if the positions of two identical characters in the encoded strings $str_x$ and $str_y$ differ by less than the matching window value MW, the characters are considered matched. Note, however, that characters already matched must be excluded during matching; once a match is found for a character, matching for that character stops and proceeds to the next character.
For the calculation of the number n of transpositions of matched characters, compare the order of the matched characters in $str_x$ with their order in $str_y$. If the two orders are identical there are no transpositions; otherwise the number of positions at which they differ is the transposition count, and half of that count (rounded down, as in Example 2, where a count of 15 gives n = 7) is the number n. In addition, m and n must satisfy the following requirement:
[Constraint on m and n: image not reproduced in the source text]
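The matching and transposition rules can be sketched as below. This follows the usual greedy Jaro matching procedure, which is an assumption about what "the relevant rules" mean. MW is passed in as a parameter because formula (1) is not reproduced here, and the window test is inclusive, as in common Jaro implementations, although the text says "less than". The function name is hypothetical.

```python
def count_matches_and_transpositions(str_x: str, str_y: str, mw: int):
    """Return (m, n) for two encoded strings and a matching window value mw."""
    used_y = [False] * len(str_y)   # marks characters of str_y already matched
    matched_x = []                  # matched characters in str_x order
    for i, ch in enumerate(str_x):
        lo = max(0, i - mw)
        hi = min(len(str_y) - 1, i + mw)
        for j in range(lo, hi + 1):
            # Skip characters that are already matched; stop at the first hit.
            if not used_y[j] and str_y[j] == ch:
                used_y[j] = True
                matched_x.append(ch)
                break
    # Matched characters in str_y order.
    matched_y = [c for c, used in zip(str_y, used_y) if used]
    m = len(matched_x)
    # Positions where the two matched sequences disagree.
    differing = sum(1 for a, b in zip(matched_x, matched_y) if a != b)
    n = differing // 2              # half of the transposition count
    return m, n
```

Called on the Example 2 strings with an assumed window of 8, the sketch returns (17, 7), which agrees with the values of m and n stated there.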
In Step 4.6, the threshold $b_t$ for further calculation takes the value 0.7 and can be adjusted slightly according to actual detection results, mainly to improve detection accuracy. The scaling factor p takes the value 0.1 and can likewise be adjusted slightly according to actual detection results, mainly to prevent the final calculation result from exceeding 1. However, the method introduces a new term, the reciprocal of the longer of the two encoded-string lengths $len_x$ and $len_y$ (given as an image in the original and not reproduced here), into the improved calculation formula (likewise an image), so that the value of the scaling factor p has little influence on the final calculation result.
The values $Dis_j$ and $Dis_{jw}$ calculated in Step 4.4 and Step 4.6 must satisfy the following requirement:
[Constraint on $Dis_j$ and $Dis_{jw}$: image not reproduced in the source text]
If the requirement is not satisfied, the calculation is wrong and must be repeated. If it is satisfied, the closer the value is to 1, the more similar the domain name X to be detected and the target domain name Y are.
Usually, the domain name X to be detected has to be compared against a set of target domain names $\{Y_1, Y_2, \ldots, Y_k\}$. To improve detection speed, the encoded string of each target domain name $Y_i, i \in [1, k]$ can be calculated in advance and stored in a database, from which it is retrieved directly when needed.
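The precomputation idea can be sketched with an in-memory dictionary standing in for the database. All names here are hypothetical. The similarity function is a plug-in parameter, and the default shown is `difflib.SequenceMatcher.ratio` from the Python standard library, used purely as a stand-in; it is not the measure defined by this method.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Iterable, List, Tuple


def build_target_cache(targets: Iterable[str],
                       encode: Callable[[str], str]) -> Dict[str, str]:
    """Encode every target once so later comparisons can reuse the result."""
    return {target: encode(target) for target in targets}


def stand_in_similarity(a: str, b: str) -> float:
    # Placeholder similarity, NOT the J-W based measure of the method.
    return SequenceMatcher(None, a, b).ratio()


def rank_targets(candidate_code: str,
                 cache: Dict[str, str],
                 similarity: Callable[[str, str], float] = stand_in_similarity
                 ) -> List[Tuple[float, str]]:
    """Score the candidate against every cached target, best first."""
    scored = [(similarity(candidate_code, code), target)
              for target, code in cache.items()]
    return sorted(scored, reverse=True)
```

Only the candidate has to be encoded at detection time; the targets are read from the cache.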
The beneficial effects of the invention are as follows. The method maps each Chinese character to a string of digits through a Unicode Chinese character stroke order table, introduces the Jaro-Winkler Distance algorithm from the field of machine learning, and combines it with the longest common substring to measure the similarity of Chinese domain names. First, the domain name to be detected and the target domain name are acquired and initialized to obtain their domain name bodies. Second, each domain name body is encoded according to the Unicode Chinese character stroke order table into a digit string, which serves as input to the Jaro-Winkler Distance algorithm and is used to generate a detection matrix. Then, the similarity of the digit strings is calculated according to the relevant rules in combination with their longest common substring; this similarity effectively represents the similarity between the Chinese characters. Compared with the prior art, the method mainly addresses insufficient accuracy and poor efficiency, and aims to improve the accuracy and timeliness of Chinese domain name similarity measurement.
Drawings
FIG. 1 is a schematic flow diagram of the present invention.
Detailed Description
The invention is further described with reference to the following drawings and detailed description.
Example 1: As shown in FIG. 1, a Chinese domain name similarity measurement method based on J-W distance comprises the following specific steps:
Step 1: Acquire the domain name X to be detected and the target domain name Y.
Step 2: Split the domain name X to be detected and the target domain name Y at the dot "." or the Chinese full stop "。", ignore the network name and the domain name suffix, retain the domain name body, and generate the Chinese character sets $x: \{x_1, x_2, \ldots, x_p\}$ and $y: \{y_1, y_2, \ldots, y_q\}$.
Step 3: According to the Unicode Chinese character stroke order table, traverse the domain name body character sets $x: \{x_1, x_2, \ldots, x_p\}$ and $y: \{y_1, y_2, \ldots, y_q\}$ obtained in Step 2. For each Chinese character $x_i, i \in [1, p]$ or $y_i, i \in [1, q]$, taken in set order, look up its stroke order and convert it according to the encoding rule. This generates the encoded string $str_x$ of the domain name body of X and the encoded string $str_y$ of the domain name body of Y, whose lengths are $len_x$ and $len_y$.
Step 4.1: Take the encoded strings $str_x$ and $str_y$ as input to the J-W algorithm and generate the detection matrix $I(X,Y)_{len_x \times len_y}$.
Step 4.2: Calculate the matching window value MW according to formula (1):
[Formula (1): image not reproduced in the source text]
Step 4.3: From the detection matrix $I(X,Y)_{len_x \times len_y}$ and the matching window value MW, calculate the number m of matched characters and the number n of transpositions of matched characters according to the relevant rules.
Step 4.4: From the values m and n obtained in Step 4.3, calculate the Jaro Distance $Dis_j$ of the encoded strings $str_x$ and $str_y$ according to formula (2):
[Formula (2): image not reproduced in the source text]
Step 4.5: Obtain the longest common substring $str_{xy}$ of the encoded strings $str_x$ and $str_y$, and its length $len_{xy}$.
Step 4.6: Further calculate the Jaro-Winkler Distance $Dis_{jw}$ of the encoded strings $str_x$ and $str_y$ according to formula (3):
[Formula (3): image not reproduced in the source text]
where $b_t$ is the threshold that determines whether the further calculation is required, and p is the scaling factor.
The domain name X to be detected and the target domain name Y in Step1 may be a primary domain name or a secondary domain name.
In Step 2, if the domain name X to be detected and the target domain name Y are primary domain names, only the domain name suffix needs to be ignored. In addition, because Chinese domain names are not yet widespread, some domain names are registered without the "www" network name; in that case the initialization in Step 2 can be adjusted accordingly. In short, the next step can proceed as soon as the domain name body has been extracted.
The domain name X to be detected and the target domain name Y in Step 1 must conform to conventional domain names; that is, the domain name body character sets $x: \{x_1, x_2, \ldots, x_p\}$ and $y: \{y_1, y_2, \ldots, y_q\}$ generated by the initialization in Step 2 must satisfy:
$p, q \in N^+$
Similarly, the lengths $len_x$ and $len_y$ of the encoded strings $str_x$ and $str_y$ generated by the encoding in Step 3 must satisfy:
$len_x, len_y \in N^+$
The Unicode Chinese character stroke order table in Step 3 encodes the stroke types horizontal, vertical, left-falling, right-falling and turning as the digits 1, 2, 3, 4 and 5, and all Chinese characters are encoded accordingly.
The encoding rule in Step 3 is as follows. First generate an empty string. Then traverse the domain name body character set $x: \{x_1, x_2, \ldots, x_p\}$ of the domain name X to be detected in order; for each Chinese character $x_i, i \in [1, p]$, look up its stroke order in the Unicode Chinese character stroke order table and append it to the end of the string. After all elements of the set have been processed, the resulting string is the encoded string $str_x$ of the domain name body of X. The target domain name Y is processed in the same way to generate the encoded string $str_y$ of its domain name body.
For the calculation of the number m of matched characters in Step 4.3: if the positions of two identical characters in the encoded strings $str_x$ and $str_y$ differ by less than the matching window value MW, the characters are considered matched. Note, however, that characters already matched must be excluded during matching; once a match is found for a character, matching for that character stops and proceeds to the next character.
For the calculation of the number n of transpositions of matched characters, compare the order of the matched characters in $str_x$ with their order in $str_y$. If the two orders are identical there are no transpositions; otherwise the number of positions at which they differ is the transposition count, and half of that count (rounded down, as in Example 2, where a count of 15 gives n = 7) is the number n. In addition, m and n must satisfy the following requirement:
[Constraint on m and n: image not reproduced in the source text]
In Step 4.6, the threshold $b_t$ for further calculation takes the value 0.7 and can be adjusted slightly according to actual detection results, mainly to improve detection accuracy. The scaling factor p takes the value 0.1 and can likewise be adjusted slightly according to actual detection results, mainly to prevent the final calculation result from exceeding 1. However, the method introduces a new term, the reciprocal of the longer of the two encoded-string lengths $len_x$ and $len_y$ (given as an image in the original and not reproduced here), into the improved calculation formula (likewise an image), so that the value of the scaling factor p has little influence on the final calculation result.
The values $Dis_j$ and $Dis_{jw}$ calculated in Step 4.4 and Step 4.6 must satisfy the following requirement:
[Constraint on $Dis_j$ and $Dis_{jw}$: image not reproduced in the source text]
If the requirement is not satisfied, the calculation is wrong and must be repeated. If it is satisfied, the closer the value is to 1, the more similar the domain name X to be detected and the target domain name Y are.
Example 2: The calculation of the number m of matched characters and the number n of transpositions is further explained on the basis of Example 1. Assume the domain name bodies of the domain name X to be detected and the target domain name Y are "治病" and "冶病" respectively. Looking up the Unicode Chinese character stroke order table gives the character codes "治" = "44154251", "病" = "4134112534" and "冶" = "4154251", so the encoded strings $str_x$ and $str_y$ are "441542514134112534" and "41542514134112534" respectively.
Calculate the matching window value MW:
[Calculation of MW: image not reproduced in the source text]
Using the detection matrix $I(X,Y)_{18 \times 17}$, calculate the number m of matched characters and the number n of transpositions of matched characters:
[Table: detection matrix $I(X,Y)_{18 \times 17}$; image not reproduced in the source text]
In the table (matrix) above, "/" indicates a cell outside the matching window value MW, for which matching is not considered; "1" indicates that the column value matches the row value; "0" indicates that the column value does not match the row value.
In summary, the number of matched characters is m = 17 and the matched character sequence is {4,4,1,5,4,2,5,1,4,1,3,4,1,1,2,5,3}. Compared with the encoded string $str_y$ = "41542514134112534", the transposition count is 15, so the number of transpositions of matched characters is n = 7.
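The cell semantics just described ("/", "1", "0") can be reproduced with a short sketch. The function name is hypothetical, MW is passed in because its formula is not reproduced here, and rows are assumed to correspond to $str_x$ and columns to $str_y$, which matches the 18 × 17 shape in this example.

```python
def detection_matrix(str_x: str, str_y: str, mw: int):
    """Build the detection matrix as a list of rows of "/", "1" or "0"."""
    rows = []
    for i, cx in enumerate(str_x):
        row = []
        for j, cy in enumerate(str_y):
            if abs(i - j) > mw:
                row.append("/")                      # outside the matching window
            else:
                row.append("1" if cx == cy else "0")  # equal or different digit
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Print the matrix for the Example 2 strings with an assumed window of 8.
    for row in detection_matrix("441542514134112534", "41542514134112534", 8):
        print(" ".join(row))
```

Whether the original table marks every equal pair or only the pairs actually consumed by matching cannot be checked against the missing image, so the sketch marks every equal pair inside the window.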
Example 3: the practice of the present invention is further illustrated on the basis of example 1. Assuming that the domain name X to be detected and the target domain name Y are 'today's science and technology, China 'command's science and technology, respectively, the initialized domain name main bodies are 'today's science and technology 'command' technology, searching corresponding Chinese character codes through a Unicode Chinese character stroke sequence table, and generating a code character string str according to rulesx、stryRespectively "344525113123444121211254" and "3445425113123444121211254".
Calculating a matching window value MW:
Figure BDA0001390841210000072
binding detection matrix I (X, Y)24×25The number m of the matched characters obtained by calculation is 24, and the number n of the replaced matched characters is 8.
Calculating the code string strx、stryJaro Distance of (1):
Figure BDA0001390841210000073
longest common substring strxyLength of (len)xyThe code string str is further calculated 20x、stryJaro-Winkler Distance of (1):
Figure BDA0001390841210000081
The result shows that the domain name X to be detected and the target domain name Y look similar to the human eye, and the result calculated by the invention agrees with this visual judgment. The method effectively guards against counterfeit-website operators replacing a Chinese character with a visually similar one, overcomes the low accuracy and low efficiency of prior-art judgment, and is more user-friendly in practical application.
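The arithmetic of this example can be sketched under stated assumptions. Formulas (2) and (3) are not reproduced here, so `jaro_distance` uses the standard Jaro form, and `jw_distance_sketch` is only one plausible reading of the description (longest common substring length in place of the usual prefix length, scaled by the reciprocal of the longer string length); neither is confirmed as the patent's exact formula. All function names are hypothetical.

```python
def jaro_distance(m: int, n: int, len_x: int, len_y: int) -> float:
    """Standard Jaro form (assumed to correspond to formula (2))."""
    if m == 0:
        return 0.0
    return (m / len_x + m / len_y + (m - n) / m) / 3


def longest_common_substring_length(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1  # extend the run ending at (i-1, j-1)
                best = max(best, cur[j])
        prev = cur
    return best


def jw_distance_sketch(dis_j: float, len_xy: int, len_x: int, len_y: int,
                       b_t: float = 0.7, p: float = 0.1) -> float:
    """ASSUMED form of formula (3); the original image is unavailable."""
    if dis_j < b_t:
        return dis_j  # below the threshold no further calculation is made
    # Assumption: bonus scaled by len_xy / max(len_x, len_y), keeping the result <= 1.
    return dis_j + len_xy * p * (1 - dis_j) / max(len_x, len_y)
```

With the values stated in this example (m = 24, n = 8, lengths 24 and 25), the assumed Jaro form gives about 0.8756, and the longest common substring of the two encoded strings has length 20, as stated above. The final $Dis_{jw}$ value from the sketch should not be read as the patent's result.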
Example 4: On the basis of Example 3, suppose that the domain name X to be detected and the target domain name Y are "science and technology of this day" and "science and technology of china of this day" respectively. Following the same steps as in Example 3, the final Jaro-Winkler Distance is calculated:
[Calculation of $Dis_{jw}$: image not reproduced in the source text]
Taken together, this example and Example 3 show that the method judges the similarity of Chinese domain names well and gives almost the same result as human visual judgment. It effectively guards against counterfeiters replacing a Chinese character with a visually similar one, overcomes the low accuracy and low efficiency of prior-art judgment, and is more user-friendly in practical application.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit and scope of the present invention.

Claims (8)

1. A Chinese domain name similarity measurement method based on J-W distance, characterized by comprising the following steps:
Step 1: Acquire the domain name X to be detected and the target domain name Y.
Step 2: Split the domain name X to be detected and the target domain name Y at the dot "." or the Chinese full stop "。", ignore the network name and the domain name suffix, retain the domain name body, and generate the Chinese character sets $x: \{x_1, x_2, \ldots, x_p\}$ and $y: \{y_1, y_2, \ldots, y_q\}$.
Step 3: According to the Unicode Chinese character stroke order table, traverse the domain name body character sets $x: \{x_1, x_2, \ldots, x_p\}$ and $y: \{y_1, y_2, \ldots, y_q\}$ obtained in Step 2. For each Chinese character $x_i, i \in [1, p]$ or $y_i, i \in [1, q]$, taken in set order, look up its stroke order and convert it according to the encoding rule. This generates the encoded string $str_x$ of the domain name body of X and the encoded string $str_y$ of the domain name body of Y, whose lengths are $len_x$ and $len_y$.
Step 4.1: Take the encoded strings $str_x$ and $str_y$ as input to the J-W algorithm and generate the detection matrix $I(X,Y)_{len_x \times len_y}$.
Step 4.2: Calculate the matching window value MW according to formula (1):
[Formula (1): image not reproduced in the source text]
Step 4.3: From the detection matrix $I(X,Y)_{len_x \times len_y}$ and the matching window value MW, calculate the number m of matched characters and the number n of transpositions of matched characters according to the relevant rules. For the number m of matched characters: if the positions of two identical characters in the encoded strings $str_x$ and $str_y$ differ by less than the matching window value MW, the characters are considered matched; characters already matched must be excluded during matching, and once a match is found for a character, matching for that character stops and proceeds to the next character.
For the number n of transpositions of matched characters: compare the order of the matched characters in $str_x$ with their order in $str_y$; if the two orders are identical there are no transpositions, otherwise half of the transposition count is the number n. The values m and n must satisfy the following requirement:
[Constraint on m and n: image not reproduced in the source text]
Step 4.4: From the values m and n obtained in Step 4.3, calculate the Jaro Distance $Dis_j$ of the encoded strings $str_x$ and $str_y$ according to formula (2):
[Formula (2): image not reproduced in the source text]
Step 4.5: Obtain the longest common substring $str_{xy}$ of the encoded strings $str_x$ and $str_y$, and its length $len_{xy}$.
Step 4.6: Further calculate the Jaro-Winkler Distance $Dis_{jw}$ of the encoded strings $str_x$ and $str_y$ according to formula (3):
[Formula (3): image not reproduced in the source text]
where $b_t$ is the threshold that determines whether the further calculation is required, and p is the scaling factor.
2. The J-W distance-based Chinese domain name similarity measurement method according to claim 1, wherein: the domain name X to be detected and the target domain name Y in Step 1 may be a primary domain name or a secondary domain name.
3. The J-W distance-based Chinese domain name similarity measurement method according to claim 1, wherein: in Step 2, if the domain name X to be detected and the target domain name Y are primary domain names, only the domain name suffix needs to be ignored.
4. The J-W distance-based Chinese domain name similarity measurement method according to claim 1, wherein: the domain name X to be detected and the target domain name Y in Step 1 must conform to conventional domain names; that is, the domain name body character sets $x: \{x_1, x_2, \ldots, x_p\}$ and $y: \{y_1, y_2, \ldots, y_q\}$ generated by the initialization in Step 2 must satisfy:
$p, q \in N^+$
Similarly, the lengths $len_x$ and $len_y$ of the encoded strings $str_x$ and $str_y$ generated by the encoding in Step 3 must satisfy:
$len_x, len_y \in N^+$
5. The J-W distance-based Chinese domain name similarity measurement method according to claim 1, wherein: the Unicode Chinese character stroke order table in Step 3 encodes the stroke types horizontal, vertical, left-falling, right-falling and turning as the digits 1, 2, 3, 4 and 5, and all Chinese characters are encoded accordingly.
6. The J-W distance-based Chinese domain name similarity measurement method according to claim 1, wherein: the encoding rule in Step 3 is to first generate an empty string, then traverse the domain name body character set $x: \{x_1, x_2, \ldots, x_p\}$ of the domain name X to be detected in order and, for each Chinese character $x_i, i \in [1, p]$, look up its stroke order in the Unicode Chinese character stroke order table and append it to the end of the string; after all elements of the set have been processed, the resulting string is the encoded string $str_x$ of the domain name body of X; the target domain name Y is processed in the same way to generate the encoded string $str_y$ of its domain name body.
7. The J-W distance-based Chinese domain name similarity measurement method according to claim 1, wherein: in Step 4.6, the threshold $b_t$ for further calculation takes the value 0.7, and the scaling factor p takes the value 0.1.
8. The J-W distance-based Chinese domain name similarity measurement method according to claim 1, wherein: the values $Dis_j$ and $Dis_{jw}$ calculated in Step 4.4 and Step 4.6 must satisfy the following requirement:
[Constraint on $Dis_j$ and $Dis_{jw}$: image not reproduced in the source text]
If the requirement is not satisfied, the calculation is wrong and must be repeated; if it is satisfied, the closer the value is to 1, the more similar the domain name X to be detected and the target domain name Y are.
CN201710749659.0A 2017-08-28 2017-08-28 Chinese domain name similarity measurement method based on J-W distance Active CN107609059B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710749659.0A CN107609059B (en) 2017-08-28 2017-08-28 Chinese domain name similarity measurement method based on J-W distance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710749659.0A CN107609059B (en) 2017-08-28 2017-08-28 Chinese domain name similarity measurement method based on J-W distance

Publications (2)

Publication Number Publication Date
CN107609059A CN107609059A (en) 2018-01-19
CN107609059B CN107609059B (en) 2020-10-20

Family

ID=61056242

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710749659.0A Active CN107609059B (en) 2017-08-28 2017-08-28 Chinese domain name similarity measurement method based on J-W distance

Country Status (1)

Country Link
CN (1) CN107609059B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113807087B (en) * 2020-06-16 2023-11-28 中国电信股份有限公司 Method and device for detecting similarity of website domain names
CN112395877A (en) * 2020-11-04 2021-02-23 苏宁云计算有限公司 Character string detection method and device, computer equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999040517A1 (en) * 1998-02-09 1999-08-12 Ibi Co., Ltd. Method for connection for computer network on internet by real name and computer network system thereof
CN103399907A (en) * 2013-07-31 2013-11-20 深圳市华傲数据技术有限公司 Method and device for calculating similarity of Chinese character strings on the basis of edit distance
CN103428307A (en) * 2013-08-09 2013-12-04 中国科学院计算机网络信息中心 Method and equipment for detecting counterfeit domain names
CN106375288A (en) * 2016-08-29 2017-02-01 中国科学院信息工程研究所 Chinese domain name similarity calculation method and counterfeit domain name detection method
CN106170002A (en) * 2016-09-08 2016-11-30 中国科学院信息工程研究所 A kind of Chinese counterfeit domain name detection method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
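"Network attack detection method based on text clustering"; 杨晓峰 et al.; 《智能系统学报》; 2014-02-28; Vol. 9, No. 1; full text *
"Efficient malicious URL detection method based on segment pattern"; 林海伦 et al.; 《通信学报》; 2015-11-30; Vol. 36, No. Z1; full text *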

Also Published As

Publication number Publication date
CN107609059A (en) 2018-01-19

Similar Documents

Publication Publication Date Title
CN108154167B (en) Chinese character font similarity calculation method
CN110909548B (en) Chinese named entity recognition method, device and computer readable storage medium
WO2018040899A1 (en) Error correction method and device for search term
CN112468501B (en) URL-oriented phishing website detection method
CN109344263B (en) Address matching method
CN111382298B (en) Image retrieval method and device based on picture content and electronic equipment
CN108182401B (en) Safe iris identification method based on aggregated block information
CN105808709A (en) Quick retrieval method and device of face recognition
CN107609059B (en) Chinese domain name similarity measurement method based on J-W distance
WO2019201295A1 (en) File identification method and feature extraction method
CN111680480A (en) Template-based job approval method and device, computer equipment and storage medium
CN111639183A (en) Financial industry consensus public opinion analysis method and system based on deep learning algorithm
CN113343025B (en) Sparse attack resisting method based on weighted gradient Hash activation thermodynamic diagram
CN114973229A (en) Text recognition model training method, text recognition device, text recognition equipment and medium
CN107679029B (en) English domain name similarity detection method
CN104021184B (en) A kind of localization method and system
CN113094465A (en) Method and system for checking duplicate of design product
CN116975864A (en) Malicious code detection method and device, electronic equipment and storage medium
CN116662388A (en) Efficient hidden query method and system
CN114866246B (en) Computer network security intrusion detection method based on big data
CN107464268A (en) A kind of joint coding method using global and local feature
CN107967472A (en) A kind of search terms method encoded using dynamic shape
KR20220152167A (en) A system and method for detecting phishing-domains in a set of domain name system(dns) records
CN108171115A (en) A kind of incompleteness English word recognition methods
CN108874978A (en) One method that conference content abstract task is solved based on layering adaptability segmented network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant