CN107609059B - Chinese domain name similarity measurement method based on J-W distance - Google Patents
- Publication number
- CN107609059B (application CN201710749659.0A)
- Authority
- CN
- China
- Prior art keywords
- domain name
- chinese
- str
- character
- detected
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The invention relates to a Chinese domain name similarity measurement method based on the J-W distance and belongs to the technical field of network security. The method uses a Unicode Chinese character stroke-order table to map each encoded Chinese character to a string of digits, and introduces the Jaro-Winkler Distance algorithm from the field of machine learning, combined with the longest common substring, to measure the similarity of Chinese domain names. First, the domain name to be detected and the target domain name are acquired and initialized to extract their domain name bodies. Second, each domain name body is encoded with the stroke-order table into a digit string, which serves as the input to the Jaro-Winkler Distance algorithm and is used to build a detection matrix. Finally, the similarity of the digit strings is calculated according to the relevant rules, taking their longest common substring into account; this similarity effectively represents the similarity between the Chinese characters.
Description
Technical Field
The invention relates to a Chinese domain name similarity measurement method based on J-W distance, belonging to the technical field of network security.
Background
With the development and spread of the Internet, Chinese domain names have gradually become an important component of internationalized domain names. At the same time, counterfeiting attacks against Chinese domain names are increasing, and the forms of counterfeiting are becoming ever more complex. Because many Chinese characters have similar shapes, and people tend to read quickly, a certain degree of visual misjudgment is inevitable.
Traditional domain name similarity measurement methods apply only to English domain names and are not effective for Chinese domain names. Moreover, domestic research on Chinese domain name similarity measurement is currently scarce, and few results have been published.
At present, most Chinese domain name similarity measurement methods compute the similarity of Chinese characters from single-character similarity and overall similarity. These methods have shortcomings in time complexity or accuracy, and no concrete algorithm exists for computing either the single-character similarity or the overall similarity.
Disclosure of Invention
The technical problem addressed by the invention is the limitations and deficiencies of the prior art; to that end, the invention provides a Chinese domain name similarity measurement method based on the J-W distance. Compared with prior-art Chinese domain name similarity measurement methods, it mainly addresses their insufficient accuracy and poor efficiency, and aims to improve the accuracy and timeliness of Chinese domain name similarity measurement.
The technical scheme of the invention is as follows: a Chinese domain name similarity measurement method based on J-W distance comprises the following specific steps:
Step 1: acquire the domain name X to be detected and the target domain name Y;
Step 2: split X and Y at the dot "." or the full stop "。", ignore the network name and the domain suffix, keep the domain name body, and generate the Chinese character sets x = {x_1, x_2, …, x_p} and y = {y_1, y_2, …, y_q};
Step 3: using the Unicode Chinese character stroke-order table, traverse the body character sets x and y obtained in Step 2 in order; for each Chinese character x_i (i ∈ [1, p]) or y_i (i ∈ [1, q]), look up its stroke order and convert it according to the corresponding encoding rule, generating the encoded string str_x of the body of X and the encoded string str_y of the body of Y, and obtain their lengths len_x and len_y;
Step 4.1: take the encoded strings str_x and str_y of the bodies of X and Y as the input to the J-W algorithm and generate the detection matrix I(X, Y) of size len_x × len_y;
Step 4.2: calculate the matching window value MW according to formula (1):
Step 4.3: from the detection matrix I(X, Y) and the matching window value MW, calculate the number m of matched characters and the number n of transpositions of matched characters according to the relevant rules;
Step 4.4: from m and n obtained in Step 4.3, calculate the Jaro distance Dis_j of str_x and str_y according to formula (2):
Step 4.5: obtain the longest common substring str_xy of str_x and str_y and its length len_xy;
Step 4.6: further calculate the Jaro-Winkler distance Dis_jw of str_x and str_y according to formula (3):
where b_t is the threshold that determines whether the further calculation is required, and p is a scaling factor.
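Formulas (1) and (2) are not reproduced in this text. For orientation, the standard Jaro definitions that Steps 4.2 to 4.4 appear to follow are given below; treat them as an assumption, although the standard matching procedure does reproduce the m and n values reported in Examples 2 and 3. Under this assumption, Example 3 (len_x = 24, len_y = 25, m = 24, n = 8) gives MW = 11 and Dis_j = (1/3)(1 + 0.96 + 0.6667) ≈ 0.8756.

$$
MW = \left\lfloor \frac{\max(len_x,\ len_y)}{2} \right\rfloor - 1
$$

$$
Dis_j =
\begin{cases}
0, & m = 0 \\[4pt]
\dfrac{1}{3}\left(\dfrac{m}{len_x} + \dfrac{m}{len_y} + \dfrac{m - n}{m}\right), & m > 0
\end{cases}
$$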
The domain name X to be detected and the target domain name Y in Step1 may be a primary domain name or a secondary domain name.
In Step 2, if X and Y are primary domain names, only the domain suffix needs to be ignored. In addition, because Chinese domain names are not yet widespread, some are registered without the "www" network name; in that case the initialization in Step 2 can be adjusted accordingly. In short, the next step can proceed once the domain name body has been extracted.
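A minimal sketch of the Step 2 initialization. It assumes a single-label suffix by default (pass `suffix_labels=2` for a suffix such as .com.cn), treats "www" as the only network name, and joins any remaining labels into the body; `extract_body` and its parameter are hypothetical names, and the set of accepted dot characters (ASCII dot plus the ideographic and full-width dots) is an assumption.

```python
import re

# Assumed label separators: ASCII dot, ideographic full stop, full-width and
# half-width ideographic dots.
_SEPARATORS = re.compile(r"[.\u3002\uFF0E\uFF61]")


def extract_body(domain: str, suffix_labels: int = 1) -> str:
    """Return the domain name body: the labels left after dropping an
    optional leading 'www' and the trailing suffix label(s)."""
    labels = [lab for lab in _SEPARATORS.split(domain.strip()) if lab]
    if labels and labels[0].lower() == "www":
        labels = labels[1:]  # the network name is optional (Step 2)
    if suffix_labels and len(labels) > suffix_labels:
        labels = labels[:-suffix_labels]  # ignore the domain suffix
    return "".join(labels)
```

For example, `extract_body("www.今日科技.中国")` and `extract_body("今日科技。中国")` both yield the body 今日科技.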
In Step 1, X and Y must be consistent with conventional domain names; that is, after the initialization in Step 2, the body character sets x = {x_1, x_2, …, x_p} and y = {y_1, y_2, …, y_q} must satisfy:
p, q ∈ N+
Similarly, after the encoding in Step 3, the lengths len_x and len_y of the encoded strings str_x and str_y must satisfy:
len_x, len_y ∈ N+.
The Unicode Chinese character stroke-order table in Step 3 encodes the five stroke types horizontal, vertical, left-falling, right-falling and turning as the digits 1, 2, 3, 4 and 5, respectively, and every Chinese character is encoded accordingly.
According to the encoding rule in Step 3, an empty string is created first. Each Chinese character x_i (i ∈ [1, p]) of the body character set x = {x_1, x_2, …, x_p} of the domain name X is then traversed in order, its stroke order is looked up in the Unicode Chinese character stroke-order table, and the stroke codes are appended to the end of the string. After all elements of the set have been processed, the resulting string is the encoded string str_x of the body of X. Processing the target domain name Y in the same way yields the encoded string str_y of its body.
For the calculation of the number m of matched characters in Step 4.3: if the distance between identical characters in the encoded strings str_x and str_y is less than the matching window value MW, the characters are considered matched. Note, however, that characters already matched must be excluded during matching: if a character has already been matched, it is skipped and matching continues with the next character.
For the calculation of the number n of transpositions of matched characters, check whether the matched characters appear in the same order in str_x and str_y; half of the number of transpositions is n. In addition, m and n must satisfy:
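A sketch of the matching and transposition counting of Step 4.3. Formula (1) is not reproduced here, so the window assumes the standard Jaro value floor(max(len_x, len_y)/2) - 1 and accepts distances up to and including MW; n is taken as half the transposition count, rounded down, as in Example 2 (15 gives 7). Under these assumptions the function reproduces m and n from Examples 2 and 3. `match_window` and `count_matches` are hypothetical names.

```python
def match_window(len_x: int, len_y: int) -> int:
    # Assumed standard Jaro window; clamped so it is never negative.
    return max(max(len_x, len_y) // 2 - 1, 0)


def count_matches(str_x: str, str_y: str) -> tuple[int, int, int]:
    """Return (m, transpositions, n) for two encoded strings."""
    mw = match_window(len(str_x), len(str_y))
    used_y = [False] * len(str_y)  # characters of str_y already matched
    matched_x: list[str] = []      # matched characters, in str_x order
    for i, ch in enumerate(str_x):
        lo, hi = max(0, i - mw), min(len(str_y), i + mw + 1)
        for j in range(lo, hi):
            # Skip characters matched earlier; take the first free one.
            if not used_y[j] and str_y[j] == ch:
                used_y[j] = True
                matched_x.append(ch)
                break
    # Matched characters of str_y, in str_y order.
    matched_y = [c for c, used in zip(str_y, used_y) if used]
    transpositions = sum(a != b for a, b in zip(matched_x, matched_y))
    return len(matched_x), transpositions, transpositions // 2
```

For Example 2, `count_matches("441542514134112534", "41542514134112534")` returns `(17, 15, 7)`.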
In Step 4.6, the threshold b_t for deciding whether the further calculation is needed is set to 0.7; it can be adjusted slightly according to actual detection results, mainly to improve detection accuracy. The scaling factor p is set to 0.1 and can likewise be adjusted slightly according to actual detection results, mainly to keep the final result from exceeding 1. However, because the method improves formula (3) by adding the reciprocal of the longer length of the encoded strings str_x and str_y, the value of p has little influence on the final result.
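For comparison, the classic Jaro-Winkler formula (Winkler's original, not the patent's formula (3), which is not reproduced here) boosts the Jaro distance by the common-prefix length l, capped at 4, when Dis_j exceeds the threshold b_t. Because l ≤ 4, keeping p ≤ 0.25 is what keeps the classic result at or below 1. Step 4.6 instead appears to build its correction from len_xy and the reciprocal of the longer string length, which is consistent with the statement above that p matters less there.

$$
Dis_{jw} =
\begin{cases}
Dis_j, & Dis_j \le b_t \\[4pt]
Dis_j + l \cdot p \cdot (1 - Dis_j), & Dis_j > b_t
\end{cases}
\qquad l = \min(\text{common prefix length},\ 4)
$$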
The values Dis_j and Dis_jw calculated in Steps 4.4 and 4.6 must satisfy:
If they do not, the calculation is wrong and must be redone; if they do, the closer the value is to 1, the more similar the domain name X to be detected and the target domain name Y are.
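A sketch of Step 4.4 with the range check described above, assuming formula (2) is the standard Jaro distance and that the omitted requirements are 0 ≤ n ≤ m ≤ min(len_x, len_y) and 0 ≤ Dis_j ≤ 1. `jaro_distance` is a hypothetical name.

```python
def jaro_distance(m: int, n: int, len_x: int, len_y: int) -> float:
    """Standard Jaro distance (assumed form of formula (2)) from the match
    count m, the transposition count n and the two string lengths."""
    if len_x <= 0 or len_y <= 0:
        raise ValueError("encoded strings must be non-empty")
    # Assumed consistency constraint on m and n.
    if not 0 <= n <= m <= min(len_x, len_y):
        raise ValueError("inconsistent m and n")
    if m == 0:
        return 0.0
    dis_j = (m / len_x + m / len_y + (m - n) / m) / 3
    # Range check: a value outside [0, 1] signals a calculation error.
    if not 0.0 <= dis_j <= 1.0:
        raise ArithmeticError("Dis_j out of range; recalculate")
    return dis_j
```

With the values from Examples 2 and 3, `jaro_distance(17, 7, 18, 17)` is about 0.8442 and `jaro_distance(24, 8, 24, 25)` is about 0.8756.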
Usually, the domain name X to be detected must be compared against a set of target domain names {Y_1, Y_2, …, Y_k}. To improve detection efficiency, the encoded string of each target domain name Y_i (i ∈ [1, k]) can be computed in advance, stored in a database, and retrieved directly when needed.
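A minimal sketch of precomputing the target encodings, using Python's built-in `sqlite3` module with an in-memory database as a stand-in for the database mentioned above. The table layout, the function names and the three-entry stroke table (codes from Example 2) are all hypothetical.

```python
import sqlite3

# Hypothetical excerpt of the stroke-order table (codes from Example 2).
STROKES = {"治": "44154251", "病": "4134112534", "冶": "4154251"}


def encode(body: str) -> str:
    return "".join(STROKES[ch] for ch in body)


def build_target_db(bodies: list[str]) -> sqlite3.Connection:
    """Encode every target body once and store the result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE targets (body TEXT PRIMARY KEY, code TEXT NOT NULL)")
    conn.executemany("INSERT INTO targets VALUES (?, ?)",
                     [(b, encode(b)) for b in bodies])
    conn.commit()
    return conn


def target_codes(conn: sqlite3.Connection) -> dict[str, str]:
    """Fetch the precomputed codes; no re-encoding at detection time."""
    return dict(conn.execute("SELECT body, code FROM targets"))
```

At detection time only X needs to be encoded; each stored code is then fed directly into the matching of Step 4.3.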
The beneficial effects of the invention are as follows. The method uses a Unicode Chinese character stroke-order table to map each encoded Chinese character to a string of digits, and introduces the Jaro-Winkler Distance algorithm from the field of machine learning, combined with the longest common substring, to measure the similarity of Chinese domain names. First, the domain name to be detected and the target domain name are acquired and initialized to extract their domain name bodies. Second, each body is encoded with the stroke-order table into a digit string, which serves as the input to the Jaro-Winkler Distance algorithm and is used to build a detection matrix. Finally, the similarity of the digit strings is calculated according to the relevant rules, taking their longest common substring into account; this similarity effectively represents the similarity between the Chinese characters. Compared with the prior art, the method mainly addresses insufficient accuracy and poor efficiency, and aims to improve the accuracy and timeliness of Chinese domain name similarity measurement.
Drawings
FIG. 1 is a schematic flow diagram of the present invention.
Detailed Description
The invention is further described with reference to the following drawings and detailed description.
Example 1: as shown in Fig. 1, a Chinese domain name similarity measurement method based on J-W distance comprises the following steps:
Step 1: acquire the domain name X to be detected and the target domain name Y;
Step 2: split X and Y at the dot "." or the full stop "。", ignore the network name and the domain suffix, keep the domain name body, and generate the Chinese character sets x = {x_1, x_2, …, x_p} and y = {y_1, y_2, …, y_q};
Step 3: using the Unicode Chinese character stroke-order table, traverse the body character sets x and y obtained in Step 2 in order; for each Chinese character x_i (i ∈ [1, p]) or y_i (i ∈ [1, q]), look up its stroke order and convert it according to the corresponding encoding rule, generating the encoded string str_x of the body of X and the encoded string str_y of the body of Y, and obtain their lengths len_x and len_y;
Step 4.1: take the encoded strings str_x and str_y of the bodies of X and Y as the input to the J-W algorithm and generate the detection matrix I(X, Y) of size len_x × len_y;
Step 4.2: calculate the matching window value MW according to formula (1):
Step 4.3: from the detection matrix I(X, Y) and the matching window value MW, calculate the number m of matched characters and the number n of transpositions of matched characters according to the relevant rules;
Step 4.4: from m and n obtained in Step 4.3, calculate the Jaro distance Dis_j of str_x and str_y according to formula (2):
Step 4.5: obtain the longest common substring str_xy of str_x and str_y and its length len_xy;
Step 4.6: further calculate the Jaro-Winkler distance Dis_jw of str_x and str_y according to formula (3):
where b_t is the threshold that determines whether the further calculation is required, and p is a scaling factor.
The domain name X to be detected and the target domain name Y in Step1 may be a primary domain name or a secondary domain name.
In Step 2, if X and Y are primary domain names, only the domain suffix needs to be ignored. In addition, because Chinese domain names are not yet widespread, some are registered without the "www" network name; in that case the initialization in Step 2 can be adjusted accordingly. In short, the next step can proceed once the domain name body has been extracted.
In Step 1, X and Y must be consistent with conventional domain names; that is, after the initialization in Step 2, the body character sets x = {x_1, x_2, …, x_p} and y = {y_1, y_2, …, y_q} must satisfy:
p, q ∈ N+
Similarly, after the encoding in Step 3, the lengths len_x and len_y of the encoded strings str_x and str_y must satisfy:
len_x, len_y ∈ N+.
The Unicode Chinese character stroke-order table in Step 3 encodes the five stroke types horizontal, vertical, left-falling, right-falling and turning as the digits 1, 2, 3, 4 and 5, respectively, and every Chinese character is encoded accordingly.
According to the encoding rule in Step 3, an empty string is created first. Each Chinese character x_i (i ∈ [1, p]) of the body character set x = {x_1, x_2, …, x_p} of the domain name X is then traversed in order, its stroke order is looked up in the Unicode Chinese character stroke-order table, and the stroke codes are appended to the end of the string. After all elements of the set have been processed, the resulting string is the encoded string str_x of the body of X. Processing the target domain name Y in the same way yields the encoded string str_y of its body.
For the calculation of the number m of matched characters in Step 4.3: if the distance between identical characters in the encoded strings str_x and str_y is less than the matching window value MW, the characters are considered matched. Note, however, that characters already matched must be excluded during matching: if a character has already been matched, it is skipped and matching continues with the next character.
For the calculation of the number n of transpositions of matched characters, check whether the matched characters appear in the same order in str_x and str_y; half of the number of transpositions is n. In addition, m and n must satisfy:
In Step 4.6, the threshold b_t for deciding whether the further calculation is needed is set to 0.7; it can be adjusted slightly according to actual detection results, mainly to improve detection accuracy. The scaling factor p is set to 0.1 and can likewise be adjusted slightly according to actual detection results, mainly to keep the final result from exceeding 1. However, because the method improves formula (3) by adding the reciprocal of the longer length of the encoded strings str_x and str_y, the value of p has little influence on the final result.
The values Dis_j and Dis_jw calculated in Steps 4.4 and 4.6 must satisfy:
If they do not, the calculation is wrong and must be redone; if they do, the closer the value is to 1, the more similar the domain name X to be detected and the target domain name Y are.
Example 2: building on Example 1, this example further explains how the number m of matched characters and the number n of transpositions are calculated. Suppose the domain name bodies of the domain name X to be detected and the target domain name Y are 治病 ("treatment") and 冶病 (with 治 replaced by the look-alike 冶), respectively. Looking up the Unicode Chinese character stroke-order table gives the codes 治 = "44154251", 病 = "4134112534" and 冶 = "4154251", so the encoded strings str_x and str_y are "441542514134112534" and "41542514134112534", respectively.
Calculating a matching window value MW:
Using the detection matrix I(X, Y) of size 18×17, calculate the number m of matched characters and the number n of transpositions of matched characters:
As shown in the table (matrix) above, "/" means that the position lies outside the matching window MW and is not considered for matching; "1" means that the column value matches the row value; "0" means that it does not.
In summary, the number of matched characters is m = 17 and the matched character set is {4,4,1,5,4,2,5,1,4,1,3,4,1,1,2,5,3}. Comparing it with the order of the matched characters in str_y = "41542514134112534" gives 15 transpositions, so the number of transpositions of matched characters is n = 7 (half of 15, rounded down).
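A sketch of how the detection matrix I(X, Y) for Example 2 could be built. The reading of its symbols is an assumption: "1" and "0" mark equal and unequal digits inside the window, and "/" marks cells farther than MW from the diagonal. MW = 8 is the assumed standard Jaro window floor(18/2) - 1, since the example's own MW value is not reproduced here. `detection_matrix` is a hypothetical helper.

```python
def detection_matrix(str_x: str, str_y: str, mw: int) -> list[list[str]]:
    """Rows follow str_x and columns follow str_y. '/' = outside the window,
    '1' = same digit inside the window, '0' = different digit."""
    rows = []
    for i, a in enumerate(str_x):
        row = []
        for j, b in enumerate(str_y):
            if abs(i - j) > mw:
                row.append("/")
            else:
                row.append("1" if a == b else "0")
        rows.append(row)
    return rows


str_x, str_y = "441542514134112534", "41542514134112534"
matrix = detection_matrix(str_x, str_y, mw=8)  # MW = 8 is an assumed value
print(len(matrix), len(matrix[0]))  # 18 17
print(" ".join(matrix[0]))          # 1 0 0 1 0 0 0 1 0 / / / / / / / /
```

The 18×17 shape matches the matrix size stated in the example.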
Example 3: building on Example 1, the implementation of the invention is further illustrated. Suppose the domain name X to be detected and the target domain name Y are 今日科技.中国 ("Today's Technology".China) and 令日科技.中国, respectively; the stroke code of 令 differs from that of 今 only by one extra stroke. After initialization, the domain name bodies are 今日科技 and 令日科技. Looking up the corresponding Chinese character codes in the Unicode Chinese character stroke-order table and applying the encoding rules generates the encoded strings str_x = "344525113123444121211254" and str_y = "3445425113123444121211254".
Calculating a matching window value MW:
Using the detection matrix I(X, Y) of size 24×25, the number of matched characters is m = 24 and the number of transpositions of matched characters is n = 8.
Calculate the Jaro distance Dis_j of str_x and str_y:
The length len_xy of the longest common substring str_xy is 20; the Jaro-Winkler distance Dis_jw of str_x and str_y is then calculated:
The result shows that the domain name X to be detected and the target domain name Y look similar to the human eye, and the value calculated by the invention agrees with human visual judgment. The method thus effectively counters counterfeit-site operators who replace a Chinese character with a visually similar one, overcomes the low accuracy and low efficiency of prior-art judgments, and is more user-friendly in practical application.
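Step 4.5 needs the longest common substring of the two encoded strings. The sketch below uses the standard dynamic-programming method (the text does not say which algorithm the invention uses) and reproduces len_xy = 20 for Example 3; `longest_common_substring` is a hypothetical name.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Classic O(len(a) * len(b)) dynamic programme: cur[j] is the length of
    the common suffix of a[:i] and b[:j]."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i  # end index in a
        prev = cur
    return a[best_end - best_len:best_end]


str_x = "344525113123444121211254"
str_y = "3445425113123444121211254"
str_xy = longest_common_substring(str_x, str_y)
print(len(str_xy), str_xy)  # 20 25113123444121211254
```

Only two rows of the table are kept at a time, so memory grows with len(b) rather than len(a) × len(b).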
Example 4: building on Example 3, suppose the domain name X to be detected and the target domain name Y are "science and technology of this day" and "science and technology of China of this day", respectively; the final Jaro-Winkler distance is calculated by the steps described in Example 3:
Taken together with Example 3, this example shows that the Chinese domain name similarity judgment method of the invention performs well and gives results almost identical to human visual judgment. It effectively counters counterfeiters who replace a Chinese character with a visually similar one, overcomes the low accuracy and low efficiency of prior-art judgments, and is more user-friendly in practical application.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit and scope of the present invention.
Claims (8)
1. A Chinese domain name similarity measurement method based on J-W distance is characterized by comprising the following steps:
Step 1: acquiring the domain name X to be detected and the target domain name Y;
Step 2: splitting X and Y at the dot "." or the full stop "。", ignoring the network name and the domain suffix, keeping the domain name body, and generating the Chinese character sets x = {x_1, x_2, …, x_p} and y = {y_1, y_2, …, y_q};
Step 3: using the Unicode Chinese character stroke-order table, traversing the body character sets x and y obtained in Step 2 in order; for each Chinese character x_i (i ∈ [1, p]) or y_i (i ∈ [1, q]), looking up its stroke order and converting it according to the corresponding encoding rule, so as to generate the encoded string str_x of the body of X and the encoded string str_y of the body of Y, and obtaining their lengths len_x and len_y;
Step 4.1: taking the encoded strings str_x and str_y of the bodies of X and Y as the input to the J-W algorithm and generating the detection matrix I(X, Y);
Step 4.2: calculating the matching window value MW according to formula (1):
Step 4.3: from the detection matrix I(X, Y) and the matching window value MW, calculating the number m of matched characters and the number n of transpositions of matched characters according to the relevant rules; for the calculation of m, if the distance between identical characters in str_x and str_y is less than MW, the characters are considered matched; characters already matched are excluded during matching: if a character has already been matched, it is skipped and matching continues with the next character;
for the calculation of n, checking whether the matched characters appear in the same order in str_x and str_y, half of the number of transpositions being n; m and n must satisfy:
Step 4.4: from m and n obtained in Step 4.3, calculating the Jaro distance Dis_j of str_x and str_y according to formula (2):
Step 4.5: obtaining the longest common substring str_xy of str_x and str_y and its length len_xy;
Step 4.6: further calculating the Jaro-Winkler distance Dis_jw of str_x and str_y according to formula (3):
where b_t is the threshold that determines whether the further calculation is required, and p is a scaling factor.
2. The J-W distance-based Chinese domain name similarity measurement method according to claim 1, wherein the domain name X to be detected and the target domain name Y in Step 1 may each be a primary domain name or a secondary domain name.
3. The J-W distance-based Chinese domain name similarity measurement method according to claim 1, wherein, in Step 2, if X and Y are primary domain names, only the domain suffix needs to be ignored.
4. The J-W distance-based Chinese domain name similarity measurement method according to claim 1, wherein, in Step 1, X and Y must be consistent with conventional domain names; that is, after the initialization in Step 2, the body character sets x = {x_1, x_2, …, x_p} and y = {y_1, y_2, …, y_q} must satisfy:
p, q ∈ N+
similarly, after the encoding in Step 3, the lengths len_x and len_y of the encoded strings str_x and str_y must satisfy:
len_x, len_y ∈ N+.
5. The J-W distance-based Chinese domain name similarity measurement method according to claim 1, wherein the Unicode Chinese character stroke-order table in Step 3 encodes the strokes horizontal, vertical, left-falling, right-falling and turning as the digits 1, 2, 3, 4 and 5, respectively, and all Chinese characters are encoded accordingly.
6. The J-W distance-based Chinese domain name similarity measurement method according to claim 1, wherein, according to the encoding rule in Step 3, an empty string is created first; each Chinese character x_i (i ∈ [1, p]) of the body character set x = {x_1, x_2, …, x_p} of X is traversed in order, its stroke order is looked up in the Unicode Chinese character stroke-order table, and the stroke codes are appended to the end of the string; after all elements of the set have been processed, the resulting string is the encoded string str_x of the body of X; processing the target domain name Y in the same way yields the encoded string str_y of its body.
7. The J-W distance-based Chinese domain name similarity measurement method according to claim 1, wherein, in Step 4.6, the threshold b_t for deciding whether the further calculation is required is 0.7, and the scaling factor p is 0.1.
8. The J-W distance-based Chinese domain name similarity measurement method according to claim 1, wherein Dis_j and Dis_jw calculated in Steps 4.4 and 4.6 must satisfy:
if they do not, the calculation is wrong and must be redone; if they do, the closer the value is to 1, the more similar the domain name X to be detected and the target domain name Y are.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710749659.0A CN107609059B (en) | 2017-08-28 | 2017-08-28 | Chinese domain name similarity measurement method based on J-W distance |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107609059A | 2018-01-19 |
CN107609059B | 2020-10-20 |
Family ID: 61056242
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710749659.0A (CN107609059B, Active) | 2017-08-28 | 2017-08-28 | Chinese domain name similarity measurement method based on J-W distance |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107609059B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113807087B * | 2020-06-16 | 2023-11-28 | China Telecom Corp., Ltd. (中国电信股份有限公司) | Method and device for detecting similarity of website domain names |
CN112395877A * | 2020-11-04 | 2021-02-23 | Suning Cloud Computing Co., Ltd. (苏宁云计算有限公司) | Character string detection method and device, computer equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1999040517A1 * | 1998-02-09 | 1999-08-12 | Ibi Co., Ltd. | Method for connection for computer network on internet by real name and computer network system thereof |
CN103399907A * | 2013-07-31 | 2013-11-20 | Shenzhen Huaao Data Technology Co., Ltd. (深圳市华傲数据技术有限公司) | Method and device for calculating similarity of Chinese character strings on the basis of edit distance |
CN103428307A * | 2013-08-09 | 2013-12-04 | Computer Network Information Center, Chinese Academy of Sciences (中国科学院计算机网络信息中心) | Method and equipment for detecting counterfeit domain names |
CN106170002A * | 2016-09-08 | 2016-11-30 | Institute of Information Engineering, Chinese Academy of Sciences (中国科学院信息工程研究所) | Chinese counterfeit domain name detection method and system |
CN106375288A * | 2016-08-29 | 2017-02-01 | Institute of Information Engineering, Chinese Academy of Sciences (中国科学院信息工程研究所) | Chinese domain name similarity calculation method and counterfeit domain name detection method |
Non-Patent Citations (2)
Title |
---|
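Network attack detection method based on text clustering; Yang Xiaofeng et al.; CAAI Transactions on Intelligent Systems; 2014-02-28; Vol. 9, No. 1; full text * |
Efficient malicious URL detection method based on segment patterns; Lin Hailun et al.; Journal on Communications; 2015-11-30; Vol. 36, No. Z1; full text * |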
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108154167B (en) | Chinese character font similarity calculation method | |
CN110909548B (en) | Chinese named entity recognition method, device and computer readable storage medium | |
WO2018040899A1 (en) | Error correction method and device for search term | |
CN112468501B (en) | URL-oriented phishing website detection method | |
CN109344263B (en) | Address matching method | |
CN111382298B (en) | Image retrieval method and device based on picture content and electronic equipment | |
CN108182401B (en) | Safe iris identification method based on aggregated block information | |
CN105808709A (en) | Quick retrieval method and device of face recognition | |
CN107609059B (en) | Chinese domain name similarity measurement method based on J-W distance | |
WO2019201295A1 (en) | File identification method and feature extraction method | |
CN111680480A (en) | Template-based job approval method and device, computer equipment and storage medium | |
CN111639183A (en) | Financial industry consensus public opinion analysis method and system based on deep learning algorithm | |
CN113343025B | Sparse adversarial attack method based on a weighted gradient hash activation heatmap | |
CN114973229A | Text recognition model training method, text recognition device, text recognition equipment and medium | |
CN107679029B | English domain name similarity detection method | |
CN104021184B | Localization method and system | |
CN113094465A | Method and system for duplicate checking of design products | |
CN116975864A | Malicious code detection method and device, electronic equipment and storage medium | |
CN116662388A | Efficient hidden query method and system | |
CN114866246B | Computer network security intrusion detection method based on big data | |
CN107464268A | Joint coding method using global and local features | |
CN107967472A | Search term method using dynamic glyph encoding | |
KR20220152167A | A system and method for detecting phishing-domains in a set of domain name system(dns) records | |
CN108171115A | Incomplete English word recognition method | |
CN108874978A | Method for solving the meeting content summarization task based on a hierarchical adaptive segmented network |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |