WO2022124573A1 - Method for evaluating website similarity based on menu structure and keyword in script - Google Patents

Method for evaluating website similarity based on menu structure and keyword in script

Info

Publication number
WO2022124573A1
Authority
WO
WIPO (PCT)
Prior art keywords
similarity
processor
website
name information
similarity evaluation
Prior art date
Application number
PCT/KR2021/015431
Other languages
English (en)
Korean (ko)
Inventor
이준식
김용현
조양현
Original Assignee
주식회사 앰진시큐러스
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 주식회사 앰진시큐러스
Publication of WO2022124573A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/12 Protecting executable software
    • G06F21/121 Restricting unauthorised execution of programs
    • G06F21/128 Restricting unauthorised execution of programs involving web programs, i.e. using technology especially used in internet, generally interacting with a web browser, e.g. hypertext markup language [HTML], applets, java
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/12 Protecting executable software
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/16 Program or content traceability, e.g. by watermarking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/08 Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/0876 Network architectures or network communication protocols for network security for authentication of entities based on the identity of the terminal or configuration, e.g. MAC address, hardware or software configuration or device fingerprint
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1483 Countermeasures against malicious traffic service impersonation, e.g. phishing, pharming or web spoofing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2119 Authenticating web pages, e.g. with suspicious links

Definitions

  • The present invention relates to evaluating the similarity of websites, and more particularly, to a technique for evaluating the similarity of websites based on the menu structure of a website and keywords in its scripts.
  • The conventional similarity evaluation method based on image comparison is style-dependent: if two sites have a similar style structure, the measured similarity may be high even when the actual content is not similar, which leads to errors in the similarity evaluation. In addition, sites that do not show a specific image-based pattern cannot be evaluated.
  • An object of the present specification is to provide a method for evaluating the similarity of websites that is improved compared to the prior art.
  • A website similarity evaluation method according to the present specification for solving the above-described problems comprises: (a) storing, by a processor, an HTML file and a script file of a similarity evaluation target site; (b) extracting, by the processor, tag information from the HTML file, extracting function name information and variable name information from the script file, and storing the extracted information; (c) converting, by the processor, a menu structure into a tree structure by using the depth of tags related to a menu among the tag information, and storing the tree structure; (d) calculating, by the processor, a similarity (hereinafter, 'first similarity') between the converted tree-structured data of the similarity evaluation target site and the tree-structured data of a similarity evaluation reference site, and a similarity (hereinafter, 'second similarity') between the function name information and variable name information of the similarity evaluation target site and the function name information and variable name information, respectively, of the similarity evaluation reference site; and (e) calculating, by the processor, an integrated similarity using the first similarity and the second similarity.
  • Step (c) may include: (c-1) extracting, by the processor, a string of a tag related to a menu; (c-2) organizing, by the processor, the extracted string; and (c-3) setting, by the processor, the organized string as a node value of the tree.
  • Step (c-2) may be a step in which the processor divides the string into morpheme units, removes duplicate or unnecessary morphemes, and converts the string into a representative word using a thesaurus database.
  • Step (d) may be a step in which the processor calculates the first similarity using a tree edit distance (TED) algorithm.
  • Step (b) may be a step in which the processor extracts the character string that starts after the function keyword and ends where the parenthesis character appears, and stores it as function name information.
  • Step (b) may be a step in which the processor extracts the character string that starts after a variable type name and ends where a space appears, and stores it as variable name information.
  • Step (d) may be a step in which the processor calculates the second similarity using at least one of a Euclidean distance measurement method, a Manhattan distance measurement method, a Haversine distance measurement method, a Minkowski distance measurement method, a Mahalanobis distance measurement method, a cosine similarity measurement method, and a Jaccard similarity measurement method.
  • Step (e) may be a step in which the processor calculates the integrated similarity using a cosine similarity measurement method.
  • Step (e) may be a step in which the processor calculates the integrated similarity by weighting any one of the first similarity and the second similarity.
  • The website similarity evaluation method according to the present specification may be implemented as a computer program written to perform each step of the method on a computer and recorded on a computer-readable recording medium.
  • A website similarity evaluation apparatus according to the present specification for solving the above problems includes: a memory unit that stores tree-structured data, function name information, and variable name information of a similarity evaluation reference site; and a processor that stores an HTML file and a script file of a similarity evaluation target site. The processor may extract and store tag information from the HTML file; convert a menu structure into a tree structure by using the depth of tags related to a menu among the tag information, and store the tree structure; calculate a first similarity between the converted tree-structured data of the similarity evaluation target site and the tree-structured data of the similarity evaluation reference site; extract and store function name information and variable name information from the script file; calculate a second similarity between the function name information and variable name information of the similarity evaluation target site and the function name information and variable name information, respectively, of the similarity evaluation reference site; and calculate an integrated similarity using the first and second similarities.
  • When converting the menu structure into a tree structure, the processor may extract a string of a tag related to a menu, organize the extracted string, and set the organized string as a node value of the tree.
  • When organizing the extracted string, the processor may divide the string into morpheme units, remove duplicate or unnecessary morphemes, and convert the string into a representative word using a thesaurus database.
  • The processor may calculate the first similarity using a tree edit distance (TED) algorithm.
  • When extracting function name information from the script file, the processor may extract the character string that starts after the function keyword and ends where the parenthesis character appears, and store it as function name information.
  • When extracting variable name information from the script file, the processor may extract the character string that starts after a variable type name and ends where a space appears, and store it as variable name information.
  • When calculating the second similarity, the processor may use at least one of a Euclidean distance measurement method, a Manhattan distance measurement method, a Haversine distance measurement method, a Minkowski distance measurement method, a Mahalanobis distance measurement method, a cosine similarity measurement method, and a Jaccard similarity measurement method.
  • When calculating the integrated similarity, the processor may use a cosine similarity measurement method.
  • When calculating the integrated similarity, the processor may assign a weight to any one of the first similarity and the second similarity.
  • A website similarity evaluation server according to the present specification includes: the website similarity evaluation apparatus; and a communication unit that accesses a similarity evaluation target site under the control of the processor and reads the HTML file and the script file.
  • FIG. 1 is a schematic flowchart of calculating the first similarity in a website similarity evaluation method according to an embodiment of the present specification.
  • FIG. 2 is a reference diagram for converting a website menu into a tree according to an embodiment of the present specification.
  • FIG. 3 is a reference diagram for evaluating the first similarity between websites.
  • FIG. 4 is a schematic flowchart of calculating the second similarity in a website similarity evaluation method according to an embodiment of the present specification.
  • FIG. 5 is an exemplary diagram of extracting a function name and a variable name from a script file.
  • FIG. 6 is a reference diagram for comparing variable names and function names, respectively.
  • The website similarity evaluation method according to the present specification may calculate a first similarity based on the menu structure of a website and a second similarity based on keywords in a script.
  • A final similarity between two websites may then be obtained by calculating an integrated similarity from the first and second similarities. Accordingly, the method of calculating the first similarity and the method of calculating the second similarity are described first, followed by the method of calculating the integrated similarity.
  • FIG. 1 is a schematic flowchart of calculating the first similarity in a website similarity evaluation method according to an embodiment of the present specification.
  • The processor may store a HyperText Markup Language (HTML) file of the similarity evaluation target site.
  • The processor may extract and store tag information from the HTML file.
  • The HTML tag data can be collected by accessing the similarity evaluation target site and using a Python library for extracting desired data from HTML, such as BeautifulSoup.
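  • As a minimal sketch of this tag-collection step, the following uses Python's standard-library `html.parser` in place of BeautifulSoup, a deliberate substitution so that the example has no third-party dependency. The class `TagCollector`, the helper `collect_tags`, and the sample markup are hypothetical; only the names `s_content` and `section` are taken from the description.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not deepen the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagCollector(HTMLParser):
    """Records name, id, class, depth, and direct text of every tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # records of the elements that are currently open
        self.tags = []       # one record per start tag, in document order

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        record = {
            "name": tag,
            "id": attr.get("id"),
            "class": attr.get("class"),
            "depth": len(self.open_tags) + 1,  # outermost tag has depth 1
            "text": "",
        }
        self.tags.append(record)
        if tag not in VOID_TAGS:
            self.open_tags.append(record)

    def handle_endtag(self, tag):
        # Close the nearest matching open tag (and any unclosed children).
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i]["name"] == tag:
                del self.open_tags[i:]
                break

    def handle_data(self, data):
        # Attach non-blank text to the innermost open element.
        if self.open_tags and data.strip():
            self.open_tags[-1]["text"] += data.strip()


def collect_tags(html):
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser.tags


SAMPLE_HTML = """
<div id="s_content">
  <div class="section">
    <ul>
      <li><a href="/news">News</a></li>
      <li><a href="/sports">Sports</a></li>
    </ul>
  </div>
</div>
"""

for t in collect_tags(SAMPLE_HTML):
    print(t["depth"], t["name"], t["id"], t["class"], t["text"])
```

  • Each record carries the tag name, ID, class name, contents, and depth, which are the kinds of tag information the description lists; the depth is simply the number of open ancestors plus one.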
  • The processor may convert the menu structure into a tree structure by using the depth of the tags related to the menu among the tag information, and store the tree structure.
  • The processor may calculate a first similarity between the converted tree-structured data of the similarity evaluation target site and the tree-structured data of the similarity evaluation reference site.
  • FIG. 2 is a reference diagram for converting a website menu into a tree according to an embodiment of the present specification.
  • A part of the web page is shown in the left part of the figure.
  • This part of the web page relates to the menu, which is generally arranged on the left side of the screen.
  • A part of the HTML file of the similarity evaluation target site may also be identified.
  • This is the portion of the HTML file that relates to the menu.
  • The HTML file may include various kinds of tag information, such as the class name of a tag, the tag name, the tag contents, the tag ID, and the tag depth. Among these, the tag depth can be used to convert the menu structure into a tree structure. Referring to the example shown in FIG. 2, a tag is indented further to the right the deeper it sits in the hierarchy.
  • When the processor uses the library called "Beautiful Soup" to obtain tag information, it can obtain the information simply by supplying a selector that selects a tag.
  • The selector specifies which tag inside which tag is selected, so the hierarchical relationship can be extracted and stored by determining the depth. Thereafter, the processor may build the tree structure using the character that identifies the depth.
  • For example, the processor may determine and store the tag "#s_content" as depth 1 and the tag "div.section" as depth 2. In this way, the processor may identify whether two tags are in an upper/lower (parent-child) relationship or a parallel (sibling) relationship by recognizing the delimiter that marks a step down in tag depth. As a result, the menu structure may be converted into a tree structure as shown in the right part of FIG. 2.
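  • The depth-to-tree conversion described above can be sketched as follows. This is an illustrative sketch, assuming the menu tags have already been reduced to `(depth, label)` pairs in document order with depth starting at 1; the function `build_tree`, the synthetic `ROOT` node, and the sample labels are hypothetical.

```python
def build_tree(items):
    """Turn (depth, label) pairs in document order into a nested tree.

    A stack holds the chain of currently open ancestors. A pair with a
    greater depth than the stack top becomes its child; a pair with an
    equal or smaller depth first pops back to its parent level.
    """
    root = {"label": "ROOT", "children": []}  # synthetic root at depth 0
    stack = [(0, root)]
    for depth, label in items:
        node = {"label": label, "children": []}
        while stack[-1][0] >= depth:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((depth, node))
    return root


def show(node, indent=0):
    """Print the tree with one indentation step per depth level."""
    print("  " * indent + node["label"])
    for child in node["children"]:
        show(child, indent + 1)


menu_items = [
    (1, "header"),
    (2, "menu 1"),
    (3, "sub 1-1"),
    (2, "menu 2"),
]
show(build_tree(menu_items))
```

  • Two consecutive pairs with the same depth end up as siblings (the parallel relationship), while a pair that is one level deeper than its predecessor becomes that predecessor's child (the upper/lower relationship).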
  • In the tree, each node value may be set to a representative word that describes the corresponding menu.
  • The processor may extract a string of a tag related to a menu, organize the extracted string, and set the organized string as a node value of the tree.
  • The processor may divide the string into morpheme units, remove duplicate or unnecessary morphemes, and convert the string into a representative word using a thesaurus database. Techniques for dividing a string or deleting parts of it were already known in the field of language processing at the time of filing the present specification, so a detailed description is omitted. The thesaurus database is also assumed to be already built, so that the processor accesses it to convert the string into the representative word.
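  • A rough sketch of this string-organizing step is shown below. It makes strong simplifying assumptions: whitespace tokenization stands in for real morpheme analysis, and the small `STOPWORDS` set and `THESAURUS` dictionary are hypothetical stand-ins for the "unnecessary morphemes" and the thesaurus database mentioned above.

```python
# Hypothetical data; a real system would use a morphological analyzer
# and a full thesaurus database instead.
STOPWORDS = {"the", "and", "of", "a"}
THESAURUS = {
    "headlines": "news",
    "news": "news",
    "shop": "store",
    "shopping": "store",
}


def representative_words(text):
    """Split a menu string, drop noise and duplicates, map to representatives."""
    result = []
    for token in text.lower().split():
        token = token.strip(".,:;!?()[]")
        if not token or token in STOPWORDS:
            continue  # unnecessary token
        word = THESAURUS.get(token, token)  # fall back to the token itself
        if word not in result:              # drop duplicates, keep order
            result.append(word)
    return result


print(representative_words("The Latest Headlines and News"))
```

  • Mapping synonyms to one representative word before building the tree means that two menus labelled differently but meaning the same thing produce identical node values.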
  • In step S40, the processor may evaluate the first similarity between the tree-structured data of the similarity evaluation target site and the tree-structured data of the similarity evaluation reference site.
  • Various mathematical algorithms may be used to calculate the first similarity.
  • The processor may calculate the first similarity using a Tree Edit Distance (TED) algorithm.
  • The TED algorithm compares two tree structures. The tree to be compared is edited by modifying, deleting, and adding nodes until it has the same structure as the reference tree, and the first similarity is calculated by quantifying the cost (number and extent) of these edits.
  • FIG. 3 is a reference diagram for evaluating the first similarity between websites.
  • FIG. 3 shows an example of the tree-structured data of two websites, A and B. One of them is assumed to be the similarity evaluation reference site and the other the similarity evaluation target site.
  • In the two trees, the headers and menus 1 to 4, which are in a parallel relationship, are identical.
  • However, the tags at the lower depth of menu 1, the tags at the lower depth of menu 3, and several tags at the lower depth of menu 4 are different. Therefore, under the TED algorithm, an editing cost (number and extent of edits) is incurred in proportion to the difference, and the first similarity between website A and website B can be calculated numerically. This can be expressed as Equation 1 below.
  • T1, T2: the tree structure of the reference site and the tree structure of the comparison site
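  • The edit-and-count idea behind the first similarity can be sketched as follows. This is a minimal sketch, assuming ordered trees, a unit cost for each insert, delete, and relabel, and a plain memoized recursion rather than an optimized algorithm. Because Equation 1 is not reproduced in this text, the normalization `1 - TED / (|T1| + |T2|)` used in `first_similarity` is an assumption and not the formula of the specification; all function names and sample trees are hypothetical.

```python
from functools import lru_cache

# A tree is (label, children), where children is a tuple of trees.
# A forest is a tuple of trees.


def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Edit distance between two ordered forests with unit costs."""
    if not f and not g:
        return 0
    if not f:
        return size(g)  # insert every node of g
    if not g:
        return size(f)  # delete every node of f
    (v_label, v_children), (w_label, w_children) = f[-1], g[-1]
    # Delete the rightmost root of f; its children move up into the forest.
    delete = forest_distance(f[:-1] + v_children, g) + 1
    # Insert the rightmost root of g.
    insert = forest_distance(f, g[:-1] + w_children) + 1
    # Match the two rightmost roots, relabelling if the labels differ.
    match = (forest_distance(v_children, w_children)
             + forest_distance(f[:-1], g[:-1])
             + (0 if v_label == w_label else 1))
    return min(delete, insert, match)


def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))


def first_similarity(t1, t2):
    """Assumed normalization into [0, 1]; 1.0 means identical trees."""
    ted = tree_edit_distance(t1, t2)
    return 1.0 - ted / (size((t1,)) + size((t2,)))


site_a = ("header", (("menu 1", (("sub 1-1", ()),)), ("menu 2", ())))
site_b = ("header", (("menu 1", (("sub 1-2", ()),)),
                     ("menu 2", ()), ("menu 3", ())))

print(tree_edit_distance(site_a, site_b))
print(first_similarity(site_a, site_b))
```

  • In this example the two trees differ by one relabelled submenu and one extra menu, so the edit distance is 2. Dividing by the combined node count keeps the score between 0 and 1, since deleting one tree entirely and inserting the other is always a valid edit script.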
  • The website similarity evaluation method according to the present specification may evaluate the first similarity between one similarity evaluation reference site and each of a plurality of similarity evaluation target sites.
  • In this case, the first similarity values of the target sites may be sorted in ascending or descending order so that the user can judge them easily.
  • FIG. 4 is a schematic flowchart of calculating the second similarity in a website similarity evaluation method according to an embodiment of the present specification.
  • The processor may store a script file of the similarity evaluation target site.
  • A script file is a file written in a scripting language, a kind of programming language used to control application software.
  • An example of a scripting language is JavaScript.
  • The processor may extract and store function name information from the script file.
  • The processor may extract the character string that starts after the function keyword and ends where the parenthesis character appears, and store it as function name information.
  • FIG. 5 is an exemplary diagram of extracting a function name and a variable name from a script file.
  • The string "setCookieUserinfo" can be identified after the keyword function in the script file.
  • This character string may be extracted and stored as a function name.
  • The processor may extract and store variable name information from the script file.
  • The processor may extract the character string that starts after a variable type name and ends where a space appears, and store it as variable name information.
  • A variable type name is a keyword predefined by each programming language for declaring a variable, such as int, var, or let.
  • The variable type name "var" can be identified in the script file.
  • The string "ck_name" after the keyword var may be extracted and stored as a variable name.
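  • The two extraction rules above can be sketched with regular expressions. This is an illustrative sketch, not a JavaScript parser: it assumes simple declarations, treats `const` as a variable keyword in addition to `var` and `let`, and also stops a variable name at `=`, `;`, or `,` rather than only at a space. The function names and the sample script are hypothetical, apart from the identifiers `setCookieUserinfo` and `ck_name` taken from the description.

```python
import re

# Text after the keyword "function" up to the opening parenthesis.
FUNCTION_RE = re.compile(r"\bfunction\s+([^\s(]+)\s*\(")
# Text after a declaration keyword up to a space (or "=", ";", ",").
VARIABLE_RE = re.compile(r"\b(?:var|let|const)\s+([^\s=;,]+)")


def extract_function_names(script):
    """Return declared function names in order of appearance."""
    return FUNCTION_RE.findall(script)


def extract_variable_names(script):
    """Return declared variable names in order of appearance."""
    return VARIABLE_RE.findall(script)


SAMPLE_SCRIPT = """
function setCookieUserinfo(name, value) {
    var ck_name = name;
    let expires=7;
}
"""

print(extract_function_names(SAMPLE_SCRIPT))
print(extract_variable_names(SAMPLE_SCRIPT))
```

  • Anonymous functions such as `function (x) { ... }` are skipped because no name precedes the parenthesis. Keywords inside comments or string literals would still be matched, which a production implementation would need to handle.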
  • The processor may calculate a second similarity between the function name information of the similarity evaluation target site and the function name information of the similarity evaluation reference site.
  • The processor may likewise calculate a second similarity between the variable name information of the similarity evaluation target site and the variable name information of the similarity evaluation reference site.
  • FIG. 6 is a reference diagram for comparing variable names and function names, respectively.
  • As shown in FIG. 6, the variable name set of website A and the variable name set of website B consist of variable names that are used in common and variable names that differ.
  • The number of commonly used variable names may be large. The same comparison applies to function names.
  • Various mathematical techniques may be used to calculate numerically how similar the two sets are.
  • The processor may calculate the second similarity using at least one of a text similarity measurement method, a Euclidean distance measurement method, a Manhattan distance measurement method, a Haversine distance measurement method, a Minkowski distance measurement method, a Mahalanobis distance measurement method, a cosine similarity measurement method, and a Jaccard similarity measurement method.
  • When the example of FIG. 6 is expressed as an equation according to the Jaccard similarity measurement method, it can be expressed as Equation 2 below.
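  • For reference, the standard Jaccard similarity of two sets A and B is the size of their intersection divided by the size of their union, |A ∩ B| / |A ∪ B|. The sketch below applies that standard definition to name sets; the function name, the handling of two empty sets, and the sample names other than `ck_name` are assumptions made for illustration.

```python
def jaccard_similarity(names_a, names_b):
    """Standard Jaccard index: |A intersect B| / |A union B|."""
    a, b = set(names_a), set(names_b)
    if not a and not b:
        return 1.0  # assumption: two empty sets are treated as identical
    return len(a & b) / len(a | b)


site_a_vars = ["ck_name", "ck_value", "expires"]
site_b_vars = ["ck_name", "expires", "path", "domain"]

# 2 shared names out of 5 distinct names overall.
print(jaccard_similarity(site_a_vars, site_b_vars))
```

  • The same function can be applied to the function name sets of the two sites, so the comparison is carried out separately for variable names and function names.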
  • The processor may calculate an integrated similarity using the first and second similarities. According to an embodiment of the present specification, the processor may calculate the integrated similarity using a cosine similarity measurement method.
  • An example of calculating the integrated similarity using cosine similarity can be expressed as Equation 3 below.
  • The processor may calculate the integrated similarity by assigning a weight (a_m) to any one of the first and second similarities. For example, when the menu is an important factor in comparing two sites, the weight of the first similarity may be increased. If the two sites are suspected of having been built by the same developer, the weight of the second similarity may be increased.
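  • The weighting idea can be illustrated with a simple weighted average. Equation 3 is not reproduced in this text, so the sketch below is not the cosine-based formula of the specification; it only shows, under that stated assumption, how raising one weight shifts the integrated score toward the corresponding similarity. The function name and parameter names are hypothetical.

```python
def integrated_similarity(first, second, w_first=1.0, w_second=1.0):
    """Weighted average of the first and second similarities (an assumption)."""
    if w_first < 0 or w_second < 0 or w_first + w_second == 0:
        raise ValueError("weights must be non-negative and not both zero")
    return (w_first * first + w_second * second) / (w_first + w_second)


# Equal weights, then the menu structure (first similarity) weighted double.
print(integrated_similarity(0.8, 0.4))
print(integrated_similarity(0.8, 0.4, w_first=2.0, w_second=1.0))
```

  • Normalizing by the sum of the weights keeps the result in the same range as the two inputs, so scores remain comparable across different weight settings.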
  • The website similarity evaluation apparatus according to the present specification may include a memory unit and a processor.
  • The processor may execute the website similarity evaluation method described above.
  • The memory unit may store data according to control commands of the processor.
  • The website similarity evaluation apparatus may be a component of a website similarity evaluation server.
  • The website similarity evaluation server may include the website similarity evaluation apparatus according to the present specification and a communication unit that accesses a similarity evaluation target site under the control of the processor and reads the HTML file and the script file.
  • The communication unit is a device capable of transmitting and receiving data by accessing a server that provides a web page, and is not limited to any particular wired or wireless communication method or communication protocol.
  • The website similarity evaluation method and apparatus according to the present specification extract the properties of a site's menu structure, compare them as tree structures, and compare the main keywords in scripts, which reflect the developer's characteristics, so that accurate similarity measurement is possible. For example, because the features of sites with copyright-infringing content can be searched for and extracted quickly and accurately, the material and human resources required to prevent the spread of harmful sites can be greatly reduced. In addition, by comparing metadata that reflects the structure of the web page, similarity can be measured objectively using a numerical index, and similar sites can be classified clearly. Furthermore, using the index to find sites similar to copyright-infringing sites faster and more accurately is expected to contribute significantly to blocking the redistribution of harmful sites.
  • To execute the above-described calculations and various control logic, the processor may include a processor, an application-specific integrated circuit (ASIC), other chipsets, logic circuits, registers, communication modems, data processing devices, and the like known in the art.
  • The processor may be implemented as a set of program modules.
  • The program modules may be stored in the memory unit and executed by the processor.
  • The above-described computer program may include code written in a computer language such as C/C++, C#, Java, Python, or machine language that the processor (CPU) of the computer can read through a device interface of the computer, so that the computer reads the program and executes the methods implemented as the program. Such code may include functional code that defines the functions necessary for executing the methods, and control code related to the execution procedure, which the processor of the computer needs in order to execute the functions in a predetermined order. In addition, the code may further include memory-reference code indicating which location (address) in the internal or external memory of the computer should be referenced for additional information or media that the processor needs in order to execute the functions.
  • The code may further include communication-related code that specifies how to communicate with any other remote computer or server using the communication module of the computer, and what information or media to transmit and receive during communication.
  • The storage medium is not a medium that stores data for a short moment, such as a register, a cache, or a memory, but a medium that stores data semi-permanently and can be read by a device.
  • Examples of the storage medium include, but are not limited to, ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices.
  • The program may be stored in various recording media on various servers accessible by the computer or in various recording media on the user's computer.
  • The medium may also be distributed over computer systems connected by a network, so that computer-readable code is stored in a distributed manner.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Technology Law (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Power Engineering (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

Disclosed in the present specification is a website similarity evaluation method that is improved over the conventional technique. The website similarity evaluation method according to the present specification can extract tag information from an HTML file, convert a menu structure into a tree structure using the depth of a tag related to a menu, and calculate a first similarity between the tree-structured data of sites. In addition, the website similarity evaluation method according to the present specification can extract a function name and a variable name from a script file, and calculate a second similarity between websites using the extracted function name and variable name. The comparison is performed by calculating an integrated similarity in which the first similarity and the second similarity are integrated, so that an accurate similarity can be measured.
PCT/KR2021/015431 2020-12-07 2021-10-29 Method for evaluating website similarity based on menu structure and keyword in script WO2022124573A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2020-0169524 2020-12-07
KR20200169524 2020-12-07

Publications (1)

Publication Number Publication Date
WO2022124573A1 2022-06-16

Family

ID=81973695

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2021/015431 WO2022124573A1 (en) 2020-12-07 2021-10-29 Method for evaluating website similarity based on menu structure and keyword in script

Country Status (2)

Country Link
KR (2) KR102419824B1 (fr)
WO (1) WO2022124573A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102595595B1 (ko) * 2023-07-24 2023-10-31 (주)에잇스니핏 웹사이트의 구조 정보를 이용한 불법·유해정보 사이트차단 방법 및 장치
KR102617515B1 (ko) * 2023-07-31 2023-12-27 (주)에잇스니핏 파비콘을 이용한 불법·유해정보 사이트 차단 방법 및장치

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010165272A (ja) * 2009-01-19 2010-07-29 Sony Corp 情報処理方法、情報処理装置、及びプログラム
KR20100099890A (ko) * 2009-03-04 2010-09-15 한국과학기술원 태그를 이용한 웹 페이지 간의 유사도 측정 방법 및 시스템
KR20110108491A (ko) * 2010-03-29 2011-10-06 한국전자통신연구원 악성 스크립트 분석 시스템 및 그를 이용한 악성 스크립트 분석 방법
KR20150144009A (ko) * 2014-06-16 2015-12-24 주식회사 예티소프트 단말, 및 이를 이용한 웹 페이지 위변조 검증 시스템 및 방법
WO2020044469A1 (fr) * 2018-08-29 2020-03-05 Bbソフトサービス株式会社 Dispositif de détection de page web illicite, procédé de commande de dispositif de détection de page web illicite, et programme de commande

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20120124581A (ko) * 2011-05-04 2012-11-14 엔에이치엔(주) 개선된 유사 문서 탐지 방법, 장치 및 컴퓨터 판독 가능한 기록 매체
KR101958577B1 (ko) 2017-12-29 2019-03-14 김기수 웹페이지 캡처 이미지 기반의 웹페이지 분석 방법 및 이를 이용한 웹페이지 분석 시스템
KR102213959B1 (ko) * 2018-12-26 2021-02-09 (주)씽크포비엘 크라우드소싱 기반 소스코드 안정성 확보 방법

Also Published As

Publication number Publication date
KR102419824B1 (ko) 2022-07-13
KR20220080691A (ko) 2022-06-14
KR20220080703A (ko) 2022-06-14

Similar Documents

Publication Publication Date Title
WO2022124573A1 (en) Method for evaluating website similarity based on menu structure and keyword in script
WO2012108623A1 (fr) Procédé, système et support d'enregistrement lisible par ordinateur pour ajouter une nouvelle image et des informations sur la nouvelle image à une base de données d'images
WO2019103224A1 (fr) Système et procédé d'extraction de mot-clé central dans un document
WO2010011026A2 (fr) Système de recherche utilisant une image
US20090273597A1 (en) User interface screen layout analysis using hierarchical geometric features
WO2012091400A1 (fr) Système et procédé de détection de logiciel malveillant dans un fichier sur la base d'une carte génétique de fichier
CN110532352B (zh) 文本查重方法及装置、计算机可读存储介质、电子设备
PT1107136E (pt) Sistema para recuperação de imagem baseado no seu conteúdo e método para recuperar imagens que utiliza tal sistema.
CN112364637B (zh) 一种敏感词检测方法、装置,电子设备及存储介质
WO2013073805A1 (fr) Procédé et appareil pour rechercher une image, et support d'enregistrement lisible par ordinateur pour exécuter le procédé
WO2017155292A1 (fr) Procédé de détection d'anomalie et programme de détection d'anomalie
US20080127043A1 (en) Automatic Extraction of Programming Rules
WO2019054613A1 (fr) Procédé et système d'identification de progiciel source libre en fonction d'un fichier binaire
US10042622B2 (en) Methods and systems of generating ease of use interfaces for legacy system management facilities
JP2024091709A (ja) 文作成装置、文作成方法および文作成プログラム
WO2016117739A1 (fr) Système et procédé de gestion de données basée sur une base de données en mémoire
WO2021091124A1 (fr) Dispositif électronique et procédé de fonctionnement permettant de rechercher un fichier similaire à un fichier de référence sur la base d'informations de distribution concernant des caractéristiques de chaque fichier de la pluralité de fichiers
WO2014098372A1 (fr) Dispositif et méthode de collecte de sites dangereux
CN114117038A (zh) 一种文档分类方法、装置、系统及电子设备
JP2002007413A (ja) 画像検索装置
WO2012030049A2 (fr) Appareil et procédé de classification de documents similaires par application de valeur seuil dynamique
CN115761778A (zh) 一种文献重构方法、装置、设备和存储介质
WO2014098337A1 (fr) Dispositif et méthode de collecte de sites dangereux
WO2021182657A1 (fr) Système d'importation sélective de données web par réglage arbitraire de conception d'action
WO2015133774A1 (fr) Système et procédé d'analyse de brevets et support d'enregistrement dans lequel est enregistré un programme destiné à les exécuter

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 21903620; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: pct application non-entry in european phase. Ref document number: 21903620; Country of ref document: EP; Kind code of ref document: A1