CN111367566A - Mixed source code feature extraction and matching method - Google Patents

Info

Publication number
CN111367566A
Authority
CN
China
Prior art keywords
source code
file
function
source
code file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910580956.6A
Other languages
Chinese (zh)
Inventor
巨李岗
从慧珅
田伟丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Keyware Co ltd
Original Assignee
Beijing Keyware Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-06-27
Filing date
2019-06-27
Publication date
2020-07-03
Application filed by Beijing Keyware Co ltd
Priority to CN201910580956.6A
Publication of CN111367566A
Pending (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/74 Reverse engineering; Extracting design information from source code
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Computing Systems (AREA)
  • Stored Programmes (AREA)

Abstract

The invention relates to a mixed source code feature extraction and matching method comprising the following steps: step 1) constructing a knowledge base: crawling open source projects with a web crawler, and constructing and maintaining a knowledge base from the file-level and function-level features of each source code file of each open source project; step 2) two-level feature extraction from source code files: for the current mixed source code project, identifying each source code file in the project directory and performing two-level feature extraction on each one to obtain its file-level features and its function-level features; and step 3) performing feature matching and judgment to determine whether each source code file is open source code. Building on mixed source code feature extraction and matching, the invention makes it possible to identify whether each piece of code in a code project is open source, and provides an accurate mechanism for calculating the open source rate of each file.

Description

Mixed source code feature extraction and matching method
Technical Field
The invention relates to the field of open source identification, in particular to a mixed source code feature extraction and matching method.
Background
Open source software is characterized by guaranteeing that a user who has obtained the source code may redistribute it, as well as modified versions based on it. An open source software license is required to ensure that anyone can obtain or share the source program when needed, that anyone can modify and upgrade part of the open source software or use it in new open source software, and that anyone knows they have the right to do so when developing source code. The open source software license specifies that no one may be prohibited from exercising these rights, and no one may be required by others to give them up. These provisions become definite obligations when the open source software is modified or a copy of it is released; this is the most typical role of an open source software license and the reason such a license is essential.
There are many open source software licenses from different countries and regions. At present, the only organization in the world that certifies open source software licenses is the Open Source Initiative (OSI), and the OSI-approved licenses number 63 in total, falling into 5 categories (strict open source rules; non-open-source parts may exist; compatible with non-open-source software; software patents allowed; completely open). The more commonly used ones currently include GPL, LGPL, MPL, BSD, Nokia, MIT and Apache, and GPL and LGPL are the licenses most widely used by current open source projects. The GPL is strongly "contagious": if you modify code licensed under the GPL and then redistribute a binary version, you must also release its source code. The BSD license is relatively permissive: modified code may be redistributed with only the license included, without releasing the source code, and the modified version may be used commercially (e.g., Microsoft's products incorporate BSD networking source code, and the modified versions are sold as proprietary software).
Software license compliance can be analyzed quantitatively by comparing and analyzing license-related data about the software, such as the number of purchased commercial code licenses it contains versus the number of actual installations, the types of open source license agreements it contains, how its source code has been modified, and whether its use complies with the corresponding open source license agreements. This provides data support for avoiding intellectual property disputes, pricing software correctly, and audit management.
In mainstream foreign software, open source code and third-party plug-ins are used extensively in existing mixed source software, and the resulting intellectual property and security risks have attracted attention abroad. The main existing results are mature US products such as Blackduck and Protecode. Both are widely used by US law firms, intellectual property offices, corporate audit departments, software contractors and similar organizations, as well as by large software companies and corporate audit units in other countries.
(1) Blackduck
Blackduck currently has the largest market share among code analysis products; it mainly performs scanning, auditing and management of source code, and comes as a standalone version, Protex, and an online version, Hub. The Blackduck KnowledgeBase (KB) is currently the largest and most comprehensive open source knowledge base in the world.
As the foundation of Blackduck's overall solution, the KB has the following main advantages:
1. contains 530 billion lines of open source code;
2. covers 2,000,000 open source software projects;
3. 2,500 unique licenses;
4. 79,000 security vulnerabilities;
5. data from 6,500+ sites;
6. a professional team responsible for maintenance and continuous updates.
Blackduck supports more than 70 programming languages and can scan more than 100 file types. It supports line-by-line code comparison, showing matched user code and open source code side by side, which helps users confirm code matches more accurately.
Blackduck currently has more than 700 customers in more than 20 countries, including Intel, Cisco, Alcatel-Lucent, Motorola, Qualcomm and Yahoo. Blackduck products and services are also used for code auditing during corporate mergers and acquisitions.
(2) Protecode
Protecode is an open source code quality inspection and management tool developed by Synopsys. It can manage the open source content of third-party code, discover security vulnerabilities in that content, and ensure license and intellectual property compliance. Protecode Enterprise Server (ES) is software that performs scanning, composition analysis, license compliance analysis and security vulnerability analysis on source code.
Professor Wang Feng of a university in Shandong is researching binary code matching and analysis based on function-level features. The method requires disassembling malware and analyzing its assembly code to obtain function features; because these features are disrupted by obfuscation, a combination of static and dynamic analysis is needed. The method is mainly used for malware detection, and its results remain at the laboratory demonstration stage.
Professor Liudongtui of Inner Mongolia University has researched code copy detection. The method performs matching and identification based on feature strings, supports only the C programming language, relies on several third-party analysis tools for support, and remains at the test and simulation stage.
The National University of Defense Technology has proposed a malware analysis method based on high-dimensional feature fusion. It extracts static binary-file features, disassembly features and other features of malicious code, draws on the idea of locality-sensitive hashing to fuse and process the multi-dimensional features, and trains a typical machine learning model on the fused feature vectors.
Professor Zhang Yi of Chongqing University is researching a method based on code similarity. At present it supports only the C programming language, its code identification parameters still leave room for optimization, its identification accuracy needs further improvement, and its extensibility and portability require deeper study.
The new teaching of the tomu university at Harbin and Li Qing of the China Mobile game product base have each carried out multi-dimensional code analysis research on the Android operating system. In malicious code detection, besides textual features such as permission features and system API features, they analyze the structure of the function call graph, construct one kernel function that encodes nodes based on sensitive APIs and another that encodes nodes based on instruction opcodes, and describe the similarity of function call graphs with a combined kernel function. At present, this multi-feature malware detection model applies only to Android, and whether it suits Windows, whose kernel is closed source, remains highly uncertain.
Professor Zhao Rongcai of the PLA Information Engineering University has researched code comparison based on feature extraction. Building on a graph-based description of binary code, approximate binary codes are compared at two levels, functions and basic blocks, and their common parts and differences are analyzed; an implementation framework for feature-extraction-based binary comparison is then used to analyze the application of binary comparison to malware variants. However, the method still has considerable uncertainty in identifying malware variants, and because it does not rely on a code knowledge base, the credibility of its comparison results is difficult to guarantee.
Xi of the University of Science and Technology of China has researched malware detection using multi-dimensional features and proposed a multi-dimensional-feature detection algorithm for obfuscated malware: the obfuscated malware is disassembled and statically analyzed, and malware family features are summarized from several feature dimensions, namely semantic structure, opcode distribution sequences, call flow graph features and system call sequence graphs. However, the method addresses only malware family classification, cannot be applied to large-scale samples, and its extensibility needs further study.
Methods based on dynamic analysis require resources to run, load and monitor the program under analysis, which greatly limits their applicability.
In addition, Xuhaiyin of Huazhong University of Science and Technology has studied code obfuscation and its application to software security protection; Professor Homing of Beijing University of Posts and Telecommunications has studied code obfuscation models; Wangxou of Beijing University of Posts and Telecommunications has studied key technologies of binary code obfuscation; Professor Gichunfu of south-cut university has studied binary code path obfuscation; Professor Yang Wu of the University of Electronic Science and Technology has studied software protection based on binary code obfuscation; Guo Jun of Northwest University has studied semantics-based binary code de-obfuscation; Associate Professor Wang Wei of Tong university has studied fingerprint-based trusted software watermarking; and Professor Jiaguanjie of Soochow University has studied plagiarism based on text corpora.
At present, domestic modeling software for analyzing software code has been developed by the China open source security alliance, a company in the same industry, and provides users with a free binary code security scanning and analysis service. Blackduck and Protecode are both mature, widely used products that serve most of the global user base for source code analysis, but because of US trade restrictions their open source code libraries are not sold to China; their online detection services can be used for software composition analysis only by uploading source or binary files, which raises confidentiality and security concerns. In addition, most existing domestic technologies are still at the theoretical analysis and technical simulation stage; many key technologies still require research and breakthroughs in practical application, systematization and engineering, and the range of languages supported by source code analysis is too narrow.
Disclosure of Invention
To solve the above problems, the present invention provides a mixed source code feature extraction and matching method, which extracts features from source code and matches them against open source code. Based on the matching results, license agreement compliance analysis and similar tasks can be carried out, which strengthens the intellectual property protection system and promotes the localization and autonomous controllability of code analysis.
According to an aspect of the present invention, there is provided a mixed source code feature extraction and matching method, the method including:
step 1) constructing a knowledge base: crawling each open source project through a web crawler, and constructing and maintaining a knowledge base by using the file-level characteristics and the function-level characteristics of each source code file of each open source project;
the file-level features of the source code file comprise the file size, the file hash value and the number of effective code lines, and the function-level features of the source code file comprise the function size, the function hash value and the number of function code lines;
step 2) two-level feature extraction from the source code files: for the current mixed source code project, identifying each source code file in the current mixed source code project directory, and performing two-level feature extraction on each source code file to obtain its file-level features and its function-level features;
the file-level features of the source code file comprise the file size, the file hash value and the number of effective code lines, the function-level features of the source code file comprise the function size, the function hash value and the number of function code lines, and the source code files contain mixed source code;
step 3) feature matching and judgment: the following actions are performed for each source code file: performing a file-level matching query in the knowledge base using the file-level features of the source code file; when a source code file of an open source project is matched, determining that the source code file is open source code; when no source code file of an open source project is matched, performing a function-level matching query in the knowledge base using the function-level features of the source code file; when the source code file contains a function that matches an open source function, determining that this function is an open source function; and when the source code file contains no function that matches an open source function, determining that the source code file is closed source code.
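The judgment order of step 3 (file-level match first, function-level match only on a miss) can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes an in-memory knowledge base in which a file is looked up by its (file size, file hash) pair and a function by its hash alone, since the text does not say which combination of features constitutes a match. The names `KnowledgeBase`, `Verdict` and `judge` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    OPEN_SOURCE_FILE = "open source file"
    HAS_OPEN_SOURCE_FUNCTIONS = "contains open source functions"
    CLOSED_SOURCE = "closed source"


@dataclass
class KnowledgeBase:
    """Hypothetical in-memory stand-in for the knowledge base built in step 1."""
    file_keys: set[tuple[int, str]] = field(default_factory=set)  # (file size, file hash)
    function_hashes: set[str] = field(default_factory=set)


def judge(kb: KnowledgeBase, file_key: tuple[int, str],
          function_hashes: list[str]) -> tuple[Verdict, list[str]]:
    """Try a file-level match first and fall back to function-level matching.

    Returns the verdict and the hashes of the functions that matched open source functions.
    """
    if file_key in kb.file_keys:
        return Verdict.OPEN_SOURCE_FILE, []
    matched = [h for h in function_hashes if h in kb.function_hashes]
    if matched:
        return Verdict.HAS_OPEN_SOURCE_FUNCTIONS, matched
    return Verdict.CLOSED_SOURCE, []


kb = KnowledgeBase(file_keys={(120, "f-aaa")}, function_hashes={"fn-1", "fn-2"})
print(judge(kb, (120, "f-aaa"), ["fn-9"])[0])  # Verdict.OPEN_SOURCE_FILE
print(judge(kb, (300, "f-bbb"), ["fn-1", "fn-9"]))
print(judge(kb, (300, "f-bbb"), ["fn-9"])[0])  # Verdict.CLOSED_SOURCE
```

Returning the matched function hashes alongside the verdict keeps the information needed for the per-file open source rate, which only applies in the partial-match case.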
More specifically, in the mixed source code feature extraction and matching method: in step 3), when the source code file contains functions that match open source functions, the open source rate of the source code file is obtained as (the sum of the code line counts of the functions in the source code file that match open source functions) / (the number of effective code lines of the source code file) × 100%.
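As a worked instance of this formula, the sketch below computes the rate for a file whose matched functions span 30 and 20 lines out of 200 effective lines: (30 + 20) / 200 × 100% = 25%. The function name `open_source_rate` is hypothetical, and rejecting a file with zero effective lines is an assumption the text does not address.

```python
from __future__ import annotations


def open_source_rate(matched_function_lines: list[int], effective_lines: int) -> float:
    """Open source rate (%) = sum of matched functions' line counts / effective lines * 100."""
    if effective_lines <= 0:
        raise ValueError("a file with no effective code lines has no defined open source rate")
    return sum(matched_function_lines) / effective_lines * 100.0


# Two matched functions of 30 and 20 lines in a file with 200 effective lines:
print(open_source_rate([30, 20], 200))  # 25.0
```

If function line counts include blank or comment lines while the denominator counts only effective lines, the ratio can exceed 100%; counting both the same way avoids this.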
More specifically, in the mixed source code feature extraction and matching method: in step 1), the project information of each open source project is read into the knowledge base, the project information comprising the project name, open source license, project source and project version.
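Storing project information alongside the features is what lets a match be traced back to a project and its license, which the license compliance analysis mentioned earlier depends on. The sketch below assumes an in-memory index keyed by file hash and by function hash; a production knowledge base would presumably use a database. `ProjectInfo`, `KnowledgeBaseIndex` and the sample project values are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectInfo:
    name: str
    license: str  # the project's open source license, e.g. "MIT"
    source: str   # where the project was crawled from
    version: str


class KnowledgeBaseIndex:
    """Hypothetical index mapping feature hashes back to the projects that contain them."""

    def __init__(self) -> None:
        self.by_file_hash: defaultdict[str, list[ProjectInfo]] = defaultdict(list)
        self.by_function_hash: defaultdict[str, list[ProjectInfo]] = defaultdict(list)

    def add_file(self, project: ProjectInfo, file_hash: str, function_hashes: list[str]) -> None:
        """Record one crawled source file together with the hashes of its functions."""
        self.by_file_hash[file_hash].append(project)
        for h in function_hashes:
            self.by_function_hash[h].append(project)

    def projects_for_file(self, file_hash: str) -> list[ProjectInfo]:
        # .get avoids inserting empty entries into the defaultdict on lookup misses
        return list(self.by_file_hash.get(file_hash, []))

    def projects_for_function(self, function_hash: str) -> list[ProjectInfo]:
        return list(self.by_function_hash.get(function_hash, []))


demo = ProjectInfo("demo-lib", "MIT", "https://example.com/demo-lib", "1.0.0")
index = KnowledgeBaseIndex()
index.add_file(demo, "file-hash-1", ["fn-hash-1", "fn-hash-2"])
print(index.projects_for_file("file-hash-1")[0].license)  # MIT
```

Mapping each hash to a list rather than a single project reflects that the same file or function can appear in several projects or versions.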
More specifically, in the mixed source code feature extraction and matching method: in step 2), identifying each source code file in the current mixed source code project directory comprises identifying each source code file in that directory according to its file type.
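Identifying source files by type while walking the project directory can be sketched with `pathlib.Path.rglob`. The sketch assumes that "file type" means the file extension; the extension table and the names `SOURCE_EXTENSIONS`, `source_language` and `find_source_files` are hypothetical and cover only a few languages.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical extension-to-language table; the text only says "according to the file type".
# ".h" is ambiguous between C and C++ and is mapped to C here for simplicity.
SOURCE_EXTENSIONS = {
    ".c": "C", ".h": "C",
    ".cc": "C++", ".cpp": "C++", ".hpp": "C++",
    ".java": "Java", ".py": "Python", ".go": "Go",
    ".js": "JavaScript", ".ts": "TypeScript", ".rs": "Rust",
}


def source_language(path: str | Path) -> str | None:
    """Return the language of a file judged by its extension, or None if not source code."""
    return SOURCE_EXTENSIONS.get(Path(path).suffix.lower())


def find_source_files(root: str | Path) -> list[tuple[str, str]]:
    """Recursively walk a project directory and list (relative path, language) for source files.

    Paths are relative to root in POSIX form and sorted for reproducible output.
    """
    root = Path(root)
    found = []
    for path in sorted(root.rglob("*")):
        language = source_language(path)
        if language is not None and path.is_file():
            found.append((path.relative_to(root).as_posix(), language))
    return found
```

Extension-based identification misses extensionless scripts; inspecting a shebang line would be one possible supplement, though the text does not describe one.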
The invention thus provides a technical solution for extracting and matching mixed source code features that enables analysis of mixed source code projects. The effectiveness of the intelligent mixed source code detection and analysis engine is bounded by the size of the knowledge base: the more code the knowledge base collects, the more code can be matched and identified, the more security vulnerabilities can correspondingly be found, and the more accurate the resulting analysis.
Drawings
Embodiments of the invention will now be described with reference to the accompanying drawings, in which:
Fig. 1 is a flowchart illustrating the steps of a mixed source code feature extraction and matching method according to an embodiment of the present invention.
Fig. 2 is a block diagram illustrating a knowledge base used in a mixed source code feature extraction and matching method according to an embodiment of the present invention.
Fig. 3 is a detailed flowchart of the steps of a mixed source code feature extraction and matching method according to an embodiment of the present invention.
Detailed Description
Embodiments of the mixed-source code feature extraction and matching method of the present invention will be described in detail below with reference to the accompanying drawings.
Code obfuscation is a program transformation technique for protecting software intellectual property that can, to some extent, prevent software piracy, tampering and reverse engineering. It is a double-edged sword: once self-developed code is mixed with open source and closed source (proprietary) code to form mixed source code, code composition analysis becomes harder, and the intellectual property rights of both the open source and the closed source code become difficult to identify and protect. Composition analysis of binary code is especially difficult, mainly because binary code consists only of 0s and 1s, so few feature dimensions are available and recognition and matching approaches are very limited; when binary code is obfuscated, composition analysis becomes much harder still.
Therefore, to identify and protect software intellectual property and reduce software security risks, the invention provides a mixed source code feature extraction and matching method that effectively solves these technical problems.
Fig. 1 is a flowchart illustrating steps of a mixed-source code feature extraction and matching method according to an embodiment of the present invention, where the method includes:
step 1) constructing a knowledge base: crawling each open source project through a web crawler, and constructing and maintaining a knowledge base by using the file-level characteristics and the function-level characteristics of each source code file of each open source project;
the file-level features of the source code file comprise the file size, the file hash value and the number of effective code lines, and the function-level features of the source code file comprise the function size, the function hash value and the number of function code lines;
step 2) two-level feature extraction from the source code files: for the current mixed source code project, identifying each source code file in the current mixed source code project directory, and performing two-level feature extraction on each source code file to obtain its file-level features and its function-level features;
the file-level features of the source code file comprise the file size, the file hash value and the number of effective code lines, the function-level features of the source code file comprise the function size, the function hash value and the number of function code lines, and the source code files contain mixed source code;
step 3) feature matching and judgment: the following actions are performed for each source code file: performing a file-level matching query in the knowledge base using the file-level features of the source code file; when a source code file of an open source project is matched, determining that the source code file is open source code; when no source code file of an open source project is matched, performing a function-level matching query in the knowledge base using the function-level features of the source code file; when the source code file contains a function that matches an open source function, determining that this function is an open source function; and when the source code file contains no function that matches an open source function, determining that the source code file is closed source code.
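The file-level features named in steps 1 and 2 (file size, file hash value, number of effective code lines) can be computed with a short routine. The sketch below is a minimal illustration, not the patented implementation: it assumes SHA-256 as the hash algorithm, since the text does not name one, and assumes an "effective code line" is a line that is neither blank nor a single-line comment. `FileFeatures`, `count_effective_lines` and `extract_file_features` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FileFeatures:
    size: int             # file size in bytes
    sha256: str           # hex digest of the raw file bytes (hash algorithm assumed)
    effective_lines: int  # lines that are neither blank nor single-line comments


def count_effective_lines(text: str, comment_prefix: str = "//") -> int:
    """Count non-blank lines that do not start with the single-line comment prefix.

    Assumed definition of "effective code line"; block comments are not handled.
    """
    return sum(
        1
        for line in text.splitlines()
        if line.strip() and not line.strip().startswith(comment_prefix)
    )


def extract_file_features(data: bytes, comment_prefix: str = "//") -> FileFeatures:
    """Compute the three file-level features for one source file's raw bytes."""
    text = data.decode("utf-8", errors="replace")
    return FileFeatures(
        size=len(data),
        sha256=hashlib.sha256(data).hexdigest(),
        effective_lines=count_effective_lines(text, comment_prefix),
    )


src = b"int add(int a, int b) {\n    // sum\n    return a + b;\n}\n\n"
features = extract_file_features(src)
print(features.effective_lines)  # 3
```

Because the hash covers the raw bytes, a whitespace-only edit changes the hash and defeats an exact file-level match; normalizing the text before hashing would be one possible refinement, though the text does not describe one.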
Next, the detailed flow of the mixed-source code feature extraction and matching method of the present invention will be further described.
In the mixed source code feature extraction and matching method:
in step 3), when the source code file contains functions that match open source functions, the open source rate of the source code file is obtained as (the sum of the code line counts of the functions in the source code file that match open source functions) / (the number of effective code lines of the source code file) × 100%.
In the mixed source code feature extraction and matching method:
in step 1), the project information of each open source project is read into the knowledge base, the project information comprising the project name, open source license, project source and project version.
In the mixed source code feature extraction and matching method:
in step 2), identifying each source code file in the current mixed source code project directory comprises identifying each source code file in that directory according to its file type.
Fig. 2 is a block diagram illustrating a knowledge base used in a mixed source code feature extraction and matching method according to an embodiment of the present invention.
Fig. 3 is a detailed flowchart of the steps of a mixed source code feature extraction and matching method according to an embodiment of the present invention.
As shown in Fig. 3, for a mixed source code project, all files in the project directory are traversed recursively, source code files are identified by file type, and multi-level feature extraction is performed on them to extract file-level and function-level features. The file-level features are the file size, the file hash value and the number of effective code lines; the function-level features are the function size, the function hash value and the number of function code lines.
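Function-level features require locating function boundaries, which the text does not specify. As a sketch under that gap, the example below uses Python's standard `ast` module (Python 3.8+, for `end_lineno` and `ast.get_source_segment`) and handles Python source only; a real tool would presumably need a parser per supported language. `FunctionFeatures` and `extract_function_features` are hypothetical names, and SHA-256 is an assumed hash.

```python
from __future__ import annotations

import ast
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionFeatures:
    name: str
    size: int        # bytes of the function's source text
    sha256: str      # hex digest of that source text
    line_count: int  # number of lines the function spans


def extract_function_features(source: str) -> list[FunctionFeatures]:
    """Return function-level features for every function in a Python module.

    Python's own ast module stands in for a per-language parser here.
    """
    features = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        segment = ast.get_source_segment(source, node)
        if segment is None:  # position information unavailable
            continue
        encoded = segment.encode("utf-8")
        features.append(
            FunctionFeatures(
                name=node.name,
                size=len(encoded),
                sha256=hashlib.sha256(encoded).hexdigest(),
                line_count=node.end_lineno - node.lineno + 1,
            )
        )
    return features


module = "def add(a, b):\n    return a + b\n\n\ndef sub(a, b):\n    d = a - b\n    return d\n"
for f in extract_function_features(module):
    print(f.name, f.line_count)  # add 2, then sub 3
```

Hashing the exact function text means a function copied verbatim into a different file still matches at the function level even when the surrounding file does not, which is the situation the function-level query is meant to catch.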
First, a matching query is run against the knowledge base using the file-level features to determine whether an open source file is matched. If so, the source code file is open source. If not, the function-level features are used for function-level matching in the knowledge base: if none of the functions in the source code file matches an open source function, the file is determined to be closed source code; if functions in the file match open source functions, those functions are open source, and the open source rate of the file is calculated as (the sum of the code line counts of the open source functions) / (the number of effective code lines of the file) × 100%.
In summary, the invention extracts file-level and function-level features both from the source code files of the mixed source code project and from the open source code files, so as to identify whether each piece of code in the project is open source, and provides an accurate mechanism for calculating the open source rate of each file.
It is to be understood that although the present invention has been described in conjunction with its preferred embodiments, it is not intended to be limited to those embodiments. It will be apparent to those skilled in the art from this disclosure that many changes, modifications or equivalent substitutions can be made to the embodiments of the invention without departing from the scope of the invention. Therefore, any simple modification, equivalent change or variation made to the above embodiments according to the technical essence of the present invention, without departing from the content of the technical solution of the present invention, still falls within the scope of protection of the technical solution of the present invention.

Claims (4)

1. A method for extracting and matching mixed-source code features, the method comprising:
step 1) constructing a knowledge base: crawling each open source project through a web crawler, and constructing and maintaining a knowledge base by using the file-level characteristics and the function-level characteristics of each source code file of each open source project;
the file-level features of the source code file comprise the file size, the file hash value and the number of effective code lines, and the function-level features of the source code file comprise the function size, the function hash value and the number of function code lines;
step 2) two-level feature extraction from the source code files: for the current mixed source code project, identifying each source code file in the current mixed source code project directory, and performing two-level feature extraction on each source code file to obtain its file-level features and its function-level features;
the file-level features of the source code file comprise the file size, the file hash value and the number of effective code lines, the function-level features of the source code file comprise the function size, the function hash value and the number of function code lines, and the source code files contain mixed source code;
step 3) feature matching and judgment: the following actions are performed for each source code file: performing a file-level matching query in the knowledge base using the file-level features of the source code file; when a source code file of an open source project is matched, determining that the source code file is open source code; when no source code file of an open source project is matched, performing a function-level matching query in the knowledge base using the function-level features of the source code file; when the source code file contains a function that matches an open source function, determining that this function is an open source function; and when the source code file contains no function that matches an open source function, determining that the source code file is closed source code.
2. The mixed-source code feature extraction and matching method of claim 1, wherein:
in step 3), when a source code file contains functions that match open source functions, the open source rate of the file is obtained as (sum of the code lines of the functions that match open source functions / number of effective code lines of the file) × 100%.
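Step 3 of claim 1 and the open source rate of claim 2 can be sketched together. This sketch assumes that matching compares hash values only (the claims also list size and line count as features, which could serve as a cheap pre-filter) and represents the knowledge base as two sets of hashes. `match_file` and its verdict strings are hypothetical, and the hash strings in the usage lines are placeholders rather than real digests.

```python
from typing import Dict, List, Set, Tuple


def match_file(file_hash: str,
               functions: List[Tuple[str, int]],
               effective_lines: int,
               kb_file_hashes: Set[str],
               kb_function_hashes: Set[str]) -> Dict:
    """Step 3: file-level matching first, function-level matching as the fallback.

    `functions` holds one (function hash, function code lines) pair per function.
    """
    # File-level hit: the whole file is determined to be open source code.
    if file_hash in kb_file_hashes:
        return {"verdict": "open source file", "matched": [], "rate": None}
    # Function-level lookup for every function in the file.
    matched = [(h, n) for h, n in functions if h in kb_function_hashes]
    if not matched:
        # No hit at either level: the file is determined to be closed source code.
        return {"verdict": "closed source file", "matched": [], "rate": None}
    if effective_lines <= 0:
        raise ValueError("effective_lines must be positive")
    # Claim 2: sum of matched function code lines / effective code lines * 100%.
    rate = 100.0 * sum(n for _, n in matched) / effective_lines
    return {"verdict": "contains open source functions",
            "matched": [h for h, _ in matched], "rate": rate}


# Placeholder hashes standing in for real digests.
KB_FILES = {"file-aaa"}
KB_FUNCS = {"fn-1", "fn-2"}
result = match_file("file-zzz", [("fn-1", 12), ("fn-9", 50), ("fn-2", 30)], 200,
                    KB_FILES, KB_FUNCS)
print(result["matched"], result["rate"])  # ['fn-1', 'fn-2'] 21.0
```

The rate is left as `None` in the file-level and closed-source cases because claim 2 only defines it when at least one function matches.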
3. The mixed-source code feature extraction and matching method of claim 2, wherein:
in step 1), the project information of each open source project is also read into the knowledge base; the project information comprises the project name, the open source license, the project source and the project version.
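Storing the project information of claim 3 alongside the feature indexes lets a match be traced back to a named project, license, source and version. The sketch below is an assumed in-memory layout, not the method's actual storage; `ProjectInfo`, `KnowledgeBase`, `add_project` and `licenses_for_file`, as well as the example project, URL and hash strings, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ProjectInfo:
    """Project information listed in claim 3."""
    name: str
    license: str   # open source license of the project
    source: str    # where the project was crawled from
    version: str


@dataclass
class KnowledgeBase:
    """Hypothetical in-memory knowledge base keyed by hash value."""
    projects: Dict[Tuple[str, str], ProjectInfo] = field(default_factory=dict)
    file_index: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)
    function_index: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)

    def add_project(self, info: ProjectInfo, file_hashes: List[str],
                    function_hashes: List[str]) -> None:
        """Store the project record and index every file and function hash it contains."""
        key = (info.name, info.version)
        self.projects[key] = info
        for h in file_hashes:
            self.file_index.setdefault(h, []).append(key)
        for h in function_hashes:
            self.function_index.setdefault(h, []).append(key)

    def licenses_for_file(self, file_hash: str) -> List[str]:
        """Licenses of every indexed project that contains a file with this hash."""
        return sorted({self.projects[k].license for k in self.file_index.get(file_hash, [])})


kb = KnowledgeBase()
kb.add_project(ProjectInfo("demo-lib", "MIT", "https://example.com/demo-lib", "1.0"),
               file_hashes=["h-file-1"], function_hashes=["h-fn-1", "h-fn-2"])
print(kb.licenses_for_file("h-file-1"))  # ['MIT']
```

Keying projects by (name, version) keeps several crawled versions of the same project apart, since one file hash may occur in many versions.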
4. The mixed-source code feature extraction and matching method of claim 3, wherein:
in step 2), identifying each source code file in the current mixed-source project directory comprises identifying the source code files in that directory according to file type.
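Identification by file type (claim 4) can be as simple as an extension lookup while walking the project directory. The extension table below is an assumption, since the claims do not list which file types count as source code, and `find_source_files` is a hypothetical helper.

```python
import tempfile
from pathlib import Path

# Hypothetical extension table; the claims only say "according to file type".
SOURCE_EXTENSIONS = {
    ".c": "C", ".h": "C", ".cc": "C++", ".cpp": "C++", ".hpp": "C++",
    ".java": "Java", ".go": "Go", ".py": "Python", ".js": "JavaScript",
}


def find_source_files(root: str) -> list:
    """Walk the project directory and keep files whose extension marks them as source code."""
    found = []
    for path in sorted(Path(root).rglob("*")):
        language = SOURCE_EXTENSIONS.get(path.suffix.lower())
        if language is not None and path.is_file():
            found.append((path.relative_to(root).as_posix(), language))
    return found


# Usage on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "src").mkdir()
    (Path(tmp) / "src" / "main.c").write_text("int main(void) { return 0; }\n")
    (Path(tmp) / "README.md").write_text("docs\n")
    result = find_source_files(tmp)

print(result)  # [('src/main.c', 'C')]
```

Extension lookup misses source files with no extension or an unusual one; inspecting file contents as well would be one way to cover those, though the claims do not describe it.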
CN201910580956.6A 2019-06-27 2019-06-27 Mixed source code feature extraction and matching method Pending CN111367566A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910580956.6A CN111367566A (en) 2019-06-27 2019-06-27 Mixed source code feature extraction and matching method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910580956.6A CN111367566A (en) 2019-06-27 2019-06-27 Mixed source code feature extraction and matching method

Publications (1)

Publication Number Publication Date
CN111367566A (en) 2020-07-03

Family

ID=71207890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910580956.6A Pending CN111367566A (en) 2019-06-27 2019-06-27 Mixed source code feature extraction and matching method

Country Status (1)

Country Link
CN (1) CN111367566A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111797401A (en) * 2020-07-08 2020-10-20 深信服科技股份有限公司 Attack detection parameter acquisition method, device, equipment and readable storage medium
CN111986044A (en) * 2020-08-15 2020-11-24 广州易行数字技术有限公司 Layout technology for automatically generating process flow based on pattern matching algorithm
CN113721978A (en) * 2021-11-02 2021-11-30 北京大学 Method and system for detecting open source component in mixed source software
CN113723100A (en) * 2021-09-09 2021-11-30 国网电子商务有限公司 Open source component identification method and device based on fingerprint characteristics
CN114385231A (en) * 2021-12-20 2022-04-22 杭州安恒信息安全技术有限公司 Data processing method, data processing device, storage medium and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110246968A1 (en) * 2010-04-01 2011-10-06 Microsoft Corporation Code-Clone Detection and Analysis
CN106776744A (en) * 2016-11-21 2017-05-31 中国软件与技术服务股份有限公司 A kind of software development methodology and system based on internet information
CN106990956A (en) * 2017-03-10 2017-07-28 苏州棱镜七彩信息科技有限公司 Code file clone's detection method based on suffix tree
CN107977575A (en) * 2017-12-20 2018-05-01 北京关键科技股份有限公司 A kind of code-group based on privately owned cloud platform is into analysis system and method
CN108229170A (en) * 2018-02-02 2018-06-29 中科软评科技(北京)有限公司 Utilize big data and the software analysis method and device of neural network
CN109062792A (en) * 2018-07-21 2018-12-21 东南大学 A kind of Open Source Code detection method based on String matching and characteristic matching
JP2019032688A (en) * 2017-08-08 2019-02-28 富士通株式会社 Source code analysis device, source code analysis method, and source code analysis program

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111797401A (en) * 2020-07-08 2020-10-20 深信服科技股份有限公司 Attack detection parameter acquisition method, device, equipment and readable storage medium
CN111797401B (en) * 2020-07-08 2023-12-29 深信服科技股份有限公司 Attack detection parameter acquisition method, device, equipment and readable storage medium
CN111986044A (en) * 2020-08-15 2020-11-24 广州易行数字技术有限公司 Layout technology for automatically generating process flow based on pattern matching algorithm
CN111986044B (en) * 2020-08-15 2024-06-07 广州易行数字技术有限公司 Layout method for automatically generating process flow based on pattern matching algorithm
CN113723100A (en) * 2021-09-09 2021-11-30 国网电子商务有限公司 Open source component identification method and device based on fingerprint characteristics
CN113723100B (en) * 2021-09-09 2023-10-13 国网数字科技控股有限公司 Open source component identification method and device based on fingerprint characteristics
CN113721978A (en) * 2021-11-02 2021-11-30 北京大学 Method and system for detecting open source component in mixed source software
CN114385231A (en) * 2021-12-20 2022-04-22 杭州安恒信息安全技术有限公司 Data processing method, data processing device, storage medium and electronic equipment
CN114385231B (en) * 2021-12-20 2024-05-28 杭州安恒信息安全技术有限公司 Data processing method and device, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN111367566A (en) Mixed source code feature extraction and matching method
Wermke et al. A large scale investigation of obfuscation use in google play
Yen et al. An Android mutation malware detection based on deep learning using visualization of importance from codes
US10558805B2 (en) Method for detecting malware within a linux platform
Chen et al. Detecting android malware using clone detection
Crussell et al. Andarwin: Scalable detection of android application clones based on semantics
AU2009287433B2 (en) System and method for detection of malware
Laskov et al. Static detection of malicious JavaScript-bearing PDF documents
CN111639337B (en) Unknown malicious code detection method and system for massive Windows software
Liu et al. A large-scale empirical study on vulnerability distribution within projects and the lessons learned
CN111291331B (en) Mixed source file license conflict detection method
Chen et al. Slam: A malware detection method based on sliding local attention mechanism
Zhao et al. A large-scale empirical analysis of the vulnerabilities introduced by third-party components in IoT firmware
Li et al. Large-scale third-party library detection in android markets
Belenko et al. Intrusion detection for Internet of Things applying metagenome fast analysis
Akram et al. VCIPR: vulnerable code is identifiable when a patch is released (hacker's perspective)
Shu et al. Android malware detection methods based on convolutional neural network: A survey
Rack et al. Jack-in-the-box: An Empirical Study of JavaScript Bundling on the Web and its Security Implications
Ladisa et al. On the feasibility of cross-language detection of malicious packages in npm and pypi
Hashemzade et al. Hybrid obfuscation using signals and encryption
CN116932381A (en) Automatic evaluation method for security risk of applet and related equipment
Zhao et al. One bad apple spoils the barrel: Understanding the security risks introduced by third-party components in iot firmware
Gonzalez et al. Measuring code reuse in Android apps
Zhang et al. Understanding and conquering the difficulties in identifying third-party libraries from millions of android apps
Gao et al. A Comprehensive Study of Learning-based Android Malware Detectors under Challenging Environments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 102209 southeast, 6th floor, block B, national power investment Central Research Institute, South District, future science city, Changping District, Beijing

Applicant after: BEIJING KEYWARE Co.,Ltd.

Address before: 102208 key technology on the fourth floor of the production building of the second Pinzi Bona group, Huilongguan, Changping District, Beijing

Applicant before: BEIJING KEYWARE Co.,Ltd.

WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200703