CN109063421B - Open source license compliance analysis and conflict detection method - Google Patents

Publication number
CN109063421B
Authority
CN
China
Prior art keywords
license
information
project
software
conflict
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810691548.3A
Other languages
Chinese (zh)
Other versions
CN109063421A (en
Inventor
李必信
宋震天
周颖
王璐璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN201810691548.3A
Publication of CN109063421A
Application granted
Publication of CN109063421B
Active legal status
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10: Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/105: Arrangements for software license management or administration, e.g. for managing licenses at corporate level

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Technology Law (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an open source license compliance analysis and conflict detection method, which covers four main aspects. License identification: there are two modes, a direct identification mode that uses a pattern-matching heuristic algorithm and an indirect identification mode that uses code matching. License information extraction and model construction: as a prerequisite for license compliance analysis and for conflict identification and location, feature information associated with the license agreements is collected from the project. Quantitative analysis of license compliance: taking the license agreement as the standard, the places in the mixed-source project that are consistent and inconsistent with the agreement are found. License conflict identification and location: using a license repository, license conflicts between the mixed-source project and the standard open source software in the open source library are identified from four aspects of the license (name, rights, conditions and limitations) and are located to a specific open source project and a specific position.

Description

Open source license compliance analysis and conflict detection method
Technical Field
The invention relates to a compliance analysis and conflict detection method for an open source license, and belongs to the field of open source license feature analysis.
Background
Open source software originated in the 1970s, when students at MIT in the United States often wrote software for free distribution. No one thought about rights in the software, let alone commercial activity; this was the infancy of the open source movement. In the mid-1980s, Richard Stallman initiated the GNU project and founded the Free Software Foundation, raising the curtain on open source software.
At present, open source software is used not only in a large number of open source communities but also widely as a part of commercial software. Open source is changing the way software is built, because it allows components of open source systems to be reused. Because such reuse of open source code may result in license conflicts, the legal department of an enterprise needs to monitor it. Compatibility detection for open source licenses ensures that the enterprise meets the license requirements when it reuses open source code, and it controls and reduces the legal risk borne by the enterprise. Because different licenses give the user of open source software different rights and obligations, an enterprise that develops on the basis of open source software needs to study the software's license to learn whether it restricts subsequent development and use.
In recent years, infringement problems caused by open source licenses have arisen repeatedly. For example, Microsoft had to release the source code of a Windows 7 download tool because the tool used code from ImageMaster, which is licensed under GPLv2. Therefore, for software developers and for the whole open source community, it has become important to detect whether the open source licenses in a software package are compatible. Although some semi-automatic open source tools abroad can already identify the open source license information of mixed-source software, these tools are not yet mature and cannot detect license compatibility conflicts, so a license compatibility detection tool needs to be developed.
Disclosure of Invention
With the development of open source software and open source communities, open source development is gradually becoming the first choice of enterprises. Although this development mode promotes the reuse and distribution of open source software, it also brings the challenge of license conflicts, which are particularly evident in large software systems that contain many licenses. To solve these problems, the invention provides an automatic open source license analysis system that helps enterprises manage their increasingly complex software and its open source licenses, verify license compliance, find compliance problems early in the software development lifecycle, and control and reduce the legal risk borne by the enterprise.
To achieve this purpose, the technical scheme of the invention is as follows: an open source license compliance analysis and conflict detection method comprising the following steps:
Step 1: construction of the license repository.
Ten mainstream open source license agreements are analyzed, covering both the terms of each license and the compatibility relationships between licenses. A unified description model of license agreements is constructed, and the license repository is built on that model. The ten common open source software licenses are collected, classified along four aspects (name, rights, conditions and limitations) and stored in the license repository.
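The repository described above can be pictured as a small data structure. The following is a minimal sketch, not the patented model: the class names `LicenseDescription` and `LicenseRepository`, the method names, and the simplified term sets are all assumptions made for illustration. It stores each license along the four aspects and keeps a directed compatibility relation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseDescription:
    """One repository entry, split along the four aspects named in the text."""
    name: str
    rights: frozenset
    conditions: frozenset
    limitations: frozenset


class LicenseRepository:
    """Stores license descriptions plus a directed compatibility relation."""

    def __init__(self):
        self._licenses = {}
        self._compatible = set()  # pairs (inbound, outbound)

    def add(self, lic):
        self._licenses[lic.name] = lic

    def get(self, name):
        return self._licenses[name]

    def mark_compatible(self, inbound, outbound):
        # Code under `inbound` may be combined into a work released under `outbound`.
        self._compatible.add((inbound, outbound))

    def is_compatible(self, inbound, outbound):
        return inbound == outbound or (inbound, outbound) in self._compatible


# Illustrative entries only; the term sets are simplified assumptions.
repo = LicenseRepository()
repo.add(LicenseDescription(
    name="MIT",
    rights=frozenset({"commercial use", "modification", "distribution"}),
    conditions=frozenset({"license and copyright notice"}),
    limitations=frozenset({"liability", "warranty"}),
))
repo.add(LicenseDescription(
    name="GPL-2.0",
    rights=frozenset({"commercial use", "modification", "distribution"}),
    conditions=frozenset({"license and copyright notice", "disclose source", "same license"}),
    limitations=frozenset({"liability", "warranty"}),
))
repo.mark_compatible("MIT", "GPL-2.0")
```

The relation is kept directed because compatibility is generally one-way: MIT-licensed code can be combined into a GPL-2.0 work, but not the reverse.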
Step 2: identification of the license.
License feature analysis is divided into two modes: a direct identification mode that uses a pattern-matching heuristic algorithm and an indirect identification mode that uses code matching.
2.1 Direct identification of the license type is based on the license information of the software packages in the mixed-source software; using the license repository, a pattern-matching heuristic algorithm identifies the license category.
2.2 Indirect identification of the license type is based on the source code of the software package; open source library matching is used to match the project's software packages against the standard software in the open source library, and the license category of each software package in the mixed-source software is inferred from the match.
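As an illustration of the direct identification mode in 2.1, the sketch below scores a license text against a few signature patterns per license and returns the best match. The pattern table `LICENSE_PATTERNS`, the function name `identify_license` and the scoring rule (fraction of patterns matched) are assumptions; the patent does not specify the heuristic.

```python
import re

# Hypothetical signature patterns; a real repository would hold many more per license.
LICENSE_PATTERNS = {
    "Apache-2.0": [r"Apache License,?\s+Version 2\.0",
                   r"www\.apache\.org/licenses/LICENSE-2\.0"],
    "GPL-2.0": [r"GNU General Public License", r"version 2"],
    "MIT": [r"Permission is hereby granted, free of charge", r"\bMIT License\b"],
}


def identify_license(text):
    """Return (license_name, score) for the best match, or (None, 0.0)."""
    normalized = " ".join(text.split())  # collapse line breaks and repeated spaces
    best_name, best_score = None, 0.0
    for name, patterns in LICENSE_PATTERNS.items():
        hits = sum(1 for p in patterns if re.search(p, normalized, re.IGNORECASE))
        score = hits / len(patterns)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


header = """Licensed under the Apache License, Version 2.0 (the "License");
you may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0"""
result = identify_license(header)
```

Scoring by the fraction of matched patterns lets a partial hit (here the GPL pattern "version 2" also matches the Apache header) lose to a license whose patterns all match.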
Step 3: license information extraction and model construction.
Based on the license feature tree technique, license-associated feature information is collected and processed bottom-up across four levels (file, directory, software package and project) to form a project license feature tree, and the difference information collected at the corresponding tree nodes is stored in a project license difference library. The information in the project license feature tree and the project license difference library is derived mainly from the license declaration information in the mixed-source project and from the modification information of the source files. License information extraction and model construction is the process of project license feature analysis and is divided into four steps: feature information extraction, feature information merging, software package tree generation and project-level feature extraction.
3.1 Feature information extraction: starting at the file level, a license feature extractor extracts license feature information from license files and from the comments of ordinary files, and the types, number and distribution of licenses are collected and recorded. File-level license difference information is mainly the information at the file level that does not comply with the open source software specification.
3.2 Feature information merging: directory-level license feature information is extracted in the same way as file-level information, except that extraction takes place at the higher directory level. Likewise, directory-level license difference information is extracted in the same way as file-level difference information: information that does not conform to the open source software specification is extracted at the directory level.
3.3 Software package tree generation: software-package-level license information collects and summarizes the directory-level license feature information. The software package level has a clearer hierarchical structure than the directory level, and its license feature information serves as the information source for the project-level license feature information. The difference information of a software package is likewise a summary of the directory-level difference information.
3.4 Project-level feature extraction: the license feature information of the files, directories and software packages of the mixed-source project is analyzed; the analyzed information includes the type, number and distribution of licenses, and the project license feature tree is constructed on this basis. Using license identification combined with open source library search, the license category and the open source software name corresponding to each software package in the project are identified and written into the project software package tree. License declaration information and code modification information are compared between the software package directory and the standard open source software directory, and the differences are stored in the project license difference library. The license feature information in the mixed-source software project is then retrieved and subjected to feature association analysis, and the result serves as the basis for license compliance analysis and conflict analysis.
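The bottom-up collection in steps 3.1 to 3.4 can be sketched as a tree whose nodes merge the license counts of their children. This is an illustrative assumption about the data structure: `FeatureNode`, its fields and `aggregate` are hypothetical names, and only the license type and number are modeled, not the difference library.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FeatureNode:
    path: str
    level: str                                       # "file", "directory", "package" or "project"
    own: Counter = field(default_factory=Counter)    # licenses declared at this node
    total: Counter = field(default_factory=Counter)  # filled in by aggregate()
    children: list = field(default_factory=list)


def aggregate(node):
    """Merge license counts from the leaves upward and store them on each node."""
    total = Counter(node.own)
    for child in node.children:
        total.update(aggregate(child))
    node.total = total
    return total


# A tiny hypothetical project with two packages.
a = FeatureNode("libfoo/src/a.c", "file", own=Counter({"MIT": 1}))
b = FeatureNode("libfoo/src/b.c", "file", own=Counter({"MIT": 1}))
src = FeatureNode("libfoo/src", "directory", children=[a, b])
libfoo = FeatureNode("libfoo", "package", children=[src])
copying = FeatureNode("libbar/COPYING", "file", own=Counter({"GPL-2.0": 1}))
libbar = FeatureNode("libbar", "package", children=[copying])
project = FeatureNode(".", "project", children=[libfoo, libbar])
aggregate(project)
```

After the call, every node carries the type and number of licenses found beneath it, so the distribution can be read at any of the four levels.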
Step 4: license compliance analysis.
The subject of quantitative compliance analysis is the mixed-source project itself and the license agreements it uses. The idea is to take the license agreement as the standard and find the places in the mixed-source project that are consistent and inconsistent with it. The analysis mainly covers the following aspects: quantification of the license agreement standard, collection and comparison of license-associated feature information, and the license compliance analysis process.
4.1 Quantification of the license agreement standard: through analysis of the terms of the ten licenses, the terms of a license are divided into four dimensions: name, rights, limitations and conditions. Each dimension has different entries, and different categories of license agreement impose different constraints. The analysis shows that the "license and version prompt" entry in the "conditions" dimension is the key to identifying the kind of license agreement. The quantified result of the license agreement standard is stored in the license repository according to the unified description model of license agreements.
4.2 Collection and comparison of license-associated feature information: the feature information associated with the license agreements is integrated on the basis of the license feature tree and the difference library model. The license feature information of the whole mixed-source project is stored in full and can be expanded at the file, directory, software package and project levels as required.
4.3 License compliance analysis process: license compliance analysis examines license compliance terms and feature information on the basis of an OMM model evaluation tree. Specifically, the information in the project license feature tree and the license difference library model is first extracted; thresholds are then associated and processing strategies obtained from the model evaluation tree; the compliance analysis engine executes the processing strategies; and an evaluation result for the project's set of licenses is generated according to the set thresholds.
Step 5: license conflict identification and location.
A license conflict is an incompatibility either between licenses or between the licenses of the mixed-source project and those of the standard open source software in the open source library. Conflicts of the second kind are found by analyzing, on the basis of the open source library matching result, how the license files and copyright files conflict. License conflict identification and location also relies on the project license feature tree and the license difference library model. A conflict is located by marking the position of the feature on the license feature tree and following the link to the project code file or directory. By extracting the history of the mixed-source project's version repository, the license conflicts over the project development lifecycle are measured and a conflict trend graph is drawn, so that the risk of license conflicts can be controlled.
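Conflict identification between licenses, together with location to a directory, can be sketched as a lookup against a compatibility relation. The relation `COMPATIBLE` below lists a few well-known one-way pairs as an assumption standing in for the license repository; `find_conflicts` and the package paths are hypothetical.

```python
# Assumed one-way compatibility pairs: (license of reused code, license of the project).
COMPATIBLE = {
    ("MIT", "GPL-2.0"),
    ("BSD-3-Clause", "GPL-2.0"),
    ("MIT", "Apache-2.0"),
}


def find_conflicts(project_license, packages):
    """packages: list of (path, license). Return the entries that conflict, with their location."""
    conflicts = []
    for path, lic in packages:
        if lic != project_license and (lic, project_license) not in COMPATIBLE:
            conflicts.append({"path": path, "license": lic, "against": project_license})
    return conflicts


packages = [
    ("third_party/small_lib", "MIT"),
    ("third_party/gpl_lib", "GPL-2.0"),
]
conflicts = find_conflicts("Apache-2.0", packages)
```

Each reported entry carries the path of the offending package, which corresponds to the position that would be marked on the feature tree.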
Compared with the prior art, the invention has the following beneficial effects:
1) The technical scheme provides a unified description model of open source code that supports multiple types, languages and structures, and a unified description model that supports multiple agreements. The models describe the detailed information of open source code and open source agreements in a unified way, which makes it easier to extend the open source code and agreements handled by the system and allows feature extraction methods, feature measurement methods and the like to process code and agreements uniformly, improving flexibility and efficiency. Current research on code composition analysis focuses mainly on feature extraction and comparison algorithms for code and on compliance analysis of open source agreements; research on unified description from the multi-type, multi-language, multi-structure, multi-agreement perspective has not been reported.
2) Compared with similar foreign tools, the scheme uses the license feature information in the project license feature tree and the license difference library to check license terms more comprehensively: it checks the license text and copyright statement information, checks code changes made to the open source code, and, by taking project information as input, covers compliance checking of how the open source software is used. Combined with the project license feature tree, it analyzes the type, severity level and location of each compliance problem and gives modification suggestions.
3) The technical scheme provides conflict analysis based on the project license feature tree and the license difference library. Compared with similar foreign tools, it adds conflict identification for code change scenarios and locates each conflict to its position; combined with the project license feature tree, it analyzes the conflict category, scope of influence, severity level and specific location, and gives modification suggestions, which facilitates further analysis of the conflict.
Drawings
FIG. 1 is a general framework diagram of an open source license automatic analysis system;
FIG. 2 is a license characteristics analysis diagram;
FIG. 3 is a flow diagram of license compliance analysis;
FIG. 4 is a flowchart of license conflict identification location;
FIG. 5 is a license compatibility relationship diagram;
FIG. 6 is a flow diagram of conflict detection between a mixed-source project and open-source library software;
FIG. 7 is a license conflict trend analysis flow diagram;
FIG. 8 is a diagram of a license analysis model;
FIG. 9 is a diagram of a license compliance analysis engine;
FIG. 10 is a flowchart of conflict identification between licenses.
Detailed Description
To make the technical means, inventive features, objectives and effects of the invention easy to understand, the invention is further described below with reference to specific embodiments.
Referring to fig. 1 to 10, the method of the present invention includes the following steps:
Step 1: construction of the license repository.
Ten mainstream open source license agreements are analyzed, including the terms of each license and the compatibility relationships between licenses, and the license repository is built on a unified description model of license agreements. The ten common open source software licenses are collected, classified along four aspects (name, rights, conditions and limitations) and stored in the license repository.
Step 2: identification of the license.
License feature analysis is divided into two modes: a direct identification mode that uses a pattern-matching heuristic algorithm and an indirect identification mode that uses code matching.
2.1 Direct identification of the license type is based on the license information of the software packages in the mixed-source software; using the license repository, a pattern-matching heuristic algorithm identifies the license category.
2.2 Indirect identification of the license type is based on the source code of the software package; open source library matching is used to match the software package against the standard open source software in the open source library, and the license category and open source software name of the software package in the mixed-source software are inferred from the match.
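The indirect mode in 2.2 can be illustrated with a deliberately simple stand-in for code matching: hashing whole files and measuring how many of a package's files also occur in a known open source release. The patent does not name a matching technique, so the whole-file SHA-256 comparison, the `min_overlap` parameter and the function names here are all assumptions.

```python
import hashlib


def fingerprint(files):
    """files: mapping of relative path -> bytes. Returns the set of content hashes."""
    return {hashlib.sha256(data).hexdigest() for data in files.values()}


def match_package(files, library_index, min_overlap=0.5):
    """library_index: mapping of (software_name, license) -> fingerprint set."""
    target = fingerprint(files)
    best, best_ratio = None, 0.0
    for key, known in library_index.items():
        ratio = len(target & known) / len(target) if target else 0.0
        if ratio > best_ratio:
            best, best_ratio = key, ratio
    if best_ratio >= min_overlap:
        return best, best_ratio
    return None, best_ratio


# Hypothetical upstream release and a locally modified copy of it.
upstream = {"a.c": b"int a;", "b.c": b"int b;"}
library_index = {("libfoo", "MIT"): fingerprint(upstream)}
local_copy = {"vendor/a.c": b"int a;", "vendor/b.c": b"int b; /* patched */"}
match = match_package(local_copy, library_index)
```

Here one of the two local files is unchanged, so the package is attributed to the hypothetical `libfoo` under MIT with an overlap of 0.5; the unmatched file is the kind of code modification that would go into the difference library.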
Step 3: license information extraction and model construction.
Using the license feature tree technique, license-associated feature information is collected and processed bottom-up across four levels (file, directory, software package and project) to form a project license feature tree, and the difference information collected at the corresponding tree nodes is stored in a project license difference library. The information in the project license feature tree and the project license difference library is derived mainly from the license declaration information and the source file modification information collected in the mixed-source project. License information extraction and model construction is the process of project license feature analysis and is divided into four steps: feature information extraction, feature information merging, software package tree generation and project-level feature extraction.
3.1 Feature information extraction: starting at the file level, a license feature extractor extracts license feature information from license files and from the comments of ordinary files, and the types, number and distribution of licenses are recorded. File-level license difference information is mainly the information extracted at the file level that does not conform to the open source software specification.
3.2 Feature information merging: directory-level license feature information is extracted in the same way as file-level information, except that extraction takes place at the higher directory level. Likewise, directory-level license difference information is extracted in the same way as file-level difference information: information that does not conform to the open source software specification is extracted at the directory level.
3.3 Software package tree generation: software-package-level license information collects and summarizes the directory-level license feature information. The software package level has a clearer hierarchical structure than the directory level, and its license feature information serves as the information source for the project-level license feature information. The difference information of a software package is likewise a summary of the directory-level difference information.
3.4 Project-level feature extraction: the license feature information of the files, directories and software packages of the mixed-source project is analyzed; the analyzed information includes the type, number and distribution of licenses, and the project license feature tree is constructed on this basis. Using license identification combined with open source library search, the license category and the open source software name corresponding to each software package in the project are identified and written into the project software package tree. License declaration information and code modification information are compared between the software package directory and the standard open source software directory, and the differences are stored in the project license difference library. The license feature information in the mixed-source software project is then retrieved and its feature associations analyzed, and the result serves as the basis for license compliance analysis and conflict detection.
Step 4: license compliance analysis.
The subject of quantitative compliance analysis is the mixed-source project itself and the license agreements it uses. The idea is to take the license agreement as the standard and find the places in the mixed-source project that are consistent and inconsistent with it. The analysis mainly covers the following aspects: quantification of the license agreement standard, collection and comparison of license-associated feature information, and the license compliance analysis process.
4.1 Quantification of the license agreement standard: through analysis of the terms of the ten licenses, the terms of a license are divided into four dimensions: name, rights, limitations and conditions. Each dimension has different entries, and different categories of license agreement impose different constraints. The analysis shows that the "license and version prompt" entry in the "conditions" dimension is the key to identifying the kind of license agreement. The quantified result of the license agreement standard is stored in the license repository according to the unified description model of license agreements.
4.2 Collection and comparison of license-associated feature information: the feature information associated with the license agreements is integrated on the basis of the license feature tree and the difference library model. The license feature information of the whole mixed-source project is stored in full and can be expanded at the file, directory, software package and project levels as required.
4.3 License compliance analysis process: license compliance analysis examines license compliance terms and feature information on the basis of an OMM model evaluation tree. Specifically, the information in the project license feature tree and the license difference library model is extracted; thresholds and executable processing strategies are associated on the basis of the model evaluation tree; the compliance analysis engine executes the processing strategies; and an evaluation result for the project's set of licenses is generated according to the set thresholds.
Step 5: license conflict identification and location.
A license conflict is an incompatibility either between licenses or between the licenses of the mixed-source project and those of the standard open source software in the open source library. Conflict detection of the second kind analyzes, on the basis of the open source library matching result, the number of license files and copyright files and the conflicts in their content. License conflict identification and location also relies on the project license feature tree and the license difference library model. A conflict is located by marking its position on the license feature tree and following the link to the project code file or directory. By extracting the history of the mixed-source project's version repository, the license conflicts over the project development lifecycle are measured and a conflict trend graph is drawn, which helps to manage and control the risk of license conflicts.
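The conflict trend mentioned at the end of this step amounts to counting conflicts per revision of the version repository. The sketch below assumes the history has already been extracted into a chronological list; `conflict_trend`, the revision labels and the `is_conflict` predicate are hypothetical, and the plotting itself is left out.

```python
def conflict_trend(history, is_conflict):
    """history: chronological list of (revision, [(path, license), ...]).

    Returns a list of (revision, conflict_count) pairs suitable for plotting.
    """
    return [
        (revision, sum(1 for _path, lic in packages if is_conflict(lic)))
        for revision, packages in history
    ]


# Hypothetical history in which GPL-2.0 packages conflict with the project license.
history = [
    ("r1", [("vendor/a", "MIT")]),
    ("r2", [("vendor/a", "MIT"), ("vendor/b", "GPL-2.0")]),
    ("r3", [("vendor/a", "MIT"), ("vendor/b", "GPL-2.0"), ("vendor/c", "GPL-2.0")]),
]
trend = conflict_trend(history, lambda lic: lic == "GPL-2.0")
```

A rising count across revisions, as in this made-up history, is the kind of signal the trend graph is meant to expose.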
As an improvement of the invention, license information extraction and model construction is the prerequisite for license compliance analysis and for conflict identification and location; the problem it solves is how to collect, within the project, the feature information associated with the license agreements. From the perspective of the license agreement standard, each license entry has different characteristics. For example, the "commercial" entry is information provided by the project's management and decision layer and does not require analysis of the mixed-source project. The "license and version prompt" and "modified code license invariant" entries can be quantified, but the way they are quantified must take into account both the characteristics of the mixed-source project and its feature information; this part is a key link in compliance analysis. In a mixed-source project, which license agreements are used depends on the open source software the project uses. The distribution of that open source software is uncertain, and so is the way it is used (adding, modifying or deleting code, and so on), which makes the license feature information corresponding to the open source software in the mixed-source project uncertain as well. Therefore, the key to collecting the feature information associated with the license agreements in a mixed-source project is to make clear the scope of each license agreement within the project. This problem runs through the whole process of compliance analysis and conflict identification and location, and the license-associated feature information must be collected without crossing the boundary of a license agreement's scope. The scope of a license agreement is the specific directory in the mixed-source software that corresponds to that agreement, which is also the directory of the open source software. On this basis, different license agreements in the mixed-source software, and different locations of the same license agreement, are distinguished, so that the license-associated features in the mixed-source software can be effectively collected, merged and further analyzed.
As an improvement of the invention, in license information extraction and model construction, project license feature information is first extracted at the file level. A license feature extractor extracts the license feature information and difference information in the mixed-source software, and a project license feature tree is finally generated. License feature information is mainly the information that indicates the license type in the mixed-source project and is divided into four levels: file, directory, software package and project. License difference information is mainly the information indicating that specific license terms in the project are inconsistent with the matched standard open source project, and is divided into the same four levels. This subsection mainly concerns file-level feature information and difference information.
As an improvement of the invention, in license information extraction and model construction, the file-level license feature information comprises five aspects: file comment content, file author name, file date, file email and file category. The file-level license difference information comprises three aspects: missing copyright notice, missing author information and missing date.
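A file-level extractor for the five feature aspects and three difference aspects listed above might look like the following sketch. The regular expressions, the `Author:` header convention, the two-way category rule and the function name `extract_file_features` are assumptions chosen for illustration; real headers vary far more.

```python
import re
from pathlib import PurePosixPath


def extract_file_features(path, header_comment):
    """Return (features, differences) for one file's leading comment."""
    author = re.search(r"Author:\s*([^<\n]+)", header_comment, re.IGNORECASE)
    date = re.search(r"\b(?:19|20)\d{2}\b", header_comment)  # a four-digit year
    email = re.search(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", header_comment)
    name = PurePosixPath(path).name.upper()
    features = {
        "comment": header_comment.strip(),
        "author": author.group(1).strip() if author else None,
        "date": date.group(0) if date else None,
        "email": email.group(0) if email else None,
        "category": "license" if name.startswith(("LICENSE", "COPYING")) else "source",
    }
    differences = []
    if not re.search(r"copyright", header_comment, re.IGNORECASE):
        differences.append("missing copyright notice")
    if features["author"] is None:
        differences.append("missing author information")
    if features["date"] is None:
        differences.append("missing date")
    return features, differences


header = "/* Copyright (c) 2017 Example Corp\n * Author: Jane Doe <jane@example.com>\n */"
features, differences = extract_file_features("src/util.c", header)
bare_features, bare_differences = extract_file_features("src/misc.c", "// helper functions")
```

The first file yields a complete feature record and no differences; the second has none of the expected header elements, so all three difference entries are reported.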
As an improvement of the invention, in license information extraction and model construction, the directory-level license feature information comprises four aspects: the copyright statement information shared by all files under the directory, whether a license file exists under the directory, the name of the directory's license file and the content of the directory's license file. The directory-level license difference information comprises two aspects: a list of files under the directory that lack a copyright comment, and the file comments under the directory.
As an improvement of the invention, in one aspect of the license information extraction and model construction, the software package level license feature information comprises five aspects: directory type, license content, copyright content, directory depth and number of files. The software package level license difference information comprises two aspects: nested directories that contain multiple licenses, and a list of licenses in the directory that lack copyright information.
As an improvement of the invention, in one aspect of the license information extraction and model construction, the project-level license feature information comprises four aspects: the open source software package list, the open source software package names, the license category corresponding to each open source software package, and the list of open source software package copyright holders. The project-level license difference information comprises one aspect: closed source software that has been matched to, and marked as, open source software.
As an improvement of the present invention, in one aspect of the license compliance analysis, the compliance analysis is implemented based on the OMM model. QualiPSO is a project, carried out in cooperation between China and Europe, for improving the quality of open source software, and it proposes the OMM (Open Source Maturity Model). The quality factors that the OMM identifies as having the greatest impact on open source software quality, and as being the most important and most widely accepted by users, are called trustworthy elements (TWEs). The OMM decomposes each TWE into one or more goals, and each goal is further refined into one or more specific practices; that is, practices are the actions taken to achieve a goal, and all of a goal's specific practices must be considered in order to achieve it. A threshold is set for each practice, and whether each practice passes is finally determined by comparing its score with that threshold; this gives the OMM its tree structure. The model treats open source software licenses as one of the important factors affecting the quality of open source software. For the license compliance analysis scenario, the license element (LCS) of the OMM is decomposed into three goals: rights, conditions and restrictions. Specific practices are subdivided under each goal. A model evaluation tree for license compliance is established according to the OMM tree structure: the root node of the tree is license compliance, and the nodes of the following layer are the specific practices that realize the goals, each associated with the key performance indicators (KPIs) corresponding to the practice and with the corresponding thresholds and processing strategies. The tree is constructed layer by layer to finally form the model evaluation tree of license compliance. Each KPI is associated with a threshold and an execution policy.
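The evaluation tree described above can be sketched as a small data structure. The following Python example is only an illustration under stated assumptions: the class names `Practice` and `Goal`, the normalisation of KPI values to the range 0 to 1, and the function `evaluate` are hypothetical and are not defined by the OMM or by the invention.

```python
from dataclasses import dataclass, field


@dataclass
class Practice:
    name: str
    kpi: float        # measured KPI value, assumed normalised to [0, 1]
    threshold: float  # minimum KPI value for the practice to pass
    policy: str       # processing strategy to run when the practice fails

    def passed(self) -> bool:
        return self.kpi >= self.threshold


@dataclass
class Goal:
    name: str  # e.g. "rights", "conditions" or "restrictions"
    practices: list = field(default_factory=list)

    def achieved(self) -> bool:
        # A goal is achieved only if every one of its practices passes.
        return all(p.passed() for p in self.practices)


def evaluate(goals):
    """Return (compliant, actions) for the license-compliance root node.

    actions lists (goal, practice, policy) for every failed practice.
    """
    actions = [
        (g.name, p.name, p.policy)
        for g in goals
        for p in g.practices
        if not p.passed()
    ]
    return all(g.achieved() for g in goals), actions
```

In this sketch the root node is compliant only when all three goals are achieved, and the returned actions tell a compliance analysis engine which processing strategies to execute.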
As an improvement of the present invention, in one aspect of the license compliance analysis, the compliance analysis based on license statement information searches the project license feature tree and the license difference library model for the feature information and difference information related to license statements. The list of compliance analysis items for license statement information mainly comprises: number of files, number of license copyrights, number of license files, number of files missing copyright, number of files with erroneous copyright, and number of license authors.
As an improvement of the present invention, in one aspect of the license compliance analysis, the compliance analysis based on project information (entered manually) proceeds as follows: the project information is first entered by the project management office (PMO) or the project manager and is then compared with the information in the license repository. The following table lists the collated compliance analysis items, which relate to the project positioning given by the project manager or PMO at the project establishment stage, including the project's purpose, scope of use and the like. The analysis items are compared one by one against the requirements of the identified license category. The compliance analysis items based on project information mainly comprise: commercial use, patent use, open source code, network distribution, disclaimer, trademark use and warranty.
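The item-by-item comparison described above might look like the following Python sketch. The item keys, the three rule values ("permitted", "required", "forbidden") and the function name `compare_project_info` are assumptions made for illustration; the license terms used with it should come from the license repository and are represented here only by a plain dictionary.

```python
# Analysis items named in the text, written as hypothetical dictionary keys.
ITEMS = [
    "commercial_use",
    "patent_use",
    "disclose_source",
    "network_distribution",
    "disclaimer",
    "trademark_use",
    "warranty",
]


def compare_project_info(project: dict, license_terms: dict) -> dict:
    """Compare manually entered project information with license terms.

    project maps an item to True if the project does (or intends) it.
    license_terms maps an item to "permitted", "required" or "forbidden";
    items absent from license_terms are treated as unspecified and skipped.
    """
    findings = {}
    for item in ITEMS:
        rule = license_terms.get(item, "unspecified")
        does = project.get(item, False)
        if rule == "forbidden" and does:
            findings[item] = "project does this but the license forbids it"
        elif rule == "required" and not does:
            findings[item] = "license requires this but the project does not do it"
    return findings
```

An empty result means that no analysis item contradicts the identified license category under these simplified rules.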
As an improvement of the invention, in one aspect of the license compliance analysis, the compliance analysis based on license modification information searches the project license feature tree and the license difference library model for the feature information and license difference information relating to modifications of the mixed-source project source code, and presents the analysis results based on the license feature tree structure. The compliance analysis items based on license modification information mainly comprise: modified files, modified content, number of modified files, content of newly added files, number of newly added files, content of deleted files, number of deleted files, and modification instruction files.
As an improvement of the present invention, in one aspect of the license conflict identification and location, a conflict between licenses is defined as follows: when two different pieces of open source software are merged into one larger module, the licenses of both must permit the merge, in which case the licenses are said to be compatible; conversely, if both licenses cannot be satisfied at the same time, they are incompatible. Two incompatible licenses are said to conflict if they appear in the same module of mixed-source code. For example, the licenses GPL v2 and MPL 2.0 conflict: because the GPL v2 license applies to the entire module, code added to a module becomes part of that module and must also be licensed under the GPL, but MPL 2.0 does not allow the license to be changed, so open source code under the MPL 2.0 license cannot be used in a GPL-licensed module. As this example shows, a conflict between licenses is in effect a conflict between the terms of the license agreements. The general idea of conflict analysis between licenses is therefore to use the license compatibility information in the license repository and to perform the license conflict comparison by traversing each edge in the project license feature tree. The analysis result is displayed based on the license feature tree structure.
As an improvement of the invention, in one aspect of the license conflict identification and location, the process of identifying conflicts between licenses comprises the following steps. First, a set of project nested licenses is generated by traversing the project license feature tree. The root node of the license feature tree is the name of the project, and the other nodes are license category names. Each edge of the license feature tree can be regarded as a nested license pair. Each edge is taken out and stored as an element of the nested license set, and the position of the license in the feature tree is recorded at the same time for conflict location. The number of edges in the license feature tree equals the number of elements in the nested license set. Second, a compatible license set is generated by traversing the license compatibility relationship graph. As shown in the diagram above, each arc of the graph is taken out during the traversal and stored as an element of the compatible license set; because the compatibility relationship is transitive, the tail of the first arc and the head of the second arc of any two arcs connected end to end are also compatible and are likewise stored in the compatible license set. Third, the license pairs in the nested license set are taken out one by one in a loop. Each license pair is looked up in the compatible license set; if the lookup succeeds, the two licenses in the pair are compatible in use; otherwise, the two licenses conflict, and the pair is recorded in a conflicting license set. Fourth, the license pairs in the conflicting license set are output, each pair is located in the license feature tree according to the position recorded with it, and the specific location of the conflicting licenses in the mixed-source project is found.
The analysis items for conflicts between licenses mainly comprise: compatible license pairs, the number of compatible licenses, conflicting license pairs, and the number of conflicting licenses.
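The four steps above can be sketched in Python as follows. This is a minimal illustration under several assumptions that the text does not specify: the feature tree is a nested dictionary with `license`, `path` and `children` keys; a nested pair is stored as (inner license, outer license, path); an arc (a, b) of the compatibility graph means that code under license a may be included in a module under license b; identical licenses are treated as compatible; and edges leaving the root are skipped because the root carries the project name rather than a license. License names in any example data are placeholders.

```python
def transitive_closure(arcs):
    """Extend the compatibility arcs with all pairs implied by transitivity."""
    closure = set(arcs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def nested_license_pairs(node, pairs=None):
    """Collect one (inner, outer, path) element per license-to-license edge."""
    pairs = [] if pairs is None else pairs
    for child in node.get("children", []):
        if node.get("license") is not None:  # skip edges leaving the root
            pairs.append((child["license"], node["license"], child["path"]))
        nested_license_pairs(child, pairs)
    return pairs


def find_conflicts(tree, arcs):
    """Classify every nested pair as compatible or conflicting."""
    compatible_set = transitive_closure(arcs)
    compatible, conflicts = [], []
    for inner, outer, path in nested_license_pairs(tree):
        if inner == outer or (inner, outer) in compatible_set:
            compatible.append((inner, outer, path))
        else:
            conflicts.append((inner, outer, path))
    return {
        "compatible_pairs": compatible,
        "compatible_count": len(compatible),
        "conflict_pairs": conflicts,
        "conflict_count": len(conflicts),
    }
```

The `path` stored with each pair plays the role of the position mark used for conflict location, and the returned dictionary corresponds to the four analysis items listed above.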
As an improvement of the present invention, in one aspect of the license conflict identification and location, the conflict between the mixed-source project and the open source library software is analyzed based on the open source library matching technology, and conflicts in license files and copyright files are identified by means of directory comparison and content matching. The identifiable conflict types mainly comprise: no license file and no copyright file exist; a license file and a copyright file exist but file content is missing; and a license file and a copyright file exist but file content has been modified.
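A simplified sketch of this directory comparison and content matching is given below. It assumes, purely for illustration, that the license and copyright files of the project package and of the matched standard open source package have already been read into dictionaries mapping file name to text; the function name `classify_conflicts`, the whitespace normalisation and the rule that an excerpt of the reference text counts as missing content are all assumptions.

```python
def _normalize(text: str) -> str:
    """Collapse whitespace so that pure formatting changes are ignored."""
    return " ".join(text.split())


def classify_conflicts(project_files: dict, reference_files: dict) -> dict:
    """Classify each reference license/copyright file against the project.

    Returns a mapping from file name to one of "file missing",
    "content missing" or "content modified"; matching files are omitted.
    """
    findings = {}
    for name, reference in reference_files.items():
        if name not in project_files:
            findings[name] = "file missing"
            continue
        actual = _normalize(project_files[name])
        expected = _normalize(reference)
        if actual == expected:
            continue
        if actual in expected:
            # Empty file, or only an excerpt of the reference text remains.
            findings[name] = "content missing"
        else:
            findings[name] = "content modified"
    return findings
```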
As an improvement of the present invention, in one aspect of the license conflict identification and location, the conflict trend analysis measures the license conflict situation in each time period of the project development lifecycle by extracting the history information of the mixed-source project version repository, and draws a conflict trend graph, thereby facilitating management and control of the license conflict risk. The processing flow of the conflict trend analysis comprises the following steps. First, the historical build versions in the mixed-source project version management repository are traversed in a loop; if the traversal is complete, the flow jumps to the fifth step, and otherwise it proceeds to the second step. Second, a historical build version package, i.e., the complete source code of the mixed-source project, is taken out of the version management repository. Third, project license analysis is performed on the historical build version to generate the project license feature tree and the license difference library model, and conflict analysis is carried out. Fourth, license conflict index values, such as the number of license conflicts, are collected from the license analysis result, and the flow returns to the first step. Fifth, the license conflict trend graph is drawn based on the creation times of the historical build versions.
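The data collection behind the trend graph can be sketched as below. The function name `conflict_trend`, the representation of a build as a (creation date, source) tuple and the `analyze` callback that returns a conflict count are hypothetical; the sketch stops at the data points and leaves the actual drawing to whatever plotting tool is used.

```python
from datetime import date


def conflict_trend(builds, analyze):
    """Collect one (creation date, conflict count) point per historical build.

    builds:  iterable of (created, source) tuples taken from the version
             repository, where created is a datetime.date.
    analyze: callable that performs license analysis on one build's source
             and returns its number of license conflicts.
    """
    points = []
    for created, source in builds:                  # steps 1 and 2: traverse builds
        points.append((created, analyze(source)))   # steps 3 and 4: analyze, collect
    # Step 5: order the points by creation time so they can be plotted.
    return sorted(points, key=lambda point: point[0])
```

The returned list can be passed directly to a line chart with the creation date on the horizontal axis and the number of conflicts on the vertical axis.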
It should be noted that the above-mentioned embodiments are only preferred embodiments of the present invention and are not intended to limit its scope; all equivalent substitutions or modifications made on the basis of the above technical solutions fall within the scope of the present invention.

Claims (2)

1. An open source license compliance analysis and conflict detection method, the method comprising the steps of:
firstly, constructing a license warehouse;
secondly, identifying the license;
thirdly, extracting license information and constructing a model;
fourthly, license compliance analysis is carried out;
fifthly, identifying and positioning license conflicts;
the first step is the construction of a license warehouse, specifically, ten mainstream open source license agreements are analyzed, the license warehouse is constructed on the basis of a license agreement unified description model, the compatibility relation between the terms of the license and the license is analyzed, ten open source software licenses are induced and classified on the basis of four aspects of name, right, condition and limitation, and are stored in the license warehouse;
and the second step, the identification of the license, including direct identification means and indirect identification means,
the direct identification mode is based on the software package license information of the mixed source software, and identifies the license type of the software package by using a heuristic algorithm of pattern matching by using a license warehouse; the indirect identification mode is based on the source code of the mixed source software, and utilizes the open source library matching technology to match the software package with the open source library standard software, so as to deduce the type of the software package license in the mixed source software;
thirdly, license information extraction and model construction, which are divided into four aspects of characteristic information extraction, characteristic information combination, software package tree generation and project level characteristic extraction,
3.1, extracting characteristic information: the extraction of the project license feature information is based on the file level; a license feature extractor is used to extract the license feature information from license files and ordinary file annotations, and the type, quantity and distribution of the licenses are recorded; the file-level license difference information is information, extracted at the file level, that does not conform to the open source software specification;
3.2 merging the characteristic information: extracting the directory-level license characteristic information is similar to extracting the file-level license characteristic information, but extracting information based on a higher directory level, extracting the directory-level license difference information is similar to extracting the file-level license difference information, and extracting information which does not accord with the open source software specification from the higher directory level;
3.3 generating a software package tree: the software package level license information is the collection and induction of the catalog level license characteristic information, the software package level is clearer relative to the catalog level hierarchical structure, the software package level license characteristic information is simultaneously used as an information source of the project level license characteristic information, and the difference information of the software package is based on the induction of the catalog level difference information;
3.4 item level feature extraction: based on the analysis of the feature information of the licenses of the three levels of the file, the directory and the software package of the mixed source project, the analyzed information comprises the type, the quantity and the distribution of the licenses, a project license feature tree is constructed on the basis, a license identification technology is utilized, the license category and the source software name corresponding to the software package in the project are identified by combining with an open source library search technology, and are updated to the project software package tree, meanwhile, license statement information and code modification information between the software package directory and a standard source software directory are compared, and difference information of the two is stored in a project license difference library for feature information retrieval and feature association analysis of the licenses in the mixed source software project, and the difference information is used as the basis for compliance analysis and conflict analysis of the licenses;
the fourth step, license compliance analysis, is as follows,
the target of the compliance quantitative analysis is the mixed source project and the license agreement used by the mixed source project, and the places in the mixed source project, which are consistent with and inconsistent with the license agreement, are found by taking the used license agreement as a standard, and the method comprises the following aspects: quantification of the license agreement standard, collection and comparison of license associated feature information, and analysis of license compliance are as follows:
4.1 quantification of license agreement standards: to quantify the license agreement standard, the terms of the ten licenses are analyzed and divided into four dimensions, namely name, right, limitation and condition; each dimension has different entries, and different license agreements impose different constraints; the license and version notice entries under the condition dimension are the key to identifying the type of the license agreement; the quantified result of the license agreement standard is stored in the license warehouse according to the unified description model of the agreement;
4.2 collecting and comparing license associated feature information: integrating the associated characteristic information of the license based on the license characteristic tree and the license difference library model, completely storing the license characteristic information of the whole mixed source project, and expanding from files, catalogs, software packages and project levels according to requirements;
4.3 analysis of license compliance: the license compliance analysis analyzes the license compliance terms and feature information based on the OMM (Open Source Maturity Model) evaluation tree, specifically as follows: information is first extracted from the project license feature tree and the license difference library model; a threshold is then set and a processing strategy obtained based on the model evaluation tree; the processing strategy is then executed by a compliance analysis engine; and an evaluation result corresponding to the project license set is generated based on the set threshold.
2. The open-source license compliance analysis and conflict detection method of claim 1, wherein in the fifth step, license conflict identification and localization are specifically as follows: license conflict refers to the situation of incompatibility among licenses, between licenses of a mixed source item and standard open source software of an open source library, conflict detection between the licenses of the mixed source item and the standard open source software of the open source library is to analyze the conflict situation of license files and copyright files based on the matching result of the open source library, identification and positioning of license conflict are realized according to a license feature tree and a license difference library model, the conflict positioning is realized by marking feature positions in the license feature tree and jumping to specific project codes according to links, and the situation of license conflict in a project development life cycle is measured and a conflict trend graph is drawn by extracting historical information of a mixed source item version library so as to manage and control the risk of license conflict.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810691548.3A CN109063421B (en) 2018-06-28 2018-06-28 Open source license compliance analysis and conflict detection method

Publications (2)

Publication Number Publication Date
CN109063421A CN109063421A (en) 2018-12-21
CN109063421B true CN109063421B (en) 2022-03-04

Family

ID=64817818

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810691548.3A Active CN109063421B (en) 2018-06-28 2018-06-28 Open source license compliance analysis and conflict detection method

Country Status (1)

Country Link
CN (1) CN109063421B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104969230A (en) * 2013-01-30 2015-10-07 Hewlett-Packard Development Company, L.P. Systems and methods for determining compatibility between software licenses
CN106934254A (en) * 2017-02-15 2017-07-07 China UnionPay Co., Ltd. Analysis method and device for open source licenses

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
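Research on an open source license detection system; Xu Hongbo et al.; Application Research of Computers (《计算机应用研究》); 2010-08-31; Vol. 27, No. 8; pp. 2972-2975, 2979 *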

Also Published As

Publication number Publication date
CN109063421A (en) 2018-12-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant