CN106934254B - Analysis method and device for open source license - Google Patents

Analysis method and device for open source license

Info

Publication number
CN106934254B
Authority
CN
China
Prior art keywords
open source
source license
conflict
license
detected
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710081702.0A
Other languages
Chinese (zh)
Other versions
CN106934254A (en)
Inventor
于镳
蒋丹妮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unionpay Co Ltd
Original Assignee
China Unionpay Co Ltd
Application filed by China Unionpay Co Ltd
Priority to CN201710081702.0A (CN106934254B)
Publication of CN106934254A
Priority to EP17896537.2A (EP3584728B1)
Priority to PCT/CN2017/111095 (WO2018149187A1)
Priority to US16/485,358 (US10942733B2)
Priority to TW107103031A (TWI662431B)
Application granted
Publication of CN106934254B
Legal status: Active (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/75 Structural analysis for program understanding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/105 Arrangements for software license management or administration, e.g. for managing licenses at corporate level
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03 Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033 Test or assess software

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Technology Law (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the invention relate to the technical field of computers, and in particular to an analysis method and device for an open source license. The method comprises: receiving a file to be tested and planning conditions; detecting an open source license related to the file to be tested; performing conflict matching between the detected open source license and the planning condition to determine a first conflict between them; and generating a first risk assessment report according to the first conflict. The embodiments of the invention are used to analyze and evaluate the risk of using open source licenses.

Description

Analysis method and device for open source license
Technical Field
The invention relates to the technical field of computers, and in particular to an open source license analysis method and device.
Background
An open source license is a license that is friendly to business applications. Open source means that the source code is open: the original source material produced during product development is made available, usually in the form of source software, and the copyright holder of the software reserves some rights under the terms of the license while allowing users to study, modify, and improve the software. Open source software is not entirely without limitation. The most fundamental limitation is that anyone who uses or modifies the software must acknowledge the copyright of the initiator and the contributions of all participants. Anyone has the right to freely copy, modify, and use the source code, and no restrictions may be placed on any person, group, or field of use; for example, commercial use of the open source software is not restricted. A license is the legal document that guarantees these terms.
The open source license specifies terms regarding modifying, copying, and redistributing the source code. The industry has many open source licenses of many kinds, and the scope of the rights each one grants to a licensee differs. Because the same software often involves multiple open source licenses, which may conflict with each other or with the user's intended goals, using open source software, or carrying out secondary development based on it, in a commercial environment raises many potential legal issues and risks.
An open source license detection tool automatically locates and identifies specific open source licenses by scanning software source code and similar material. Existing license detection and analysis tools can only perform simple detection, marking, and statistics; they cannot support further risk assessment and analysis, and their analysis of license content and risk needs to be strengthened.
Disclosure of Invention
The application provides an evaluation method and device for an open source license, which are used to analyze and evaluate the risk of using an open source license.
The embodiment of the invention provides an evaluation method for an open source license, comprising the following steps:
receiving a file to be tested and planning conditions;
detecting an open source license related to the file to be tested;
performing conflict matching between the detected open source license and the planning condition, and determining a first conflict between the detected open source license and the planning condition;
and generating a first risk assessment report according to the first conflict.
Optionally, detecting the open source license related to the file to be tested includes:
the file to be tested comprises a plurality of detection texts, and for each detection text, a vocabulary of the detection text is determined by using a k-shift algorithm;
counting the word frequency of each word of the vocabulary in the detection text, and determining a first feature matrix of the detection text;
for each open source license stored in a database, determining the word frequency of each word of the vocabulary in the open source license so as to determine a second feature matrix of the open source license;
calculating the text similarity between the detection text and the open source license according to the first feature matrix and the second feature matrix;
and taking the open source license with the highest text similarity as the open source license related to the detection text.
Optionally, after detecting the open source license related to the file to be tested, the method further includes:
performing conflict matching among the plurality of detected open source licenses, and determining a second conflict among them;
and generating a second risk assessment report according to the second conflict.
Optionally, after determining the first conflict between the detected open source license and the planning condition, the method further includes:
determining a risk level corresponding to the first conflict;
after determining the second conflict among the plurality of detected open source licenses, the method further includes:
and determining a risk level corresponding to the second conflict.
Optionally, the method further includes:
receiving an identifier and/or a fragment of an open source license;
determining the corresponding open source license from a database according to the identifier and/or the fragment;
and generating a license list according to the corresponding open source license.
An apparatus for evaluating an open source license comprises:
a receiving unit, configured to receive a file to be tested and a planning condition;
a detection unit, configured to detect an open source license related to the file to be tested;
a matching unit, configured to perform conflict matching between the detected open source license and the planning condition, and determine a first conflict between the detected open source license and the planning condition;
and a reporting unit, configured to generate a first risk assessment report according to the first conflict.
Optionally, the detection unit is specifically configured to:
for a file to be tested that comprises a plurality of detection texts, determine, for each detection text, a vocabulary of the detection text by using a k-shift algorithm;
count the word frequency of each word of the vocabulary in the detection text, and determine a first feature matrix of the detection text;
for each open source license stored in a database, determine the word frequency of each word of the vocabulary in the open source license so as to determine a second feature matrix of the open source license;
calculate the text similarity between the detection text and the open source license according to the first feature matrix and the second feature matrix;
and take the open source license with the highest text similarity as the open source license related to the detection text.
Optionally, the matching unit is further configured to perform conflict matching among the plurality of detected open source licenses, and determine a second conflict among them;
the reporting unit is further configured to generate a second risk assessment report according to the second conflict.
Optionally, the matching unit is further configured to:
determine a risk level corresponding to the first conflict;
and determine a risk level corresponding to the second conflict.
Optionally, the receiving unit is further configured to receive an identifier and/or a fragment of an open source license;
the matching unit is further configured to determine the corresponding open source license from a database according to the identifier and/or the fragment;
the reporting unit is further configured to generate a license list according to the corresponding open source license.
In the embodiment of the invention, the server receives the file to be tested uploaded by the user and detects the open source license related to the file. The server also receives the planning condition input by the user, where the planning condition describes the future plans for the software project. The server then performs conflict matching between the detected open source license and the planning condition, that is, it determines where the content of the related open source license conflicts with the planning condition of the software, and finally generates a first risk assessment report according to the first conflict and returns it to the user. The embodiment of the invention thus automatically identifies the open source licenses contained in the software, determines the conflicts between those licenses and the planning condition, and generates a risk assessment report based on the conflicts, which helps users of open source software track and develop the software and make reasonable decisions.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person skilled in the art may derive other drawings from them without inventive effort.
FIG. 1 is a diagram of a system architecture to which an embodiment of the present invention is applicable;
FIG. 2 is a flowchart of a method for evaluating an open source license according to an embodiment of the present invention;
FIG. 3 is a flowchart of evaluating an open source license in a specific embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an apparatus for evaluating an open source license according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
As shown in fig. 1, a system architecture to which the embodiment of the present invention is applicable includes a web service module 101, a processing engine module 102, a database module 103, and an update maintenance module 104. The web service module 101, the processing engine module 102, the database module 103, and the update maintenance module 104 may be integrated in one server, or may be modules in different servers, where the servers may be network devices such as computers. Preferably, the web service module 101, the processing engine module 102, the database module 103 and the update maintenance module 104 may use cloud computing technology for information processing.
The web service module 101 provides the user with entry points to functions such as querying open source license information, detecting open source licenses, and assessing risk; that is, the user submits the file to be tested, the planning conditions, and other inputs to the server through the web service module 101. The web service module 101 also presents query and analysis results to the user as lists, graphs, text, and so on.
The processing engine module 102 retrieves matching information from the database module 103 according to the user's input and returns the result to the web service module 101, which presents it to the user in forms such as a search result list and detail pages. The module supports fuzzy keyword queries, that is, relevant data can be found from character fragments.
The processing engine module 102 also detects the open source licenses related to the file to be tested by analyzing the received software source code, generates an open source license usage report covering copyrighted files, non-copyrighted files, and so on, and returns the report to the user as a PDF file, chart, or similar format. After the open source licenses are detected, the module uses the software project planning conditions set or entered by the user, together with conflict rules preset by experts, to detect conflicting open source licenses within the file to be tested as well as conflicts between the open source licenses and the planning conditions; it then analyzes the resulting legal risks, generates a risk assessment report, and returns it to the user as a PDF file or chart.
The database module 103 can be divided into an open source license information base and a conflict rule base. The open source license information base stores information such as the license terms, applicable scenarios, conditions of use, and limitations of the various open source licenses on the market. On the one hand, the conflict rule base stores conflict rule expressions between different open source licenses, from which known conflicts between two open source licenses can be determined; on the other hand, it also stores, for each open source license, expressions describing the business scenarios to which that license is not applicable, from which potential conflicts between project planning options and the open source license can be determined.
The update maintenance module 104 allows a user or administrator to update and maintain the contents of the database module 103. Open source license information and conflict rules can be imported in batches from files, or the required information can be added manually. Deleting and modifying data are also supported.
Fig. 2 exemplarily shows a flowchart of an evaluation method for an open source license according to an embodiment of the present invention, and as shown in fig. 2, the evaluation method for an open source license according to an embodiment of the present invention includes the following steps:
step 201, receiving a file to be tested and planning conditions;
step 202, detecting an open source license related to the file to be tested;
step 203, performing conflict matching on the detected open source license and the planning condition, and determining a first conflict between the detected open source license and the planning condition;
and step 204, generating a first risk assessment report according to the first conflict.
In the embodiment of the invention, the server receives the file to be tested uploaded by the user and detects the open source license related to the file. The server also receives the planning condition input by the user, where the planning condition describes the future plans for the software project. The server then performs conflict matching between the detected open source license and the planning condition, that is, it determines where the content of the related open source license conflicts with the planning condition of the software, and finally generates a first risk assessment report according to the first conflict and returns it to the user. The embodiment of the invention thus automatically identifies the open source licenses contained in the software, determines the conflicts between those licenses and the planning condition, and generates a risk assessment report based on the conflicts, which helps users of open source software track and develop the software and make reasonable decisions.
For detecting open source licenses, the prior art generally uses keyword matching, which has low identification precision, so licenses can be missed or misidentified. In the embodiment of the present invention, detecting the open source license related to the file to be tested includes:
the file to be tested comprises a plurality of detection texts, and for each detection text, a vocabulary of the detection text is determined by using a k-shift algorithm;
counting the word frequency of each word of the vocabulary in the detection text, and determining a first feature matrix of the detection text;
for each open source license stored in a database, determining the word frequency of each word of the vocabulary in the open source license so as to determine a second feature matrix of the open source license;
calculating the text similarity between the detection text and the open source license according to the first feature matrix and the second feature matrix;
and taking the open source license with the highest text similarity as the open source license related to the detection text.
To address the low identification precision of open source license detection tools, the solution provided herein mainly helps the user identify potential open source license information through text similarity calculation, reducing the probability that open source licenses are missed or misidentified. Besides text similarity, a method based on regular expressions is also available, but it requires a large number of manually written rules and is prone to failing to classify a license it has identified; the text similarity method overcomes these shortcomings.
Specifically, in the embodiment of the invention, the value of k in the k-shift algorithm is set according to the textual characteristics of the different open source licenses, and the similarity between the detection text and each open source license in the database is calculated by using the Jaccard similarity algorithm, so as to achieve better time efficiency, accuracy, and recall.
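For reference, the standard Jaccard similarity of two sets A and B (here, the set of k-character words of a detection text and that of a license text) is shown below. Because the embodiment works with word-frequency feature matrices rather than plain sets, a frequency-weighted generalization is also shown; using it is an assumption of this note, not something the description specifies. Here x_i and y_i denote the frequency of the i-th vocabulary word in the detection text and in the license, respectively.

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|}, \qquad 0 \le J(A, B) \le 1

J_w(x, y) = \frac{\sum_i \min(x_i, y_i)}{\sum_i \max(x_i, y_i)}
```

Worked example: with A = {ab, bc, cd} and B = {bc, cd, de, ef}, the intersection has 2 elements and the union has 5, so J(A, B) = 2/5 = 0.4. When every frequency is 0 or 1, J_w reduces to J.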
The file to be tested comprises a plurality of detection texts, which may be open source license texts related to the file, source code, or other related material. In the embodiment of the present invention, the text similarity between each detection text and the open source licenses in the database is calculated in order to detect the open source licenses contained in the texts.
For a detection text, the text similarity calculation method of the embodiment of the invention is as follows:
1. Build the vocabulary of the detection text with the k-shift algorithm. k is a user-defined parameter giving the number of consecutive characters extracted from the detection text at a time: the detection text is traversed and each run of k consecutive characters is stored in order. For example, if the text content is abcdefg and k is 2, the vocabulary is ab, bc, cd, de, ef, and fg.
2. Count the word frequency of each word of the vocabulary in the detection text to construct the first feature matrix of the detection text; likewise, count the word frequency of each word of the vocabulary in each open source license in the database to construct the second feature matrix of each open source license.
3. Calculate the similarity between the detection text and each open source license in the database from the first and second feature matrices by using the Jaccard similarity algorithm. The Jaccard similarity of two sets is the size of their intersection divided by the size of their union. In the embodiment of the invention, the two sets are the detection text and the open source license text, and the words appearing in each text are the elements of its set; the similarity between the detection text and an open source license is therefore calculated from the first feature matrix of the detection text and the second feature matrix of the open source license.
4. Using the calculated text similarities between the detection text and each open source license, select the open source license with the highest similarity as the matching result.
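Steps 1 to 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the vocabulary is taken only from the detection text (as in step 2), and because the description does not say exactly how the frequency matrices enter the Jaccard formula, the sketch assumes the weighted form (sum of minima over sum of maxima).

```python
from collections import Counter


def k_shift_vocabulary(text: str, k: int) -> list[str]:
    """Step 1: collect every run of k consecutive characters (k-shift), first-seen order."""
    vocabulary: dict[str, None] = {}
    for i in range(len(text) - k + 1):
        vocabulary.setdefault(text[i:i + k], None)
    return list(vocabulary)


def feature_vector(text: str, vocabulary: list[str], k: int) -> list[int]:
    """Step 2: frequency of each vocabulary word in `text`."""
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    return [counts[word] for word in vocabulary]


def weighted_jaccard(x: list[int], y: list[int]) -> float:
    """Step 3 (assumed weighted form): sum of minima divided by sum of maxima."""
    union = sum(max(a, b) for a, b in zip(x, y))
    if union == 0:
        return 0.0
    return sum(min(a, b) for a, b in zip(x, y)) / union


def best_matching_license(detection_text: str, licenses: dict[str, str], k: int = 2) -> tuple[str, float]:
    """Step 4: return (name, score) of the license text most similar to the detection text."""
    vocabulary = k_shift_vocabulary(detection_text, k)
    first = feature_vector(detection_text, vocabulary, k)  # first feature vector
    scores = {
        # second feature vector of each license, over the same vocabulary
        name: weighted_jaccard(first, feature_vector(body, vocabulary, k))
        for name, body in licenses.items()
    }
    best = max(scores, key=scores.__getitem__)
    return best, scores[best]


print(k_shift_vocabulary("abcdefg", 2))  # ['ab', 'bc', 'cd', 'de', 'ef', 'fg']
```

The printed vocabulary matches the abcdefg example in step 1. Because the vocabulary comes only from the detection text, words that appear only in a license neither raise nor lower its score; a variant could build the vocabulary over both texts instead.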
After the open source licenses related to the file to be tested are detected, the system analyzes, based on the open source license usage information and the software project planning conditions input by the user, whether there are conflicts between the open source licenses and the planning conditions and what those conflicts are. Specifically, the open source license usage information (for example, licenses such as the GPL (GNU General Public License), BSD (Berkeley Software Distribution) licenses, and the Apache License) and the software project planning conditions (for example, whether there will be a need to close the source in the future, or whether other licenses will be introduced) are used as input conditions and matched one by one against the rule expressions in the conflict rule base. An example rule expression is:
if ((LGPL || Mozilla || GPL) && (closed_source == true)) { Conflict = true; RiskLevel = high; }
The rule above indicates that if a license of the LGPL, Mozilla, or GPL type is present and the planning condition for the software project is to develop closed-source software, there is a license conflict and the risk level is high.
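One way to hold such rules in code is as plain data plus a small matcher, sketched below in Python. The PlanningRule type, the closed_source condition key, and the string risk levels are hypothetical stand-ins for the rule-expression language, whose actual syntax and storage the description does not specify; only the single example rule above is encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanningRule:
    """Hypothetical rule-base entry: a set of licenses versus one planning condition."""
    licenses: frozenset[str]  # the rule fires if ANY of these licenses was detected
    condition: str            # planning-condition key, e.g. "closed_source"
    risk_level: str           # e.g. "high" or "medium"


# Encodes the example rule:
# if ((LGPL || Mozilla || GPL) && (closed_source == true)) { Conflict = true; RiskLevel = high; }
RULES = [PlanningRule(frozenset({"LGPL", "Mozilla", "GPL"}), "closed_source", "high")]


def first_conflicts(detected: set[str], planning: dict[str, bool],
                    rules: list[PlanningRule] = RULES) -> list[dict]:
    """Match the detected licenses and the planning conditions against every rule."""
    conflicts = []
    for rule in rules:
        matched = detected & rule.licenses  # detected licenses named by this rule
        if matched and planning.get(rule.condition, False):
            conflicts.append({
                "licenses": sorted(matched),
                "condition": rule.condition,
                "conflict": True,
                "risk_level": rule.risk_level,
            })
    return conflicts


print(first_conflicts({"GPL", "MIT"}, {"closed_source": True}))
# [{'licenses': ['GPL'], 'condition': 'closed_source', 'conflict': True, 'risk_level': 'high'}]
```

Keeping rules as data rather than code matches the description's update maintenance module, which adds, deletes, and modifies rules without changing the matcher.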
In addition to conflicts between the open source licenses and the planning conditions, the embodiment of the present invention also analyzes conflicts among the open source licenses themselves. After the open source licenses related to the file to be tested are detected, the method further includes:
performing conflict matching on the detected open source licenses, and determining a second conflict among the detected open source licenses;
and generating a second risk assessment report according to the second conflict.
Open source licenses can be divided into five categories: 1. the licensee may use the software anywhere and for any purpose; 2. the licensee may only freely copy the open source software; 3. the licensee may only freely copy or further develop the software; 4. the licensee may freely access the software and use its source code, but may not combine it with other components; 5. the licensee may freely combine the open source software with other software. Using conflicting licenses has a significant impact on software development, especially commercial software development. Therefore, the embodiment of the invention detects conflicting open source licenses, that is, it determines the conflicts between two or more open source licenses related to the same file to be tested. Specifically, the detected open source licenses are matched against the rule expressions in the conflict rule base to determine whether they conflict with one another. For example, a rule expression may be:
if (GPL && BSD) { Conflict = true; RiskLevel = medium; }
The rule above indicates that if both GPL and BSD licenses are present, there is a license conflict and the risk level is medium.
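The pairwise check against the conflict rule base might look like the Python sketch below. PAIR_RULES and second_conflicts are hypothetical names, and the only rule encoded is the GPL/BSD example above; a real rule base would be loaded from the database module 103 rather than hard-coded.

```python
from itertools import combinations

# Hypothetical pairwise rule base keyed by an unordered pair of license names.
# The single entry encodes the example: if (GPL && BSD) { Conflict = true; RiskLevel = medium; }
PAIR_RULES: dict[frozenset[str], str] = {frozenset({"GPL", "BSD"}): "medium"}


def second_conflicts(detected: set[str],
                     rules: dict[frozenset[str], str] = PAIR_RULES) -> list[tuple[str, str, str]]:
    """Check every unordered pair of detected licenses; return (license_a, license_b, risk_level)."""
    found = []
    for a, b in combinations(sorted(detected), 2):  # sorted() makes the output order deterministic
        level = rules.get(frozenset({a, b}))
        if level is not None:
            found.append((a, b, level))
    return found


print(second_conflicts({"GPL", "BSD", "MIT"}))  # [('BSD', 'GPL', 'medium')]
```

Keying rules by frozenset makes the lookup independent of the order in which the two licenses were detected.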
After determining the first conflict between the detected open source license and the planning condition, the method further includes:
determining a risk level corresponding to the first conflict;
after determining the second conflict among the plurality of detected open source licenses, the method further includes:
and determining a risk level corresponding to the second conflict.
By assigning a risk level to each type of conflict, the embodiment of the invention gives users of open source software a reference for making reasonable decisions about the file to be tested and its related open source licenses.
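As an illustration of how risk levels could order a risk assessment report, the hypothetical Python sketch below lists conflicts from most to least severe. The description only names the high and medium levels; the low level, the description field, and the report line format are assumptions.

```python
# Hypothetical severity order; "low" is an assumption (the text only names high and medium).
RISK_ORDER = {"high": 0, "medium": 1, "low": 2}


def risk_report(conflicts: list[dict]) -> str:
    """Render conflicts as report lines, most severe first; unknown levels sort last."""
    ordered = sorted(conflicts, key=lambda c: RISK_ORDER.get(c["risk_level"], len(RISK_ORDER)))
    lines = [f"[{c['risk_level'].upper()}] {c['description']}" for c in ordered]
    return "\n".join(lines) if lines else "No conflicts detected."


print(risk_report([
    {"risk_level": "medium", "description": "GPL combined with BSD"},
    {"risk_level": "high", "description": "GPL code in a planned closed-source product"},
]))
# [HIGH] GPL code in a planned closed-source product
# [MEDIUM] GPL combined with BSD
```

Python's sort is stable, so conflicts sharing a risk level keep the order in which they were found.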
In addition to detecting open source licenses, the embodiment of the invention can retrieve a list of matching licenses based on the name or a fragment of an open source license entered by the user. The embodiment of the invention further includes the following steps:
receiving an identifier and/or a fragment of an open source license;
determining the corresponding open source license from a database according to the identifier and/or the fragment;
and generating a license list according to the corresponding open source license.
This lets the user conveniently and quickly look up information about each open source license. Clicking an entry in the list opens the details page of the corresponding open source license, which contains an introduction to the license content, typical application cases, conditions and limitations of use, and so on.
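The lookup by identifier and/or fragment could be as simple as the case-insensitive substring search sketched below. The license names and the abbreviated license texts in LICENSE_DB are sample data chosen for illustration, and treating "identifier and/or fragment" as "every supplied criterion must match" is an assumption.

```python
from typing import Optional

# Sample data: license identifiers mapped to short excerpts of their standard texts.
LICENSE_DB = {
    "MIT": "Permission is hereby granted, free of charge, to any person obtaining a copy",
    "Apache-2.0": "Licensed under the Apache License, Version 2.0",
    "GPL-3.0": "This program is free software: you can redistribute it and/or modify it",
}


def find_licenses(db: dict[str, str], identifier: Optional[str] = None,
                  fragment: Optional[str] = None) -> list[str]:
    """Case-insensitive fuzzy lookup: every supplied criterion must match; none supplied returns all."""
    results = []
    for name, text in db.items():
        if identifier and identifier.lower() not in name.lower():
            continue  # identifier given but not contained in the license name
        if fragment and fragment.lower() not in text.lower():
            continue  # fragment given but not found in the license text
        results.append(name)
    return sorted(results)


print(find_licenses(LICENSE_DB, fragment="hereby granted"))  # ['MIT']
```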
In addition, because the number of rules in the conflict rule base keeps growing and the same rule may have multiple variants, traversing the entire conflict rule base for every analysis would make detection and analysis inefficient. The embodiment of the invention therefore optimizes the analysis by building a classification index on the conflict rule base according to the types of open source license involved, so that the rule set associated with a specific open source license can be located quickly through the index list, improving analysis efficiency.
Specifically, the classification index can use a graph-based index structure stored as adjacency lists, where the head of each list is an open source license or another entity and each edge is a rule between different open source licenses. This scheme can quickly locate and query the rules that apply under specific conditions, makes the rule set easy to extend dynamically, and is convenient to update and maintain.
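A minimal Python sketch of the adjacency-list index described above follows. The class name, the string-valued rules, and the choice to store each rule at both of its endpoints are assumptions; the point is that rules_for only visits the rules attached to one license instead of scanning the whole rule base.

```python
from collections import defaultdict


class ConflictRuleIndex:
    """Hypothetical graph index: each head node is a license (or other entity); edges are rules."""

    def __init__(self) -> None:
        # Adjacency list: entity -> list of (neighbouring entity, rule expression).
        self._adjacency: defaultdict[str, list[tuple[str, str]]] = defaultdict(list)

    def add_rule(self, a: str, b: str, rule: str) -> None:
        # Store the edge at both endpoints so a lookup from either side finds it.
        self._adjacency[a].append((b, rule))
        self._adjacency[b].append((a, rule))

    def rules_for(self, entity: str) -> list[tuple[str, str]]:
        # Visits only the rules attached to `entity`; .get() avoids inserting empty lists.
        return list(self._adjacency.get(entity, []))


index = ConflictRuleIndex()
index.add_rule("GPL", "BSD", "if (GPL && BSD) { Conflict = true; RiskLevel = medium; }")
print(index.rules_for("BSD"))
```

Adding a rule only appends to two lists, which is what makes the structure easy to extend as the rule base grows.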
To make the present invention easier to understand, the above process is described in detail below with a specific embodiment. As shown in FIG. 3, the steps are:
Step 301, the server receives the file to be tested and the planning conditions input by the user, where the file to be tested comprises a plurality of detection texts.
Step 302, for one detection text, determine the vocabulary of the detection text by using the k-shift algorithm.
Step 303, count the word frequency of each word of the vocabulary in the detection text, and determine the first feature matrix.
Step 304, for one open source license stored in the database, determine the word frequency of each word of the vocabulary in the open source license, and determine the second feature matrix of the open source license.
Step 305, calculate the text similarity between the detection text and the open source license from the first and second feature matrices by using the Jaccard similarity algorithm.
Step 306, determine the open source license with the highest text similarity as the open source license corresponding to the detection text.
Step 307, perform conflict matching between each detected open source license and the planning conditions, determine the first conflict between the open source license and the planning conditions, and determine the risk level corresponding to the first conflict.
Step 308, perform conflict matching among the detected open source licenses, determine the second conflict among them, and determine the risk level corresponding to the second conflict.
Step 309, generate a risk assessment report and return it to the user in PDF format.
FIG. 4 schematically shows a structural diagram of an apparatus for evaluating an open source license according to an embodiment of the present invention.
As shown in FIG. 4, the apparatus for evaluating an open source license provided in an embodiment of the present invention includes:
a receiving unit 401, configured to receive a file to be tested and a planning condition;
a detecting unit 402, configured to detect an open source license related to the file to be tested;
a matching unit 403, configured to perform conflict matching between the detected open source license and the planning condition, and to determine a first conflict between the detected open source license and the planning condition;
a reporting unit 404, configured to generate a first risk assessment report according to the first conflict.
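One way the matching unit 403 and the reporting unit 404 could cooperate on the first conflict is sketched below. It assumes that the planning condition is represented as a set of business-scenario tags and that the conflict rule base maps each license to the scenarios it does not suit, each with a risk level. The scenario tags, the license entries, the risk levels and the function names are hypothetical placeholders, and the report is plain text rather than the PDF mentioned in step 309.

```python
from dataclasses import dataclass

# Hypothetical excerpt of the conflict rule base: for each license, the business
# scenarios it is assumed not to suit, each with an assumed risk level.
UNSUITABLE_SCENARIOS = {
    "GPL-3.0": {"closed_source_distribution": "high"},
    "AGPL-3.0": {"closed_source_distribution": "high",
                 "network_service_without_source": "high"},
}

# Ordering used to list the most severe conflicts first.
RISK_ORDER = {"high": 0, "medium": 1, "low": 2}


@dataclass(frozen=True)
class FirstConflict:
    license_id: str
    scenario: str
    risk_level: str


def match_first_conflicts(detected_licenses, planning_condition):
    """Match each detected license against the scenarios in the planning condition."""
    conflicts = []
    for license_id in detected_licenses:
        rules = UNSUITABLE_SCENARIOS.get(license_id, {})
        for scenario in sorted(planning_condition):
            if scenario in rules:
                conflicts.append(FirstConflict(license_id, scenario, rules[scenario]))
    return conflicts


def first_risk_report(conflicts):
    """Render a plain-text first risk assessment report, most severe conflicts first."""
    if not conflicts:
        return "No conflict found between the detected licenses and the planning condition."
    lines = ["First risk assessment report (license vs. planning condition):"]
    for item in sorted(conflicts, key=lambda conflict: RISK_ORDER[conflict.risk_level]):
        lines.append(f"[{item.risk_level.upper()}] {item.license_id} "
                     f"is not applicable to '{item.scenario}'")
    return "\n".join(lines)


conflicts = match_first_conflicts(["MIT", "GPL-3.0"], {"closed_source_distribution"})
report = first_risk_report(conflicts)
```

Keeping the rules as data rather than code mirrors the idea of a separate conflict rule base: new licenses or scenarios can be added without changing the matching logic.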
Optionally, the detecting unit 402 is specifically configured to:
where the file to be tested comprises a plurality of detection texts, determine, for each detection text, a vocabulary of the detection text by using a k-shift algorithm;
count the frequency, in the detection text, of each word in the vocabulary, and determine a first feature matrix of the detection text;
for an open source license stored in a database, determine the frequency of each word of the vocabulary in the open source license, so as to determine a second feature matrix of the open source license;
calculate the text similarity between the detection text and the open source license according to the first feature matrix and the second feature matrix;
and take the open source license with the highest text similarity as the open source license related to the detection text.
Optionally, the matching unit 403 is further configured to perform conflict matching among the detected open source licenses and to determine a second conflict among the detected open source licenses;
the reporting unit 404 is further configured to generate a second risk assessment report according to the second conflict.
Optionally, the matching unit 403 is further configured to:
determine a risk level corresponding to the first conflict;
and determine a risk level corresponding to the second conflict.
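Determining second conflicts and their risk levels can be sketched as a pairwise check of the detected licenses against conflict rule expressions. The rule table below is a hypothetical stand-in for the conflict rule base; its single entry reflects the widely documented incompatibility between Apache-2.0 and GPL-2.0, while the "high" level assigned to it, the overall-risk rule and the function names are assumptions.

```python
from itertools import combinations

# Hypothetical conflict rule expressions: unordered license pair -> risk level.
PAIR_RULES = {
    frozenset({"Apache-2.0", "GPL-2.0"}): "high",
}

# Assumed severity ranking, most severe first.
RISK_LEVELS = ("high", "medium", "low")


def match_second_conflicts(detected_licenses):
    """Return (license_a, license_b, risk_level) for every conflicting pair."""
    conflicts = []
    for a, b in combinations(sorted(set(detected_licenses)), 2):
        level = PAIR_RULES.get(frozenset({a, b}))
        if level is not None:
            conflicts.append((a, b, level))
    return conflicts


def overall_risk(conflicts):
    """Most severe risk level among the conflicts, or "none" if there are none."""
    levels = {level for _, _, level in conflicts}
    for level in RISK_LEVELS:
        if level in levels:
            return level
    return "none"


second = match_second_conflicts(["GPL-2.0", "MIT", "Apache-2.0"])
```

Storing each pair as a `frozenset` makes the lookup independent of the order in which the two licenses were detected.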
Optionally, the receiving unit 401 is further configured to receive an identifier and/or a fragment of an open source license;
the matching unit 403 is further configured to determine the corresponding open source license from the database according to the identifier and/or the fragment;
and the reporting unit 404 is further configured to generate a license list according to the corresponding open source license.
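Looking up licenses from an identifier and/or a fragment and then producing a license list could work roughly as below. In this sketch an in-memory dictionary stands in for the open source license information base, SPDX-style identifiers are assumed as the identifier format, a fragment matches by a case- and whitespace-insensitive substring test, and the stored texts are abbreviated excerpts. All names are hypothetical.

```python
# Hypothetical stand-in for the open source license information base:
# identifier -> (full name, abbreviated license text).
LICENSE_DB = {
    "MIT": ("MIT License",
            "Permission is hereby granted, free of charge, to any person obtaining a copy"),
    "Apache-2.0": ("Apache License 2.0",
                   "Licensed under the Apache License, Version 2.0"),
}


def normalize(text):
    """Lowercase and collapse whitespace so fragments match loosely."""
    return " ".join(text.lower().split())


def find_licenses(identifier=None, fragment=None):
    """Return sorted identifiers matching the identifier and/or the fragment."""
    matches = set()
    if identifier is not None and identifier in LICENSE_DB:
        matches.add(identifier)
    if fragment:
        needle = normalize(fragment)
        for license_id, (_, text) in LICENSE_DB.items():
            if needle in normalize(text):
                matches.add(license_id)
    return sorted(matches)


def license_list(license_ids):
    """Render one 'identifier: full name' line per license."""
    return [f"{license_id}: {LICENSE_DB[license_id][0]}" for license_id in license_ids]


found = find_licenses(identifier="Apache-2.0", fragment="free of  CHARGE")
```

Treating "and/or" as a union means a caller may supply either input alone or both together; a stricter design could instead require both to agree on the same license.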
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (12)

1. A method for evaluating an open source license, comprising:
receiving a file to be tested and a planning condition input by a user, the planning condition being a planning condition of a software project input by the user;
detecting an open source license related to the file to be tested according to the text similarity between the open source licenses in an open source license information base of a database module and a detection text in the file to be tested; wherein the database module is divided into the open source license information base and a conflict rule base; the open source license information base stores the license terms, applicable scenarios, conditions of use and limitations of the various open source licenses on the market; and the conflict rule base stores, on the one hand, conflict rule expressions between different open source licenses and, on the other hand, expressions of the business scenarios to which each open source license is not applicable;
taking the detected open source license and the planning condition as input conditions, performing conflict matching against the business scenario expressions in the conflict rule base, and determining a first conflict between the detected open source license and the planning condition;
and generating a first risk assessment report according to the first conflict.
2. The method of claim 1, wherein detecting the open source license related to the file to be tested comprises:
where the file to be tested comprises a plurality of detection texts, determining, for each detection text, a vocabulary of the detection text by using a k-shift algorithm;
counting the frequency, in the detection text, of each word in the vocabulary, and determining a first feature matrix of the detection text;
for an open source license stored in a database, determining the frequency of each word of the vocabulary in the open source license, so as to determine a second feature matrix of the open source license;
calculating the text similarity between the detection text and the open source license according to the first feature matrix and the second feature matrix;
and taking the open source license with the highest text similarity as the open source license related to the detection text.
3. The method according to claim 1 or 2, wherein after detecting the open source license to which the file under test relates, the method further comprises:
performing conflict matching on the detected open source licenses, and determining a second conflict among the detected open source licenses;
and generating a second risk assessment report according to the second conflict.
4. The method of claim 3, wherein after determining the first conflict between the detected open source license and the planning condition, the method further comprises:
determining a risk level corresponding to the first conflict;
and after determining the second conflict among the detected open source licenses, the method further comprises:
determining a risk level corresponding to the second conflict.
5. The method of claim 1, further comprising:
receiving an identifier and/or a fragment of an open source license;
determining a corresponding open source license from a database according to the identifier and/or the fragment;
and generating a license list according to the corresponding open source license.
6. An apparatus for evaluating an open source license, comprising:
a receiving unit, configured to receive a file to be tested and a planning condition input by a user, the planning condition being a planning condition of a software project input by the user;
a detection unit, configured to detect an open source license related to the file to be tested according to the text similarity between the open source licenses in an open source license information base of a database module and a detection text in the file to be tested; wherein the database module is divided into the open source license information base and a conflict rule base; the open source license information base stores the license terms, applicable scenarios, conditions of use and limitations of the various open source licenses on the market; and the conflict rule base stores, on the one hand, conflict rule expressions between different open source licenses and, on the other hand, expressions of the business scenarios to which each open source license is not applicable;
a matching unit, configured to take the detected open source license and the planning condition as input conditions, perform conflict matching against the business scenario expressions in the conflict rule base, and determine a first conflict between the detected open source license and the planning condition;
and a reporting unit, configured to generate a first risk assessment report according to the first conflict.
7. The apparatus of claim 6, wherein the detection unit is specifically configured to:
where the file to be tested comprises a plurality of detection texts, determine, for each detection text, a vocabulary of the detection text by using a k-shift algorithm;
count the frequency, in the detection text, of each word in the vocabulary, and determine a first feature matrix of the detection text;
for an open source license stored in a database, determine the frequency of each word of the vocabulary in the open source license, so as to determine a second feature matrix of the open source license;
calculate the text similarity between the detection text and the open source license according to the first feature matrix and the second feature matrix;
and take the open source license with the highest text similarity as the open source license related to the detection text.
8. The apparatus of claim 6 or 7, wherein:
the matching unit is further configured to perform conflict matching among the detected open source licenses and to determine a second conflict among the detected open source licenses;
the reporting unit is further configured to generate a second risk assessment report according to the second conflict.
9. The apparatus of claim 8, wherein the matching unit is further configured to:
determine a risk level corresponding to the first conflict;
and determine a risk level corresponding to the second conflict.
10. The apparatus of claim 6, wherein:
the receiving unit is further configured to receive an identifier and/or a fragment of an open source license;
the matching unit is further configured to determine a corresponding open source license from a database according to the identifier and/or the fragment;
and the reporting unit is further configured to generate a license list according to the corresponding open source license.
11. A computing device, comprising:
a memory for storing program instructions;
a processor for calling the program instructions stored in the memory and executing, according to the obtained program, the method of any one of claims 1 to 5.
12. A computer-readable non-transitory storage medium including computer-readable instructions which, when read and executed by a computer, cause the computer to perform the method of any one of claims 1 to 5.

Priority Applications (5)

Application Number Priority Date Filing Date Title
CN201710081702.0A CN106934254B (en) 2017-02-15 2017-02-15 Analysis method and device for open source license
EP17896537.2A EP3584728B1 (en) 2017-02-15 2017-11-15 Method and device for analyzing open-source license
PCT/CN2017/111095 WO2018149187A1 (en) 2017-02-15 2017-11-15 Method and device for analyzing open-source license
US16/485,358 US10942733B2 (en) 2017-02-15 2017-11-15 Open-source-license analyzing method and apparatus
TW107103031A TWI662431B (en) 2017-02-15 2018-01-29 Analysis method and device for open source license

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710081702.0A CN106934254B (en) 2017-02-15 2017-02-15 Analysis method and device for open source license

Publications (2)

Publication Number Publication Date
CN106934254A 2017-07-07
CN106934254B 2020-05-26

Family

ID=59424093

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710081702.0A Active CN106934254B (en) 2017-02-15 2017-02-15 Analysis method and device for open source license

Country Status (5)

Country Link
US (1) US10942733B2 (en)
EP (1) EP3584728B1 (en)
CN (1) CN106934254B (en)
TW (1) TWI662431B (en)
WO (1) WO2018149187A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106934254B (en) * 2017-02-15 2020-05-26 中国银联股份有限公司 Analysis method and device for open source license
CN108984391B (en) * 2018-06-06 2022-07-12 阿里巴巴(中国)有限公司 Application program analysis method and device and electronic equipment
CN109063421B (en) * 2018-06-28 2022-03-04 东南大学 Open source license compliance analysis and conflict detection method
CN110826834B (en) * 2018-08-14 2023-04-18 中国石油天然气股份有限公司 Comparison method and device between different responsibility separation rule sets
CN111291331B (en) * 2019-06-27 2022-02-22 北京关键科技股份有限公司 Mixed source file license conflict detection method
CN111400672A (en) * 2020-03-18 2020-07-10 中国信息安全测评中心 Open source software monitoring method and device
CN112084309B (en) * 2020-09-17 2024-06-04 北京中科微澜科技有限公司 License selection method and system based on open source software map
CN113282965A (en) * 2021-05-20 2021-08-20 苏州棱镜七彩信息科技有限公司 Open source license and copyright information tampering detection method and system
CN113268713A (en) * 2021-06-03 2021-08-17 西南大学 Open source software license selection method based on software dependence
JP7055232B1 (en) * 2021-08-24 2022-04-15 ビジョナル・インキュベーション株式会社 Processing equipment and processing method
CN115080924B (en) * 2022-07-25 2022-11-15 南开大学 Software license clause extraction method based on natural language understanding
CN116302042B (en) * 2023-05-25 2023-09-15 南方电网数字电网研究院有限公司 Protocol element content recommendation method and device and computer equipment
CN118051889A (en) * 2024-04-16 2024-05-17 北京安普诺信息技术有限公司 LLM-based SCA license risk analysis method, device and equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101223549A (en) * 2005-07-14 2008-07-16 微软公司 Digital application operating according to aggregation of plurality of licenses

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040068734A1 (en) * 2002-10-07 2004-04-08 Microsoft Corporation Software license isolation layer
US10437964B2 (en) * 2003-10-24 2019-10-08 Microsoft Technology Licensing, Llc Programming interface for licensing
US9489687B2 (en) * 2003-12-04 2016-11-08 Black Duck Software, Inc. Methods and systems for managing software development
US8700533B2 (en) * 2003-12-04 2014-04-15 Black Duck Software, Inc. Authenticating licenses for legally-protectable content based on license profiles and content identifiers
US7747533B2 (en) 2005-07-14 2010-06-29 Microsoft Corporation Digital application operating according to aggregation of plurality of licenses
US8359655B1 (en) * 2008-10-03 2013-01-22 Pham Andrew T Software code analysis and classification system and method
US9020857B2 (en) * 2009-02-11 2015-04-28 Johnathan C. Mun Integrated risk management process
CN101651564B (en) 2009-09-08 2011-07-06 杭州华三通信技术有限公司 License detection method, distributed network management system and server
US8875301B2 (en) * 2011-10-12 2014-10-28 Hewlett-Packard Development Company, L. P. Software license incompatibility determination
US8589306B1 (en) * 2011-11-21 2013-11-19 Forst Brown Todd LLC Open source license management
US9424401B2 (en) * 2012-03-15 2016-08-23 Microsoft Technology Licensing, Llc Automated license management
KR20140050323A (en) * 2012-10-19 2014-04-29 삼성전자주식회사 Method and apparatus for license verification of binary file
FR3009634B1 (en) * 2013-08-09 2015-08-21 Viaccess Sa METHOD FOR PROVIDING A LICENSE IN A SYSTEM FOR PROVIDING MULTIMEDIA CONTENT
CN103440441A (en) 2013-08-28 2013-12-11 北京华胜天成科技股份有限公司 Software protection method and system
CN106934254B (en) 2017-02-15 2020-05-26 中国银联股份有限公司 Analysis method and device for open source license

Also Published As

Publication number Publication date
TW201832118A (en) 2018-09-01
EP3584728A1 (en) 2019-12-25
TWI662431B (en) 2019-06-11
EP3584728A4 (en) 2020-05-20
US20200026512A1 (en) 2020-01-23
CN106934254A (en) 2017-07-07
EP3584728B1 (en) 2022-05-04
US10942733B2 (en) 2021-03-09
WO2018149187A1 (en) 2018-08-23

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 1239904; Country of ref document: HK)

GR01 Patent grant
GR01 Patent grant