CN117668327A

CN117668327A - Component identification method, device, terminal equipment and storage medium

Info

Publication number: CN117668327A
Application number: CN202410147745.4A
Authority: CN
Inventors: 朱良文; 万振华; 王颉; 李华; 董燕
Original assignee: Seczone Technology Co Ltd
Current assignee: Seczone Technology Co Ltd
Priority date: 2024-02-02
Filing date: 2024-02-02
Publication date: 2024-03-08

Abstract

The application discloses a component identification method, a device, terminal equipment and a storage medium. The method comprises: collecting coordinate data of open source components; acquiring a dependency profile corresponding to each open source component according to its coordinate data; parsing the dependency profile with an analysis tool to obtain dependency data; analyzing the dependency data to obtain depended-upon data; setting a filtering condition based on the dependency data and the depended-upon data; and obtaining the target component according to the filtering condition. The method can rapidly and accurately identify target components in a massive knowledge base, which reduces the difficulty of deploying the knowledge base, saves hardware cost, improves the accuracy of the overall data, and reduces labor cost.

Description

Component identification method, device, terminal equipment and storage medium

Technical Field

The present invention relates to the field of data analysis, and in particular, to a component identification method, device, terminal equipment, and storage medium.

Background

To improve the efficiency of equipment management and maintenance, ensure the compatibility and stability of equipment, and promote its security and compliance, target components need to be identified.

At present, comparable techniques for identifying target components usually rely on publicly visible counts of downloads and bookmarks for a component on a website, such as the number of forks and stars of a project on GitHub (a cloud-based code hosting platform). Target components obtained in this way are not accurate, because these numbers only reflect the attention a project receives: a user who forks or stars a project may simply have downloaded it, which does not show whether the component is actually introduced during software development.

Disclosure of Invention

The invention mainly aims to provide a component identification method, a device, terminal equipment and a storage medium, and aims to solve the technical problem of accurately identifying target components in a mass knowledge base.

In order to achieve the above object, the present invention provides a component recognition method, including:

collecting coordinate data of an open source component;

acquiring a dependency profile corresponding to the open source component according to the coordinate data of the open source component;

parsing the dependency profile with an analysis tool to obtain dependency data;

analyzing the dependency data to obtain depended-upon data;

setting a filtering condition based on the dependency data and the depended-upon data;

and obtaining the target component according to the filtering condition.

Optionally, the step of obtaining the dependency profile corresponding to the version of the open source component according to the coordinate data of the open source component includes:

maintaining a knowledge base of the version of the open source component based on the coordinate data of the open source component;

traversing the coordinate data of the open source component in the maintained knowledge base of the open source component version to obtain a specific downloading operation instruction;

and downloading the dependency profile corresponding to the open source component version from a project repository based on the specific download operation instruction.

Optionally, after the step of analyzing the dependency profile by an analysis tool to obtain the dependency data, the method further includes:

storing the dependency data in a database, from which it is retrieved later when the dependency data is analyzed.

Optionally, the step of analyzing the dependency data to obtain the depended-upon data includes:

retrieving the dependency data from a database;

analyzing the dependency relationships of the open source components based on the dependency data to obtain a dependency relationship result;

and classifying and counting the dependency data according to the dependency relationship result to obtain the depended-upon data.

Optionally, the step of setting a filtering condition based on the dependency data and the depended-upon data includes:

counting the dependency data and the depended-upon data to obtain the total number of components;

and setting the filtering condition according to the quantity of depended-upon data and the total number of components.

Optionally, after the step of setting the filtering condition according to the quantity of depended-upon data and the total number of components, the method further includes:

obtaining the depended-upon data that meets the filtering condition;

calculating, with an analysis tool, the existence ratio of the quantity of depended-upon data meeting the filtering condition to the total number of components;

if the existence ratio is greater than or equal to a preset existence ratio, the filtering condition is reasonably set;

and if the existence ratio is smaller than the preset existence ratio, resetting the filtering condition.
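The validation steps above can be sketched as follows. This is only an illustrative sketch: the function names, the sample data, and the assumption that every component has an entry in the parent-count mapping (so its length is the total number of components) are hypothetical and not taken from the application.

```python
def existence_ratio(parent_counts, min_parents):
    """Share of all components whose parent count meets the filter.

    parent_counts maps a component name to its number of parent
    components; its length is assumed to be the total component number.
    """
    total = len(parent_counts)
    if total == 0:
        return 0.0
    matching = sum(1 for n in parent_counts.values() if n >= min_parents)
    return matching / total


def filter_is_reasonable(parent_counts, min_parents, preset_ratio):
    """True when the existence ratio reaches the preset existence ratio."""
    return existence_ratio(parent_counts, min_parents) >= preset_ratio


# Hypothetical data: component name -> number of parent components.
counts = {"a": 1, "b": 2, "c": 4, "d": 0}
ratio = existence_ratio(counts, min_parents=2)  # 2 of 4 components qualify
```

If `filter_is_reasonable` returns `False`, the caller would choose a different threshold and repeat the check, which corresponds to resetting the filtering condition.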

Optionally, the step of obtaining the target component according to the filtering condition includes:

summarizing the depended-upon data to obtain the number of parent components;

judging whether the number of parent components meets the filtering condition;

if yes, the open source component corresponding to that number of parent components is a target component;

if not, the open source component corresponding to that number of parent components is a non-target component.

The embodiment of the application also provides a component identification device, which comprises:

the coordinate data acquisition module is used for acquiring the coordinate data of the open source component;

the component information acquisition module is used for acquiring a dependency characteristic file corresponding to the version of the open source component according to the coordinate data of the open source component;

the dependency analysis module is used for parsing the dependency profile through a software composition analysis tool to obtain dependency data;

the depended-upon analysis module is used for analyzing the dependency data to obtain depended-upon data;

the condition setting module is used for setting a filtering condition based on the dependency data and the depended-upon data;

and the component identification module is used for obtaining a target component according to the filtering condition.

The embodiment of the application also provides a component identification terminal device, which comprises: a memory, a processor, and a component identification program stored on the memory and executable on the processor, the component identification program configured to implement the steps of the component identification method as described above.

The embodiment of the application also provides a storage medium, wherein a component identification program is stored on the storage medium, and the component identification program realizes the steps of the component identification method when being executed by a processor.

Through the above embodiment scheme, coordinate data of open source components is collected; a dependency profile corresponding to each open source component is acquired according to its coordinate data; the dependency profile is parsed with an analysis tool to obtain dependency data; the dependency data is analyzed to obtain depended-upon data; a filtering condition is set based on the dependency data and the depended-upon data; and the target component is obtained according to the filtering condition. The method can rapidly and accurately identify target components in a massive knowledge base, which reduces the difficulty of deploying the knowledge base, saves hardware cost, improves the accuracy of the overall data, and reduces labor cost.

Drawings

FIG. 1 is a schematic diagram of functional modules of a terminal device to which a component recognition device of the present application belongs;

FIG. 2 is a flow chart of a first exemplary embodiment of a component identification method of the present application;

FIG. 3 is a flow chart of a second exemplary embodiment of a component identification method of the present application;

FIG. 4 is a flow chart of a third exemplary embodiment of a component identification method of the present application;

FIG. 5 is a schematic diagram of a further refinement of FIG. 4;

FIG. 6 is a schematic diagram of a further refinement of FIG. 5;

FIG. 7 is a schematic diagram of a common component identification flow for a Java data source, used in a specific scenario example of the component identification method of the present application.

The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Detailed Description

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

The main solution of the embodiments of the present application is: collecting coordinate data of open source components; acquiring a dependency profile corresponding to each open source component according to its coordinate data; parsing the dependency profile with an analysis tool to obtain dependency data; analyzing the dependency data to obtain depended-upon data; setting a filtering condition based on the dependency data and the depended-upon data; and obtaining the target component according to the filtering condition. The method can rapidly and accurately identify target components in a massive knowledge base, which reduces the difficulty of deploying the knowledge base, saves hardware cost, improves the accuracy of the overall data, and reduces labor cost.

Technical terms referred to in this application:

Component: a reusable software module, which may be a library, framework, module, or service, used to implement a particular function or task. In modern software development, components typically exist as third-party packages, libraries, or modules; by introducing them, developers can accelerate development, improve efficiency, and reduce duplicated effort.

Target component: the particular component to be identified in the component identification process.

Common components: components that are often introduced by developers during the software development process.

Dependency of components: in software development, the relationship whereby one component (e.g., a library, framework, or module) relies on other components to function properly or to perform a particular function.

Dependency profile: in software development, a specific file used to describe and manage the dependencies of a software project. Such files typically contain information about the third-party libraries, modules or packages required by the project, including their names, versions, dependencies, and repository addresses. A common example of a dependency profile is the pom.xml file in a Maven project, which contains the dependency information required by the project as well as configuration for building, packaging, and so on.

In the embodiments of the application, it is considered that related-art component identification methods rely on publicly visible counts of downloads and bookmarks for components on a website, and that the target components obtained this way are not accurate, because these numbers only reflect the attention a project receives and do not show whether the components are actually introduced during software development.

Based on the above, the embodiment of the application provides a solution, which can rapidly and accurately identify target components in a mass knowledge base, and improves the retrieval efficiency and accuracy of the knowledge base. The user can acquire the required information more conveniently, and the working efficiency is improved.

Specifically, referring to fig. 1, fig. 1 is a schematic functional block diagram of a terminal device to which the component identifying apparatus of the present application belongs. The component recognition device may be a device independent of the terminal device, which is capable of data processing, and which may be carried on the terminal device in the form of hardware or software. The terminal device can be a computer, a mobile phone, a tablet computer and other devices with computing and network connection functions, and the devices can use the functions of the component recognition device by installing corresponding application programs or accessing web pages. The present embodiment is exemplified by a computer.

In this embodiment, the terminal device to which the component identifying apparatus belongs includes at least an input module 110, a processor 120, a memory 130, and an output module 140.

The memory 130 stores an operating system and a component identification program, and the component identification device may store the acquired coordinate data of the open source components, the dependency data, and the depended-upon data in the memory 130; the input module 110 may receive information input by a user and may include a keyboard, a mouse, a touch screen, voice recognition, etc.; the output module 140 may output text, images, audio, etc.

Wherein the component identification program in the memory 130 when executed by the processor performs the steps of:

collecting coordinate data of an open source component;

analyzing the dependency data to obtain depended-upon data;

and obtaining the target component according to the filtering condition.

Further, the component identification program in the memory 130, when executed by the processor, also implements the steps of:

retrieving the dependency data from a database;

summarizing the depended-upon data to obtain the number of parent components.

According to the above scheme of this embodiment, coordinate data of open source components is collected; a dependency profile corresponding to each open source component is acquired according to its coordinate data; the dependency profile is parsed with an analysis tool to obtain dependency data; the dependency data is analyzed to obtain depended-upon data; a filtering condition is set based on the dependency data and the depended-upon data; and the target component is obtained according to the filtering condition. The method can rapidly and accurately identify target components in a massive knowledge base, which reduces the difficulty of deploying the knowledge base, saves hardware cost, improves the accuracy of the overall data, reduces labor cost, lets users obtain the required information more conveniently, and improves working efficiency.

Based on the above terminal device architecture, but not limited to the above architecture, the method embodiments of the present application are presented.

Referring to fig. 2, fig. 2 is a flow chart illustrating a first exemplary embodiment of the component identification method of the present application. The component identification method comprises the following steps:

Step S10, collecting coordinate data of an open source component.

Wherein, the open source component refers to a reusable software module or library shared and released by an open source community or individual developer in the software development;

The coordinates of an open source component refer to a unique identifier of an open source library or framework used in software development to identify and locate a particular component version, typically consisting of an organization, name, version.
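As a minimal sketch of such an identifier, the coordinates can be modeled as an organization, a name, and a version. The `organization:name:version` text form, the class and function names, and the sample coordinate below are assumptions made for illustration only.

```python
from typing import NamedTuple


class Coordinate(NamedTuple):
    """Unique identifier of one open source component version."""
    organization: str
    name: str
    version: str


def parse_coordinate(text: str) -> Coordinate:
    """Parse a hypothetical 'organization:name:version' string."""
    parts = text.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected organization:name:version, got {text!r}")
    return Coordinate(*parts)


# Hypothetical coordinate, not a real published component.
coord = parse_coordinate("org.example:demo-lib:1.2.3")
```

Because the tuple is hashable, the same coordinate collected from several sources can be deduplicated by placing it in a set.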

Since the coordinates of an open source component are its unique identifier, and target components are typically selected from among open source components, the coordinate data can serve as a reference for identifying target components. The coordinate data of the open source components therefore needs to be collected before target components are identified.

Specifically, it is first necessary to determine the scope and targets of the coordinate data to be collected. In this embodiment, the scope is the full set of coordinate data of open source components.

After the scope is determined, the source addresses from which the coordinate data is collected are determined. The coordinate data of open source components may be obtained from public websites, such as mvnrepository (a Maven repository site) and the Aliyun Maven repository; from software package management systems, such as Maven (a project management tool), npm (Node Package Manager) and PyPI (Python Package Index, the software repository of the Python programming language); from code hosting platforms, such as GitHub (a hosting platform for open source and proprietary software projects) and GitLab (an open source platform for repository hosting); and from open source component libraries and directories, such as the Apache Maven (a software project management and build automation tool) repository and the npm repository.

If coordinate data is to be obtained from public websites, web crawler technology may be used to crawl web page content. The coordinate information of open source components can be extracted by analyzing the web page structure and using a suitable crawler library, such as BeautifulSoup (a Python library that extracts data from HTML or XML files) or Scrapy (a web crawler framework written in Python).

If coordinate data is to be obtained from a software package management system or a code hosting platform, the corresponding API (Application Programming Interface) or command line tool may be used, for example an API provided for Maven.

If coordinate data is to be obtained from open source component libraries and directories, the corresponding repository or directory can be downloaded or accessed directly. For example, an index file of the Apache Maven repository may be downloaded, and the coordinate data of open source components extracted from it.

In addition, it is noted that collecting the coordinate data of the open source components is a continuous process, as the versions and changes of the open source components are dynamic, and therefore, the collected coordinate data of the open source components need to be periodically updated and maintained to maintain the accuracy and integrity of the data.

After the collected coordinate data of the open source components has been maintained, a knowledge base of open source component versions can be established.

Through the above steps, collecting the coordinate data of the full set of open source components makes it possible to discover dependency relationships and provides an important basis for the subsequent work of identifying target components.

Step S20, acquiring a dependency profile corresponding to the open source component according to the coordinate data of the open source component.

To understand the dependency relationships between open source components and thereby identify target components more easily, a dependency profile also needs to be obtained. The dependency profile contains the dependencies of the open source component.

In this embodiment, it is first necessary to connect and log in to the maintained knowledge base of open source component versions and traverse its data to obtain open source component version information, which includes the name and version number of each open source component.

Then, according to the traversed version information, the dependency profiles corresponding to the component versions are downloaded one by one from a project management repository. These files contain the dependencies, configuration information and other related information of the components.
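For a Maven-style repository, the location of a component version's pom.xml can be derived directly from its coordinates. The sketch below assumes the standard Maven repository layout and uses Maven Central as the default base URL; the function name and the sample coordinates are hypothetical, and the application does not state which repository is used.

```python
def pom_url(group_id, artifact_id, version,
            base_url="https://repo1.maven.org/maven2"):
    """Build the URL of a POM file under the standard Maven layout:
    <base>/<group path>/<artifact>/<version>/<artifact>-<version>.pom
    """
    group_path = group_id.replace(".", "/")
    file_name = f"{artifact_id}-{version}.pom"
    return "/".join(
        [base_url.rstrip("/"), group_path, artifact_id, version, file_name]
    )


# Hypothetical coordinates used only to show the resulting path shape.
url = pom_url("org.example", "demo-lib", "1.2.3")
```

The URL can then be passed to any HTTP client to download the file; downloading is left out here so that the sketch does not depend on network access.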

In the downloading process, the downloaded file needs to be checked, so that the integrity and the correctness of the file are ensured. The downloaded file can be checked by using a hash algorithm such as MD5 or SHA1, so as to ensure that the file is not tampered in the transmission process.
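A minimal sketch of such a check using Python's standard `hashlib` module is shown below. It assumes the expected digest is available as a hexadecimal string, for example from a checksum file published next to the downloaded file; the function name is hypothetical.

```python
import hashlib


def verify_checksum(data: bytes, expected_hex: str, algorithm: str = "sha1") -> bool:
    """Return True when the digest of data matches the expected hex digest."""
    digest = hashlib.new(algorithm, data).hexdigest()
    return digest == expected_hex.strip().lower()


# Hypothetical file content; the expected digest is computed locally here
# only to keep the example self-contained.
content = b"<project/>"
expected = hashlib.sha1(content).hexdigest()
ok = verify_checksum(content, expected)
tampered_ok = verify_checksum(content + b" ", expected)
```

A file whose digest does not match would be discarded and downloaded again rather than stored.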

Once the dependency profiles have been downloaded, each one needs to be saved to a file storage server for subsequent processing and use. The file storage server may be a local file system, a cloud storage service such as Amazon S3 or Alibaba Cloud OSS, or another reliable storage means.

In summary, by downloading the dependency characteristic files of each component version, a data basis is provided for subsequent analysis and identification, and developers can be helped to better know and use the open source components to perform version management and upgrading, so that development efficiency is improved, and system quality and safety are ensured.

Step S30, parsing the dependency profile with an analysis tool to obtain dependency data.

The analysis tools may include SCA (Software Composition Analysis) tools, source code checking tools, and the like.

In this embodiment, in order to be able to further acquire the dependency data required for identifying the target component, it is also necessary to analyze the dependency profiles one by means of an analysis tool, thereby acquiring the dependency data.

Specifically, an appropriate analysis tool is selected first; an open source analysis tool may be chosen, such as a Maven dependency analysis tool or a Gradle dependency analysis tool.

The analysis tool parses each dependency profile to extract the dependency data in the file. The dependency data includes information such as the name and version number of the component, and the other components it depends on together with their version numbers.

During parsing, repeated or invalid dependency data may exist, so the dependency data also needs to be filtered and deduplicated to ensure its accuracy and reliability.
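The parsing, filtering and deduplication described above can be illustrated with Python's standard XML parser. This is a simplified sketch and not a replacement for an SCA tool: it reads only the direct `<dependencies>` section of a pom.xml and ignores property substitution, dependency management, and parent POM inheritance. The function name and the sample POM are hypothetical.

```python
import xml.etree.ElementTree as ET

# XML namespace used by Maven POM files.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def parse_dependencies(pom_text):
    """Return unique (groupId, artifactId, version) tuples declared in a POM."""
    root = ET.fromstring(pom_text)
    seen = set()
    result = []
    for dep in root.findall("m:dependencies/m:dependency", POM_NS):
        key = tuple(
            dep.findtext(f"m:{tag}", default="", namespaces=POM_NS).strip()
            for tag in ("groupId", "artifactId", "version")
        )
        if not key[0] or not key[1]:
            continue  # invalid entry: no group or artifact name
        if key in seen:
            continue  # duplicate entry
        seen.add(key)
        result.append(key)
    return result


# Hypothetical POM with one duplicated dependency.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <dependencies>
    <dependency><groupId>org.example</groupId><artifactId>lib-a</artifactId><version>1.0</version></dependency>
    <dependency><groupId>org.example</groupId><artifactId>lib-a</artifactId><version>1.0</version></dependency>
    <dependency><groupId>org.example</groupId><artifactId>lib-b</artifactId><version>2.1</version></dependency>
  </dependencies>
</project>"""

deps = parse_dependencies(SAMPLE_POM)
```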

After filtering and deduplication, the parsed dependency data can also be counted and classified as required. For example, the number, types and versions of the dependencies of each open source component can be counted, so that the dependency relationships among components and their usage can be better understood.

Finally, the parsed dependency data may be output in a desired format, such as JSON or XML, and stored in a database for further processing and use.
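One possible way to store and export the parsed data is sketched below with SQLite and JSON from the Python standard library. The table name, column names, and sample coordinates are assumptions for illustration; the application does not specify a database or schema.

```python
import json
import sqlite3


def store_dependencies(conn, component, dependencies):
    """Insert (component, depends_on) rows, skipping rows already present."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dependency ("
        "component TEXT, depends_on TEXT, UNIQUE(component, depends_on))"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO dependency VALUES (?, ?)",
        [(component, dep) for dep in dependencies],
    )
    conn.commit()


# In-memory database and hypothetical coordinates, for illustration only.
conn = sqlite3.connect(":memory:")
store_dependencies(
    conn,
    "org.example:app:1.0",
    ["org.example:lib-a:1.0", "org.example:lib-b:2.1"],
)
rows = conn.execute(
    "SELECT component, depends_on FROM dependency ORDER BY depends_on"
).fetchall()
as_json = json.dumps([{"component": c, "depends_on": d} for c, d in rows])
```

The uniqueness constraint means that re-running the import for the same component does not create duplicate rows.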

In summary, by analyzing the dependency profile using the analysis tool, dependency data of each component can be extracted, and important data support is provided for subsequent target component identification and security assessment.

Step S40, analyzing the dependency data to obtain depended-upon data.

Depended-upon data describes the situation in which one component or module is depended upon by other components or modules during software development.

First, the dependency data is retrieved from the database, and a processing program is written that reads the dependency data and processes it accordingly. The processing program may be written in a programming language.

During execution of the processing program, the dependency relationships between the open source components are analyzed. Specifically, the dependency data is traversed and filtered to find which components depend on a given component, which yields the depended-upon data. From the dependency data and the depended-upon data, the relationship between child components and parent components is determined, i.e., how many parent components a child component has. The result of this determination is the dependency relationship result.

For each depended-upon component, it is also necessary to count how many parent components it has, which can be done by tallying the dependency relationship results. A data structure such as a set or a dictionary can be used to record which other components each component depends on, and the number of parent components of each component, i.e., the depended-upon data, is calculated from it.
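A short sketch of this counting step follows, using a dictionary of sets as the text suggests. The input is assumed to be a list of (parent, child) pairs, meaning that the parent depends on the child; the function name and the component names are hypothetical.

```python
from collections import defaultdict


def count_parents(dependency_pairs):
    """Map each component to the number of distinct components depending on it.

    dependency_pairs: iterable of (parent, child) pairs, where the parent
    component declares a dependency on the child component.
    """
    parents = defaultdict(set)
    for parent, child in dependency_pairs:
        parents[child].add(parent)
        parents.setdefault(parent, set())  # ensure every component is listed
    return {component: len(p) for component, p in parents.items()}


# Hypothetical dependency data; the repeated pair is counted only once.
pairs = [
    ("app-1", "lib-a"),
    ("app-2", "lib-a"),
    ("app-1", "lib-b"),
    ("lib-a", "lib-c"),
    ("app-1", "lib-a"),
]
parent_counts = count_parents(pairs)
```

Using a set per component means duplicate records of the same relationship do not inflate the parent count.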

Finally, the depended-upon data is output in a suitable form, such as tables or charts, for ease of viewing and analysis. The output should include information such as the names and version numbers of the depended-upon components and the number of their parent components.

In addition, the depended-upon data can be organized and stored, using a data structure such as a list or a dictionary, or a database.

In summary, analyzing the dependency data makes it possible to find out which components are frequently used, which components are heavily depended upon, and the association and dependency relationships between components. Such information helps the user understand the structure and organization of a project, identify common components and core components, and better understand the functional modules and business processes in the project.

Step S50, setting a filtering condition based on the dependency data and the depended-upon data.

Specifically, the dependency data and the depended-upon data are first traversed and counted to obtain the total number of components.

Then, according to the quantity of depended-upon data and the total number of components, the filtering condition can be set as required. In this embodiment, a threshold is set so that only components that are depended upon more times than the threshold are selected, or only components whose depended-upon count reaches a certain proportion of the total number of components. For example, if the filtering condition is that the number of parent components is greater than or equal to 2, the components whose depended-upon data shows 2 or more parent components are screened out from all components and taken as target components.

Through the above steps, the filtering condition can be set according to the quantity of depended-upon data and the total number of components, so that the components and dependency relationships meeting specific conditions can be screened out.
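The two kinds of filtering condition mentioned above, an absolute threshold and a proportion of the total number of components, could be expressed as a single predicate. The sketch below is one possible reading; the function and parameter names are hypothetical, and the comparison uses "greater than or equal to", following the worked example in the text.

```python
def make_filter(total_components, min_parents=None, min_share=None):
    """Build a predicate over a component's number of parent components.

    min_parents: minimum absolute number of parent components.
    min_share:   minimum parent count as a fraction of total_components.
    At least one of the two must be given; both apply if both are given.
    """
    if min_parents is None and min_share is None:
        raise ValueError("set min_parents, min_share, or both")

    def condition(parent_count):
        if min_parents is not None and parent_count < min_parents:
            return False
        if min_share is not None:
            if total_components == 0:
                return False
            if parent_count / total_components < min_share:
                return False
        return True

    return condition


# Hypothetical settings for a knowledge base of 10 components.
by_count = make_filter(10, min_parents=2)
by_share = make_filter(10, min_share=0.3)
```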

Step S60, obtaining the target component according to the filtering condition.

In this embodiment, to identify the target component, after the depended-upon data (i.e., the parent components) is obtained, it is first traversed to count how many parent components each component has. For each component, the parent components that depend on it are accumulated, giving the total number of parent components of that component.

Then, the number of parent components of each component is compared with the filtering condition set in the above step. If the number of parent components meets the filtering condition, the component is a target component; if not, the component is a non-target component.

For example, suppose the filtering condition is that the number of parent components is greater than or equal to 2, and that the first component has 1 parent component, the second component has 2, and the third component has 4.

Comparing the number of parent components of each component with the filtering condition shows that the second and third components satisfy the condition and the first component does not.

It can therefore be determined that the second and third components are target components and the first component is a non-target component.
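The worked example above translates directly into a few lines of code. The component names and the `classify` helper are hypothetical; the parent counts and the threshold are the ones given in the example.

```python
def classify(parent_counts, min_parents):
    """Split components into target and non-target lists by parent count."""
    targets, non_targets = [], []
    for name, count in parent_counts.items():
        if count >= min_parents:
            targets.append(name)
        else:
            non_targets.append(name)
    return targets, non_targets


# Parent counts from the example: 1, 2 and 4 parent components.
example_counts = {"first": 1, "second": 2, "third": 4}
targets, non_targets = classify(example_counts, min_parents=2)
```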

Through the above steps, whether the number of parent components of each component meets the filtering condition can be judged from the summarized depended-upon data and the filtering condition, so that target components in a massive knowledge base can be accurately identified.

Further, referring to fig. 3, fig. 3 is a flow chart of a second exemplary embodiment of the component identification method of the present application. In this embodiment, based on step S20, acquiring the dependency profile corresponding to the open source component according to the coordinate data of the open source component further includes:

Step S21, maintaining a knowledge base of open source component versions based on the coordinate data of the open source components;

Step S22, traversing the coordinate data of the open source components in the maintained knowledge base of open source component versions to obtain specific download operation instructions;

Step S23, downloading the dependency profiles corresponding to the open source component versions from the project repository based on the specific download operation instructions.

Compared with the above embodiment, the present embodiment further specifies how the dependency profile corresponding to an open source component version is downloaded from the project repository based on a specific download operation instruction.

In this embodiment, after the coordinate data of the full set of open source components is obtained from public websites, a knowledge base of open source component versions is created or updated according to the obtained coordinate data.

The open source components in the knowledge base are then maintained, including operations such as adding new component versions, updating existing component versions, and deleting component versions that are no longer used.

The maintained knowledge base is then traversed, the coordinate data of the open source components is obtained one by one, and a specific download operation instruction is generated from the coordinate data of each open source component. The download operation instruction may be a command line command, an API call, or another form of instruction for downloading a specific version of the open source component.

The generated download operation instructions are stored in a list, file or database.

When the download dependent feature file operation is required, a specific download operation instruction may be obtained from a list, file or database. And executing corresponding download dependent characteristic file operation according to the acquired download operation instruction.

And after the downloading operation is executed, downloading the dependency characteristic file corresponding to the open source component version. The dependency characteristic file contains the dependency relationship, configuration information and the like of the version.

Through this embodiment, a knowledge base of open source component versions can be maintained based on the coordinate data of the open source components, and the dependency feature file corresponding to a required open source component version can be downloaded through a specific download operation instruction. This helps users manage and use open source components, ensures that the project's dependency relationships are correct and complete, and provides a basis for the subsequent identification of the target component.
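As an illustration only, the maintenance and traversal steps of this embodiment can be sketched in Python. The class name `VersionKnowledgeBase`, its methods, and the dictionary layout of a download instruction are hypothetical and are not defined by the application; an actual instruction could equally be a command-line command or an API call.

```python
from dataclasses import dataclass, field


@dataclass
class VersionKnowledgeBase:
    """Hypothetical in-memory knowledge base: component name -> set of versions."""
    versions: dict = field(default_factory=dict)

    def add_version(self, name: str, version: str) -> None:
        # Add a new component version (creates the component entry if needed).
        self.versions.setdefault(name, set()).add(version)

    def remove_version(self, name: str, version: str) -> None:
        # Delete a component version that is no longer used.
        self.versions.get(name, set()).discard(version)
        if name in self.versions and not self.versions[name]:
            del self.versions[name]

    def coordinates(self):
        # Traverse the knowledge base and yield coordinates one by one.
        for name in sorted(self.versions):
            for version in sorted(self.versions[name]):
                yield f"{name}@{version}"


def build_download_instructions(kb: VersionKnowledgeBase) -> list:
    # One instruction per coordinate; the dict layout is an assumption.
    return [{"action": "download_dependency_file", "coordinate": c}
            for c in kb.coordinates()]


kb = VersionKnowledgeBase()
kb.add_version("okio", "1.11.0")
kb.add_version("okio", "1.12.0")
kb.add_version("joda-time", "2.10")
kb.remove_version("okio", "1.12.0")
instructions = build_download_instructions(kb)
```

The resulting `instructions` list could then be persisted to a file or database and consumed later by whatever process performs the downloads.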

Further, referring to fig. 4, fig. 4 is a schematic flow chart of a third exemplary embodiment of the component identification method of the present application. In this embodiment, based on step S50, after the step of setting the filtering condition according to the number of depended-upon data and the total number of components, the method further includes:

Step S501, obtaining the depended-upon data meeting the filtering condition according to the filtering condition;

Step S502, calculating, through an analysis tool, the presence ratio of the number of depended-upon data meeting the filtering condition to the total number of components;

Step S503, if the presence ratio of the number of depended-upon data meeting the filtering condition to the total number of components is greater than or equal to the preset presence ratio, determining that the filtering condition is set reasonably;

Step S504, if the presence ratio of the number of depended-upon data meeting the filtering condition to the total number of components is smaller than the preset presence ratio, resetting the filtering condition.

Compared with the above embodiments, this embodiment further includes repeatedly verifying and adjusting the set filtering condition, which ensures that the target component is obtained under a reasonable filtering condition.

Specifically, the depended-upon data of the components is first screened according to the preset filtering condition, so that the data meeting the filtering condition is selected from all the depended-upon data. For example, if the preset filtering condition is a fixed threshold, the number of depended-upon data items for a component is compared with the threshold; if the number is greater than or equal to the threshold, the preset filtering condition is met, and the item is depended-upon data meeting the filtering condition.

The dependency and depended-upon relationships in some common enterprise-level projects may then be analyzed by an analysis tool, such as an SCA tool, and the presence ratio is calculated from the number of depended-upon data meeting the filtering condition and the total number of components. Here the presence ratio is the ratio of the number of depended-upon data meeting the filtering condition to the total number of components.

Next, the calculated presence ratio is compared with the preset presence ratio. If the presence ratio is greater than or equal to the preset presence ratio, the filtering condition is set reasonably; that is, the depended-upon data meeting the filtering condition accounts for a relatively high share of the total number of components, and the expected screening requirement is met.

If the presence ratio is smaller than the preset presence ratio, the filtering condition is set unreasonably; that is, the depended-upon data meeting the filtering condition accounts for a relatively low share of the total number of components, the expected screening requirement is not met, and the filtering condition needs to be reset.

The filtering condition is reset according to the actual situation so that the number of depended-upon data meeting the filtering condition increases, for example by adjusting the threshold, changing the weighting, or adding other conditions to the filtering condition to achieve the desired screening effect. After the filtering condition is reset, the steps of this embodiment are executed again until the target component corresponding to a reasonable filtering condition is obtained.
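The verify-and-reset loop of steps S501 to S504 can be sketched as follows. This is a minimal illustration under stated assumptions: the presence ratio is interpreted here as the share of dependencies observed in sample projects that fall inside the selected set, the threshold is lowered by 1 on each reset, and all component names and counts are invented for the example.

```python
def presence_ratio(selected: set, observed_dependencies: list) -> float:
    """Share of dependencies observed in sample projects that are in `selected`."""
    if not observed_dependencies:
        return 0.0
    hits = sum(1 for dep in observed_dependencies if dep in selected)
    return hits / len(observed_dependencies)


def tune_threshold(parent_counts: dict, observed_dependencies: list,
                   start_threshold: int, preset_ratio: float):
    """Lower the parent-count threshold until the presence ratio is acceptable."""
    threshold = start_threshold
    while threshold >= 1:
        # S501: keep the components whose parent count meets the condition.
        selected = {c for c, n in parent_counts.items() if n >= threshold}
        # S502: compute the presence ratio for this condition.
        ratio = presence_ratio(selected, observed_dependencies)
        if ratio >= preset_ratio:
            return threshold, selected, ratio  # S503: condition is reasonable
        threshold -= 1  # S504: reset the filtering condition and verify again
    return None  # no threshold satisfied the preset ratio


# Hypothetical inputs for illustration only.
parent_counts = {"a": 40, "b": 25, "c": 19, "d": 3}
observed = ["a", "b", "c", "a", "b", "d", "c", "a", "b", "c"]
result = tune_threshold(parent_counts, observed,
                        start_threshold=20, preset_ratio=0.85)
```

With these inputs the threshold of 20 selects only `a` and `b` (ratio 0.6), so the condition is lowered once to 19, at which point `a`, `b` and `c` are selected and the ratio reaches 0.9.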

For example, taking 10 million components as an example, the 10 million components are filtered and the target components are identified.

Specifically, the parent component counts may be distributed as follows: 100,000 components have a parent component count greater than or equal to 1000; 800,000 have a count greater than or equal to 100; 1 million have a count greater than or equal to 20; and 3 million meet a still lower threshold. The filtering condition may then be set to a parent component count greater than or equal to 20, and the data obtained under this condition is verified: an SCA tool may be used to analyze the dependencies of some common enterprise-level projects and to count whether those dependencies exist among the target components. With a preset presence ratio of, for example, 85%, the filtering condition is considered reasonable if the actual presence ratio (actual presence ratio = number of parent components / total number of components × 100%) is greater than or equal to the preset presence ratio; if it is less than 85%, the filtering condition may be lowered appropriately and this step repeated until the target component corresponding to a reasonable filtering condition is obtained.

Fig. 5 is a further detailed flowchart of fig. 4. As can be seen from fig. 5, if the knowledge base storing 10 million components is not filtered, the volume of the entire knowledge base is 200GB, which makes data deployment difficult and server hardware costly. The knowledge base of target components identified by the filtering condition, however, has a volume of 20GB and a coverage rate as high as 95%. The target components obtained under the filtering condition therefore offer simple data deployment, low server hardware cost, and a coverage rate of up to 95% without affecting the service.

Fig. 6 is a further detailed flowchart of fig. 5. As can be seen from fig. 6, if 10 million components are stored in the knowledge base, the actual accuracy is 95%, that is, 95 out of every 100 of the 10 million components are actually used, but identifying the target components among 10 million components requires 100 person-days. If instead the target components are screened out by the filtering condition, useless or duplicate components in the knowledge base are filtered out and only 1 million components remain. Although the actual identification accuracy is then only 90%, the target components obtained by the filtering operation are used in 95% of the scenarios when enterprise projects are developed, and the remaining 5% can be added to the target component library by manual or automatic reporting, so the coverage is far wider than that of unfiltered components. Obtaining the target components through the filtering condition requires only 10 person-days, which greatly reduces the labor cost.
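The savings implied by the figures quoted for fig. 5 and fig. 6 can be checked with a short calculation. The helper name `reduction` is introduced here only for illustration; the input values are the ones stated in the text.

```python
def reduction(before: float, after: float) -> float:
    """Fractional reduction when going from `before` to `after`."""
    return (before - after) / before


storage_saving = reduction(200, 20)                   # knowledge base volume, GB
component_saving = reduction(10_000_000, 1_000_000)   # stored components
effort_saving = reduction(100, 10)                    # person-days
```

Each of the three quantities shrinks by the same factor of ten, so storage volume, component count and labor are all reduced by 90% under the stated figures.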

Through the embodiment, the target component can be identified according to the filtering condition and the existence ratio, so that the difficulty in deploying the knowledge base is reduced, the hardware cost is saved, the accuracy of the whole data is improved, and the labor hour cost is reduced.

The method of the embodiments of the present application is described in detail below in conjunction with a specific scenario.

The data source is Java, the target components are commonly used components, the dependency package import mode is Maven, and the dependency feature file is pom.xml.

A flow diagram of a method for identifying commonly used components for a Java-based data source is shown in fig. 7.

Specifically, component coordinate data is first collected from public websites. The component coordinate data may be, for example, pkg:maven/com.squareup.okio/okio@1.11.0, and the public websites may include mvnrepository, the Aliyun Maven repository, and the like.
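A coordinate string of the kind shown above can be split into its parts with a few lines of Python. This is a simplified sketch: the names `MavenCoordinate` and `parse_maven_purl` are hypothetical, and optional qualifiers or subpaths that a package URL may carry are deliberately not handled.

```python
from typing import NamedTuple


class MavenCoordinate(NamedTuple):
    group_id: str
    artifact_id: str
    version: str


def parse_maven_purl(purl: str) -> MavenCoordinate:
    """Parse a simple 'pkg:maven/<group>/<artifact>@<version>' string.

    Qualifiers ('?...') and subpaths ('#...') are not handled in this sketch.
    """
    prefix = "pkg:maven/"
    if not purl.startswith(prefix):
        raise ValueError(f"not a Maven package URL: {purl!r}")
    path, at_sign, version = purl[len(prefix):].partition("@")
    group_id, slash, artifact_id = path.partition("/")
    if not (at_sign and slash and group_id and artifact_id and version):
        raise ValueError(f"incomplete coordinate: {purl!r}")
    return MavenCoordinate(group_id, artifact_id, version)


coord = parse_maven_purl("pkg:maven/com.squareup.okio/okio@1.11.0")
```

The parsed fields are what a later step needs in order to locate the component version in a repository.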

The acquired open source component coordinate data is built into an open source component coordinate knowledge base, the full set of component coordinate data is stored in the knowledge base, and the knowledge base is maintained.

The maintained knowledge base is then traversed, the coordinate data of the open source components is acquired one by one, and a specific download operation instruction is generated according to the coordinate data of each open source component.

According to the acquired download operation instructions, the dependency feature file corresponding to each component version, namely pom.xml, is downloaded one by one from the Maven repository and stored in a file storage server.
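For a repository that follows the standard Maven directory layout, the location of a version's POM file can be derived directly from its coordinates. The sketch below assumes Maven Central (`https://repo1.maven.org/maven2`) as the base URL, which the application does not specify, and the shape of the instruction record (`url`, `save_as`) is hypothetical.

```python
def pom_url(group_id: str, artifact_id: str, version: str,
            base_url: str = "https://repo1.maven.org/maven2") -> str:
    """Build the URL of a component version's POM in a Maven-layout repository."""
    # Dots in the groupId become path separators in the repository layout.
    group_path = group_id.replace(".", "/")
    return (f"{base_url.rstrip('/')}/{group_path}/{artifact_id}/"
            f"{version}/{artifact_id}-{version}.pom")


def download_instruction(group_id: str, artifact_id: str, version: str,
                         target_dir: str = "poms") -> dict:
    # Hypothetical instruction record: where to fetch from and where to store.
    return {
        "url": pom_url(group_id, artifact_id, version),
        "save_as": f"{target_dir}/{artifact_id}-{version}.pom",
    }


instruction = download_instruction("com.squareup.okio", "okio", "1.11.0")
```

Passing a different `base_url` would point the same logic at a mirror, provided the mirror uses the same layout.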

The dependency feature files are then analyzed one by one by means of the SCA tool to derive the dependency data introduced by each component, such as:

fastjson@1.2.83 relies on joda-time@2.10, log4j@1.2.17, okio@1.11.0;

fastjson@1.2.82 relies on joda-time@2.10, log4j@1.2.17, okio@1.11.0;

springfox-swagger2@2.9.2 relies on joda-time@2.10, log4j@1.2.17;

springfox-swagger2@2.9.1 relies on joda-time@2.10, log4j@1.2.17;

joda-time@2.10 relies on log4j@1.2.17;

joda-time@1.0 relies on log4j@1.1.3.

From the above dependency relationships, the dependency data of fastjson@1.2.83, fastjson@1.2.82, springfox-swagger2@2.9.2, springfox-swagger2@2.9.1, joda-time@2.10 and joda-time@1.0 can be obtained, and this dependency data is stored in a database.
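The application leaves the analysis to an SCA tool. Purely as a simplified stand-in, the directly declared dependencies of a POM file can be read with Python's standard XML parser. The sample POM below is invented for illustration, and the sketch ignores parent POM inheritance, property placeholders, `dependencyManagement` and transitive dependencies, all of which a real tool would need to resolve.

```python
import xml.etree.ElementTree as ET

# Namespace used by Maven POM files.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Invented sample POM, for illustration only.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>example.group</groupId>
  <artifactId>example-component</artifactId>
  <version>1.0.0</version>
  <dependencies>
    <dependency>
      <groupId>joda-time</groupId>
      <artifactId>joda-time</artifactId>
      <version>2.10</version>
    </dependency>
    <dependency>
      <groupId>log4j</groupId>
      <artifactId>log4j</artifactId>
      <version>1.2.17</version>
    </dependency>
  </dependencies>
</project>
"""


def direct_dependencies(pom_text: str) -> list:
    """Return 'artifactId@version' for each directly declared dependency."""
    root = ET.fromstring(pom_text)
    result = []
    for dep in root.findall("m:dependencies/m:dependency", POM_NS):
        artifact = dep.findtext("m:artifactId", default="", namespaces=POM_NS)
        version = dep.findtext("m:version", default="", namespaces=POM_NS)
        result.append(f"{artifact}@{version}")
    return result


deps = direct_dependencies(SAMPLE_POM)
```

Each component version's list of this kind is one row of the dependency data that is stored in the database.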

Once the dependency data is obtained, it also needs to be cleaned to obtain the depended-upon data.

The specific operational procedure is as follows.

A processing program is written based on the dependency data to analyze which components depend on a given component, that is, the relationship between a child component and its parent components, and to count the number of parent components, giving the following data:

joda-time@2.10 is depended on by fastjson@1.2.82, fastjson@1.2.83, springfox-swagger2@2.9.1 and springfox-swagger2@2.9.2, and the number of parent components is 4;

log4j@1.2.17 is depended on by fastjson@1.2.82, fastjson@1.2.83, springfox-swagger2@2.9.1, springfox-swagger2@2.9.2 and joda-time@2.10, and the number of parent components is 5;

okio@1.11.0 is depended on by fastjson@1.2.82 and fastjson@1.2.83, and the number of parent components is 2;

log4j@1.1.3 is depended on by joda-time@1.0, and the number of parent components is 1;

fastjson@1.2.82, fastjson@1.2.83, springfox-swagger2@2.9.1, springfox-swagger2@2.9.2, okio@1.12.0 and joda-time@1.0 have no parent components, and the number of parent components is 0.

From the above relationships, the depended-upon data is joda-time@2.10, log4j@1.2.17, okio@1.11.0 and log4j@1.1.3.
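The parent counting described above amounts to inverting the dependency relationships. A minimal sketch using the six dependency records listed earlier follows; the function name `count_parents` is hypothetical.

```python
from collections import defaultdict

# Dependency data from the example: component -> components it depends on.
dependency_data = {
    "fastjson@1.2.83": ["joda-time@2.10", "log4j@1.2.17", "okio@1.11.0"],
    "fastjson@1.2.82": ["joda-time@2.10", "log4j@1.2.17", "okio@1.11.0"],
    "springfox-swagger2@2.9.2": ["joda-time@2.10", "log4j@1.2.17"],
    "springfox-swagger2@2.9.1": ["joda-time@2.10", "log4j@1.2.17"],
    "joda-time@2.10": ["log4j@1.2.17"],
    "joda-time@1.0": ["log4j@1.1.3"],
}


def count_parents(dependency_data: dict) -> dict:
    """Invert the dependency map and count distinct parents per child."""
    parents = defaultdict(set)
    for parent, children in dependency_data.items():
        for child in children:
            parents[child].add(parent)
    return {child: len(found) for child, found in parents.items()}


parent_counts = count_parents(dependency_data)
# parent_counts == {"joda-time@2.10": 4, "log4j@1.2.17": 5,
#                   "okio@1.11.0": 2, "log4j@1.1.3": 1}
```

Components that never appear as a child, such as fastjson@1.2.83, are simply absent from the result, which corresponds to a parent count of 0.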

When setting the filtering condition, further verification and analysis based on the depended-upon data is also required. For example, with a total component count of 10: there are 4 depended-upon data items whose parent component count is greater than or equal to 5; 5 whose parent component count is greater than or equal to 6; 2 whose parent component count is greater than or equal to 3; and 1 whose parent component count is greater than or equal to 2. Taking the median of the parent component counts, the preset filtering condition is set to a parent component count greater than or equal to 4, and under this preset filtering condition only joda-time@2.10 and log4j@1.2.17 satisfy the condition.

Then, with the preset presence ratio set to 80%, the actual presence ratio can be calculated from the number of parent components and the total number of components (actual presence ratio = number of parent components / total number of components × 100%). With the preset filtering condition of a parent component count greater than or equal to 4, the actual presence ratio clearly does not reach the preset presence ratio.

Therefore, the filtering condition can be lowered so that the preset presence ratio is satisfied. For example, the preset filtering condition may be lowered from a parent component count greater than or equal to 4 to a parent component count greater than or equal to 2; the actual presence ratio calculated from the resulting parent component counts is then greater than the preset presence ratio, and the common components obtained are joda-time@2.10, log4j@1.2.17, okio@1.11.0, and so on.
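The effect of lowering the filtering condition from 4 to 2 can be reproduced with the parent counts of this example. The function name `select_common` is hypothetical, and the sketch covers only the selection step, not the presence ratio check.

```python
# Parent counts taken from the example above.
parent_counts = {
    "joda-time@2.10": 4,
    "log4j@1.2.17": 5,
    "okio@1.11.0": 2,
    "log4j@1.1.3": 1,
}


def select_common(parent_counts: dict, min_parents: int) -> list:
    """Keep components whose parent count is at least `min_parents`."""
    return sorted(c for c, n in parent_counts.items() if n >= min_parents)


strict = select_common(parent_counts, 4)
# strict == ["joda-time@2.10", "log4j@1.2.17"]
relaxed = select_common(parent_counts, 2)
# relaxed == ["joda-time@2.10", "log4j@1.2.17", "okio@1.11.0"]
```

Lowering the threshold admits okio@1.11.0 while still excluding log4j@1.1.3, which has a single parent.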

As the above specific scenario shows, common components can be accurately identified based on dependency relationships, and common components with a higher coverage rate can be obtained by setting the filtering condition and applying the verification method.

By the above method, coordinate data of open source components is collected; the dependency feature file corresponding to each open source component is acquired according to its coordinate data; the dependency feature file is analyzed by an analysis tool to obtain dependency data; the dependency data is analyzed to obtain depended-upon data; a filtering condition is set based on the dependency data and the depended-upon data; and the target component is obtained according to the filtering condition. The method and the device can rapidly and accurately identify the target components in a massive knowledge base, so that the difficulty of deploying the knowledge base is reduced, hardware cost is saved, the accuracy of the overall data is improved, and labor cost is reduced.

In addition, the embodiment of the application also provides a component identification device, which comprises:

For the principle and implementation process of component identification in this embodiment, refer to the above embodiments; they are not repeated here.

In addition, the embodiment of the application also provides a component identification terminal device, which comprises: a memory, a processor, and a component identification program stored on the memory and executable on the processor, the component identification program being configured to implement the steps of the component identification method as described above.

Because all the technical solutions of all the embodiments are adopted when the component identification program is executed by the processor, at least all the beneficial effects brought by all the technical solutions of all the embodiments are provided, and the description is omitted herein.

In addition, the embodiment of the application also provides a storage medium, wherein the storage medium stores a component identification program, and the component identification program, when executed by a processor, implements the steps of the component identification method described above.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.

From the above description of embodiments, it will be clear to a person skilled in the art that the above embodiment method may be implemented by means of software plus a necessary general hardware platform, but may of course also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as described above, comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present invention.

The foregoing description covers only the preferred embodiments of the present invention and does not limit the scope of the invention. Any equivalent structure or equivalent process derived from the disclosure herein, whether employed directly or indirectly in other related technical fields, likewise falls within the scope of protection of the present invention.

Claims

1. A component identification method, characterized in that the component identification method comprises the steps of:

collecting coordinate data of an open source component;

analyzing the dependency data to obtain depended-upon data;

and obtaining the target component according to the filtering condition.

2. The component identification method of claim 1, wherein the step of obtaining the dependency feature file corresponding to the open source component version according to the coordinate data of the open source component comprises:

3. The component identification method of claim 1, wherein after the step of analyzing the dependency feature file by an analysis tool to obtain dependency data, the method further comprises:

4. The component identification method of claim 1, wherein the step of analyzing the dependency data to obtain the depended-upon data comprises:

retrieving the dependency data from a database;

5. The component identification method of claim 1, wherein the step of setting a filtering condition based on the dependency data and the depended-upon data comprises:

6. The component identification method of claim 5, wherein after the step of setting the filtering condition according to the number of depended-upon data and the total number of components, the method further comprises:

7. The component identification method of claim 1, wherein the step of obtaining the target component according to the filtering condition comprises:

summarizing the depended-upon data to obtain the number of parent components;

8. A component identification device, characterized in that the component identification device comprises:

9. A component identification terminal device, characterized in that the component identification terminal device comprises: a memory, a processor, and a component identification program stored on the memory and executable on the processor, the component identification program being configured to implement the steps of the component identification method of any one of claims 1 to 7.

10. A storage medium having stored thereon a component identification program which, when executed by a processor, implements the steps of the component identification method according to any one of claims 1 to 7.