CN111506268B - Code file storage method and device and electronic equipment - Google Patents


Info

Publication number
CN111506268B
Authority
CN
China
Prior art keywords
code
file
target
code file
value
Legal status
Active
Application number
CN202010305891.7A
Other languages
Chinese (zh)
Other versions
CN111506268A (en)
Inventor
唐杰
于澔
刘志伟
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010305891.7A
Publication of CN111506268A
Application granted
Publication of CN111506268B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G06F3/0641 De-duplication techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application discloses a code file storage method, a code file storage device, and an electronic device, and relates to the field of data storage. The specific implementation is as follows: a target code library comprising at least one code file is acquired; a digest value set is obtained according to the at least one code file of the target code library, wherein the digest value set comprises at least one target digest value, and a plurality of code files with the same file content correspond to one target digest value; and the at least one code file of the target code library is stored according to the digest value set, wherein the file content of only one code file is stored for each target digest value. Thus, if several code files in the target code library have the same file content, that content is stored only once, which avoids repeated storage of identical file content, reduces wasted storage resources, and lowers storage cost.

Description

Code file storage method and device and electronic equipment
Technical Field
The present disclosure relates to data storage technology in the field of data processing, and in particular to a code file storage method, a code file storage device, and an electronic device.
Background
Code search is one of the most important tools in modern program development. In a content-addressed file system, different code libraries, or different branches of the same code library, may contain code files with the same file content, so code files with identical content end up being stored repeatedly. When the number of code files is large, a great deal of storage space is needed to hold them, and much of it is wasted on duplicate content.
Disclosure of Invention
Embodiments of the present application provide a code file storage method, a code file storage device, and an electronic device, to solve the problem of wasted storage space caused by repeatedly storing code files with the same file content.
To solve the above technical problem, the present application is implemented as follows:
A first aspect of the present application provides a code file storage method, including:
acquiring a target code library, wherein the target code library comprises at least one code file;
obtaining a digest value set according to the at least one code file of the target code library, wherein the digest value set comprises at least one target digest value, and a plurality of code files with the same file content correspond to one target digest value; and
storing the at least one code file of the target code library according to the digest value set, wherein the file content of only one code file is stored for each target digest value.
A second aspect of the present application provides a code file storage device, including:
a first acquisition module, configured to acquire a target code library, wherein the target code library comprises at least one code file;
a second acquisition module, configured to obtain a digest value set according to the at least one code file of the target code library, wherein the digest value set comprises at least one target digest value, and a plurality of code files with the same file content correspond to one target digest value; and
a storage module, configured to store the at least one code file of the target code library according to the digest value set, wherein the file content of only one code file is stored for each target digest value.
A third aspect of the present application provides an electronic device, including:
at least one processor;
and a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.
A fourth aspect of the present application provides a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to perform the method of the first aspect.
An embodiment of the above application has the following advantage or benefit: a digest value set is obtained from the code files of the target code library, with code files of identical content sharing one target digest value, and the file content of only one code file is stored per target digest value. Identical file content in the target code library is therefore stored only once, which avoids repeated storage, reduces wasted storage resources, and lowers storage cost.
Other effects of the above alternative will be described below in connection with specific embodiments.
Drawings
The drawings are provided for a better understanding of the present solution and do not constitute a limitation of the present application. In the drawings:
FIG. 1 is a first flowchart of a code file storage method provided by an embodiment of the present application;
FIG. 2 is a diagram of the relationship between a file index and a code index provided by an embodiment of the present application;
FIG. 3 is a second flowchart of a code file storage method provided by an embodiment of the present application;
FIG. 4 is a third flowchart of a code file storage method provided by an embodiment of the present application;
FIG. 5 is a block diagram of a code file storage device provided by an embodiment of the present application;
FIG. 6 is a block diagram of an electronic device for implementing the code file storage method of an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Referring to FIG. 1, which is a flowchart of a code file storage method provided by an embodiment of the present application, this embodiment provides a code file storage method applied to an electronic device, including the following steps:
Step 101: acquire a target code library, wherein the target code library comprises at least one code file.
A content-addressed file system (e.g., git) may contain a plurality of code libraries, and the target code library may be any one of them. A user may obtain the target code library by downloading it. The target code library comprises one or more code files; specifically, it may comprise a plurality of branches, each branch comprising one or more code files. A code file is a file whose content is code.
Step 102: obtain a digest value set according to the at least one code file of the target code library, wherein the digest value set comprises at least one target digest value, and a plurality of code files with the same file content correspond to one target digest value.
The digest value set comprises one target digest value or a plurality of different target digest values. Each target digest value is determined from the file content of a code file among the at least one code file, and code files with the same file content correspond to the same target digest value. To determine a target digest value from the file content of a code file, the file content may be processed with a hash algorithm.
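The following is a minimal sketch of step 102 in Python, not the applicant's implementation. It assumes the hash algorithm is the one git uses for blob objects (SHA-1 over a "blob <size>" header, a NUL byte, and the raw content), which is the value printed by the git hash-object command; the helper names git_blob_digest and digest_value_set are hypothetical.

```python
import hashlib


def git_blob_digest(content: bytes) -> str:
    """Return the git blob ID (SHA-1) of a file's content.

    git hashes the ASCII header "blob <size>", a NUL byte, and then the
    raw bytes, so identical contents always produce identical digests.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def digest_value_set(files: dict[str, bytes]) -> set[str]:
    """Collect one digest per distinct file content (input: path -> content)."""
    return {git_blob_digest(content) for content in files.values()}


if __name__ == "__main__":
    repo = {
        "a/util.py": b"test content\n",
        "b/util.py": b"test content\n",  # same content, different path
        "README": b"",
    }
    print(git_blob_digest(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
    print(len(digest_value_set(repo)))  # 2
```

Because the header includes the byte length and the hash covers every byte, two files share a digest only when their contents are identical, which is what lets one target digest value stand for every copy of the same content.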
Step 103: store the at least one code file of the target code library according to the digest value set, wherein the file content of only one code file is stored for each target digest value.
For each target digest value in the digest value set, the code file corresponding to that target digest value is stored. Only one code file is stored per target digest value: if one target digest value corresponds to a plurality of code files, those files have the same file content, so the file content of only one of them is stored. This avoids repeated storage and reduces wasted storage resources.
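Step 103 can be sketched as follows, assuming the code files are held in memory as a mapping from path to bytes and using SHA-256 from Python's standard hashlib module as the hash (an assumption; any collision-resistant content hash would serve). The function name store_deduplicated is hypothetical.

```python
import hashlib


def store_deduplicated(files: dict[str, bytes]) -> dict[str, bytes]:
    """Keep the content of only one file per target digest value.

    `files` maps a file path to its content. The result maps each target
    digest value to a single stored copy of the shared content.
    """
    store: dict[str, bytes] = {}
    for path in sorted(files):  # deterministic choice of the stored copy
        content = files[path]
        digest = hashlib.sha256(content).hexdigest()
        # setdefault keeps the first copy and ignores later duplicates
        store.setdefault(digest, content)
    return store


if __name__ == "__main__":
    files = {
        "main/app.py": b"print('hi')\n",
        "dev/app.py": b"print('hi')\n",
        "main/lib.py": b"x = 1\n",
    }
    stored = store_deduplicated(files)
    print(len(files), "files ->", len(stored), "stored contents")  # 3 files -> 2 stored contents
```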
In this embodiment, code files with the same file content in the target code library share one target digest value, and the file content of only one code file is stored per target digest value. Identical file content is therefore stored only once, which avoids repeated storage, reduces wasted storage resources, and lowers storage cost.
In one embodiment of the present application, storing the at least one code file of the target code library according to the digest value set in step 103 includes:
for each target digest value in the digest value set, matching the target digest value against a code index of a database, the code index comprising existing digest values and the existing code files corresponding to them; and
if the target digest value fails to match any existing digest value in the code index, storing the target digest value and the file content of one code file corresponding to the target digest value in the code index.
Specifically, the database is used to store the file contents of code files and may be a distributed storage and search engine (ES). The database includes a code index. The code index includes the file contents of code files, specifically snapshots of those file contents, and a field that stores the digest value of each code file, the digest value being determined from the file content. If a plurality of code files have the same file content, they also have the same digest value, and only one copy exists in the code index, so repeated storage of the same file content is avoided. Conversely, if the target digest value successfully matches an existing digest value of the code index, the target digest value already exists in the code index, and no new data needs to be added to it.
For ease of distinction, in this embodiment the digest values contained in the code index are referred to as existing digest values, and the code files contained in the code index are referred to as existing code files.
For one target digest value in the digest value set, the target digest value is matched against the existing digest values of the code index. If it differs from every existing digest value, the match fails, and the target digest value and the file content of one code file corresponding to it are stored in the code index. That is, if several code files correspond to the target digest value, their file contents are identical, and only the file content of one of them needs to be stored. This processing is performed for each target digest value in the digest value set.
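A sketch of the matching step, assuming a plain Python dict (digest value to file content) as a stand-in for the database's code index and SHA-256 as the hash; in a real deployment the membership check and the write would be queries against the search engine. The name update_code_index is hypothetical.

```python
import hashlib


def update_code_index(code_index: dict[str, bytes], files: dict[str, bytes]) -> list[str]:
    """Add contents whose digest value is not yet in the code index.

    `code_index` stands in for the database's code index (digest -> content).
    Returns the digest values that were newly added.
    """
    # Group the library's files by digest: one representative content each.
    by_digest: dict[str, bytes] = {}
    for content in files.values():
        by_digest.setdefault(hashlib.sha256(content).hexdigest(), content)

    added = []
    for digest, content in by_digest.items():
        if digest in code_index:
            continue  # matched an existing digest value: nothing to add
        code_index[digest] = content  # match failed: store digest + one content
        added.append(digest)
    return added


if __name__ == "__main__":
    index: dict[str, bytes] = {}
    print(len(update_code_index(index, {"a.py": b"x", "b.py": b"x"})))  # 1
    print(len(update_code_index(index, {"c.py": b"x", "d.py": b"y"})))  # 1
    print(len(index))  # 2
```

The second call illustrates the benefit across libraries: content already indexed from an earlier library matches an existing digest value and is not stored again.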
In this embodiment, repeated storage of code files with the same file content is avoided and storage space is saved. The search range is also narrowed when searching for code files, because fewer inverted-index entries and code files need to be accessed during a query. Each entry in the code index consists of a target digest value and the file content of a code file, so the file content can be looked up by the target digest value; in this sense the code index can be regarded as an inverted index. Further, the file content stored in the code index may be the file content itself or the address of the file content.
In this embodiment, when the target digest value fails to match any existing digest value of the code index, the target digest value and the file content of one corresponding code file are stored in the code index, which avoids repeated storage of the same file content and reduces wasted storage space. Because code files with the same file content share a single stored copy in the code index, the amount of data in the code index is reduced, which narrows the search range and improves search efficiency when code files are searched in the code index.
In one embodiment of the present application, storing the at least one code file of the target code library according to the digest value set in step 103 includes:
for each target digest value in the digest value set, storing the target digest value and the file information of the one or more code files in the target code library corresponding to it in a file index of the database.
The database further includes a file index, which may include two fields: a first field storing the file information of a code file and a second field storing the digest value of the code file. That is, the file index stores the file information and the digest value of each code file. The file information, also referred to as metadata, may include the name of the code library containing the code file, the repository address, the branch containing the code file, the file name of the code file, the time the code file was committed to the code library, the committer, and the like. The digest value can be regarded as a pointer to the code content, also called a foreign key: the file content of a code file can be retrieved from the code index by its digest value, and the file information of the code file can be retrieved from the file index by the same digest value.
FIG. 2 shows the relationship between the file index and the code index. The two indexes are associated through digest values: several pieces of file information in the file index may correspond to one file content in the code index, and conversely, one file content in the code index may correspond to several pieces of file information in the file index.
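The association in FIG. 2 can be sketched with in-memory stand-ins. The FileInfo record, its fields (chosen from the metadata listed above), the helper names, and placeholder digest strings such as "d1" are all hypothetical; the sketch only illustrates that lookups in both directions go through the digest value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical metadata record for one entry of the file index."""
    library: str
    branch: str
    path: str
    digest: str  # foreign key into the code index


def content_of(info: FileInfo, code_index: dict[str, str]) -> str:
    """File index -> code index: follow the digest to the stored content."""
    return code_index[info.digest]


def files_sharing(digest: str, file_index: list[FileInfo]) -> list[FileInfo]:
    """Code index -> file index: every file whose content has this digest."""
    return [info for info in file_index if info.digest == digest]


if __name__ == "__main__":
    code_index = {"d1": "def f(): pass\n"}  # placeholder digest value
    file_index = [
        FileInfo("repo-a", "main", "src/f.py", "d1"),
        FileInfo("repo-a", "dev", "src/f.py", "d1"),
    ]
    print(content_of(file_index[1], code_index) == code_index["d1"])  # True
    print(len(files_sharing("d1", file_index)))  # 2
```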
In this embodiment, if the target digest value fails to match any existing digest value of the code index, then in addition to storing the target digest value and the file content of one corresponding code file in the code index, the target digest value and the file information of the one or more code files in the target code library corresponding to it must also be stored in the file index.
For example, suppose code file A and code file B have the same file content, so both have target digest value a, and the code index contains no digest value equal to a. Then target digest value a and the file content of code file A (or code file B) are stored in the code index, and two new records are stored in the file index: (target digest value a, file information of code file A) and (target digest value a, file information of code file B).
If the target digest value successfully matches an existing digest value of the code index, no new data needs to be added to the code index, but the target digest value and the file information of the one or more corresponding code files in the target code library are still stored in the file index.
For example, suppose code file A and code file B have the same file content, so both have target digest value a, and the code index already contains a digest value equal to a. Then only two new records are stored in the file index: (target digest value a, file information of code file A) and (target digest value a, file information of code file B).
In other words, regardless of whether the target digest value matches an existing digest value of the code index, for each target digest value in the digest value set, the target digest value and the file information of the one or more corresponding code files in the target code library are stored in the file index. This step and the step of matching the target digest value against the code index of the database may be performed in either order.
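The rule above (file index always updated, code index only on a failed match) can be sketched as follows. Assumptions: SHA-256 as the hash, a dict as the code index, a list of (digest value, path) pairs as the file index with the path standing in for the full file information, and the hypothetical function name ingest. For simplicity the sketch iterates per file rather than per target digest value, which produces the same records.

```python
import hashlib


def ingest(files: dict[str, bytes],
           code_index: dict[str, bytes],
           file_index: list[tuple[str, str]]) -> None:
    """Store a library's files: code index on a miss, file index always.

    `files` maps a path (standing in for full file information) to content;
    `file_index` receives (digest, path) records.
    """
    for path, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest not in code_index:       # match failed
            code_index[digest] = content   # store one copy of the content
        file_index.append((digest, path))  # recorded whether or not it matched


if __name__ == "__main__":
    code_index: dict[str, bytes] = {}
    file_index: list[tuple[str, str]] = []
    ingest({"A": b"same\n", "B": b"same\n"}, code_index, file_index)
    print(len(code_index), len(file_index))  # 1 2
    ingest({"A2": b"same\n", "B2": b"same\n"}, code_index, file_index)
    print(len(code_index), len(file_index))  # 1 4
```

The first call corresponds to the first example (no matching digest value, so one content copy and two file-index records); the second call corresponds to the second example (a match, so only file-index records are added).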
As shown in FIG. 3, a code file parser parses the code files in the target code library and adds new data to the file index according to the parsing result: for example, it obtains the file content, file information, and digest value of each code file, and stores the file information and digest value in the file index. Whether new data is added to the code index is also determined from the parsing result. As shown in FIG. 4, when new data is to be added to the code index, it is determined whether the target digest value obtained by parsing already exists in the digest value set; if not, the target digest value is added to the digest value set, and the file contents of the code files are then stored in the database based on the digest value set.
In this embodiment, for each target digest value in the digest value set, the target digest value and the file information of the one or more corresponding code files in the target code library are stored in the file index, so that the code files are fully stored and no information about them is lost.
In one embodiment of the present application, the target code library includes a default branch and non-default branches; the default branch includes first code files among the at least one code file, and the non-default branches include second code files, namely the code files among the at least one code file other than the first code files;
obtaining a digest value set according to the at least one code file of the target code library in step 102 includes:
for each first target code file among the first code files, obtaining the digest value of the first target code file according to its file content;
if the digest value set does not include the digest value of the first target code file, adding that digest value to the digest value set;
acquiring second target code files among the second code files, wherein the file content of each second target code file differs from the file content of every first target code file among the first code files;
for each second target code file, obtaining the digest value of the second target code file according to its file content; and
adding the digest value of the second target code file to the digest value set.
Specifically, the target code library includes at least one branch. If it includes only one branch, that branch is the default branch; if it includes several branches, one of them is the default branch and the others are non-default branches. The default branch includes the first code files, and the non-default branches include the second code files, which are the code files among the at least one code file other than the first code files.
In this embodiment, the code files in the default branch are processed first. Specifically, for one first target code file among the first code files, its digest value is obtained from its file content, and if the digest value set does not include that digest value, the digest value is added to the set. This processing is performed for each first target code file; for example, the first code files are traversed and parsed one by one to obtain the file content, file information, and digest value of each, the digest value being obtainable through the git hash-object command. The target code library corresponds to one digest value set, i.e., a set of git_blobs.
The code files in the non-default branches are then processed. Specifically, the second target code files among the second code files are acquired; for one second target code file, its digest value is obtained from its file content and added to the digest value set. This processing is performed for each second target code file. If there are several non-default branches, they are processed one after another in the same way.
The file content of each second target code file differs from that of every first target code file. In other words, the first code files of the default branch serve as the reference: the file content of each second code file in a non-default branch is compared with the file contents of the first code files, and the code files in the non-default branch whose content differs from all of them are determined to be second target code files. For example, a git command may be used to obtain the code files in a non-default branch whose content differs from the default branch (i.e., the second target code files); the file contents of these code files are then stored, that is, their digest values are obtained and added to the digest value set. Because the git command returns these differing code files directly, engineers do not need to write additional code to traverse and compare the code files in the non-default branch.
In this embodiment, each first target code file of the default branch is parsed so that its digest value is included in the digest value set. Then, with the first code files of the default branch as the reference, the second code files of the non-default branches are compared against them to obtain the second target code files, whose digest values are added to the digest value set. As a result, code files with the same file content correspond to a single digest value in the set, and when code files are stored according to the digest values in the set, repeated storage of the same file content is avoided and wasted storage space is reduced.
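A minimal sketch of this embodiment, assuming each branch is modelled in memory as a mapping from path to file content and SHA-256 as the hash. The source obtains the differing files with an unspecified git command; here the comparison against the default branch's file contents is done directly instead, and the function name digest_set_for_library is hypothetical.

```python
import hashlib


def digest_set_for_library(default_branch: dict[str, bytes],
                           other_branches: dict[str, dict[str, bytes]]) -> set[str]:
    """Build a library's digest value set, default branch first.

    Branches are modelled as {path: content} dicts. A file from a non-default
    branch is a second target code file only if its content differs from
    every file content in the default branch.
    """
    digests: set[str] = set()
    for content in default_branch.values():  # first code files
        digests.add(hashlib.sha256(content).hexdigest())  # set ignores repeats

    reference = set(default_branch.values())  # default-branch contents
    for name in sorted(other_branches):  # non-default branches, one by one
        for content in other_branches[name].values():
            if content not in reference:  # differs from every first file
                digests.add(hashlib.sha256(content).hexdigest())
    return digests


if __name__ == "__main__":
    main = {"a.py": b"v1\n", "b.py": b"shared\n"}
    branches = {"dev": {"a.py": b"v2\n", "b.py": b"shared\n"},
                "fix": {"a.py": b"v2\n"}}
    print(len(digest_set_for_library(main, branches)))  # 3
```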
In one embodiment of the present application, acquiring the second target code files among the second code files includes:
for each candidate code file among the second code files, determining the digest value of the candidate code file according to its file content; and
if the digest value of the candidate code file differs from the digest value of every first target code file among the first code files, determining the candidate code file to be a second target code file.
In the process of determining the second target code files, the digest value of one candidate code file among the second code files is determined from its file content; if that digest value differs from the digest value of every first target code file, the candidate code file is determined to be a second target code file. These steps are performed for each candidate code file among the second code files, yielding the second target code files.
In this embodiment, candidate code files whose digest values differ from those of all first target code files are determined to be second target code files. As a result, code files with the same file content correspond to a single digest value in the digest value set, and when code files are stored according to the digest values in the set, repeated storage of the same file content is avoided and wasted storage space is reduced.
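The digest-based selection of second target code files might look like the sketch below, assuming SHA-256 as the hash and a Python set holding the digest values of the first target code files, so each candidate is checked in constant average time instead of being compared with every first file. The function name second_target_files is hypothetical.

```python
import hashlib


def second_target_files(first_digests: set[str],
                        second_files: dict[str, bytes]) -> dict[str, str]:
    """Return {path: digest} for candidate files whose digest is new.

    `first_digests` holds the digest values of the default branch's first
    target code files.
    """
    targets: dict[str, str] = {}
    for path, content in second_files.items():  # each candidate code file
        digest = hashlib.sha256(content).hexdigest()
        if digest not in first_digests:  # differs from every first file
            targets[path] = digest
    return targets


if __name__ == "__main__":
    first = {hashlib.sha256(b"shared\n").hexdigest()}
    candidates = {"b.py": b"shared\n", "c.py": b"new\n"}
    print(sorted(second_target_files(first, candidates)))  # ['c.py']
```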
In one embodiment of the present application, the target code library includes a default branch and a non-default branch; the default branch includes first code files among the at least one code file, and the non-default branch includes second code files among the at least one code file other than the first code files;
step 102 of obtaining a digest value set according to the at least one code file of the target code library includes:
for each first target code file in the first code files, obtaining the digest value of the first target code file according to its file content;
if the digest value set does not include the digest value of the first target code file, adding that digest value to the digest value set;
for each third target code file in the second code files, obtaining the digest value of the third target code file according to its file content;
and if the digest value set does not include the digest value of the third target code file, adding that digest value to the digest value set.
In this embodiment, each first target code file in the first code files is processed first, and any digest value of a first target code file not yet included in the digest value set is added to it; the second code files are then processed, and any digest value of a third target code file not yet included in the digest value set is added to it. In this way, code files with the same file content correspond to a single digest value in the digest value set, so that when the code files are stored according to the digest values in the digest value set, repeated storage of the same file content is avoided and waste of storage space is reduced.
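The two-pass construction of the digest value set in this embodiment might look like the following Python sketch. It is illustrative only: SHA-256 and the dict-per-branch layout are assumptions, and the function names are hypothetical.

```python
import hashlib
from typing import Dict, Set


def digest_of(content: bytes) -> str:
    """Digest value of a file's content (SHA-256 assumed for illustration)."""
    return hashlib.sha256(content).hexdigest()


def build_digest_set(first_files: Dict[str, bytes],
                     second_files: Dict[str, bytes]) -> Set[str]:
    """Process the default branch first, then every file of the non-default
    branch (the 'third target code files'), adding unseen digests only."""
    digest_set: Set[str] = set()
    for files in (first_files, second_files):  # default branch before non-default
        for content in files.values():
            digest = digest_of(content)
            # The explicit check mirrors the described step; set.add alone
            # would also ignore duplicates.
            if digest not in digest_set:
                digest_set.add(digest)
    return digest_set


default_branch = {"a.py": b"A", "b.py": b"B"}
feature_branch = {"a.py": b"A", "c.py": b"B", "d.py": b"D"}
print(len(build_digest_set(default_branch, feature_branch)))  # 3
```

Five files hold only three distinct contents (A, B, D), so the set ends up with three digest values; `c.py` shares a digest with `b.py` despite its different path.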
Further, if a plurality of code libraries are obtained, each of them is determined as the target code library in turn and processed by the method provided in this embodiment, so that each target code library corresponds to its own digest value set.
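Handling several code libraries one after another could be sketched as below. The nested-dict layout (library, branch, path, content) and SHA-256 are assumptions made for illustration; because a set's contents do not depend on insertion order, this sketch does not model the default-branch-first ordering.

```python
import hashlib
from typing import Dict, Set

# Hypothetical layout: library name -> branch name -> file path -> file content.
Libraries = Dict[str, Dict[str, Dict[str, bytes]]]


def digest_sets_per_library(libraries: Libraries) -> Dict[str, Set[str]]:
    """Treat each library in turn as the target code library and build its
    own digest value set (SHA-256 assumed as the digest function)."""
    result: Dict[str, Set[str]] = {}
    for name, branches in libraries.items():
        digests: Set[str] = set()
        for files in branches.values():
            digests.update(hashlib.sha256(c).hexdigest() for c in files.values())
        result[name] = digests
    return result


libs: Libraries = {
    "repo1": {"master": {"a.py": b"x"}, "dev": {"a.py": b"x", "b.py": b"y"}},
    "repo2": {"master": {"a.py": b"x"}},
}
sets = digest_sets_per_library(libs)
print(len(sets["repo1"]), len(sets["repo2"]))  # 2 1
```

Here `repo2` repeats a digest already present for `repo1`. In this sketch such cross-library duplicates would only be resolved when the digests are matched against the existing digest values of the code index; that reading is an interpretation of the storage step, not something this paragraph states.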
Referring to fig. 5, fig. 5 is a block diagram of a code file storage device according to an embodiment of the present application. As shown in fig. 5, this embodiment provides a code file storage device 500, including:
a first obtaining module 501, configured to obtain a target code library, where the target code library includes at least one code file;
a second obtaining module 502, configured to obtain a digest value set according to the at least one code file of the target code library, where the digest value set includes at least one target digest value, and a plurality of code files with the same file content correspond to one target digest value;
and a storage module 503, configured to store the at least one code file of the target code library according to the digest value set, where only the file content of one code file is stored for each target digest value.
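The division of device 500 into three modules can be pictured with a hypothetical Python class that has one method per module. The class and method names, the in-memory dict used as storage, and SHA-256 are all assumptions for illustration.

```python
import hashlib
from typing import Dict


class CodeFileStorageDevice:
    """Hypothetical sketch of device 500: one method per module."""

    def __init__(self) -> None:
        # Digest value -> the single stored file content for that digest.
        self.content_store: Dict[str, bytes] = {}

    def obtain_library(self, files: Dict[str, bytes]) -> Dict[str, bytes]:
        """First obtaining module 501: here it simply copies path -> content."""
        return dict(files)

    def obtain_digest_set(self, library: Dict[str, bytes]) -> Dict[str, bytes]:
        """Second obtaining module 502: map each target digest value to the
        content of one code file having that digest."""
        digests: Dict[str, bytes] = {}
        for content in library.values():
            digests.setdefault(hashlib.sha256(content).hexdigest(), content)
        return digests

    def store(self, digests: Dict[str, bytes]) -> None:
        """Storage module 503: keep only one file content per digest value."""
        for digest, content in digests.items():
            self.content_store.setdefault(digest, content)


device = CodeFileStorageDevice()
library = device.obtain_library({"a.py": b"same", "b.py": b"same", "c.py": b"other"})
device.store(device.obtain_digest_set(library))
print(len(device.content_store))  # 2
```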
In one embodiment of the present application, the storage module 503 includes:
a matching sub-module, configured to match, for each target digest value in the digest value set, the target digest value against a code index of a database, where the code index includes existing digest values and the existing code files corresponding to them;
and a first storage sub-module, configured to store the target digest value and the file content of one code file corresponding to the target digest value in the code index if the target digest value fails to match any existing digest value of the code index.
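The matching and first storage sub-modules can be sketched as a check-then-insert against the code index. In this minimal sketch an in-memory dict stands in for the database's code index and the function name is hypothetical.

```python
import hashlib
from typing import Dict, List


def store_in_code_index(code_index: Dict[str, bytes],
                        digest_to_content: Dict[str, bytes]) -> List[str]:
    """Match each target digest value against the code index and store the
    digest plus one file's content only when no existing digest matches.
    Returns the digest values that were newly written."""
    written: List[str] = []
    for target_digest, content in digest_to_content.items():
        if target_digest in code_index:  # match succeeded: content already stored
            continue
        code_index[target_digest] = content  # match failed: store one copy
        written.append(target_digest)
    return written


# An in-memory dict stands in for the database's code index.
code_index = {hashlib.sha256(b"old").hexdigest(): b"old"}
targets = {hashlib.sha256(c).hexdigest(): c for c in (b"old", b"new")}
written = store_in_code_index(code_index, targets)
print(len(written), len(code_index))  # 1 2
```

With a real database, the check and the insert would typically need to be atomic (for example, a unique key on the digest column) so that concurrent writers cannot store the same content twice; that is a general design consideration rather than a detail given in the text.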
In one embodiment of the present application, the storage module 503 includes:
a second storage sub-module, configured to store, for each target digest value in the digest value set, the target digest value and the file information of the one or more code files in the target code library corresponding to it in a file index of the database.
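A file index that keeps the information of every code file under its digest value could look like the sketch below. The fields of the hypothetical `FileInfo` record (library, branch, path) are assumptions, since the text does not say what "file information" contains; SHA-256 is likewise assumed.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, Dict, List


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical file information; the text does not list its fields."""
    library: str
    branch: str
    path: str


def build_file_index(library: str,
                     branches: Dict[str, Dict[str, bytes]]) -> Dict[str, List[FileInfo]]:
    """Map each target digest value to the file information of every code file
    in the library whose content has that digest (SHA-256 assumed)."""
    file_index: DefaultDict[str, List[FileInfo]] = defaultdict(list)
    for branch, files in branches.items():
        for path, content in files.items():
            digest = hashlib.sha256(content).hexdigest()
            file_index[digest].append(FileInfo(library, branch, path))
    return dict(file_index)


index = build_file_index("repo1", {"master": {"a.py": b"x"},
                                   "dev": {"a.py": b"x", "b.py": b"y"}})
print(len(index))  # 2
print(len(index[hashlib.sha256(b"x").hexdigest()]))  # 2
```

Content is stored once in the code index, while every file keeps its own entry here, which is why no code file information is lost.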
In one embodiment of the present application, the target code library includes a default branch and a non-default branch; the default branch includes first code files among the at least one code file, and the non-default branch includes second code files among the at least one code file other than the first code files;
the second obtaining module 502 includes:
a first obtaining sub-module, configured to obtain, for each first target code file in the first code files, the digest value of the first target code file according to its file content;
a first adding sub-module, configured to add the digest value of the first target code file to the digest value set if the digest value set does not include it;
a second obtaining sub-module, configured to obtain second target code files among the second code files, where the file content of each second target code file differs from the file content of every first target code file in the first code files;
a third obtaining sub-module, configured to obtain, for each second target code file, the digest value of the second target code file according to its file content;
and a second adding sub-module, configured to add the digest value of the second target code file to the digest value set.
In one embodiment of the present application, the second obtaining sub-module is configured to:
for each code file to be determined in the second code files, determine the digest value of the code file to be determined according to its file content;
and if the digest value of the code file to be determined differs from the digest value of every first target code file in the first code files, determine the code file to be determined as a second target code file.
The code file storage device 500 can implement each process implemented by the electronic device in the method embodiment shown in fig. 1; to avoid repetition, these processes are not described again here.
The code file storage device 500 of this embodiment of the present application obtains a target code library that includes at least one code file; obtains a digest value set according to the at least one code file of the target code library, where the digest value set includes at least one target digest value and a plurality of code files with the same file content correspond to one target digest value; and stores the at least one code file of the target code library according to the digest value set, storing only the file content of one code file for each target digest value. Therefore, if several code files in the target code library have the same file content, that content is stored only once, which avoids repeated storage of the same file content, reduces waste of storage resources, and lowers storage cost.
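The storage saving can be made concrete with a small worked example. The hypothetical helper below compares the bytes needed to store every file's content with the bytes needed when only one content per digest value is kept. It assumes SHA-256 and deliberately ignores the size of index entries (digests and file information).

```python
import hashlib
from typing import Dict, Tuple


def storage_footprint(files: Dict[str, bytes]) -> Tuple[int, int]:
    """Return (bytes if every file's content is stored,
               bytes if only one content per digest value is stored).
    Index entries (digests, file information) are not counted."""
    naive = sum(len(content) for content in files.values())
    unique: Dict[str, int] = {}
    for content in files.values():
        unique[hashlib.sha256(content).hexdigest()] = len(content)
    return naive, sum(unique.values())


files = {
    "master/a.py": b"x" * 1000,
    "dev/a.py": b"x" * 1000,   # same content as master/a.py
    "dev/b.py": b"y" * 200,
}
print(storage_footprint(files))  # (2200, 1200)
```

The 1000-byte file that appears on both branches is counted once, so 2200 bytes of file content shrink to 1200 bytes.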
According to embodiments of the present application, an electronic device and a readable storage medium are also provided.
Fig. 6 is a block diagram of an electronic device for implementing the code file storage method according to an embodiment of the present application. The electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 6, the electronic device includes one or more processors 601, a memory 602, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected by different buses and may be mounted on a common motherboard or in other manners as required. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory, to display graphical information of a GUI on an external input/output device, such as a display device coupled to an interface. In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, if desired. Likewise, multiple electronic devices may be connected, each providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multiprocessor system). One processor 601 is taken as an example in fig. 6.
The memory 602 is the non-transitory computer-readable storage medium provided by the present application. The memory stores instructions executable by at least one processor, so that the at least one processor performs the code file storage method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the code file storage method provided by the present application.
As a non-transitory computer-readable storage medium, the memory 602 may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the code file storage method in the embodiments of the present application (e.g., the first obtaining module 501, the second obtaining module 502, and the storage module 503 shown in fig. 5). The processor 601 executes the various functional applications and data processing of the server by running the non-transitory software programs, instructions, and modules stored in the memory 602, that is, it implements the code file storage method in the above method embodiments.
The memory 602 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created through use of the electronic device implementing the code file storage method, and the like. In addition, the memory 602 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 602 may optionally include memory located remotely from the processor 601, and such remote memory may be connected through a network to the electronic device implementing the code file storage method. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device implementing the code file storage method may further include an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603, and the output device 604 may be connected by a bus or in other manners; connection by a bus is taken as an example in fig. 6.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device implementing the code file storage method; examples include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a trackball, and a joystick. The output device 604 may include a display device, an auxiliary lighting device (e.g., LEDs), a tactile feedback device (e.g., a vibration motor), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described herein can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solutions of the embodiments of the present application, a target code library including at least one code file is obtained; a digest value set is obtained according to the at least one code file of the target code library, where the digest value set includes at least one target digest value and a plurality of code files with the same file content correspond to one target digest value; and the at least one code file of the target code library is stored according to the digest value set, with only the file content of one code file stored for each target digest value. Therefore, if several code files in the target code library have the same file content, that content is stored only once, which avoids repeated storage of the same file content, reduces waste of storage resources, and lowers storage cost.
In addition, when the target digest value fails to match any existing digest value of the code index, the target digest value and the file content of one code file corresponding to it are stored in the code index, which avoids repeated storage of the same file content and reduces waste of storage space. Because code files with the same file content have only one copy of that content stored in the code index, the amount of data in the code index is reduced; consequently, when code files are searched in the code index, the search range is narrowed and search efficiency is improved.
For each target digest value in the digest value set, the target digest value and the file information of the one or more code files in the target code library corresponding to it are stored in the file index, so that every code file is recorded and no code file information is lost.
Each first target code file among the first code files of the default branch is analyzed so that its digest value is included in the digest value set; then, taking the first code files of the default branch as a reference, the second code files of the non-default branch are compared with the first code files to obtain the second target code files, whose digest values are added to the digest value set. In this way, code files with the same file content correspond to a single digest value in the digest value set, so that when the code files are stored according to the digest values in the digest value set, repeated storage of the same file content is avoided and waste of storage space is reduced.
For each code file to be determined in the second code files, its digest value is determined according to its file content; if that digest value differs from the digest value of every first target code file in the first code files, the code file to be determined is determined as a second target code file. In this way, code files with the same file content correspond to a single digest value in the digest value set, so that when the code files are stored according to the digest values in the digest value set, repeated storage of the same file content is avoided and waste of storage space is reduced.
It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
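The narrower search range can be illustrated with a simple substring search that scans each distinct content once. Expanding each hit into every matching file path through the file index is an assumption about how the two indexes might be used together; the text only states that the code index is searched. Names and the dict-based indexes are hypothetical.

```python
import hashlib
from typing import Dict, List


def search(code_index: Dict[str, bytes],
           file_index: Dict[str, List[str]],
           keyword: bytes) -> List[str]:
    """Scan each distinct file content once in the code index, then expand
    every hit to all file paths sharing that digest via the file index."""
    hits: List[str] = []
    for digest, content in code_index.items():  # one scan per unique content
        if keyword in content:
            hits.extend(file_index.get(digest, []))
    return hits


content = b"def foo():\n    pass\n"
digest = hashlib.sha256(content).hexdigest()
code_index = {digest: content}
file_index = {digest: ["master/a.py", "dev/a.py"]}
print(search(code_index, file_index, b"foo"))  # ['master/a.py', 'dev/a.py']
```

The content shared by two branches is scanned once rather than twice, yet both file paths are still reported.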
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (10)

1. A code file storage method, comprising:
obtaining a target code library, wherein the target code library comprises at least one code file;
obtaining a digest value set according to the at least one code file of the target code library, wherein the digest value set comprises at least one target digest value, and a plurality of code files with the same file content correspond to one target digest value;
storing the at least one code file of the target code library according to the digest value set, wherein only the file content of one code file is stored for each target digest value;
wherein the target code library comprises a default branch and a non-default branch; the default branch comprises first code files among the at least one code file, and the non-default branch comprises second code files among the at least one code file other than the first code files;
the obtaining a digest value set according to the at least one code file of the target code library comprises:
for each first target code file in the first code files, obtaining the digest value of the first target code file according to the file content of the first target code file;
if the digest value set does not comprise the digest value of the first target code file, adding the digest value of the first target code file to the digest value set;
obtaining second target code files among the second code files, wherein the file content of each second target code file differs from the file content of every first target code file in the first code files;
for each second target code file, obtaining the digest value of the second target code file according to the file content of the second target code file; and
adding the digest value of the second target code file to the digest value set.
2. The code file storage method according to claim 1, wherein the storing the at least one code file of the target code library according to the digest value set comprises:
for each target digest value in the digest value set, matching the target digest value against a code index of a database, the code index comprising existing digest values and the existing code files corresponding to the existing digest values; and
if the target digest value fails to match the existing digest values of the code index, storing the target digest value and the file content of one code file corresponding to the target digest value in the code index.
3. The code file storage method according to claim 1, wherein the storing the at least one code file of the target code library according to the digest value set comprises:
for each target digest value in the digest value set, storing the target digest value and file information of one or more code files in the target code library corresponding to the target digest value in a file index of a database.
4. The code file storage method according to claim 1, wherein the obtaining second target code files among the second code files comprises:
for each code file to be determined in the second code files, determining the digest value of the code file to be determined according to the file content of the code file to be determined; and
if the digest value of the code file to be determined differs from the digest value of every first target code file in the first code files, determining the code file to be determined as a second target code file.
5. A code file storage device, comprising:
a first obtaining module, configured to obtain a target code library, wherein the target code library comprises at least one code file;
a second obtaining module, configured to obtain a digest value set according to the at least one code file of the target code library, wherein the digest value set comprises at least one target digest value, and a plurality of code files with the same file content correspond to one target digest value;
a storage module, configured to store the at least one code file of the target code library according to the digest value set, wherein only the file content of one code file is stored for each target digest value;
wherein the target code library comprises a default branch and a non-default branch; the default branch comprises first code files among the at least one code file, and the non-default branch comprises second code files among the at least one code file other than the first code files;
the second obtaining module comprises:
a first obtaining sub-module, configured to obtain, for each first target code file in the first code files, the digest value of the first target code file according to the file content of the first target code file;
a first adding sub-module, configured to add the digest value of the first target code file to the digest value set if the digest value set does not comprise the digest value of the first target code file;
a second obtaining sub-module, configured to obtain second target code files among the second code files, wherein the file content of each second target code file differs from the file content of every first target code file in the first code files;
a third obtaining sub-module, configured to obtain, for each second target code file, the digest value of the second target code file according to the file content of the second target code file; and
a second adding sub-module, configured to add the digest value of the second target code file to the digest value set.
6. The code file storage device according to claim 5, wherein the storage module comprises:
a matching sub-module, configured to match, for each target digest value in the digest value set, the target digest value against a code index of a database, wherein the code index comprises existing digest values and the existing code files corresponding to the existing digest values; and
a first storage sub-module, configured to store the target digest value and the file content of one code file corresponding to the target digest value in the code index if the target digest value fails to match the existing digest values of the code index.
7. The code file storage device according to claim 5, wherein the storage module comprises:
a second storage sub-module, configured to store, for each target digest value in the digest value set, the target digest value and file information of one or more code files in the target code library corresponding to the target digest value in a file index of a database.
8. The code file storage device according to claim 5, wherein the second obtaining sub-module is configured to:
for each code file to be determined in the second code files, determine the digest value of the code file to be determined according to the file content of the code file to be determined; and
if the digest value of the code file to be determined differs from the digest value of every first target code file in the first code files, determine the code file to be determined as a second target code file.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-4.
10. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to perform the method of any one of claims 1-4.
CN202010305891.7A 2020-04-17 2020-04-17 Code file storage method and device and electronic equipment Active CN111506268B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010305891.7A CN111506268B (en) 2020-04-17 2020-04-17 Code file storage method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111506268A CN111506268A (en) 2020-08-07
CN111506268B true CN111506268B (en) 2023-07-18

Family

ID=71872821

Country Status (1)

Country Link
CN (1) CN111506268B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104156376A (en) * 2013-05-15 2014-11-19 腾讯科技(深圳)有限公司 Storage method, device and server for file
CN106557571A (en) * 2016-11-23 2017-04-05 福建亿榕信息技术有限公司 A kind of data duplicate removal method and device based on K V storage engines
CN106708927A (en) * 2016-11-18 2017-05-24 北京二六三企业通信有限公司 Duplicate removal processing method and duplicate removal processing device for files
CN106843773A (en) * 2017-02-16 2017-06-13 天津书生云科技有限公司 Storage method and distributed storage system
CN107193498A (en) * 2017-05-25 2017-09-22 山东浪潮商用系统有限公司 A kind of method and device that data are carried out with deduplication processing

Family Cites Families (12)

Publication number Priority date Publication date Assignee Title
US7814056B2 (en) * 2004-05-21 2010-10-12 Computer Associates Think, Inc. Method and apparatus for data backup using data blocks
US7725437B2 (en) * 2007-07-31 2010-05-25 Hewlett-Packard Development Company, L.P. Providing an index for a data store
CN102024002A (en) * 2009-09-10 2011-04-20 上海中信信息发展股份有限公司 Safe storage method and system of filing of electronic documents
US10140308B2 (en) * 2012-03-06 2018-11-27 International Business Machines Corporation Enhancing data retrieval performance in deduplication systems
CN102880671A (en) * 2012-09-07 2013-01-16 浪潮电子信息产业股份有限公司 Method for actively deleting repeated data of distributed file system
US9697223B2 (en) * 2013-07-08 2017-07-04 International Business Machines Corporation Providing identifiers to data files in a data deduplication system
US9892048B2 (en) * 2013-07-15 2018-02-13 International Business Machines Corporation Tuning global digests caching in a data deduplication system
US9892127B2 (en) * 2013-07-15 2018-02-13 International Business Machines Corporation Global digests caching in a data deduplication system
US9619167B2 (en) * 2013-11-27 2017-04-11 Intel Corporation System and method for computing message digests
US10235080B2 (en) * 2017-06-06 2019-03-19 Saudi Arabian Oil Company Systems and methods for assessing upstream oil and gas electronic data duplication
US10387066B1 (en) * 2018-04-18 2019-08-20 EMC IP Holding Company LLC Providing data deduplication in a data storage system with parallelized computation of crypto-digests for blocks of host I/O data
CN110175155B (en) * 2019-06-03 2023-06-13 武汉纺织大学 File deduplication processing method and system

Similar Documents

Publication Publication Date Title
US8788471B2 (en) Matching transactions in multi-level records
CN111666206B (en) Method, device, equipment and storage medium for acquiring influence range of change code
CN111523001B (en) Method, device, equipment and storage medium for storing data
EP3832493B1 (en) Method, apparatus, electronic device and readable storage medium for data query
US20210133217A1 (en) Method and apparatus for importing data into graph database, electronic device and medium
CN110597797A (en) Table space debris recovery method and device, electronic equipment and storage medium
CN111274127B (en) Code jumping method, device, equipment and medium in code evaluation
CN111125176A (en) Service data searching method and device, electronic equipment and storage medium
CN108959294B (en) Method and device for accessing search engine
CN111931524B (en) Method, apparatus, device and storage medium for outputting information
CN111782633B (en) Data processing method and device and electronic equipment
CN111767442B (en) Data updating method, device, search server, terminal and storage medium
CN111259058B (en) Data mining method, data mining device and electronic equipment
CN111639116B (en) Data access connection session protection method and device
CN110688837B (en) Data processing method and device
CN111506268B (en) Code file storage method and device and electronic equipment
CN111966846A (en) Image query method and device, electronic equipment and storage medium
CN111177479A (en) Method and device for acquiring feature vectors of nodes in relational network graph
CN112446728B (en) Advertisement recall method, device, equipment and storage medium
CN111026438B (en) Method, device, equipment and medium for extracting small program package and page key information
CN111506787B (en) Method, device, electronic equipment and computer readable storage medium for web page update
CN111459887B (en) Resource screening method and device, electronic equipment and storage medium
CN111292223A (en) Graph calculation processing method and device, electronic equipment and storage medium
CN113051121A (en) Log information retrieval method and device, electronic equipment and medium
CN111506786B (en) Method, device, electronic equipment and computer readable storage medium for web page update

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant