CN111506268A - Code file storage method and device and electronic equipment - Google Patents

Code file storage method and device and electronic equipment

Info

Publication number
CN111506268A
Authority
CN
China
Prior art keywords
code
file
target
code file
digest value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010305891.7A
Other languages
Chinese (zh)
Other versions
CN111506268B (en)
Inventor
唐杰
于澔
刘志伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010305891.7A
Publication of CN111506268A
Application granted
Publication of CN111506268B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G06F3/0641 De-duplication techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a code file storage method and apparatus and an electronic device, and relates to the field of data storage. The specific implementation is as follows: a target code library is obtained, where the target code library includes at least one code file; a digest value set is obtained according to the at least one code file of the target code library, where the digest value set includes at least one target digest value, and multiple code files with the same file content correspond to one target digest value; and the at least one code file of the target code library is stored according to the digest value set, where only the file content of one code file is stored for each target digest value. Therefore, if multiple code files in the target code library have the same file content, only one copy of that content is stored, which avoids repeated storage of the same file content, reduces wasted storage resources, and lowers storage cost.

Description

Code file storage method and device and electronic equipment
Technical Field
The present application relates to data storage technologies in the field of data processing technologies, and in particular, to a code file storage method and apparatus, and an electronic device.
Background
Code search is one of the most important tools in modern program development. In different code libraries of a content-addressable file system, or in different branches of the same code library, there may be code files with the same file content, which means the same file content is stored repeatedly. When the number of code files is large, a large amount of storage space is needed to store them, and storage space is wasted.
Disclosure of Invention
The embodiment of the application provides a code file storage method and device and electronic equipment, and aims to solve the problem that storage space is wasted due to repeated storage of code files with the same file content.
In order to solve the above technical problem, the present application is implemented as follows:
a first aspect of the present application provides a code file storage method, including:
obtaining a target code library, wherein the target code library comprises at least one code file;
obtaining a digest value set according to the at least one code file of the target code library, wherein the digest value set comprises at least one target digest value, and a plurality of code files with the same file content correspond to one target digest value;
storing the at least one code file of the target code library according to the digest value set, wherein only the file content of one code file is stored for each target digest value.
A second aspect of the present application provides a code file storage apparatus, including:
a first obtaining module, configured to obtain a target code library, where the target code library includes at least one code file;
a second obtaining module, configured to obtain a digest value set according to the at least one code file of the target code library, where the digest value set includes at least one target digest value, and multiple code files with the same file content correspond to one target digest value;
and a storage module, configured to store the at least one code file of the target code library according to the digest value set, where only the file content of one code file is stored for each target digest value.
A third aspect of the present application provides an electronic device, comprising:
at least one processor;
and a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.
A fourth aspect of the present application provides a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of the first aspect.
One embodiment of the above application has the following advantages or benefits: a target code library is obtained, where the target code library includes at least one code file; a digest value set is obtained according to the at least one code file of the target code library, where the digest value set includes at least one target digest value, and multiple code files with the same file content correspond to one target digest value; and the at least one code file of the target code library is stored according to the digest value set, where only the file content of one code file is stored for each target digest value. Therefore, if multiple code files in the target code library have the same file content, only one copy of that content is stored, which avoids repeated storage of the same file content, reduces wasted storage resources, and lowers storage cost.
Other effects of the above-described alternative will be described below with reference to specific embodiments.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a flowchart of a code file storage method provided by an embodiment of the present application;
FIG. 2 is a diagram illustrating a relationship between a file index and a code index provided by an embodiment of the present application;
FIG. 3 is a second flowchart of a code file storage method according to an embodiment of the present application;
FIG. 4 is a third flowchart of a code file storage method according to an embodiment of the present application;
FIG. 5 is a block diagram of a code file storage device according to an embodiment of the present application;
FIG. 6 is a block diagram of an electronic device for implementing a code file storage method according to an embodiment of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Referring to fig. 1, fig. 1 is a flowchart of a code file storage method provided in an embodiment of the present application, and as shown in fig. 1, the embodiment provides a code file storage method applied to an electronic device, including the following steps:
Step 101, obtaining a target code library, wherein the target code library comprises at least one code file.
A content-addressable file system (e.g., Git) may include a plurality of code libraries, and the target code library may be one of them. The user may obtain the target code library by downloading it. The target code library includes one or more code files; specifically, the target code library may include a plurality of branches, each branch including one or more code files. A code file is understood to be a file whose content is code.
Step 102, obtaining a digest value set according to the at least one code file of the target code library, wherein the digest value set includes at least one target digest value, and a plurality of code files with the same file content correspond to one target digest value.
The digest value set includes one target digest value or a plurality of different target digest values. Each target digest value is determined according to the file content of a code file in the at least one code file, and a plurality of code files with the same file content correspond to the same target digest value. When the target digest value is determined according to the file content of a code file, the file content may be processed with a hash algorithm to obtain the target digest value.
Step 103, storing the at least one code file of the target code library according to the digest value set, wherein only the file content of one code file is stored for each target digest value.
The code file corresponding to each target digest value in the digest value set is stored according to that target digest value. Only one code file is stored for each target digest value. That is, if one target digest value corresponds to a plurality of code files, the file contents of those code files are the same, so only the file content of one of them is stored. This avoids repeated storage and reduces wasted storage resources.
In this embodiment, a target code library is obtained, where the target code library includes at least one code file; a digest value set is obtained according to the at least one code file of the target code library, where the digest value set includes at least one target digest value, and multiple code files with the same file content correspond to one target digest value; and the at least one code file of the target code library is stored according to the digest value set, where only the file content of one code file is stored for each target digest value. Therefore, if multiple code files in the target code library have the same file content, only one copy of that content is stored, which avoids repeated storage of the same file content, reduces wasted storage resources, and lowers storage cost.
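Steps 101 to 103 can be illustrated with a minimal sketch. The application does not name a specific hash algorithm or data layout at this point, so the use of SHA-256, the in-memory dictionaries, and the names digest_of, deduplicate and library below are assumptions made only for illustration.

```python
import hashlib


def digest_of(content: bytes) -> str:
    """Return a hex digest of the file content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(content).hexdigest()


def deduplicate(code_files: dict[str, bytes]) -> dict[str, bytes]:
    """Map each distinct digest value to a single copy of the file content."""
    stored: dict[str, bytes] = {}
    for path, content in code_files.items():
        digest = digest_of(content)
        # Files with identical content yield the same digest, so only the
        # first copy encountered is kept for that digest.
        stored.setdefault(digest, content)
    return stored


# Hypothetical target code library in which two files share the same content.
library = {
    "src/a.py": b"print('hello')\n",
    "src/b.py": b"print('hello')\n",
    "src/c.py": b"print('bye')\n",
}
stored = deduplicate(library)
```

Here three code files produce two stored contents, because src/a.py and src/b.py map to the same digest value.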
In an embodiment of the present application, the step 103 of storing the at least one code file of the target code library according to the digest value set includes:
for each target digest value in the digest value set, matching the target digest value against a code index of a database, wherein the code index comprises an existing digest value and an existing code file corresponding to the existing digest value;
and if the target digest value fails to match any existing digest value of the code index, storing the target digest value and the file content of one code file corresponding to the target digest value in the code index.
Specifically, the database is used for storing the file contents of code files, and the database may be Elasticsearch (ES). The database includes a code index. The code index includes the file content of a code file, which may specifically be a snapshot of the file content of the code file. The code index includes a field that stores the digest value of the code file, the digest value being determined based on the file content of the code file. If the file contents of a plurality of code files are the same, the digest values of those code files are also the same; in this case, only one copy is stored in the code index, so that repeated storage of the same file content is avoided. Further, if the target digest value successfully matches an existing digest value of the code index, the target digest value already exists in the code index, and no new data needs to be added to the code index.
For the sake of distinction, in this embodiment, the digest value included in the code index is referred to as an existing digest value, and the code file included in the code index is referred to as an existing code file.
A target digest value in the digest value set is matched against the existing digest values of the code index. If the target digest value differs from every existing digest value, the match fails, and the target digest value and the file content of one code file corresponding to the target digest value are stored in the code index. That is, if a plurality of code files correspond to the target digest value, their file contents are the same, and it is sufficient to store the file content of one of them. The same procedure is applied to each target digest value in the digest value set.
In this embodiment, repeated storage of a plurality of code files with the same file content is avoided and storage space is saved. The search range is also reduced when code files are searched, because the inverted index is smaller and fewer code files need to be accessed during a query. Each item in the code index consists of a target digest value and the file content of a code file; the file content can be looked up according to the target digest value, so the code index can be regarded as an inverted index. Further, the file content in the code index may be the file content itself, or may be an address of the file content.
In this embodiment, when the target digest value fails to match any existing digest value of the code index, the target digest value and the file content of one code file corresponding to the target digest value are stored in the code index, which avoids repeated storage of the same file content and reduces wasted storage space. Because a plurality of code files with the same file content contribute only one copy of that content to the code index, the amount of data stored in the code index is also reduced, so that when code files are searched in the code index, the search range is smaller and code file search is more efficient.
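The matching step can be sketched as follows. This is only an illustration under stated assumptions: a plain Python dictionary stands in for the code index of the database (the application mentions Elasticsearch, whose API is not used here), and the function name store_in_code_index and the digest strings "d1" and "d2" are hypothetical.

```python
def store_in_code_index(code_index: dict[str, bytes],
                        digest_to_content: dict[str, bytes]) -> list[str]:
    """Add unmatched digest values to the code index; return the ones added."""
    added = []
    for digest, content in digest_to_content.items():
        if digest in code_index:
            # Match succeeded: the content is already stored, nothing to add.
            continue
        # Match failed: store the digest value together with one file content.
        code_index[digest] = content
        added.append(digest)
    return added


# Hypothetical existing code index and incoming target digest values.
code_index = {"d1": b"int main() { return 0; }\n"}
incoming = {"d1": b"int main() { return 0; }\n", "d2": b"import os\n"}
added = store_in_code_index(code_index, incoming)
```

Only "d2" is written, since "d1" already matches an existing digest value in the code index.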
In an embodiment of the present application, the step 103 of storing the at least one code file of the target code library according to the digest value set includes:
for each target digest value in the digest value set, storing the target digest value and file information of one or more code files in the target code library corresponding to the target digest value in a file index of the database.
The database may further include a file index, which may include two fields, a first field for storing file information of the code file and a second field for storing a digest value of the code file, that is, the file index may be used to store the file information and the digest value of the code file. The file information may also be referred to as metadata information, and the file information may include a name of a code library where the code file is located, a storage address, branch information where the code file is located, a file name of the code file, a time when the code file is submitted to the code library, a submitter, and the like. In the code index, the digest value may be regarded as a pointer of the code content and may also be referred to as a foreign key, and the file content of the code file may be obtained from the code index according to the digest value, and the file information of the code file may be obtained from the file index according to the digest value.
Fig. 2 is a diagram illustrating the relationship between the file index and the code index. The file index and the code index are associated by the digest value: a plurality of pieces of file information in the file index may correspond to one file content in the code index, and conversely, one file content in the code index may correspond to a plurality of pieces of file information in the file index.
In this embodiment, if the target digest value is unsuccessfully matched with the existing digest value of the code index, in addition to storing the target digest value in the code index and the file content of a code file corresponding to the target digest value, it is also necessary to store the target digest value in the file index and the file information of one or more code files in the target code library corresponding to the target digest value.
For example, suppose the file contents of code file A and code file B are the same, so that the digest value of both code files is the target digest value a, and the code index does not include a digest value equal to the target digest value a. Then the target digest value a and the file content of code file A (or code file B) are stored in the code index, and two new pieces of data are stored in the file index: the target digest value a with the file information of code file A, and the target digest value a with the file information of code file B.
If the target digest value successfully matches an existing digest value of the code index, no new data needs to be added to the code index, but the target digest value and the file information of the one or more code files in the target code library corresponding to the target digest value are still stored in the file index.
For example, suppose the file contents of code file A and code file B are the same, so that the digest value of both code files is the target digest value a, and the code index already includes a digest value equal to the target digest value a. Then only two new pieces of data are stored in the file index: the target digest value a with the file information of code file A, and the target digest value a with the file information of code file B.
That is, regardless of whether a target digest value successfully matches an existing digest value of the code index, for each target digest value in the digest value set, the target digest value and the file information of the one or more code files in the target code library corresponding to the target digest value are stored in the file index. The steps in this embodiment and the step of matching the target digest value against the code index of the database need not be performed in any particular order.
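The example with code file A and code file B can be sketched as follows. The in-memory structures, the FileInfo field names, and the helper names store and files_for are assumptions for illustration; the application only states that the file index holds file information plus a digest value and that the code index holds a digest value plus file content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    """Metadata of one code file (field names are illustrative)."""
    library: str
    branch: str
    path: str


code_index: dict[str, bytes] = {}             # digest value -> file content
file_index: list[tuple[str, FileInfo]] = []   # (digest value, file information)


def store(digest: str, content: bytes, infos: list[FileInfo]) -> None:
    # The content is stored only when the digest is not yet in the code index.
    code_index.setdefault(digest, content)
    # File information is always recorded, whether or not the digest matched.
    for info in infos:
        file_index.append((digest, info))


def files_for(digest: str) -> list[FileInfo]:
    """Look up every piece of file information that shares one digest value."""
    return [info for d, info in file_index if d == digest]


file_a = FileInfo("demo-lib", "master", "A.java")
file_b = FileInfo("demo-lib", "master", "B.java")
store("a", b"class X {}\n", [file_a, file_b])
```

After the call, the code index holds one content under digest value "a", while the file index holds two entries that both point to it through the same digest value.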
As shown in fig. 3, the code file parser parses the code files in the target code library, adds new data to the file index according to the parsing result, for example, obtains file contents, file information, and digest values of each code file, and then stores the file information and digest values in the file index. In addition, whether to add new data to the code index is determined according to the parsing result. As shown in fig. 4, when adding new data to the code index, it is determined whether a target digest value obtained by parsing already exists in the digest value set, and if not, the target digest value is added to the digest value set, and then the file content of the code file is further stored in the database based on the digest value set.
In this embodiment, for each target digest value in the digest value set, the target digest value and the file information of the one or more code files in the target code library corresponding to the target digest value are stored in the file index, so that the code files are stored without loss of their file information.
In one embodiment of the present application, the target code library includes a default branch and a non-default branch; the default branch comprises a first code file of the at least one code file, and the non-default branch comprises a second code file of the at least one code file other than the first code file;
step 102, obtaining a digest value set according to the at least one code file of the target code library, including:
for each first target code file in the first code files, obtaining a digest value of the first target code file according to the file content of the first target code file;
if the digest value set does not include the digest value of the first target code file, adding the digest value of the first target code file to the digest value set;
acquiring a second target code file in the second code file, wherein the file content of the second target code file is different from the file content of each first target code file in the first code file;
for each second target code file in the second code files, obtaining a digest value of the second target code file according to the file content of the second target code file;
adding the digest value of the second target code file to the set of digest values.
Specifically, the target code library includes at least one branch. If the target code library includes one branch, that branch is the default branch; if the target code library includes a plurality of branches, one of them is the default branch, and the remaining branches are non-default branches. The default branch includes a first code file of the at least one code file, and the non-default branch includes a second code file of the at least one code file other than the first code file.
In this embodiment, the code files in the default branch are processed first. Specifically, for a first target code file in the first code files, a digest value of the first target code file is obtained according to the file content of the first target code file; if the digest value set does not include the digest value of the first target code file, the digest value of the first target code file is added to the digest value set. For example, each first target code file in the first code files is traversed and parsed to obtain the file content, file information, and digest value of each first target code file; the digest value may specifically be obtained with the git hash-object command. The target code library corresponds to one digest value set, i.e., a git_blob set.
Then, the code files in the non-default branch are processed. Specifically, a second target code file in the second code files is obtained; for one second target code file in the second code files, a digest value of the second target code file is obtained according to the file content of the second target code file, and the digest value of the second target code file is added to the digest value set. This process is performed for each of the second target code files. If there are a plurality of non-default branches, the above processing is performed for each of the non-default branches in turn.
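For reference, the digest value that git hash-object reports for a file in a repository using the SHA-1 object format can be reproduced as the SHA-1 of the header "blob <size>\0" followed by the file content. The sketch below illustrates this and the membership check on the digest value set; the names git_blob_digest and digest_set are hypothetical.

```python
import hashlib


def git_blob_digest(content: bytes) -> str:
    """Compute the SHA-1 blob ID that Git assigns to the given file content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Build a digest value set, adding a digest only if it is not yet included.
digest_set: set[str] = set()
for content in (b"hello world\n", b"hello world\n", b""):
    digest = git_blob_digest(content)
    if digest not in digest_set:
        digest_set.add(digest)
```

The two identical contents contribute a single entry, so the set ends up with two digest values.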
The file content of the second target code file is different from the file content of each first target code file in the first code files, namely the first code file of the default branch is used as a comparison reference, the file content of each code file of the second code file of the non-default branch is compared with the file content of each code file of the first code file respectively, the code files with different file contents in the non-default branch and the file contents in the first code file are determined, and the code files are the second target code files. For example, a git command may be used to retrieve code files in the non-default branch that are different from the default branch file content (i.e., the second target code file), and then store the file content of these code files, i.e., retrieve the digest values of these code files and add them to the set of digest values. Code files with different contents from the default branch files in the non-default branch can be directly obtained through the git command, and an engineer does not need to write codes again to perform traversal comparison on the code files in the non-default branch.
In this embodiment, each first target code file of the first code files of the default branch is parsed first, so that the digest value of each first target code file is included in the digest value set. Then, with the first code files of the default branch as a reference, the second code files of the non-default branch are compared with the first code files to obtain the second target code files, and the digest values of the second target code files are added to the digest value set. Therefore, code files with the same file content correspond to one digest value in the digest value set, and when the code files are subsequently stored according to the digest values in the digest value set, repeated storage of the same file content is avoided and wasted storage space is reduced.
In an embodiment of the present application, the obtaining a second target code file in the second code file includes:
for each code file to be determined in the second code file, determining a digest value of the code file to be determined according to the file content of the code file to be determined;
and if the digest value of the code file to be determined is different from the digest value of each first target code file in the first code file, determining the code file to be determined as the second target code file.
In the process of determining the second target code file, for one code file to be determined in the second code file, determining a digest value of the code file to be determined according to the file content of the code file to be determined; and if the digest value of the code file to be determined is different from the digest value of each first target code file in the first code file, determining the code file to be determined as the second target code file. And executing the steps for each code file to be determined in the second code file to obtain a plurality of second target code files.
In this embodiment, for each code file to be determined in the second code file, a digest value of the code file to be determined is determined according to the file content of the code file to be determined; and if the digest value of the code file to be determined is different from the digest value of each first target code file in the first code file, the code file to be determined is determined as the second target code file. Therefore, code files with the same file content correspond to one digest value in the digest value set, and when the code files are subsequently stored according to the digest values in the digest value set, repeated storage of the same file content is avoided and wasted storage space is reduced.
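A minimal sketch of this selection step follows. Branches are modeled as dictionaries from file path to content, and SHA-1 over the raw content is used as the digest; both are assumptions for illustration, as are the names select_second_target_files, default_branch and feature_branch.

```python
import hashlib


def digest_of(content: bytes) -> str:
    """Digest of the file content (the hash choice is an assumption)."""
    return hashlib.sha1(content).hexdigest()


def select_second_target_files(default_branch: dict[str, bytes],
                               non_default_branch: dict[str, bytes]) -> dict[str, bytes]:
    """Keep non-default-branch files whose digest differs from every default-branch digest."""
    default_digests = {digest_of(c) for c in default_branch.values()}
    return {
        path: content
        for path, content in non_default_branch.items()
        if digest_of(content) not in default_digests
    }


# Hypothetical branches: only util.c differs on the non-default branch.
default_branch = {"main.c": b"int main() {}\n", "util.c": b"void f() {}\n"}
feature_branch = {"main.c": b"int main() {}\n", "util.c": b"void g() {}\n"}
second_targets = select_second_target_files(default_branch, feature_branch)
```

Only util.c is selected, because main.c has the same digest value on both branches.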
In one embodiment of the present application, the target code library includes a default branch and a non-default branch; the default branch comprises a first code file of the at least one code file, and the non-default branch comprises a second code file of the at least one code file other than the first code file;
step 102, obtaining a digest value set according to the at least one code file of the target code library, including:
for each first target code file in the first code files, obtaining a digest value of the first target code file according to the file content of the first target code file;
if the digest value set does not include the digest value of the first target code file, adding the digest value of the first target code file to the digest value set;
for each third target code file in the second code file, obtaining a digest value of the third target code file according to the file content of the third target code file;
if the digest value set does not include the digest value of the third target code file, adding the digest value of the third target code file to the digest value set.
In this embodiment, each first target code file in the first code files is processed first, and the digest value of any first target code file not yet included in the digest value set is added to the digest value set; the second code files are then processed, and the digest value of any third target code file not yet included in the digest value set is added to the digest value set. Therefore, code files with the same file content correspond to one digest value in the digest value set, and when the code files are subsequently stored according to the digest values in the digest value set, repeated storage of the same file content can be avoided and the waste of storage space is reduced.
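A minimal sketch of this two-pass construction of the digest value set is given below. It assumes SHA-256 as the digest function (the text does not specify one), and the function name `build_digest_set` and the use of plain lists of file contents are hypothetical choices for illustration.

```python
import hashlib
from typing import Iterable, Set


def build_digest_set(default_branch_contents: Iterable[bytes],
                     non_default_branch_contents: Iterable[bytes]) -> Set[str]:
    """Process the default branch first, then the non-default branch,
    adding a digest value only if the set does not already include it."""
    digest_set: Set[str] = set()
    for contents in (default_branch_contents, non_default_branch_contents):
        for content in contents:
            value = hashlib.sha256(content).hexdigest()  # assumed digest function
            if value not in digest_set:
                digest_set.add(value)
    return digest_set
```

The explicit membership check mirrors the wording of the steps; with a Python set it is redundant, since adding an existing element has no effect.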
Further, if a plurality of code libraries are acquired, determining each code library in the plurality of code libraries as a target code library in sequence, and then processing the target code library by using the method provided by the above embodiment, where each target code library corresponds to one digest value set.
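As a rough illustration of handling several code libraries in sequence, the sketch below keeps one digest value set per library. The name `digest_sets_per_library`, the library names, and the choice of SHA-256 are assumptions that do not come from the text.

```python
import hashlib
from typing import Dict, List, Set


def digest_sets_per_library(libraries: Dict[str, List[bytes]]) -> Dict[str, Set[str]]:
    """Treat each code library in turn as the target code library and
    build a separate digest value set for it."""
    result: Dict[str, Set[str]] = {}
    for name, contents in libraries.items():
        # Assumed digest function: SHA-256 over the raw file content.
        result[name] = {hashlib.sha256(c).hexdigest() for c in contents}
    return result
```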
Referring to fig. 5, fig. 5 is a structural diagram of a code file storage apparatus according to an embodiment of the present application, and as shown in fig. 5, the present embodiment provides a code file storage apparatus 500, including:
a first obtaining module 501, configured to obtain a target code library, where the target code library includes at least one code file;
a second obtaining module 502, configured to obtain a digest value set according to the at least one code file of the target code library, where the digest value set includes at least one target digest value, and multiple code files with the same file content correspond to one target digest value;
a storage module 503, configured to store the at least one code file of the target code library according to the digest value set, where only the file content of one code file is stored for each target digest value.
In an embodiment of the present application, the storage module 503 includes:
a matching submodule, configured to match each target digest value in the digest value set against a code index of a database, where the code index includes existing digest values and the existing code files corresponding to the existing digest values;
a first storage submodule, configured to store the target digest value and the file content of one code file corresponding to the target digest value in the code index if the target digest value fails to match any existing digest value of the code index.
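The matching and conditional storage performed by these two submodules can be sketched as below. Here the code index is modeled as an in-memory dictionary from digest value to file content, which is an assumption made for illustration; the text only states that the code index belongs to a database. The function name `store_in_code_index` is likewise hypothetical.

```python
from typing import Dict


def store_in_code_index(code_index: Dict[str, bytes],
                        targets: Dict[str, bytes]) -> int:
    """Store each target digest value with one copy of its file content,
    but only when the digest value matches no existing entry.

    `targets` maps a target digest value to the content of one code file
    having that digest value. Returns the number of newly stored entries.
    """
    stored = 0
    for digest_value, content in targets.items():
        if digest_value not in code_index:  # matching failed: new content
            code_index[digest_value] = content
            stored += 1
    return stored
```

Because an entry is written only when matching fails, file content already present in the code index is never stored a second time.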
In an embodiment of the present application, the storage module 503 includes:
a second storage submodule, configured to store, for each target digest value in the digest value set, the target digest value and file information of one or more code files in the target code library corresponding to the target digest value in a file index of the database.
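One possible shape for such a file index is sketched below: each digest value maps to the file information of every code file sharing that content. The fields chosen as file information (branch and path), the function name `build_file_index`, and the use of SHA-256 are assumptions; the text does not enumerate what the file information contains.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_file_index(
    files: Iterable[Tuple[str, str, bytes]]
) -> Dict[str, List[Dict[str, str]]]:
    """`files` yields (branch, path, content) triples. The result maps each
    digest value to the file information of all files with that content."""
    file_index: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for branch, path, content in files:
        value = hashlib.sha256(content).hexdigest()  # assumed digest function
        file_index[value].append({"branch": branch, "path": path})
    return dict(file_index)
```

Keeping file information separate from file content means that duplicate files still remain individually discoverable even though their content is stored once.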
In one embodiment of the present application, the target code library includes a default branch and a non-default branch; the default branch comprises a first code file of the at least one code file, and the non-default branch comprises a second code file of the at least one code file other than the first code file;
the second obtaining module 502 includes:
a first obtaining submodule, configured to, for each first target code file in the first code files, obtain a digest value of the first target code file according to the file content of the first target code file;
a first adding submodule, configured to add the digest value of the first target code file to the digest value set if the digest value set does not include the digest value of the first target code file;
a second obtaining submodule, configured to obtain second target code files in the second code files, where the file content of each second target code file is different from the file content of every first target code file in the first code files;
a third obtaining submodule, configured to, for each second target code file in the second code files, obtain a digest value of the second target code file according to the file content of the second target code file;
a second adding submodule, configured to add the digest value of the second target code file to the digest value set.
In an embodiment of the application, the second obtaining sub-module is configured to:
for each code file to be determined in the second code file, determining a digest value of the code file to be determined according to the file content of the code file to be determined;
and if the digest value of the code file to be determined is different from the digest value of each first target code file in the first code file, determining the code file to be determined as the second target code file.
The code file storage apparatus 500 can implement each process implemented by the electronic device in the method embodiment shown in fig. 1; to avoid repetition, details are not described here again.
The code file storage apparatus 500 of the embodiment of the application acquires a target code library, where the target code library includes at least one code file; obtains a digest value set according to the at least one code file of the target code library, where the digest value set includes at least one target digest value, and a plurality of code files with the same file content correspond to one target digest value; and stores the at least one code file of the target code library according to the digest value set, where only the file content of one code file is stored for each target digest value. Therefore, if the file contents of a plurality of code files in the target code library are the same, only one copy of the file content is stored, so that repeated storage of the same file content is avoided, the waste of storage resources is reduced, and the storage cost is reduced.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 6 is a block diagram of an electronic device for implementing the code file storage method of an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 6, the electronic device includes: one or more processors 601, a memory 602, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 6, one processor 601 is taken as an example.
The memory 602 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the code file storage methods provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the code file storage method provided herein.
The memory 602, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the code file storage method in the embodiments of the present application (for example, the first obtaining module 501, the second obtaining module 502, and the storage module 503 shown in fig. 5). The processor 601 executes various functional applications of the server and data processing by running non-transitory software programs, instructions, and modules stored in the memory 602, that is, implements the code file storage method in the above-described method embodiments.
The memory 602 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device implementing the code file storage method, and the like. Further, the memory 602 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 602 optionally includes memory located remotely from the processor 601, which may be connected via a network to an electronic device implementing the code file storage method. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device implementing the code file storage method may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603 and the output device 604 may be connected by a bus or other means, and fig. 6 illustrates the connection by a bus as an example.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device implementing the code file storage method; examples include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, and a joystick. The output device 604 may include a display device, an auxiliary lighting device (e.g., an LED), a tactile feedback device (e.g., a vibration motor), and the like.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal.
To provide interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solution of the embodiment of the application, a target code library is acquired, where the target code library includes at least one code file; a digest value set is obtained according to the at least one code file of the target code library, where the digest value set includes at least one target digest value, and a plurality of code files with the same file content correspond to one target digest value; and the at least one code file of the target code library is stored according to the digest value set, where only the file content of one code file is stored for each target digest value. Therefore, if the file contents of a plurality of code files in the target code library are the same, only one copy of the file content is stored, so that repeated storage of the same file content is avoided, the waste of storage resources is reduced, and the storage cost is reduced.
In addition, when the target digest value fails to match any existing digest value of the code index, the target digest value and the file content of one code file corresponding to the target digest value are stored in the code index, so that repeated storage of the same file content is avoided and the waste of storage space is reduced. Because a plurality of code files with the same file content store only one copy of the file content in the code index, the amount of data stored in the code index is also reduced; when code files are searched in the code index, the search range is therefore smaller and the efficiency of code file searching is improved.
For each target digest value in the digest value set, the target digest value and file information of one or more code files in the target code library corresponding to the target digest value are stored in the file index, so that every code file is recorded and loss of code file information is avoided.
Each first target code file in the first code files of the default branch is processed so that its digest value is included in the digest value set; then, taking the first code files of the default branch as a reference, the second code files of the non-default branch are compared with the first code files to obtain the second target code files, and the digest values of the second target code files are added to the digest value set. Therefore, code files with the same file content correspond to one digest value in the digest value set, and when the code files are subsequently stored according to the digest values in the digest value set, repeated storage of the same file content can be avoided and the waste of storage space is reduced.
For each code file to be determined in the second code files, a digest value of the code file to be determined is determined according to its file content; if this digest value differs from the digest value of every first target code file in the first code files, the code file to be determined is determined to be a second target code file. Therefore, code files with the same file content correspond to one digest value in the digest value set, and when the code files are subsequently stored according to the digest values in the digest value set, repeated storage of the same file content can be avoided and the waste of storage space is reduced.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (12)

1. A code file storage method, comprising:
acquiring a target code library, wherein the target code library comprises at least one code file;
obtaining a digest value set according to the at least one code file of the target code library, wherein the digest value set comprises at least one target digest value, and a plurality of code files with the same file content correspond to one target digest value;
storing the at least one code file of the target code library according to the digest value set, wherein only the file content of one code file is stored for each target digest value.
2. The method according to claim 1, wherein said storing the at least one code file of the target code library according to the digest value set comprises:
for each target digest value in the digest value set, matching the target digest value against a code index of a database, wherein the code index comprises existing digest values and the existing code files corresponding to the existing digest values;
and if the target digest value fails to match any existing digest value of the code index, storing the target digest value and the file content of one code file corresponding to the target digest value in the code index.
3. The method according to claim 1, wherein said storing the at least one code file of the target code library according to the digest value set comprises:
for each target digest value in the digest value set, storing the target digest value and file information of one or more code files in the target code library corresponding to the target digest value in a file index of a database.
4. The code file storage method according to claim 1, wherein the target code library includes a default branch and a non-default branch; the default branch comprises a first code file of the at least one code file, and the non-default branch comprises a second code file of the at least one code file other than the first code file;
said obtaining a set of digest values from said at least one code file of said target code library, comprising:
for each first target code file in the first code files, obtaining a digest value of the first target code file according to the file content of the first target code file;
if the digest value set does not include the digest value of the first target code file, adding the digest value of the first target code file to the digest value set;
acquiring a second target code file in the second code file, wherein the file content of the second target code file is different from the file content of each first target code file in the first code file;
for each second target code file in the second code files, obtaining a digest value of the second target code file according to the file content of the second target code file;
adding the digest value of the second target code file to the set of digest values.
5. The code file storage method according to claim 4, wherein the acquiring a second target code file in the second code file comprises:
for each code file to be determined in the second code file, determining a digest value of the code file to be determined according to the file content of the code file to be determined;
and if the digest value of the code file to be determined is different from the digest value of each first target code file in the first code file, determining the code file to be determined as the second target code file.
6. A code file storage device, comprising:
a first obtaining module, configured to obtain a target code library, where the target code library includes at least one code file;
a second obtaining module, configured to obtain a digest value set according to the at least one code file of the target code library, where the digest value set includes at least one target digest value, and multiple code files with the same file content correspond to one target digest value;
a storage module, configured to store the at least one code file of the target code library according to the digest value set, where only the file content of one code file is stored for each target digest value.
7. The code file storage device of claim 6, wherein the storage module comprises:
a matching submodule, configured to match each target digest value in the digest value set against a code index of a database, where the code index includes existing digest values and the existing code files corresponding to the existing digest values;
a first storage submodule, configured to store the target digest value and the file content of one code file corresponding to the target digest value in the code index if the target digest value fails to match any existing digest value of the code index.
8. The code file storage device of claim 6, wherein the storage module comprises:
a second storage submodule, configured to store, for each target digest value in the digest value set, the target digest value and file information of one or more code files in the target code library corresponding to the target digest value in a file index of a database.
9. The code file storage device of claim 6, wherein the target code library comprises a default branch and a non-default branch; the default branch comprises a first code file of the at least one code file, and the non-default branch comprises a second code file of the at least one code file other than the first code file;
the second obtaining module includes:
a first obtaining submodule, configured to, for each first target code file in the first code files, obtain a digest value of the first target code file according to the file content of the first target code file;
a first adding submodule, configured to add the digest value of the first target code file to the digest value set if the digest value set does not include the digest value of the first target code file;
a second obtaining submodule, configured to obtain a second target code file in the second code file, where the file content of the second target code file is different from the file content of each first target code file in the first code file;
a third obtaining submodule, configured to, for each second target code file in the second code files, obtain a digest value of the second target code file according to the file content of the second target code file;
a second adding submodule, configured to add the digest value of the second target code file to the digest value set.
10. The code file storage device of claim 9, wherein the second obtaining sub-module is configured to:
for each code file to be determined in the second code file, determining a digest value of the code file to be determined according to the file content of the code file to be determined;
and if the digest value of the code file to be determined is different from the digest value of each first target code file in the first code file, determining the code file to be determined as the second target code file.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-5.
CN202010305891.7A 2020-04-17 2020-04-17 Code file storage method and device and electronic equipment Active CN111506268B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010305891.7A CN111506268B (en) 2020-04-17 2020-04-17 Code file storage method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010305891.7A CN111506268B (en) 2020-04-17 2020-04-17 Code file storage method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111506268A true CN111506268A (en) 2020-08-07
CN111506268B CN111506268B (en) 2023-07-18

Family

ID=71872821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010305891.7A Active CN111506268B (en) 2020-04-17 2020-04-17 Code file storage method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111506268B (en)

Also Published As

Publication number Publication date
CN111506268B (en) 2023-07-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant