CN111506268A - Code file storage method and device and electronic equipment

Info

Publication number: CN111506268A
Application number: CN202010305891.7A
Authority: CN
Inventors: 唐杰; 于澔; 刘志伟
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-04-17
Filing date: 2020-04-17
Publication date: 2020-08-07
Anticipated expiration: 2040-04-17
Also published as: CN111506268B

Abstract

The application discloses a code file storage method and apparatus and an electronic device, and relates to the field of data storage. The specific implementation is as follows: a target code library is obtained, where the target code library includes at least one code file; a digest value set is obtained according to the at least one code file of the target code library, where the digest value set includes at least one target digest value, and multiple code files with the same file content correspond to one target digest value; and the at least one code file of the target code library is stored according to the digest value set, where the file content of only one code file is stored for each target digest value. In this way, if multiple code files in the target code library have the same file content, only one copy of that content is stored, which avoids repeated storage of the same file content and reduces both wasted storage resources and storage cost.

Description

Code file storage method and device and electronic equipment

Technical Field

The present application relates to data storage technologies in the field of data processing technologies, and in particular, to a code file storage method and apparatus, and an electronic device.

Background

Code search is one of the most important tools in modern software development. In a content-addressable file system, code files with the same file content may exist in different code libraries or in different branches of one code library; that is, files with identical content are stored repeatedly. When the number of code files is large, a large amount of storage space is needed to store them, and storage space is wasted.

Disclosure of Invention

The embodiment of the application provides a code file storage method and device and electronic equipment, and aims to solve the problem that storage space is wasted due to repeated storage of code files with the same file content.

In order to solve the above technical problem, the present application is implemented as follows:

A first aspect of the present application provides a code file storage method, including:

obtaining a target code library, wherein the target code library includes at least one code file;

obtaining a digest value set according to the at least one code file of the target code library, wherein the digest value set includes at least one target digest value, and a plurality of code files with the same file content correspond to one target digest value;

storing the at least one code file of the target code library according to the digest value set, wherein only the file content of one code file is stored for each target digest value.

A second aspect of the present application provides a code file storage apparatus, including:

a first obtaining module, configured to obtain a target code library, where the target code library includes at least one code file;

a second obtaining module, configured to obtain a digest value set according to the at least one code file of the target code library, where the digest value set includes at least one target digest value, and multiple code files with the same file content correspond to one target digest value;

and the storage module is used for storing the at least one code file of the target code library according to the digest value set, wherein only the file content of one code file is stored for each target digest value.

A third aspect of the present application provides an electronic device, comprising:

at least one processor;

and a memory communicatively coupled to the at least one processor;

wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.

A fourth aspect of the present application provides a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of the first aspect.

One embodiment of the above application has the following advantages or benefits: a target code library is obtained, where the target code library includes at least one code file; a digest value set is obtained according to the at least one code file of the target code library, where the digest value set includes at least one target digest value, and a plurality of code files with the same file content correspond to one target digest value; and the at least one code file of the target code library is stored according to the digest value set, where only the file content of one code file is stored for each target digest value. Therefore, if multiple code files in the target code library have the same file content, only one copy of that content is stored, which avoids repeated storage of the same file content and reduces both wasted storage resources and storage cost.

Other effects of the above-described alternative will be described below with reference to specific embodiments.

Drawings

The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:

FIG. 1 is a flowchart of a code file storage method provided by an embodiment of the present application;

FIG. 2 is a diagram illustrating a relationship between a file index and a code index provided by an embodiment of the present application;

FIG. 3 is a second flowchart of a code file storage method according to an embodiment of the present application;

FIG. 4 is a third flowchart of a code file storage method according to an embodiment of the present application;

FIG. 5 is a block diagram of a code file storage device according to an embodiment of the present application;

FIG. 6 is a block diagram of an electronic device for implementing a code file storage method according to an embodiment of the present application.

Detailed Description

The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Referring to FIG. 1, which is a flowchart of a code file storage method provided in an embodiment of the present application, this embodiment provides a code file storage method applied to an electronic device, including the following steps:

Step 101, obtaining a target code library, wherein the target code library includes at least one code file.

A content-addressable file system (for example, Git) may include a plurality of code libraries, and the target code library may be one of them. The user may obtain the target code library by downloading it. The target code library includes one or more code files; in particular, it may include a plurality of branches, each of which includes one or more code files. A code file is understood to be a file whose content is code.

Step 102, obtaining a digest value set according to the at least one code file of the target code library, wherein the digest value set includes at least one target digest value, and a plurality of code files with the same file content correspond to one target digest value.

The digest value set includes one target digest value or a plurality of different target digest values. Each target digest value is determined according to the file content of a code file in the at least one code file, and a plurality of code files with the same file content correspond to the same target digest value. To determine the target digest value from the file content of a code file, the file content may be processed with a hash algorithm.

Step 103, storing the at least one code file of the target code library according to the digest value set, wherein only the file content of one code file is stored for each target digest value.

For each target digest value in the digest value set, the code file corresponding to that target digest value is stored. Only one code file is stored for each target digest value; that is, if one target digest value corresponds to a plurality of code files, the file contents of those code files are the same, so only the file content of one of them is stored. This avoids repeated storage and reduces wasted storage resources.

In this embodiment, a target code library is obtained, where the target code library includes at least one code file; a digest value set is obtained according to the at least one code file of the target code library, where the digest value set includes at least one target digest value, and a plurality of code files with the same file content correspond to one target digest value; and the at least one code file of the target code library is stored according to the digest value set, where only the file content of one code file is stored for each target digest value. Therefore, if multiple code files in the target code library have the same file content, only one copy of that content is stored, which avoids repeated storage of the same file content and reduces both wasted storage resources and storage cost.
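Steps 101 to 103 can be illustrated with a minimal sketch. It assumes the target code library is already loaded in memory as a mapping from file path to file content, and it uses SHA-256 as the hash algorithm; both are assumptions made for illustration, and the names `digest` and `store_code_files` are hypothetical.

```python
import hashlib


def digest(content: bytes) -> str:
    # SHA-256 is an assumption; the text only requires a hash of the file content.
    return hashlib.sha256(content).hexdigest()


def store_code_files(code_files: dict[str, bytes]) -> dict[str, bytes]:
    """Build the digest value set (step 102) and store one content per digest (step 103)."""
    digest_set: set[str] = set()
    content_by_digest: dict[str, bytes] = {}
    for content in code_files.values():
        d = digest(content)
        if d not in digest_set:
            # First time this content is seen: record the digest and keep one copy.
            digest_set.add(d)
            content_by_digest[d] = content
    return content_by_digest


# Hypothetical target code library: two of the three files share the same content.
library = {
    "a/util.py": b"print('hi')\n",
    "b/util_copy.py": b"print('hi')\n",
    "main.py": b"import a\n",
}
stored = store_code_files(library)
```

Here three code files produce two stored contents, because the two identical files map to the same target digest value.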

In an embodiment of the present application, the step 103 of storing the at least one code file of the target code library according to the digest value set includes:

for each target digest value in the digest value set, matching the target digest value against a code index of a database, wherein the code index includes an existing digest value and an existing code file corresponding to the existing digest value;

and if the target digest value does not match any existing digest value of the code index, storing the target digest value and the file content of one code file corresponding to the target digest value in the code index.

Specifically, the database is used to store the file contents of code files, and may be an Elasticsearch (ES) search engine. The database includes a code index. The code index includes the file content of each code file, which may specifically be a snapshot of that file content. The code index also includes an identifier field that stores the digest value of the code file, where the digest value is determined from the file content of the code file. If multiple code files have the same file content, their digest values are also the same; in this case only one copy is stored in the code index, which avoids repeated storage of the same file content. Further, if the target digest value successfully matches an existing digest value of the code index, the target digest value already exists in the code index and no new data needs to be added to the code index.

For the sake of distinction, in this embodiment, the digest value included in the code index is referred to as an existing digest value, and the code file included in the code index is referred to as an existing code file.

A target digest value in the digest value set is matched against the existing digest values of the code index. If the target digest value differs from every existing digest value, the match is unsuccessful, and the target digest value and the file content of one code file corresponding to the target digest value are stored in the code index. That is, if a plurality of code files correspond to the target digest value, their file contents are the same, so the file content of any one of them may be stored. This procedure is applied to each target digest value in the digest value set.

In this embodiment, repeated storage of multiple code files with the same file content is avoided, which saves storage space. The search range is also reduced when code files are searched, because the inverted index is smaller and fewer code files need to be accessed during a query. Each entry in the code index consists of a target digest value and the file content of a code file, and the file content can be looked up by the target digest value. The code index may be regarded as an inverted index. Further, the file content held in the code index may be the file content itself or an address of the file content.

In this embodiment, when the target digest value does not match any existing digest value of the code index, the target digest value and the file content of one code file corresponding to the target digest value are stored in the code index, which avoids repeated storage of the same file content and reduces wasted storage space. Because a plurality of code files with the same file content occupy only one entry in the code index, the amount of data stored in the code index is also reduced, so the search range is smaller when code files are searched in the code index and search efficiency is improved.
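The matching step can be sketched as follows. A plain in-memory dictionary stands in for the code index of the database, which is an assumption made for illustration; the function name `store_if_absent` and the short digest strings are hypothetical placeholders.

```python
def store_if_absent(code_index: dict[str, bytes], target_digest: str, content: bytes) -> bool:
    """Store the content only when the target digest matches no existing digest.

    Returns True when a new entry was added, False when the digest already existed.
    """
    if target_digest in code_index:
        # Successful match: the content is already stored, nothing is added.
        return False
    # Unsuccessful match: store the digest together with one copy of the content.
    code_index[target_digest] = content
    return True


# Hypothetical code index that already holds one entry.
code_index = {"d41d8c": b"existing content"}
added_new = store_if_absent(code_index, "9f86d0", b"new content")
added_dup = store_if_absent(code_index, "d41d8c", b"existing content")
```

With a real search engine as the database, the membership check would be a lookup on the digest field instead of a dictionary test.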

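In an embodiment of the present application, step 103, storing the at least one code file of the target code library according to the digest value set, further includes:

for each target digest value in the digest value set, storing the target digest value and file information of one or more code files in the target code library corresponding to the target digest value in a file index of the database.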

The database may further include a file index. The file index may include two fields: a first field for storing file information of a code file and a second field for storing the digest value of the code file. That is, the file index may be used to store the file information and the digest value of each code file. The file information may also be referred to as metadata, and may include the name of the code library where the code file is located, a storage address, the branch where the code file is located, the file name of the code file, the time when the code file was submitted to the code library, the submitter, and the like. The digest value may be regarded as a pointer to the code content in the code index, and may also be referred to as a foreign key: the file content of a code file can be obtained from the code index according to the digest value, and the file information of the code file can be obtained from the file index according to the digest value.

FIG. 2 is a diagram illustrating the relationship between the file index and the code index. The file index and the code index are associated through the digest value: a plurality of pieces of file information in the file index may correspond to one file content in the code index, and conversely, one file content in the code index may correspond to a plurality of pieces of file information in the file index.

In this embodiment, if the target digest value does not match any existing digest value of the code index, then in addition to storing the target digest value and the file content of one corresponding code file in the code index, it is also necessary to store, in the file index, the target digest value and the file information of the one or more code files in the target code library that correspond to the target digest value.

For example, suppose the file contents of code file A and code file B are the same, so that the digest values of both code files are the target digest value a, and the code index does not include a digest value equal to the target digest value a. Then the target digest value a and the file content of code file A (or code file B) are stored in the code index, and two new entries are stored in the file index: the target digest value a with the file information of code file A, and the target digest value a with the file information of code file B.

If the target digest value is successfully matched with the existing digest value of the code index, new data does not need to be added into the code index, but the target digest value and file information of one or more code files in the target code base corresponding to the target digest value are still stored in the file index.

For example, suppose the file contents of code file A and code file B are the same, so that the digest values of both code files are the target digest value a, and the code index already includes a digest value equal to the target digest value a. Then only two new entries are stored in the file index: the target digest value a with the file information of code file A, and the target digest value a with the file information of code file B.

That is, regardless of whether a target digest value successfully matches an existing digest value of the code index, for each target digest value in the digest value set, the target digest value and the file information of the one or more code files in the target code library corresponding to the target digest value are stored in the file index. The step in this embodiment and the step of matching the target digest value against the code index of the database need not be performed in any particular order.

As shown in FIG. 3, a code file parser parses the code files in the target code library and adds new data to the file index according to the parsing result; for example, it obtains the file content, file information, and digest value of each code file, and then stores the file information and digest value in the file index. In addition, whether to add new data to the code index is determined according to the parsing result. As shown in FIG. 4, when new data is to be added to the code index, it is determined whether the target digest value obtained by parsing already exists in the digest value set; if not, the target digest value is added to the digest value set, and the file content of the code file is then stored in the database based on the digest value set.

In this embodiment, for each target digest value in the digest value set, the target digest value and file information of one or more code files in the target code base corresponding to the target digest value are stored in the file index, so that storage of the code files is realized, and information loss of the code files is avoided.
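The relationship between the two indexes, using the example of code file A and code file B with identical content, can be sketched as below. In-memory structures stand in for the database, SHA-256 stands in for the hash algorithm, and `FileInfo` carries only a hypothetical subset of the metadata fields; all of these are assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    # Hypothetical subset of the file information listed in the text.
    library: str
    branch: str
    path: str


def digest(content: bytes) -> str:
    # SHA-256 is an assumption; the text only requires a hash of the file content.
    return hashlib.sha256(content).hexdigest()


def store(info: FileInfo, content: bytes, file_index: list, code_index: dict) -> None:
    d = digest(content)
    # The file index always receives an entry, whether or not the digest matched.
    file_index.append((d, info))
    # The code index receives the content only when the digest is not yet present.
    if d not in code_index:
        code_index[d] = content


file_index: list[tuple[str, FileInfo]] = []
code_index: dict[str, bytes] = {}
shared = b"def f():\n    return 1\n"
store(FileInfo("lib", "master", "A.py"), shared, file_index, code_index)
store(FileInfo("lib", "master", "B.py"), shared, file_index, code_index)

# Resolve the content of B.py through its digest value, used as a foreign key.
digest_of_b = next(d for d, info in file_index if info.path == "B.py")
content_of_b = code_index[digest_of_b]
```

After both calls the file index holds two entries and the code index holds one, which matches the many-to-one relationship described for FIG. 2.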

In one embodiment of the present application, the target code library includes a default branch and a non-default branch; the default branch comprises a first code file of the at least one code file, and the non-default branch comprises a second code file of the at least one code file other than the first code file;

Step 102, obtaining a digest value set according to the at least one code file of the target code library, includes:

for each first target code file in the first code files, obtaining a digest value of the first target code file according to the file content of the first target code file;

if the digest value set does not include the digest value of the first target code file, adding the digest value of the first target code file to the digest value set;

acquiring a second target code file in the second code file, wherein the file content of the second target code file is different from the file content of each first target code file in the first code file;

for each second target code file in the second code files, obtaining a digest value of the second target code file according to the file content of the second target code file;

adding the digest value of the second target code file to the set of digest values.

Specifically, the target code library at least comprises one branch, and if the target code library comprises one branch, the branch is a default branch; if the target code base comprises a plurality of branches, one branch of the plurality of branches is a default branch, and the rest branches are non-default branches. The default branch includes a first code file of the at least one code file and the non-default branch includes a second code file of the at least one code file other than the first code file.

In this embodiment, the code files in the default branch are processed first. Specifically, for each first target code file in the first code files, a digest value of the first target code file is obtained according to its file content; if the digest value set does not include that digest value, the digest value is added to the digest value set. For example, each first target code file in the first code files is traversed and parsed to obtain its file content, file information, and digest value; the digest value may be obtained with the git hash-object command. Each target code library corresponds to one digest value set, that is, a git_blob set.
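For reference, the digest that `git hash-object` prints for a file can be reproduced without invoking Git, assuming the repository uses Git's default SHA-1 object format: Git hashes the header `blob <size>\0` followed by the raw file content. The function name `git_blob_digest` is hypothetical.

```python
import hashlib


def git_blob_digest(content: bytes) -> str:
    """Compute the blob object ID that Git assigns to this content (SHA-1 object format)."""
    # Git prefixes the content with "blob <length in bytes>" and a NUL byte before hashing.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has a well-known object ID in SHA-1 repositories.
empty_blob_id = git_blob_digest(b"")
```

Because the digest depends only on the content, two files with identical content in different branches or libraries receive the same value.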

Then, processing the code file in the non-default branch, specifically: acquiring a second target code file in the second code file, and acquiring a digest value of the second target code file for one second target code file in the second code file according to the file content of the second target code file; adding the digest value of the second target code file to the set of digest values. The above-described process is performed for each of the second object code files. If there are a plurality of non-default branches, the above-mentioned processing is sequentially performed for a plurality of branches in the non-default branches.

The file content of a second target code file differs from the file content of every first target code file in the first code files. That is, the first code files of the default branch serve as the comparison reference: the file content of each code file in the second code files of the non-default branch is compared with the file content of each code file in the first code files, and the code files in the non-default branch whose file content differs from that of the first code files are determined to be the second target code files. For example, a git command may be used to obtain the code files in the non-default branch whose content differs from the files of the default branch (that is, the second target code files), and the file content of these code files is then stored; that is, their digest values are obtained and added to the digest value set. Because these code files can be obtained directly with a git command, an engineer does not need to write additional code to traverse and compare the code files in the non-default branch.

In this embodiment, each first target code file in the first code files of the default branch is parsed first, so that the digest value of every first target code file is included in the digest value set. Then, with the first code files of the default branch as the reference, the second code files of the non-default branch are compared with the first code files to obtain the second target code files, and the digest values of the second target code files are added to the digest value set. Therefore, code files with the same file content correspond to one digest value in the digest value set, and when the code files are subsequently stored according to the digest values in the digest value set, repeated storage of the same file content is avoided and wasted storage space is reduced.
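The two-phase construction of the digest value set can be sketched as follows. Each branch is modeled as an in-memory mapping from path to content and the comparison against the default branch is done in Python rather than with a git command; these choices, the SHA-256 hash, and the name `build_digest_set` are assumptions made for illustration.

```python
import hashlib


def digest(content: bytes) -> str:
    # SHA-256 is an assumption; the text only requires a hash of the file content.
    return hashlib.sha256(content).hexdigest()


def build_digest_set(default_branch: dict[str, bytes],
                     non_default_branches: list[dict[str, bytes]]) -> set[str]:
    digest_set: set[str] = set()
    # Phase 1: process every first target code file of the default branch.
    for content in default_branch.values():
        d = digest(content)
        if d not in digest_set:
            digest_set.add(d)
    # Phase 2: with the default branch as reference, keep only the files of each
    # non-default branch whose content differs from every default-branch file.
    default_contents = set(default_branch.values())
    for branch in non_default_branches:
        second_targets = [c for c in branch.values() if c not in default_contents]
        for content in second_targets:
            digest_set.add(digest(content))
    return digest_set


# Hypothetical branches: the feature branch changes only c.py.
default = {"a.py": b"A", "b.py": b"A", "c.py": b"C"}
feature = {"a.py": b"A", "c.py": b"C2"}
digests = build_digest_set(default, [feature])
```

In this example the set ends up with three digests: one for the shared content of a.py and b.py, one for c.py on the default branch, and one for the modified c.py on the feature branch.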

In an embodiment of the present application, the obtaining a second target code file in the second code file includes:

for each code file to be determined in the second code file, determining a digest value of the code file to be determined according to the file content of the code file to be determined;

and if the digest value of the code file to be determined is different from the digest value of each first target code file in the first code file, determining the code file to be determined as the second target code file.

In the process of determining the second target code file, for one code file to be determined in the second code file, determining a digest value of the code file to be determined according to the file content of the code file to be determined; and if the digest value of the code file to be determined is different from the digest value of each first target code file in the first code file, determining the code file to be determined as the second target code file. And executing the steps for each code file to be determined in the second code file to obtain a plurality of second target code files.

In this embodiment, for each code file to be determined in the second code file, a digest value of the code file to be determined is determined according to file content of the code file to be determined; and if the digest value of the code file to be determined is different from the digest value of each first target code file in the first code file, determining the code file to be determined as the second target code file. Therefore, the code files with the same file content correspond to one abstract value in the abstract value set, and when the code files are stored according to the abstract values in the abstract value set subsequently, the repeated storage of the same file content can be avoided, and the waste of storage space is reduced.
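Selecting the second target code files by comparing digest values, rather than raw contents, might look like the sketch below. The in-memory branch mappings, the SHA-256 hash, and the name `select_second_target_files` are assumptions made for illustration.

```python
import hashlib


def digest(content: bytes) -> str:
    # SHA-256 is an assumption; the text only requires a hash of the file content.
    return hashlib.sha256(content).hexdigest()


def select_second_target_files(first_files: dict[str, bytes],
                               second_files: dict[str, bytes]) -> dict[str, bytes]:
    """Return the code files of the non-default branch whose digest is absent from the default branch."""
    first_digests = {digest(content) for content in first_files.values()}
    return {
        path: content
        for path, content in second_files.items()
        # A file qualifies only if its digest differs from every first target file's digest.
        if digest(content) not in first_digests
    }


# Hypothetical branches: only new.py has content not present in the default branch.
first = {"a.py": b"A", "c.py": b"C"}
second = {"a.py": b"A", "moved/c.py": b"C", "new.py": b"N"}
second_targets = select_second_target_files(first, second)
```

Note that moved/c.py is not selected even though its path is new, because the comparison is on content digests and not on file names.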

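In an embodiment of the present application, the second code file of the non-default branch is processed as follows:

for each third target code file in the second code file, obtaining a digest value of the third target code file according to the file content of the third target code file;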

if the digest value set does not include the digest value of the third target code file, adding the digest value of the third target code file to the digest value set.

In this embodiment, each first target code file in the first code files is processed first, and digest values of the first target code files, which are not included in the digest value set, are added to the digest value set; the second code file is then processed to add to the set of digest values the digest values of the third target code file not included in the set of digest values. Therefore, the code files with the same file content correspond to one abstract value in the abstract value set, and when the code files are stored according to the abstract values in the abstract value set subsequently, the repeated storage of the same file content can be avoided, and the waste of storage space is reduced.

Further, if a plurality of code libraries are acquired, determining each code library in the plurality of code libraries as a target code library in sequence, and then processing the target code library by using the method provided by the above embodiment, where each target code library corresponds to one digest value set.
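Handling several code libraries in sequence, with one digest value set per library, can be sketched as below. The nested in-memory mapping, the SHA-256 hash, and the name `digest_sets_per_library` are assumptions made for illustration.

```python
import hashlib


def digest(content: bytes) -> str:
    # SHA-256 is an assumption; the text only requires a hash of the file content.
    return hashlib.sha256(content).hexdigest()


def digest_sets_per_library(libraries: dict[str, dict[str, bytes]]) -> dict[str, set[str]]:
    """Treat each library in turn as the target code library and build its own digest value set."""
    result: dict[str, set[str]] = {}
    for name, files in libraries.items():
        result[name] = {digest(content) for content in files.values()}
    return result


# Hypothetical libraries that share one file content.
libraries = {
    "lib_one": {"x.py": b"shared", "y.py": b"only in one"},
    "lib_two": {"z.py": b"shared"},
}
sets_by_library = digest_sets_per_library(libraries)
```

Content shared between libraries yields the same digest in both sets, so a shared code index keyed by digest would still hold that content only once.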

Referring to fig. 5, fig. 5 is a structural diagram of a code file storage apparatus according to an embodiment of the present application, and as shown in fig. 5, the present embodiment provides a code file storage apparatus 500, including:

a first obtaining module 501, configured to obtain a target code library, where the target code library includes at least one code file;

a second obtaining module 502, configured to obtain a digest value set according to the at least one code file of the target code library, where the digest value set includes at least one target digest value, and multiple code files with the same file content correspond to one target digest value;

a storage module 503, configured to store the at least one code file of the target code library according to the digest value set, where only the file content of one code file is stored for each target digest value.

In an embodiment of the present application, the storage module 503 includes:

the matching submodule is used for matching each target abstract value in the abstract value set with a code index of a database, wherein the code index comprises an existing abstract value and an existing code file corresponding to the existing abstract value;

and the first storage submodule is used for storing the target abstract value and the file content of a code file corresponding to the target abstract value in the code index if the target abstract value is unsuccessfully matched with the existing abstract value of the code index.

In an embodiment of the present application, the storage module 503 includes:

and the second storage sub-module is used for storing the target digest value and file information of one or more code files in the target code base corresponding to the target digest value in a file index of the database for each target digest value in the digest value set.

In an embodiment of the present application, the second obtaining module 502 includes:

the first obtaining submodule is used for obtaining a digest value of each first target code file in the first code files according to the file content of the first target code file;

a first adding submodule, configured to add the digest value of the first target code file to the digest value set if the digest value set does not include the digest value of the first target code file;

the second obtaining submodule is used for obtaining a second target code file in the second code file, and the file content of the second target code file is different from the file content of each first target code file in the first code file;

a third obtaining sub-module, configured to, for each second target code file in the second code files, obtain a digest value of the second target code file according to file content of the second target code file;

and the second adding submodule is used for adding the digest value of the second target code file into the digest value set.

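In an embodiment of the application, the second obtaining sub-module is configured to:

for each code file to be determined in the second code file, determine a digest value of the code file to be determined according to the file content of the code file to be determined;

and if the digest value of the code file to be determined is different from the digest value of each first target code file in the first code file, determine the code file to be determined as the second target code file.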

The code file storage apparatus 500 can implement each process implemented by the electronic device in the method embodiment shown in FIG. 1; to avoid repetition, details are not described here again.

The code file storage apparatus 500 of the embodiment of the application obtains a target code library, where the target code library includes at least one code file; obtains a digest value set according to the at least one code file of the target code library, where the digest value set includes at least one target digest value, and a plurality of code files with the same file content correspond to one target digest value; and stores the at least one code file of the target code library according to the digest value set, where only the file content of one code file is stored for each target digest value. Therefore, if multiple code files in the target code library have the same file content, only one copy of that content is stored, which avoids repeated storage of the same file content and reduces both wasted storage resources and storage cost.

According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.

Fig. 6 is a block diagram of an electronic device for the code file storage method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.

As shown in fig. 6, the electronic device includes: one or more processors 601, a memory 602, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories and multiple types of memory, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 6, one processor 601 is taken as an example.

The memory 602 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the code file storage methods provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the code file storage method provided herein.

The memory 602, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the code file storage method in the embodiments of the present application (for example, the first obtaining module 501, the second obtaining module 502, and the storage module 503 shown in fig. 5). The processor 601 executes various functional applications of the server and data processing by running non-transitory software programs, instructions, and modules stored in the memory 602, that is, implements the code file storage method in the above-described method embodiments.

The memory 602 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device implementing the code file storage method, and the like. Further, the memory 602 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 602 optionally includes memory located remotely from the processor 601, which may be connected via a network to an electronic device implementing the code file storage method. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device implementing the code file storage method may further include: an input device 603 and an output device 604. The processor 601, the memory 602, the input device 603 and the output device 604 may be connected by a bus or other means, and fig. 6 illustrates the connection by a bus as an example.

The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device implementing the code file storage method. Examples of the input device include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, and a joystick. The output device 604 may include a display device, an auxiliary lighting device (e.g., an LED), a tactile feedback device (e.g., a vibration motor), and the like.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal.

To provide interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

According to the technical solution of the embodiment of the application, a target code library is acquired, where the target code library includes at least one code file; a digest value set is obtained according to the at least one code file of the target code library, where the digest value set includes at least one target digest value, and a plurality of code files with the same file content correspond to one target digest value; and the at least one code file of the target code library is stored according to the digest value set, where only the file content of one code file is stored for each target digest value. Therefore, if the file contents of a plurality of code files in the target code library are the same, only one copy of the file content is stored, so that repeated storage of the same file content is avoided, the waste of storage resources is reduced, and the storage cost is reduced.
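The deduplication summarized above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the use of SHA-256 as the digest function, the helper names `digest_of` and `build_digest_set`, and the sample paths are all assumptions made for illustration.

```python
import hashlib


def digest_of(content: bytes) -> str:
    """Return a hex digest of the file content (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def build_digest_set(code_files: dict[str, bytes]) -> dict[str, bytes]:
    """Map each distinct digest to a single copy of the matching content.

    code_files maps a hypothetical file path to that file's content.
    """
    stored: dict[str, bytes] = {}
    for content in code_files.values():
        # setdefault keeps the first copy and ignores later duplicates.
        stored.setdefault(digest_of(content), content)
    return stored


# Hypothetical target code library with two files of identical content.
code_base = {
    "a/util.py": b"print('hi')\n",
    "b/util_copy.py": b"print('hi')\n",
    "c/main.py": b"print('main')\n",
}
stored = build_digest_set(code_base)
```

Here three code files yield two target digest values, so only two file contents are kept.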

In addition, when a target digest value fails to match any existing digest value in the code index, the target digest value and the file content of one code file corresponding to the target digest value are stored in the code index, so that repeated storage of the same file content is avoided and the waste of storage space is reduced. Because a plurality of code files with the same file content have only one copy of the file content stored in the code index, the amount of data stored in the code index is also reduced, so that when code files are searched in the code index, the search range can be narrowed and the efficiency of code file searching is improved.
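The match-then-store behaviour of the code index can be sketched as follows. The code index is modelled as a plain in-memory dictionary and the digest is computed with SHA-256; both choices, and the function name `add_to_code_index`, are assumptions rather than details given in the application.

```python
import hashlib


def add_to_code_index(code_index: dict, content: bytes) -> bool:
    """Store content under its digest only if no existing digest matches.

    Returns True when a new entry was written, False when it was skipped.
    """
    digest = hashlib.sha256(content).hexdigest()  # hash choice is an assumption
    if digest in code_index:
        return False  # match succeeded: this content is already stored once
    code_index[digest] = content  # match failed: store digest and content
    return True


code_index = {}
results = [
    add_to_code_index(code_index, b"int x = 1;\n"),
    add_to_code_index(code_index, b"int x = 1;\n"),  # duplicate content
    add_to_code_index(code_index, b"int y = 2;\n"),
]
```

Because the lookup is keyed by digest, a later search over the index only has to consider one entry per distinct file content.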

And for each target digest value in the digest value set, storing the target digest value and file information of one or more code files in the target code base corresponding to the target digest value in the file index, so that the code files are stored, and information loss of the code files is avoided.
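A file index of this kind can be pictured as a mapping from each target digest value to the file information of every code file sharing that content. In the sketch below, the fields `path` and `branch` stand in for whatever file information a real system records, and the function name and SHA-256 digest are assumptions.

```python
import hashlib
from collections import defaultdict


def build_file_index(code_files):
    """Group file information by the digest of each file's content.

    code_files is an iterable of (path, branch, content) tuples.
    """
    file_index = defaultdict(list)
    for path, branch, content in code_files:
        digest = hashlib.sha256(content).hexdigest()  # assumed hash function
        # Every file keeps its own record, even when the content is shared.
        file_index[digest].append({"path": path, "branch": branch})
    return dict(file_index)


files = [
    ("src/a.c", "master", b"int main() {}\n"),
    ("lib/a_copy.c", "dev", b"int main() {}\n"),
    ("src/b.c", "master", b"void f() {}\n"),
]
file_index = build_file_index(files)
```

The two files with identical content share one digest entry, yet both paths remain recoverable, which is how file information is preserved while the content itself is stored once.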

Each first target code file among the first code files of the default branch is analyzed so that the digest value of the first target code file is included in the digest value set. Then, taking the first code files of the default branch as a reference, the second code files of the non-default branch are compared with the first code files to obtain the second target code files, and the digest value of each second target code file is added to the digest value set. Therefore, code files with the same file content correspond to one digest value in the digest value set, and when the code files are subsequently stored according to the digest values in the digest value set, repeated storage of the same file content can be avoided and the waste of storage space is reduced.

For each code file to be determined in the second code files, a digest value of the code file to be determined is determined according to the file content of the code file to be determined; and if the digest value of the code file to be determined is different from the digest value of each first target code file in the first code files, the code file to be determined is determined as a second target code file. Therefore, code files with the same file content correspond to one digest value in the digest value set, and when the code files are subsequently stored according to the digest values in the digest value set, repeated storage of the same file content can be avoided and the waste of storage space is reduced.
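The branch handling described in the two preceding paragraphs can be sketched as below. The function `collect_digests`, the dictionary representation of a branch, and the SHA-256 digest are assumptions for illustration; the application does not prescribe them.

```python
import hashlib


def digest_of(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()  # assumed hash function


def collect_digests(default_files, non_default_files):
    """Build the digest value set from the default branch, then the rest.

    Both arguments map a hypothetical path to file content.
    """
    digest_set = set()
    # Step 1: add the digest of every first target code file.
    for content in default_files.values():
        digest_set.add(digest_of(content))
    default_digests = frozenset(digest_set)

    # Step 2: a candidate on the non-default branch becomes a second
    # target code file only if its digest differs from all default ones.
    second_targets = []
    for path, content in non_default_files.items():
        digest = digest_of(content)
        if digest not in default_digests:
            second_targets.append(path)
            digest_set.add(digest)
    return digest_set, second_targets


default_branch = {"README": b"v1\n", "main.py": b"print(1)\n"}
feature_branch = {"README": b"v1\n", "main.py": b"print(2)\n"}
digest_set, second_targets = collect_digests(default_branch, feature_branch)
```

In this example only `main.py` on the non-default branch differs from the default branch, so it is the sole second target code file and the digest value set ends up with three entries.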

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.

The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A code file storage method, comprising:

2. The method according to claim 1, wherein said storing the at least one code file of the target code library according to the digest value set comprises:

3. The method according to claim 1, wherein said storing the at least one code file of the target code library according to the digest value set comprises:

for each target digest value in the digest value set, storing the target digest value and file information of one or more code files in the target code library corresponding to the target digest value in a file index of a database.

4. The code file storage method according to claim 1, wherein the target code library includes a default branch and a non-default branch; the default branch comprises a first code file of the at least one code file, and the non-default branch comprises a second code file of the at least one code file other than the first code file;

said obtaining a set of digest values from said at least one code file of said target code library, comprising:

5. The code file storage method according to claim 4, wherein the obtaining a second target code file in the second code file comprises:

6. A code file storage device, comprising:

7. The code file storage device of claim 6, wherein the storage module comprises:

8. The code file storage device of claim 6, wherein the storage module comprises:

and the second storage sub-module is used for storing the target digest value and file information of one or more code files in the target code base corresponding to the target digest value in a file index of a database for each target digest value in the digest value set.

9. The code file storage device of claim 6, wherein the target code library comprises a default branch and a non-default branch; the default branch comprises a first code file of the at least one code file, and the non-default branch comprises a second code file of the at least one code file other than the first code file;

the second obtaining module includes:

10. The code file storage device of claim 9, wherein the second obtaining sub-module is configured to:

11. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.

12. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-5.