CN116955282A

CN116955282A - Query method and related device

Info

Publication number: CN116955282A
Application number: CN202310514461.XA
Authority: CN
Inventors: 李命饶
Original assignee: Guangzhou Tencent Technology Co Ltd
Current assignee: Guangzhou Tencent Technology Co Ltd
Priority date: 2023-05-08
Filing date: 2023-05-08
Publication date: 2023-10-27

Abstract

The application discloses a query method and a related device, wherein the method comprises the following steps: acquiring content to be processed through an application program; obtaining corresponding abstract integer data through integer mapping according to the content abstract identification of the content to be processed; determining an index identifier to be identified of the content to be processed based on the abstract integer data and the content capacity of the content to be processed; querying an index database of the application program through the index identification to be identified, wherein the index database is used for recording the index identification of the stored content stored by the application program; in response to the index database having the target index identifier identical to the index identifier to be identified, determining that the content to be processed and the first content corresponding to the target index identifier are identical, and establishing a first hard link based on the first content for the content to be processed, so that the high efficiency and the accuracy of index query can be simultaneously realized.

Description

Query method and related device

Technical Field

The present application relates to the field of data processing, and in particular, to a query method and related apparatus.

Background

Hard linking refers to the sharing of the same file storage unit by multiple files in a computer file system equally. A plurality of hard links can be established based on one source file, the source file can be accessed through the plurality of hard links, and only the source file is stored in the storage space, so that the storage space occupied by repeatedly storing a plurality of identical files can be saved.
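As a brief illustration of the hard-link behavior described above, the following Python sketch creates a source file and a hard link to it and reports what the file system sees. The file names are hypothetical, and the sketch assumes a file system that supports hard links.

```python
import os
import tempfile


def demo_hard_link() -> dict:
    """Create a file plus one hard link and report what the file system sees."""
    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "source.bin")   # hypothetical file name
        link = os.path.join(workdir, "link.bin")       # hypothetical file name
        with open(source, "wb") as f:
            f.write(b"x" * 1024)

        os.link(source, link)  # second name for the same storage unit
        source_stat, link_stat = os.stat(source), os.stat(link)

        os.remove(source)      # removing one name leaves the data reachable
        with open(link, "rb") as f:
            bytes_after_remove = len(f.read())

        return {
            "same_inode": source_stat.st_ino == link_stat.st_ino,
            "link_count": source_stat.st_nlink,
            "bytes_after_remove": bytes_after_remove,
        }


result = demo_hard_link()
print(result)
```

Both names refer to one stored copy, so the data occupies storage only once however many links exist.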

Before establishing a hard link for a file, index inquiry is required to be performed on a local storage file, and the same file corresponding to the file is inquired from the local storage file.

However, when the order of magnitude of the local storage file is large, the time consumed for index query of the local storage file by the character string comparison mode is long, so that the efficiency of index query is low, and the high efficiency requirement of index query is difficult to meet.

Disclosure of Invention

In order to solve the technical problems, the application provides a query method and a related device, which can meet the high efficiency and accuracy of index query.

The embodiment of the application discloses the following technical scheme:

In one aspect, an embodiment of the present application provides a query method, the method including:

acquiring content to be processed through an application program;

obtaining corresponding abstract integer data through integer mapping according to the content abstract identification of the content to be processed;

Determining a to-be-identified index identifier of the to-be-processed content based on the abstract integer data and the content capacity of the to-be-processed content;

querying an index database of the application program through the index identifier to be identified, wherein the index database is used for recording the index identifier of the stored content stored by the application program;

and in response to the index database having a target index identifier that is the same as the index identifier to be identified, determining that the content to be processed and the first content corresponding to the target index identifier are the same content, and establishing a first hard link based on the first content for the content to be processed.
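The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the application: the index database is modeled as an in-memory dictionary, the integer mapping is 32-bit FNV-1a applied to an MD5 hex string, the packing in `index_id` (content capacity in the high bits, digest integer in the low 32 bits) is one possible way to combine the two integers, and `fetch` is a hypothetical callable standing in for the download.

```python
import hashlib
import os
import tempfile
from typing import Callable, Dict


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a hash, used here as the integer mapping."""
    h = 0x811C9DC5                          # FNV offset basis
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF   # FNV prime, truncated to 32 bits
    return h


def index_id(md5_hex: str, size: int) -> int:
    """Assumed packing: content capacity in the high bits, digest integer below."""
    return (size << 32) | fnv1a_32(md5_hex.encode("ascii"))


def query_and_link(index_db: Dict[int, str], md5_hex: str, size: int,
                   dest_path: str, fetch: Callable[[], bytes]) -> str:
    """Look up the integer index identifier; link on a hit, store on a miss."""
    key = index_id(md5_hex, size)
    first_content = index_db.get(key)       # integer-keyed lookup
    if first_content is not None:
        os.link(first_content, dest_path)   # reuse stored bytes, no download
        return "linked"
    with open(dest_path, "wb") as f:        # no match: store and index
        f.write(fetch())
    index_db[key] = dest_path
    return "stored"


def demo() -> dict:
    payload = b"same attachment"
    digest = hashlib.md5(payload).hexdigest()
    downloads = []

    def fetch() -> bytes:
        downloads.append(1)                 # count simulated downloads
        return payload

    index_db: Dict[int, str] = {}
    with tempfile.TemporaryDirectory() as workdir:
        first = os.path.join(workdir, "first.bin")
        second = os.path.join(workdir, "second.bin")
        outcomes = [
            query_and_link(index_db, digest, len(payload), first, fetch),
            query_and_link(index_db, digest, len(payload), second, fetch),
        ]
        shared = os.path.samefile(first, second)
    return {"outcomes": outcomes, "downloads": len(downloads), "shared": shared}


result = demo()
print(result)
```

In this sketch the second request for the same content is served by a hard link, so the simulated download runs only once.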

In yet another aspect, an embodiment of the present application provides a query apparatus, including: an acquisition unit, an integer mapping unit, a determination unit, a query unit and a hard link unit;

the acquisition unit is used for: acquiring content to be processed through an application program;

the integer mapping unit is used for obtaining corresponding abstract integer data through integer mapping according to the content abstract identification of the content to be processed;

the determining unit is used for: determining a to-be-identified index identifier of the to-be-processed content based on the abstract integer data and the content capacity of the to-be-processed content;

The query unit is configured to: querying an index database of the application program through the index identifier to be identified, wherein the index database is used for recording the index identifier of the stored content stored by the application program;

the hard link unit is used for: in response to the index database having a target index identifier that is the same as the index identifier to be identified, determining that the content to be processed and the first content corresponding to the target index identifier are the same content, and establishing a first hard link based on the first content for the content to be processed.

Preferably, when the content to be processed is content received by the application program and not yet downloaded, the apparatus further includes a saving unit configured to:

and determining that the stored content does not have the content to be processed in response to the index database not having the target index identifier which is the same as the index identifier to be identified, and storing the content to be processed through the application program.

Preferably, the apparatus further comprises an updating unit for:

generating a first content index of the to-be-processed content for the application program according to the to-be-identified index identifier and the storage path of the to-be-processed content;

Updating the first content index into the index database.

Preferably, when the content to be processed is content to be forwarded to other terminal devices through the application program, the query unit is further configured to:

determining a forwarding path of the content to be processed;

responsive to the forwarding path being associated with the application program, establishing a hard link for the content to be processed based on source content of the forwarding path;

and responding to the forwarding path not associated with the application program, executing the operation of querying an index database of the application program through the index identification to be identified.

Preferably, the apparatus further comprises a replication unit for:

and determining that the stored content does not have the content to be processed in response to the index database not having the target index identifier which is the same as the index identifier to be identified, and copying the content to be processed to an interface to be forwarded through the application program based on the forwarding path.
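The forwarding branches described above can be sketched in Python as follows. The sketch rests on assumptions that the text does not state: a forwarding path is treated as "associated with the application program" when it lies under the application's storage directory, the index database is a dictionary keyed by a precomputed integer index identifier, and all function and file names are hypothetical.

```python
import os
import shutil
import tempfile
from typing import Dict, List


def is_inside(path: str, root: str) -> bool:
    """Assumed test for 'associated with the application': path lies under root."""
    path, root = os.path.realpath(path), os.path.realpath(root)
    try:
        return os.path.commonpath([path, root]) == root
    except ValueError:          # e.g. different drives on Windows
        return False


def prepare_forward(src: str, dest: str, app_root: str,
                    index_db: Dict[int, str], key: int) -> str:
    """Choose between linking to the source, linking to stored content, or copying."""
    if is_inside(src, app_root):
        os.link(src, dest)              # source already belongs to the application
        return "linked-to-source"
    stored = index_db.get(key)          # otherwise query the index database
    if stored is not None:
        os.link(stored, dest)
        return "linked-to-stored"
    shutil.copyfile(src, dest)          # no identical stored content: copy
    return "copied"


def demo() -> List[str]:
    with tempfile.TemporaryDirectory() as workdir:
        app_root = os.path.join(workdir, "app")
        external = os.path.join(workdir, "external")
        os.makedirs(app_root)
        os.makedirs(external)
        inside = os.path.join(app_root, "in.bin")
        outside = os.path.join(external, "out.bin")
        for path in (inside, outside):
            with open(path, "wb") as f:
                f.write(b"payload")
        return [
            prepare_forward(inside, os.path.join(app_root, "fwd1.bin"),
                            app_root, {}, 1),
            prepare_forward(outside, os.path.join(app_root, "fwd2.bin"),
                            app_root, {1: inside}, 1),
            prepare_forward(outside, os.path.join(app_root, "fwd3.bin"),
                            app_root, {}, 1),
        ]


outcomes = demo()
print(outcomes)
```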

Preferably, after the establishing of the first hard link based on the first content for the content to be processed, the apparatus further comprises an updating unit for:

Generating a second content index of the first hard link for the application program according to the index identifier to be identified and the storage path of the first hard link;

and updating the second content index into the index database.

Preferably, the saved content includes a second content having a second content index in the index database, the second content index including a second index identification and an index value including a path item for identifying a save path of the second content;

the determining unit is specifically configured to:

dividing the storage path of the second content into a fixed field, a content source field and a content name field according to the type of the path field of the storage path of the second content, wherein the fixed field is a common path field in the storage path of the stored content, and the content source field is used for identifying an account number identifier of a target account number for providing the second content in the application program and storing a time parameter of the second content;

discarding the fixed field, and obtaining corresponding source integer data through integer mapping according to field information of the content source field;

The path item is determined by the source integer data and the content name field.

Preferably, the source integer data comprises a first integer data sub-item and a second integer data sub-item, the determining unit being specifically adapted to:

obtaining a corresponding first integer data sub-item through an integer mapping rule according to field information of the content source field for identifying the account identifier;

and obtaining a corresponding second integer data sub-item through the integer mapping rule according to the field information of the content source field for identifying the time parameter.

Preferably, the index database further comprises a restoration index table for the integer mapping rule, and the apparatus further comprises a restoration unit for:

acquiring an acquisition request for a storage path of the second content, the acquisition request including a second index identifier of the second content;

searching an index value in the second content index from the index database according to the second index identifier;

according to the first integer data sub-item and the second integer data sub-item in the path item of the index value, restoring the account identification and the time parameter through the restoring index table;

And generating a storage path of the second content through the account number identification, the time parameter, the fixed field and the content name field in the path item.
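The path item encoding and its restoration can be sketched in Python as follows. This is a sketch under assumptions: the save path layout (fixed field, then account identifier, then time parameter, then content name), the example paths, and the use of sequentially assigned integers as the integer mapping rule are all hypothetical. The text only requires that the mapping be reversible through a restoration index table.

```python
from typing import Dict, List, Tuple


class PathCodec:
    """Encodes a save path as (account integer, time integer, content name)."""

    def __init__(self, fixed_field: str) -> None:
        self.fixed_field = fixed_field.rstrip("/")
        self._to_int: Dict[str, int] = {}
        self._restore: List[str] = []   # restoration index table: integer -> field

    def _map(self, field: str) -> int:
        """Assumed integer mapping rule: assign the next unused integer."""
        if field not in self._to_int:
            self._to_int[field] = len(self._restore)
            self._restore.append(field)
        return self._to_int[field]

    def encode(self, save_path: str) -> Tuple[int, int, str]:
        prefix = self.fixed_field + "/"
        if not save_path.startswith(prefix):
            raise ValueError("save path does not start with the fixed field")
        # Discard the fixed field, then split the source field from the name.
        account, time_param, name = save_path[len(prefix):].split("/", 2)
        return self._map(account), self._map(time_param), name

    def decode(self, path_item: Tuple[int, int, str]) -> str:
        account_int, time_int, name = path_item
        return "/".join([self.fixed_field, self._restore[account_int],
                         self._restore[time_int], name])


codec = PathCodec("/data/app/files")                       # hypothetical fixed field
original = "/data/app/files/account_42/2023-05/report.pdf"
item = codec.encode(original)
restored = codec.decode(item)
other = codec.encode("/data/app/files/account_42/2023-06/photo.png")
print(item, restored, other)
```

Because the fixed field is shared by every stored path and the source fields repeat across many files, only two small integers and the content name need to be kept per index value.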

Preferably, the index value further includes a content digest identification of the second content and a latest modification time of the second content by the application program.

Preferably, the second content is created with N hard links, N >1, the index database includes index content corresponding to the N hard links, and the apparatus further includes a deletion unit configured to:

switching a target hard link of the N hard links from a read-only state to a writable state;

when the target hard link is deleted by the application program, querying the index database, according to the deletion time of the target hard link, for the N-1 content indexes whose latest modification time in the index value is the same as the deletion time;

and switching the states of the N-1 hard-linked files corresponding to the N-1 content indexes from a writable state to a read-only state.

Preferably, the apparatus further comprises a construction unit for:

acquiring the content to be processed and stored which is stored by the application program;

obtaining corresponding abstract integer data to be processed through integer mapping according to the content abstract identification of the stored content to be processed;

Determining an index identifier of the to-be-processed saved content based on the to-be-processed summary integer data and the content capacity of the to-be-processed saved content;

responding to a plurality of to-be-processed saved contents with the same index identification in the to-be-processed saved contents, taking one to-be-processed saved content in the plurality of to-be-processed saved contents as a target saved content, deleting other to-be-processed saved contents, and establishing a hard link based on the target saved content for the other to-be-processed saved contents;

and taking the rest target saved content as the saved content, and constructing the index database according to the index identification of the saved content.
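The construction of the index database from already stored content can be sketched in Python as follows. This is a sketch under assumptions: the index database is a dictionary, the integer mapping is 32-bit FNV-1a over the MD5 hex string, the index identifier packs the content capacity above the 32-bit digest integer, and the first file encountered is kept as the target saved content. The file names are hypothetical.

```python
import hashlib
import os
import tempfile
from typing import Dict, Tuple


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a hash, used here as the integer mapping."""
    h = 0x811C9DC5
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


def file_index_id(path: str) -> int:
    """Assumed index identifier: file size in the high bits, digest integer below."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            md5.update(chunk)
    digest_int = fnv1a_32(md5.hexdigest().encode("ascii"))
    return (os.path.getsize(path) << 32) | digest_int


def build_index(root: str) -> Dict[int, str]:
    """Keep one file per index identifier and hard-link the duplicates to it."""
    index_db: Dict[int, str] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()                     # deterministic traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            key = file_index_id(path)
            target = index_db.get(key)
            if target is None:
                index_db[key] = path        # first occurrence becomes the target
            elif not os.path.samefile(target, path):
                os.remove(path)             # delete the duplicate copy
                os.link(target, path)       # and replace it with a hard link
    return index_db


def demo() -> Tuple[int, bool, bytes]:
    with tempfile.TemporaryDirectory() as root:
        files = (("a.bin", b"dup"), ("b.bin", b"dup"), ("c.bin", b"other"))
        for name, data in files:
            with open(os.path.join(root, name), "wb") as f:
                f.write(data)
        index_db = build_index(root)
        a_path, b_path = os.path.join(root, "a.bin"), os.path.join(root, "b.bin")
        shared = os.path.samefile(a_path, b_path)
        with open(b_path, "rb") as f:
            b_data = f.read()
        return len(index_db), shared, b_data


entries, shared, b_data = demo()
print(entries, shared, b_data)
```

A cautious variant, which the text does not describe, would compare the bytes of two files before deleting one, since a matching index identifier does not strictly prove identical content.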

In yet another aspect, an embodiment of the present application provides a computer device including a processor and a memory:

the memory is used for storing a computer program and transmitting the computer program to the processor;

the processor is configured to perform the method according to the above aspect according to the computer program.

In yet another aspect, embodiments of the present application provide a computer-readable storage medium for storing a computer program for performing the method described in the above aspect.

In yet another aspect, embodiments of the present application provide a computer program product comprising a computer program which, when run on a computer device, causes the computer device to perform the method of the above aspect.

According to the above technical scheme, when the content to be processed is acquired through the application program, its content abstract identification can be obtained. Because the content abstract identification is non-integer data, indexing with it directly consumes a large amount of system resources, so the corresponding abstract integer data is obtained through integer mapping. Because abstract integer data obtained by mapping may repeat across different contents, the index identification to be identified is determined by combining the abstract integer data with the content capacity of the content to be processed, which improves the uniqueness of the identification while keeping the data simple. Because the index database records the index identifiers of the stored content saved by the application program, whether the stored content includes first content identical to the content to be processed can be determined quickly from the index identification to be identified, which has a simple data structure. When such first content exists, a hard link to the first content is established for the content to be processed, and the content to be processed does not need to be stored again. Therefore, in a hard-link storage scenario, when duplicate content needs to be determined, the index database is queried with an index identification to be identified obtained by mapping the non-integer content abstract identification into integer data. Compared with the character string comparison used for index queries in the related art, comparison between integer data requires far less processing and is more efficient, and combining the content capacity improves query precision, so that high efficiency and accuracy are achieved at the same time.

Drawings

In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of a query scenario provided in an embodiment of the present application;

FIG. 2 is a method flow chart of a query method according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a relationship between the number of mapping values output by the FNV algorithm and collision probability according to an embodiment of the present application;

FIG. 4 is a flowchart of a query method for file 1 according to an embodiment of the present application;

FIG. 5 is a flowchart of a query method for an attachment download request according to an embodiment of the present application;

FIG. 6 is a flowchart of a query method in a forwarding scenario provided in an embodiment of the present application;

FIG. 7 is a graph comparing performance of an original solution and an optimized solution according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a query device according to an embodiment of the present application;

FIG. 9 is a block diagram of a terminal device according to an embodiment of the present application;

FIG. 10 is a block diagram of a server according to an embodiment of the present application.

Detailed Description

Embodiments of the present application are described below with reference to the accompanying drawings.

By establishing hard links for the same content and using a plurality of hard links to access source content corresponding to the same content, the storage resources occupied by repeatedly storing the same content in the application program using process can be reduced. In the related art, when a new content to be processed is obtained by an application program, index query is required to be performed on a locally stored content by means of character string comparison according to a content abstract of the content to be processed, so as to determine the same content corresponding to the content to be processed in the content stored by the application program, and a storage position of the same content, and establish a hard link between the content to be processed and the same content. However, when the index query is performed in a character string comparison mode, each character in the character string needs to be compared one by one, a large amount of system resources are consumed, when the local storage content is more, the time consumed when the index query is performed according to the content abstract is longer, the index query efficiency is lower, and the high-efficiency requirement of the index query is difficult to meet.

Therefore, the embodiment of the application provides a query method and a related device, which perform index query by using the index identifier to be identified only comprising integer data, combine the content capacity in the index identifier to be identified, save the index query time, improve the index query efficiency and realize the high efficiency and the accuracy of the index query while ensuring the query precision.

The query method provided by the embodiment of the application can be implemented through computer equipment, wherein the computer equipment can be terminal equipment or a server, and the server can be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server for providing cloud computing service. Terminal devices include, but are not limited to, cell phones, computers, intelligent voice interaction devices, intelligent home appliances, vehicle terminals, aircraft, and the like. The terminal device and the server may be directly or indirectly connected through wired or wireless communication, and the present application is not limited herein.

It will be appreciated that in the specific embodiments of the present application, related data such as user information, contact information, etc. are involved, and when the above embodiments of the present application are applied to specific products or technologies, user permissions or consents need to be obtained, and the collection, use and processing of related data need to comply with relevant laws and regulations and standards of relevant countries and regions.

The following examples are provided to illustrate the invention:

In the scenario shown in FIG. 1, the terminal device 100 is used as an example of the aforementioned computer device. To save storage space on the terminal device, the stored content saved by the application program is queried for a file identical to file 1. When such a file exists, a hard link between file 1 and that file can be established directly, which avoids storing the same file repeatedly.

However, the content abstract identification of file 1 is non-integer data, and performing the index query on the terminal device with non-integer data consumes a large amount of system resources and is inefficient. The content abstract identification is therefore mapped to abstract integer data through integer mapping. Because abstract integer data obtained through integer mapping may repeat, the content capacity of file 1 is introduced to ensure the accuracy of the index query: the index identifier to be identified for file 1 is determined from the abstract integer data and the content capacity. Because the content capacity of file 1 is also integer data, the resulting index identifier to be identified is integer data, which simplifies the data structure used for the index query.

The index database stores the index identifiers of the stored content saved by the application program. Each stored index identifier is integer data determined from the content abstract identification and the content capacity of the corresponding stored content. The index query looks up, in the index database, an index identifier that is the same as the index identifier to be identified. When the index identifier to be identified is the same as index identifier 1 in the index database, the stored content corresponding to index identifier 1 is considered identical to the content to be processed. Index identifier 1 is then determined as the target index identifier, and the stored content corresponding to it is determined as the first content. Because the content to be processed is identical to the first content, a hard link to the first content is established for the content to be processed. When the user opens file 1 through the chat interface of the application program, the first content is accessed directly through the hard link, so file 1 does not need to be downloaded and the same file is not stored repeatedly.

That is, in a hard-link storage scenario, when duplicate content corresponding to file 1 needs to be determined, the non-integer content abstract identification is mapped to integer data, the index identifier to be identified is determined by combining it with the integer content capacity, and the index database is queried with that identifier. Compared with an index query that uses non-integer data, this reduces the amount of computation and the time spent on the index query while preserving query precision, so that high efficiency and accuracy are achieved at the same time.

FIG. 2 is a flowchart of a query method provided in an embodiment of the present application. The method may be performed by a computer device; in this embodiment, a terminal device is taken as an example of the computer device. The method includes:

Step 201: Acquire the content to be processed through the application program.

The application program is a software program that runs on the terminal device and performs specific tasks or functions, for example a shopping application for commodity transactions or a social application for chatting and making friends. Different application programs store their data at different locations in the terminal device. While a user is using the application program, the application program receives messages sent by other users, and these messages may include text, images, videos, files and other content. When the user needs to access an image, video or file in a message, the content is first downloaded to the storage location of the application program in the terminal device. The application program then opens it from that location and presents it to the user.

As the user handles more messages in the application program, the content carried by different messages may be identical. Storing the same content repeatedly occupies the application program's storage space in the terminal device and wastes storage space. In the embodiment of the application, the content to be processed acquired through the application program is message content that the terminal device has not yet stored after the application program receives a message prompt. When the terminal device receives the message prompt corresponding to the content to be processed, basic information of the content to be processed, such as the content name, content capacity and sending time, can be obtained from the message prompt. The specific content, such as an image, video or file, has not yet been received and stored by the terminal device, so the content to be processed does not yet occupy storage space in the terminal device.

For example, in the scenario shown in FIG. 1, contact A sends file 1 to the user through the application program. File 1 is the content to be processed in this scenario, and a message prompt corresponding to it is displayed in the chat interface. At this point, the content name, content capacity and sending time of the content to be processed can be obtained from the message prompt, but file 1 itself has not yet been downloaded to the terminal device.

Step 202: Obtain corresponding abstract integer data through integer mapping according to the content abstract identification of the content to be processed.

After the terminal equipment obtains the content to be processed through the application program, a content abstract identifier corresponding to the content to be processed can be obtained, wherein the content abstract identifier is a unique identifier generated according to specific content contained in the content to be processed, and whether specific content corresponding to the content abstract identifiers is the same or not can be determined according to comparison among the content abstract identifiers; when the specific contents are the same, the content digest identifications corresponding to the specific contents are also the same.

In particular implementations, the content abstract identification may be determined from the MD5 value generated by the MD5 message-digest algorithm (MD5 Message-Digest Algorithm). The MD5 message-digest algorithm is a widely used cryptographic hash function that generates a 128-bit hash value, which can be used to check that information is transmitted consistently. During message transmission, the MD5 value generated by the sending end from the sent message is compared with the MD5 value generated by the receiving end from the received message. When the two values are identical, the sent message and the received message are considered consistent. Therefore, in the embodiment of the application, the MD5 value can be used as the content abstract identification for checking consistency between contents.
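As a small illustration of an MD5-based content abstract identification, the following Python sketch computes the digest incrementally with the standard `hashlib` module. The function name `content_digest` and the chunked input are assumptions made for the example.

```python
import hashlib
from typing import Iterable


def content_digest(chunks: Iterable[bytes]) -> str:
    """Return the MD5 value of the content as a 32-character hex string."""
    md5 = hashlib.md5()
    for chunk in chunks:        # content can be fed piece by piece
        md5.update(chunk)
    return md5.hexdigest()


sent = content_digest([b"hello ", b"world"])       # sender side
received = content_digest([b"hello world"])        # receiver side
consistent = sent == received
print(sent, consistent)
```

The 128-bit value is rendered as 32 hexadecimal characters, which is the character string form that the index query would otherwise have to compare.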

It should be noted that the content digest identification may be determined by other means, and the foregoing MD5 is merely an example and should not be construed as limiting the embodiments of the present application.

However, the content abstract identification generated from the specific content of the content to be processed is usually non-integer data with a complex data structure. An index query over such data consumes a large amount of system resources and takes a long time, so index query efficiency is low. For example, the MD5 value obtained by the MD5 message-digest algorithm is a long character string, and an index query with the MD5 value requires character string comparison. A character string is a sequence of characters consisting of digits, letters and underscores, and each character can take many possible values, so the data composition is complex and character string comparison takes a long time. Taking the comparison of two character strings as an example, the characters at the same positions in the two strings must be compared one by one, and the two strings are considered the same only when all characters at the same positions are the same and the two strings have the same length. Comparing the MD5 value of the content to be processed with the MD5 values of the stored content saved in the terminal device by the application program therefore requires checking the characters of each string one by one, which is time-consuming and makes the index query inefficient.

To improve index query efficiency, in the embodiment of the application, integer mapping is performed on the content abstract identification of the content to be processed. Integer mapping means assigning a mapping value to non-integer data, where the mapping value is integer data (Integer, INT). Because integer data stores an integer, two items of integer data can be compared directly by comparing their numeric values, which effectively shortens the time consumed by data comparison relative to the character string comparison in the related art. The mapping value serves as a label of the non-integer data, and a mapping relation is established between the mapping value and the non-integer data. Based on this mapping relation, the same non-integer data always corresponds to the same mapping value. In the embodiment of the application, the content abstract identification is the unique identifier of the content to be processed, and identical content has identical content abstract identifications, so the mapping values obtained from them are also identical. In other words, identical content has the same label, and this label is the abstract integer data. By querying with the abstract integer data, integer data with the same value can be found quickly through numeric comparison, so content with the same label as the content to be processed is determined quickly and the time consumed by the query is effectively reduced.

Specifically, the integer mapping can be implemented by an FNV (Fowler-Noll-Vo) algorithm, which is a fast and reliable hash algorithm that can hash large amounts of data quickly and keep the collision rate small. The hash (hash) refers to transforming an input with any length into an output value with a fixed length through a hash algorithm, and in the embodiment of the application, the content abstract identifier of the content to be processed is subjected to integer mapping through an FNV algorithm, the content abstract identifier is mapped into integer data, and the mapping value of the FNV algorithm aiming at the content abstract identifier is the abstract integer data corresponding to the content abstract identifier. In summary integer data output by the FNV algorithm, when different content summary identifications are mapped to the same summary integer data, collision occurs, and the summary integer data corresponding to the different content summary identifications are repeated.
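The 32-bit FNV-1 and FNV-1a variants can be sketched in Python as follows, using the standard 32-bit FNV offset basis and prime. Applying the hash to the ASCII bytes of an MD5 hex string is an assumption about how the content abstract identification would be fed to the algorithm, and the sample digest string is only an example input.

```python
FNV_OFFSET_BASIS_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193
MASK_32 = 0xFFFFFFFF


def fnv1_32(data: bytes) -> int:
    """FNV-1: multiply by the prime first, then XOR in the next byte."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        h = (h * FNV_PRIME_32) & MASK_32
        h ^= byte
    return h


def fnv1a_32(data: bytes) -> int:
    """FNV-1a: XOR in the next byte first, then multiply by the prime."""
    h = FNV_OFFSET_BASIS_32
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_32) & MASK_32
    return h


# Example input: an MD5 hex string mapped to a 32-bit integer (assumed usage).
md5_hex = "d41d8cd98f00b204e9800998ecf8427e"
digest_int = fnv1a_32(md5_hex.encode("ascii"))
print(digest_int)
```

The output is always an integer in the range 0 to 2^32 - 1, so two mapped identifiers can be compared with a single numeric comparison.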

The FNV algorithm has high dispersibility, and when the FNV algorithm is used for mapping, a collision probability formula between mapping values is shown as follows:

where p(k) is the collision probability, k is the number of inputs, and N is the number of mapped values.
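A common form of such an estimate, given here as an assumption rather than as the formula of the application, is the birthday-problem expression, in which N is taken to be the number of possible mapping values (for example 2^32 for a 32-bit mapping value):

```latex
p(k) \;=\; 1 - \prod_{i=1}^{k-1}\left(1 - \frac{i}{N}\right) \;\approx\; 1 - e^{-\frac{k(k-1)}{2N}}
```

Under this reading, p(k) is the probability that at least two of the k inputs share a mapping value, and it grows with k for a fixed N.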

When a 32-bit mapping value is used, the relationship between the number of mapping values output by the FNV algorithm (Number of 32-bit maps) and the collision probability (Probability of hash collision) is shown in fig. 3. It can be seen that, as the number of mapping values output by the FNV algorithm increases, the collision probability for a 32-bit mapping value gradually increases.

Specifically, three groups of data, namely the numbers 1 to 216553, 216553 different English words, and 216553 random universally unique identifiers (Universally Unique Identifier, UUID), are each mapped to 32-bit hash values by the FNV-1a algorithm and the FNV-1 algorithm of the FNV family, and the numbers of collisions for the two algorithms are shown in the following table:

When the numbers 1 to 216553 are mapped, neither the FNV-1a algorithm nor the FNV-1 algorithm produces a collision. When the 216553 different English words are mapped, the FNV-1a algorithm produces 4 collisions and the FNV-1 algorithm produces 1 collision. When the 216553 random UUIDs are mapped, the FNV-1a algorithm produces 4 collisions and the FNV-1 algorithm produces 5 collisions. Compared with the total data amount of 216553 in each group, the number of colliding mapping values when the FNV algorithm is used for mapping is therefore extremely small.
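As a rough cross-check of these counts, the birthday-problem estimate can be evaluated for 216553 inputs and 2^32 possible values. The sketch below assumes an idealized, uniformly distributed 32-bit mapping, which is an assumption and not a property stated in the application; the function names are illustrative.

```python
import math


def collision_probability(k: int, n: int) -> float:
    """Approximate probability of at least one collision among k inputs
    mapped uniformly onto n possible values (birthday approximation)."""
    return 1.0 - math.exp(-k * (k - 1) / (2.0 * n))


def expected_colliding_pairs(k: int, n: int) -> float:
    """Expected number of input pairs that share a mapping value."""
    return k * (k - 1) / (2.0 * n)


K = 216553   # inputs per group, as in the experiment described above
N = 2 ** 32  # number of possible 32-bit mapping values

pairs = expected_colliding_pairs(K, N)
probability = collision_probability(K, N)
print(f"expected colliding pairs: {pairs:.2f}")
print(f"probability of at least one collision: {probability:.4f}")
```

Under this idealized model the expected number of colliding pairs is roughly five, which is of the same order as the handful of collisions reported for the word and UUID groups.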

Therefore, the integer mapping can be performed on the content abstract identification of the content to be processed in the embodiment of the application through the FNV algorithm.

Step 203: Determining an index identifier to be identified of the content to be processed based on the abstract integer data and the content capacity of the content to be processed.

The abstract integer data is the integer data, obtained through integer mapping, that corresponds to the content abstract identifier of the content to be processed, which reduces the data complexity of the content abstract identifier. The content capacity of the content to be processed is also integer data and describes the size of the content to be processed, that is, the storage space occupied when the content to be processed is stored. For example, when the content to be processed is a file, the content capacity is the size of the file, and when the content to be processed is an image, the content capacity is the size of the image. The index identifier to be identified is an identifier that is determined according to the abstract integer data and the content capacity and is used for the content to be processed in the index query process.

As the user uses the application program and user data grows, more and more contents need their mapping values determined through integer mapping, and the number of inputs to the integer mapping increases accordingly. When the value range of the integer mapping is limited, mapping values may collide, that is, different content abstract identifiers may be mapped into the same abstract integer data. In that case the content to be processed cannot be uniquely determined from the abstract integer data alone, which reduces the accuracy of index query. Therefore, the embodiment of the application introduces the content capacity of the content to be processed into the index query process. Even if the abstract integer data corresponding to different content abstract identifiers is the same, the specific contents corresponding to different content abstract identifiers are different, and the content capacities of different specific contents are generally different as well. The index identifier to be identified corresponding to the content to be processed can therefore be determined by two factors, the abstract integer data and the content capacity, rather than by the abstract integer data alone.

The index identifier to be identified is an identifier for verifying consistency among a plurality of contents when index inquiry is performed. When the summary integer data and the content capacity corresponding to the plurality of contents are the same, the index identifiers to be identified corresponding to the plurality of contents are the same, and then the plurality of contents can be determined to be the same through the index identifiers to be identified.

Because the abstract integer data and the content capacity of the content to be processed are both integer data, the index identifier to be identified that is determined from them is also integer data, and index query is carried out through this index identifier to be identified.
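One way to combine the two integers into a single integer identifier is to pack them into one 64-bit value. The layout below (abstract integer data in the high 32 bits, the low 32 bits of the size in bytes in the low 32 bits) is an assumption made for illustration; this passage does not specify how the two values are combined, and `make_index_id` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF


def make_index_id(digest_int: int, content_size: int) -> int:
    """Pack a 32-bit digest integer and the content size into one integer key.

    Assumed layout (not taken from the application): digest integer in the
    high 32 bits, low 32 bits of the size in bytes in the low 32 bits.
    """
    if not 0 <= digest_int <= MASK32:
        raise ValueError("digest_int must fit in 32 bits")
    if content_size < 0:
        raise ValueError("content_size must be non-negative")
    return (digest_int << 32) | (content_size & MASK32)


# Two contents whose digest integers collide but whose sizes differ
# still receive different index identifiers.
id_a = make_index_id(0x1234ABCD, 1024)
id_b = make_index_id(0x1234ABCD, 2048)
```

With this layout the identifiers are equal only when both the digest integer and the (truncated) size are equal, so a collision in the digest integer alone no longer produces a false match.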

Step 204: Querying an index database of the application program through the index identifier to be identified.

The index database is used for recording index identifiers of stored contents stored through the application program.

Since the index identifier to be identified, determined according to the abstract integer data and the content capacity, is integer data, the index identifiers recorded in the index database are also integer data. Therefore, when the index database of the application program is queried through the index identifier to be identified, integer data with the same value as the index identifier to be identified can be found in the index database directly by numerical comparison. When different types of data are compared in a computer, comparing character strings requires comparing each character of the strings in sequence, and comparing floating-point data requires first converting floating-point data of different precision and then comparing the converted values numerically, whereas integer data can be compared directly by comparing the integer values. The time consumed by index query is thus effectively shortened and the efficiency of index query is improved.
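The lookup by integer key can be sketched with an in-memory SQLite table. The table name, column names, and helper functions below are hypothetical; the application does not state which database engine or schema is used. SQLite stores INTEGER values as signed 64-bit numbers, so this sketch assumes the identifiers fit in that range.

```python
import sqlite3
from typing import Optional


def open_index_db() -> sqlite3.Connection:
    """Create an in-memory index database keyed by an integer identifier."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE content_index ("
        "index_id INTEGER NOT NULL, save_path TEXT NOT NULL)"
    )
    # Non-unique index: a source file and its hard links share one identifier.
    conn.execute("CREATE INDEX idx_content_index_id ON content_index (index_id)")
    return conn


def add_entry(conn: sqlite3.Connection, index_id: int, save_path: str) -> None:
    """Record one saved path under the given integer identifier."""
    conn.execute(
        "INSERT INTO content_index (index_id, save_path) VALUES (?, ?)",
        (index_id, save_path),
    )
    conn.commit()


def find_saved_path(conn: sqlite3.Connection, index_id: int) -> Optional[str]:
    """Return one saved path with an equal identifier, or None on a miss."""
    row = conn.execute(
        "SELECT save_path FROM content_index WHERE index_id = ? LIMIT 1",
        (index_id,),
    ).fetchone()
    return row[0] if row else None
```

The `WHERE index_id = ?` comparison is a plain integer equality test served by the integer index, which is the kind of numerical comparison the paragraph above contrasts with string comparison.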

Step 205: In response to the index database having a target index identifier identical to the index identifier to be identified, determining that the content to be processed and the first content corresponding to the target index identifier are the same content, and establishing a first hard link based on the first content for the content to be processed.

When a target index identifier identical to the index identifier to be identified is found in the index database of the application program in the terminal device, the saved content stored by the application program in the terminal device includes content whose abstract integer data and content capacity are the same as those of the content to be processed, that is, the first content corresponding to the target index identifier and the content to be processed have the same abstract integer data and the same content capacity. Based on the mapping relation established in the integer mapping process, the content abstract identifier of the first content is considered to be the same as that of the content to be processed. When both the content abstract identifier and the content capacity are the same, the first content can be considered identical to the content to be processed, that is, content identical to the content to be processed has already been stored in the terminal device by the application program.

After it is determined that the content to be processed and the first content are the same, a first hard link can be established for the content to be processed directly based on the first content. A hard link means that, when identical files exist in a computer file system, the file names of the identical files are each linked to the same node number (inode number) used by the computer file system, so that the identical files equally share one file storage unit. The file storage unit is the storage location of the source file corresponding to the identical files and can be accessed through any of the file names, which saves storage space in the computer file system. When a user accesses the content to be processed through the application program, the source content shared by the first content and the content to be processed can be opened directly through the first hard link without downloading the content to be processed, so that repeated storage of the same content is avoided and the storage space occupied by the application program in the terminal device is saved.
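The effect of a hard link can be demonstrated with the Python standard library. The file names and the helper `link_to_existing` are illustrative only, and the sketch assumes a file system that supports hard links.

```python
import os
import tempfile


def link_to_existing(source_path: str, new_path: str) -> None:
    """Create new_path as a hard link to the file already stored at source_path."""
    os.link(source_path, new_path)


with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "source.bin")  # stands in for the first content
    alias = os.path.join(workdir, "file1.bin")    # stands in for the content to be processed
    with open(source, "wb") as f:
        f.write(b"example content")

    link_to_existing(source, alias)

    # Both names refer to the same storage unit, so the data is stored once.
    same_file = os.path.samefile(source, alias)
    link_count = os.stat(source).st_nlink
    with open(alias, "rb") as f:
        alias_bytes = f.read()
```

After the call, reading through either name returns the same bytes, and the link count of the underlying file is two, which reflects that one storage unit is shared by two file names.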

When the content to be processed is obtained through the application program in the embodiment of the application, the content abstract identifier of the content to be processed can be obtained. Because the content abstract identifier is non-integer data, indexing on it directly consumes a large amount of system resources, so the abstract integer data corresponding to the content abstract identifier is obtained through integer mapping. To reduce the possible repetition rate of the abstract integer data, the index identifier to be identified is determined by combining the abstract integer data with the content capacity of the content to be processed, which improves the uniqueness of the identifier of the content to be processed while reducing data complexity. Because the index database records the index identifiers of the stored content saved by the application program, whether the stored content includes first content identical to the content to be processed can be determined quickly based on the index identifier to be identified, which has a simple data structure. When such first content exists, a hard link corresponding to the first content is established for the content to be processed, and the content to be processed does not need to be stored again. Therefore, in a hard-link storage scenario, when repeated content needs to be determined, the index database is queried with an index identifier to be identified obtained by mapping the non-integer content abstract identifier into integer data. Compared with the character string comparison used for index query in the related art, comparison between integer data greatly improves processing capacity and efficiency, and combining the content capacity improves query precision, so that high efficiency and accuracy are achieved simultaneously.

After establishing a first hard link for the content to be processed based on the first content, the index database needs to be updated based on the first hard link, so after step 205, a query method provided by an embodiment of the present application further includes the following steps:

S11: Generating a second content index of the first hard link for the application program according to the index identifier to be identified and the storage path of the first hard link.

Before a hard link is established for the content to be processed, it is determined whether the same content exists in the stored content and, if so, where that content is stored. When the first content and the content to be processed are the same, a first hard link is established for the content to be processed based on the first content; the hard link creates a new path to the source content shared by the first content and the content to be processed, without downloading the specific content corresponding to the content to be processed. The storage location of the source content corresponding to the content to be processed can be determined through the first hard link, and the source content is accessed according to that storage location.

S12: Updating the second content index into the index database.

The second content index is added to the index database, and the index database is thereby updated. In the index database, the index identifier to be identified corresponding to the content to be processed and the storage path of the first hard link are stored in the form of the second content index, and the content to be processed is managed directly through the second content index. In a subsequent index query, an index identifier with an equal value can be found quickly in the index database according to the value of the integer data in the index identifier. The storage path of the first hard link is then determined from the second content index containing that index identifier, and the first hard link is called through the storage path, so that the source content corresponding to the first hard link is called.

Taking the scenario shown in fig. 1 as an example, with file 1 as the content to be processed, a first hard link is established for file 1 based on the first content, a second content index corresponding to file 1 is generated according to the index identifier to be identified of file 1 and the storage path of the first hard link, and the second content index is stored in the index database corresponding to the application program. Hard links may be regarded as one or more file names of a file; in the scenario shown in fig. 1, file 1 is one of the file names of the source file and is managed based on the first hard link. Although the specific file corresponding to file 1 is not downloaded to the terminal device 100 through the application program, during index query the corresponding second content index can be determined from the index database according to the index identifier to be identified of file 1, and the first hard link is called based on the storage path of the first hard link in the second content index, so that the source file connected to the first hard link is accessed. That is, the user can directly access the specific content corresponding to the first hard link through the file name "file 1" in the chat interface.

When the target index identification which is the same as the index identification to be identified exists in the index database, a second content index which aims at the content to be processed is generated according to the index identification to be identified and the storage path of the first hard link, and the second content index is updated into the index database, so that timeliness of the index content in the index database is improved, query time of index query is shortened, and further efficiency of index query is improved.

If the target index identification identical to the index identification to be identified is not queried in the index database corresponding to the application program according to the index identification to be identified of the content to be processed, at this time, it is considered that the content identical to the content to be processed does not exist in the stored content stored by the terminal device through the application program, and then the content to be processed at this time is the content which is received by the application program and is not yet downloaded, and the content to be processed needs to be stored in the terminal device through the application program so that the user accesses the content to be processed.

Therefore, in one possible implementation manner, when the content to be processed is content that is received by the application program and has not yet been downloaded, the query method provided by the embodiment of the present application further includes:

When the index database does not have the target index identification which is the same as the index identification to be identified, the stored content which is stored by the terminal equipment through the application program is considered to have no content which is the same as the content to be processed, namely, although the content to be processed is received by the application program, the specific content which corresponds to the content is not downloaded, namely, the stored content which corresponds to the application program in the terminal equipment is not stored; the content to be processed is a new content with respect to the previously stored content. When a user needs to access the to-be-processed content through the application program, the to-be-processed content cannot be found in the stored content, and therefore, when the to-be-processed content is a content which is received through the application program and is not yet downloaded, a specific content corresponding to the to-be-processed content needs to be downloaded through the application program, and the specific content and the stored content are stored in the terminal device together.

Specifically, referring to fig. 4, taking the file 1 in fig. 1 as an example, when the terminal device 100 receives the file 1 through the application program, firstly, index inquiry is performed in the index database through the index identifier to be identified of the file 1, so as to determine whether the same file corresponding to the file 1 exists in the saved content saved by the terminal device 100 through the application program. When the index database has the target index identification which is the same as the index identification to be identified, the same file corresponding to the file 1 is considered to exist in the stored content, and at the moment, a hard link is established for the file 1 directly based on the same file; otherwise, the file 1 is downloaded and stored in the terminal device 100 together with the aforementioned stored contents.

When the target index identification which is the same as the index identification to be identified does not exist in the index database, the specific content corresponding to the content to be processed is directly downloaded and stored in the terminal equipment through the application program, so that the processing efficiency of the new content is improved.

Further, after the downloading of the content to be processed is completed by the application program, the method further comprises:

S21: Generating a first content index of the content to be processed for the application program according to the index identifier to be identified and the storage path of the content to be processed.

When the content to be processed is new content, the content to be processed is downloaded and stored in the storage space corresponding to the application program in the terminal device, the storage path of the content to be processed in that storage space is obtained, and a first content index of the content to be processed is generated for the application program based on the storage path and the index identifier to be identified.

S22: Updating the first content index into the index database.

The first content index is added to the index database, and the index database is thereby updated. In the index database, the index identifier to be identified and the storage path of the content to be processed are stored in the form of the first content index, and the content to be processed is managed directly through the first content index. During index query, an index identifier with an equal value can be found quickly in the index database according to the value of the integer data in the index identifier. The storage path of the content to be processed is then determined from the first content index containing that index identifier, and the content to be processed is called directly through the storage path.

When the target index identification which is the same as the index identification to be identified does not exist in the index database, a first content index aiming at the content to be processed is generated according to the index identification to be identified and the storage path of the content to be processed, and the first content index is updated into the index database, so that timeliness of the index content in the index database is improved, query time of index query is shortened, and further efficiency of index query is improved.
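The two branches described so far (establish a hard link and record it on a hit, download and record on a miss) can be sketched together. The in-memory dictionary standing in for the index database, the function `store_content`, and the `download` callback are hypothetical simplifications for illustration.

```python
import os
import tempfile
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for the index database:
# index identifier -> storage paths recorded under that identifier.
ContentIndex = Dict[int, List[str]]


def store_content(index: ContentIndex, index_id: int, target_path: str,
                  download: Callable[[str], None]) -> str:
    """Store content at target_path, reusing saved content on an index hit.

    Returns "linked" when a hard link was created, "downloaded" otherwise.
    """
    saved_paths = index.get(index_id, [])
    if saved_paths:
        # Hit: an equal identifier is recorded, so link instead of downloading.
        os.link(saved_paths[0], target_path)
        action = "linked"
    else:
        # Miss: the content is new, so fetch it into the application's storage.
        download(target_path)
        action = "downloaded"
    # In both branches, record the identifier together with the new path.
    index.setdefault(index_id, []).append(target_path)
    return action


def fake_download(path: str) -> None:
    """Stand-in for the application's download routine."""
    with open(path, "wb") as f:
        f.write(b"payload")


with tempfile.TemporaryDirectory() as workdir:
    index: ContentIndex = {}
    path_a = os.path.join(workdir, "a.bin")
    path_b = os.path.join(workdir, "b.bin")
    first = store_content(index, 42, path_a, fake_download)
    second = store_content(index, 42, path_b, fake_download)
    shared = os.path.samefile(path_a, path_b)
    recorded = len(index[42])
```

In this run the first call downloads the content and the second call, which presents the same identifier, only creates a hard link, so the payload is stored once while both paths are recorded in the index.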

Specifically, referring to fig. 5, when the content to be processed is an attachment, after receiving a download request for the attachment, the application program first queries, according to the index identifier to be identified corresponding to the attachment, whether a target index identifier identical to the index identifier to be identified exists in the index database. If not, the attachment is downloaded directly through the application program, a content index corresponding to the attachment is generated according to the index identifier to be identified and the storage path of the attachment, and the content index is updated into the index database. If so, a hard link is established for the attachment based on the content corresponding to the target index identifier, and a quick-download effect of the attachment is displayed to the user in the chat interface, through which the user learns that the application program has started downloading the attachment. After the hard link is established, a content index of the attachment is generated for the application program according to the index identifier to be identified and the storage path of the hard link, and the content index is updated into the index database. When the user opens the attachment in the chat interface, the hard link at the storage path of the hard link is called, and the source file corresponding to the hard link is accessed through the hard link. After the database is updated, the message state corresponding to the attachment is updated in the chat interface through identifiers such as "successfully received" and "attachment received".

When the content to be processed is the content to be forwarded to other terminal equipment through the application program, the query method provided by the embodiment of the application further comprises the following steps:

S31: Determining a forwarding path of the content to be processed.

Forwarding refers to a process of transferring and sending content: after receiving data, a node processes the data according to a destination address and forwards the data to a target node. The forwarding path is the source path of the data. For example, in the scenario shown in fig. 1, after receiving file 1 sent by contact A, the user transfers file 1 from the chat interface for contact A to the chat interface for contact B in the application program and sends file 1 to contact B, in which case the forwarding path is contact A. In addition to forwarding within the same application program, forwarding between different application programs may be implemented. For example, a user may forward a commodity link in a shopping application to contact A in a social application to share the commodity content of the commodity link, in which case the forwarding path is the shopping application.

S32: In response to the forwarding path being associated with the application program, establishing a hard link for the content to be processed based on source content of the forwarding path.

When the forwarding path is associated with the application program, that is, when forwarding takes place within the same application program, the content to be processed comes from the application program, so the index database of the application program must already store a target index identifier identical to the index identifier to be identified of the content to be processed, and the target index identifier can be determined directly by index query in the index database. In the index database, the target index identifier may correspond to the source content of the content to be processed or to a hard link to the source content; in either case, a hard link based on the source content of the forwarding path can be established for the content to be processed.

S33: In response to the forwarding path not being associated with the application program, executing the operation of querying the index database of the application program through the index identifier to be identified.

When the forwarding path is not associated with the application program, whether the target index identification identical to the index identification to be identified exists or not can be inquired in an index database based on the index identification to be identified of the content to be processed, so that whether the same content corresponding to the content to be processed is stored in the application program or not can be determined, if so, a hard link can be established for the content to be processed directly according to the same content, and repeated downloading of the same content is avoided.

In the content forwarding scene, whether the same content corresponding to the content to be processed exists in the stored content stored by the terminal equipment through the application program is determined according to the forwarding path or the index identifier to be identified, and if the same content exists, a hard link based on the source content is directly established for the content to be processed according to the same content, so that the content forwarding efficiency is improved.

If the target index identifier which is the same as the index identifier to be identified is not queried in the index database of the application program through the index identifier to be identified, the method further comprises the following steps:

When the index database does not have the target index identification which is the same as the index identification to be identified, the content to be processed is considered not to be in the stored content stored by the terminal equipment through the application program, so that when the content to be processed is forwarded, the content to be processed needs to be acquired from a forwarding path, and the content to be processed is copied into a forwarding interface through the application program, so that the forwarding of the content to be processed is realized.

When the content to be processed does not exist in the stored content stored by the terminal equipment through the application program, the content to be processed is directly copied to the interface to be forwarded through the application program for forwarding, and the accuracy of forwarding the content is ensured.

Specifically, referring to fig. 6, when the content to be processed is content to be forwarded to other terminal devices through the application program, it is first judged, according to the forwarding path of the content to be processed, whether the forwarding path is within the application program. If so, the application program must already store related information corresponding to the content to be processed, where the related information may be the source content corresponding to the content to be processed or a hard link corresponding to the source content, and a hard link based on the forwarding path is established directly for the content to be processed based on the content stored in the application program. If not, index query is carried out in the index database according to the index identifier to be identified of the content to be processed, to judge whether an index identifier identical to the index identifier to be identified exists in the index database, that is, whether the application program has already processed the same content corresponding to the content to be processed. If such an index identifier exists, the same content and the content to be processed correspond to the same source content, and a hard link based on the source content is established for the content to be processed according to the same content. Otherwise, the content to be processed is obtained directly from the forwarding path and copied into the interface to be forwarded through the application program, so that the forwarding of the content to be processed is realized.
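The three-way decision of fig. 6 can be sketched as a small pure function. The names `plan_forward`, `path_in_app`, and the returned action labels are hypothetical, and the dictionary again stands in for the index database.

```python
from typing import Dict, List, Tuple


def plan_forward(path_in_app: bool, forwarded_from: str,
                 index: Dict[int, List[str]], index_id: int) -> Tuple[str, str]:
    """Decide how to materialize content that is being forwarded.

    Returns (action, path): the action is "hard_link" or "copy", and the
    path is the location the hard link or the copy is based on.
    """
    if path_in_app:
        # S32: the forwarding path is inside the application, so its source
        # content (or a hard link to it) is already saved; link to it.
        return ("hard_link", forwarded_from)
    saved_paths = index.get(index_id)
    if saved_paths:
        # S33 with a hit: identical content was saved earlier; link to it.
        return ("hard_link", saved_paths[0])
    # S33 with a miss: nothing to reuse, copy from the forwarding path.
    return ("copy", forwarded_from)
```

For example, `plan_forward(False, "shop/link", {7: ["File/2023-05/a.pdf"]}, 7)` selects a hard link to the saved file, while the same call with an empty index falls back to copying from the forwarding path.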

In the terminal device, the saved content saved by the application program needs to include a saving path corresponding to the saved content, so as to determine a specific position of the saved content according to the saving path. Since the saved path generated by the terminal device is often non-integer data, in order to further improve the index query efficiency, the path item needs to be further optimized.

In one possible implementation, the saved content includes a second content having a second content index in the index database, the second content index including a second index identification and an index value including a path item for identifying a saved path of the second content.

Taking a second content in the stored content as an example, a second content index corresponding to the second content is stored in the index database, and the second content index includes a second index identifier and an index value of the second content. The second index identifier is integer data determined according to the abstract integer data corresponding to the content abstract identifier of the second content and the content capacity of the second content, and is used for quickly querying the second content index corresponding to the second content in the index database. The index value includes a path item that identifies the storage path of the second content and is used for determining the specific location of the second content in the storage space of the terminal device.

The path item is determined by:

S41: Dividing the storage path of the second content into a fixed field, a content source field, and a content name field according to the path field types of the storage path of the second content, wherein the fixed field is a path field common to the storage paths of the stored content, and the content source field is used for identifying the account identifier of the target account that provides the second content in the application program and the time parameter of storing the second content.

In the save path of the second content, the save path is divided into a plurality of constituent parts according to the path field type, for example:

UserName/FileStorage/MsgAttach/TalkerMD5/(Image/File/Video)/year-month/FileName

UserName/FileStorage/(Image/File/Video)/year-month/FileName

In the storage paths shown above, the path fields are separated by "/". According to the path field type, the first three path fields, UserName/FileStorage/MsgAttach, are path fields common to the stored content: all stored content saved by the application program in the terminal device is stored under the path corresponding to these common path fields, so this component is classified as the fixed field.

TalkerMD5 is the account identifier that identifies the target account providing the second content in the application program. Its specific form is the MD5 value corresponding to the target account, where the MD5 value may be the output value obtained by inputting the account identification number (Identity Document, ID) of the target account into a message digest algorithm, or may be a data value obtained by inputting user information of the target account or other information into the message digest algorithm. When a user deletes a contact account in the application program, the path corresponding to the account identifier can be found according to the account identifier of that contact account, and all contents under the path are deleted.

The year-month is a path field for describing the preservation time corresponding to the second content, that is, a time parameter for preserving the second content.

Based on the two path fields, talkerMD5 and year-month, the source of the second content, i.e., when, which target account number the second content provides, can be determined, thus dividing the two path fields into content source fields.

The (Image/File/Video) path field is used for describing the content type of the second content, the content type can be divided into three types of Image (Image), file (File) and Video (Video), when the second content index is stored in the index database, the second content indexes of different types can be directly and respectively stored according to the content type, so that the storage space occupied by storing the path field is saved; the FileName is a path field for describing a content name corresponding to the second content, for example, when the content type corresponding to the second content is a file, the content name corresponding to the second content is a file name of the second content, so that the path field is divided into content name fields.
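The field division of S41 can be sketched in Python as follows. This is a minimal illustration that assumes the first path layout shown above (seven "/"-separated fields); the names SavePathFields and split_save_path, and the sample path in the usage note, are hypothetical and do not come from the application.

```python
from typing import NamedTuple

CONTENT_TYPES = {"Image", "File", "Video"}


class SavePathFields(NamedTuple):
    fixed: str         # common prefix: UserName/FileStorage/MsgAttach
    account_md5: str   # TalkerMD5, first part of the content source field
    year_month: str    # year-month, second part of the content source field
    content_type: str  # Image, File or Video
    file_name: str     # content name field


def split_save_path(path: str) -> SavePathFields:
    """Split a storage path of the assumed seven-field layout into its fields."""
    parts = path.split("/")
    if len(parts) != 7 or parts[4] not in CONTENT_TYPES:
        raise ValueError(f"unexpected storage path layout: {path!r}")
    return SavePathFields(
        fixed="/".join(parts[:3]),
        account_md5=parts[3],
        year_month=parts[5],
        content_type=parts[4],
        file_name=parts[6],
    )
```

For example, split_save_path("alice/FileStorage/MsgAttach/0cc175b9c0f1b6a8/Image/2023-05/photo.jpg") yields the fixed field alice/FileStorage/MsgAttach, the two content source parts, the content type Image, and the content name photo.jpg.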

S42: discarding the fixed field, and obtaining corresponding source integer data through integer mapping according to the field information of the content source field.

When optimizing the storage path of the second content, the fixed field can be discarded: because the storage paths of all content stored by the application program include the fixed field, it does nothing to distinguish one stored content from another, and discarding it saves part of the storage space.

The content source field is determined by the account identifier of the target account that provided the second content in the application program and by the time parameter for storing the second content. The account identifier may be non-integer data, so the content source field may also be non-integer data. Non-integer data has a more complex composition than integer data, with more possible values at each position, and therefore occupies more storage space. Within one application program, however, the content sources of all content are finite and countable. When optimizing the storage path of the second content, the field information of the content source field can therefore be integer-mapped: each content source field is mapped to corresponding source integer data, a mapping relationship between source integer data and content source fields is established, and identical content source fields correspond to identical source integer data.

Taking a social application program as an example, if the application has 10,000 contacts and the user has used it for 10 years (120 months), the two path fields TalkerMD5 and year-month can be mapped using the integers 1 to 10120 (10,000 account identifiers plus 120 year-month values). The content source fields in the storage path are thereby mapped to integers, which greatly reduces the storage space occupied by storing the storage path.

S43: the path item is determined by the source integer data and the content name field.

Within the storage space of the application program, the content source field of the second content can be determined from the mapping relationship between source integer data and content source fields. Combining the content source field with the fixed field gives the specific storage location of the second content, where the second content can then be found by its content name field. In other words, for the application program the source integer data and the content name field together represent a complete storage path, so in the second content index the path item that identifies the storage path of the second content can be determined directly from the source integer data and the content name field. Optimizing the path item in the second content index according to the field types of the storage path in this way saves the storage space occupied by storing the storage path.

Source integer data obtained by integer mapping may collide, so that different content source fields correspond to the same source integer data. To reduce the collision rate, the source integer data is further optimized.

In one possible implementation, the source integer data includes a first integer data sub-item and a second integer data sub-item, and obtaining the corresponding source integer data through integer mapping according to the field information of the content source field, as described in S42, is specifically implemented as follows:

S51: obtaining a corresponding first integer data sub-item through an integer mapping rule according to the field information in the content source field that identifies the account identifier;

S52: obtaining a corresponding second integer data sub-item through the integer mapping rule according to the field information in the content source field that identifies the time parameter.

The field information identifying the account identifier and the field information identifying the time parameter in the content source field are mapped separately through the integer mapping rule, giving a first integer data sub-item for the account identifier and a second integer data sub-item for the time parameter. The account information and the time parameter are thus represented by two independent integer data sub-items, and the two sub-items obtained from the two mappings are combined to form the source integer data for the content source field, which further reduces the collision rate between the source integer data of different content source fields.
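Steps S42, S51 and S52 can be sketched as follows. The text does not specify the integer mapping rule, so this sketch assumes the simplest possible rule: each distinct field value receives the next unused integer, starting at 1. The names IntegerMapper and map_content_source are hypothetical.

```python
class IntegerMapper:
    """Assumed mapping rule: each distinct field value gets the next integer, from 1."""

    def __init__(self):
        self._ids = {}     # field text -> integer
        self._fields = []  # position (integer - 1) -> field text, kept for restoring

    def map(self, field: str) -> int:
        """Return the integer for a field value, assigning a new one if needed."""
        if field not in self._ids:
            self._fields.append(field)
            self._ids[field] = len(self._fields)
        return self._ids[field]

    def restore(self, value: int) -> str:
        """Return the field value that was mapped to the given integer."""
        return self._fields[value - 1]


def map_content_source(mapper: IntegerMapper, account_md5: str, year_month: str):
    """S51 and S52: map the account identifier and the time parameter separately."""
    first_sub_item = mapper.map(account_md5)   # S51
    second_sub_item = mapper.map(year_month)   # S52
    return first_sub_item, second_sub_item
```

Under this assumed rule, 10,000 distinct account identifiers and 120 distinct year-month values consume exactly the integers 1 to 10120, matching the social application example above.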

When the storage path of the second content is to be obtained, the content source field information of the second content is first restored from the source integer data and then combined with the fixed field of the application program and the content name field in the path item to give the complete storage path of the second content. For this purpose, in the embodiment of the application, a restoration index table for restoring the source integer data is also stored in the index database.

In a possible implementation, the index database further includes a restoration index table for the integer mapping rule, and after S52 the method according to the embodiment of the present application may further include the following steps:

S61: acquiring an acquisition request for the storage path of the second content, the acquisition request including the second index identifier of the second content;

S62: looking up the index value in the second content index from the index database according to the second index identifier.

The second index identifier carried in the acquisition request for the storage path of the second content is used to query the index database of the application program, quickly locating the second content index that contains the second index identifier.

S63: restoring the first integer data sub-item and the second integer data sub-item in the path item of the index value through the restoration index table to obtain the account identifier and the time parameter.

The restoration index table is generated according to the integer mapping rule. After mapping according to the integer mapping rule, it stores the mapping relationship between the field information identifying the account identifier in the content source field and the first integer data sub-item, and the mapping relationship between the field information identifying the time parameter and the second integer data sub-item.

The index value contains the path item that identifies the storage path of the second content. In S51 and S52, the account identifier and the time parameter in the content source field are mapped through the integer mapping rule to give the first and second integer data sub-items, and in the second content index of the index database the content source field of the storage path is stored as these two sub-items. When the storage path of the second content needs to be acquired, the restoration index table is used to look up the two mapping relationships, restoring the first integer data sub-item to the corresponding account identifier and the second integer data sub-item to the corresponding time parameter.

S64: generating the storage path of the second content from the account identifier, the time parameter, the fixed field, and the content name field in the path item.

The field information identifying the account identifier, restored from the first integer data sub-item, and the field information identifying the time parameter, restored from the second integer data sub-item, give the content source field of the storage path of the second content. Splicing this with the fixed field of the application program and the content name field in the path item generates the complete storage path of the second content.

With the restoration index table for the integer mapping rule, the content source field in the storage path of the second content can be determined quickly and accurately from the first and second integer data sub-items, and the storage path is then determined by combining it with the fixed field and the content name field, which improves both the accuracy and the efficiency of obtaining the storage path of the second content.
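Steps S63 and S64 can be sketched as follows, assuming the restoration index table is available as a dictionary from integer to field text and that the content type is known from where the index is stored (indexes of different content types being stored separately). The function name restore_save_path and the sample values in the usage note are hypothetical.

```python
def restore_save_path(restore_table, fixed_field, content_type, path_item):
    """Rebuild a full storage path from a path item (DirID1, DirID2, FileName).

    restore_table: dict mapping each integer sub-item to its original field text.
    fixed_field:   the common prefix that was discarded when the index was written.
    content_type:  Image, File or Video, assumed known from the index location.
    """
    dir_id1, dir_id2, file_name = path_item
    account_md5 = restore_table[dir_id1]  # S63: restore the account identifier
    year_month = restore_table[dir_id2]   # S63: restore the time parameter
    # S64: splice the fields in the order of the first path layout shown earlier.
    return "/".join([fixed_field, account_md5, content_type, year_month, file_name])
```

With restore_table = {1: "0cc175b9c0f1b6a8", 2: "2023-05"}, the path item (1, 2, "photo.jpg"), the fixed field alice/FileStorage/MsgAttach and the content type Image, the function returns alice/FileStorage/MsgAttach/0cc175b9c0f1b6a8/Image/2023-05/photo.jpg.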

After the second content has been downloaded through the application program and stored on the terminal device, the user may modify it on the terminal device, so that the modified content is no longer identical to the second content originally downloaded. If the application program then receives content identical to the original second content, that content would be hard-linked to the modified content on the basis of the second content index stored in the index database, even though the two are no longer the same.

To this end, in one possible implementation, the index value further comprises a content digest identification of the second content and a latest modification time of the second content by the application.

A content digest identifier of the second content is introduced into the index value. It is used to verify whether the content found in the index database under the same index identifier is in fact the same as the second content, so that no hard link is established between different contents that happen to share an index identifier.

When the second content is modified after the application program has stored it on the terminal device, the latest modification time of the second content is determined from the time at which the modification completed and is recorded in the index value.

Specifically, when the content type of the second content is a file, the second content index includes four parameters of the file: content digest identifier (MD5), file size (FileSize), modification time (ModifyTime), and storage path (FilePath). The second index identifier in the second content index is determined by the MD5 and file size of the second content; the index value in the second content index is determined by the modification time and the storage path. The data types of the four parameters are shown in the following table:

TABLE 1

Parameter name	MD5	FileSize	ModifyTime	FilePath
Data type	TEXT	INT	INT	TEXT

MD5 is the content digest identifier of the second content; its data type is character string data (TEXT), which is non-integer data. The file size is integer data (INT). To determine the second index identifier of the second content from the MD5 and the file size, the content digest identifier is integer-mapped to give the corresponding abstract integer data, and the second index identifier (MD5hash) is determined from the abstract integer data and the file size. Because both are integer data, the second index identifier is also integer data.

Similarly, the data type of the storage path is also TEXT, and TEXT data occupies more storage space when stored. The content source field in the storage path is therefore mapped through the integer mapping rule to give a first integer data sub-item (DirID1) for the field information identifying the account identifier and a second integer data sub-item (DirID2) for the field information identifying the time parameter. The source integer data of the second content is determined from these two sub-items, which optimizes the data structure of the content source field in the index value and reduces the storage space occupied by storing the content source field.

The index value of the second content index also includes the content digest identifier (MD5) used for verification, the modification time, and the file name (FileName) of the second content. The data type of MD5 is binary large object (Binary Large Object, BLOB), the data type of the modification time is INT, and the data type of the file name is TEXT.
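The derivation of an integer index identifier from the MD5 and the file size can be sketched as follows. This passage does not restate the integer mapping rule or how the abstract integer data is combined with the file size, so both choices below are purely illustrative assumptions: the first 8 bytes of the digest are read as an integer, and the result is XOR-ed with the file size and truncated to 63 bits so that it fits a signed 64-bit integer column. The function names are hypothetical.

```python
import hashlib

MASK_63 = (1 << 63) - 1  # keep the result within a signed 64-bit integer


def digest_to_int(md5_digest: bytes) -> int:
    """Assumed integer mapping: the first 8 bytes of the digest as an unsigned integer."""
    return int.from_bytes(md5_digest[:8], "big")


def index_identifier(md5_digest: bytes, file_size: int) -> int:
    """Assumed combination: XOR the abstract integer data with the file size."""
    return (digest_to_int(md5_digest) ^ file_size) & MASK_63


def index_identifier_for(data: bytes) -> int:
    """Convenience wrapper computing the identifier for in-memory content."""
    return index_identifier(hashlib.md5(data).digest(), len(data))
```

Because only part of the digest survives the mapping, two different contents can share an identifier under this sketch, which is why the full MD5 is kept in the index value for verification.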

The optimized parameters and data types thereof are shown in the following table:

TABLE 2

Parameter name	MD5hash	DirID1	DirID2	MD5	ModifyTime	FileName
Data type	INT	INT	INT	BLOB	INT	TEXT

The second index identifier (MD5hash) is integer data determined from the abstract integer data of the second content and the content capacity of the second content. It is used to find, among the index identifiers in the index database, an index identifier identical to the second index identifier, which keeps index queries efficient.

After optimization the data type of MD5 is binary large object (BLOB). MD5 is used to verify index query results one by one: because MD5 is a unique identifier determined from the second content, it can be used to determine accurately whether the stored content found under the same index identifier really is the second content, which keeps index queries accurate.

In the index database, the restoration index table for the integer mapping rule is shown in the following table:

TABLE 3

Parameter name	DirIDx	Path
Data type	INT	TEXT

Here Path is the field information in the content source field of the storage path of the second content, comprising the field information identifying the account identifier and the field information identifying the time parameter, and DirIDx is the value to which the integer mapping rule maps that field information, that is, the first integer data sub-item (x = 1) or the second integer data sub-item (x = 2). The restoration index table allows the field information corresponding to the first and second integer data sub-items to be restored, which saves the storage space that storing the complete content source field information in every index would occupy.

Querying the index database by MD5hash quickly locates the second content index with the same index identifier, which saves query time and keeps index queries efficient. MD5 and ModifyTime allow the second content corresponding to the second content index to be verified, determining whether the content found under the current MD5hash is identical to the second content, which keeps index queries accurate. Splicing the fixed field of the application program with DirID1, DirID2 and FileName from the second content index yields the complete storage path of the second content, which saves the storage resources needed to store the storage path in the database.
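The two tables and the query-then-verify lookup can be sketched with SQLite through the Python sqlite3 module. The text does not name a database engine; SQLite is an assumption suggested only by the TEXT, INT and BLOB data types. The table names file_index and dir_map and the function names are hypothetical, and the ModifyTime check against the file on disk is omitted for brevity.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dir_map (
    DirID INTEGER PRIMARY KEY,   -- DirIDx in Table 3
    Path  TEXT UNIQUE NOT NULL   -- original content source field text
);
CREATE TABLE file_index (
    MD5hash    INTEGER NOT NULL, -- integer index identifier
    DirID1     INTEGER NOT NULL, -- mapped account identifier
    DirID2     INTEGER NOT NULL, -- mapped time parameter
    MD5        BLOB    NOT NULL, -- full digest, kept for verification
    ModifyTime INTEGER NOT NULL,
    FileName   TEXT    NOT NULL
);
CREATE INDEX idx_file_index_md5hash ON file_index (MD5hash);
"""


def open_index(path=":memory:"):
    """Create the index database with the assumed schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def find_same_content(conn, md5hash, md5):
    """Query by the integer identifier, then verify each candidate by full MD5."""
    rows = conn.execute(
        "SELECT DirID1, DirID2, FileName, MD5 FROM file_index WHERE MD5hash = ?",
        (md5hash,),
    )
    for dir_id1, dir_id2, file_name, stored_md5 in rows:
        if bytes(stored_md5) == md5:
            return dir_id1, dir_id2, file_name
    return None
```

The integer column carries the index, so the lookup compares integers only; the BLOB comparison runs just for the few rows that share an MD5hash.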

Referring to fig. 7, with 1 million data entries, the original scheme, which stores the MD5 and the storage path as non-integer data in the index database, occupies 225 megabytes (MB) of storage space, whereas the optimized scheme, which stores the optimized index content, occupies only 120 MB, a saving of about 47%. In the optimized scheme an index query using the second index identifier, which is integer data, takes 1.2 milliseconds (ms), which is 53% less than an index query using the non-integer MD5 in the original scheme, so index query efficiency is improved.

When several hard links corresponding to the second content need to be processed in a batch, querying the index database by the index identifier of each hard link would occupy a large amount of system resources. Because a change to one of a set of hard-linked files is reflected in all the other files hard-linked with it, the files that are hard-linked together can instead be found quickly by the latest modification time in the second content index.

In one possible implementation, N hard links have been created for the second content, N > 1, the index database includes the index content corresponding to the N hard links, and the method further includes:

S71: switching a target hard link among the N hard links from a read-only state to a writable state.

An application programming interface (Application Programming Interface, API) may be used in managing the hard links corresponding to the saved content saved by the application in the terminal device. However, if the target hard link to be processed is in the read-only state, the API cannot modify the target hard link in the read-only state, so that the target hard link needs to be switched from the read-only state to the writable state.

S72: when the target hard link is deleted through the application program, querying the index database, according to the deletion time of the target hard link, for the N-1 content indexes whose latest modification time in the index value is the same as the deletion time.

When the target hard link is deleted through the application program, the latest modification time in the index content of the other hard links associated with the target hard link is updated to the deletion time of the target hard link. Using that deletion time, the content indexes with the same latest modification time can be found quickly in the index database; the hard links corresponding to these content indexes are the N-1 hard links associated with the target hard link. This saves the time that would otherwise be spent looking up each of the N hard links of the second content in the index database one by one by index identifier.

S73: switching the N-1 hard-linked files corresponding to the N-1 content indexes from the writable state to the read-only state.

After the target hard link has been deleted, the N-1 content indexes whose latest modification time equals the deletion time are found quickly in the index database, and the files of the corresponding hard links are switched back to the read-only state. The N-1 hard links associated with the target hard link are thus located quickly and their file states restored, which saves the computation that index queries for many contents would consume and improves content management efficiency.
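Steps S71 to S73 can be sketched as follows. The index is modelled here as an in-memory dictionary from path to an entry holding md5hash and modify_time, which is an assumption made for brevity; the function names are hypothetical. On POSIX systems the permission bits belong to the shared inode, so making the target writable also makes its sibling links writable until S73 restores them.

```python
import os
import stat

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def set_read_only(path, read_only):
    """Clear all write bits, or set the owner write bit, on a file."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~WRITE_BITS if read_only else mode | stat.S_IWUSR)


def delete_hard_link(index, target, deletion_time):
    """Delete one hard link and return the paths of the remaining linked files.

    index: dict mapping path -> {"md5hash": int, "modify_time": int} (assumed layout).
    """
    set_read_only(target, False)                 # S71: make the target writable
    linked_id = index.pop(target)["md5hash"]
    os.remove(target)
    # Record the deletion time as the latest modification time of the linked entries.
    for entry in index.values():
        if entry["md5hash"] == linked_id:
            entry["modify_time"] = deletion_time
    # S72: find the N-1 remaining entries by their latest modification time.
    siblings = [p for p, e in index.items() if e["modify_time"] == deletion_time]
    for path in siblings:                        # S73: back to the read-only state
        set_read_only(path, True)
    return siblings
```

The lookup in S72 touches only the modification time, so the remaining links are found in one pass instead of one index query per hard link.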

Based on the above method, when updating and configuring the query method for the application program, the saved content saved by the application program in the terminal device needs to be de-duplicated, so that the repetition of the same content in the saved content is reduced, and the storage space is released. To this end, an embodiment of the present application provides a method for constructing an index database, which specifically includes the following steps:

S81: acquiring the to-be-processed saved content saved by the application program.

Before the application program is updated, the query method of the embodiment of the present application is not configured on the terminal device, so while the user was using the application program the same content may have been saved repeatedly among the content saved by the terminal device through the application program. To reduce this duplication, the saved content is treated as to-be-processed saved content and is de-duplicated.

S82: obtaining corresponding to-be-processed abstract integer data through integer mapping according to the content digest identifier of the to-be-processed saved content;

S83: determining an index identifier of the to-be-processed saved content based on the to-be-processed abstract integer data and the content capacity of the to-be-processed saved content.

Wherein S82 to S83 may be performed with reference to the methods described in steps 202 to 203.

S84: in response to a plurality of to-be-processed saved contents having the same index identifier, taking one of them as target saved content, deleting the others, and establishing, for each of the others, a hard link based on the target saved content.

The user's folder is traversed and an index identifier is determined for each to-be-processed saved content encountered. The to-be-processed saved contents are then grouped by index identifier, so that contents with the same index identifier fall into the same group. Within each group, one to-be-processed saved content is chosen as the target saved content of the group and the others are deleted, releasing the storage space occupied by the duplicates. A hard link based on the target saved content is then established for each of the deleted contents, so that each of them remains accessible through its own hard link to the target saved content, which saves the storage space occupied by duplicate content among the saved content.

It should be noted that, when determining a target saved content from a plurality of saved contents to be processed with the same index identifier, one target saved content may be randomly selected from the plurality of saved contents to be processed, or one target saved content may be determined according to other parameters such as a save time.

S85: taking the remaining target saved contents as the saved content, and constructing the index database according to the index identifiers of the saved content.

Once the target saved content uniquely corresponding to each index identifier has been determined, these target saved contents are the content saved by the terminal device through the application program. The duplication among the to-be-processed saved contents is thereby eliminated: for each index identifier only one copy of the corresponding content is saved on the terminal device, which effectively saves the storage space occupied by saving the same content repeatedly. An index database is built from the index identifiers of the saved content, so that when the application program receives new content, the index database can be queried quickly by the index identifier of the new content to determine whether identical content is already saved, which saves query time and improves index query efficiency.
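Steps S81 to S85 can be sketched as follows. As a stand-in for the integer index identifier, this sketch groups files by the pair (full MD5 digest, file size), which is an assumption made so that identical keys imply identical content for illustration purposes; the first file encountered in each group is taken as the target saved content. The function names are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def content_key(path):
    """Stand-in index identifier: (MD5 digest, file size). Assumed, not from the text."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.digest(), os.path.getsize(path)


def deduplicate(root):
    """Replace duplicate files under root with hard links; return key -> target path."""
    groups = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):            # S81: traverse the folder
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                groups[content_key(path)].append(path)     # S82 and S83
    index = {}
    for key, paths in groups.items():                      # S84
        target, *duplicates = paths
        for dup in duplicates:
            if os.path.samefile(target, dup):
                continue                                   # already hard-linked
            os.remove(dup)
            os.link(target, dup)
        index[key] = target                                # S85
    return index
```

After deduplicate runs, every duplicate path still exists and opens normally, but all paths in a group share one copy of the data on disk.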

Further, when the application program on the terminal device is updated and configured, the amount of to-be-processed saved content may be large, and processing it all at once would occupy a large share of central processing unit (Central Processing Unit, CPU) and disk resources, so resource usage per unit time needs to be controlled. When the application program is updated with the query method, a duty cycle of 5 ms of processing followed by 995 ms of rest can be used on a mechanical hard disk, and 10 ms of processing followed by 990 ms of rest on a solid state disk, which reduces resource usage per unit time and avoids affecting the user's use of the terminal device.
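The process-then-rest duty cycle can be sketched as follows. The function name process_throttled and its parameters are hypothetical; the sleep and clock arguments exist only so that the timing can be replaced in tests. A single item that takes longer than the busy budget is still completed before the rest begins, so the budget is a lower bound on each work slice rather than a hard limit.

```python
import time


def process_throttled(items, handle, busy_s, rest_s,
                      sleep=time.sleep, clock=time.monotonic):
    """Handle items in short work slices separated by rests.

    busy_s: work budget per slice in seconds (for example 0.005 on a mechanical disk).
    rest_s: rest after each slice in seconds (for example 0.995).
    """
    slice_start = clock()
    for item in items:
        handle(item)
        if clock() - slice_start >= busy_s:
            sleep(rest_s)          # yield CPU and disk to the user's own work
            slice_start = clock()  # start a new work slice
```

For the figures given above, a mechanical hard disk would use process_throttled(files, handle, 0.005, 0.995) and a solid state disk process_throttled(files, handle, 0.010, 0.990), where files and handle are supplied by the caller.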

Referring to fig. 8, an embodiment of the present application provides a query device, including: an acquisition unit 801, an integer mapping unit 802, a determination unit 803, a query unit 804, and a hard link unit 805;

the acquiring unit 801 is configured to: acquiring content to be processed through an application program;

the integer mapping unit 802 is configured to obtain corresponding abstract integer data through integer mapping according to the content abstract identifier of the content to be processed;

the determining unit 803 is configured to: determining a to-be-identified index identifier of the to-be-processed content based on the abstract integer data and the content capacity of the to-be-processed content;

the query unit 804 is configured to: querying an index database of the application program through the index identifier to be identified, wherein the index database is used for recording the index identifier of the stored content stored by the application program;

the hard link unit 805 is configured to: in response to the index database containing a target index identifier identical to the to-be-identified index identifier, determining that the content to be processed and the first content corresponding to the target index identifier are the same content, and establishing, for the content to be processed, a first hard link based on the first content.

In a possible implementation manner, when the content to be processed is content that is received by the application program and has not been downloaded, the apparatus shown in fig. 8 further includes a storage unit configured to:

In a possible implementation manner, the apparatus shown in fig. 8 further includes an updating unit, configured to:

updating the first content index into the index database.

In a possible implementation manner, when the content to be processed is content to be forwarded to other terminal devices through the application program, the querying unit is further configured to:

determining a forwarding path of the content to be processed;

In a possible implementation manner, the apparatus shown in fig. 8 further includes a copying unit, configured to:

In a possible implementation manner, after the establishing a first hard link for the content to be processed, the apparatus shown in fig. 8 further includes an updating unit configured to:

and updating the second content index into the index database.

In one possible implementation, the saved content includes a second content having a second content index in the index database, the second content index including a second index identification and an index value, the index value including a path item for identifying a saved path of the second content;

The determining unit is specifically configured to:

In a possible implementation manner, the source integer data includes a first integer data sub-item and a second integer data sub-item, and the determining unit is specifically configured to:

In a possible implementation manner, the index database further includes a restoration index table for the integer mapping rule, and the apparatus shown in fig. 8 further includes a restoration unit, configured to:

In a possible implementation, the index value further includes a content digest identification of the second content and a latest modification time of the second content by the application program.

In a possible implementation manner, the second content is created with N hard links, N >1, the index database includes index content corresponding to the N hard links, and the apparatus shown in fig. 8 further includes a deletion unit, configured to:

In a possible implementation manner, the apparatus shown in fig. 8 further includes a construction unit, configured to:

The embodiment of the application also provides a computer device, which is the computer device introduced above, and can comprise a terminal device or a server, and the query device can be configured in the computer device. The computer device is described below with reference to the accompanying drawings.

If the computer device is a terminal device, please refer to fig. 9, an embodiment of the present application provides a terminal device, taking the terminal device as a mobile phone as an example:

Fig. 9 is a block diagram showing part of the structure of a mobile phone related to the terminal device provided by an embodiment of the present application. Referring to fig. 9, the mobile phone includes: radio frequency (RF) circuit 1410, memory 1420, input unit 1430, display unit 1440, sensor 1450, audio circuit 1460, wireless fidelity (WiFi) module 1470, processor 1480, and power supply 1490. It will be appreciated by those skilled in the art that the handset structure shown in fig. 9 does not limit the handset, which may include more or fewer components than shown, combine certain components, or arrange the components differently.

The following describes the components of the mobile phone in detail with reference to fig. 9:

The RF circuit 1410 may be used for receiving and transmitting signals during messaging or a call. In particular, after receiving downlink information from a base station, it passes the downlink information to the processor 1480 for processing; in addition, it sends uplink data to the base station.

The memory 1420 may be used to store software programs and modules, and the processor 1480 performs various functional applications and data processing of the mobile phone by executing the software programs and modules stored in the memory 1420. The memory 1420 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required for at least one function (such as a sound playing function, an image playing function, etc.); the data storage area may store data created through use of the handset (such as audio data, a phonebook, etc.). In addition, the memory 1420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.

The input unit 1430 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the handset. In particular, the input unit 1430 may include a touch panel 1431 and other input devices 1432.

The display unit 1440 may be used to display information input by a user or information provided to the user and various menus of the mobile phone. The display unit 1440 may include a display panel 1441.

The handset can also include at least one sensor 1450, such as a light sensor, motion sensor, and other sensors.

Audio circuitry 1460, speaker 1461, and microphone 1462 may provide an audio interface between the user and the handset.

WiFi is a short-range wireless transmission technology. Through the WiFi module 1470, the mobile phone can help the user send and receive emails, browse webpages, access streaming media and the like, thereby providing the user with wireless broadband Internet access.

The processor 1480 is a control center of the handset, connects various parts of the entire handset using various interfaces and lines, performs various functions of the handset and processes data by running or executing software programs and/or modules stored in the memory 1420, and invoking data stored in the memory 1420.

The handset also includes a power supply 1490 (e.g., a battery) that provides power to the various components.

In this embodiment, the processor 1480 included in the terminal device also has the following functions:

acquiring content to be processed through an application program;
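The processor functions follow the query method summarized in the abstract: map the content digest to integer data, combine it with the content capacity (size) into an index identifier, look that identifier up in the index database, and create a hard link on a match. The flow can be sketched as below, as a minimal illustration under stated assumptions rather than the implementation of the application. The function names, the use of SHA-256 as the digest, the mapping that takes the first 64 bits of the digest, and the in-memory `dict` standing in for the index database are all hypothetical.

```python
import hashlib
import os
import shutil
from typing import Dict, Tuple

IndexId = Tuple[int, int]  # (digest integer data, content size in bytes)


def content_digest(path: str) -> str:
    """Content digest identification; SHA-256 is an assumption, not named by the source."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def index_identifier(digest_hex: str, size: int) -> IndexId:
    """Map the digest to an integer and pair it with the content size."""
    digest_int = int(digest_hex[:16], 16)  # assumed mapping: first 64 bits of the digest
    return (digest_int, size)


def save_with_dedup(src: str, dest: str, index_db: Dict[IndexId, str]) -> bool:
    """Save src as dest; return True if dest became a hard link to saved content."""
    key = index_identifier(content_digest(src), os.path.getsize(src))
    saved_path = index_db.get(key)
    if saved_path is not None and os.path.exists(saved_path):
        os.link(saved_path, dest)  # hard link: no second copy of the bytes is stored
        return True
    shutil.copyfile(src, dest)     # first occurrence: store the bytes and index them
    index_db[key] = dest
    return False
```

Calling `save_with_dedup` twice with two files of identical bytes stores the data once: the first call copies the file and records it in `index_db`, and the second call only adds a hard link to it. An integer key is cheaper to compare and index than a full digest string; in this sketch the shortened integer is paired with the size so that two different contents are less likely to share a key.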

If the computer device is a server, refer to fig. 10, which is a block diagram of a server 1500 according to an embodiment of the present application. The server 1500 may vary considerably with configuration or performance, and may include one or more central processing units (CPUs) 1522 (e.g., one or more processors), a memory 1532, and one or more storage media 1530 (e.g., one or more mass storage devices) storing application programs 1542 or data 1544. The memory 1532 and the storage medium 1530 may provide transitory or persistent storage. The program stored on the storage medium 1530 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Further, the central processing unit 1522 may be configured to communicate with the storage medium 1530 and to execute, on the server 1500, the series of instruction operations stored in the storage medium 1530.

The server 1500 may also include one or more power supplies 1526, one or more wired or wireless network interfaces 1550, one or more input/output interfaces 1558, and/or one or more operating systems 1541, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.

The steps performed by the server in the above embodiments may be based on the server structure shown in fig. 10.

In addition, the embodiment of the application also provides a storage medium for storing a computer program for executing the method provided by the embodiment.

The present application also provides a computer program product comprising a computer program which, when run on a computer device, causes the computer device to perform the method provided by the above embodiments.

Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be implemented by hardware instructed by a program. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium may be at least one of the following media on which a computer program can be stored: read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, and the like.

It should be noted that the embodiments in this specification are described in a progressive manner: identical or similar parts of the embodiments may be referred to across embodiments, and each embodiment focuses on its differences from the others. In particular, because the apparatus and system embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the corresponding parts of the method embodiments. The apparatus and system embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present application without creative effort.

The foregoing is merely one specific embodiment of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed herein shall fall within the protection scope of the present application. The implementations provided in the above aspects may be further combined to provide additional implementations. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A method of querying, the method comprising:

acquiring content to be processed through an application program;

2. The method of claim 1, wherein when the content to be processed is content received by the application and not yet downloaded, the method further comprises:

3. The method according to claim 2, wherein the method further comprises:

updating the first content index into the index database.

4. The method according to claim 1, wherein when the content to be processed is content to be forwarded to other terminal devices by the application program, the method further comprises:

determining a forwarding path of the content to be processed;

5. The method according to claim 4, wherein the method further comprises:

6. The method of claim 1, wherein after the establishing a first hard link for the content to be processed based on the first content, the method further comprises:

and updating the second content index into the index database.

7. The method of claim 1, wherein the saved content comprises a second content having a second content index in the index database, the second content index comprising a second index identification and an index value, the index value comprising a path item for identifying a saved path of the second content;

the path item is determined by:

8. The method of claim 7, wherein the source integer data includes a first integer data sub-item and a second integer data sub-item, wherein the obtaining the corresponding source integer data by integer mapping according to the field information of the content source field includes:

9. The method of claim 8, wherein the index database further comprises a restored index table for the integer mapping rule, the method further comprising:

10. The method of claim 7, wherein the index value further comprises a content digest identification of the second content and a last modification time of the second content by the application.

11. The method of claim 10, wherein the second content is created with N hard links, N >1, and wherein the index database includes index content corresponding to the N hard links, the method further comprising:

12. The method according to any one of claims 1-11, further comprising:

13. A query device, the device comprising an acquisition unit, an integer mapping unit, a determination unit, a query unit and a hard link unit;

14. A computer device, the computer device comprising a processor and a memory:

the processor is configured to perform the method of any of claims 1-12 according to the computer program.

15. A computer readable storage medium, characterized in that the computer readable storage medium is for storing a computer program for executing the method of any one of claims 1-12.

16. A computer program product comprising a computer program which, when run on a computer device, causes the computer device to perform the method of any of claims 1-12.