CN116796085A

CN116796085A - File processing method and device, electronic equipment and storage medium

Info

Publication number: CN116796085A
Application number: CN202310763457.7A
Authority: CN
Inventors: 郑锐
Original assignee: Jingdong Technology Information Technology Co Ltd
Current assignee: Jingdong Technology Information Technology Co Ltd
Priority date: 2023-06-26
Filing date: 2023-06-26
Publication date: 2023-09-22

Abstract

The invention discloses a file processing method, a file processing device, electronic equipment and a storage medium. The method comprises the following steps: acquiring an initial file, and determining characteristic fields included in the initial file; determining occupation information of the feature fields in the initial file, and determining key feature fields based on the occupation information of the feature fields; setting an association identifier of the key feature field, wherein the length of the association identifier is smaller than a preset length; and replacing the corresponding key feature field in the initial file based on the association identifier to obtain a target file, wherein the data volume of the target file is smaller than that of the initial file. According to the invention, the corresponding association identifier is set for the key feature field extracted from the initial file, and the key feature field is replaced by the association identifier to obtain the target file, so that the data volume of URL data in the file is reduced, and the file volume is reduced.

Description

File processing method and device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of file processing technologies, and in particular, to a file processing method, a device, an electronic device, and a storage medium.

Background

A Uniform Resource Locator (URL), also called a web address, is the address of a standard resource on the Internet. Each file on the Internet has a unique URL, which contains information indicating the location of the file and how the browser should handle it.

Currently, when a mobile application uses a large amount of URL data, there are two schemes. Scheme one: store all URLs in a file, such as a txt file, preset the file in the application, and read the file when it is needed. Scheme two: write all URLs into the program code, and dynamically write the URLs into a database when the program runs.

However, in scheme one, the data volume of the URLs is too large, which increases the overall size of the application; moreover, the data must be read into memory when it is used, and because the data volume is too large, the program is prone to crashing. In scheme two, the data volume to be processed is too large, which results in an excessively long program execution time.

Disclosure of Invention

The invention provides a file processing method, a file processing device, electronic equipment and a storage medium, so as to reduce the data volume of URL data.

According to an aspect of the present invention, there is provided a file processing method including:

acquiring an initial file, and determining characteristic fields included in the initial file;

determining occupation information of the feature fields in the initial file, and determining key feature fields based on the occupation information of the feature fields;

setting an association identifier of the key feature field, wherein the length of the association identifier is smaller than a preset length;

and replacing the corresponding key feature field in the initial file based on the association identifier to obtain a target file, wherein the data volume of the target file is smaller than that of the initial file.

Optionally, the initial file includes a plurality of address information;

the determining the feature fields included in the initial file includes: dividing each piece of address information into feature fields based on specific characters to obtain the feature fields included in each piece of address information; and performing de-duplication processing on the feature fields.

Optionally, the occupation information of the feature field in the initial file includes the occurrence frequency of the feature field in the initial file;

the determining the key feature field based on the occupation information of each feature field includes:

and determining the feature field of which the occurrence frequency meets a frequency threshold as a key feature field.

Optionally, the occupation information of the feature field in the initial file includes a field length of the feature field and a frequency of occurrence of the feature field in the initial file;

for any one of the feature fields, determining the data amount of the feature field based on the field length and the occurrence frequency of the feature field; and determining the characteristic field of which the data quantity meets the data quantity threshold value as a key characteristic field.

Optionally, the length of the association identifier is smaller than the length of the corresponding key feature field.

Optionally, the method further comprises: and creating and storing a corresponding relation file based on the corresponding relation between the key characteristic field and the association identifier, wherein the corresponding relation file is used for restoring the target file.

Optionally, the method further comprises: before the target file is used, identifying the association identifier in the target file, determining the key feature field corresponding to the association identifier based on the corresponding relation file, and replacing the corresponding association identifier with the matched key feature field to obtain the initial file.

According to another aspect of the present invention, there is provided a file processing apparatus, comprising:

a feature field determining module, configured to acquire an initial file and determine the feature fields included in the initial file;

the key feature field determining module is used for determining occupation information of the feature fields in the initial file and determining key feature fields based on the occupation information of the feature fields;

the association identifier setting module is used for setting association identifiers of the key feature fields, and the length of the association identifiers is smaller than a preset length;

and the target file determining module is used for replacing the corresponding key characteristic fields in the initial file based on the association identifier to obtain a target file, wherein the data volume of the target file is smaller than that of the initial file.

According to another aspect of the present invention, there is provided an electronic apparatus including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the file processing method according to any one of the embodiments of the present invention.

According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to execute a file processing method according to any embodiment of the present invention.

According to the technical scheme, the target file is obtained by setting the corresponding association identifier for the key characteristic field extracted from the initial file and replacing the key characteristic field with the association identifier, and the data volume of URL data in the file is reduced, so that the file volume is reduced.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart of a method for processing a file according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a file processing apparatus according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Fig. 1 is a flowchart of a file processing method provided in an embodiment of the present invention, where the present embodiment is applicable to a case where a mobile terminal application uses a large amount of URL data stored in a file, the method may be performed by a file processing apparatus, and the file processing apparatus may be implemented in the form of hardware and/or software, and the file processing apparatus may be configured in a mobile terminal. As shown in fig. 1, the method includes:

S110, acquiring an initial file, and determining the feature fields included in the initial file.

The initial file refers to a file storing address information. Specifically, the initial file may be a file storing a large amount of address information, for example, a TXT file storing 100,000 original URLs, where an original URL may be: http://7qb.jdfmgt.com/badCode/7qb. A feature field refers to a field in the address information of the initial file. Specifically, feature fields include, but are not limited to, the protocol, the domain name, and the directory at each level. Taking the above original URL as an example, the feature fields are "http:", "7qb.jdfmgt.com", "badCode" and "7qb".

On the basis of the above embodiment, optionally, the initial file includes a plurality of pieces of address information; the determining the feature fields included in the initial file includes: dividing each piece of address information into feature fields based on specific characters to obtain the feature fields included in each piece of address information; and performing de-duplication processing on the feature fields.

The specific characters are the characters used to divide the feature fields, for example, the character "/", the character "." and the like, which are not limited herein. In this embodiment, the address information in the initial file is traversed, and each piece of address information is divided into feature fields according to the specific characters to obtain the feature fields included in each piece of address information; after the division is completed, de-duplication processing is performed on the obtained feature fields. It should be noted that there is a priority among the specific characters. Taking the character "/" and the character "." as examples, the priority of the character "/" is higher than that of the character ".". That is, when the feature fields are divided, the character "/" is used first, and if a feature field obtained by dividing on "/" needs to be divided further, that feature field is then divided on the character ".".

Illustratively, assume that each address information in the initial file is as follows:

http://7qb.jdfmgt.com;

http://7qb.jdfmgt.com/badCode/7qb;

http://7qb.jdfmgt.com/badCode/7qb/v1.0.2;

http://7qb.jdfmgt.com/badCode/index;

http://7qb.jdfmgt.com/badCode/index/edit/v1.0.2.

Dividing the above address information into feature fields based on the character "/" and de-duplicating the result yields the feature fields "http:", "7qb.jdfmgt.com", "badCode", "7qb", "v1.0.2", "index" and "edit". If these feature fields need to be divided further, the feature fields "7qb.jdfmgt.com" and "v1.0.2" may be divided based on the character ".". It will be appreciated that the feature field "7qb.jdfmgt.com" is a domain name and the feature field "v1.0.2" is a directory; each is typically treated as a complete feature field and is not divided further.
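By way of non-limiting illustration, the division and de-duplication described above might be sketched in Python as follows. The function names `split_feature_fields` and `unique_feature_fields` and the `separators` parameter are hypothetical and do not appear in the embodiment; the sketch assumes that empty fragments (such as the one between the two slashes after the protocol) are discarded.

```python
def split_feature_fields(url: str, separators: str = "/") -> list[str]:
    """Split one address into its non-empty feature fields.

    `separators` lists the dividing characters in priority order,
    e.g. "/" alone, or "/." to divide further on ".".
    """
    fields = [url]
    for sep in separators:
        fields = [part for field in fields for part in field.split(sep) if part]
    return fields


def unique_feature_fields(urls: list[str], separators: str = "/") -> list[str]:
    """Collect the feature fields of all addresses, de-duplicated in first-seen order."""
    seen: dict[str, None] = {}
    for url in urls:
        for field in split_feature_fields(url, separators):
            seen.setdefault(field, None)
    return list(seen)


urls = [
    "http://7qb.jdfmgt.com",
    "http://7qb.jdfmgt.com/badCode/7qb",
    "http://7qb.jdfmgt.com/badCode/7qb/v1.0.2",
    "http://7qb.jdfmgt.com/badCode/index",
    "http://7qb.jdfmgt.com/badCode/index/edit/v1.0.2",
]
fields = unique_feature_fields(urls)
print(fields)
# ['http:', '7qb.jdfmgt.com', 'badCode', '7qb', 'v1.0.2', 'index', 'edit']
```

Passing `separators="/."` would additionally divide the domain name and version fields on ".", which the embodiment notes is usually not done.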

S120, determining occupation information of the feature fields in the initial file, and determining key feature fields based on the occupation information of the feature fields.

The occupation information refers to information about how the feature field occupies the initial file. Specifically, the occupation information includes, but is not limited to, the occurrence frequency of the feature field in the initial file, the field length of the feature field, and the like, which are not limited herein. In this embodiment, the occupation information of each feature field in the initial file may be determined first, and whether the feature field is a key feature field is then determined according to the occupation information, so that the key feature fields are screened out.

On the basis of the above embodiment, optionally, the information of occupation of the feature field in the initial file includes frequency of occurrence of the feature field in the initial file; the determining the key feature field based on the occupation information of each feature field includes: and determining the feature field of which the occurrence frequency meets a frequency threshold as a key feature field.

Specifically, frequency statistics may be performed on each feature field in the initial file to obtain the occurrence frequency of each feature field in the initial file. The occurrence frequency is compared with a frequency threshold, and if the occurrence frequency exceeds the frequency threshold, the corresponding feature field is determined to be a key feature field. Taking an initial file storing 100,000 pieces of address information as an example, if a certain feature field occurs more than 10 times in the 100,000 pieces of address information, the feature field is determined to be a key feature field. The frequency threshold is set by those skilled in the art according to experience and requirements, and is not limited herein.
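A minimal sketch of the frequency-based screening described above is given below, under the assumption that fields are divided on "/" only and that a field "meets" the threshold when its count strictly exceeds it. The function names and the threshold value of 3 are hypothetical and chosen only so that the small sample is meaningful.

```python
from collections import Counter


def count_field_frequencies(urls: list[str]) -> Counter:
    """Count how often each non-empty "/"-separated field occurs across all addresses."""
    counts: Counter = Counter()
    for url in urls:
        counts.update(part for part in url.split("/") if part)
    return counts


def key_fields_by_frequency(urls: list[str], frequency_threshold: int) -> set[str]:
    """Return the fields whose occurrence frequency exceeds the threshold."""
    counts = count_field_frequencies(urls)
    return {field for field, n in counts.items() if n > frequency_threshold}


urls = [
    "http://7qb.jdfmgt.com",
    "http://7qb.jdfmgt.com/badCode/7qb",
    "http://7qb.jdfmgt.com/badCode/7qb/v1.0.2",
    "http://7qb.jdfmgt.com/badCode/index",
    "http://7qb.jdfmgt.com/badCode/index/edit/v1.0.2",
]
key_fields = key_fields_by_frequency(urls, frequency_threshold=3)
print(sorted(key_fields))
# ['7qb.jdfmgt.com', 'badCode', 'http:']
```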

On the basis of the above embodiment, optionally, the occupation information of the feature field in the initial file includes a field length of the feature field and an occurrence frequency of the feature field in the initial file; the determining the key feature field based on the occupation information of each feature field includes: for any one of the feature fields, determining the data amount of the feature field based on the field length and the occurrence frequency of the feature field; and determining the feature field whose data amount meets a data amount threshold as a key feature field.

Specifically, for any feature field, the field length of the feature field may be multiplied by its occurrence frequency to obtain the data amount occupied by the feature field in the initial file; the data amount is then compared with the data amount threshold, and if it exceeds the threshold, the feature field is determined to be a key feature field. The data amount threshold is set by those skilled in the art according to experience and requirements, and is not limited herein. Taking the feature field "badCode" as an example, its field length is 7 characters; if it occurs 10 times in the initial file, its data amount is 7 × 10 = 70, which exceeds a data amount threshold of 50, so the feature field is a key feature field.
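The data amount criterion may be illustrated with the following sketch, which assumes the occurrence frequencies have already been counted. The function name is hypothetical, and the frequency given for "edit" is invented for contrast; only the "badCode" figures come from the example above.

```python
def key_fields_by_data_amount(
    frequencies: dict[str, int], data_amount_threshold: int
) -> dict[str, int]:
    """Return {field: data amount} for fields whose length * frequency exceeds the threshold."""
    amounts = {field: len(field) * n for field, n in frequencies.items()}
    return {field: amount for field, amount in amounts.items() if amount > data_amount_threshold}


# "badCode" occurring 10 times is the example from the text; "edit" is a hypothetical contrast.
frequencies = {"badCode": 10, "edit": 10}
selected = key_fields_by_data_amount(frequencies, data_amount_threshold=50)
print(selected)
# {'badCode': 70}   (7 * 10 = 70 > 50, whereas "edit" gives 4 * 10 = 40)
```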

In some embodiments, optionally, for any feature field, the field length of the feature field may be compared with a field length threshold, and if the field length exceeds the field length threshold, the feature field is determined to be a candidate key feature field; the occurrence frequency of the candidate key feature field in the initial file is then compared with a frequency threshold, and if the occurrence frequency exceeds the frequency threshold, the candidate key feature field is determined to be a key feature field. The field length threshold is set by those skilled in the art according to experience and requirements, and is not limited herein. For example, if the field length threshold is 5 characters and the frequency threshold is 10, the field length of the feature field "badCode" (7 characters) is greater than the field length threshold, so "badCode" may be taken as a candidate key feature field; if "badCode" occurs more than 10 times in the initial file, it is determined to be a key feature field.
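The two-stage screening (length first, then frequency) might be sketched as follows. The function name is hypothetical and the occurrence counts in the sample are invented; only the thresholds of 5 and 10 and the field "badCode" are taken from the example above.

```python
def key_fields_two_stage(
    frequencies: dict[str, int], length_threshold: int, frequency_threshold: int
) -> list[str]:
    """Keep fields longer than the length threshold, then keep those frequent enough."""
    # Stage 1: candidates are fields whose length exceeds the field length threshold.
    candidates = {f: n for f, n in frequencies.items() if len(f) > length_threshold}
    # Stage 2: a candidate becomes a key feature field if its frequency exceeds the threshold.
    return [f for f, n in candidates.items() if n > frequency_threshold]


# Hypothetical counts: "7qb" is frequent but too short, "mpManager" is long but too rare.
frequencies = {"badCode": 11, "7qb": 40, "mpManager": 3}
result = key_fields_two_stage(frequencies, length_threshold=5, frequency_threshold=10)
print(result)
# ['badCode']
```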

In some embodiments, the frequency threshold of the different feature fields may be dynamically varied, e.g., the frequency threshold is inversely related to the length of the feature field, and accordingly, the frequency threshold of the feature field may be determined based on the length of the feature field.
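The embodiment does not specify how the length-dependent threshold is computed. Purely as an assumed illustration, one inverse relationship is to divide a fixed constant by the field length; the function name, the constant `scale` and the lower bound `minimum` below are all hypothetical.

```python
def dynamic_frequency_threshold(field: str, scale: int = 50, minimum: int = 1) -> int:
    """Return a frequency threshold that shrinks as the field gets longer.

    Assumed formula (not from the source): scale // len(field), floored at `minimum`.
    """
    if not field:
        raise ValueError("feature field must be non-empty")
    return max(minimum, scale // len(field))


for field in ("7qb", "badCode", "7qb.jdfmgt.com"):
    print(field, dynamic_frequency_threshold(field))
# 7qb 16
# badCode 7
# 7qb.jdfmgt.com 3
```

With this choice, a long field needs only a few occurrences to qualify, while a short field must occur many times.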

S130, setting an association identifier of the key feature field, wherein the length of the association identifier is smaller than a preset length.

The association identifier refers to a character identifier corresponding to the key feature field. Specifically, the association identifier includes, but is not limited to, a serial number, a character, a combination of a serial number and a character, and the like, which are not limited herein. In this embodiment, for each determined key feature field, an association identifier corresponding to the key feature field is set. The preset length is set by those skilled in the art according to experience and requirements, and is not limited herein.

It should be noted that the length of the set association identifier needs to be smaller than the length of the corresponding key feature field; if this condition is not satisfied, the association identifier is updated or is not set. For example, if the feature field "7qb" is a key feature field and its association identifier is set to "$15", the identifier is as long as the field itself (three characters), so setting it is meaningless for the key feature field "7qb" and cannot reduce the data amount of "7qb" in the initial file.

Illustratively, the manner in which the association identifier is set is divided into two types:

(1) The sequence number can be directly used to set the association identifier of the key feature field, such as:

0-http://7qb.jd.com;

1-http://7qb.jdfmgt.com;

2-http://7qb.jdfmgt.test;

3-http://7qb-erp.jd.com.

It will be appreciated that, since the protocol is generally used together with the domain name as a whole, the feature field representing the protocol and the feature field representing the domain name may be treated as a single key feature field when setting the association identifier, so as to improve the efficiency of file processing.

(2) The association identifier of the key feature field may also be set by using a special symbol followed by a sequence number, for example:

$0-badCode;

$1-v1.0.2;

$2-v1.0.6;

$3-mpManager.
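The "special symbol + sequence number" scheme, together with the length conditions noted earlier, might be sketched as follows. The function name, the `start` parameter and the default `preset_length` of 8 are hypothetical; the embodiment leaves the preset length to the implementer, and here an identifier that would not be shorter than its field is simply not set.

```python
def assign_identifiers(
    key_fields: list[str], prefix: str = "$", start: int = 0, preset_length: int = 8
) -> dict[str, str]:
    """Map each key feature field to prefix + sequence number when that saves space."""
    mapping: dict[str, str] = {}
    number = start
    for field in key_fields:
        identifier = f"{prefix}{number}"
        # The identifier must be shorter than both the preset length and the field itself.
        if len(identifier) >= preset_length or len(identifier) >= len(field):
            continue  # no saving possible, so no identifier is set for this field
        mapping[field] = identifier
        number += 1
    return mapping


mapping = assign_identifiers(["badCode", "v1.0.2", "v1.0.6", "mpManager"])
print(mapping)
# {'badCode': '$0', 'v1.0.2': '$1', 'v1.0.6': '$2', 'mpManager': '$3'}

# "$15" is as long as "7qb", so no identifier is set (the case discussed in the text).
print(assign_identifiers(["7qb"], start=15))
# {}
```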

S140, replacing the corresponding key feature fields in the initial file based on the association identifiers to obtain a target file, wherein the data volume of the target file is smaller than that of the initial file.

Specifically, each piece of address information in the initial file is traversed, the key feature fields in each piece of address information are identified, and the key feature fields in the initial file are replaced with their corresponding association identifiers to obtain the target file. It will be appreciated that, after the key feature fields are replaced with the association identifiers, the data volume of the target file is smaller than that of the initial file. This embodiment reduces the data volume of the initial file by replacing the key feature fields in the initial file with the association identifiers, thereby reducing the file size.

Taking the address information "http://7qb.jdfmgt.com/badCode/7qb" as an example, assuming that the two associations "1-http://7qb.jdfmgt.com" and "$0-badCode" have been set, the address information after replacing the key feature fields becomes: 1/$0/7qb.
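The replacement of this example might be sketched as below. The split of the mapping into `prefix_ids` (protocol plus domain name treated as one key feature field) and `segment_ids` (individual path fields) is an assumption made for this sketch, as are all function and parameter names.

```python
def compress_url(url: str, prefix_ids: dict[str, str], segment_ids: dict[str, str]) -> str:
    """Replace key feature fields in one address with their association identifiers."""
    head, rest = "", url
    for prefix, identifier in prefix_ids.items():
        # Match the protocol + domain only at a field boundary.
        if url == prefix or url.startswith(prefix + "/"):
            head, rest = identifier, url[len(prefix):]
            break
    # Replace each remaining "/"-separated field that has an identifier.
    segments = [segment_ids.get(segment, segment) for segment in rest.split("/")]
    return head + "/".join(segments)


prefix_ids = {"http://7qb.jdfmgt.com": "1"}
segment_ids = {"badCode": "$0"}
original = "http://7qb.jdfmgt.com/badCode/7qb"
compressed = compress_url(original, prefix_ids, segment_ids)
print(compressed)                       # 1/$0/7qb
print(len(original), len(compressed))   # 33 8
```

The sketch does not guard against an address that already contains a literal field equal to an identifier (for example a directory named "1"); a practical implementation would need to choose identifiers that cannot collide with real fields.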

On the basis of the above embodiment, optionally, the method further includes: and creating and storing a corresponding relation file based on the corresponding relation between the key characteristic field and the association identifier, wherein the corresponding relation file is used for restoring the target file.

In this embodiment, a correspondence file is created. Specifically, each key feature field and its association identifier may be stored in association in a text file to obtain the correspondence file; for example: keyword. The target file is restored based on the correspondence file to obtain the initial file. It will be understood that the restoration of the target file is the reverse of the process of replacing the key feature fields in the initial file to obtain the target file.
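One possible text layout for the correspondence file, assumed here only for illustration, is one "identifier-field" pair per line, mirroring the notation used in the examples of this description. The file name `correspondence.txt` and the function names are hypothetical, and the sketch assumes that identifiers never contain the character "-".

```python
import tempfile
from pathlib import Path


def save_correspondence(mapping: dict[str, str], path: Path) -> None:
    """Write one 'identifier-field' line per association."""
    lines = [f"{identifier}-{field}" for identifier, field in mapping.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def load_correspondence(path: Path) -> dict[str, str]:
    """Read the associations back, splitting each line at its first '-' only."""
    mapping: dict[str, str] = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        if not line:
            continue
        identifier, _, field = line.partition("-")
        mapping[identifier] = field
    return mapping


mapping = {"1": "http://7qb.jdfmgt.com", "$0": "badCode", "3": "http://7qb-erp.jd.com"}
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "correspondence.txt"
    save_correspondence(mapping, path)
    loaded = load_correspondence(path)
print(loaded == mapping)  # True
```

Splitting at the first "-" only keeps fields such as "http://7qb-erp.jd.com", which themselves contain a hyphen, intact.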

On the basis of the above embodiment, optionally, the method further includes: before the target file is used, identifying the association identifier in the target file, determining the key feature field corresponding to the association identifier based on the corresponding relation file, and replacing the corresponding association identifier with the matched key feature field to obtain the initial file.

In this embodiment, the target file needs to be restored to the initial file before it is used. Specifically, the association identifiers in the target file may be identified; for each association identifier, the corresponding key feature field is looked up in the correspondence file, and the association identifier is replaced with the matched key feature field, so as to obtain the initial file.

For example, assume that a piece of address information after replacement in the target file is "1/$0/7qb", and that the correspondence file records the correspondences "1-http://7qb.jdfmgt.com" and "$0-badCode". Restoring the target file then yields the original address information: "http://7qb.jdfmgt.com/badCode/7qb".
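The restoration in this example might be sketched as follows, assuming identifiers are recognised by exact match of a whole "/"-separated segment. The function name is hypothetical.

```python
def restore_url(compressed: str, correspondence: dict[str, str]) -> str:
    """Replace every segment that is an association identifier with its key feature field."""
    segments = compressed.split("/")
    return "/".join(correspondence.get(segment, segment) for segment in segments)


correspondence = {"1": "http://7qb.jdfmgt.com", "$0": "badCode"}
restored = restore_url("1/$0/7qb", correspondence)
print(restored)
# http://7qb.jdfmgt.com/badCode/7qb
```

As with replacement, this relies on no original field being identical to an identifier; otherwise that field would be expanded by mistake.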

According to the technical scheme of this embodiment, a corresponding association identifier is set for each key feature field extracted from the initial file, and the key feature fields are replaced with the association identifiers to obtain the target file, so that the data volume of the URL data in the file is reduced and the file size is reduced.

Fig. 2 is a schematic structural diagram of a file processing device according to an embodiment of the present invention. As shown in fig. 2, the apparatus includes:

a feature field determining module 210, configured to obtain an initial file, and determine feature fields included in the initial file;

a key feature field determining module 220, configured to determine occupancy information of the feature fields in the initial file, and determine a key feature field based on the occupancy information of each feature field;

an association identifier setting module 230, configured to set an association identifier of the key feature field, where a length of the association identifier is smaller than a preset length;

and the target file determining module 240 is configured to replace a corresponding key feature field in the initial file based on the association identifier to obtain a target file, where the data size of the target file is smaller than the data size of the initial file.

According to the technical scheme, the target file is obtained by setting the corresponding association identifier for the key feature field extracted from the initial file and replacing the key feature field with the association identifier, and the data volume of URL data in the file is reduced, so that the file volume is reduced.

On the basis of the above embodiment, optionally, the initial file includes a plurality of address information; the feature field determining module 210 is configured to divide feature fields of each address information based on specific characters, so as to obtain feature fields included in each address information; and carrying out de-duplication processing on the characteristic field.

On the basis of the above embodiment, optionally, the information of occupation of the feature field in the initial file includes frequency of occurrence of the feature field in the initial file; the key feature field determining module 220 is configured to determine, as a key feature field, a feature field in which the occurrence frequency satisfies a frequency threshold.

On the basis of the above embodiment, optionally, the occupation information of the feature field in the initial file includes a field length of the feature field and a frequency of occurrence of the feature field in the initial file; the key feature field determining module 220 is further configured to determine, for any one of the feature fields, an amount of data of the feature field based on a field length and a frequency of occurrence of the feature field; and determining the characteristic field of which the data quantity meets the data quantity threshold value as a key characteristic field.

On the basis of the above embodiment, optionally, the length of the association identifier is smaller than the length of the corresponding key feature field.

On the basis of the above embodiment, optionally, the device further includes a correspondence file creating module, configured to create a correspondence file based on the correspondence between the key feature field and the association identifier, and store the correspondence file, where the correspondence file is used to restore the target file.

On the basis of the above embodiment, optionally, the apparatus further includes a target file restoring module, configured to identify an association identifier in the target file before the target file is used, determine a key feature field corresponding to the association identifier based on the corresponding relationship file, and replace the corresponding association identifier with the matched key feature field, so as to obtain the initial file.

The file processing device provided by the embodiment of the invention can execute the file processing method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.

Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic device 10 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.

As shown in fig. 3, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.

Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.

The processor 11 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the respective methods and processes described above, such as the file processing method.

In some embodiments, the file processing method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the file processing method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the file processing method in any other suitable way (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.

The computer program used to implement the file processing method of the present invention can be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

The embodiment of the invention also provides a computer readable storage medium, wherein the computer readable storage medium stores computer instructions for causing a processor to execute a file processing method, the method comprising:

acquiring an initial file, and determining feature fields included in the initial file; determining occupation information of the feature fields in the initial file, and determining key feature fields based on the occupation information of the feature fields; setting an association identifier for each key feature field, wherein the length of the association identifier is smaller than a preset length; and replacing the corresponding key feature field in the initial file with the association identifier to obtain a target file, wherein the data volume of the target file is smaller than that of the initial file.
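As an illustration only, the four steps above can be sketched in Python. The sketch assumes that the initial file is a list of URL strings, that feature fields are the segments between "/" characters, and that the occupation information is the occurrence count. The names `compress_lines`, `min_count`, and `preset_len`, and the "~" identifier prefix, are hypothetical and do not come from the invention.

```python
from collections import Counter


def compress_lines(lines, min_count=2, preset_len=5, sep="/"):
    """Return (target_lines, mapping) for a list of URL strings.

    Assumption: no original field starts with "~", which marks identifiers.
    """
    # Steps 1 and 2: split into feature fields and count occurrences.
    counts = Counter(f for line in lines for f in line.split(sep) if f)
    # Key feature fields: frequent, and at least preset_len characters long,
    # so that a shorter identifier always saves space.
    keys = [f for f, c in counts.items() if c >= min_count and len(f) >= preset_len]
    # Step 3: short identifiers, each shorter than preset_len.
    mapping = {f: f"~{i:x}" for i, f in enumerate(keys)}
    if any(len(ident) >= preset_len for ident in mapping.values()):
        raise ValueError("too many key fields for the preset identifier length")
    # Step 4: replace each key feature field with its identifier.
    target = [sep.join(mapping.get(f, f) for f in line.split(sep)) for line in lines]
    return target, mapping


initial = [
    "https://example.com/products/list",
    "https://example.com/products/detail",
]
target, mapping = compress_lines(initial)
```

In this sketch the returned `mapping` plays the role of the correspondence between key feature fields and association identifiers, and the total length of `target` is smaller than that of `initial`.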

In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer-readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer-readable storage medium may be a machine-readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.

The computing system may include clients and servers. A client and a server are typically remote from each other and typically interact through a communication network. The client-server relationship arises by virtue of computer programs that run on the respective computers and have a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and virtual private server (VPS) services.

It should be appreciated that steps may be reordered, added, or deleted using the various forms of the flows shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved; no limitation is imposed herein.

The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims

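1. A file processing method, comprising:

acquiring an initial file, and determining feature fields included in the initial file;

determining occupation information of the feature fields in the initial file, and determining key feature fields based on the occupation information of the feature fields;

setting an association identifier of the key feature field, wherein the length of the association identifier is smaller than a preset length;

and replacing the corresponding key feature field in the initial file based on the association identifier to obtain a target file, wherein the data volume of the target file is smaller than that of the initial file.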

2. The method of claim 1, wherein the initial file includes a plurality of pieces of address information;

the determining the feature fields included in the initial file includes:

dividing each piece of address information based on specific characters to obtain the feature fields included in each piece of address information;

and carrying out de-duplication processing on the feature fields.
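By way of a hedged example, the division and de-duplication described in this claim might look as follows in Python. The delimiter set "/?&=" is an assumption standing in for the "specific characters", and `extract_feature_fields` is a hypothetical helper name.

```python
import re


def extract_feature_fields(addresses, delimiters="/?&="):
    """Split each address on the delimiter characters and de-duplicate.

    Returns the distinct non-empty fields in first-seen order.
    """
    # Build a character class such as [/\?\&=] from the delimiter string.
    pattern = re.compile("[" + re.escape(delimiters) + "]")
    fields = []
    for address in addresses:
        fields.extend(part for part in pattern.split(address) if part)
    # dict.fromkeys keeps the first occurrence of each field, in order.
    return list(dict.fromkeys(fields))


addresses = ["https://a.example/x?id=1", "https://a.example/y?id=2"]
unique_fields = extract_feature_fields(addresses)
```

Keeping first-seen order is a design choice of the sketch, not a requirement of the claim; a plain set would de-duplicate equally well.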

3. The method of claim 1, wherein the occupancy information of the feature field in the initial file comprises a frequency of occurrence of the feature field in the initial file;

4. The method of claim 1, wherein the occupancy information of the feature field in the initial file includes a field length of the feature field and a frequency of occurrence of the feature field in the initial file;

for any one of the feature fields, determining the data amount of the feature field based on the field length and the occurrence frequency of the feature field;

and determining the feature field whose data amount meets a data amount threshold as a key feature field.
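A minimal sketch of this selection, assuming that the data amount of a field is the product of its length and its occurrence count and that "meets the threshold" means greater than or equal to it. Both are assumptions of the example, as is the helper name `select_key_fields`.

```python
from collections import Counter


def select_key_fields(occurrences, threshold):
    """Map each key feature field to its data amount (length x frequency)."""
    counts = Counter(occurrences)
    # Assumed definition: data amount = field length * occurrence count.
    amounts = {field: len(field) * count for field, count in counts.items()}
    # Keep only the fields whose data amount reaches the threshold.
    return {field: amount for field, amount in amounts.items() if amount >= threshold}


# Every occurrence of every field in a hypothetical initial file.
occurrences = ["products"] * 3 + ["id"] * 5 + ["checkout"]
key_fields = select_key_fields(occurrences, threshold=12)
```

Here "id" occurs most often but is short, so its data amount (10) stays below the threshold, while "products" (8 characters, 3 occurrences, data amount 24) qualifies.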

5. The method of claim 1, wherein the length of the association identifier is less than the length of the associated key feature field.

6. The method according to claim 1, wherein the method further comprises:

and creating and storing a correspondence file based on the correspondence between the key feature field and the association identifier, wherein the correspondence file is used for restoring the target file.

7. The method of claim 6, wherein the method further comprises:

before the target file is used, identifying the association identifier in the target file, determining the key feature field corresponding to the association identifier based on the correspondence file, and replacing the corresponding association identifier with the matched key feature field to obtain the initial file.
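The storing and restoring steps of claims 6 and 7 could be sketched as below. The sketch assumes the correspondence is kept as a JSON file, that the target file is a list of "/"-separated lines, and that identifiers look like "~0"; `save_correspondence` and `restore_lines` are hypothetical names.

```python
import json


def save_correspondence(mapping, path):
    """Store the key-feature-field -> association-identifier mapping as JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(mapping, fh, ensure_ascii=False)


def restore_lines(target_lines, path, sep="/"):
    """Replace each identifier in the target lines with its original field."""
    with open(path, encoding="utf-8") as fh:
        mapping = json.load(fh)
    # Invert the stored mapping: identifier -> key feature field.
    inverse = {ident: field for field, ident in mapping.items()}
    return [
        sep.join(inverse.get(part, part) for part in line.split(sep))
        for line in target_lines
    ]
```

Restoration is only lossless under the assumption that no original field coincides with an identifier, which is why the sketch relies on a reserved prefix.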

8. A file processing apparatus, comprising:

9. An electronic device, the electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the file processing method of any one of claims 1-7.

10. A computer readable storage medium, characterized in that the computer readable storage medium stores computer instructions for causing a processor to implement the file processing method of any one of claims 1-7 when executed.