CN112783840B - Method and device for storing document, electronic equipment and storage medium

Info

Publication number
CN112783840B
Authority
CN
China
Legal status
Active
Application number
CN202010511841.4A
Other languages
Chinese (zh)
Other versions
CN112783840A (en)
Inventor
邓斌 (Deng Bin)
Current Assignee
Beijing Kingsoft Office Software Inc
Zhuhai Kingsoft Office Software Co Ltd
Original Assignee
Beijing Kingsoft Office Software Inc
Zhuhai Kingsoft Office Software Co Ltd
Application filed by Beijing Kingsoft Office Software Inc and Zhuhai Kingsoft Office Software Co Ltd
Priority to CN202010511841.4A
Publication of CN112783840A
Application granted
Publication of CN112783840B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/162 Delete operations
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The embodiments of the application provide a method and a device for storing a document, an electronic device, and a storage medium, and relate to the field of computer technology. The method comprises: obtaining a target document to be stored, the target document containing a plurality of images; identifying identical images among the plurality of images according to a preset image recognition rule; and selecting a target image from the identical images, deleting the identical images other than the target image, and changing the reference identifiers of those other images in the target document to the reference identifier of the target image. The application can effectively reduce the storage space occupied by the target document.

Description

Method and device for storing document, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, an electronic device, and a storage medium for storing a document.
Background
At present, people commonly keep information gathered in daily life, work, or study in documents, as text or pictures. When a document is stored, the electronic device stores its entire content, including every picture it contains.
In practice, however, the same document may contain several duplicate pictures. With this approach, every duplicate is stored on the electronic device separately, which wastes its storage space.
Disclosure of Invention
The embodiments of the application aim to provide a method and a device for storing a document, an electronic device, and a storage medium, so as to reduce the storage space occupied by documents. The specific technical solutions are as follows.
In a first aspect, a method for storing a document is provided, the method comprising:
Acquiring a target document to be stored, the target document including a plurality of images.
Determining identical images among the plurality of images according to a preset image recognition rule.
Determining a target image among the identical images, deleting the identical images other than the target image, and changing the reference identifiers of those other images in the target document to the reference identifier of the target image.
Storing the modified target document.
Optionally, before the identical images are determined according to the preset image recognition rule, the method further includes:
Determining the page index on which each of the plurality of images is located and its position within the page.
Sorting the plurality of images by data size.
Dividing images with the same data size into the same image group.
Correspondingly, determining the identical images among the plurality of images according to the preset image recognition rule includes:
For each image group, determining the identical images within the image group according to the preset image recognition rule.
Optionally, determining the identical images among the plurality of images according to the preset image recognition rule includes:
For any two of the plurality of images, selecting a preset number of first pixels from the first of the two images, and selecting, from the second image, the second pixels located at the same positions as the first pixels.
If, at any of these first positions, the pixel value of the first pixel differs from the pixel value of the second pixel, determining that the two images are different.
Otherwise, comparing all pixels in the first image and the second image.
If the pixel values of the two images differ at any position (a second position), determining that the two images are different; otherwise, determining that the two images are identical.
Optionally, changing the reference identifiers of the other images in the target document to the reference identifier of the target image includes:
Determining target position information of the other images in the target document, the target position information comprising the page index of each other image and its position within that page.
Changing the reference identifier corresponding to the target position information to the reference identifier of the target image.
Optionally, the method further comprises:
When a display instruction for the target document is received from a user, acquiring the target image according to the reference identifier of the target image corresponding to the target position information.
Displaying the target image at the position in the target document corresponding to the target position information.
In a second aspect, an apparatus for storing a document is provided, the apparatus being applied to an electronic device and comprising:
An acquisition module, configured to acquire a target document to be stored, the target document including a plurality of images.
A first determining module, configured to determine identical images among the plurality of images according to a preset image recognition rule.
A first reference module, configured to determine a target image among the identical images, delete the identical images other than the target image, and change the reference identifiers of those other images in the target document to the reference identifier of the target image.
A storage module, configured to store the modified target document.
Optionally, the apparatus further includes:
A second determining module, configured to determine the page index on which each of the plurality of images is located and its position within the page.
A sorting module, configured to sort the plurality of images by data size.
A dividing module, configured to divide images with the same data size into the same image group.
Correspondingly, the first determining module is specifically configured to:
For each image group, determine the identical images within the image group according to the preset image recognition rule.
Optionally, the first determining module is specifically configured to:
For any two of the plurality of images, select a preset number of first pixels from the first of the two images, and select, from the second image, the second pixels located at the same positions as the first pixels.
If, at any of these first positions, the pixel value of the first pixel differs from the pixel value of the second pixel, determine that the two images are different.
Otherwise, compare all pixels in the first image and the second image.
If the pixel values of the two images differ at any position (a second position), determine that the two images are different; otherwise, determine that the two images are identical.
Optionally, the first reference module is specifically configured to:
Determine target position information of the other images in the target document, the target position information comprising the page index of each other image and its position within that page.
Change the reference identifier corresponding to the target position information to the reference identifier of the target image.
Optionally, the apparatus further includes:
A second reference module, configured to acquire, when a display instruction for the target document is received from a user, the target image according to the reference identifier of the target image corresponding to the target position information.
A display module, configured to display the target image at the position in the target document corresponding to the target position information.
In a third aspect, an electronic device is provided, comprising a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program; and
the processor is configured to implement the method steps of the first aspect when executing the program stored in the memory.
In a fourth aspect, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the method steps of the first aspect.
In a fifth aspect, a computer program product comprising instructions is provided, which, when run on a computer, causes the computer to perform the method of the first aspect.
The method, apparatus, electronic device, and storage medium for storing a document provided by the embodiments of the application can be applied to an electronic device. The electronic device acquires a target document to be stored that includes a plurality of images, determines identical images among them according to a preset image recognition rule, selects a target image among the identical images, deletes the identical images other than the target image, and changes the reference identifiers of those other images in the target document to the reference identifier of the target image. Consequently, when the target document contains several identical images, only one of them needs to be stored, which effectively reduces the storage space occupied by the target document.
Of course, a product or method implementing the application does not necessarily achieve all of the above advantages at the same time.
Drawings
To describe the embodiments of the application or the prior-art technical solutions more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the application; a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flowchart of a method for storing a document according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for storing a document according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an apparatus for storing documents according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an apparatus for storing documents according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an apparatus for storing documents according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by a person skilled in the art from the embodiments of the application without inventive effort fall within the scope of protection of the application.
The embodiments of the application provide a method for storing a document, which can be applied to an electronic device capable of reading PDF (Portable Document Format) documents. The electronic device may be a mobile terminal such as a mobile phone, or a PC (personal computer) terminal. The method is described in detail below with reference to specific embodiments. As shown in FIG. 1, the steps are as follows:
Step 101, obtaining a target document to be stored.
In this embodiment, the electronic device acquires the target document to be stored. It may acquire the document wirelessly, for example by downloading a document from a network, or over a wired connection such as USB or a data cable. The target document may be an electronic document such as a PDF document. It includes a plurality of images and may also include other data such as text and tables; the embodiments of the application impose no limitation on this.
Step 102, determining the same image in the plurality of images according to a preset image recognition rule.
In this embodiment, the electronic device may traverse the entire target document page by page to collect the images it contains, and then determine the identical images among them according to a preset image recognition rule. The image recognition rule may be any existing recognition rule or recognition model capable of image recognition; the embodiments of the application impose no limitation on this.
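As an illustration of the page-by-page traversal, and of the page index, in-page position, and data size that the grouping steps rely on, the following Python sketch walks a document and records every image placement. It is a minimal sketch under stated assumptions: the document is modelled as a plain list of pages, each page a list of `(position, data)` pairs, and the names `ImageRecord` and `collect_images` are hypothetical. A real PDF reader would obtain these values from the file's page content instead.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical document model: a page is a list of (position, encoded_bytes) pairs.
Page = List[Tuple[Tuple[int, int], bytes]]


@dataclass(frozen=True)
class ImageRecord:
    page_index: int             # page on which the image appears (0-based here)
    position: Tuple[int, int]   # in-page position of the placement, e.g. (x, y)
    data: bytes                 # encoded image data as stored in the document

    @property
    def data_size(self) -> int:
        # Data size = storage space the image occupies, in bytes.
        return len(self.data)


def collect_images(pages: List[Page]) -> List[ImageRecord]:
    """Traverse the document page by page and record every image placement."""
    records = []
    for page_index, page in enumerate(pages):
        for position, data in page:
            records.append(ImageRecord(page_index, position, data))
    return records
```

Each record keeps the page index and position together, which is the information later needed to re-point a deleted image's placement at the retained copy.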
Optionally, the electronic device may first group the images and then determine the identical images. The specific procedure may be as follows:
Step one, determining the page index and in-page position information of every image in the target document.
In this embodiment, before grouping the images in the target document, the electronic device determines the page index and in-page position of every image; that is, the position information records both the page on which an image appears and its exact position on that page. At the same time, the electronic device determines the data size of every image.
Step two, sorting all images in the target document by data size.
In this embodiment, the data size of an image is the amount of storage space it occupies. The electronic device may sort the images by data size in either descending or ascending order. Sorting places images of equal data size next to one another, which makes grouping easier and improves the efficiency of the whole method.
Step three, dividing the images in the target document that have the same data size into the same image group.
In this embodiment, after sorting, the electronic device groups the images so that all images in the target document with the same data size fall into the same image group. The images within one group may differ in content, but they all have the same data size, and no image outside the group has that data size.
Step four, determining the identical images within each image group according to the preset image recognition rule.
In this embodiment, after sorting and grouping all images, the electronic device determines the identical images within each image group according to the preset image recognition rule. In this method, identical images are images that have the same data size, the same dimensions, and the same content.
In essence, the preset recognition rule compares two images from the same image group: if the pixels at every corresponding position have equal values, the two images are judged to have the same content; otherwise, their content is judged to differ.
If two images in the same image group have the same data size, dimensions, and content, they are judged identical; otherwise, they are judged different.
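Steps two to four can be sketched in Python as follows. This is an illustrative sketch, not the patented implementation: `Image`, `group_by_data_size` and `is_same_image` are hypothetical names, the data size is taken to be the length of the encoded bytes, and decoded pixels are assumed to be available as a flat, row-major list. Because sorting puts equal sizes next to each other, a single `itertools.groupby` pass is enough to form the groups.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass
class Image:
    name: str          # hypothetical label, only for readability
    data: bytes        # encoded bytes; len(data) is the data size
    width: int
    height: int
    pixels: list       # decoded pixel values, row-major, width * height entries


def group_by_data_size(images: List[Image]) -> List[List[Image]]:
    """Steps two and three: sort by data size, then group equal sizes."""
    # Ascending order; descending would work equally well.
    ordered = sorted(images, key=lambda img: len(img.data))
    # After sorting, images of equal size are adjacent, so one pass suffices.
    return [list(group) for _, group in groupby(ordered, key=lambda img: len(img.data))]


def is_same_image(a: Image, b: Image) -> bool:
    """Step four: same data size, same dimensions, same pixel at every position."""
    if len(a.data) != len(b.data):
        return False
    if (a.width, a.height) != (b.width, b.height):
        return False
    return all(pa == pb for pa, pb in zip(a.pixels, b.pixels))
```

A group containing a single image cannot contain a duplicate, so an implementation may skip such groups before any pixel comparison.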
Optionally, the electronic device may identify identical images through the following steps:
Step one, for any two of the plurality of images, selecting a preset number of first pixels from the first of the two images, and selecting, from the second image, the second pixels located at the same positions as the first pixels.
In this embodiment, the electronic device may take any two of the images contained in the target document; for ease of distinction, the two images being compared are called the first image and the second image. The electronic device selects a preset number of first pixels from the first image and the second pixels at the same positions in the second image. The pixels may be selected in various ways; the embodiments of the application provide two possible implementations.
In one possible implementation, the electronic device randomly selects a preset number of first pixels from the first image, determines the position of each first pixel (a first position), and then selects from the second image the pixel at each first position (a second pixel).
In another possible implementation, the electronic device is preset with a pixel sampling rule and applies the same rule to both the first image and the second image, so that a preset number of pixels at the same positions is sampled from each. For example, pixels amounting to 10% of the total pixel count may be sampled from the same image region of the first image and the second image.
Step two, if the pixel value of the first pixel at a first position differs from the pixel value of the second pixel at that position, determining that the two images are different.
In this embodiment, if the first pixel and the second pixel at any first position have different pixel values, then at least one of the compared pixels differs between the first image and the second image. Their content therefore differs, and the electronic device determines that they are different images.
Step three, if the first pixels and the second pixels have the same pixel values at every first position, comparing all pixels in the first image and the second image: if the pixel values of the two images differ at any position (a second position), determining that the two images are different; otherwise, determining that the two images are identical.
In this embodiment, if the pixel values match at every first position, the first image and the second image may be identical, and the electronic device must compare all pixels to decide.
The electronic device compares all pixels in the first image and the second image. If the pixel values differ at any second position, at least one pixel differs between the two images, so their content differs; otherwise, the first image and the second image are determined to be identical.
The same procedure applies to every pair of images in each group. For any two images in a group (a third image and a fourth image), the electronic device selects a preset number of third pixels from the third image and the fourth pixels at the same positions in the fourth image. If a third pixel and the corresponding fourth pixel differ in value at any such third position, the two images are determined to be different. Otherwise, all pixels in the third image and the fourth image are compared: if their values differ at any position (a fourth position), the two images are determined to be different; otherwise, they are determined to be identical.
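The sample-then-verify comparison described above can be sketched in Python as follows. It is a minimal sketch under assumptions not stated in the source: both images are already decoded into flat, row-major pixel lists, and the helper names are hypothetical. `random_positions` corresponds to the first implementation (random first positions) and `strided_positions` to the second (a fixed sampling rule); taking every tenth pixel is only one possible way of sampling about 10% of the pixels.

```python
import random
from typing import Optional, Sequence


def random_positions(num_pixels: int, count: int, seed: Optional[int] = None) -> list:
    """First implementation: pick `count` distinct positions at random."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_pixels), min(count, num_pixels)))


def strided_positions(num_pixels: int, fraction: float = 0.10) -> list:
    """Second implementation: a fixed rule, here every k-th pixel (~`fraction` of all)."""
    step = max(1, round(1 / fraction))
    return list(range(0, num_pixels, step))


def images_identical(first: Sequence, second: Sequence, positions: Sequence[int]) -> bool:
    """Two-stage comparison of two decoded images of equal dimensions."""
    if len(first) != len(second):
        return False
    # Stage 1: a mismatch at any sampled (first) position means the images differ.
    if any(first[i] != second[i] for i in positions):
        return False
    # Stage 2: the samples all matched, so compare every pixel; a mismatch at
    # any (second) position still means the images differ.
    return all(a == b for a, b in zip(first, second))
```

The sampled check is cheap and rejects most non-identical pairs early; only pairs that pass it pay for the full comparison, which is still needed because a difference may lie outside the sampled positions.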
Step 103, determining a target image among the identical images, deleting the identical images other than the target image, and changing the reference identifiers of those other images in the target document to the reference identifier of the target image.
In this embodiment, two specific ways of removing duplicate images while retaining the target image are provided:
In the first way, once identical images are found in an image group, all of them are placed in a second image group. A target image is determined in the second image group and retained, and all other images in the second image group are then deleted.
In the second way, images are compared pair by pair. If the recognition rule provided in the embodiments finds two images to be different, neither is processed and the comparison continues. If it finds them identical, one of them is deleted and the target image is referenced at the position of the deleted image.
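The pairwise variant, in which a duplicate is deleted and the retained target image is referenced at its position, might look like the following Python sketch. The source does not say which copy becomes the target image; keeping the first one encountered is an assumption here, as are the function name `choose_targets` and the `same` predicate, which stands in for the recognition rule (for example, the two-stage pixel comparison).

```python
from typing import Any, Callable, Dict, List, Tuple


def choose_targets(group: List[Tuple[int, Any]],
                   same: Callable[[Any, Any], bool]) -> Dict[int, int]:
    """Compare images in one size group and map each duplicate to its target.

    group: (serial_number, image) pairs that share the same data size.
    same:  the recognition rule deciding whether two images are identical.
    Returns {serial of image to delete: serial of the retained target image}.
    """
    replacements: Dict[int, int] = {}
    targets: List[Tuple[int, Any]] = []          # images retained so far
    for serial, image in group:
        for target_serial, target_image in targets:
            if same(image, target_image):
                # Duplicate: it will be deleted and the target referenced instead.
                replacements[serial] = target_serial
                break
        else:
            # Different from every retained image: keep it as a new target.
            targets.append((serial, image))
    return replacements
```

Returning a mapping rather than deleting immediately keeps the decision separate from the document update, so the same mapping can drive both the deletion and the reference rewrite.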
Optionally, after the duplicate images in an image group have been removed, the reference identifiers of the deleted duplicates in the target document are changed to the reference identifier of the target image, specifically through the following steps:
Step one, determining target position information of the other images in the target document.
The target position information includes the page index of each other image and its position within that page.
In this embodiment, the other images are the identical images other than the target image. The target position information pins down exactly where each other image was located, so that the target image can be referenced precisely at that position.
Step two, changing the reference identifier corresponding to the target position information to the reference identifier of the target image.
In this embodiment, the reference identifier is the serial number of each image in the document; a serial number uniquely identifies one image. When referencing, the electronic device obtains the preset correspondence between position information and serial numbers and changes the reference identifier corresponding to the target position information to the reference identifier of the target image. The correct image is thus referenced at each position: through its reference identifier, the target image is referenced at the target position.
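Steps one and two above amount to rewriting a map from position information to serial numbers. The following Python sketch assumes, hypothetically, that this correspondence is held as a dictionary keyed by `(page_index, position)` whose values are image serial numbers, and that image data is stored once per serial number; `rewrite_references` is an illustrative name, not an API from the source.

```python
from typing import Dict, Tuple

Location = Tuple[int, Tuple[int, int]]   # (page index, in-page position)


def rewrite_references(placements: Dict[Location, int],
                       images: Dict[int, bytes],
                       replacements: Dict[int, int]) -> None:
    """Point every placement of a deleted image at its target image, in place.

    placements:   target position information -> reference identifier (serial).
    images:       serial number -> encoded image data, stored once per serial.
    replacements: serial of deleted duplicate -> serial of target image.
    """
    for location, serial in placements.items():
        if serial in replacements:
            # Reassigning an existing key does not resize the dict, so this is safe.
            placements[location] = replacements[serial]
    for serial in replacements:
        images.pop(serial, None)   # the duplicate's data is no longer stored
```

Only the identifiers change; the positions themselves are untouched, which is why the layout of each page is preserved.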
Step 104, storing the modified target document.
The embodiments of the application provide a method for storing a document. When the method is triggered, the electronic device deletes the duplicate images in the target document and references the target image in their place; because the duplicates are deleted, the data size of the target document shrinks. When a user stores a document, the result is a target document whose content is unchanged but whose data size is smaller than before. The ultimate purpose of the method is to reduce the storage space occupied by the target document without changing how any of its images is displayed, so the displayed content of the document after the method is applied is the same as before.
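The saving follows directly from the deletions: assuming the references themselves take the same space before and after, the stored image data shrinks by the data size of every deleted duplicate. A minimal Python sketch with a hypothetical function name and a hypothetical example of three copies of a 200,000-byte image:

```python
from typing import Dict


def bytes_saved(images: Dict[int, bytes], replacements: Dict[int, int]) -> int:
    """Storage freed by deleting duplicates: the sum of their data sizes.

    images:       serial number -> encoded image data before deduplication.
    replacements: serial of each deleted duplicate -> serial of its target.
    """
    return sum(len(images[serial]) for serial in replacements)


# Hypothetical example: three copies of a 200,000-byte image plus one small image.
images = {1: b"x" * 200_000, 2: b"x" * 200_000, 3: b"x" * 200_000, 4: b"y" * 10}
print(bytes_saved(images, {2: 1, 3: 1}))   # two copies deleted -> 400000
```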
Optionally, the embodiments of the application further provide a process for displaying the target document, which comprises the following steps:
Step one, when a display instruction for the target document is received from a user, acquiring the target image according to the reference identifier corresponding to the target position information.
In this embodiment, when a display instruction is received for a target document that has been processed by the method (that is, one in which the target image is referenced in place of its duplicates), each target position is a position where one of the deleted images used to be, and the target image is identical to those deleted images.
Step two, displaying the target image at the position in the target document corresponding to the target position information.
In this embodiment, when displaying the document, the electronic device displays the acquired target image at the position in the target document corresponding to the target position information. Because the target image and the deleted images are identical, referencing the target image leaves the displayed content of the document unchanged while the storage space occupied by the processed document is reduced, thereby saving storage resources.
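Displaying the stored document is then a lookup through the reference identifiers. In this Python sketch (hypothetical names, and a hypothetical data layout in which a dictionary maps `(page_index, position)` to a serial number), every placement resolves to the single stored copy of its image, so pages that originally held duplicates still show the same picture.

```python
from typing import Dict, List, Tuple

Location = Tuple[int, Tuple[int, int]]   # (page index, in-page position)


def resolve_for_display(placements: Dict[Location, int],
                        images: Dict[int, bytes]) -> List[Tuple[int, Tuple[int, int], bytes]]:
    """Return (page_index, position, image_data) for every placement, in page order."""
    plan = []
    for (page_index, position), serial in sorted(placements.items()):
        # The reference identifier at this target position selects the stored image.
        plan.append((page_index, position, images[serial]))
    return plan
```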
An embodiment of the application also provides an example of the method for storing a document, shown in Fig. 2, which includes the following steps:
Step 201: Obtain the target document, traverse the entire document page by page, and determine the position information of every image on each page, where the position information includes the page index of the image and its position within the page.
Step 202: Sort all the images by data size, select the images whose data sizes are equal, and put them into one group, yielding image groups in which all images have the same data size.
Step 203: Take two images from an image group and extract some of the pixels at the same positions in both images.
Step 204: Compare the equal numbers of pixels extracted at the same positions in step 203. If their pixel values are not identical, the two images are different, and the comparison ends. If their pixel values are identical, the two images may be identical, and the process jumps to step 205 for further comparison.
Step 205: Take all pixels of the two images.
Step 206: Compare whether the pixel values of all pixels in the two images are equal. If they are not all equal, the two images are not identical, and the comparison ends. If they are all equal, the two images are completely identical, and the process jumps to step 207.
Step 207: Delete one of the two images, and then, using the position information of all images on each page determined in step 201, reference the retained image in the page where the deleted image was located.
Step 208: Check whether all images in the document have been processed. If not, return to step 203 and continue the loop.
Step 209: If all images have been processed, store the modified document.
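The loop in steps 201 to 209 can be sketched in Python on a toy document model. Everything about the model is an assumption made for illustration: each image occurrence found in step 201 is a `(page_index, position, ref_id)` tuple, the document's image store maps each reference identifier to `(width, height, pixels)` with `pixels` as raw bytes, and the byte length of `pixels` stands in for the image's data size. The function name `deduplicate`, the `sample_count` parameter, and the evenly spaced sampling pattern are hypothetical; the text only calls for some pixels taken at the same positions in both images. Writing the result back to storage (step 209) is left to the caller.

```python
from collections import defaultdict


def deduplicate(placements, store, sample_count=16):
    """Remove duplicate images and redirect their placements (toy model).

    placements: list of (page_index, position, ref_id), one per image
                occurrence found by traversing the pages (step 201).
    store:      dict ref_id -> (width, height, pixels), pixels as bytes.
    Returns (new_placements, new_store).
    """

    def same(a, b):
        (wa, ha, pa), (wb, hb, pb) = a, b
        if (wa, ha) != (wb, hb):
            return False  # different dimensions: not the same image
        # Steps 203-204: compare roughly sample_count evenly spaced pixels.
        step = max(1, len(pa) // max(1, sample_count))
        if any(pa[i] != pb[i] for i in range(0, len(pa), step)):
            return False
        # Steps 205-206: the samples match, so compare every pixel.
        return pa == pb

    # Step 202: sort by data size and group images of equal size.
    groups = defaultdict(list)
    for ref_id in sorted(store, key=lambda r: len(store[r][2])):
        groups[len(store[ref_id][2])].append(ref_id)

    remap = {}  # deleted ref_id -> ref_id of the retained target image
    for members in groups.values():  # step 208: visit every group
        targets = []  # one retained image per distinct content in the group
        for ref_id in members:
            match = next(
                (t for t in targets if same(store[t], store[ref_id])), None
            )
            if match is None:
                targets.append(ref_id)
            else:
                remap[ref_id] = match  # this copy will be deleted

    # Step 207: delete duplicates and point their placements at the target.
    new_store = {r: img for r, img in store.items() if r not in remap}
    new_placements = [
        (page, pos, remap.get(ref, ref)) for page, pos, ref in placements
    ]
    return new_placements, new_store


# Usage on a four-image toy document:
store = {
    1: (2, 2, b"\x01\x02\x03\x04"),
    2: (2, 2, b"\x01\x02\x03\x04"),  # identical to image 1
    3: (4, 1, b"\x01\x02\x03\x04"),  # same bytes, different dimensions
    4: (1, 1, b"\x09"),
}
placements = [(0, (10, 10), 1), (1, (5, 5), 2), (2, (0, 0), 3), (2, (50, 0), 4)]
new_placements, new_store = deduplicate(placements, store)
print(sorted(new_store))  # [1, 3, 4]
print(new_placements[1])  # (1, (5, 5), 1)
```

Comparisons happen only inside a data-size group, and the sampled pre-check is meant to reject most mismatching pairs after a few pixel reads, so the full pixel comparison runs mainly on likely duplicates.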
In this embodiment of the application, the electronic device acquires the target document to be stored, which contains a plurality of images. The electronic device determines identical images among these images according to a preset image recognition rule, selects a target image from the identical images, deletes the other identical images, and changes the reference identifiers of those other images in the target document to the reference identifier of the target image. The electronic device then stores the modified target document, which effectively reduces the storage space the document occupies.
Based on the same technical concept, an embodiment of the application also provides an apparatus for storing documents, shown in Fig. 3. The apparatus is applied to an electronic device and comprises:
an acquisition module 301, configured to acquire a target document to be stored, where the target document includes a plurality of images;
a first determining module 302, configured to determine identical images among the plurality of images according to a preset image recognition rule;
a first reference module 303, configured to determine a target image among the identical images, delete the identical images other than the target image, and change the reference identifiers of those other images in the target document to the reference identifier of the target image;
and a storage module 304, configured to store the modified target document.
Optionally, as shown in Fig. 4, the apparatus further includes:
a second determining module 305, configured to determine the page index of each of the plurality of images and its position within the page;
a sorting module 306, configured to sort the plurality of images by data size;
and a dividing module 307, configured to divide images of equal data size into the same image group.
In this case, when determining identical images among the plurality of images according to the preset image recognition rule, the first determining module 302 is specifically configured to:
for each image group, determine the identical images within that group according to the preset image recognition rule.
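The grouping performed by the sorting module 306 and the dividing module 307 can be sketched as follows, assuming each image is given as a hypothetical `(ref_id, data)` pair in which `data` holds the encoded image bytes. The function name `group_by_data_size` is illustrative. Dropping single-member groups is a design choice of this sketch: an image whose data size is unique cannot have an identical copy, so it needs no pixel comparison.

```python
from itertools import groupby


def _data_size(item):
    """Data size of a (ref_id, data) pair: the length of its encoded bytes."""
    return len(item[1])


def group_by_data_size(images):
    """Sort images by data size and collect equal-size images into groups.

    images: list of (ref_id, data) pairs (hypothetical input format).
    Returns a list of ref_id groups, smallest data size first. Groups with a
    single member are dropped, because a unique size rules out a duplicate.
    """
    ordered = sorted(images, key=_data_size)  # equal sizes become adjacent
    groups = []
    for _, run in groupby(ordered, key=_data_size):
        ids = [ref_id for ref_id, _ in run]
        if len(ids) > 1:
            groups.append(ids)
    return groups


print(group_by_data_size([(1, b"abcd"), (2, b"xy"), (3, b"wxyz"), (4, b"q")]))
# [[1, 3]]
```

`itertools.groupby` only merges adjacent items, which is why the images are sorted by data size first.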
Optionally, the first determining module 302 is specifically configured to:
For any two of the plurality of images, select a preset number of first pixels from the first image, and select from the second image the second pixels located at the same positions as the first pixels.
If the pixel value of the first pixel at any such first position differs from that of the second pixel at the same position, determine that the two images are different.
Otherwise, compare all pixels of the first image and the second image.
If the pixel value of the first pixel at any second position differs from that of the second pixel at the same position, determine that the two images are different; otherwise, determine that the two images are identical.
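A minimal sketch of this two-stage rule, assuming images are decoded into a flat, row-major sequence of pixel values. The `Image` type, the helper names, the `sample_count` default of 16, and the evenly spaced choice of sample positions are all assumptions; the text only requires a preset number of pixels taken at the same positions in both images. The dimension check reflects the claims' definition of identical images as having the same data size, the same dimensions, and the same content.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Image:
    width: int
    height: int
    pixels: Sequence[int]  # flat, row-major pixel values


def sample_positions(n_pixels: int, sample_count: int) -> List[int]:
    """Up to sample_count evenly spaced pixel indices (spacing is assumed)."""
    if n_pixels == 0 or sample_count <= 0:
        return []
    step = max(1, n_pixels // sample_count)
    return list(range(0, n_pixels, step))[:sample_count]


def images_identical(a: Image, b: Image, sample_count: int = 16) -> bool:
    # Identical images must share dimensions as well as content.
    if (a.width, a.height) != (b.width, b.height) or len(a.pixels) != len(b.pixels):
        return False
    # Stage 1: compare the preset number of first/second pixels at the same
    # positions; any difference means the images are different.
    for i in sample_positions(len(a.pixels), sample_count):
        if a.pixels[i] != b.pixels[i]:
            return False
    # Stage 2: the samples matched, so compare every pixel.
    return all(pa == pb for pa, pb in zip(a.pixels, b.pixels))


a = Image(2, 2, (1, 2, 3, 4))
print(images_identical(a, Image(2, 2, (1, 2, 3, 4))))                  # True
print(images_identical(a, Image(2, 2, (1, 9, 3, 4)), sample_count=1))  # False
```

In the second call only pixel 0 is sampled and it matches, so the difference is caught by the full comparison in stage 2.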
Optionally, the first reference module 303 is specifically configured to:
Determine the target position information of the other images in the target document, where the target position information includes the page index of each of the other images and its position within the page.
Change the reference identifier corresponding to the target position information to the reference identifier of the target image.
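Modifying reference identifiers by target position information might look like the following, assuming a hypothetical layout that maps `(page_index, (x, y))` to the reference identifier (the image sequence number) drawn at that position. `redirect_references` is an illustrative name, not an API from the source.

```python
def redirect_references(layout, other_ids, target_id):
    """Change the reference identifier at each position of a deleted image.

    layout: dict mapping target position information, modeled here as
            (page_index, (x, y)), to the reference identifier (the image's
            sequence number) drawn at that position. Hypothetical model.
    Returns the positions whose reference identifier was changed.
    """
    other = set(other_ids)
    # Determine the target position information of the other images.
    positions = [pos for pos, ref in layout.items() if ref in other]
    # Point each of those positions at the target image instead.
    for pos in positions:
        layout[pos] = target_id
    return positions


layout = {(0, (10, 10)): 1, (1, (5, 5)): 2, (3, (0, 0)): 2, (2, (0, 0)): 3}
print(redirect_references(layout, [2], 1))  # [(1, (5, 5)), (3, (0, 0))]
print(layout[(3, (0, 0))])                  # 1
```

Collecting the positions before rewriting them keeps the dictionary from being mutated while it is being scanned.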
Optionally, as shown in Fig. 5, the apparatus further includes:
a second reference module 308, configured to, upon receiving a user instruction to display the target document, obtain the target image according to the reference identifier of the target image corresponding to the target position information;
and a display module 309, configured to display the target image at the position corresponding to the target position information in the target document.
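At display time, each recorded position is resolved through its reference identifier, so every former duplicate draws the single stored copy. The sketch below assumes the same kind of hypothetical layout dictionary as above plus an image store keyed by reference identifier; `render_page` is illustrative only and returns draw commands rather than rendering anything.

```python
def render_page(layout, store, page_index):
    """Build draw commands for one page by resolving reference identifiers.

    layout: dict (page_index, (x, y)) -> reference identifier (hypothetical).
    store:  dict reference identifier -> image data kept in the document.
    Returns ((x, y), image_data) pairs ordered by position.
    """
    draws = []
    for (page, xy), ref_id in layout.items():
        if page == page_index:
            # Obtain the target image through its reference identifier and
            # draw it at the position recorded for this occurrence.
            draws.append((xy, store[ref_id]))
    return sorted(draws, key=lambda draw: draw[0])


layout = {(0, (10, 10)): 1, (0, (50, 0)): 1, (1, (5, 5)): 3}
store = {1: b"logo", 3: b"chart"}
print(render_page(layout, store, 0))  # [((10, 10), b'logo'), ((50, 0), b'logo')]
```

Both occurrences on page 0 draw the same stored bytes, which illustrates why the displayed content is unchanged even though the image is stored only once.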
In this embodiment of the application, the electronic device acquires the target document to be stored, which contains a plurality of images, and determines identical images among them according to a preset image recognition rule. It then selects a target image from the identical images, deletes the others, and changes their reference identifiers in the target document to the reference identifier of the target image. Storing the modified target document effectively reduces the storage space it occupies.
An embodiment of the application also provides an electronic device, shown in Fig. 6, comprising a processor 601, a communication interface 602, a memory 603, and a communication bus 604, where the processor 601, the communication interface 602, and the memory 603 communicate with one another through the communication bus 604;
the memory 603 is configured to store a computer program;
the processor 601 is configured to execute the program stored in the memory 603 to implement the following steps:
Acquiring a target document to be stored, where the target document includes a plurality of images.
Determining identical images among the plurality of images according to a preset image recognition rule.
Determining a target image among the identical images, deleting the identical images other than the target image, and changing the reference identifiers of those other images in the target document to the reference identifier of the target image.
Storing the modified target document.
Optionally, before identical images are determined among the plurality of images according to the preset image recognition rule, the steps further include:
Determining the page index of each of the plurality of images and its position within the page.
Sorting the plurality of images by data size.
Dividing images of equal data size into the same image group.
Determining identical images among the plurality of images according to the preset image recognition rule then includes:
For each image group, determining the identical images within that group according to the preset image recognition rule.
Optionally, determining identical images among the plurality of images according to the preset image recognition rule includes:
For any two of the plurality of images, selecting a preset number of first pixels from the first image, and selecting from the second image the second pixels located at the same positions as the first pixels.
If the pixel value of the first pixel at any such first position differs from that of the second pixel at the same position, determining that the two images are different.
Otherwise, comparing all pixels of the first image and the second image.
If the pixel value of the first pixel at any second position differs from that of the second pixel at the same position, determining that the two images are different; otherwise, determining that the two images are identical.
Optionally, changing the reference identifiers of the other images in the target document to the reference identifier of the target image includes:
Determining the target position information of the other images in the target document, where the target position information includes the page index of each of the other images and its position within the page.
Changing the reference identifier corresponding to the target position information to the reference identifier of the target image.
Optionally, the steps further include:
When a user instruction to display the target document is received, obtaining the target image according to the reference identifier of the target image corresponding to the target position information.
Displaying the target image at the position corresponding to the target position information in the target document.
The communication bus of the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The memory may include random access memory (RAM) or non-volatile memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP); it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Based on the same technical idea, an embodiment of the application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method for storing a document.
Based on the same technical idea, an embodiment of the application also provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the steps of the above method for storing a document.
The above embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example by wire (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, or magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state disk (SSD)), etc.
It should be noted that relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described in a related manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus embodiments are substantially similar to the method embodiments and are therefore described briefly; for relevant details, refer to the corresponding parts of the method embodiments.
The foregoing describes only preferred embodiments of the present application and is not intended to limit its scope of protection. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of the present application shall fall within its scope of protection.

Claims (8)

1. A method of storing a document, the method comprising:
acquiring a target document to be stored, wherein the target document comprises a plurality of images;
traversing the entire target document page by page to obtain the images contained in the target document, determining the page index of each image and its position within the page, sorting the images by data size, and dividing images of equal data size into the same image group;
determining identical images among the plurality of images according to a preset image recognition rule, where identical images are images with the same data size, the same dimensions, and the same content;
determining a target image among the identical images, deleting the identical images other than the target image, and changing the reference identifiers of the other images in the target document to the reference identifier of the target image, where the reference identifier is the sequence number of each image in the target document and uniquely identifies one image;
storing the modified target document;
wherein determining identical images among the plurality of images according to the preset image recognition rule includes:
for each image group, determining the identical images within that image group according to the preset image recognition rule;
and determining identical images among the plurality of images according to the preset image recognition rule includes:
for any two of the plurality of images, selecting a preset number of first pixels from the first image, and selecting from the second image the second pixels located at the same positions as the first pixels;
if the pixel value of the first pixel at any such first position differs from that of the second pixel at the same position, determining that the two images are different;
otherwise, comparing all pixels of the first image and the second image;
if the pixel value of the first pixel at any second position differs from that of the second pixel at the same position, determining that the two images are different; otherwise, determining that the two images are identical.
2. The method of claim 1, wherein changing the reference identifiers of the other images in the target document to the reference identifier of the target image comprises:
determining target position information of the other images in the target document, wherein the target position information comprises the page index of each of the other images and its position within the page;
and changing the reference identifier corresponding to the target position information to the reference identifier of the target image.
3. The method of claim 2, further comprising:
when a user instruction to display the target document is received, obtaining the target image according to the reference identifier of the target image corresponding to the target position information;
and displaying the target image at the position corresponding to the target position information in the target document.
4. An apparatus for storing documents, the apparatus comprising:
an acquisition module, configured to acquire a target document to be stored, wherein the target document comprises a plurality of images;
a dividing module, configured to traverse the entire target document page by page to obtain the images contained in the target document, determine the page index of each image and its position within the page, sort the images by data size, and divide images of equal data size into the same image group;
a first determining module, configured to determine identical images among the plurality of images according to a preset image recognition rule, where identical images are images with the same data size, the same dimensions, and the same content;
a first reference module, configured to determine a target image among the identical images, delete the identical images other than the target image, and change the reference identifiers of the other images in the target document to the reference identifier of the target image, where the reference identifier is the sequence number of each image in the target document and uniquely identifies one image;
and a storage module, configured to store the modified target document;
wherein the first determining module is specifically configured to determine, for each image group, the identical images within that image group according to the preset image recognition rule;
and the first determining module is specifically configured to:
for any two of the plurality of images, select a preset number of first pixels from the first image, and select from the second image the second pixels located at the same positions as the first pixels;
if the pixel value of the first pixel at any such first position differs from that of the second pixel at the same position, determine that the two images are different;
otherwise, compare all pixels of the first image and the second image;
if the pixel value of the first pixel at any second position differs from that of the second pixel at the same position, determine that the two images are different; otherwise, determine that the two images are identical.
5. The apparatus of claim 4, wherein the first reference module is specifically configured to:
determine target position information of the other images in the target document, wherein the target position information comprises the page index of each of the other images and its position within the page;
and change the reference identifier corresponding to the target position information to the reference identifier of the target image.
6. The apparatus of claim 5, further comprising:
a second reference module, configured to, upon receiving a user instruction to display the target document, obtain the target image according to the reference identifier of the target image corresponding to the target position information;
and a display module, configured to display the target image at the position corresponding to the target position information in the target document.
7. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
and the processor is configured to implement the method steps of any one of claims 1 to 3 when executing the program stored in the memory.
8. A computer-readable storage medium, characterized in that a computer program is stored therein which, when executed by a processor, implements the method steps of any one of claims 1 to 3.
CN202010511841.4A 2020-06-08 2020-06-08 Method and device for storing document, electronic equipment and storage medium Active CN112783840B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010511841.4A CN112783840B (en) 2020-06-08 2020-06-08 Method and device for storing document, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010511841.4A CN112783840B (en) 2020-06-08 2020-06-08 Method and device for storing document, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112783840A 2021-05-11
CN112783840B 2024-06-25

Family

ID=75750085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010511841.4A Active CN112783840B (en) 2020-06-08 2020-06-08 Method and device for storing document, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112783840B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109543062A (en) * 2018-09-29 2019-03-29 中国平安人寿保险股份有限公司 Image processing method, system, computer installation and readable storage medium storing program for executing

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
US20040012601A1 (en) * 2002-07-18 2004-01-22 Sang Henry W. Method and system for displaying a first image as a second image
JP4757001B2 (en) * 2005-11-25 2011-08-24 キヤノン株式会社 Image processing apparatus and image processing method
CN106203459B (en) * 2015-04-29 2020-05-12 腾讯科技(深圳)有限公司 Picture processing method and device
JP7013182B2 (en) * 2017-09-21 2022-01-31 キヤノン株式会社 Information processing equipment, information processing methods and programs
CN110545427A (en) * 2018-05-28 2019-12-06 北京金山办公软件股份有限公司 PDF document compression method and device and electronic equipment
CN110807300A (en) * 2018-07-18 2020-02-18 广州金山移动科技有限公司 Image processing method and device, electronic equipment and medium
CN110941589A (en) * 2018-09-21 2020-03-31 珠海金山办公软件有限公司 Picture exporting method and device, electronic equipment and readable storage medium
CN111199144B (en) * 2018-10-30 2024-03-26 珠海金山办公软件有限公司 Document content altering method and device, electronic equipment and readable storage medium
CN110163301A (en) * 2019-05-31 2019-08-23 北京金山云网络技术有限公司 A kind of classification method and device of image

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
CN109543062A (en) * 2018-09-29 2019-03-29 中国平安人寿保险股份有限公司 Image processing method, system, computer installation and readable storage medium storing program for executing

Also Published As

Publication number Publication date
CN112783840A (en) 2021-05-11

Similar Documents

Publication Publication Date Title
CN107609186B (en) Information processing method and device, terminal device and computer readable storage medium
CN110222791B (en) Sample labeling information auditing method and device
CN111767713B (en) Keyword extraction method and device, electronic equipment and storage medium
CN107885449B (en) Photographing search method and device, terminal equipment and storage medium
CN107885875B (en) Synonymy transformation method and device for search words and server
CN115357155A (en) Window identification method, device, equipment and computer readable storage medium
CN108133048B (en) File sorting method and device and mobile terminal
CN106033417B (en) Method and device for sequencing series of video search
CN112783840B (en) Method and device for storing document, electronic equipment and storage medium
CN112329409B (en) Cell color conversion method and device and electronic equipment
CN110825947A (en) URL duplicate removal method, device, equipment and computer readable storage medium
CN107071553B (en) Method, device and computer readable storage medium for modifying video and voice
CN109002446B (en) Intelligent sorting method, terminal and computer readable storage medium
CN110674330B (en) Expression management method and device, electronic equipment and storage medium
CN110717109B (en) Method, device, electronic equipment and storage medium for recommending data
CN111813971B (en) Hash table construction and image matching method and device, storage medium and electronic equipment
CN111880776A (en) Hierarchical relationship obtaining method and device and electronic equipment
CN110889279B (en) Method and device for displaying display information in document
CN110795914B (en) Method and device for converting PDF document into picture and electronic equipment
CN111881356A (en) Content recommendation method and device, electronic equipment and storage medium
CN110989892B (en) Text display method and device, electronic equipment and storage medium
CN116301655B (en) Method, system and readable storage medium for loading historical note pictures
CN113568578B (en) Picture processing method and device, electronic equipment and readable storage medium
CN110543623B (en) PDF document display method and device
CN116804967A (en) Test case screening method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant