CN117236291A - Method and system for rapidly converting scanned file into vector layout file - Google Patents

Method and system for rapidly converting scanned file into vector layout file

Info

Publication number
CN117236291A
CN117236291A CN202311523381.7A CN202311523381A CN117236291A CN 117236291 A CN117236291 A CN 117236291A CN 202311523381 A CN202311523381 A CN 202311523381A CN 117236291 A CN117236291 A CN 117236291A
Authority
CN
China
Prior art keywords
position information
target
data
data point
acquiring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311523381.7A
Other languages
Chinese (zh)
Other versions
CN117236291B (en)
Inventor
李超
朱静宇
赵云
张伟
庄玉龙
陆猛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dianju Information Technology Co ltd
Original Assignee
Beijing Dianju Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dianju Information Technology Co ltd
Priority to CN202311523381.7A
Publication of CN117236291A
Application granted
Publication of CN117236291B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention relates to the technical field of data compression, and in particular to a method and a system for rapidly converting a scanned file into a vector layout file. The method comprises the following steps: obtaining a two-dimensional matrix of a vector layout file; acquiring target subregions according to the two-dimensional matrix of the vector layout file; acquiring all position information groups in each target subregion and the starting data point of each position information group; acquiring position information sets according to all the position information groups in each target subregion; calculating the preference degree of each position information set; acquiring a compression starting point of each target subregion according to the preference degrees of the position information sets and the starting data point of each position information group; and obtaining a compression result of the vector layout file according to the compression starting point of each target subregion. By evaluating the repeatability of the position information data in the vector layout file, the method improves the compression effect of the vector layout file.

Description

Method and system for rapidly converting scanned file into vector layout file
Technical Field
The invention relates to the technical field of data compression, and in particular to a method and a system for rapidly converting a scanned file into a vector layout file.
Background
A layout file is a file in which all text data in a document are combined and arranged in a fixed format. For example, publicly released red-header documents generally contain a large amount of text and formatting information. If such a document is published as a scanned file, noise and distortion appear when it is enlarged, and a complete conversion from paper file to electronic file is not achieved. Therefore, the layout file needs to be vectorized before it is stored and transmitted.
When vector data is stored, the coordinate position information of each piece of vector data must be stored to describe its position and shape, so a vector layout file can become very large and occupy a large amount of storage space. The coordinate position information of the vector layout file therefore needs to be compressed in order to save storage space, reduce the required transmission bandwidth, and improve transmission efficiency.
Disclosure of Invention
The invention provides a method and a system for rapidly converting a scanned file into a vector layout file, to solve the following existing problem: the data volume of a vector layout file is too large, which hinders the storage and transmission of vector data.
The method and system for rapidly converting a scanned file into a vector layout file according to the invention adopt the following technical scheme:
One embodiment of the invention provides a method for rapidly converting a scanned file into a vector layout file, the method comprising the following steps:
acquiring a two-dimensional matrix of a vector layout file and target data points;
acquiring target subregions according to the two-dimensional matrix of the vector layout file; acquiring position information data from any target data point to another target data point in each target subregion;
acquiring all position information groups in each target subregion and the starting data point of each position information group according to the position information data from any target data point to another target data point in the target subregion; acquiring position information sets according to all the position information groups in each target subregion; calculating the preference degree of each position information set;
acquiring a compression starting point of each target subregion according to the preference degrees of the position information sets and the starting data point of each position information group;
and obtaining a compression result of the vector layout file according to the compression starting point of each target subregion.
Preferably, acquiring the two-dimensional matrix of the vector layout file and the target data points comprises the following specific steps:
scanning a paper file with a file scanner to obtain a scanned file matrix, and locating the text in the scanned file by optical character recognition; setting the data value of each data point at a text position in the scanned file matrix to 1 and recording it as a target data point; setting the data value of each data point not at a text position in the scanned file matrix to 0 and recording it as a blank data point; and thereby obtaining the two-dimensional matrix of the vector layout file.
Preferably, acquiring the target subregions according to the two-dimensional matrix of the vector layout file comprises the following specific steps:
in the two-dimensional matrix of the vector layout file, if two target data points are adjacent, the two target data points are assigned to the same target subregion, and a plurality of target subregions are thereby obtained.
Preferably, acquiring the position information data from any target data point to another target data point in the target subregion comprises the following specific steps:
the horizontal rightward direction is taken as the reference direction; the included angle between the reference direction and the ray from any target data point to another target data point in each target subregion is acquired and taken as the direction angle from the one target data point to the other;
the Euclidean distance from any target data point to another target data point in each target subregion is acquired and taken as the distance from the one target data point to the other;
the direction angle and the distance from any target data point to another target data point are recorded as the position information data from the one target data point to the other.
Preferably, acquiring all the position information groups in each target subregion and the starting data point of each position information group comprises the following specific steps:
for the $i$-th target subregion, acquiring the position information data from the first target data point to all other target data points, grouping these data into one group, and recording the group as the first position information group in the $i$-th target subregion; and recording the first target data point as the starting data point of the first position information group in the $i$-th target subregion;
acquiring the position information data from the second target data point to all other target data points, grouping these data into one group, and recording the group as the second position information group in the $i$-th target subregion; and recording the second target data point as the starting data point of the second position information group in the $i$-th target subregion;
and so on, until the position information data from the last target data point to all other target data points are acquired, grouped into one group, and recorded as the last position information group in the $i$-th target subregion; and recording the last target data point as the starting data point of the last position information group in the $i$-th target subregion;
all the position information groups in each target subregion are thereby obtained, together with the starting data point of each position information group.
Preferably, acquiring the position information sets according to all the position information groups in each target subregion and calculating the preference degree of each position information set comprises the following specific method:
randomly selecting one position information group from each target subregion and placing the position information data of the selected groups into one position information set, where position information data that are exactly identical are treated as one kind of position information data; for each position information set, counting the frequency of occurrence of each kind of position information data in the set and the number of position information data in the set, and acquiring the preference degree of the set according to the number of kinds of position information data in the set, the frequency of occurrence of each kind, and the number of position information data in the set.
Preferably, the preference degree of the position information set is obtained by the following calculation formula:
[formula image not reproduced in this text]
In the formula, $Y_j$ denotes the preference degree of the $j$-th position information set; $N_j$ denotes the number of position information data in the $j$-th position information set; $p_{j,k}$ denotes the frequency of occurrence of the $k$-th kind of position information data in the $j$-th position information set; $K_j$ denotes the number of kinds of position information data in the $j$-th position information set; $\exp(\cdot)$ denotes the exponential function with the natural constant as its base; $\log_2(\cdot)$ denotes the logarithmic function with base 2.
Preferably, acquiring the compression starting point of each target subregion according to the preference degrees of the position information sets and the starting data point of each position information group comprises the following specific steps:
selecting the position information set with the highest preference degree as the optimal position information set; recording all the position information groups that form the optimal position information set as compressed data groups; and taking the starting data point of each compressed data group as the compression starting point of the target subregion to which that compressed data group belongs.
Preferably, obtaining the compression result of the vector layout file according to the compression starting point of each target subregion comprises the following specific steps:
recording the position of the compression starting point of each target subregion; performing Huffman coding on the position information data in the optimal position information set by constructing a coding tree, and thereby compressing the position information data from the compression starting point of each target subregion to the other target data points, to obtain the compression result of the vector layout file.
An embodiment of the invention provides a system for rapidly converting a scanned file into a vector layout file, the system comprising a data acquisition module, a data dividing module, a data analysis module, a data selection module and a data compression module, wherein:
the data acquisition module is used for acquiring a two-dimensional matrix of the vector layout file;
the data dividing module is used for acquiring target subregions according to the two-dimensional matrix of the vector layout file, and acquiring position information data from any target data point to another target data point in each target subregion;
the data analysis module is used for acquiring all position information groups in each target subregion and the starting data point of each position information group, acquiring position information sets according to all the position information groups in each target subregion, and calculating the preference degree of each position information set;
the data selection module is used for acquiring a compression starting point of each target subregion according to the preference degrees of the position information sets and the starting data point of each position information group;
and the data compression module is used for obtaining the compression result of the vector layout file according to the compression starting point of each target subregion.
The technical scheme of the invention has the following beneficial effects: a vector layout file contains a large number of angles and distances, so it occupies a large amount of space when stored and is transmitted inefficiently. The invention provides a method for rapidly converting a scanned file into a vector layout file which, by compressing the vector layout file, reduces the storage space it requires and improves the efficiency of transmitting it.
Drawings
In order to illustrate the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention, and a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of the steps of a method for rapidly converting a scanned file into a vector layout file according to the present invention;
FIG. 2 is a block diagram of a system for rapidly converting a scanned file into a vector layout file according to the present invention.
Detailed Description
In order to further describe the technical means adopted by the invention to achieve its intended purpose and their effects, the specific implementation, structure, features and effects of the method and system for rapidly converting a scanned file into a vector layout file according to the invention are described in detail below with reference to the drawings and preferred embodiments. In the following description, different references to "one embodiment" or "another embodiment" do not necessarily refer to the same embodiment. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The method and the system for rapidly converting a scanned file into a vector layout file are described in detail below with reference to the accompanying drawings.
Referring to FIG. 1, which shows a flow chart of a method for rapidly converting a scanned file into a vector layout file according to an embodiment of the invention, the method comprises the following steps:
Step S001: acquiring a two-dimensional matrix of the vector layout file and target data points.
It should be noted that when a paper document is converted into an electronic document, noise and distortion often appear in the electronic document when it is scaled, resulting in poor quality. The scanned file is therefore usually converted into a vector layout file to improve the quality of the electronic document.
It should be further noted that a vector layout file contains a large number of angles and distances, so it occupies a large amount of storage space when stored and is transmitted inefficiently. This embodiment therefore provides a method for rapidly converting a scanned file into a vector layout file which, by compressing the vector layout file, reduces the storage space it requires and improves the efficiency of transmitting it. The first step is to obtain a two-dimensional matrix of the vector layout file.
Specifically, a paper file is scanned with a file scanner to obtain a scanned file matrix, and the text in the scanned file is located by optical character recognition. The data value of each data point at a text position in the scanned file matrix is set to 1, and the data point is recorded as a target data point; the data value of each data point not at a text position is set to 0, and the data point is recorded as a blank data point. The two-dimensional matrix of the vector layout file is thereby obtained. Since optical character recognition is well-known prior art, it is not described further in this embodiment.
At this point, the two-dimensional matrix of the vector layout file has been obtained.
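Step S001 can be illustrated with a minimal sketch. It assumes that the optical character recognition stage has already returned text positions as rectangular boxes `(top, left, bottom, right)` with exclusive lower-right bounds; this box format and the function names `build_layout_matrix` and `target_points` are hypothetical and are not specified by the embodiment.

```python
def build_layout_matrix(height, width, text_boxes):
    """Return a height x width matrix: 1 marks a target data point, 0 a blank one.

    text_boxes is an assumed OCR output format: (top, left, bottom, right),
    with bottom and right exclusive.
    """
    matrix = [[0] * width for _ in range(height)]
    for top, left, bottom, right in text_boxes:
        # Clip each box to the matrix so out-of-range boxes cannot raise.
        for r in range(max(top, 0), min(bottom, height)):
            for c in range(max(left, 0), min(right, width)):
                matrix[r][c] = 1
    return matrix


def target_points(matrix):
    """List the (row, column) coordinates of all target data points."""
    return [(r, c) for r, row in enumerate(matrix)
            for c, value in enumerate(row) if value == 1]


demo = build_layout_matrix(4, 6, [(0, 1, 2, 3), (3, 4, 4, 6)])
points = target_points(demo)
```

Here `demo` contains two text areas, and `points` lists the six target data points that later steps operate on.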
Step S002: acquiring target subregions according to the two-dimensional matrix of the vector layout file; acquiring position information data from any target data point to another target data point in each target subregion.
It should be noted that this embodiment, as a method for rapidly converting a scanned file into a vector layout file, aims to reduce the storage space required by the vector layout file and to improve the efficiency of transmitting it by compressing the vector layout file. The matrix of the vector layout file contains only blank data points and target data points, and the number of blank data points is far greater than the number of target data points, so the vector layout file can be compressed by compressing only the target data points in the matrix. The target subregions formed by target data points in the two-dimensional matrix of the vector layout file are therefore acquired first.
Specifically, in the matrix of the vector layout file, if two target data points are adjacent, they are assigned to the same target subregion; if they are not adjacent, they are not directly assigned to the same target subregion. A plurality of target subregions is thereby obtained.
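The grouping of adjacent target data points can be sketched as connected-component labelling with a breadth-first search. The embodiment does not state whether "adjacent" means 4- or 8-neighbourhood, so the sketch assumes 8-neighbourhood by default and exposes the choice as a parameter; the function name `target_subregions` is hypothetical.

```python
from collections import deque


def target_subregions(matrix, eight_connected=True):
    """Group target data points (value 1) into subregions of adjacent points."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if eight_connected:
        # Assumption: diagonal neighbours also count as adjacent.
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    seen = set()
    regions = []
    for r in range(rows):
        for c in range(cols):
            if matrix[r][c] != 1 or (r, c) in seen:
                continue
            # Breadth-first search collects every point reachable from (r, c).
            queue = deque([(r, c)])
            seen.add((r, c))
            region = []
            while queue:
                cr, cc = queue.popleft()
                region.append((cr, cc))
                for dr, dc in steps:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and matrix[nr][nc] == 1 and (nr, nc) not in seen):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            regions.append(sorted(region))
    return regions


demo = [
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
regions = target_subregions(demo)
```

With 8-neighbourhood the two diagonal points form one subregion; with `eight_connected=False` they would be split into two.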
It should be further noted that the vector data in the two-dimensional matrix of the vector layout file are the direction angles and distances between each data point in the vector layout file and the other data points. This embodiment therefore needs to obtain the direction angle and the distance between each target data point and the other target data points in every target subregion of the two-dimensional matrix of the vector layout file.
Specifically, the horizontal rightward direction is taken as the reference direction; the included angle between the reference direction and the ray from any target data point to another target data point in each target subregion is calculated by the following formula:
[formula image not reproduced in this text]
In the formula, $\theta^{i}_{a,b}$ denotes the direction angle between the reference direction and the ray from the $a$-th target data point to the $b$-th target data point in the $i$-th target subregion; $x^{i}_{a}$ denotes the column number, in the matrix of the vector layout file, of the $a$-th target data point in the $i$-th target subregion; $x^{i}_{b}$ denotes the column number of the $b$-th target data point in the $i$-th target subregion; $y^{i}_{a}$ denotes the row number of the $a$-th target data point in the $i$-th target subregion; $y^{i}_{b}$ denotes the row number of the $b$-th target data point in the $i$-th target subregion; $\arctan(\cdot)$ denotes the arctangent function; $\operatorname{sgn}(\cdot)$ denotes the sign function.
The Euclidean distance from any target data point to another target data point in each target subregion is acquired, giving the distance between the two points; since the Pythagorean theorem is well known, it is not described further in this embodiment.
The direction angle and the distance from any target data point to another target data point in the target subregion are thereby obtained, and together they are recorded as the position information data from the one target data point to the other.
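As an illustration only, the position information data of one pair of points can be computed as below. The embodiment's own angle formula (built from an arctangent and a sign function) is not available in this text, so the sketch substitutes the standard two-argument arctangent `math.atan2`. It further assumes that angles are measured counter-clockwise from the horizontal-right direction, that the row axis points downward as in an image matrix, and that angles are expressed in degrees in the range [0, 360); the function name `position_info` is hypothetical.

```python
import math


def position_info(p, q):
    """Return (direction angle in degrees, Euclidean distance) from p to q.

    Points are (row, column) pairs. Rows grow downward, so the row
    difference is negated to make "up" a positive angle (an assumption).
    """
    d_row = q[0] - p[0]
    d_col = q[1] - p[1]
    angle = math.degrees(math.atan2(-d_row, d_col)) % 360.0
    # Rounding makes identical geometric offsets compare equal later on;
    # the second modulo maps a rounded 360.0 back to 0.0.
    angle = round(angle, 6) % 360.0
    distance = round(math.hypot(d_row, d_col), 6)
    return (angle, distance)


right = position_info((0, 0), (0, 3))
down = position_info((0, 0), (1, 0))
diagonal = position_info((3, 0), (0, 4))
```

Rounding to six decimals is a design choice of this sketch: position information data are later compared for exact equality, which is fragile with unrounded floating-point values.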
Step S003: acquiring all position information groups in each target subregion and the starting data point of each position information group according to the position information data from any target data point to another target data point in the target subregion; acquiring position information sets according to all the position information groups in each target subregion; calculating the preference degree of each position information set.
It should be noted that this embodiment aims to reduce the storage space required by the vector layout file and to improve the efficiency of transmitting it by compressing the vector layout file. In each target subregion obtained in step S002, the position information data from any target data point to another target data point are repeated to some extent, and the greater the repetition, the better the compression effect. The preference degree of different position information data can therefore be obtained from their repeatability, so as to improve the compression effect of the vector layout file.
Specifically, for the $i$-th target subregion, the position information data from the first target data point to all other target data points are acquired, grouped into one group, and recorded as the first position information group in the $i$-th target subregion; the first target data point is recorded as the starting data point of the first position information group in the $i$-th target subregion;
the position information data from the second target data point to all other target data points are acquired, grouped into one group, and recorded as the second position information group in the $i$-th target subregion; the second target data point is recorded as the starting data point of the second position information group in the $i$-th target subregion;
and so on, until the position information data from the last target data point to all other target data points are acquired, grouped into one group, and recorded as the last position information group in the $i$-th target subregion; the last target data point is recorded as the starting data point of the last position information group in the $i$-th target subregion.
In the same way, all the position information groups in each target subregion are acquired, together with the starting data point of each position information group.
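The construction of position information groups can be sketched as follows: every target data point of a subregion serves once as a starting data point, and its group holds the position information data to all other points of that subregion. The helper `position_info` repeats the assumptions made for the angle convention (two-argument arctangent, degrees, counter-clockwise from horizontal-right), and both function names are hypothetical.

```python
import math


def position_info(p, q):
    """(direction angle in degrees, distance) from p to q; assumed convention."""
    d_row = q[0] - p[0]
    d_col = q[1] - p[1]
    angle = round(math.degrees(math.atan2(-d_row, d_col)) % 360.0, 6) % 360.0
    return (angle, round(math.hypot(d_row, d_col), 6))


def position_groups(region):
    """Map each starting data point to its position information group."""
    return {
        start: [position_info(start, other) for other in region if other != start]
        for start in region
    }


region = [(0, 0), (0, 1), (0, 2)]
groups = position_groups(region)
```

For a subregion with n target data points this yields n groups of n - 1 entries each, which matches the statement in the text that all groups of a subregion contain the same number of position information data.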
Then one position information group is randomly selected from each target subregion, and the position information data of the selected groups are placed into one position information set, where position information data that are exactly identical are treated as one kind of position information data. For each position information set, the frequency of occurrence of each kind of position information data in the set and the number of position information data in the set are counted, and the preference degree of the set is obtained from the number of kinds of position information data in the set, the frequency of occurrence of each kind, and the number of position information data in the set. The specific calculation formula is as follows:
[formula image not reproduced in this text]
In the formula, $Y_j$ denotes the preference degree of the $j$-th position information set; $N_j$ denotes the number of position information data in the $j$-th position information set; $p_{j,k}$ denotes the frequency of occurrence of the $k$-th kind of position information data in the $j$-th position information set; $K_j$ denotes the number of kinds of position information data in the $j$-th position information set; $\exp(\cdot)$ denotes the exponential function with the natural constant as its base; $\log_2(\cdot)$ denotes the logarithmic function with base 2.
Since all the position information groups in a given target subregion contain the same number of position information data, every position information set contains the same number of position information data. Therefore, as the number of kinds of position information data in a position information set increases, the repeatability of the set decreases, that is, its preference degree decreases. $-\sum_{k=1}^{K_j} p_{j,k}\log_2 p_{j,k}$ denotes the information entropy of the $j$-th position information set; the smaller this value, the lower the information entropy of the $j$-th position information set and the better the compression achieved by entropy coding, that is, the higher the preference degree. Thus the greater the value of $Y_j$, the more preferred the $j$-th position information set.
At this point, the preference degrees of all position information sets have been obtained.
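Because the exact preference formula is not available in this text, the following sketch computes only the quantities it is described in terms of: the number of position information data, the number of kinds, the relative frequency of each kind, and the Shannon entropy in bits. Treating "frequency of occurrence" as a relative frequency is an assumption, and the function name `set_statistics` is hypothetical; how the embodiment combines these values into the preference degree is not reproduced here.

```python
import math
from collections import Counter


def set_statistics(position_set):
    """Return (count, kinds, relative frequencies, entropy in bits)."""
    total = len(position_set)
    if total == 0:
        return 0, 0, {}, 0.0
    counts = Counter(position_set)
    freqs = {item: n / total for item, n in counts.items()}
    # Shannon entropy: lower values mean more repetition in the set.
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    return total, len(counts), freqs, entropy


sample = [(0.0, 1.0), (0.0, 1.0), (90.0, 1.0), (0.0, 2.0)]
total, kinds, freqs, entropy = set_statistics(sample)
```

According to the text, a set with fewer kinds and lower entropy should receive a higher preference degree.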
Step S004: acquiring a compression starting point of each target subregion according to the preference degrees of the position information sets and the starting data point of each position information group.
It should be noted that the preference degrees of all position information sets were obtained in step S003; the optimal position information set can therefore be obtained according to these preference degrees, and the compression starting point of each target subregion can then be obtained according to the optimal position information set.
Specifically, the position information set with the highest preference degree is selected as the optimal position information set; all the position information groups that form the optimal position information set are recorded as compressed data groups; and the starting data point of each compressed data group is taken as the compression starting point of the target subregion to which that compressed data group belongs.
At this point, the compression starting point of each target subregion has been obtained.
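The selection of compression starting points can be sketched under two stated assumptions. First, the sketch scores a candidate set by its Shannon entropy alone (lower is better) as a stand-in for the preference degree, whose exact formula is not available in this text. Second, it enumerates every combination of one group per subregion, which is only feasible for small inputs; the embodiment itself forms sets by random selection. The names `entropy` and `choose_start_points` are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def entropy(items):
    """Shannon entropy in bits of a list of hashable items."""
    total = len(items)
    counts = Counter(items)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def choose_start_points(groups_per_region):
    """Pick one starting data point per subregion.

    groups_per_region: list of dicts {start_point: [position info, ...]}.
    Returns the starting points whose merged set has the lowest entropy.
    """
    best_starts, best_score = None, math.inf
    for starts in product(*(g.keys() for g in groups_per_region)):
        merged = [info for g, s in zip(groups_per_region, starts) for info in g[s]]
        score = entropy(merged)
        if score < best_score:  # ties keep the first combination found
            best_starts, best_score = starts, score
    return list(best_starts), best_score


region_a = {
    (0, 0): [(0.0, 1.0), (0.0, 2.0)],
    (0, 1): [(180.0, 1.0), (0.0, 1.0)],
    (0, 2): [(180.0, 2.0), (180.0, 1.0)],
}
region_b = {
    (5, 5): [(0.0, 1.0)],
    (5, 6): [(180.0, 1.0)],
}
starts, score = choose_start_points([region_a, region_b])
```

In this toy input the combination `(0, 0)` and `(5, 5)` is chosen because its merged set repeats `(0.0, 1.0)`, which lowers the entropy relative to combinations with three distinct values.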
Step S005: obtaining a compression result of the vector layout file according to the compression starting point of each target subregion.
It should be noted that the compression starting point of each target subregion was obtained in step S004; each target subregion can then be compressed from its compression starting point to obtain the compression result of the vector layout file.
Specifically, the position of the compression starting point of each target subregion is recorded; Huffman coding is performed on the position information data in the optimal position information set by constructing a coding tree, and the position information data from the compression starting point of each target subregion to the other target data points are thereby compressed, giving the compression result of the vector layout file. Since Huffman coding is well-known prior art, it is not described further in this embodiment.
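The Huffman coding step can be illustrated with a textbook construction based on a priority queue. This is a generic sketch of Huffman coding applied to position information data, not the embodiment's own encoder; the function name `huffman_codes` and the bit-string output format are hypothetical.

```python
import heapq
from collections import Counter


def huffman_codes(symbols):
    """Build a prefix-free code table {symbol: bit string} from symbol counts."""
    counts = Counter(symbols)
    if not counts:
        return {}
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, tie-breaker, partial code table for that subtree).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        # Merging two subtrees prefixes their codes with 0 and 1 respectively.
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


data = [(0.0, 1.0)] * 4 + [(90.0, 1.0)] * 2 + [(0.0, 2.0)]
codes = huffman_codes(data)
encoded = "".join(codes[item] for item in data)
```

The most frequent position information data receive the shortest code, which is why a highly repetitive optimal set compresses well.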
This completes the description of this embodiment.
Referring to fig. 2, a block diagram of a system for fast converting a vector layout file of a scanned file according to an embodiment of the present invention is shown, where the system includes the following modules:
the data acquisition module is used for acquiring a two-dimensional matrix of the vector layout file;
the data dividing module is used for acquiring a target subarea according to the two-dimensional matrix of the vector layout file; acquiring position information data from any target data point to another target data point in the target subregion;
the data analysis module is used for acquiring all the position information groups in each target subarea and initial data points of each position information group; acquiring a position information set according to all the position information groups in each target subarea; calculating the preference degree of the position information set;
the data selection module is used for acquiring a compression starting point of each target subarea according to the preference degree of the position information set and the starting data point of each position information group;
the data compression module is configured to obtain a compression result of the vector layout file according to the compression start point of each target sub-region.
The beneficial effects of the technical scheme of the invention are as follows: a vector layout file contains a large number of angles and distances, so it occupies a large amount of space when stored and is transmitted inefficiently. The invention provides a method for rapidly converting a scanned file into a vector layout file which, by compressing the vector layout file, reduces the storage space needed to store it and improves the efficiency of transmitting it.
The above describes only preferred embodiments of the present invention and is not intended to limit the invention; any modifications, equivalent substitutions, improvements, etc. made within the principles of the present invention shall be included in its scope of protection.

Claims (10)

1. A method for rapidly converting a scanned file into a vector layout file, the method comprising the steps of:
acquiring a two-dimensional matrix of a vector layout file and target data points;
acquiring a target subarea according to the two-dimensional matrix of the vector layout file; acquiring position information data from any target data point to another target data point in the target subregion;
acquiring all the position information groups in each target subarea and the initial data point of each position information group according to the position information data from any target data point to another target data point in the target subarea; acquiring a position information set according to all the position information groups in each target subarea; calculating the preference degree of the position information set;
acquiring a compression starting point of each target subregion according to the preference degree of the position information set and the starting data point of each position information group;
and obtaining a compression result of the vector layout file according to the compression starting point of each target subarea.
2. The method for rapidly converting a scanned file into a vector layout file according to claim 1, wherein the acquiring a two-dimensional matrix of the vector layout file and target data points comprises the following specific steps:
scanning a paper file through a file scanner to obtain a scanned file matrix, and acquiring a specific position of a text in the scanned file by utilizing an optical character recognition technology; setting the data value of the data point at the text position in the scanning file matrix to be 1, and marking the data point as a target data point; the data value of the data point which is not at the text position in the scanning file matrix is set to 0 and is recorded as a blank data point; and obtaining a two-dimensional matrix of the vector layout file.
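A minimal Python sketch of the matrix construction is given below for illustration. It assumes the optical character recognition step has already produced text positions as half-open rectangles (top, left, bottom, right) in pixel coordinates; that box format and the name `build_target_matrix` are assumptions, not part of the claim.

```python
def build_target_matrix(height, width, text_boxes):
    """Return a height x width matrix with 1 at text positions and 0 elsewhere.

    text_boxes is an iterable of (top, left, bottom, right) rectangles,
    half-open on the bottom and right edges (an assumed OCR output format).
    """
    matrix = [[0] * width for _ in range(height)]
    for top, left, bottom, right in text_boxes:
        # Clip each box to the matrix so out-of-range boxes are harmless.
        for r in range(max(top, 0), min(bottom, height)):
            for c in range(max(left, 0), min(right, width)):
                matrix[r][c] = 1  # target data point
    return matrix
```

Cells left at 0 correspond to the blank data points of the claim.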
3. The method for rapidly converting a scanned file into a vector layout file according to claim 2, wherein the acquiring a target subregion according to the two-dimensional matrix of the vector layout file comprises the following specific steps:
in the two-dimensional matrix of the vector layout file, if two target data points are adjacent, the two target data points are classified into the same target subarea, and a plurality of target subareas are obtained.
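The grouping of adjacent target data points can be sketched as a connected-component search. The claim does not say whether diagonal neighbours count as adjacent; the sketch below assumes 8-connectivity, and `find_target_subregions` is a hypothetical name.

```python
from collections import deque


def find_target_subregions(matrix):
    """Group adjacent 1-valued cells into subregions (8-connectivity assumed).

    Returns a list of subregions, each a list of (row, col) target data points.
    """
    height = len(matrix)
    width = len(matrix[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    regions = []
    for r in range(height):
        for c in range(width):
            if matrix[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search from an unvisited target data point.
            seen[r][c] = True
            queue = deque([(r, c)])
            region = []
            while queue:
                cr, cc = queue.popleft()
                region.append((cr, cc))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < height and 0 <= nc < width
                                and matrix[nr][nc] == 1 and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            regions.append(region)
    return regions
```

Switching to 4-connectivity would only require skipping the offsets where both `dr` and `dc` are non-zero.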
4. A method for quickly converting a scanned file into a vector layout file according to claim 3, wherein the method for obtaining the position information data from any target data point to another target data point in the target subregion comprises the following specific steps:
the horizontal rightward direction is recorded as a reference direction; acquiring an included angle between a ray from any target data point in each target subarea to another target data point and a reference direction, and taking the included angle as a direction angle from any target data point in each target subarea to another target data point;
acquiring Euclidean distance from any target data point to another target data point in each target subregion, and obtaining the distance from any target data point to another target data point in the target subregion;
the direction angle and distance of any target data point to another target data point are recorded as the position information data of any target data point to another target data point.
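For illustration, the direction angle and Euclidean distance between two target data points can be computed as below. The sketch assumes points are (row, column) matrix indices, that the angle is measured counterclockwise from the horizontal rightward direction in the range [0, 360), and that values are rounded so identical pairs compare equal; none of these conventions is fixed by the claim, and `position_info` is a hypothetical name.

```python
import math


def position_info(p, q, ndigits=6):
    """Return (direction angle in degrees, Euclidean distance) from p to q.

    p and q are (row, col) indices. Rows grow downward in a matrix, so the
    vertical component is flipped to make counterclockwise angles positive.
    """
    (r1, c1), (r2, c2) = p, q
    dx = c2 - c1
    dy = r1 - r2
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    # Round, then wrap again in case rounding produced exactly 360.0.
    angle = round(angle, ndigits) % 360.0
    distance = round(math.hypot(dx, dy), ndigits)
    return (angle, distance)
```

For example, a point three columns to the right of the start has angle 0 and distance 3, and a point two rows above it has angle 90 and distance 2.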
5. The method for rapidly converting a scanned file into a vector layout file according to claim 1, wherein the acquiring all the position information groups in each target subregion and the starting data point of each position information group comprises the following specific steps:
for the i-th target subregion, acquiring the position information data from the first target data point to all other target data points, grouping these position information data together, and recording the group as the first position information group in the i-th target subregion; and recording the first target data point as the starting data point of the first position information group in the i-th target subregion;
acquiring the position information data from the second target data point to all other target data points, grouping these position information data together, and recording the group as the second position information group in the i-th target subregion; and recording the second target data point as the starting data point of the second position information group in the i-th target subregion;
and so on, until the position information data from the last target data point to all other target data points are acquired, grouped together, and recorded as the last position information group in the i-th target subregion; and recording the last target data point as the starting data point of the last position information group in the i-th target subregion;
all the sets of location information in each target sub-area are obtained, along with the starting data point for each set of location information.
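The grouping described in this claim can be sketched as follows: every target data point in a subregion serves once as a starting data point, and its group holds the position information data to every other point. The helper `_position_info` repeats the assumed angle and distance conventions (counterclockwise degrees from the rightward direction, rounded values); both function names are hypothetical.

```python
import math


def _position_info(p, q, ndigits=6):
    """(direction angle in degrees, Euclidean distance) from p to q; assumed conventions."""
    (r1, c1), (r2, c2) = p, q
    dx, dy = c2 - c1, r1 - r2
    angle = round(math.degrees(math.atan2(dy, dx)) % 360.0, ndigits) % 360.0
    return (angle, round(math.hypot(dx, dy), ndigits))


def build_position_groups(region):
    """Return one (starting data point, position information group) pair per point.

    region is a list of (row, col) target data points of one target subregion.
    """
    groups = []
    for i, start in enumerate(region):
        data = [_position_info(start, other)
                for j, other in enumerate(region) if j != i]
        groups.append((start, data))
    return groups
```

A subregion with n target data points therefore yields n groups of n - 1 entries each, so the cost grows quadratically with the size of the subregion.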
6. The method for rapidly converting a scanned file into a vector layout file according to claim 1, wherein the acquiring a position information set according to all the position information groups in each target subregion and calculating the preference degree of the position information set comprise the following specific steps:
randomly selecting one position information group from each target subregion, placing the position information data of the selected position information groups into the same position information set, and treating position information data in the set that are exactly identical as the same kind of position information data; for each position information set, counting the occurrence frequency of each kind of position information data in the set and the number of position information data in the set, and acquiring the preference degree of the position information set according to the number of kinds of position information data in the set, the occurrence frequency of each kind of position information data in the set, and the number of position information data in the set.
7. The method for rapidly converting a scanned file into a vector layout file according to claim 6, wherein the preference degree of the position information set is obtained by the following calculation formula:
in the formula, the terms denote, respectively: the preference degree of the j-th position information set; the number of position information data in the j-th position information set; the occurrence frequency of the k-th kind of position information data in the j-th position information set; the number of kinds of position information data in the j-th position information set; exp( ) denotes the exponential function with the natural constant as its base; and log2( ) denotes the logarithmic function with base 2.
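The quantities named in this claim (the number of position information data, the number of kinds, and the occurrence frequency of each kind) can be computed as sketched below. The formula that combines them is not reproduced in this text, so the score used here, exp(-H) with H the base-2 Shannon entropy of the frequencies, is purely an illustrative assumption and not the claimed formula; all function names are hypothetical.

```python
import math
from collections import Counter


def set_statistics(position_data):
    """Return (number of data, number of kinds, {kind: relative frequency})."""
    counts = Counter(position_data)
    total = len(position_data)
    kinds = len(counts)
    freqs = {kind: n / total for kind, n in counts.items()}
    return total, kinds, freqs


def entropy_bits(freqs):
    """Base-2 Shannon entropy of a relative-frequency table."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


def illustrative_preference(position_data):
    """ASSUMED stand-in score: exp(-entropy). Higher means more repetition."""
    _, _, freqs = set_statistics(position_data)
    return math.exp(-entropy_bits(freqs))
```

Under this stand-in, a set in which every datum is identical scores 1.0, and the score falls as the data become more varied, which matches the intuition that repetitive sets compress better under Huffman coding.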
8. The method for rapidly converting a scanned file into a vector layout file according to claim 1, wherein the acquiring a compression starting point of each target subregion according to the preference degree of the position information set and the starting data point of each position information group comprises the following specific steps:
selecting the position information set with the highest preference degree as the optimal position information set; recording all the position information groups forming the optimal position information set as compressed data groups; and taking the initial data point of each compressed data set as a compression starting point of the target subarea corresponding to each compressed data set.
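The selection can be sketched as a search over candidate sets, each formed by taking one position information group per target subregion. The sketch below enumerates every combination, which is only practical for small inputs; the claims describe random selection of groups, so sampling would be the closer analogue. The preference degree is passed in as a `score` callable because its formula is not reproduced here; `select_compression_start_points` is a hypothetical name.

```python
from itertools import product


def select_compression_start_points(groups_per_region, score):
    """Pick one group per subregion so the merged set has the highest score.

    groups_per_region: one list per target subregion, each holding
        (starting data point, position information group) pairs.
    score: callable mapping the merged list of position information data
        to a number (a stand-in for the preference degree).
    Returns (compression starting points, best score).
    """
    if not groups_per_region:
        return [], None
    best_score, best_choice = None, None
    for choice in product(*groups_per_region):
        merged = [datum for _, data in choice for datum in data]
        s = score(merged)
        # Strict comparison keeps the first candidate on ties.
        if best_score is None or s > best_score:
            best_score, best_choice = s, choice
    if best_choice is None:
        return [], None
    return [start for start, _ in best_choice], best_score
```

With a score that rewards repetition, the starting points whose groups share the most identical (direction angle, distance) pairs across subregions are chosen.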
9. The method for rapidly converting a scanned file into a vector layout file according to claim 8, wherein the obtaining the compression result of the vector layout file according to the compression starting point of each target subregion comprises the following specific steps:
recording the compression starting point position of each target subarea; and carrying out Huffman coding operation on the position information data in the optimal position information set, constructing a coding tree, and compressing the position information data from the compression starting point of each target sub-region to other target data points to obtain the compression result of the vector layout file.
10. A system for rapidly converting a scanned file into a vector layout file, the system comprising:
the data acquisition module is used for acquiring a two-dimensional matrix of the vector layout file and target data points;
the data dividing module is used for acquiring a target subarea according to the two-dimensional matrix of the vector layout file; acquiring position information data from any target data point to another target data point in the target subregion;
the data analysis module is used for acquiring all the position information groups in each target subarea and initial data points of each position information group; acquiring a position information set according to all the position information groups in each target subarea; calculating the preference degree of the position information set;
the data selection module is used for acquiring a compression starting point of each target subarea according to the preference degree of the position information set and the starting data point of each position information group;
and the data compression module is used for acquiring the compression result of the vector layout file according to the compression starting point of each target subarea.
CN202311523381.7A 2023-11-16 2023-11-16 Method and system for rapidly converting scanned file into vector layout file Active CN117236291B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311523381.7A CN117236291B (en) 2023-11-16 2023-11-16 Method and system for rapidly converting scanned file into vector layout file

Publications (2)

Publication Number Publication Date
CN117236291A true CN117236291A (en) 2023-12-15
CN117236291B CN117236291B (en) 2024-01-12

Family

ID=89097057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311523381.7A Active CN117236291B (en) 2023-11-16 2023-11-16 Method and system for rapidly converting scanned file into vector layout file

Country Status (1)

Country Link
CN (1) CN117236291B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105701073A (en) * 2015-12-31 2016-06-22 北京中科江南信息技术股份有限公司 Layout file generation method and device
CN109829139A (en) * 2019-01-30 2019-05-31 中国软件与技术服务股份有限公司 The method and apparatus that a kind of stream-oriented file of DOC/DOCX format is converted into the layout files of OFD format
CN111753500A (en) * 2020-07-07 2020-10-09 江苏中威科技软件系统有限公司 Method for merging and displaying formatted electronic form and OFD (office file format) and generating catalog
CN115346227A (en) * 2022-10-17 2022-11-15 景臣科技(南通)有限公司 Method for vectorizing electronic file based on layout file
WO2023098447A1 (en) * 2021-12-02 2023-06-08 江苏中威科技软件系统有限公司 Method for converting layout data stream file into ofd file
CN116932492A (en) * 2023-09-15 2023-10-24 北京点聚信息技术有限公司 Storage optimization method for layout file identification data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
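Feng Hui; Wang Hanbing; Han Yufei: "Discussion on the Necessity of Layout Documents in Electronic Signature Applications" (版式文档在电子签章应用中的必要性探讨), Information Technology and Standardization (信息技术与标准化), no. 08 *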

Also Published As

Publication number Publication date
CN117236291B (en) 2024-01-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant