CN110728111B - Document content messy code repairing method and device, terminal equipment and server - Google Patents

Document content messy code repairing method and device, terminal equipment and server

Info

Publication number
CN110728111B
Authority
CN
China
Prior art keywords
text
document
messy code
font
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810782438.8A
Other languages
Chinese (zh)
Other versions
CN110728111A (en)
Inventor
冷志峰
张作兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Office Software Inc
Zhuhai Kingsoft Office Software Co Ltd
Guangzhou Kingsoft Mobile Technology Co Ltd
Original Assignee
Beijing Kingsoft Office Software Inc
Zhuhai Kingsoft Office Software Co Ltd
Guangzhou Kingsoft Mobile Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Office Software Inc, Zhuhai Kingsoft Office Software Co Ltd, Guangzhou Kingsoft Mobile Technology Co Ltd
Priority to CN201810782438.8A
Publication of CN110728111A
Application granted
Publication of CN110728111B

Landscapes

  • Document Processing Apparatus (AREA)

Abstract

The present invention relates to the field of word processing technologies, and in particular to a method, an apparatus, a terminal device, and a server for repairing messy codes in document content. The method is applied to the terminal device and comprises the following steps: acquiring a document to be processed, extracting font information, loading a corresponding font library for the text in the document according to the font information, and encoding to obtain text data; identifying the text data with a preset messy code identification algorithm and, if the text data contains messy codes, acquiring the font information corresponding to the messy code text from the document and uploading the font information and the document to a server, so that the server determines the correct font library corresponding to the messy code text and obtains and stores its download path; and obtaining the download path from the server, downloading the correct font library, and repairing the messy codes of the document to be processed with the correct font library. The invention can automatically repair document messy codes caused by a mismatch between the font library and the content of the document.

Description

Document content messy code repairing method and device, terminal equipment and server
Technical Field
The present invention relates to the field of word processing technologies, and in particular, to a method, an apparatus, a terminal device, and a server for repairing a messy code of document content.
Background
Messy code (garbled text) refers to characters that a terminal device cannot display correctly and instead displays as other, meaningless characters. The text content in a document has corresponding font information, and the text can be displayed correctly only if the terminal device finds the correct font library locally and uses it to parse the text content. If the terminal device lacks the font library version corresponding to the text content, the text content is parsed with the wrong font library, which produces messy codes. In word processing on terminal devices, document messy codes caused by font mismatches occur frequently and degrade the user's reading experience.
The current approach to document messy codes caused by mismatched font libraries is to import as many font libraries as possible into the terminal device and then have the user check by eye whether the messy codes have disappeared.
However, this repair method has the following problems: 1) the user must repair the messy codes manually and judge by eye whether they have disappeared, which is inefficient when many documents are processed; 2) because the method cannot determine which fonts the terminal device lacks, messy codes can only be repaired by importing a large number of font libraries, and if none of the imported font libraries matches the text content, the document messy codes still cannot be repaired.
Disclosure of Invention
The embodiments of the invention aim to provide a method, a device, terminal equipment and a server for repairing messy codes of document content, so that messy codes caused by a mismatch between the font library and the content of a document are repaired automatically. The specific technical scheme is as follows:
In a first aspect, an embodiment of the present invention provides a method for repairing a messy code of document content, which is applied to a terminal device, and the method includes:
Acquiring a document to be processed;
Analyzing the document to be processed;
extracting each font information from the document to be processed;
loading a corresponding font library for the text in the document to be processed according to the font information and encoding to obtain encoded text data;
Identifying the coded text data by adopting a preset messy code identification algorithm;
If the coded text data are identified to contain messy code text data, font information corresponding to the messy code text is obtained from the document to be processed;
uploading the document to be processed and the font information corresponding to the messy code text to a server, so that the server determines a correct font library corresponding to the messy code text, and obtains and stores a downloading path of the correct font library;
Obtaining a download path of the correct font library from the server;
Downloading the correct font library from a network according to the downloading path;
and carrying out document messy code restoration on the document to be processed by using the correct font library.
Optionally, before the step of loading the corresponding font library for the text in the document to be processed according to the font information and encoding to obtain encoded text data, the method further includes:
Classifying all texts in the document to be processed according to the font information, and taking the text corresponding to each piece of font information as a target text;
determining the number of words in each target text;
The step of loading the corresponding font library for the text in the document to be processed according to the font information and encoding to obtain encoded text data comprises the following steps:
loading a corresponding font library for each target text according to each font information, and coding to obtain coded text data corresponding to the target text;
the step of identifying the coded text data by adopting a preset messy code identification algorithm comprises the following steps:
Judging whether the coded text data contains uncommon words;
If uncommon (rarely used) words exist, calculating the occupancy rate of the uncommon words in each target text from the number of uncommon words in that target text and the number of words in that target text;
Judging whether the occupancy rate of the uncommon words in the target text is greater than a first preset threshold value;
and if the occupancy rate of the uncommon words in the target text is greater than the first preset threshold, recognizing that the encoded text data contains messy code text data.
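The classification step above (one target text per font, plus its word count) can be illustrated with a minimal sketch. The function name `group_text_by_font`, the `(font_name, text)` run format, and the choice to count each character as one word are assumptions made for illustration; the patent does not prescribe a data structure.

```python
from collections import defaultdict


def group_text_by_font(runs):
    """Group (font_name, text) runs into one target text per font.

    Returns {font_name: (target_text, word_count)}. Each character is
    counted as one word, which is an assumption suited to CJK text.
    """
    grouped = defaultdict(list)
    for font_name, text in runs:
        grouped[font_name].append(text)

    result = {}
    for font_name, parts in grouped.items():
        target_text = "".join(parts)
        result[font_name] = (target_text, len(target_text))
    return result


# Hypothetical runs extracted from a parsed document.
runs = [("SongTi", "文档"), ("HeiTi", "标题"), ("SongTi", "内容")]
targets = group_text_by_font(runs)
```

Keeping the word count next to each target text means the later occupancy-rate calculation needs no second pass over the document.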
Optionally, the step of determining whether the encoded text data has the uncommon word includes:
obtaining the word frequency of each target word from a pre-stored word frequency table;
and if the word frequency of the target word is lower than the word frequency threshold of the preset rarely used word, determining that the target word is the rarely used word.
Optionally, after determining that the occupancy rate of the uncommon words in the target text is not greater than the first preset threshold, the method further includes:
Judging whether the occupancy rate of the uncommon words in the target text is smaller than a second preset threshold value; the second preset threshold value is smaller than the first preset threshold value;
if the occupancy rate of the uncommon words in the target text is smaller than the second preset threshold value, identifying that no messy code text data exists in the encoded text data;
if the occupancy rate is not smaller than the second preset threshold, outputting to the user a prompt asking whether the document needs to be repaired;
After receiving a document repairing instruction input by the user, determining that the coded text data contains messy code text data;
and executing the step of obtaining font information corresponding to the messy code text from the document to be processed.
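The identification steps above (word-frequency lookup, occupancy rate, and the two thresholds) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the frequency table, and all threshold values are hypothetical, and a character missing from the table is treated as having frequency 0.

```python
def rare_word_ratio(text, freq_table, rare_freq_threshold):
    """Occupancy rate: share of characters whose frequency is below the threshold."""
    if not text:
        return 0.0
    # Assumption: characters absent from the table have frequency 0.
    rare = sum(1 for ch in text if freq_table.get(ch, 0) < rare_freq_threshold)
    return rare / len(text)


def classify(text, freq_table, rare_freq_threshold, first_threshold, second_threshold):
    """Return 'messy', 'clean', or 'ask_user' following the two-threshold rule."""
    if second_threshold >= first_threshold:
        raise ValueError("second threshold must be smaller than the first")
    ratio = rare_word_ratio(text, freq_table, rare_freq_threshold)
    if ratio > first_threshold:
        return "messy"      # occupancy rate above the first threshold
    if ratio < second_threshold:
        return "clean"      # occupancy rate below the second threshold
    return "ask_user"       # in between: prompt the user to decide


# Hypothetical frequency table and thresholds.
freq_table = {"文": 900, "档": 800, "内": 950, "容": 850}
clean = classify("文档内容", freq_table, 10, 0.5, 0.1)
unsure = classify("文档龘靐", freq_table, 10, 0.5, 0.1)
messy = classify("龘靐齉容", freq_table, 10, 0.5, 0.1)
```

With these values the three texts have occupancy rates of 0, 0.5, and 0.75, so they fall into the "clean", "ask_user", and "messy" branches respectively.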
Optionally, the step of obtaining the download path of the correct font library from the server includes:
Sending a font inquiry request to the server; the font inquiry request includes: font information corresponding to the messy code text;
and receiving a downloading path of a correct font library corresponding to the font information corresponding to the messy code text returned by the server.
Optionally, the step of obtaining the download path of the correct font library from the server includes:
and receiving the download path of the correct font library returned by the server after obtaining the download path of the correct font library.
Optionally, the step of repairing the document to be processed by using the correct font library includes:
embedding the correct font library into the document to be processed, and repairing the document messy codes.
Optionally, the step of repairing the document to be processed by using the correct font library includes:
Installing the correct font library on the terminal equipment;
and encoding the document to be processed by using the correct font library, and repairing the document messy codes.
In a second aspect, an embodiment of the present invention provides a method for repairing a messy code of document content, which is applied to a server, and the method includes:
Receiving the document to be processed and the font information corresponding to the messy code text to be repaired, which are uploaded by the terminal equipment;
Determining a correct font library corresponding to the messy code text, and obtaining and storing a downloading path of the correct font library;
And providing a downloading path of the correct font library for the terminal equipment so that the terminal equipment downloads the correct font library according to the downloading path to repair the document.
Optionally, the step of determining the correct font library corresponding to the messy code text, obtaining and storing a download path of the correct font library includes:
According to the corresponding font information, loading different font libraries from a plurality of preset font libraries one by one to encode each messy code text to be repaired, and obtaining, for each font library, encoded text data of the messy code text to be repaired;
Identifying, with a preset messy code identification algorithm, each piece of encoded text data obtained with a different font library, until encoded text data without messy code is found;
And determining the font library loaded when the encoded text data has no messy code as the correct font library corresponding to the messy code text to be repaired, and obtaining and storing a downloading path of the correct font library.
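The server-side search described above can be sketched as a loop over candidate font libraries that stops at the first one whose output is no longer identified as messy code. The model below is an assumption made for illustration: each font library is reduced to a hypothetical glyph-index-to-character table, and `is_messy` is a simplified stand-in for the preset identification algorithm.

```python
def is_messy(text, common_chars, first_threshold=0.5):
    """Simplified stand-in: messy if too many characters are not common ones."""
    if not text:
        return False
    rare = sum(1 for ch in text if ch not in common_chars)
    return rare / len(text) > first_threshold


def find_correct_font(glyph_ids, candidate_fonts, common_chars):
    """Try each candidate font library in turn; return the first clean result."""
    for font_name, glyph_to_char in candidate_fonts.items():
        # Unmapped glyph indices become U+FFFD so they count as uncommon.
        decoded = "".join(glyph_to_char.get(g, "\ufffd") for g in glyph_ids)
        if not is_messy(decoded, common_chars):
            return font_name, decoded
    return None, None  # no preset font library removes the messy code


# Hypothetical preset font libraries (glyph index -> character).
candidate_fonts = {
    "FontA": {1: "龘", 2: "靐", 3: "齉", 4: "爨"},
    "FontB": {1: "文", 2: "档", 3: "内", 4: "容"},
}
common_chars = set("文档内容")
font_name, decoded = find_correct_font([1, 2, 3, 4], candidate_fonts, common_chars)
```

Returning `(None, None)` when no candidate succeeds keeps the "no correct font library found" case explicit; how the server should react in that case is not stated in the text above.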
Optionally, the step of providing the download path of the correct font library for the terminal device, so that the terminal device downloads the correct font library according to the download path to repair the document includes:
Receiving a font inquiry request sent by terminal equipment; the font inquiry request includes: font information corresponding to the messy code text to be repaired;
and returning a downloading path of a correct font library corresponding to the font information corresponding to the messy code text to be repaired to the terminal equipment, so that the terminal equipment downloads the correct font library according to the downloading path to repair the document.
Optionally, the step of providing the download path of the correct font library for the terminal device, so that the terminal device downloads the correct font library according to the download path to repair the document includes:
And after obtaining the download path of the correct font library, returning the download path of the correct font library to the terminal equipment.
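The two ways of providing the download path described above (answering a font inquiry request, or returning the path as soon as it has been obtained) can be sketched with an in-memory store. The class name, the request and response fields, and the example URL are all hypothetical; the text does not define a wire format.

```python
class FontPathServer:
    """Hypothetical in-memory stand-in for the server's stored download paths."""

    def __init__(self):
        self._paths = {}  # font information -> download path

    def store_path(self, font_info, download_path):
        """Store the path; the return value models the 'return once obtained' variant."""
        self._paths[font_info] = download_path
        return {"font_info": font_info, "download_path": download_path}

    def handle_query(self, request):
        """Answer a font inquiry request that carries the font information."""
        path = self._paths.get(request["font_info"])
        if path is None:
            return {"status": "not_found"}
        return {"status": "ok", "download_path": path}


server = FontPathServer()
pushed = server.store_path("SongTi", "https://fonts.example.invalid/songti.ttf")
reply = server.handle_query({"font_info": "SongTi"})
missing = server.handle_query({"font_info": "UnknownFont"})
```

Storing the path keyed by font information lets later inquiries for the same font be answered without repeating the font library search.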
Optionally, the step of identifying the encoded text data by using a preset messy code identification algorithm until the encoded text data has no messy code includes:
Judging whether the coded text data has uncommon words or not;
if rarely used words exist, calculating the occupancy rate of the rarely used words in each messy code text to be repaired from the number of rarely used words in that text and the number of words in that text;
Judging whether the occupancy rate of the uncommon word in the messy code text to be repaired is greater than a first preset threshold value;
and if the occupancy rate of the rarely used words in the messy code text to be repaired is larger than a first preset threshold value, recognizing that the encoded text data contains the messy code text data.
Optionally, the step of determining whether the encoded text data has the uncommon word includes:
obtaining the word frequency of each messy code word to be repaired from a pre-stored word frequency table;
And if the word frequency of the messy code word to be repaired is lower than the word frequency threshold value of the preset rarely used word, determining that the messy code word to be repaired is the rarely used word.
Optionally, after determining that the occupancy rate of the uncommon words in the messy code text to be repaired is not greater than the first preset threshold, the method includes:
Judging whether the occupancy rate of the uncommon words in the messy code text to be repaired is smaller than a second preset threshold value; the second preset threshold value is smaller than the first preset threshold value;
if the occupancy rate of the uncommon words in the messy code text to be repaired is smaller than the second preset threshold value, identifying that no messy code text data exists in the encoded text data;
if the occupancy rate is not smaller than the second preset threshold, outputting to the user a prompt asking whether the document needs to be repaired;
after receiving a document repairing instruction input by the user, determining that the coded text data contains messy code text data.
In a third aspect, an embodiment of the present invention provides a device for repairing a messy code of a document content, which is applied to a terminal device, and the device includes:
The document to be processed acquisition module is used for acquiring the document to be processed;
the analysis module is used for analyzing the document to be processed;
the font information extraction module is used for extracting each font information from the document to be processed;
The text encoding module is used for loading a corresponding font library for the text in the document to be processed according to the font information and encoding to obtain encoded text data;
the messy code recognition module is used for recognizing the coded text data by adopting a preset messy code recognition algorithm;
The messy code text font information obtaining module is used for obtaining font information corresponding to the messy code text from the document to be processed if the messy code text data are identified to be contained in the encoded text data;
The uploading module is used for uploading the document to be processed and the font information corresponding to the messy code text to a server so that the server can determine a correct font library corresponding to the messy code text and obtain a downloading path of the correct font library;
And the messy code repairing module is used for obtaining the downloading path of the correct font library from the server, downloading the correct font library from a network according to the downloading path, and repairing the messy code of the document to be processed by using the correct font library.
Optionally, the apparatus further includes: a text classification module;
the text classification module is used for classifying all texts in the document to be processed according to the font information, and taking the text corresponding to each piece of font information as a target text; and determining the number of words in each target text;
The text encoding module is specifically configured to:
and loading a corresponding font library for each target text according to each font information, and coding to obtain coded text data corresponding to the target text.
Optionally, the messy code identification module includes:
The uncommon word determining submodule is used for judging whether the uncommon words exist in the coded text data;
The rarely used word occupancy rate calculation sub-module is used for calculating, when rarely used words exist, the occupancy rate of the rarely used words in each target text from the number of rarely used words in that target text and the number of words in that target text;
And the messy code text judging sub-module is used for judging whether the occupancy rate of the uncommon word in the target text is greater than a first preset threshold value, and if the occupancy rate of the uncommon word in the target text is greater than the first preset threshold value, recognizing that the encoded text data contains messy code text data.
Optionally, the rarely used word determining sub-module is specifically configured to:
obtaining the word frequency of each target word from a pre-stored word frequency table;
and if the word frequency of the target word is lower than the word frequency threshold of the preset rarely used word, determining that the target word is the rarely used word.
Optionally, the messy code text judging sub-module is further configured to perform the following after judging that the occupancy rate of the uncommon words in the target text is not greater than the first preset threshold:
Judging whether the occupancy rate of the uncommon words in the target text is smaller than a second preset threshold value; the second preset threshold value is smaller than the first preset threshold value;
if the occupancy rate of the uncommon words in the target text is smaller than the second preset threshold value, identifying that no messy code text data exists in the encoded text data;
if the occupancy rate is not smaller than the second preset threshold, outputting to the user a prompt asking whether the document needs to be repaired;
After receiving a document repairing instruction input by the user, determining that the coded text data contains messy code text data;
Triggering the messy code text font information obtaining module.
Optionally, the messy code repairing module specifically obtains the download path of the correct font library by adopting the following steps:
Sending a font inquiry request to the server; the font inquiry request includes: font information corresponding to the messy code text;
and receiving a downloading path of a correct font library corresponding to the font information corresponding to the messy code text returned by the server.
Optionally, the messy code repairing module specifically obtains the download path of the correct font library by adopting the following steps:
And receiving a download path of the correct font library returned by the server after obtaining the download path of the correct font library, and downloading the correct font library from a network according to the download path.
Optionally, the messy code repairing module specifically adopts the following steps to repair the messy code of the document to be processed:
embedding the correct font library into the document to be processed, and repairing the document messy codes.
Optionally, the messy code repairing module specifically adopts the following steps to repair the messy code of the document to be processed:
Installing the correct font library on the terminal equipment;
and encoding the document to be processed by using the correct font library, and repairing the document messy codes.
In a fourth aspect, an embodiment of the present invention provides a device for repairing a messy code of document content, which is applied to a server, and the device includes:
The receiving module is used for receiving the document to be processed and the font information corresponding to the messy code text to be repaired, which are uploaded by the terminal equipment;
the correct font library determining module is used for determining a correct font library corresponding to the messy code text, obtaining a downloading path of the correct font library and storing the downloading path;
And the correct font library downloading path providing module is used for providing a downloading path of the correct font library for the terminal equipment so that the terminal equipment downloads the correct font library according to the downloading path to repair the document.
Optionally, the correct font library determining module includes:
The font library coding sub-module is used for loading different font libraries in a plurality of preset font libraries and coding each to-be-repaired messy code text one by one according to the corresponding font information to respectively obtain coded text data of the to-be-repaired messy code text;
The messy code identification sub-module is used for respectively identifying each piece of coded text data obtained by using different font library codes by adopting a preset messy code identification algorithm until the coded text data has no messy code;
And the correct font library determining submodule is used for determining the font library loaded when the text data does not have the messy codes as the correct font library corresponding to the messy code text to be repaired, and obtaining and storing the downloading path of the correct font library.
Optionally, the correct font library download path providing module is specifically configured to:
Receiving a font inquiry request sent by terminal equipment; the font inquiry request includes: font information corresponding to the messy code text to be repaired;
and returning a downloading path of a correct font library corresponding to the font information corresponding to the messy code text to be repaired to the terminal equipment, so that the terminal equipment downloads the correct font library according to the downloading path to repair the document.
Optionally, the correct font library download path providing module is specifically configured to:
And after obtaining the download path of the correct font library, returning the download path of the correct font library to the terminal equipment.
Optionally, the messy code identification sub-module includes:
the rarely used word determining unit is used for judging, for each piece of coded text data obtained with a different font library, whether rarely used words exist in that coded text data;
The rarely used word occupancy rate calculation unit is used for calculating the occupancy rate of the rarely used words in each messy code text to be repaired from the number of rarely used words in that text and the number of words in that text;
The messy code text judging unit is used for judging whether the occupancy rate of the rarely used words in the messy code text to be repaired is larger than a first preset threshold value, and if so, recognizing that the encoded text data contains messy code text data.
Optionally, the uncommon word determining unit is specifically configured to:
obtaining the word frequency of each messy code word to be repaired from a pre-stored word frequency table;
And if the word frequency of the messy code word to be repaired is lower than the word frequency threshold value of the preset rarely used word, determining that the messy code word to be repaired is the rarely used word.
Optionally, the messy code text judging unit is specifically configured to:
After judging that the occupancy rate of the uncommon words in the messy code text to be repaired is not more than the first preset threshold value, judging whether the occupancy rate of the uncommon words in the messy code text to be repaired is less than a second preset threshold value; the second preset threshold value is smaller than the first preset threshold value;
if the occupancy rate of the uncommon words in the messy code text to be repaired is smaller than the second preset threshold value, identifying that no messy code text data exists in the encoded text data;
if the occupancy rate is not smaller than the second preset threshold, outputting to the user a prompt asking whether the document needs to be repaired;
after receiving a document repairing instruction input by the user, determining that the coded text data contains messy code text data.
In a fifth aspect, embodiments of the present invention provide a terminal device comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor, the machine-executable instructions causing the processor to implement the method steps of the first aspect.
In a sixth aspect, embodiments of the present invention provide a server comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor, the machine-executable instructions causing the processor to implement the method steps of the second aspect.
In the embodiments of the application, the terminal device first uses a preset messy code identification algorithm to identify messy code text data in the document to be processed, and then obtains the font information corresponding to the messy code text from the document. It uploads the document to be processed and the font information corresponding to the messy code text to a server, so that the server determines the correct font library corresponding to the messy code text and its download path. The terminal device obtains the download path from the server, downloads the corresponding correct font library, and uses it to repair the messy codes of the document. The method therefore requires neither manual import of font libraries nor visual checks of whether the messy codes have disappeared: it can automatically identify messy codes in multiple documents on the terminal device, determine the correct font library that the terminal device lacks, and automatically repair the messy codes in those documents, which improves working efficiency. Of course, a product or method practicing the application need not achieve all of the above advantages at the same time.
Drawings
To illustrate the embodiments of the application or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the application, and a person skilled in the art may obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of a method for repairing a messy code of document content applied to a terminal device according to an embodiment of the present invention.
Fig. 2 is a flowchart of step S105 of the embodiment shown in fig. 1, in which a preset messy code identification algorithm is used to identify the encoded text data.
Fig. 3 is a flowchart of step S201 in the embodiment shown in fig. 2, in which it is determined whether a rarely used word exists in the encoded text data.
Fig. 4 is another specific flowchart of step S105 in the embodiment shown in fig. 1, in which the preset messy code identification algorithm is used to identify the encoded text data.
FIG. 5 is a flowchart of a method for repairing a messy code of document content applied to a server according to an embodiment of the present invention.
FIG. 6 is a flowchart of step S502 in the embodiment of FIG. 5 to determine the correct font library and obtain the download path thereof.
Fig. 7 is a schematic structural diagram of a device for repairing a messy code of document content applied to a terminal device according to an embodiment of the present invention.
Fig. 8 is a schematic structural diagram of the messy code identification module 705 in the embodiment shown in fig. 7.
Fig. 9 is a schematic structural diagram of a device for repairing a messy code of document content applied to a server according to an embodiment of the present invention.
Fig. 10 is a schematic diagram of the structure of the correct font library determining module 902 in the embodiment shown in fig. 9.
Fig. 11 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.
Fig. 12 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. Embodiments of the present application are intended to be within the scope of the present application as defined by the appended claims.
Aiming at the messy code phenomenon caused by mismatching of a font library and content of a document, in order to automatically repair the messy code of the document, the embodiment of the invention provides a messy code repairing method, a messy code repairing device, terminal equipment and a server of the document content.
In a first aspect, an embodiment of the present invention provides a method for repairing a messy code of document content, which is applied to a terminal device, as shown in fig. 1, and may include the following steps:
S101, acquiring a document to be processed.
In implementations, the document to be processed may be a variety of documents that contain text.
For example: the document to be processed may be a word processing document in a format such as doc or wps, a presentation document in a format such as ppt or dps, a spreadsheet document in a format such as xls or et, or a portable document in pdf format.
S102, analyzing the document to be processed.
S103, extracting each font information from the document to be processed.
In implementations, document parsing software may be used to open the document to be processed and automatically extract the font information of the document's text. Specifically, the font information includes the font name, the font information of each character, and the position of each character in the document.
S104, loading a corresponding font library for the text in the document to be processed according to each font information, and encoding to obtain encoded text data.
In a specific implementation, the corresponding font library stored in the terminal device may be determined according to the acquired font name and loaded for the text that has the same font information. Then, according to the font information of each character in the text, the font index in the corresponding font library is used to find the code corresponding to each character, finally yielding the encoded text data.
For example: Song Ti and bold font information is extracted from the document, and the Song Ti and bold font library files are located in the font library folder of the terminal device by their names. The Song Ti font library is loaded for the Song Ti text in the document, and the bold font library is loaded for the bold text. Then, according to the font information of each character in the Song Ti text and the bold text, the font indexes in the Song Ti and bold font libraries are used to find the code corresponding to each character, finally yielding the encoded Song Ti text data and bold text data.
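The lookup described in S104 can be sketched as follows. This is a minimal illustration in Python, assuming that each font library can be modelled as a mapping from a per-character glyph identifier to a Unicode code point. The names `FONT_LIBRARIES`, `TextRun`, and `encode_runs`, as well as the sample glyph identifiers, are hypothetical and are not taken from the source.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical font libraries: font name -> {glyph id -> Unicode code point}.
FONT_LIBRARIES: Dict[str, Dict[int, int]] = {
    "Song Ti": {1: 0x4E2D, 2: 0x6587},
    "bold": {1: 0x5B57, 2: 0x4F53},
}


@dataclass
class TextRun:
    """A piece of document text that shares one font name (assumed model)."""
    font_name: str
    glyph_ids: List[int]


def encode_runs(runs: List[TextRun]) -> Dict[str, str]:
    """Load the library named by each run and map its glyph ids to characters."""
    encoded: Dict[str, str] = {}
    for run in runs:
        index = FONT_LIBRARIES[run.font_name]  # font index of the loaded library
        chars = "".join(chr(index[g]) for g in run.glyph_ids)
        encoded[run.font_name] = encoded.get(run.font_name, "") + chars
    return encoded
```

If the wrong library were registered under a font name, the same glyph identifiers would map to different code points, which is the source of the messy code that the later steps try to detect.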
In detail, the text in the document to be processed may be encoded in various ways, including but not limited to the following:
If the text in the document to be processed uses Unicode, the encoded text data is obtained according to the implementation process of S104 described above;
If the text in the document to be processed uses CID encoding (a character encoding developed by Adobe and used mainly in documents in pdf format), the CID codes of the document need to be converted into Unicode first, and the encoded text data is then obtained according to the implementation process of S104 described above.
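The CID-to-Unicode conversion mentioned above might look like the following sketch. It assumes that a CID-to-Unicode mapping is already available as a dictionary; how that mapping is obtained from the document is not described in the source. The function name `cids_to_unicode` and the sample values are hypothetical.

```python
from typing import Dict, Iterable

# Unicode replacement character, used here for CIDs that have no mapping.
REPLACEMENT = "\ufffd"


def cids_to_unicode(cids: Iterable[int], cid_to_unicode: Dict[int, int]) -> str:
    """Convert a sequence of CID codes to a Unicode string via a lookup table."""
    return "".join(
        chr(cid_to_unicode[c]) if c in cid_to_unicode else REPLACEMENT
        for c in cids
    )
```

After this conversion the text can be handled in the same way as text that was Unicode from the start.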
S105, recognizing the coded text data by adopting a preset messy code recognition algorithm.
In a specific implementation, the messy code may be identified according to whether the text data includes the uncommon word and whether the occupancy of the uncommon word in the target text is greater than a preset threshold, please refer to the specific implementation process of steps S201 to S206 in fig. 2.
S106, if the coded text data are identified to contain the messy code text data, font information corresponding to the messy code text is obtained from the document to be processed.
In a specific implementation, messy code text arises in two cases: 1) the terminal device has no font library corresponding to the text, so another font library is loaded to encode the text characters, and messy code occurs because the wrong font library was loaded; 2) the terminal device has a font library corresponding to the text, but the font library exists in multiple versions and the font information does not record which version the document uses, so another version of the font library may be loaded to encode the text, and messy code occurs because the wrong version was loaded. The font information corresponding to the messy code text is obtained because it is subsequently uploaded to the server.
For example, messy code may arise in the Song Ti text in two cases: 1) the terminal device has no Song Ti font library corresponding to the Song Ti text, so the bold font library is loaded to encode the text characters, and the Song Ti text becomes messy code because the wrong font library was loaded; 2) the terminal device has version 1.0 of the Song Ti font library corresponding to the Song Ti text, but version 1.2 of the Song Ti font library is loaded to encode the text characters, and the Song Ti text becomes messy code because the wrong font library version was loaded.
S107, uploading font information corresponding to the document to be processed and the messy code text to a server, so that the server determines a correct font library corresponding to the messy code text, and obtains and stores a downloading path of the correct font library.
In implementations, the server holds most published font libraries, so it can generally determine the correct font library and its download path.
S108, obtaining a download path of the correct font library from the server.
S109, downloading the correct font library from the network according to the downloading path.
In a specific implementation, the correct font library is downloaded from the network to the terminal device according to the obtained download path and in compliance with the copyright requirements of the font library.
S110, repairing the messy code of the document to be processed by using the correct font library.
In practice, there are two ways to repair the messy code of the document: the correct font library may be embedded in the document, or the correct font library may be installed in the terminal device and the document re-encoded with it. Refer to the detailed embodiments of step S110.
In the embodiment of the application, the terminal device first uses a preset messy code recognition algorithm to recognize messy code text data in the document to be processed, and then obtains the font information corresponding to the messy code text from the document. It uploads the document to be processed and the font information corresponding to the messy code text to a server, so that the server determines the correct font library corresponding to the messy code text and its download path. The terminal device obtains the download path from the server, downloads the corresponding correct font library, and uses it to repair the messy code in the document. The method therefore requires neither manually importing a font library nor visually checking whether the messy code has been eliminated. It can automatically identify messy code in multiple documents on the terminal device, determine the correct font library that the terminal device lacks, and automatically repair the messy code in those documents, thereby improving working efficiency. Of course, a product or method implementing the application does not necessarily need to achieve all of the above advantages at the same time.
In one embodiment, in order to determine whether there are rarely used words in the text and to calculate their occupancy in the text, the following steps may be included before step S104 in Fig. 1:
Classifying all text in the document to be processed according to each piece of font information, and taking the text corresponding to each piece of font information as a target text.
In a specific implementation, all text in the document to be processed is classified according to the font name in each piece of font information, and each class serves as a target text.
Determining the number of target characters in each target text.
In a specific implementation, determining the number of target characters prepares for the later calculation of the occupancy of rarely used words.
For example: the text in the document is divided into Song Ti text and bold text, which serve as the target texts, and the numbers of target characters in the Song Ti text and in the bold text are determined respectively.
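The classification and counting steps above can be sketched as follows. The sketch assumes that the parsed document is available as a sequence of (character, font name) pairs; this representation and the function names `classify_by_font` and `count_target_chars` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def classify_by_font(chars: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Group characters by font name; each group is one target text."""
    groups = defaultdict(list)
    for ch, font_name in chars:
        groups[font_name].append(ch)
    return {font_name: "".join(chs) for font_name, chs in groups.items()}


def count_target_chars(target_texts: Dict[str, str]) -> Dict[str, int]:
    """Number of target characters in each target text."""
    return {font_name: len(text) for font_name, text in target_texts.items()}
```

The counts serve as the denominators when the occupancy of rarely used words is calculated later.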
In one embodiment, step S104 in Fig. 1, loading a corresponding font library for the text in the document to be processed according to each piece of font information and encoding to obtain encoded text data, may specifically include:
Loading a corresponding font library for each target text according to each piece of font information, and encoding to obtain the encoded text data corresponding to that target text.
In a specific implementation, the corresponding font library stored in the terminal device may be determined according to the acquired font name and loaded for the target text. Then, according to the font information of each character in the target text, the font index in the corresponding font library is used to find the code corresponding to each character, finally yielding the encoded text data.
In one embodiment, step S105 in Fig. 1, recognizing the encoded text data by using a preset messy code recognition algorithm, may include, as shown in Fig. 2:
S201, judging whether the encoded text data contains rarely used words. If so, step S202 is performed; if not, it is determined that the document has no messy code text.
S202, calculating the occupancy of the rarely used words in each target text according to the number of rarely used words in the target text and the number of characters in the target text.
In a specific implementation, the occupancy of the rarely used words in the target text may be calculated as: the number of rarely used words in the target text / the number of characters in the target text.
S203, judging whether the occupancy of the rarely used words in the target text is greater than a first preset threshold. If so, the encoded text data is identified as containing messy code text data; if not, the encoded text data is identified as not containing messy code text data.
For example: continuing the example of step S104, after the encoded Song Ti text data and bold text data are obtained, it is judged whether each of the two contains rarely used words. Take the Song Ti text data as an example. If the Song Ti text data contains rarely used words, the number of rarely used words is counted and their occupancy in the Song Ti text is calculated as: the number of rarely used words in the Song Ti text / the number of characters in the Song Ti text.
It is then judged whether the occupancy of the rarely used words in the Song Ti text is greater than the first preset threshold. If so, the Song Ti text data is identified as containing messy code text data; if not, it is identified as not containing messy code text data. The bold text data is identified in the same way.
If the Song Ti text data contains no rarely used words, the Song Ti text is judged not to be messy code text; if the bold text data also contains no rarely used words, the document is judged to contain no messy code text.
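Steps S201 to S203 can be sketched as follows. The rarely-used-word test is passed in as a predicate so that the sketch stays independent of how rarity is decided. The function names and the default threshold of 0.3 are hypothetical; the source does not give a value for the first preset threshold.

```python
from typing import Callable


def rare_occupancy(text: str, is_rare: Callable[[str], bool]) -> float:
    """S202: number of rarely used words / number of characters in the text."""
    if not text:
        return 0.0
    return sum(1 for ch in text if is_rare(ch)) / len(text)


def has_messy_code(text: str, is_rare: Callable[[str], bool],
                   first_threshold: float = 0.3) -> bool:
    """S201 and S203: messy code only if rare words exist and exceed the threshold."""
    occupancy = rare_occupancy(text, is_rare)
    if occupancy == 0:
        return False  # S201: no rarely used words at all
    return occupancy > first_threshold  # S203
```

The function would be called once per target text, for example once for the Song Ti text data and once for the bold text data.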
In one embodiment, step S201 in Fig. 2, judging whether the encoded text data contains rarely used words, may specifically include, as shown in Fig. 3:
S301, obtaining the word frequency of each target character from a pre-stored word frequency table.
In a specific implementation, the word frequency in the word frequency table is the frequency with which a character occurs in general documents. For example, if a document contains 20000 characters and a given character occurs 690 times in total, the word frequency of that character is 3.45% (690/20000 = 3.45%).
The pre-stored word frequency table is obtained through statistics over a large number of document samples. It mainly contains character names, character codes, and character frequencies, and is sorted by character frequency from high to low. Part of its content is shown in Table 1 below.
TABLE 1
S302, judging whether the word frequency of the target character is lower than a preset word frequency threshold for rarely used words. If so, step S303 is performed; if not, step S304 is performed.
In a specific implementation, a word frequency threshold for rarely used words is preset, and a target character whose word frequency is lower than this threshold is considered a rarely used word.
S303, determining that the target character is a rarely used word.
S304, determining that the target character is not a rarely used word, and continuing to judge the next target character; if none of the target characters is a rarely used word, the encoded text data contains no rarely used words.
For example: continuing the example of step S104, after the encoded Song Ti text data is obtained, the word frequency of each character in the Song Ti text data is obtained from the pre-stored word frequency table, and it is judged whether that word frequency is lower than the preset word frequency threshold for rarely used words. If it is lower, the character is determined to be a rarely used word.
If the word frequency of a character in the Song Ti text data is not lower than the preset threshold, the character is determined not to be a rarely used word, and the word frequency of the next character is judged. If none of the characters in the Song Ti text data is a rarely used word, the Song Ti text data contains no rarely used words.
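Steps S301 to S304 can be sketched as a table lookup followed by a threshold comparison. The table contents, the threshold value, and the names below are hypothetical and purely illustrative; the source does not give them. Treating a character that is missing from the table as having frequency 0 is also an assumption of this sketch.

```python
from typing import Dict, List

# Hypothetical word frequency table: character -> relative frequency.
WORD_FREQUENCY: Dict[str, float] = {
    "\u7684": 0.04,        # illustrative value for a very common character
    "\u4e00": 0.012,       # illustrative value
    "\u9f98": 0.0000001,   # illustrative value for a very rare character
}

# Hypothetical preset word frequency threshold for rarely used words.
RARE_THRESHOLD = 0.00001


def is_rare(ch: str, table: Dict[str, float] = WORD_FREQUENCY,
            threshold: float = RARE_THRESHOLD) -> bool:
    """S301 and S302: look up the frequency and compare it with the threshold."""
    return table.get(ch, 0.0) < threshold  # assumption: unknown characters count as rare


def find_rare(text: str) -> List[str]:
    """S303 and S304 applied to every character; an empty list means no rare words."""
    return [ch for ch in text if is_rare(ch)]
```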
In another embodiment, step S105 in Fig. 1, recognizing the encoded text data by using a preset messy code recognition algorithm, may include, as shown in Fig. 4:
S401, judging whether the encoded text data contains rarely used words. If so, step S402 is performed; if not, it is determined that the document has no messy code text.
S402, calculating the occupancy of the rarely used words in the target text according to the number of words of the rarely used words in each target text and the number of words of the target text.
S403, judging whether the occupancy of the rarely used words in the target text is larger than a first preset threshold value. If yes, recognizing that the coded text data contains messy code text data. If not, step S404 is performed.
In specific implementations, the steps S401 to S403 may be the same as the steps S201 to S203 shown in fig. 2.
S404, judging whether the occupancy of the rarely used words in the target text is smaller than a second preset threshold, where the second preset threshold is smaller than the first preset threshold. If so, the encoded text data is identified as containing no messy code text data. If not, step S405 is performed.
S405, outputting to the user a prompt asking whether the document needs to be repaired.
S406, judging whether the user inputs "yes" or "no". If "yes", it is determined that the encoded text data contains messy code text data; if "no", it is determined that the encoded text data does not contain messy code text data.
In a specific implementation, if the occupancy of the rarely used words in the target text is neither greater than the first preset threshold nor smaller than the second preset threshold, the algorithm cannot decide whether messy code text data exists. In this case a repair prompt box may pop up so that the user can choose whether to repair the document. After receiving "yes" from the user, the terminal device determines that the encoded text data contains messy code text data.
For example: continuing the example of steps S201 to S203, after it is determined that the occupancy of the rarely used words in the Song Ti text is not greater than the first preset threshold, it is further judged whether that occupancy is smaller than the second preset threshold, which is smaller than the first preset threshold. If so, the encoded Song Ti text data is identified as containing no messy code text data.
If the occupancy is not smaller than the second preset threshold, a prompt asking whether the document needs to be repaired is output to the user. After an instruction to repair the document is received from the user, the encoded Song Ti text data is judged to contain messy code text data; the Song Ti font information corresponding to the messy code text is then obtained from the document to be processed in preparation for uploading it to the cloud backend. After an instruction not to repair the document is received from the user, the encoded Song Ti text data is judged not to contain messy code text data.
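The two-threshold decision of steps S401 to S406 can be sketched as follows. The user prompt is modelled as a callback that returns True for "yes" and False for "no"; this callback and the function name `decide_messy_code` are hypothetical.

```python
from typing import Callable


def decide_messy_code(occupancy: float, first_threshold: float,
                      second_threshold: float,
                      ask_user: Callable[[], bool]) -> bool:
    """Return True if the encoded text data is treated as containing messy code."""
    if second_threshold >= first_threshold:
        raise ValueError("second threshold must be smaller than first threshold")
    if occupancy == 0:
        return False      # S401: no rarely used words
    if occupancy > first_threshold:
        return True       # S403: clearly messy code
    if occupancy < second_threshold:
        return False      # S404: clearly not messy code
    return ask_user()     # S405 and S406: undecided, so the user chooses
```

The user is consulted only in the band between the two thresholds, so clearly good and clearly garbled text is handled without interaction.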
In one embodiment, step S108 in fig. 1, the step of obtaining the download path of the correct font library from the server may include:
Sending a font inquiry request to a server; the font inquiry request comprises: font information corresponding to the messy code text;
And receiving a download path of a correct font library corresponding to the font information corresponding to the messy code text returned by the server.
In a specific implementation, the terminal device may first send a font inquiry request to the server according to font information corresponding to the messy code text, and then receive a download path of a correct font library corresponding to the messy code text returned by the server.
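The request and response exchange on the terminal side might be sketched as below. The source does not specify a message format, so the JSON layout, the field names (`type`, `fonts`, `results`, `font`, `download_path`), and both function names are assumptions made only for illustration.

```python
import json
from typing import Dict, Iterable


def build_font_query(font_names: Iterable[str]) -> str:
    """Serialize a font inquiry request carrying the messy code text's font names."""
    return json.dumps({"type": "font_query", "fonts": sorted(font_names)})


def parse_font_query_response(body: str) -> Dict[str, str]:
    """Extract {font name: download path} from the server's reply."""
    data = json.loads(body)
    return {item["font"]: item["download_path"] for item in data["results"]}
```

The transport used to carry these messages is left open here, as it is in the source.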
In another embodiment, step S108 in fig. 1, the step of obtaining the download path of the correct font library from the server may include:
Receiving the download path of the correct font library, which the server returns after obtaining it.
In a specific implementation, after the server obtains the download path of the correct font library, it may return the download path directly to the terminal device, so the terminal device may also directly receive the download path of the correct font library corresponding to the messy code text that the server returns automatically.
In one embodiment, step S110 in Fig. 1, repairing the messy code of the document to be processed by using the correct font library, may include:
Embedding the correct font library into the document to be processed to repair the messy code of the document.
In a specific implementation, the downloaded correct font library is embedded in the document to be processed and loaded for the corresponding text in the document, which eliminates the messy code. With this method, the correct font library is always stored in the document, so messy code does not occur even if the document is opened on another terminal device that lacks the correct font library.
In another embodiment, step S110 in Fig. 1, repairing the messy code of the document to be processed by using the correct font library, may include:
Installing the correct font library in the terminal device;
Encoding the document to be processed by using the correct font library to repair the messy code of the document.
In a specific implementation, the downloaded correct font library is installed in the font library folder of the terminal device and the document to be processed is encoded with it, which eliminates the messy code.
With this method, the correct font library is not stored in the document, so messy code still occurs if the document is opened on another terminal device that lacks the correct font library.
In a second aspect, an embodiment of the present invention provides a method for repairing messy code in document content, applied to a server. As shown in Fig. 5, the method may include the following steps:
S501, receiving a document to be processed and font information corresponding to messy code text to be repaired, both uploaded by a terminal device.
S502, determining the correct font library corresponding to the messy code text, and obtaining and storing a download path of the correct font library.
In a specific implementation, different font libraries may be loaded one by one for each messy code text to be repaired and used to encode it into text data, and the messy code recognition algorithm is applied to the text data until no messy code exists. The correct font library corresponding to each messy code text to be repaired is thereby determined, and its download path is obtained and stored. Refer to the specific implementation of steps S601 to S603 in Fig. 6.
S503, providing the download path of the correct font library to the terminal device, so that the terminal device downloads the correct font library according to the download path to repair the document.
In practice, there are two ways to provide the download path: the server may first receive a query request from the terminal device and then return the corresponding download path, or the server may return the corresponding download path to the terminal device directly. Refer to the embodiments of step S503.
In one embodiment, step S502 in Fig. 5, determining the correct font library corresponding to the messy code text and obtaining and storing the download path of the correct font library, may include, as shown in Fig. 6:
S601, loading different font libraries from a plurality of preset font libraries, one by one, for each messy code text to be repaired according to its corresponding font information, and encoding to obtain the encoded text data of each messy code text to be repaired.
In a specific implementation, referring to the implementation of step S104 on the terminal device described above, the messy code texts to be repaired in the document are classified according to the font information uploaded to the server. Starting from the first of the font libraries preset in the server, the font libraries are loaded one by one for the messy code text, and the code corresponding to each character in the messy code text is found according to the font information of that character, finally yielding the encoded messy code text data.
For example: continuing the example of step S104 on the terminal device, if the terminal device has identified both the Song Ti text and the bold text in the document as messy code text, the server parses the document again according to the uploaded Song Ti and bold font information and classifies the text to be repaired into Song Ti messy code text and bold messy code text.
Take the Song Ti messy code text as an example: starting from the first font library in the preset font library folder of the server, all preset font libraries are loaded one by one for the Song Ti messy code text. According to the font information of each character in the Song Ti messy code text, the font index in the loaded font library is used to find the code corresponding to each character, finally yielding the encoded Song Ti messy code text data. The encoded bold messy code text data is obtained by the same steps.
S602, recognizing, by using a preset messy code recognition algorithm, each piece of encoded text data obtained with the different font libraries, until a piece of encoded text data contains no messy code.
In a specific implementation, it is first judged whether the encoded text data contains rarely used words, then the occupancy of the rarely used words is calculated, and then it is judged whether the occupancy is greater than the first preset threshold, thereby identifying whether the text data contains messy code text data. Recognition stops once the encoded text data of the original messy code text is identified as containing no messy code. Refer to the embodiments of step S602.
S603, determining the font library that was loaded when the text data contained no messy code as the correct font library corresponding to the messy code text to be repaired, and obtaining and storing the download path of the correct font library.
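The search in steps S601 to S603 can be sketched as a loop over candidate font libraries. The sketch assumes that each preset library is modelled as a glyph-to-code-point mapping paired with its download path, and that the messy code test is supplied as a predicate. All names and the rule of skipping a library that lacks a glyph are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical model: library name -> (glyph id -> code point, download path).
FontLibrary = Tuple[Dict[int, int], str]


def find_correct_font_library(
    glyph_ids: List[int],
    candidates: Dict[str, FontLibrary],
    has_messy_code: Callable[[str], bool],
) -> Optional[Tuple[str, str]]:
    """Try each preset library in order; return (name, download path) of the first clean one."""
    for name, (index, download_path) in candidates.items():
        try:
            text = "".join(chr(index[g]) for g in glyph_ids)  # S601: encode
        except KeyError:
            continue  # assumption: a library missing a glyph cannot be the correct one
        if not has_messy_code(text):                          # S602: recognize
            return name, download_path                        # S603: correct library
    return None
```

Returning `None` covers the case, not discussed in the source, in which none of the preset libraries yields text without messy code.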
In one embodiment, step S503 in fig. 5 provides a download path of the correct font library for the terminal device, so that the terminal device downloads the correct font library according to the download path to perform document repair, and may include:
Receiving a font inquiry request sent by the terminal device, where the font inquiry request includes the font information corresponding to the messy code text to be repaired;
Returning to the terminal device the download path of the correct font library corresponding to that font information, so that the terminal device downloads the correct font library according to the download path to repair the document.
In a specific implementation, the server may receive a font inquiry request sent by the terminal device, which includes the font information corresponding to the messy code text, and then return the download path of the correct font library corresponding to the messy code text to the terminal device.
In another embodiment, step S503 in fig. 5 provides a download path of the correct font library for the terminal device, so that the terminal device downloads the correct font library according to the download path to perform document repair, and may include:
And after obtaining the download path of the correct font library, returning the download path of the correct font library to the terminal equipment.
In the implementation, after obtaining the download path of the correct font library, the server may also directly return the download path of the correct font library corresponding to the messy code text to the terminal device.
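The two ways of providing the download path in step S503 can be sketched with a small in-memory store. The class `DownloadPathStore`, its methods, and the idea of a push callback are hypothetical; the source does not describe how the server stores or transmits the paths.

```python
from typing import Callable, Dict, Iterable, Optional


class DownloadPathStore:
    """Keeps {font name: download path} and serves it on request or by push."""

    def __init__(self, push: Optional[Callable[[str, str], None]] = None) -> None:
        self._paths: Dict[str, str] = {}
        self._push = push  # optional callback used to return paths directly

    def save(self, font_name: str, download_path: str) -> None:
        """Store a path; if a push callback is set, also return it immediately."""
        self._paths[font_name] = download_path
        if self._push is not None:
            self._push(font_name, download_path)

    def handle_query(self, font_names: Iterable[str]) -> Dict[str, str]:
        """Answer a font inquiry request with the paths that are known."""
        return {n: self._paths[n] for n in font_names if n in self._paths}
```

With no callback the store only answers inquiry requests (the first embodiment); with a callback it also returns each path as soon as it is obtained (the second embodiment).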
In one embodiment, step S602 in Fig. 6, recognizing each piece of encoded text data by using a preset messy code recognition algorithm until the encoded text data contains no messy code, may specifically include performing the following steps for each piece of encoded text data:
Judging whether the encoded text data contains rarely used words.
If rarely used words exist, calculating the occupancy of the rarely used words in each messy code text to be repaired according to the number of rarely used words in that text and the number of characters in that text.
Judging whether the occupancy of the rarely used words in the messy code text to be repaired is greater than a first preset threshold.
If the occupancy of the rarely used words in the messy code text to be repaired is greater than the first preset threshold, identifying the encoded text data as containing messy code text data.
If the occupancy of the rarely used words in the messy code text to be repaired is not greater than the first preset threshold, identifying the encoded text data as not containing messy code text data.
If no rarely used words exist in any of the text data, determining that the document has no messy code text, and stopping messy code recognition.
In a specific implementation, the above steps on the server may be the same as steps S201 to S203 on the terminal device described above.
In one embodiment, in step S602 in Fig. 6, the step of judging whether the encoded text data contains rarely used words may include:
Obtaining the word frequency of each messy code character to be repaired from a pre-stored word frequency table.
If the word frequency of the messy code character to be repaired is lower than the preset word frequency threshold for rarely used words, determining that the messy code character to be repaired is a rarely used word.
If the word frequency of the messy code character to be repaired is not lower than the preset word frequency threshold for rarely used words, determining that the messy code character to be repaired is not a rarely used word, and continuing to judge the next messy code character to be repaired; if none of the messy code characters to be repaired is a rarely used word, the encoded text data contains no rarely used words.
In a specific implementation, these steps are the same as steps S301 to S304 in Fig. 3.
In another embodiment, step S602 in Fig. 6, recognizing each piece of encoded text data by using a preset messy code recognition algorithm until the encoded text data contains no messy code, may specifically include performing the following steps for each piece of encoded text data:
Judging whether the encoded text data contains rarely used words.
If rarely used words exist, calculating the occupancy of the rarely used words in each messy code text to be repaired according to the number of rarely used words in that text and the number of characters in that text.
Judging whether the occupancy of the rarely used words in the messy code text to be repaired is greater than a first preset threshold.
If the occupancy of the rarely used words in the messy code text to be repaired is greater than the first preset threshold, identifying the encoded text data as containing messy code text data.
If the occupancy of the rarely used words in the messy code text to be repaired is not greater than the first preset threshold, judging whether the occupancy is smaller than a second preset threshold, where the second preset threshold is smaller than the first preset threshold.
If the occupancy of the rarely used words in the messy code text to be repaired is smaller than the second preset threshold, identifying the encoded text data as containing no messy code text data.
If the occupancy is not smaller than the second preset threshold, outputting to the user a prompt asking whether the document needs to be repaired.
After receiving an instruction to repair the document from the user, determining that the encoded text data contains messy code text data.
If no rarely used words exist in any of the text data, determining that the document has no messy code text, and stopping messy code recognition.
In a specific implementation, these steps are the same as steps S401 to S406 in Fig. 4.
In a third aspect, an embodiment of the present invention provides a device for repairing a messy code of document content, which is applied to a terminal device, as shown in fig. 7, where the device includes:
A pending document acquisition module 701, configured to acquire a pending document.
The parsing module 702 is configured to parse a document to be processed.
A font information extracting module 703, configured to extract each font information from the document to be processed.
The text coding module 704 is configured to load a corresponding font library for the text in the document to be processed according to each piece of font information, and to encode the text to obtain encoded text data.
The messy code recognition module 705 is configured to recognize the encoded text data by using a preset messy code recognition algorithm.
The messy code text font information obtaining module 706 is configured to obtain font information corresponding to the messy code text from the document to be processed if the encoded text data is recognized as containing messy code text data.
The uploading module 707 is configured to upload font information corresponding to the document to be processed and the messy code text to the server, so that the server determines a correct font library corresponding to the messy code text and obtains a download path of the correct font library.
The messy code repairing module 708 is configured to obtain the download path of the correct font library from the server, download the correct font library from the network according to the download path, and repair the messy code of the document to be processed by using the correct font library.
In one embodiment, the device for repairing a messy code further comprises: a text classification module;
The text classification module is configured to classify all text in the document to be processed according to the font information, take the text corresponding to each piece of font information as a target text, and determine the number of words in each target text.
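As an illustration of the text classification module, the sketch below groups text runs by font and counts the words of each target text. The input shape (a list of `(font_name, text)` pairs), the function names, and the rule that every non-whitespace character counts as one word are assumptions made for the example; the embodiment does not fix any of them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def group_by_font(runs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Merge all text runs that share a font into one target text."""
    grouped = defaultdict(list)
    for font_name, text in runs:
        grouped[font_name].append(text)
    return {font: "".join(parts) for font, parts in grouped.items()}


def word_counts(target_texts: Dict[str, str]) -> Dict[str, int]:
    """Count words per target text.

    Assumption: each non-whitespace character is one word, which is a
    simplification suited to Chinese text.
    """
    return {
        font: sum(1 for ch in text if not ch.isspace())
        for font, text in target_texts.items()
    }
```

The per-font word counts are the denominators later used when the occupancy rate of rarely used words is calculated for each target text.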
In the embodiment of the application, the terminal device first uses a preset messy code recognition algorithm to recognize messy code text data in the document to be processed, and then obtains font information corresponding to the messy code text from the document. It uploads font information corresponding to the document to be processed and the messy code text to a server, so that the server determines the correct font library corresponding to the messy code text and its download path. The terminal device obtains the download path from the server, downloads the corresponding correct font library, and uses it to repair the messy code in the document. The method therefore requires neither manually importing font libraries nor checking by eye whether the messy code has been eliminated: it can automatically recognize messy code in multiple documents on the terminal device, determine the correct font library that the terminal device lacks, and automatically repair the messy code in those documents, thereby improving working efficiency. Of course, a product or method implementing the application need not achieve all of the above advantages at the same time.
In one embodiment, the messy code recognition module 705 in fig. 7 may include, as shown in fig. 8:
The rarely used word determining sub-module 801 is configured to judge whether the encoded text data has rarely used words. If no rarely used word exists in any of the text data, it is determined that the document has no messy code text.
The rarely used word occupancy rate calculating sub-module 802 is configured to calculate, after rarely used words are determined to exist, the occupancy rate of the rarely used words in each target text according to the number of rarely used words in that target text and the number of words in that target text.
The messy code text judging sub-module 803 is configured to judge whether the occupancy rate of the rarely used words in the target text is greater than a first preset threshold; if it is greater than the first preset threshold, to recognize that the encoded text data contains messy code text data; and if it is not greater than the first preset threshold, to recognize that the encoded text data does not contain messy code text data.
In one embodiment, the rarely used word determining sub-module 801 of fig. 8 is specifically configured to:
Obtain the word frequency of each target word from a pre-stored word frequency table.
If the word frequency of a target word is lower than a preset word frequency threshold for rarely used words, determine that the target word is a rarely used word.
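The word frequency lookup described above might look like the following minimal sketch. The table contents, the threshold value `1e-6`, the function name, and the choice to treat a word missing from the table as having frequency zero are all assumptions made for illustration.

```python
from typing import Dict, List


def find_rare_words(text: str, freq_table: Dict[str, float],
                    rare_threshold: float = 1e-6) -> List[str]:
    """Return the words of `text` whose frequency is below the threshold.

    Assumptions: each non-whitespace character is one word, and a word
    absent from the table has frequency 0.0 (so it counts as rare).
    """
    return [
        ch for ch in text
        if not ch.isspace() and freq_table.get(ch, 0.0) < rare_threshold
    ]


# Hypothetical pre-stored word frequency table (relative frequencies).
FREQ_TABLE = {"的": 0.04, "是": 0.01, "龘": 1e-8}
rare = find_rare_words("的是龘", FREQ_TABLE)
```

The length of the returned list, divided by the number of words in the target text, gives the occupancy rate that is compared with the preset thresholds.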
In one embodiment, the messy code text determination submodule 803 in fig. 8 is specifically configured to:
Judging whether the occupancy rate of the rarely used words in the target text is greater than a first preset threshold, and if so, recognizing that the encoded text data contains messy code text data.
If the occupancy rate of the rarely used words in the target text is not greater than the first preset threshold, judging whether the occupancy rate is smaller than a second preset threshold, where the second preset threshold is smaller than the first preset threshold.
If the occupancy rate of the rarely used words in the target text is smaller than the second preset threshold, recognizing that no messy code text data exists in the encoded text data.
If the occupancy rate is not smaller than the second preset threshold, outputting to the user a prompt asking whether the document needs to be repaired.
After receiving a document repair instruction input by the user, determining that the encoded text data contains messy code text data.
The messy code text font information obtaining module 706 is then triggered.
In one embodiment, the messy code repairing module 708 in fig. 7 specifically obtains the download path of the correct font library by:
Sending a font inquiry request to a server; the font inquiry request comprises: font information corresponding to the messy code text.
And receiving a download path of a correct font library corresponding to the font information corresponding to the messy code text returned by the server.
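The request and response exchanged in this embodiment could be represented as in the sketch below. The embodiment does not define a wire format, so the JSON layout, the field names `type`, `fonts`, `results`, `font` and `download_path`, and both function names are hypothetical.

```python
import json
from typing import Dict, Iterable


def build_font_query(font_names: Iterable[str]) -> str:
    """Serialize a font inquiry request carrying the font information
    of the messy code text (hypothetical message layout)."""
    return json.dumps(
        {"type": "font_query", "fonts": sorted(set(font_names))},
        ensure_ascii=False,
    )


def parse_font_response(payload: str) -> Dict[str, str]:
    """Extract font name -> download path from the server's reply
    (hypothetical message layout)."""
    data = json.loads(payload)
    return {item["font"]: item["download_path"]
            for item in data.get("results", [])}
```

Deduplicating and sorting the font names keeps the request deterministic when several messy code texts share the same font information.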
In another embodiment, the messy code repairing module 708 in fig. 7 specifically obtains the download path of the correct font library by:
Receiving the download path of the correct font library, which the server returns after obtaining it, and downloading the correct font library from the network according to the download path.
In one embodiment, the messy code repairing module 708 in fig. 7 specifically performs document messy code repair on the document to be processed by:
Embedding the correct font library into the document to be processed to repair the messy code in the document.
In another embodiment, the messy code repairing module 708 in fig. 7 specifically performs document messy code repair on the document to be processed by:
Installing the correct font library on the terminal device.
Encoding the document to be processed by using the correct font library to repair the messy code in the document.
In a fourth aspect, an embodiment of the present invention provides a device for repairing a messy code of document content, which is applied to a server, as shown in fig. 9, where the device includes:
The receiving module 901 is configured to receive font information corresponding to a document to be processed and a messy code text to be repaired, which are uploaded by a terminal device.
The correct font library determining module 902 is configured to determine a correct font library corresponding to the messy code text, and obtain and store a download path of the correct font library.
The correct font library download path providing module 903 is configured to provide a download path of the correct font library for the terminal device, so that the terminal device downloads the correct font library according to the download path to repair the document.
In one embodiment, the correct font library determination module 902 of fig. 9, as shown in fig. 10, comprises:
The font library encoding sub-module 1001 is configured to load, one by one and according to the corresponding font information, different font libraries among a plurality of preset font libraries for each to-be-repaired messy code text and to encode it, so as to obtain encoded text data of the to-be-repaired messy code text.
The messy code recognition sub-module 1002 is configured to use a preset messy code recognition algorithm to recognize, one by one, the pieces of encoded text data obtained by encoding with different font libraries, until a piece of encoded text data has no messy code.
The correct font library determining submodule 1003 is configured to determine a font library loaded when the text data does not have a messy code as a correct font library corresponding to the messy code text to be repaired, and obtain and store a download path of the correct font library.
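The cooperation of sub-modules 1001 to 1003 can be sketched as a loop over candidate font libraries. In this sketch a font library is modeled as a plain mapping from character codes to characters, and `is_messy` is a simplified stand-in for the preset messy code recognition algorithm; both, together with the `COMMON` character set and the 0.3 threshold, are assumptions made for the example rather than the actual data structures of the embodiment.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical set of common characters used by the stand-in recognizer.
COMMON = set("今天气很好")


def is_messy(text: str, threshold: float = 0.3) -> bool:
    """Stand-in recognizer: messy if too many characters are uncommon."""
    if not text:
        return False
    rare = sum(1 for ch in text if ch not in COMMON)
    return rare / len(text) > threshold


def find_correct_library(codes: List[int],
                         libraries: Dict[str, Dict[int, str]],
                         recognizer: Callable[[str], bool] = is_messy
                         ) -> Optional[str]:
    """Try each preset font library in turn and return the name of the
    first one whose encoded text has no messy code, or None."""
    for name, mapping in libraries.items():
        # Codes missing from the library become U+FFFD placeholders.
        text = "".join(mapping.get(code, "\ufffd") for code in codes)
        if not recognizer(text):
            return name
    return None
```

The server would then look up and store the download path of the library whose name is returned; when `None` is returned, none of the preset font libraries removes the messy code.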
In one embodiment, the correct font library download path providing module 903 in fig. 9 is specifically configured to:
Receiving a font inquiry request sent by the terminal device, the font inquiry request comprising: font information corresponding to the messy code text to be repaired.
Returning to the terminal device the download path of the correct font library corresponding to that font information, so that the terminal device downloads the correct font library according to the download path to repair the document.
In another embodiment, the correct font library download path providing module 903 in fig. 9 is specifically configured to:
After obtaining the download path of the correct font library, returning the download path of the correct font library to the terminal device.
In one embodiment, the messy code recognition sub-module 1002 of fig. 10 includes:
A rarely used word determining unit, configured to judge, for each piece of encoded text data obtained by encoding with a different font library, whether rarely used words exist.
In a specific implementation, if no rarely used word exists in any of the text data, it is determined that the document has no messy code text, and messy code recognition stops.
A rarely used word occupancy rate calculating unit, configured to calculate, after rarely used words are determined to exist, the occupancy rate of the rarely used words in each to-be-repaired messy code text according to the number of rarely used words in that text and the number of words in that text.
A messy code text judging unit, configured to judge whether the occupancy rate of the rarely used words in the to-be-repaired messy code text is greater than a first preset threshold and, if so, to recognize that the encoded text data contains messy code text data.
In a specific implementation, if the occupancy rate of the rarely used words in the to-be-repaired messy code text is not greater than the first preset threshold, the encoded text data is recognized as not containing messy code text data.
In one embodiment, the rarely used word determining unit in the messy code recognition sub-module 1002 in fig. 10 is specifically configured to:
Obtain the word frequency of each word in the messy code text to be repaired from a pre-stored word frequency table.
If the word frequency of a word is lower than a preset word frequency threshold for rarely used words, determine that the word is a rarely used word.
In one embodiment, the messy code text judging unit in the messy code recognition submodule 1002 in fig. 10 is specifically configured to:
Judging whether the occupancy rate of the rarely used words in the text of the messy codes to be repaired is larger than a first preset threshold value.
If the occupancy rate of the rarely used words in the messy code text to be repaired is larger than a first preset threshold value, recognizing that the encoded text data contains the messy code text data.
If the occupancy rate of the rarely used words in the messy code text to be repaired is not greater than a first preset threshold value, judging whether the occupancy rate of the rarely used words in the messy code text to be repaired is smaller than a second preset threshold value or not; the second preset threshold is smaller than the first preset threshold.
If the occupancy rate of the rarely used words in the messy code text to be repaired is smaller than a second preset threshold value, recognizing that the messy code text data does not exist in the encoded text data.
If the occupancy rate is not smaller than the second preset threshold, outputting to the user a prompt asking whether the document needs to be repaired.
After receiving a document repairing instruction input by a user, determining that the coded text data contains messy code text data.
In a fifth aspect, an embodiment of the present invention provides a terminal device, as shown in fig. 11, including a processor 1101, a communication interface 1102, a memory 1103 and a communication bus 1104, where the processor 1101, the communication interface 1102 and the memory 1103 complete communication with each other through the communication bus 1104.
A memory 1103 for storing a computer program.
The processor 1101 is configured to execute the program stored in the memory 1103, so that the terminal device performs the following steps:
Acquiring a document to be processed.
Parsing the document to be processed.
Extracting each piece of font information from the document to be processed.
Loading a corresponding font library for the text in the document to be processed according to each piece of font information, and encoding to obtain encoded text data.
Recognizing the encoded text data by using a preset messy code recognition algorithm.
If the encoded text data is recognized as containing messy code text data, obtaining font information corresponding to the messy code text from the document to be processed.
Uploading font information corresponding to the document to be processed and the messy code text to a server, so that the server determines a correct font library corresponding to the messy code text, and obtains and stores a download path of the correct font library.
Obtaining the download path of the correct font library from the server.
Downloading the correct font library from the network according to the download path.
Performing document messy code repair on the document to be processed by using the correct font library.
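The order of the steps above, from recognition to download, can be illustrated with the sketch below. The network and recognition operations are passed in as callables so the sketch stays self-contained; `repair_document`, its parameters, and the return shape are hypothetical and do not represent the interfaces of the actual terminal device.

```python
from typing import Callable, Dict, List


def repair_document(target_texts: Dict[str, str],
                    is_messy: Callable[[str], bool],
                    query_server: Callable[[List[str]], Dict[str, str]],
                    download: Callable[[str], bytes]) -> Dict[str, bytes]:
    """Return the downloaded font library data for each font whose
    encoded text was recognized as messy code.

    target_texts maps font information to the text encoded with it.
    """
    # Recognize messy code and collect the corresponding font information.
    messy_fonts = [font for font, text in target_texts.items()
                   if is_messy(text)]
    if not messy_fonts:
        return {}  # nothing to repair, the server is not contacted
    # Upload the font information and obtain the download paths.
    paths = query_server(messy_fonts)
    # Download each correct font library the server could determine.
    return {font: download(paths[font])
            for font in messy_fonts if font in paths}
```

The returned font library data would then be embedded into the document or installed on the terminal device, as in the two repair embodiments described earlier.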
The machine-readable storage medium may include RAM (Random Access Memory) or NVM (Non-Volatile Memory), such as at least one disk memory. Optionally, the machine-readable storage medium may be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, such as a CPU (Central Processing Unit) or an NP (Network Processor); it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In a sixth aspect, an embodiment of the present invention provides a server, as shown in fig. 12, including a processor 1201, a communication interface 1202, a memory 1203, and a communication bus 1204, where the processor 1201, the communication interface 1202, and the memory 1203 perform communication with each other through the communication bus 1204.
A memory 1203 for storing a computer program.
A processor 1201, configured to execute the program stored in the memory 1203, so that the server performs the following steps:
Receiving font information corresponding to a document to be processed and a messy code text to be repaired, which are uploaded by a terminal device.
Determining a correct font library corresponding to the messy code text, and obtaining and storing a download path of the correct font library.
Providing the download path of the correct font library to the terminal device, so that the terminal device downloads the correct font library according to the download path to repair the document.
It is noted that relational terms such as "first" and "second" are used herein solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The embodiments in this specification are described in a related manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the apparatus, terminal device and server embodiments are substantially similar to the method embodiments, they are described relatively briefly; for relevant details, refer to the description of the method embodiments.
The foregoing description covers only the preferred embodiments of the present application and is not intended to limit its scope. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application falls within the protection scope of the present application.

Claims (24)

1. A method for repairing a messy code of document content, which is applied to a terminal device, the method comprising:
Acquiring a document to be processed;
Analyzing the document to be processed;
extracting each font information from the document to be processed;
loading a corresponding font library for the text in the document to be processed according to the font information and encoding to obtain encoded text data;
Identifying the coded text data by adopting a preset messy code identification algorithm;
If the coded text data contains a messy code text, font information corresponding to the messy code text is obtained from the document to be processed;
uploading font information corresponding to the document to be processed and the messy code text to a server, so that the server determines a correct font library corresponding to the messy code text, and obtains and stores a download path of the correct font library; wherein the correct font library is: the font library that was loaded when, in identifying the pieces of encoded text data one by one, a piece of encoded text data without messy code was found; and each piece of encoded text data is obtained by: loading, one by one and according to the corresponding font information, different font libraries among a plurality of preset font libraries for each messy code text, and encoding;
Obtaining a download path of the correct font library from the server;
Downloading the correct font library from a network according to the downloading path;
Using the correct font library to carry out document messy code restoration on the document to be processed;
The step of obtaining the download path of the correct font library from the server includes:
Sending a font inquiry request to the server, the font inquiry request comprising: font information corresponding to the messy code text;
and receiving a downloading path of a correct font library corresponding to the font information corresponding to the messy code text returned by the server.
2. The method according to claim 1, wherein before the step of loading and encoding a corresponding font library for text in a document to be processed based on the respective font information to obtain encoded text data, the method comprises:
Classifying all texts in the document to be processed according to the font information, and taking texts corresponding to the font information as target texts respectively;
determining the number of words in each target text;
The step of loading a corresponding font library for the text in the document to be processed according to the font information and encoding to obtain encoded text data comprises the following steps:
loading a corresponding font library for each target text according to each font information, and coding to obtain coded text data corresponding to the target text;
the step of identifying the coded text data by adopting a preset messy code identification algorithm comprises the following steps:
Judging whether the coded text data has rarely used words;
If rarely used words exist, calculating the occupancy rate of the rarely used words in each target text according to the number of rarely used words in that target text and the number of words in that target text;
Judging whether the occupancy rate of the rarely used words in the target text is greater than a first preset threshold;
and if the occupancy rate of the rarely used words in the target text is greater than the first preset threshold, recognizing that the encoded text data contains messy code text data.
3. The method of claim 2, wherein the step of determining whether the encoded text data has a rarely used word comprises:
obtaining the word frequency of each target word from a pre-stored word frequency table;
and if the word frequency of the target word is lower than the word frequency threshold of the preset rarely used word, determining that the target word is the rarely used word.
4. The method of claim 2, further comprising, after determining that the occupancy rate of the rarely used words in the target text is not greater than the first preset threshold:
Judging whether the occupancy rate of the rarely used words in the target text is smaller than a second preset threshold, the second preset threshold being smaller than the first preset threshold;
if the occupancy rate of the rarely used words in the target text is smaller than the second preset threshold, identifying that no messy code text data exists in the encoded text data;
if the occupancy rate is not smaller than the second preset threshold, outputting to a user a prompt asking whether the document needs to be repaired;
After receiving a document repair instruction input by the user, determining that the coded text data contains messy code text data;
and executing the step of obtaining font information corresponding to the messy code text from the document to be processed.
5. The method of claim 1, wherein the step of obtaining the download path of the correct font library from the server comprises:
and receiving the download path of the correct font library returned by the server after obtaining the download path of the correct font library.
6. The method of claim 1, wherein the step of using the correct font library to carry out document messy code restoration on the document to be processed comprises:
embedding the correct font library into the document to be processed, and repairing the document messy codes.
7. The method of claim 1, wherein the step of using the correct font library to carry out document messy code restoration on the document to be processed comprises:
Installing the correct font library on the terminal device;
and encoding the document to be processed by using the correct font library, and repairing the document messy codes.
8. A method for repairing a messy code of document contents, which is applied to a server, the method comprising:
receiving font information corresponding to a document to be processed and a messy code text to be repaired, which are uploaded by a terminal device;
According to the corresponding font information, loading different font libraries in a plurality of preset font libraries for encoding each to-be-repaired messy code text one by one, and respectively obtaining encoded text data of the to-be-repaired messy code text;
Respectively identifying each piece of coded text data obtained by using codes of different font libraries by adopting a preset messy code identification algorithm until the coded text data has no messy code;
Determining a font library loaded when the text data does not have messy codes as a correct font library corresponding to the messy code text to be repaired, and obtaining and storing a downloading path of the correct font library;
Providing a downloading path of the correct font library for terminal equipment, so that the terminal equipment downloads the correct font library according to the downloading path to repair a document;
The step of providing the downloading path of the correct font library for the terminal equipment so that the terminal equipment downloads the correct font library according to the downloading path to repair the document comprises the following steps:
After obtaining the download path of the correct font library, returning the download path of the correct font library to the terminal equipment;
The step of providing the downloading path of the correct font library for the terminal equipment so that the terminal equipment downloads the correct font library according to the downloading path to repair the document comprises the following steps:
Receiving a font inquiry request sent by the terminal device, the font inquiry request comprising: font information corresponding to the messy code text to be repaired;
and returning a downloading path of a correct font library corresponding to the font information corresponding to the messy code text to be repaired to the terminal equipment, so that the terminal equipment downloads the correct font library according to the downloading path to repair the document.
9. The method of claim 8, wherein the step of identifying each piece of coded text data by adopting a preset messy code identification algorithm until the coded text data has no messy code comprises:
Judging whether the coded text data has rarely used words;
if rarely used words exist, calculating the occupancy rate of the rarely used words in each to-be-repaired messy code text according to the number of rarely used words in that text and the number of words in that text;
Judging whether the occupancy rate of the rarely used words in the messy code text to be repaired is greater than a first preset threshold;
and if the occupancy rate of the rarely used words in the messy code text to be repaired is greater than the first preset threshold, recognizing that the encoded text data contains messy code text data.
10. The method of claim 9, wherein the step of judging whether any uncommon word exists in the encoded text data comprises:
obtaining the word frequency of each word of the messy code text to be repaired from a pre-stored word frequency table;
and if the word frequency of a word is lower than a preset uncommon-word frequency threshold, determining that the word is an uncommon word.
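The checks in claims 9 and 10 can be illustrated with a minimal Python sketch. This is not the patented implementation: it assumes that a "word" is a single character, that a character missing from the frequency table has frequency zero, and that the table and both thresholds are supplied by the caller. The function names `is_uncommon`, `uncommon_ratio`, and `contains_messy_code` are hypothetical.

```python
from typing import Mapping, Sequence


def is_uncommon(word: str, freq_table: Mapping[str, float], freq_threshold: float) -> bool:
    """Return True when the word's frequency is below the uncommon-word threshold."""
    # Assumption: a word missing from the table is treated as frequency 0.0.
    return freq_table.get(word, 0.0) < freq_threshold


def uncommon_ratio(
    words: Sequence[str], freq_table: Mapping[str, float], freq_threshold: float
) -> float:
    """Proportion of uncommon words among all words of one text."""
    if not words:
        return 0.0  # assumption: an empty text has no uncommon words
    count = sum(1 for w in words if is_uncommon(w, freq_table, freq_threshold))
    return count / len(words)


def contains_messy_code(
    words: Sequence[str],
    freq_table: Mapping[str, float],
    freq_threshold: float,
    first_threshold: float,
) -> bool:
    """Flag the text as messy code when the proportion exceeds the first threshold."""
    return uncommon_ratio(words, freq_table, freq_threshold) > first_threshold
```

With a table that lists only common characters, a text rendered through a wrong font library tends to produce a high proportion and is flagged, while ordinary text stays below the threshold.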
11. The method of claim 9, further comprising, after judging that the proportion of uncommon words in the messy code text to be repaired is not greater than the first preset threshold:
judging whether the proportion of uncommon words in the messy code text to be repaired is smaller than a second preset threshold, the second preset threshold being smaller than the first preset threshold;
if the proportion of uncommon words in the messy code text to be repaired is smaller than the second preset threshold, identifying that no messy code text data exists in the encoded text data;
if the proportion is not smaller than the second preset threshold, outputting to a user a prompt asking whether the document needs to be repaired;
and after receiving a document repair instruction input by the user, determining that the encoded text data contains messy code text data.
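The two-threshold decision of claims 9 and 11 can be sketched as a three-way branch. The sketch below is illustrative only. The names `Verdict`, `classify`, and `ask_user` are hypothetical, and treating a declined prompt as "no messy code" is an assumption, since the claim only states what happens when the user does request a repair.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    MESSY = "messy"
    CLEAN = "clean"


def classify(
    ratio: float,
    first_threshold: float,
    second_threshold: float,
    ask_user: Callable[[], bool],
) -> Verdict:
    """Decide whether a text contains messy code from its uncommon-word proportion."""
    if second_threshold >= first_threshold:
        raise ValueError("second threshold must be smaller than the first")
    if ratio > first_threshold:
        return Verdict.MESSY  # clearly above the upper threshold
    if ratio < second_threshold:
        return Verdict.CLEAN  # clearly below the lower threshold
    # Ambiguous band: prompt the user; ask_user returns True for "repair".
    # Assumption: a declined prompt is treated as "no messy code".
    return Verdict.MESSY if ask_user() else Verdict.CLEAN
```

The user is consulted only for proportions that fall between the two thresholds, so clear cases are resolved without interruption.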
12. A document content messy code repairing apparatus, applied to a terminal device, comprising:
a to-be-processed document acquisition module, configured to acquire a document to be processed;
an analysis module, configured to parse the document to be processed;
a font information extraction module, configured to extract each piece of font information from the document to be processed;
a text encoding module, configured to load, according to the font information, a corresponding font library for the text in the document to be processed and to encode the text to obtain encoded text data;
a messy code identification module, configured to identify the encoded text data using a preset messy code identification algorithm;
a messy code text font information obtaining module, configured to obtain, from the document to be processed, the font information corresponding to the messy code text if the encoded text data is identified as containing messy code text;
an uploading module, configured to upload the document to be processed and the font information corresponding to the messy code text to a server, so that the server determines a correct font library corresponding to the messy code text and obtains a download path of the correct font library; wherein the correct font library is the font library that was loaded when, in identifying each piece of encoded text data, a piece of encoded text data free of messy code was found; and each piece of encoded text data is obtained by loading, for each messy code text and according to its corresponding font information, different font libraries among a plurality of preset font libraries one by one and encoding the text;
a messy code repairing module, configured to obtain the download path of the correct font library from the server, download the correct font library from a network according to the download path, and repair the messy code of the document to be processed using the correct font library;
wherein the messy code repairing module obtains the download path of the correct font library by:
sending a font inquiry request to the server, the font inquiry request including the font information corresponding to the messy code text;
and receiving the download path, returned by the server, of the correct font library corresponding to the font information corresponding to the messy code text.
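The order of the terminal-side steps in claim 12 (upload, inquiry, download, repair) can be sketched as follows. This is a minimal sketch under stated assumptions, not the apparatus itself: the `FontServer` interface, its method names, and the `download` and `repair` callables are all hypothetical stand-ins for the uploading module, the font inquiry request, and the messy code repairing module.

```python
from typing import Callable, Iterable, Optional, Protocol


class FontServer(Protocol):
    """Hypothetical server interface; the method names are illustrative only."""

    def upload(self, document: bytes, font_info: str) -> None: ...

    def query_download_path(self, font_info: str) -> Optional[str]: ...


def repair_flow(
    document: bytes,
    messy_font_infos: Iterable[str],
    server: FontServer,
    download: Callable[[str], bytes],
    repair: Callable[[bytes, str, bytes], bytes],
) -> bytes:
    """Upload, query the download path, download the font, and repair, per font info."""
    original = document
    for font_info in messy_font_infos:
        # Upload the document together with the font info of the messy text.
        server.upload(original, font_info)
        # Font inquiry request: ask for the download path of the correct library.
        path = server.query_download_path(font_info)
        if path is None:
            continue  # assumption: skip texts for which no library was found
        font_bytes = download(path)
        document = repair(document, font_info, font_bytes)
    return document
```

Passing the server, downloader, and repair step in as parameters keeps the sketch independent of any particular network protocol or document format.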
13. The apparatus of claim 12, further comprising a text classification module;
the text classification module is configured to classify all text in the document to be processed according to the font information, take the text corresponding to each piece of font information as a target text, and determine the number of words in each target text;
the text encoding module is specifically configured to:
load, according to each piece of font information, a corresponding font library for each target text and encode the target text to obtain encoded text data corresponding to the target text;
the messy code identification module comprises:
an uncommon word determining sub-module, configured to judge whether any uncommon word exists in the encoded text data;
an uncommon word proportion calculation sub-module, configured to calculate, when uncommon words exist, the proportion of uncommon words in each target text according to the number of uncommon words in that target text and the number of words in that target text;
and a messy code text judging sub-module, configured to judge whether the proportion of uncommon words in the target text is greater than a first preset threshold and, if so, to identify that the encoded text data contains messy code text data.
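The per-font classification of claim 13 can be sketched as grouping text runs by font information and evaluating each group separately. The sketch is illustrative: it assumes the document has already been parsed into `(font_info, text)` pairs, that a "word" is a single non-whitespace character, and that characters missing from the frequency table have frequency zero. The names `group_by_font` and `messy_fonts` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def group_by_font(runs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect the characters of all runs that share the same font information."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for font_info, text in runs:
        groups[font_info].extend(ch for ch in text if not ch.isspace())
    return dict(groups)


def messy_fonts(
    runs: Iterable[Tuple[str, str]],
    freq_table: Mapping[str, float],
    freq_threshold: float,
    first_threshold: float,
) -> List[str]:
    """Return the font infos whose target text exceeds the uncommon-word proportion."""
    flagged = []
    for font_info, words in group_by_font(runs).items():
        if not words:
            continue  # nothing to evaluate for this font
        uncommon = sum(1 for w in words if freq_table.get(w, 0.0) < freq_threshold)
        if uncommon / len(words) > first_threshold:
            flagged.append(font_info)
    return flagged
```

Evaluating each font group on its own keeps a small garbled passage from being diluted by a large amount of correctly rendered text in another font.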
14. The apparatus of claim 13, wherein the uncommon word determining sub-module is specifically configured to:
obtain the word frequency of each word of the target text from a pre-stored word frequency table;
and if the word frequency of a word is lower than a preset uncommon-word frequency threshold, determine that the word is an uncommon word.
15. The apparatus of claim 13, wherein the messy code text judging sub-module is further configured to, after judging that the proportion of uncommon words in the target text is not greater than the first preset threshold:
judge whether the proportion of uncommon words in the target text is smaller than a second preset threshold, the second preset threshold being smaller than the first preset threshold;
if the proportion of uncommon words in the target text is smaller than the second preset threshold, identify that no messy code text data exists in the encoded text data;
if the proportion is not smaller than the second preset threshold, output to a user a prompt asking whether the document needs to be repaired;
after receiving a document repair instruction input by the user, determine that the encoded text data contains messy code text data;
and trigger the messy code text font information obtaining module.
16. The apparatus of claim 12, wherein the messy code repairing module obtains the download path of the correct font library by:
receiving the download path of the correct font library, which the server returns after obtaining it, and downloading the correct font library from a network according to the download path.
17. The apparatus of claim 12, wherein the messy code repairing module repairs the messy code of the document to be processed by:
embedding the correct font library into the document to be processed to repair the messy code of the document.
18. The apparatus of claim 12, wherein the messy code repairing module repairs the messy code of the document to be processed by:
installing the correct font library on the terminal device;
and encoding the document to be processed using the correct font library to repair the messy code of the document.
19. A document content messy code repairing apparatus, applied to a server, comprising:
a receiving module, configured to receive a document to be processed and font information corresponding to the messy code text to be repaired, both uploaded by a terminal equipment;
a correct font library determining module, configured to determine a correct font library corresponding to the messy code text to be repaired, and to obtain and store a download path of the correct font library;
a correct font library download path providing module, configured to provide the download path of the correct font library to the terminal equipment, so that the terminal equipment downloads the correct font library according to the download path to repair the document;
wherein the correct font library determining module comprises:
a font library encoding sub-module, configured to load, one by one and according to the corresponding font information, different font libraries among a plurality of preset font libraries for each messy code text to be repaired and to encode it, obtaining a piece of encoded text data of the messy code text to be repaired for each font library;
a messy code identification sub-module, configured to identify, using a preset messy code identification algorithm, each piece of encoded text data obtained with the different font libraries, until a piece of encoded text data free of messy code is found;
a correct font library determining sub-module, configured to determine the font library loaded when the encoded text data is free of messy code as the correct font library corresponding to the messy code text to be repaired, and to obtain and store the download path of the correct font library;
the correct font library download path providing module is specifically configured to:
return the download path of the correct font library to the terminal equipment after the download path is obtained;
the correct font library download path providing module is specifically configured to:
receive a font inquiry request sent by the terminal equipment, the font inquiry request including font information corresponding to the messy code text to be repaired;
and return, to the terminal equipment, the download path of the correct font library corresponding to the font information corresponding to the messy code text to be repaired, so that the terminal equipment downloads the correct font library according to the download path to repair the document.
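The server-side search of claim 19 tries candidate font libraries one by one until the re-encoded text is free of messy code, then stores a download path that a later font inquiry can retrieve. The sketch below is a simplified model and not the patented implementation: a font library is represented as a plain mapping from glyph codes to characters, the messy code check is passed in as a callable, and the names `render`, `find_correct_library`, and `DownloadPathStore` are hypothetical.

```python
from typing import Callable, Dict, List, Mapping, Optional, Sequence

# Hypothetical model: a font library maps glyph codes to characters.
FontLibrary = Mapping[int, str]


def render(codes: Sequence[int], library: FontLibrary) -> List[str]:
    """Encode the text with one library; unmapped codes become U+FFFD."""
    return [library.get(code, "\ufffd") for code in codes]


def find_correct_library(
    codes: Sequence[int],
    candidates: Mapping[str, FontLibrary],
    is_messy: Callable[[List[str]], bool],
) -> Optional[str]:
    """Load candidate libraries one by one until the result has no messy code."""
    for name, library in candidates.items():
        if not is_messy(render(codes, library)):
            return name
    return None  # no preset library produced clean text


class DownloadPathStore:
    """Stores download paths keyed by font info and answers font inquiries."""

    def __init__(self, library_paths: Mapping[str, str]) -> None:
        self._library_paths: Dict[str, str] = dict(library_paths)
        self._by_font_info: Dict[str, str] = {}

    def record(self, font_info: str, library_name: str) -> None:
        self._by_font_info[font_info] = self._library_paths[library_name]

    def query(self, font_info: str) -> Optional[str]:
        return self._by_font_info.get(font_info)
```

Storing the result under the font information means a repeated inquiry for the same font can be answered without repeating the search.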
20. The apparatus of claim 19, wherein the messy code identification sub-module comprises:
an uncommon word determining unit, configured to judge, for each piece of encoded text data obtained with the different font libraries, whether any uncommon word exists in it;
an uncommon word proportion calculation unit, configured to calculate, after uncommon words are determined to exist, the proportion of uncommon words in each messy code text to be repaired according to the number of uncommon words in that text and the number of words in that text;
and a messy code text judging unit, configured to judge whether the proportion of uncommon words in the messy code text to be repaired is greater than a first preset threshold and, if so, to identify that the encoded text data contains messy code text data.
21. The apparatus of claim 20, wherein the uncommon word determining unit is specifically configured to:
obtain the word frequency of each word of the messy code text to be repaired from a pre-stored word frequency table;
and if the word frequency of a word is lower than a preset uncommon-word frequency threshold, determine that the word is an uncommon word.
22. The apparatus of claim 20, wherein the messy code text judging unit is specifically configured to:
after judging that the proportion of uncommon words in the messy code text to be repaired is not greater than the first preset threshold, judge whether the proportion is smaller than a second preset threshold, the second preset threshold being smaller than the first preset threshold;
if the proportion of uncommon words in the messy code text to be repaired is smaller than the second preset threshold, identify that no messy code text data exists in the encoded text data;
if the proportion is not smaller than the second preset threshold, output to a user a prompt asking whether the document needs to be repaired;
and after receiving a document repair instruction input by the user, determine that the encoded text data contains messy code text data.
23. A terminal device comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor, the machine-executable instructions causing the processor to carry out the method steps of any one of claims 1-7.
24. A server comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor, the machine-executable instructions causing the processor to carry out the method steps of any one of claims 8-11.
CN201810782438.8A 2018-07-17 2018-07-17 Document content messy code repairing method and device, terminal equipment and server Active CN110728111B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810782438.8A CN110728111B (en) 2018-07-17 2018-07-17 Document content messy code repairing method and device, terminal equipment and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810782438.8A CN110728111B (en) 2018-07-17 2018-07-17 Document content messy code repairing method and device, terminal equipment and server

Publications (2)

Publication Number Publication Date
CN110728111A CN110728111A (en) 2020-01-24
CN110728111B true CN110728111B (en) 2024-06-25

Family

ID=69217009

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810782438.8A Active CN110728111B (en) 2018-07-17 2018-07-17 Document content messy code repairing method and device, terminal equipment and server

Country Status (1)

Country Link
CN (1) CN110728111B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113850050B (en) * 2020-06-28 2022-09-23 荣耀终端有限公司 Character display method, character display device and terminal equipment
CN113051235A (en) * 2021-04-22 2021-06-29 平安普惠企业管理有限公司 Document loading method and device, terminal and storage medium
CN113743051A (en) * 2021-08-10 2021-12-03 广州坚和网络科技有限公司 Font setting method, user terminal, server and system
CN114218318B (en) * 2022-02-21 2022-05-17 国网山东省电力公司乳山市供电公司 Data processing system and method for electric power big data
CN115086423B (en) * 2022-05-18 2024-06-18 深圳市科陆电子科技股份有限公司 Data transmission method, data transmission device, computer device, and storage medium
CN115622997B (en) * 2022-10-27 2023-07-04 广东保伦电子股份有限公司 Method, device and storage medium for sharing host font library

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1924866A (en) * 2006-09-28 2007-03-07 北京理工大学 Static feature based web page malicious scenarios detection method
CN104424165A (en) * 2013-09-06 2015-03-18 北大方正集团有限公司 Messy code detection method and system for text documents
CN106598923A (en) * 2016-12-26 2017-04-26 北京致远互联软件股份有限公司 Online document format conversion method and apparatus based on font object library loading

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4363533B2 (en) * 2007-01-31 2009-11-11 インターナショナル・ビジネス・マシーンズ・コーポレーション Apparatus, method, and program for detecting garbled characters
CN103150293B (en) * 2011-12-06 2017-06-06 富泰华工业(深圳)有限公司 The method that the electronic installation of mess code recovery can be carried out and recover mess code
CN103425257B (en) * 2012-05-24 2017-03-15 北京搜狗科技发展有限公司 A kind of reminding method of uncommon character information and device
CN104732228B (en) * 2015-04-16 2018-03-30 同方知网数字出版技术股份有限公司 A kind of detection of PDF document mess code, the method for correction
CN106845159A (en) * 2015-12-03 2017-06-13 福建福昕软件开发股份有限公司 A kind of PDF texts mess code method
CN106874263A (en) * 2017-01-17 2017-06-20 中译语通科技(北京)有限公司 A kind of Sino-British corpus proofreading method based on multi-dimensional data analysis and semanteme

Also Published As

Publication number Publication date
CN110728111A (en) 2020-01-24

Similar Documents

Publication Publication Date Title
CN110728111B (en) Document content messy code repairing method and device, terminal equipment and server
CN110795258B (en) Font library matching method, device and equipment
CN110020424B (en) Contract information extraction method and device and text information extraction method
CN107122342B (en) Text code recognition method and device
US20170364861A1 (en) Method and apparatus for processing logistics information
CN111859919A (en) Text error correction model training method and device, electronic equipment and storage medium
CN107085568B (en) Text similarity distinguishing method and device
CN111831920A (en) User demand analysis method and device, computer equipment and storage medium
AU2012201539B2 (en) Systems and methods for processing documents of unknown or unspecified format
CN111339166A (en) Word stock-based matching recommendation method, electronic device and storage medium
CN110019640B (en) Secret-related file checking method and device
CN111273891A (en) Business decision method and device based on rule engine and terminal equipment
CN108052686B (en) Abstract extraction method and related equipment
CN108804487A (en) A kind of method and device of extraction target character
CN106651972B (en) Binary image coding and decoding methods and devices
US9658989B2 (en) Apparatus and method for extracting and manipulating the reading order of text to prepare a display document for analysis
CN114743012B (en) Text recognition method and device
CN110728115B (en) Document content messy code identification method and device and electronic equipment
CN114065762A (en) Text information processing method, device, medium and equipment
CN110191124B (en) Web front-end development data-based website identification method and device and storage equipment
CN109661779A (en) Method and system for compressed data
CN117827111A (en) Data compression method and related device
JP2018195272A (en) Information extraction device
CN113627129B (en) Text copying method and device, electronic equipment and readable storage medium
CN111695327B (en) Method and device for repairing messy codes, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant