CN110728111A - Messy code repairing method and device for document content, terminal equipment and server - Google Patents

Info

Publication number
CN110728111A
Authority
CN
China
Prior art keywords
text
document
messy code
font
font library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810782438.8A
Other languages
Chinese (zh)
Other versions
CN110728111B (en)
Inventor
冷志峰
张作兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Office Software Inc
Zhuhai Kingsoft Office Software Co Ltd
Guangzhou Jinshan Mobile Technology Co Ltd
Original Assignee
Beijing Kingsoft Office Software Inc
Zhuhai Kingsoft Office Software Co Ltd
Guangzhou Jinshan Mobile Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Office Software Inc, Zhuhai Kingsoft Office Software Co Ltd and Guangzhou Jinshan Mobile Technology Co Ltd
Priority to CN201810782438.8A
Publication of CN110728111A
Application granted
Publication of CN110728111B
Legal status: Active

Landscapes

  • Document Processing Apparatus (AREA)

Abstract

The invention relates to the technical field of word processing, and in particular to a method and a device for repairing messy code in document content, a terminal device and a server. The method is applied to a terminal device and comprises the following steps: acquiring a document to be processed, extracting font information from it, loading the corresponding font library for the text in the document according to the font information, and encoding the text to obtain encoded text data; recognizing the encoded text data with a preset messy code recognition algorithm; if the encoded text data is recognized as containing messy code, obtaining the font information corresponding to the messy code text from the document and uploading the document and that font information to a server, so that the server determines the correct font library corresponding to the messy code text and obtains and stores its download path; and obtaining the download path from the server, downloading the correct font library, and using it to perform messy code repair on the document to be processed. The method and device can automatically repair messy code caused by a mismatch between the font library and the document content.

Description

Messy code repairing method and device for document content, terminal equipment and server
Technical Field
The invention relates to the technical field of word processing, in particular to a method and a device for repairing messy codes of document contents, terminal equipment and a server.
Background
Messy code (garbled text) refers to characters that a terminal device cannot display correctly and that are shown as other, meaningless characters. Text content in a document carries corresponding font information, and it is displayed correctly only when the terminal device finds the correct local font library to parse it. If the terminal device lacks the font library version corresponding to the text content, the content is parsed with the wrong font library, which produces messy code. In word processing on terminal devices, messy code caused by font mismatch occurs frequently and degrades the user's reading experience.
At present, document messy code caused by a font library mismatch is handled by importing as many font libraries as possible into the terminal device and having the user check by eye whether the messy code has disappeared.
However, this repair method has the following problems: 1) the repair relies on manual operation by the user, who must judge by eye whether the messy code has disappeared, so it is inefficient when many documents are processed; 2) because the method cannot identify which font the terminal device lacks, a large number of font libraries must be imported, and if none of the imported font libraries matches the text content, the messy code still cannot be repaired.
Disclosure of Invention
The embodiments of the present invention aim to provide a method and a device for repairing messy code in document content, a terminal device and a server, so that messy code caused by a mismatch between the font library and the document content can be repaired automatically. The specific technical solutions are as follows:
In a first aspect, an embodiment of the present invention provides a method for repairing messy code in document content, where the method is applied to a terminal device and includes:
acquiring a document to be processed;
analyzing the document to be processed;
extracting each font information from the document to be processed;
loading a corresponding font library for the text in the document to be processed according to the font information, and encoding the text to obtain encoded text data;
recognizing the coded text data by adopting a preset messy code recognition algorithm;
if the coded text data is identified to contain messy code text data, font information corresponding to the messy code text is obtained from the document to be processed;
uploading the document to be processed and the font information corresponding to the messy code text to a server, so that the server determines a correct font library corresponding to the messy code text, and obtains and stores a download path of the correct font library;
obtaining a download path of the correct font library from the server;
downloading the correct font library from the network according to the downloading path;
and using the correct font library to perform messy code repair on the document to be processed.
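The terminal-side steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `TextRun`, `repair_document` and the three callables passed in are hypothetical names, and the server query is simplified to a lookup keyed by font name (the method itself uploads the document together with the font information).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class TextRun:
    """Hypothetical unit of document text sharing one piece of font information."""
    font_name: str  # font information extracted from the document
    text: str       # encoded text data obtained with the locally loaded font library


def repair_document(
    runs: List[TextRun],
    looks_garbled: Callable[[str], bool],
    query_download_path: Callable[[str], Optional[str]],
    download_font: Callable[[str], bytes],
) -> Dict[str, bytes]:
    """Return the downloaded correct font libraries, keyed by font name."""
    fonts: Dict[str, bytes] = {}
    for run in runs:
        # Skip fonts already fetched and text that is not recognized as messy code.
        if run.font_name in fonts or not looks_garbled(run.text):
            continue
        # Ask the (hypothetical) server interface for the download path.
        path = query_download_path(run.font_name)
        if path is not None:
            fonts[run.font_name] = download_font(path)
    return fonts
```

The returned font libraries would then be embedded in the document or installed on the terminal device, which are the two repair options described later in this aspect.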
Optionally, before the step of loading a corresponding font library for the text in the document to be processed according to the font information and encoding the text to obtain encoded text data, the method includes:
classifying all text in the document to be processed according to the font information, and taking the text corresponding to each piece of font information as a target text;
determining the number of target characters in each target text;
the step of loading a corresponding font library for the text in the document to be processed according to the font information and encoding the text to obtain encoded text data comprises the following steps:
loading a corresponding font library for each target text according to the font information, and encoding the target text to obtain encoded text data corresponding to that target text;
the step of adopting a preset messy code recognition algorithm to recognize the coded text data comprises the following steps:
judging whether the coded text data contains uncommon words or not;
if uncommon words exist, calculating the occupancy rate of the uncommon words in each target text according to the number of uncommon words and the number of target characters in that target text;
judging whether the occupancy rate of the uncommon word in the target text is greater than a first preset threshold value;
and if the occupancy rate of the uncommon word in the target text is greater than a first preset threshold value, recognizing that the encoded text data contains messy code text data.
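The classification and occupancy-rate check described above can be illustrated with a short sketch. The function names, the choice to ignore whitespace when counting characters, and the default threshold value of 0.3 are assumptions made for illustration; the source does not fix a value for the first preset threshold.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple


def group_by_font(runs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Classify text by font information: one target text per font name."""
    grouped = defaultdict(list)
    for font_name, text in runs:
        grouped[font_name].append(text)
    return {font: "".join(parts) for font, parts in grouped.items()}


def uncommon_rate(text: str, is_uncommon: Callable[[str], bool]) -> float:
    """Occupancy rate = number of uncommon characters / number of target characters."""
    chars = [c for c in text if not c.isspace()]  # assumption: whitespace is not counted
    if not chars:
        return 0.0
    return sum(1 for c in chars if is_uncommon(c)) / len(chars)


def contains_messy_code(
    text: str,
    is_uncommon: Callable[[str], bool],
    first_threshold: float = 0.3,  # hypothetical value for the first preset threshold
) -> bool:
    """Messy code is recognized only when the rate is strictly greater than the threshold."""
    return uncommon_rate(text, is_uncommon) > first_threshold
```

Computing the rate per target text rather than per document keeps a small garbled passage in one font from being diluted by a large amount of correctly displayed text in another.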
Optionally, the step of determining whether a rarely-used word exists in the encoded text data includes:
acquiring the word frequency of each target character from a pre-stored word frequency table;
and if the word frequency of the target word is lower than a preset word frequency threshold value of the uncommon word, determining that the target word is the uncommon word.
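A word frequency lookup of this kind might look like the sketch below. The table contents, the threshold value, and the decision to treat characters missing from the table as having frequency zero are all assumptions; the source only states that the table is pre-stored and that a frequency below the threshold marks a character as uncommon.

```python
from typing import Callable, Mapping


def make_uncommon_test(
    freq_table: Mapping[str, float],
    freq_threshold: float,
) -> Callable[[str], bool]:
    """Build a predicate that flags a character as uncommon via a pre-stored table."""

    def is_uncommon(char: str) -> bool:
        # Assumption: a character absent from the table has frequency 0.
        return freq_table.get(char, 0.0) < freq_threshold

    return is_uncommon


# Usage with made-up frequencies (illustrative values only).
is_uncommon = make_uncommon_test({"的": 0.04, "龘": 1e-7}, freq_threshold=1e-5)
```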
Optionally, after determining that the occupancy rate of the uncommon word in the target text is not greater than a first preset threshold, the method further includes:
judging whether the occupancy rate of the uncommon word in the target text is smaller than a second preset threshold value; the second preset threshold is smaller than the first preset threshold;
if the occupancy rate of the uncommon word in the target text is smaller than a second preset threshold value, recognizing that no messy code text data exists in the coded text data;
if the occupancy rate is not less than the second preset threshold, outputting to the user a prompt asking whether the document needs to be repaired;
after receiving an instruction input by the user indicating that the document needs to be repaired, determining that the encoded text data contains messy code text data;
and executing the step of obtaining font information corresponding to the messy code text from the document to be processed.
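The two-threshold decision above amounts to a three-way classification, sketched here. `Verdict` and `classify_rate` are hypothetical names, and the threshold values are supplied by the caller because the source does not specify them.

```python
from enum import Enum


class Verdict(Enum):
    MESSY = "messy"        # rate > first threshold: messy code recognized
    CLEAN = "clean"        # rate < second threshold: no messy code
    ASK_USER = "ask_user"  # in between: prompt the user to decide


def classify_rate(rate: float, first_threshold: float, second_threshold: float) -> Verdict:
    """Apply the first and second preset thresholds to an occupancy rate."""
    if second_threshold >= first_threshold:
        raise ValueError("the second threshold must be smaller than the first")
    if rate > first_threshold:
        return Verdict.MESSY
    if rate < second_threshold:
        return Verdict.CLEAN
    # second_threshold <= rate <= first_threshold: neither test is conclusive.
    return Verdict.ASK_USER
```

Rates that fall exactly on either threshold land in the `ASK_USER` band, because the first comparison is "greater than" and the second is "smaller than".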
Optionally, the step of obtaining a download path of the correct font library from the server includes:
sending a font query request to the server; the font inquiry request comprises: font information corresponding to the messy code text;
and receiving a download path of a correct font library corresponding to the font information corresponding to the messy code text returned by the server.
Optionally, the step of obtaining a download path of the correct font library from the server includes:
and receiving the download path of the correct font library returned by the server after the download path of the correct font library is obtained.
Optionally, the step of performing document messy code restoration on the document to be processed by using the correct font library includes:
and embedding the correct font library into the document to be processed, and performing messy code repair on the document.
Optionally, the step of performing document messy code restoration on the document to be processed by using the correct font library includes:
installing the correct font library on the terminal equipment;
and coding the document to be processed by using the correct font library, and repairing the messy code of the document.
In a second aspect, an embodiment of the present invention provides a method for repairing a scrambled code of a document content, which is applied to a server, and the method includes:
receiving the document to be processed and the font information corresponding to the messy code text to be repaired, both uploaded by the terminal device;
determining a correct font library corresponding to the messy code text, and obtaining and storing a download path of the correct font library;
and providing a download path of the correct font library for the terminal equipment so that the terminal equipment downloads the correct font library according to the download path to repair the document.
Optionally, the step of determining a correct font library corresponding to the scrambled text, and obtaining and storing a download path of the correct font library includes:
loading different font libraries in a plurality of preset font libraries for each messy code text to be repaired one by one according to the corresponding font information, and coding to obtain coded text data of the messy code text to be repaired respectively;
respectively identifying each coded text data obtained by using different font library codes by adopting a preset messy code identification algorithm until the coded text data has no messy codes;
and determining the font library loaded when the messy codes do not exist in the text data as a correct font library corresponding to the messy code text to be repaired, and obtaining and storing a downloading path of the correct font library.
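The server-side search above can be sketched as a loop over candidate font libraries. All names here are hypothetical: `encode_with` stands for "encode the messy code text with the named font library", `path_of` for a mapping from font library to download path, and `store` for wherever the server keeps the result.

```python
from typing import Callable, Dict, Iterable, Mapping, Optional


def find_correct_font(
    encode_with: Callable[[str], str],
    candidate_names: Iterable[str],
    looks_garbled: Callable[[str], bool],
) -> Optional[str]:
    """Try preset font libraries one by one until the encoded text has no messy code."""
    for name in candidate_names:
        if not looks_garbled(encode_with(name)):
            return name
    return None  # no preset font library produced clean text


def resolve_and_store(
    font_info: str,
    encode_with: Callable[[str], str],
    candidate_names: Iterable[str],
    looks_garbled: Callable[[str], bool],
    path_of: Mapping[str, str],
    store: Dict[str, str],
) -> Optional[str]:
    """Store and return the download path of the correct font library, if one is found."""
    name = find_correct_font(encode_with, candidate_names, looks_garbled)
    if name is None:
        return None
    store[font_info] = path_of[name]
    return store[font_info]
```

Keying the stored path by the uploaded font information lets a later font query request be answered by lookup, without repeating the search.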
Optionally, the step of providing a download path of the correct font library for the terminal device so that the terminal device downloads the correct font library according to the download path to perform document repairing includes:
receiving a font query request sent by terminal equipment; the font inquiry request comprises: font information corresponding to the messy code text to be repaired;
and returning a downloading path of a correct font library corresponding to the font information corresponding to the messy code text to be repaired to the terminal equipment, so that the terminal equipment downloads the correct font library according to the downloading path to repair the document.
Optionally, the step of providing a download path of the correct font library for the terminal device so that the terminal device downloads the correct font library according to the download path to perform document repairing includes:
and after the downloading path of the correct font library is obtained, returning the downloading path of the correct font library to the terminal equipment.
Optionally, the step of identifying the encoded text data by using a preset messy code identification algorithm until the encoded text data has no messy code includes:
judging whether the coded text data contains uncommon words or not;
if uncommon words exist, calculating the occupancy rate of the uncommon words in each messy code text to be repaired according to the number of uncommon words in that text and the number of characters in that text;
judging whether the occupancy rate of the uncommon word in the messy code text to be repaired is greater than a first preset threshold value;
and if the occupancy rate of the uncommon word in the messy code text to be repaired is greater than a first preset threshold value, recognizing that the encoded text data contains messy code text data.
Optionally, the step of determining whether a rarely-used word exists in the encoded text data includes:
acquiring the word frequency of each messy code word to be repaired from a pre-stored word frequency table;
and if the word frequency of the messy code words to be repaired is lower than a preset word frequency threshold value of the uncommon words, determining that the messy code words to be repaired are the uncommon words.
Optionally, after determining that the occupancy rate of the uncommon word in the garbled text to be repaired is not greater than a first preset threshold, the method includes:
judging whether the occupancy rate of the uncommon word in the messy code text to be repaired is smaller than a second preset threshold value; the second preset threshold is smaller than the first preset threshold;
if the occupancy rate of the uncommon word in the messy code text to be repaired is smaller than the second preset threshold value, recognizing that no messy code text data exists in the encoded text data;
if the occupancy rate is not less than the second preset threshold, outputting to the user a prompt asking whether the document needs to be repaired;
and after receiving a document repairing instruction input by a user, determining that the coded text data contains messy code text data.
In a third aspect, an embodiment of the present invention provides a device for repairing a scrambled code of a document content, where the device is applied to a terminal device, and the device includes:
the document to be processed acquisition module is used for acquiring a document to be processed;
the analysis module is used for analyzing the document to be processed;
the font information extraction module is used for extracting each piece of font information from the document to be processed;
the text encoding module is used for loading a corresponding font library for the text in the document to be processed according to the font information, and encoding the text to obtain encoded text data;
the messy code identification module is used for identifying the coded text data by adopting a preset messy code identification algorithm;
a garbled text font information obtaining module, configured to obtain font information corresponding to the garbled text from the to-be-processed document if it is identified that the encoded text data includes garbled text data;
the uploading module is used for uploading the document to be processed and the font information corresponding to the messy code text to a server, so that the server determines a correct font library corresponding to the messy code text and obtains a download path of the correct font library;
and the messy code repairing module is used for obtaining the downloading path of the correct font library from the server, downloading the correct font library from a network according to the downloading path, and performing document messy code repairing on the document to be processed by using the correct font library.
Optionally, the apparatus further comprises: a text classification module;
the text classification module is used for classifying all text in the document to be processed according to the font information, taking the text corresponding to each piece of font information as a target text, and determining the number of target characters in each target text;
the text encoding module is specifically configured to:
and loading a corresponding font library for each target text according to the font information, and coding to obtain coded text data corresponding to the target text.
Optionally, the messy code identification module includes:
the rarely-used word determining submodule is used for judging whether rarely-used words exist in the coded text data;
the uncommon word occupancy rate calculation sub-module is used for calculating, when uncommon words exist, the occupancy rate of the uncommon words in each target text according to the number of uncommon words and the number of target characters in that target text;
and the messy code text judgment sub-module is used for judging whether the occupancy rate of the rarely-used word in the target text is greater than a first preset threshold value or not, and if the occupancy rate of the rarely-used word in the target text is greater than the first preset threshold value, recognizing that the coded text data contains messy code text data.
Optionally, the uncommon word determination sub-module is specifically configured to:
acquiring the word frequency of each target character from a pre-stored word frequency table;
and if the word frequency of the target word is lower than a preset word frequency threshold value of the uncommon word, determining that the target word is the uncommon word.
Optionally, the messy code text judgment sub-module is further configured to, after judging that the occupancy rate of the uncommon word in the target text is not greater than a first preset threshold:
judging whether the occupancy rate of the uncommon word in the target text is smaller than a second preset threshold value; the second preset threshold is smaller than the first preset threshold;
if the occupancy rate of the uncommon word in the target text is smaller than a second preset threshold value, recognizing that no messy code text data exists in the coded text data;
if the occupancy rate is not less than the second preset threshold, outputting to the user a prompt asking whether the document needs to be repaired;
after receiving an instruction input by the user indicating that the document needs to be repaired, determining that the encoded text data contains messy code text data;
and triggering the messy code text font information acquisition module.
Optionally, the messy code repairing module obtains the download path of the correct font library through the following steps:
sending a font query request to the server; the font inquiry request comprises: font information corresponding to the messy code text;
and receiving a download path of a correct font library corresponding to the font information corresponding to the messy code text returned by the server.
Optionally, the messy code repairing module obtains the download path of the correct font library through the following steps:
and receiving a download path of the correct font library returned by the server after the download path of the correct font library is obtained, and downloading the correct font library from the network according to the download path.
Optionally, the messy code repairing module performs messy code repair on the document to be processed through the following steps:
and embedding the correct font library into the document to be processed, and performing messy code repair on the document.
Optionally, the messy code repairing module performs messy code repair on the document to be processed through the following steps:
installing the correct font library on the terminal equipment;
and coding the document to be processed by using the correct font library, and repairing the messy code of the document.
In a fourth aspect, an embodiment of the present invention provides a device for repairing a scrambled code of a document content, which is applied to a server, and the device includes:
the receiving module is used for receiving the document to be processed and the font information corresponding to the messy code text to be repaired, both uploaded by the terminal device;
a correct font library determining module, configured to determine a correct font library corresponding to the scrambled text, and obtain and store a download path of the correct font library;
and the correct font library downloading path providing module is used for providing a downloading path of the correct font library for the terminal equipment so that the terminal equipment downloads the correct font library according to the downloading path to repair the document.
Optionally, the correct font library determining module includes:
the font library coding submodule is used for loading and coding different font libraries in a plurality of preset font libraries for each messy code text to be repaired one by one according to the corresponding font information, and respectively obtaining the coded text data of the messy code text to be repaired;
the messy code identification submodule is used for respectively identifying each coded text data obtained by coding different font libraries by adopting a preset messy code identification algorithm until the coded text data has no messy code;
and the correct font library determining submodule is used for determining the font library loaded when the messy codes do not exist in the text data as the correct font library corresponding to the messy code text to be repaired, and acquiring and storing a downloading path of the correct font library.
Optionally, the correct font library download path providing module is specifically configured to:
receiving a font query request sent by terminal equipment; the font inquiry request comprises: font information corresponding to the messy code text to be repaired;
and returning a downloading path of a correct font library corresponding to the font information corresponding to the messy code text to be repaired to the terminal equipment, so that the terminal equipment downloads the correct font library according to the downloading path to repair the document.
Optionally, the correct font library download path providing module is specifically configured to:
and after the downloading path of the correct font library is obtained, returning the downloading path of the correct font library to the terminal equipment.
Optionally, the messy code identification submodule includes:
a rarely-used word determining unit, configured to determine whether a rarely-used word exists in the encoded text data, for the encoded text data obtained by using the different font library codes;
the rarely-used word occupancy calculation unit is used for calculating the occupancy rate of rarely-used words in each messy code text to be repaired according to the number of rarely-used words in that text and the number of characters in that text;
and the messy code text judgment unit is used for judging whether the occupancy rate of the rarely-used word in the messy code text to be repaired is greater than a first preset threshold value or not, and if the occupancy rate of the rarely-used word in the messy code text to be repaired is greater than the first preset threshold value, recognizing that the coded text data contains messy code text data.
Optionally, the uncommon word determination unit is specifically configured to:
acquiring the word frequency of each messy code word to be repaired from a pre-stored word frequency table;
and if the word frequency of the messy code words to be repaired is lower than a preset word frequency threshold value of the uncommon words, determining that the messy code words to be repaired are the uncommon words.
Optionally, the messy code text determination unit is specifically configured to:
after judging that the occupancy rate of the uncommon word in the messy code text to be repaired is not greater than a first preset threshold value, judging whether the occupancy rate of the uncommon word in the messy code text to be repaired is smaller than a second preset threshold value; the second preset threshold is smaller than the first preset threshold;
if the occupancy rate of the uncommon word in the messy code text to be repaired is smaller than the second preset threshold value, recognizing that no messy code text data exists in the encoded text data;
if the occupancy rate is not less than the second preset threshold, outputting to the user a prompt asking whether the document needs to be repaired;
and after receiving a document repairing instruction input by a user, determining that the coded text data contains messy code text data.
In a fifth aspect, an embodiment of the present invention provides a terminal device, including a processor and a machine-readable storage medium, the machine-readable storage medium storing machine-executable instructions executable by the processor, the machine-executable instructions causing the processor to implement the method steps of the first aspect.
In a sixth aspect, an embodiment of the present invention provides a server, including a processor and a machine-readable storage medium, the machine-readable storage medium storing machine-executable instructions executable by the processor, the machine-executable instructions causing the processor to implement the method steps of the second aspect.
In the embodiments of the present invention, the terminal device first uses a preset messy code recognition algorithm to identify messy code text data in the document to be processed, and then obtains the font information corresponding to the messy code text from the document. It then uploads the document to be processed and the font information corresponding to the messy code text to a server, so that the server determines the correct font library corresponding to the messy code text and its download path. The terminal device obtains the download path from the server, downloads the corresponding correct font library, and uses it to perform messy code repair on the document. The method therefore requires neither manual import of font libraries nor visual checking of whether the messy code has been eliminated: it can automatically identify messy code in multiple documents on the terminal device, determine the correct font library that the terminal device lacks, and automatically repair the messy code in those documents, which improves working efficiency. Of course, a product or method implementing the present application need not achieve all of the above advantages at the same time.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for repairing messy code in document content, applied to a terminal device, according to an embodiment of the present invention.
Fig. 2 is a flowchart of step S105 in the embodiment shown in fig. 1, in which a preset messy code recognition algorithm is used to recognize the encoded text data.
Fig. 3 is a detailed flowchart of step S201 in the embodiment shown in fig. 2, in which it is determined whether an uncommon word exists in the encoded text data.
Fig. 4 is another detailed flowchart of step S105 in the embodiment shown in fig. 1, in which a preset messy code recognition algorithm is used to recognize the encoded text data.
Fig. 5 is a flowchart of a method for repairing messy code in document content, applied to a server, according to an embodiment of the present invention.
Fig. 6 is a flowchart of the step S502 of determining the correct font library and obtaining the download path thereof in the embodiment shown in fig. 5.
Fig. 7 is a schematic structural diagram of a device for repairing messy code in document content, applied to a terminal device, according to an embodiment of the present invention.
Fig. 8 is a schematic structural diagram of the messy code identification module 705 in the embodiment shown in fig. 7.
Fig. 9 is a schematic structural diagram of a device for repairing messy code in document content, applied to a server, according to an embodiment of the present invention.
Fig. 10 is a schematic structural diagram of the correct font library determining module 902 in the embodiment shown in fig. 9.
Fig. 11 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.
Fig. 12 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all embodiments. The embodiments of the present application, and all other embodiments that can be obtained by a person of ordinary skill in the art without any inventive work, belong to the scope of protection of the present application.
Aiming at the messy code phenomenon caused by mismatching of a font library and content of a document, the embodiment of the invention provides a messy code repairing method and device of document content, terminal equipment and a server in order to automatically repair the messy code of the document.
In a first aspect, an embodiment of the present invention provides a method for repairing a scrambled code of a document content, which is applied to a terminal device, and as shown in fig. 1, the method may include the following steps:
S101, obtaining a document to be processed.
In a specific implementation, the document to be processed may be any of various documents containing text.
For example: the document to be processed can be a word processing document in doc, wps and other formats, a presentation document in ppt, dps and other formats, a table document in xls, et and other formats, and a portable document in pdf format.
S102, analyzing the document to be processed.
S103, extracting each font information from the document to be processed.
In a specific implementation, the document to be processed can be opened by using the document parsing software, and font information of document characters is automatically extracted from the document to be processed. Specifically, the font information includes the font name, font style information, and position in the document of each character in the document.
S104, loading a corresponding font library for the text in the document to be processed according to each font information, and encoding the text to obtain encoded text data.
In specific implementation, a corresponding font library stored in the terminal device can be determined according to the obtained font name, and the corresponding font library is loaded to the text with the same font information; and then, according to the font information of each character in the text, finding the corresponding code of each character in the text by using the font index in the corresponding font library, and finally obtaining the coded text data.
For example: font information of the Song body and the black body is extracted from the document; the font library files of the Song body and the black body are found in the font library folder of the terminal device according to the names of the Song body and the black body; the Song body font library is loaded for the Song body text in the document, and the black body font library is loaded for the black body text. Then, according to the font information of each character in the Song body text and the black body text, the font indexes in the Song body font library and the black body font library are used to find the code corresponding to each character, and the encoded Song body text data and black body text data are finally obtained.
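The lookup described for S104 can be pictured with a minimal Python sketch. It assumes, purely for illustration, that the font index of a font library is a dictionary from glyph ids to Unicode code points; `encode_text`, `SONG_INDEX`, and `HEI_INDEX` are hypothetical names with toy data and are not taken from the described method.

```python
def encode_text(glyph_ids, font_index):
    """Look up each glyph id in the loaded font library's index.

    Glyph ids missing from the index become U+FFFD; this fallback is an
    assumption of the sketch so that unknown glyphs never raise.
    """
    return "".join(chr(font_index.get(glyph_id, 0xFFFD)) for glyph_id in glyph_ids)


# Hypothetical toy indexes: the same glyph ids map to different code points.
SONG_INDEX = {1: 0x4E2D, 2: 0x6587}
HEI_INDEX = {1: 0x9F98, 2: 0x9750}

song_text = encode_text([1, 2], SONG_INDEX)   # decoded with the matching library
wrong_text = encode_text([1, 2], HEI_INDEX)   # decoded with a mismatched library
```

Under this assumption, decoding the same glyph ids with a mismatched index yields different characters, which is one way the messy codes discussed in this description can arise.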
In detail, there are various ways to encode the text in the document to be processed, including but not limited to the following ways:
If the text in the document to be processed uses Unicode encoding, the encoded text data is obtained according to the implementation process of S104 described above;
If the text in the document to be processed uses CID encoding (a character encoding developed by Adobe and mainly used in pdf format documents), the CID codes of the document need to be converted into Unicode codes first, and the encoded text data is then obtained according to the implementation process of S104.
S105, identifying the encoded text data by using a preset messy code recognition algorithm.
In a specific implementation, messy codes may be identified according to whether the text data contains uncommon words and whether the occupancy rate of the uncommon words in the target text is greater than a preset threshold. For details, refer to the implementation process of steps S201 to S203 in Fig. 2 below.
S106, if the encoded text data contains messy code text data, obtaining font information corresponding to the messy code text from the document to be processed.
In a specific implementation, messy code text appears for one of two reasons: 1) the terminal device has no font library corresponding to the text, so another font library is loaded to encode the text characters, and messy codes appear because the wrong font library is loaded; 2) the terminal device has a font library corresponding to the text, but that font library has multiple versions and the font information does not record which version the document uses, so another version of the font library may be loaded to encode the text characters, and messy codes appear because the wrong version is loaded. The font information corresponding to the messy code text is obtained because it needs to be uploaded to the server later.
For example, messy code text appears in the Song body text for the following reasons: 1) when the terminal device has no Song body font library corresponding to the Song body text, the black body font library is loaded to encode the text characters, and messy codes appear in the Song body text because the wrong font library is loaded; 2) although version 1.0 of the Song body font library, which corresponds to the Song body text, exists in the terminal device, version 1.2 of the Song body font library is loaded to encode the text characters, and messy codes appear in the Song body text because the wrong font library version is loaded.
S107, uploading the document to be processed and the font information corresponding to the messy code text to a server, so that the server determines the correct font library corresponding to the messy code text, and obtains and stores a download path of the correct font library.
In a specific implementation, the server holds most published font libraries, and can therefore determine the correct font library and its download path in most cases.
S108, obtaining the download path of the correct font library from the server.
S109, downloading the correct font library from the network according to the download path.
In the specific implementation, the correct font library is downloaded from the network to the terminal device according to the download path of the obtained correct font library and the copyright requirement of the font library.
S110, performing document messy code repair on the document to be processed by using the correct font library.
In a specific implementation, there are two ways to repair the messy codes of the document: embedding the correct font library in the document; or installing the correct font library on the terminal device and then loading it for the document. Refer to the detailed implementations of step S110 below.
In the embodiment of the present invention, the terminal device first uses a preset messy code recognition algorithm to identify messy code text data in the document to be processed, and then obtains the font information corresponding to the messy code text from the document. The terminal device then uploads the document to be processed and the font information corresponding to the messy code text to a server, so that the server determines the correct font library corresponding to the messy code text and its download path. The terminal device obtains the download path from the server, downloads the corresponding correct font library, and uses the correct font library to repair the messy codes in the document. Therefore, there is no need to import font libraries manually or to check by eye whether the messy codes have been eliminated; the method can automatically identify messy codes in multiple documents on the terminal device, determine the correct font library that the terminal device lacks, and automatically repair the messy codes in those documents, which improves working efficiency. Of course, any product or method implementing the present application need not achieve all of the above advantages at the same time.
In one embodiment, in order to determine whether there is an uncommon word in a text and to calculate the occupancy rate of uncommon words in the text, the following steps may be included before step S104 in Fig. 1:
Classifying all texts in the document to be processed according to each font information, and taking the text corresponding to each font information as a target text.
In specific implementation, all texts in the document to be processed are classified according to the font name in each font information and are used as target texts.
Determining the number of target words in each target text.
In a specific implementation, the number of target words is determined in preparation for the later calculation of the occupancy rate of uncommon words.
For example: the text in the document is divided into a Song body text and a black body text, the Song body text and the black body text are taken as target texts, and the number of target words in the Song body text and in the black body text is determined respectively.
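The classification and counting steps above can be sketched in Python as follows. The sketch assumes the parsed document is available as a list of `(font_name, text)` runs and that every character in a run counts as a target word; both the data shape and the function names are hypothetical.

```python
from collections import defaultdict


def group_text_by_font(runs):
    """Merge (font_name, text) runs into one target text per font name."""
    grouped = defaultdict(list)
    for font_name, text in runs:
        grouped[font_name].append(text)
    return {name: "".join(parts) for name, parts in grouped.items()}


def count_target_words(target_texts):
    """Count the characters of each target text (assumed: one character = one word)."""
    return {name: len(text) for name, text in target_texts.items()}


# Hypothetical runs extracted from a parsed document.
runs = [("Song", "abc"), ("Hei", "de"), ("Song", "f")]
target_texts = group_text_by_font(runs)
word_counts = count_target_words(target_texts)
```

Here `word_counts` supplies the denominator used later when the occupancy rate of uncommon words is calculated for each target text.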
In an embodiment, in step S104 in fig. 1, the step of loading a corresponding font library to the text in the document to be processed according to each font information, and encoding the font library to obtain encoded text data may specifically include:
Loading a corresponding font library for each target text according to each font information, and encoding the target text to obtain encoded text data corresponding to the target text.
In specific implementation, a corresponding font library stored in the terminal device can be determined according to the obtained font name, and the corresponding font library is loaded to the target text; and then, according to the font information of each character in the target text, finding the corresponding code of each character in the target text by using the font index in the corresponding font library, and finally obtaining the coded text data.
In an embodiment, the step in S105 in Fig. 1 of identifying the encoded text data by using a preset messy code recognition algorithm may, as shown in Fig. 2, include:
S201, judging whether the encoded text data contains uncommon words. If uncommon words exist, step S202 is executed; if no uncommon words exist, it is determined that the document has no messy code text.
S202, calculating the occupancy rate of the uncommon words in the target text according to the word number of the uncommon words and the word number of the target words in each target text.
In a specific implementation, the occupancy rate of uncommon words in the target text can be calculated as follows: word count of uncommon words/word count of target words in the target text.
S203, judging whether the occupancy rate of the uncommon word in the target text is greater than a first preset threshold value. If so, recognizing that the encoded text data contains messy code text data; if not, the encoded text data can be identified not to contain the messy code text data.
For example: referring to the example of step S104, after the encoded Song body text data and black body text data are obtained, it is determined for each of the two whether uncommon words exist. Taking the Song body text data as an example, if uncommon words exist in the Song body text data, the number of uncommon words is counted and the occupancy rate of uncommon words in the Song body text is calculated: number of uncommon words in the Song body text / number of words in the Song body text.
It is then determined whether the occupancy rate of uncommon words in the Song body text is greater than the first preset threshold. If so, it is identified that the Song body text data contains messy code text data; if not, it is identified that the Song body text data does not contain messy code text data. The black body text data is identified in the same way.
If the Song body text data contains no uncommon words, it is determined that the Song body text is not a messy code text; if the black body text data likewise contains no uncommon words, it is determined that the document has no messy code text.
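Steps S201 to S203 can be sketched in Python as follows. The sketch assumes the uncommon words are supplied as a set of characters and uses a first preset threshold of 0.3; the source gives no concrete threshold, so this value and the function names are illustrative assumptions only.

```python
def uncommon_occupancy(text, uncommon_chars):
    """Occupancy rate: number of uncommon words / number of words in the target text."""
    if not text:
        return 0.0
    uncommon_count = sum(1 for ch in text if ch in uncommon_chars)
    return uncommon_count / len(text)


def contains_messy_code(text, uncommon_chars, first_threshold=0.3):
    """Return True when the target text is identified as containing messy codes."""
    rate = uncommon_occupancy(text, uncommon_chars)
    if rate == 0:
        # S201: no uncommon words, so no messy code text.
        return False
    # S203: compare the occupancy rate with the first preset threshold.
    return rate > first_threshold
```

For instance, with `uncommon_chars={"x"}`, the text `"abxx"` has an occupancy rate of 0.5 and is flagged, while `"abcdefghix"` has a rate of 0.1 and is not.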
In an embodiment, the step in S201 in Fig. 2 of determining whether there is an uncommon word in the encoded text data may, as shown in Fig. 3, specifically include:
S301, obtaining the word frequency of each target word from a pre-stored word frequency table.
In a specific implementation, the word frequency in the word frequency table is the frequency with which a word occurs in general documents. For example, if a document contains 20000 words and a particular word appears 690 times in total, its word frequency is 3.45% (690/20000 = 3.45%).
The pre-stored word frequency table is obtained through statistics over a large number of document samples. It mainly contains the word name, the word code, and the word frequency, and is sorted by word frequency from high to low. Part of its content is shown in Table 1 below.
TABLE 1
S302, judging whether the word frequency of the target word is lower than a preset word frequency threshold of uncommon words. If so, step S303 is executed; if not, step S304 is executed.
In a specific implementation, a word frequency threshold of uncommon words is generally preset, and if the word frequency of the target word is lower than this threshold, the word is considered to be an uncommon word.
S303, determining that the target word is an uncommon word.
S304, determining that the target word is not an uncommon word, and continuing to judge the next target word; if all target words are determined not to be uncommon words, it is determined that there are no uncommon words in the encoded text data.
For example: continuing the example of step S104, after the encoded Song body text data is obtained, the word frequency of each word in the Song body text data is obtained from the pre-stored word frequency table, and it is determined whether the word frequency of the word is lower than the preset word frequency threshold of uncommon words; if it is lower, the word is determined to be an uncommon word.
If the word frequency of a word in the Song body text data is not lower than the preset word frequency threshold of uncommon words, the word is determined not to be an uncommon word, and the word frequency of the next word is judged; if all words in the Song body text data are determined not to be uncommon words, it is determined that no uncommon words exist in the Song body text data.
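The word frequency table and steps S301 to S304 can be sketched in Python as follows. The sketch builds the table from a single sample string, treats words absent from the table as having frequency 0, and uses an uncommon-word threshold of 0.001; all three choices, like the function names, are assumptions made for illustration.

```python
from collections import Counter


def build_frequency_table(sample_text):
    """Map each word to its frequency in the sample, sorted from high to low."""
    total = len(sample_text)
    counts = Counter(sample_text)
    return {word: count / total for word, count in counts.most_common()}


def find_uncommon_words(text, frequency_table, uncommon_threshold=0.001):
    """Return the words of the text whose frequency is below the threshold (S302-S303)."""
    return [word for word in text
            if frequency_table.get(word, 0.0) < uncommon_threshold]


# Mirrors the arithmetic in the text: 690 occurrences among 20000 words.
sample = "a" * 690 + "b" * 19310
table = build_frequency_table(sample)
uncommon = find_uncommon_words("ab?", table)
```

An empty result from `find_uncommon_words` corresponds to the outcome of S304 in which no uncommon words exist in the encoded text data.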
In another embodiment, the step in S105 in Fig. 1 of identifying the encoded text data by using a preset messy code recognition algorithm may, as shown in Fig. 4, include:
S401, judging whether the encoded text data contains uncommon words. If so, step S402 is executed; if not, it is determined that the document has no messy code text.
S402, calculating the occupancy rate of the uncommon words in the target text according to the number of uncommon words and the number of target words in each target text.
S403, judging whether the occupancy rate of the uncommon words in the target text is greater than a first preset threshold. If so, it is identified that the encoded text data contains messy code text data. If not, step S404 is executed.
In a specific implementation, steps S401 to S403 may be the same as steps S201 to S203 shown in Fig. 2.
S404, judging whether the occupancy rate of the uncommon words in the target text is less than a second preset threshold, where the second preset threshold is less than the first preset threshold. If so, it is identified that the encoded text data contains no messy code text data. If not, step S405 is executed.
S405, outputting to the user a prompt asking whether the document needs to be repaired.
S406, judging whether the user inputs "yes" or "no". If the user inputs "yes", it is determined that the encoded text data contains messy code text data; if the user inputs "no", it is determined that the encoded text data does not contain messy code text data.
In a specific implementation, if the occupancy rate of the uncommon words in the target text is not greater than the first preset threshold and not less than the second preset threshold, the algorithm cannot decide whether messy code text data exists. In this case, a prompt box asking whether to repair the document may pop up, and the user chooses whether the document is repaired. After the terminal device receives the "yes" input by the user, it determines that the encoded text data contains messy code text data.
For example: referring to the example of steps S201 to S203, after it is determined that the occupancy rate of uncommon words in the Song body text is not greater than the first preset threshold, it is determined whether that occupancy rate is less than a second preset threshold, where the second preset threshold is less than the first preset threshold. If the occupancy rate of uncommon words in the Song body text is less than the second preset threshold, it is identified that the encoded Song body text data contains no messy code text data.
If the occupancy rate is not less than the second preset threshold, a prompt asking whether the document needs to be repaired is output to the user. After an instruction input by the user indicating that the document needs to be repaired is received, it is determined that the encoded Song body text data contains messy code text data; the Song body font information corresponding to the messy code text is then obtained from the document to be processed, in preparation for uploading it to the cloud backend. After an instruction input by the user indicating that the document does not need to be repaired is received, it is determined that the encoded Song body text data does not contain messy code text data.
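The two-threshold decision of steps S403 to S406 can be sketched in Python as follows. The threshold values 0.3 and 0.05 are illustrative assumptions (the source only requires the second threshold to be smaller than the first), and `ask_user` is a hypothetical callback standing in for the prompt box.

```python
def classify_occupancy(rate, first_threshold=0.3, second_threshold=0.05):
    """Return "messy", "clean", or "ask_user" for an occupancy rate."""
    if second_threshold >= first_threshold:
        raise ValueError("second threshold must be smaller than the first")
    if rate > first_threshold:
        return "messy"      # S403: above the first preset threshold
    if rate < second_threshold:
        return "clean"      # S404: below the second preset threshold
    return "ask_user"       # S405: the algorithm cannot decide


def decide_with_prompt(rate, ask_user, first_threshold=0.3, second_threshold=0.05):
    """Resolve the undecided band by asking the user (S405-S406)."""
    verdict = classify_occupancy(rate, first_threshold, second_threshold)
    if verdict == "ask_user":
        return bool(ask_user())
    return verdict == "messy"
```

A text without uncommon words has an occupancy rate of 0 and therefore falls into the "clean" branch, which matches the outcome of S401 when no uncommon words are found.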
In an embodiment, the step S108 in fig. 1, obtaining the download path of the correct font library from the server may include:
Sending a font query request to the server, where the font query request includes the font information corresponding to the messy code text;
Receiving the download path, returned by the server, of the correct font library corresponding to the font information of the messy code text.
In a specific implementation, the terminal device may send a font query request to the server according to the font information corresponding to the messy code text, and then receive the download path, returned by the server, of the correct font library corresponding to the messy code text.
In another embodiment, in step S108 in fig. 1, the step of obtaining the download path of the correct font library from the server may include:
Receiving the download path of the correct font library, which the server returns after it has obtained that download path.
In specific implementation, after the server obtains the download path of the correct font library, the server can directly return the download path of the correct font library to the terminal device, so that the terminal device can also directly receive the download path of the correct font library corresponding to the messy code text automatically returned by the server.
In an embodiment, the step in S110 in Fig. 1 of performing document messy code repair on the document to be processed by using the correct font library may include:
Embedding the correct font library into the document to be processed to repair the messy codes of the document.
In a specific implementation, the downloaded correct font library is embedded in the document to be processed and loaded for the corresponding text in the document, at which point the messy codes are eliminated. With this approach, the correct font library is always stored in the document, so messy codes will not appear even if the document is opened on another terminal device that does not have the correct font library.
In another embodiment, the step in S110 in Fig. 1 of performing document messy code repair on the document to be processed by using the correct font library may include:
Installing the correct font library on the terminal device;
Encoding the document to be processed by using the correct font library to repair the messy codes of the document.
In a specific implementation, the downloaded correct font library is installed in the font library folder of the terminal device, and the document to be processed is encoded by using the correct font library, at which point the messy codes are eliminated.
With this approach, the correct font library is not stored in the document, so messy codes may still appear if the document is opened on another terminal device that does not have the correct font library.
In a second aspect, an embodiment of the present invention provides a method for repairing a scrambled code of a document content, which is applied to a server, and as shown in fig. 5, the method may include the following steps:
S501, receiving the document to be processed and the font information corresponding to the messy code text to be repaired, both uploaded by the terminal device.
S502, determining a correct font library corresponding to the messy code text, and obtaining and storing a download path of the correct font library.
In specific implementation, different font libraries can be loaded and encoded one by one for each messy code text to be repaired to obtain text data, then a messy code identification algorithm is adopted to identify the text data until no messy code exists, and finally a correct font library corresponding to each messy code text to be repaired is determined, and a download path is obtained and stored. Refer to the following detailed implementation of steps S601 to S603 in fig. 6.
S503, providing a downloading path of the correct font library for the terminal device, so that the terminal device downloads the correct font library according to the downloading path to repair the document.
In a specific implementation, there are two ways to provide the download path: the server may first receive a query request from the terminal device and then return the corresponding download path to the terminal device; or the server may return the corresponding download path to the terminal device directly. Refer to the detailed implementations of step S503 below.
In an embodiment, step S502 in fig. 5 is a step of determining a correct font library corresponding to the scrambled text, and obtaining and storing a download path of the correct font library, as shown in fig. 6, and may include:
S601, for each messy code text to be repaired, loading different font libraries from a plurality of preset font libraries one by one according to the corresponding font information, and encoding to obtain the encoded text data of the messy code text to be repaired for each loaded font library.
In a specific implementation, referring to the implementation process of step S104 on the terminal device, the messy code texts to be repaired in the document are classified according to the font information uploaded to the server. Starting from the first of the font libraries preset in the server, the font libraries are loaded for the messy code text one by one. For each loaded font library, the code corresponding to each character in the messy code text is found by using the font index in that font library according to the font information of the character, and the encoded messy code text data is finally obtained.
For example: referring to the earlier example of step S104 on the terminal device, if the terminal device identifies that both the Song body text and the black body text in the document are messy code texts, the server parses the document again according to the uploaded font information of the Song body and the black body, and classifies the messy code texts to be repaired into a Song body messy code text and a black body messy code text.
Taking the Song body messy code text as an example: starting from the first font library in the font library folder preset in the server, the preset font libraries are loaded for the Song body messy code text one by one; for each loaded font library, the code corresponding to each character in the Song body messy code text is found by using the font index in that font library according to the font information of the character, and the encoded Song body messy code text data is finally obtained. The black body messy code text data is obtained in the same way.
S602, identifying, by using a preset messy code recognition algorithm, each encoded text data obtained by encoding with a different font library, until an encoded text data without messy codes is found.
In a specific implementation, it is determined whether uncommon words exist in the encoded text data, the occupancy rate of the uncommon words is calculated, and it is determined whether the occupancy rate is greater than a first preset threshold, so as to identify whether the text data contains messy code text data. Identification stops once the encoded data of the original messy code text is identified as having no messy codes. Refer to the detailed implementations of step S602 below.
S603, determining the font library loaded when the messy codes do not exist in the text data as a correct font library corresponding to the messy code text to be repaired, and obtaining and storing a downloading path of the correct font library.
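Steps S601 to S603 amount to a search over candidate font libraries, which can be sketched in Python as follows. The sketch assumes each font library is a `(name, font_index)` pair whose index maps glyph ids to Unicode code points, and that `is_messy` is any messy code recognition function; these shapes and names are hypothetical and only illustrate the loop.

```python
def find_correct_font_library(glyph_ids, font_libraries, is_messy):
    """Try the preset font libraries in order; return the first name that
    yields text without messy codes, or None if every library fails."""
    for name, font_index in font_libraries:
        # S601: encode the messy code text with the currently loaded library.
        text = "".join(chr(font_index.get(g, 0xFFFD)) for g in glyph_ids)
        # S602: stop as soon as the encoded text has no messy codes.
        if not is_messy(text):
            return name  # S603: this is the correct font library
    return None


# Hypothetical toy data: only the second library decodes to common words.
libraries = [
    ("Hei", {1: 0x9F98, 2: 0x9750}),
    ("Song", {1: 0x4E2D, 2: 0x6587}),
]
common_words = {chr(0x4E2D), chr(0x6587)}


def toy_is_messy(text):
    """Toy recognizer: any word outside the common set counts as messy."""
    return any(ch not in common_words for ch in text)


correct = find_correct_font_library([1, 2], libraries, toy_is_messy)
```

Returning `None` when no library passes is a choice of this sketch; the source does not describe what the server does in that case.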
In an embodiment, step S503 in fig. 5 provides a download path of the correct font library for the terminal device, so that the terminal device downloads the correct font library for document repairing according to the download path, which may include:
Receiving a font query request sent by the terminal device, where the font query request includes the font information corresponding to the messy code text to be repaired;
Returning to the terminal device the download path of the correct font library corresponding to the font information of the messy code text to be repaired, so that the terminal device downloads the correct font library according to the download path to repair the document.
In a specific implementation, the server may receive a font query request sent by the terminal device, where the font query request includes the font information corresponding to the messy code text, and then return to the terminal device the download path of the correct font library corresponding to the messy code text.
In another embodiment, step S503 in fig. 5 provides a download path of the correct font library for the terminal device, so that the terminal device downloads the correct font library according to the download path for document repairing, which may include:
After the download path of the correct font library is obtained, returning the download path of the correct font library to the terminal device.
In a specific implementation, after obtaining the download path of the correct font library, the server may also return the download path of the correct font library corresponding to the messy code text to the terminal device directly and automatically.
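The two ways of providing the download path in step S503 can be sketched in Python with an in-memory registry. `FontPathRegistry`, its methods, the `notify` callback, and the example path are all hypothetical; a real server would expose these over a network interface that the source does not specify.

```python
class FontPathRegistry:
    """Stores download paths of correct font libraries, keyed by font name."""

    def __init__(self, notify=None):
        self._paths = {}
        # Optional callback used to return a path as soon as it is obtained.
        self._notify = notify

    def store(self, font_name, download_path):
        """Store the path; if a callback is set, return it to the terminal directly."""
        self._paths[font_name] = download_path
        if self._notify is not None:
            self._notify(font_name, download_path)

    def handle_query(self, font_name):
        """Answer a font query request; None means no path is stored yet."""
        return self._paths.get(font_name)


pushed = []
registry = FontPathRegistry(notify=lambda name, path: pushed.append((name, path)))
registry.store("Song", "fonts/song.ttf")  # hypothetical download path
```

Calling `registry.handle_query("Song")` models the query-based embodiment, while the `notify` callback models the embodiment in which the server returns the path without waiting for a request.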
In an embodiment, step S602 in fig. 6 respectively identifies the encoded text data obtained by encoding by using a preset messy code identification algorithm until the encoded text data has no messy code, and specifically, for each encoded text data, the following steps are performed:
Judging whether uncommon words exist in the encoded text data.
If uncommon words exist, calculating the occupancy rate of the uncommon words in the messy code text to be repaired according to the number of uncommon words in each messy code text to be repaired and the number of words in that messy code text.
Judging whether the occupancy rate of the uncommon words in the messy code text to be repaired is greater than a first preset threshold.
If the occupancy rate of the uncommon words in the messy code text to be repaired is greater than the first preset threshold, identifying that the encoded text data contains messy code text data.
If the occupancy rate of the uncommon words in the messy code text to be repaired is not greater than the first preset threshold, identifying that the encoded text data does not contain messy code text data.
If no uncommon words are found in any of the text data, determining that the document has no messy code text, and stopping messy code recognition.
In a specific implementation, the above steps on the server may be the same as steps S201 to S203 performed on the terminal device.
In an embodiment, the step in S602 in Fig. 6 of determining whether there is an uncommon word in the encoded text data may include:
Obtaining the word frequency of each messy code word to be repaired from a pre-stored word frequency table.
If the word frequency of the messy code word to be repaired is lower than a preset word frequency threshold of uncommon words, determining that the messy code word to be repaired is an uncommon word.
If the word frequency of the messy code word to be repaired is not lower than the preset word frequency threshold of uncommon words, determining that the messy code word to be repaired is not an uncommon word, and continuing to judge the next messy code word to be repaired; if all messy code words to be repaired are determined not to be uncommon words, determining that there are no uncommon words in the encoded text data.
In a specific implementation, these steps are the same as the implementation process of steps S301 to S304 in Fig. 3.
In another embodiment, in step S602 in fig. 6, each encoded text data obtained by encoding is identified by using a preset messy code identification algorithm until the encoded text data has no messy code. Specifically, the following steps may be performed for each encoded text data:
Judging whether uncommon words exist in the encoded text data.
If uncommon words exist, calculating the occupancy rate of the uncommon words in the messy code text to be repaired according to the word number of the uncommon words in each messy code text to be repaired and the word number of the messy code text to be repaired.
Judging whether the occupancy rate of the uncommon words in the messy code text to be repaired is greater than a first preset threshold value.
If the occupancy rate of the uncommon words in the messy code text to be repaired is greater than the first preset threshold value, recognizing that the encoded text data contains messy code text data.
If the occupancy rate of the uncommon words in the messy code text to be repaired is not greater than the first preset threshold value, judging whether the occupancy rate is smaller than a second preset threshold value; the second preset threshold is smaller than the first preset threshold.
If the occupancy rate of the uncommon words in the messy code text to be repaired is smaller than the second preset threshold value, recognizing that no messy code text data exists in the encoded text data.
If the occupancy rate is not smaller than the second preset threshold value, outputting to the user a prompt asking whether the document needs to be repaired.
After an instruction input by the user indicating that the document needs to be repaired is received, it is determined that the encoded text data contains messy code text data.
If no uncommon word is found in any of the text data, it is determined that the document has no messy code text, and messy code recognition is stopped.
In a specific implementation, these steps are carried out in the same way as steps S401 to S406 in fig. 4.
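The two-threshold decision above can be written as a small three-way classifier. The sketch below is a minimal illustration, not the patented implementation: the names `Verdict`, `occupancy` and `classify`, and the example thresholds 0.5 and 0.1, are assumptions chosen for the demonstration.

```python
from enum import Enum

class Verdict(Enum):
    MESSY = "messy"        # occupancy rate greater than the first threshold
    CLEAN = "clean"        # occupancy rate smaller than the second threshold
    ASK_USER = "ask_user"  # in between: prompt the user whether to repair

def occupancy(uncommon_count: int, total_count: int) -> float:
    """Occupancy rate = number of uncommon words / number of words in the text."""
    if total_count == 0:
        return 0.0  # assumption: an empty text has no uncommon words
    return uncommon_count / total_count

def classify(rate: float, first_threshold: float, second_threshold: float) -> Verdict:
    """Apply the first and second preset thresholds in the order described."""
    if second_threshold >= first_threshold:
        raise ValueError("the second threshold must be smaller than the first")
    if rate > first_threshold:
        return Verdict.MESSY
    if rate < second_threshold:
        return Verdict.CLEAN
    return Verdict.ASK_USER

# Hypothetical thresholds, for illustration only.
print(classify(occupancy(6, 10), 0.5, 0.1))  # Verdict.MESSY
print(classify(occupancy(3, 10), 0.5, 0.1))  # Verdict.ASK_USER
```

In the `ASK_USER` case the text is treated as containing messy code only after the user confirms that the document needs to be repaired.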
In a third aspect, an embodiment of the present invention provides a device for repairing messy code in document content, where the device is applied to a terminal device, and as shown in fig. 7, the device includes:
a to-be-processed document obtaining module 701, configured to obtain a to-be-processed document.
The analysis module 702 is configured to analyze the document to be processed.
The font information extracting module 703 is configured to extract each font information from the document to be processed.
The text encoding module 704 is configured to load, according to each font information, a corresponding font library for the text in the document to be processed and encode the text, obtaining encoded text data.
The messy code recognition module 705 is configured to recognize the encoded text data by using a preset messy code recognition algorithm.
A garbled text font information obtaining module 706, configured to obtain font information corresponding to the garbled text from the to-be-processed document if it is identified that the encoded text data includes garbled text data.
The uploading module 707 is configured to upload the document to be processed and the font information corresponding to the messy code text to the server, so that the server determines a correct font library corresponding to the messy code text and obtains a download path of the correct font library.
The messy code restoration module 708 is configured to obtain the download path of the correct font library from the server, download the correct font library from the network according to the download path, and perform document messy code restoration on the document to be processed by using the correct font library.
In one embodiment, the device for repairing messy code further comprises: a text classification module;
the text classification module is used for classifying all texts in the document to be processed according to the font information, taking the texts corresponding to each font information as target texts respectively, and determining the word number of the target words in each target text.
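The classification step can be pictured as grouping text runs by font and counting the characters in each group. The following is a minimal sketch under assumptions: the input format (a list of `(font_name, text)` pairs produced by some document parser) and the function names are hypothetical, and the word number is taken to be the character count.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def classify_by_font(runs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Group text runs by font name; each group is one target text.

    `runs` is a hypothetical parser output of (font_name, text) pairs.
    """
    grouped = defaultdict(list)
    for font, text in runs:
        grouped[font].append(text)
    return {font: "".join(parts) for font, parts in grouped.items()}

def count_words(target_texts: Dict[str, str]) -> Dict[str, int]:
    """Word number of each target text (assumed here to be its character count)."""
    return {font: len(text) for font, text in target_texts.items()}

runs = [("SimSun", "你好"), ("KaiTi", "世界"), ("SimSun", "文档")]
targets = classify_by_font(runs)
print(targets)               # {'SimSun': '你好文档', 'KaiTi': '世界'}
print(count_words(targets))  # {'SimSun': 4, 'KaiTi': 2}
```

The per-font word numbers are the denominators later used when the occupancy rate of uncommon words is calculated for each target text.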
In the embodiment of the invention, the terminal equipment firstly adopts a preset messy code identification algorithm to identify the messy code text data in the document to be processed, and then obtains the font information corresponding to the messy code text from the document. And then uploading font information corresponding to the document to be processed and the messy code text to a server so that the server determines a correct font library corresponding to the messy code text and a downloading path thereof. The terminal equipment obtains the download path from the server and downloads the corresponding correct font library, and the correct font library is used for carrying out messy code repair on the document. Therefore, the method does not need to manually import the font library and judge whether the messy codes are eliminated by naked eyes, not only can automatically identify the messy codes of a plurality of documents in the terminal equipment, but also can determine the correct font library lacking in the terminal equipment and automatically repair the messy codes in the plurality of documents, and the working efficiency is improved. Of course, not all advantages described above need to be achieved at the same time in the practice of any one product or method of the present application.
In one embodiment, the messy code recognition module 705 in fig. 7, as shown in fig. 8, may include:
the uncommon word determination sub-module 801, used for judging whether uncommon words exist in the encoded text data, and, if none of the text data has uncommon words, determining that the document has no messy code text.
The uncommon word occupancy calculation sub-module 802 is used for calculating, after it is determined that uncommon words exist, the occupancy rate of the uncommon words in the target text according to the word number of the uncommon words and the word number of the target words in each target text.
The messy code text judgment sub-module 803 is used for judging whether the occupancy rate of the uncommon words in the target text is greater than a first preset threshold value; if the occupancy rate is greater than the first preset threshold value, recognizing that the encoded text data contains messy code text data; and if the occupancy rate is not greater than the first preset threshold value, recognizing that the encoded text data does not contain messy code text data.
In one embodiment, the uncommon word determination sub-module 801 in fig. 8 is specifically configured to:
obtain the word frequency of each target word from a pre-stored word frequency table;
and, if the word frequency of the target word is lower than a preset uncommon-word frequency threshold, determine that the target word is an uncommon word.
In an embodiment, the messy code text judgment sub-module 803 in fig. 8 is specifically configured to:
judge whether the occupancy rate of the uncommon words in the target text is greater than a first preset threshold value, and if so, recognize that the encoded text data contains messy code text data.
If the occupancy rate of the uncommon words in the target text is not greater than the first preset threshold value, judge whether the occupancy rate is smaller than a second preset threshold value; the second preset threshold is smaller than the first preset threshold.
If the occupancy rate of the uncommon words in the target text is smaller than the second preset threshold value, recognize that no messy code text data exists in the encoded text data.
If the occupancy rate is not smaller than the second preset threshold value, output to the user a prompt asking whether the document needs to be repaired.
After an instruction input by the user indicating that the document needs to be repaired is received, determine that the encoded text data contains messy code text data.
The garbled text font information obtaining module 706 is then triggered.
In an embodiment, the messy code restoration module 708 in fig. 7 specifically obtains the download path of the correct font library by the following steps:
sending a font query request to the server; the font query request comprises: font information corresponding to the messy code text.
Receiving the download path, returned by the server, of the correct font library corresponding to the font information of the messy code text.
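The query exchange above can be illustrated with a pair of helper functions that build the request and read the reply. The patent does not define a message format, so everything here is an assumption: the JSON layout, the field names `type`, `fonts` and `download_paths`, and the placeholder URL are hypothetical.

```python
import json
from typing import Dict, Iterable

def build_font_query(font_names: Iterable[str]) -> str:
    """Serialize a font query request carrying the font information
    of the messy code text (hypothetical JSON layout)."""
    return json.dumps(
        {"type": "font_query", "fonts": sorted(set(font_names))},
        ensure_ascii=False,
    )

def parse_font_response(payload: str) -> Dict[str, str]:
    """Extract the font-name -> download-path mapping from the server reply."""
    data = json.loads(payload)
    return dict(data["download_paths"])

request = build_font_query(["KaiTi", "FangSong", "KaiTi"])
print(request)

# A hypothetical reply; the URL is a placeholder, not a real font location.
reply = '{"download_paths": {"FangSong": "https://fonts.example.com/FangSong.ttf"}}'
print(parse_font_response(reply))
```

In the alternative embodiment the terminal sends no query: the server returns the download path on its own once it has been obtained, so only the parsing half would be needed.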
In another embodiment, the messy code restoration module 708 in fig. 7 specifically obtains the download path of the correct font library by the following steps:
receiving the download path of the correct font library that the server returns once it has obtained that download path, and downloading the correct font library from the network according to the download path.
In an embodiment, the messy code restoration module 708 in fig. 7 performs document messy code restoration on the document to be processed by specifically adopting the following step:
embedding the correct font library into the document to be processed, thereby repairing the messy code of the document.
In another embodiment, the messy code restoration module 708 in fig. 7 performs document messy code restoration on the document to be processed by specifically adopting the following steps:
installing the correct font library on the terminal equipment;
encoding the document to be processed by using the correct font library, thereby repairing the messy code of the document.
In a fourth aspect, an embodiment of the present invention provides a device for repairing messy code in document content, which is applied to a server, and as shown in fig. 9, the device includes:
the receiving module 901, configured to receive the document to be processed and the font information corresponding to the messy code text to be repaired, which are uploaded by the terminal device.
The correct font library determining module 902 is configured to determine a correct font library corresponding to the scrambled text, and obtain and store a download path of the correct font library.
The correct font library download path providing module 903 is configured to provide a download path of the correct font library for the terminal device, so that the terminal device downloads the correct font library according to the download path to perform document repairing.
In one embodiment, the correct font library determining module 902 of fig. 9, as shown in fig. 10, comprises:
the font library coding submodule 1001, configured to, for each messy code text to be repaired, load different font libraries from multiple preset font libraries one by one according to the corresponding font information and encode the text, obtaining encoded text data of the messy code text to be repaired for each font library;
the messy code identification submodule 1002, configured to identify, by using a preset messy code identification algorithm, each encoded text data obtained by encoding with a different font library, until encoded text data that has no messy code is found.
The correct font library determining submodule 1003 is configured to determine the font library that was loaded when the encoded text data had no messy code as the correct font library corresponding to the messy code text to be repaired, and to obtain and store a download path of the correct font library.
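The trial-and-error search performed by submodules 1001 to 1003 can be sketched as a loop over candidate font libraries. This is an illustration under simplifying assumptions, not the actual implementation: a font library is modelled as a plain mapping from stored codes to characters, the recognizer is a stand-in based on a tiny set of common characters, and all names, the 0.5 threshold, and the placeholder URL are hypothetical.

```python
from typing import Callable, Mapping, Optional, Sequence

# Hypothetical model: a font library maps stored codes to characters.
FontLibrary = Mapping[int, str]

def encode_with(library: FontLibrary, codes: Sequence[int]) -> str:
    """Encode the text with one font library; unmapped codes become U+FFFD."""
    return "".join(library.get(code, "\ufffd") for code in codes)

def find_correct_library(
    codes: Sequence[int],
    candidates: Mapping[str, FontLibrary],
    has_messy_code: Callable[[str], bool],
) -> Optional[str]:
    """Try the preset font libraries one by one and return the name of the
    first one whose encoded text has no messy code, or None if none fits."""
    for name, library in candidates.items():
        if not has_messy_code(encode_with(library, codes)):
            return name
    return None

COMMON = {"你", "好"}  # stand-in for a word frequency table

def has_messy_code(text: str) -> bool:
    """Stand-in recognizer: messy when over half the characters are uncommon."""
    if not text:
        return False
    uncommon = sum(1 for ch in text if ch not in COMMON)
    return uncommon / len(text) > 0.5

CANDIDATES = {"FontA": {1: "鬱", 2: "龘"}, "FontB": {1: "你", 2: "好"}}
DOWNLOAD_PATHS = {"FontB": "https://fonts.example.com/FontB.ttf"}  # placeholder

correct = find_correct_library([1, 2], CANDIDATES, has_messy_code)
print(correct, DOWNLOAD_PATHS.get(correct))
```

Returning the first library that passes keeps the loop short, which matches the "until the encoded text data has no messy code" wording; a stricter variant could score every candidate and pick the one with the lowest occupancy rate.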
In an embodiment, the correct font library download path providing module 903 in fig. 9 is specifically configured to:
receiving a font query request sent by terminal equipment; the font inquiry request comprises: font information corresponding to the messy code text to be repaired;
and returning a downloading path of the correct font library corresponding to the font information corresponding to the messy code text to be repaired to the terminal equipment, so that the terminal equipment downloads the correct font library according to the downloading path to repair the document.
In another embodiment, the correct font library download path providing module 903 in fig. 9 is specifically configured to:
return the download path of the correct font library to the terminal equipment after the download path of the correct font library is obtained.
In one embodiment, the messy code identification sub-module 1002 in fig. 10 comprises:
the uncommon word determining unit, used for judging whether uncommon words exist in each encoded text data obtained by encoding with a different font library.
In a specific implementation, if none of the text data has uncommon words, it is determined that the document has no messy code text, and messy code recognition is stopped.
The uncommon word occupancy calculation unit is used for calculating, after it is determined that uncommon words exist, the occupancy rate of the uncommon words in the messy code text to be repaired according to the word number of the uncommon words in each messy code text to be repaired and the word number of the messy code text to be repaired.
The messy code text judging unit is used for judging whether the occupancy rate of the uncommon words in the messy code text to be repaired is greater than a first preset threshold value, and if the occupancy rate is greater than the first preset threshold value, recognizing that the encoded text data contains messy code text data.
In a specific implementation, if the occupancy rate of the uncommon words in the messy code text to be repaired is not greater than the first preset threshold value, it is recognized that the encoded text data does not contain messy code text data.
In an embodiment, the uncommon word determining unit in the messy code identification sub-module 1002 in fig. 10 is specifically configured to:
obtain the word frequency of each messy code word to be repaired from a pre-stored word frequency table;
and, if the word frequency of a messy code word to be repaired is lower than a preset uncommon-word frequency threshold, determine that the messy code word to be repaired is an uncommon word.
In an embodiment, the messy code text judging unit in the messy code identification sub-module 1002 in fig. 10 is specifically configured to:
judge whether the occupancy rate of the uncommon words in the messy code text to be repaired is greater than a first preset threshold value.
If the occupancy rate of the uncommon words in the messy code text to be repaired is greater than the first preset threshold value, recognize that the encoded text data contains messy code text data.
If the occupancy rate of the uncommon words in the messy code text to be repaired is not greater than the first preset threshold value, judge whether the occupancy rate is smaller than a second preset threshold value; the second preset threshold is smaller than the first preset threshold.
If the occupancy rate of the uncommon words in the messy code text to be repaired is smaller than the second preset threshold value, recognize that no messy code text data exists in the encoded text data.
If the occupancy rate is not smaller than the second preset threshold value, output to the user a prompt asking whether the document needs to be repaired.
After an instruction input by the user indicating that the document needs to be repaired is received, determine that the encoded text data contains messy code text data.
In a fifth aspect, an embodiment of the present invention provides a terminal device, as shown in fig. 11, including a processor 1101, a communication interface 1102, a memory 1103, and a communication bus 1104, where the processor 1101, the communication interface 1102, and the memory 1103 communicate with each other through the communication bus 1104.
The memory 1103 is configured to store a computer program.
The processor 1101 is configured to, when executing the program stored in the memory 1103, cause the terminal device to perform the following steps:
Acquiring a document to be processed.
Analyzing the document to be processed.
Extracting each font information from the document to be processed.
Loading, according to each font information, a corresponding font library for the text in the document to be processed and encoding the text to obtain encoded text data.
Recognizing the encoded text data by adopting a preset messy code recognition algorithm.
If the encoded text data is identified to contain messy code text data, obtaining font information corresponding to the messy code text from the document to be processed.
Uploading the document to be processed and the font information corresponding to the messy code text to a server, so that the server determines a correct font library corresponding to the messy code text, and obtains and stores a download path of the correct font library.
Obtaining the download path of the correct font library from the server.
Downloading the correct font library from the network according to the download path.
Performing document messy code repair on the document to be processed by using the correct font library.
The machine-readable storage medium may include a RAM (Random Access Memory) and may also include an NVM (Non-Volatile Memory), such as at least one disk memory. Additionally, the machine-readable storage medium may be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In a sixth aspect, an embodiment of the present invention provides a server, as shown in fig. 12, including a processor 1201, a communication interface 1202, a memory 1203, and a communication bus 1204, where the processor 1201, the communication interface 1202, and the memory 1203 communicate with each other through the communication bus 1204.
The memory 1203 is configured to store a computer program.
The processor 1201 is configured to, when executing the program stored in the memory 1203, cause the server to perform the following steps:
Receiving the document to be processed and the font information corresponding to the messy code text to be repaired, which are uploaded by the terminal equipment.
Determining a correct font library corresponding to the messy code text, and obtaining and storing a download path of the correct font library.
Providing the download path of the correct font library for the terminal equipment, so that the terminal equipment downloads the correct font library according to the download path to repair the document.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner; the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the device, terminal device and server embodiments are basically similar to the method embodiments, their description is relatively brief, and for the relevant points reference may be made to the corresponding description of the method embodiments.
The above description is only for the preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (32)

1. A method for repairing messy codes of document contents, applied to terminal equipment, the method comprising the following steps:
acquiring a document to be processed;
analyzing the document to be processed;
extracting each font information from the document to be processed;
loading a corresponding font library for the text in the document to be processed according to the font information, and coding the text to obtain coded text data;
recognizing the coded text data by adopting a preset messy code recognition algorithm;
if the coded text data is identified to contain messy code text data, font information corresponding to the messy code text is obtained from the document to be processed;
uploading font information corresponding to the document to be processed and the messy code text to a server so that the server determines a correct font library corresponding to the messy code text, and obtains and stores a download path of the correct font library;
obtaining a download path of the correct font library from the server;
downloading the correct font library from the network according to the downloading path;
and using the correct font library to repair the messy codes of the document to be processed.
2. The method according to claim 1, wherein before the step of loading a corresponding font library for the text in the document to be processed according to the font information, and coding the text to obtain coded text data, the method further comprises:
classifying all texts in the document to be processed according to the font information, and respectively taking the texts corresponding to the font information as target texts;
determining the word number of the target characters in each target text;
the step of loading a corresponding font library for the text in the document to be processed according to the font information, and coding the text to obtain coded text data comprises the following steps:
loading a corresponding font library for each target text according to the font information, and coding to obtain coded text data corresponding to the target text;
the step of adopting a preset messy code recognition algorithm to recognize the coded text data comprises the following steps:
judging whether the coded text data contains uncommon words or not;
if the rare words exist, calculating the occupancy rate of the rare words in the target text according to the word number of the rare words and the word number of the target words in each target text;
judging whether the occupancy rate of the uncommon word in the target text is greater than a first preset threshold value;
and if the occupancy rate of the uncommon word in the target text is greater than a first preset threshold value, recognizing that the encoded text data contains messy code text data.
3. The method of claim 2, wherein the step of determining whether the encoded text data contains uncommon words comprises:
acquiring the word frequency of each target character from a pre-stored word frequency table;
and if the word frequency of the target word is lower than a preset word frequency threshold value of the uncommon word, determining that the target word is the uncommon word.
4. The method of claim 2, further comprising, after determining that the occupancy rate of the uncommon word in the target text is not greater than a first preset threshold:
judging whether the occupancy rate of the uncommon word in the target text is smaller than a second preset threshold value; the second preset threshold is smaller than the first preset threshold;
if the occupancy rate of the uncommon word in the target text is smaller than a second preset threshold value, recognizing that no messy code text data exists in the coded text data;
if the occupancy rate is not smaller than the second preset threshold value, outputting to the user a prompt asking whether the document needs to be repaired;
after an instruction input by the user indicating that the document needs to be repaired is received, determining that the coded text data contains messy code text data;
and executing the step of obtaining font information corresponding to the messy code text from the document to be processed.
5. The method according to claim 1, wherein the step of obtaining the download path of the correct font library from the server comprises:
sending a font query request to the server; the font inquiry request comprises: font information corresponding to the messy code text;
and receiving a download path of a correct font library corresponding to the font information corresponding to the messy code text returned by the server.
6. The method according to claim 1, wherein the step of obtaining the download path of the correct font library from the server comprises:
and receiving the download path of the correct font library returned by the server after the download path of the correct font library is obtained.
7. The method according to claim 1, wherein the step of using the correct font library to perform document scrambling repair on the document to be processed comprises:
and embedding the correct font library into the document to be processed, and performing messy code repair on the document.
8. The method according to claim 1, wherein the step of using the correct font library to perform document scrambling repair on the document to be processed comprises:
installing the correct font library on the terminal equipment;
and coding the document to be processed by using the correct font library, and repairing the messy code of the document.
9. A method for repairing messy codes of document contents is applied to a server, and comprises the following steps:
receiving font information corresponding to the document to be processed and the messy code text to be repaired, which are uploaded by the terminal equipment;
determining a correct font library corresponding to the messy code text, and obtaining and storing a download path of the correct font library;
and providing a download path of the correct font library for the terminal equipment so that the terminal equipment downloads the correct font library according to the download path to repair the document.
10. The method according to claim 9, wherein the step of determining a correct font library corresponding to the scrambled text, and obtaining and storing a download path of the correct font library comprises:
loading different font libraries in a plurality of preset font libraries for each messy code text to be repaired one by one according to the corresponding font information, and coding to obtain coded text data of the messy code text to be repaired respectively;
respectively identifying each coded text data obtained by using different font library codes by adopting a preset messy code identification algorithm until the coded text data has no messy codes;
and determining the font library loaded when the messy codes do not exist in the text data as a correct font library corresponding to the messy code text to be repaired, and obtaining and storing a downloading path of the correct font library.
11. The method according to claim 9, wherein the step of providing a download path of the correct font library for the terminal device, so that the terminal device downloads the correct font library according to the download path for document repairing, comprises:
receiving a font query request sent by terminal equipment; the font inquiry request comprises: font information corresponding to the messy code text to be repaired;
and returning a downloading path of a correct font library corresponding to the font information corresponding to the messy code text to be repaired to the terminal equipment, so that the terminal equipment downloads the correct font library according to the downloading path to repair the document.
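A minimal server-side sketch of the query handling in claim 11, assuming the stored download paths are kept in an in-memory dictionary keyed by font information. The class name, the `font_info` field and the example URL are hypothetical.

```python
from typing import Dict, Optional


class DownloadPathStore:
    """In-memory stand-in for the download paths stored by the server."""

    def __init__(self) -> None:
        self._paths: Dict[str, str] = {}

    def save(self, font_info: str, download_path: str) -> None:
        """Record the download path of the correct font library for this font information."""
        self._paths[font_info] = download_path

    def handle_font_query(self, request: Dict[str, str]) -> Optional[str]:
        """Return the stored download path for the queried font information, if any."""
        return self._paths.get(request.get("font_info", ""))
```

Returning `None` for an unknown font is an assumption of this sketch; the claim does not say what the server does in that case.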
12. The method according to claim 9, wherein the step of providing a download path of the correct font library for the terminal device, so that the terminal device downloads the correct font library according to the download path for document repairing, comprises:
and after the downloading path of the correct font library is obtained, returning the downloading path of the correct font library to the terminal equipment.
13. The method according to claim 10, wherein the step of identifying the coded text data by using a preset messy code identification algorithm until the coded text data has no messy codes comprises:
judging whether the coded text data contains uncommon words or not;
if the rarely-used words exist, calculating the occupancy rate of the rarely-used words in the messy code text to be repaired according to the word number of the rarely-used words in each messy code text to be repaired and the word number of the messy code text to be repaired;
judging whether the occupancy rate of the uncommon word in the messy code text to be repaired is greater than a first preset threshold value;
and if the occupancy rate of the uncommon word in the messy code text to be repaired is greater than a first preset threshold value, recognizing that the encoded text data contains messy code text data.
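The occupancy test of claim 13 reduces to a ratio and a comparison. The sketch below assumes the uncommon-word test is supplied as a predicate and that empty text has an occupancy of zero; the function names and threshold values are illustrative only.

```python
from typing import Callable


def uncommon_occupancy(text: str, is_uncommon: Callable[[str], bool]) -> float:
    """Share of characters in the text that are classed as uncommon."""
    if not text:
        return 0.0  # assumption: empty text has no uncommon words
    uncommon_count = sum(1 for ch in text if is_uncommon(ch))
    return uncommon_count / len(text)


def contains_messy_code(
    text: str, is_uncommon: Callable[[str], bool], first_threshold: float
) -> bool:
    """True when the occupancy rate is strictly greater than the first preset threshold."""
    return uncommon_occupancy(text, is_uncommon) > first_threshold
```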
14. The method of claim 13, wherein the step of determining whether the encoded text data contains uncommon words comprises:
acquiring the word frequency of each messy code word to be repaired from a pre-stored word frequency table;
and if the word frequency of the messy code words to be repaired is lower than a preset word frequency threshold value of the uncommon words, determining that the messy code words to be repaired are the uncommon words.
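The word frequency lookup of claim 14 might look like the sketch below. The table contents and the threshold are made-up values, and treating a character that is missing from the table as having frequency zero is an assumption the claim does not state.

```python
from typing import Mapping


def is_uncommon_word(
    ch: str, frequency_table: Mapping[str, float], frequency_threshold: float
) -> bool:
    """A character is uncommon when its stored frequency is below the preset threshold."""
    # Assumption: characters absent from the table are treated as frequency 0.0.
    return frequency_table.get(ch, 0.0) < frequency_threshold
```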
15. The method according to claim 13, wherein after judging that the occupancy rate of the uncommon word in the messy code text to be repaired is not greater than a first preset threshold value, the method further comprises:
judging whether the occupancy rate of the uncommon word in the messy code text to be repaired is smaller than a second preset threshold value; the second preset threshold is smaller than the first preset threshold;
if the occupancy rate of the uncommon word in the target text is smaller than a second preset threshold value, recognizing that no messy code text data exists in the coded text data;
if the occupancy rate is not less than the second preset threshold value, outputting to the user a prompt asking whether the document needs to be repaired;
and after receiving a document repairing instruction input by a user, determining that the coded text data contains messy code text data.
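Claims 13 and 15 together describe a three-way decision on the occupancy rate. The sketch below is one way to express it; the `Verdict` names are hypothetical, and treating a declined prompt as "no messy code" is an assumption, since the claim only covers the case where the user asks for repair.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    MESSY = "messy"        # occupancy above the first threshold
    CLEAN = "clean"        # occupancy below the second threshold
    ASK_USER = "ask_user"  # in between: prompt the user


def classify_occupancy(occupancy: float, first: float, second: float) -> Verdict:
    """Apply the two preset thresholds; the second must be smaller than the first."""
    if not second < first:
        raise ValueError("second threshold must be smaller than the first")
    if occupancy > first:
        return Verdict.MESSY
    if occupancy < second:
        return Verdict.CLEAN
    return Verdict.ASK_USER


def has_messy_code(verdict: Verdict, user_wants_repair: Callable[[], bool]) -> bool:
    """Resolve the verdict, consulting the user only in the in-between case."""
    if verdict is Verdict.ASK_USER:
        return user_wants_repair()  # assumption: declining means no repair
    return verdict is Verdict.MESSY
```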
16. A messy code repairing device of document contents is characterized by being applied to terminal equipment and comprising the following components:
the document to be processed acquisition module is used for acquiring a document to be processed;
the analysis module is used for analyzing the document to be processed;
the font information extraction module is used for extracting each font information from the document to be processed;
the text coding module is used for loading a corresponding font library to the text in the document to be processed according to each font information and coding the font library to obtain coded text data;
the messy code identification module is used for identifying the coded text data by adopting a preset messy code identification algorithm;
the messy code text font information acquisition module is used for obtaining font information corresponding to the messy code text from the document to be processed if it is identified that the coded text data contains messy code text data;
the uploading module is used for uploading the font information corresponding to the document to be processed and the messy code text to a server so that the server determines a correct font library corresponding to the messy code text and obtains a downloading path of the correct font library;
and the messy code repairing module is used for obtaining the downloading path of the correct font library from the server, downloading the correct font library from a network according to the downloading path, and performing document messy code repairing on the document to be processed by using the correct font library.
17. The apparatus of claim 16, further comprising: a text classification module;
the text classification module is used for classifying all texts in the document to be processed according to the font information, and respectively taking the texts corresponding to the font information as target texts; determining the word number of the target characters in each target text;
the text encoding module is specifically configured to:
loading a corresponding font library for each target text according to the font information, and coding to obtain coded text data corresponding to the target text;
the messy code identification module comprises:
the rarely-used word determining submodule is used for judging whether rarely-used words exist in the coded text data;
the uncommon word occupancy rate calculation sub-module is used for calculating the occupancy rate of the uncommon words in the target text according to the word number of the uncommon words and the word number of the target words in each target text when the uncommon words exist;
and the messy code text judgment sub-module is used for judging whether the occupancy rate of the rarely-used word in the target text is greater than a first preset threshold value or not, and if the occupancy rate of the rarely-used word in the target text is greater than the first preset threshold value, recognizing that the coded text data contains messy code text data.
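The text classification module of claim 17 groups text by font information and counts the characters in each group. A minimal sketch, assuming the parsed document is available as hypothetical (font information, text) pairs:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_text_by_font(runs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Collect all text that shares the same font information into one target text."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for font_info, text in runs:
        grouped[font_info].append(text)
    return {font_info: "".join(parts) for font_info, parts in grouped.items()}


def count_target_characters(targets: Dict[str, str]) -> Dict[str, int]:
    """Number of characters in each target text, keyed by font information."""
    return {font_info: len(text) for font_info, text in targets.items()}
```

Each target text can then be checked separately, so a problem with one font does not affect the result for text set in another.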
18. The apparatus of claim 17, wherein said uncommon word determination sub-module is specifically configured to:
acquiring the word frequency of each target character from a pre-stored word frequency table;
and if the word frequency of the target word is lower than a preset word frequency threshold value of the uncommon word, determining that the target word is the uncommon word.
19. The apparatus of claim 17, wherein the messy code text judgment sub-module, after determining that the occupancy rate of the uncommon word in the target text is not greater than a first preset threshold, is further configured to:
judging whether the occupancy rate of the uncommon word in the target text is smaller than a second preset threshold value; the second preset threshold is smaller than the first preset threshold;
if the occupancy rate of the uncommon word in the target text is smaller than a second preset threshold value, recognizing that no messy code text data exists in the coded text data;
if the occupancy rate is not less than the second preset threshold value, outputting to the user a prompt asking whether the document needs to be repaired;
after receiving a document repairing instruction input by a user, determining that the coded text data contains messy code text data;
and triggering the messy code text font information acquisition module.
20. The apparatus according to claim 16, wherein the messy code repairing module obtains the download path of the correct font library by:
sending a font query request to the server; the font inquiry request comprises: font information corresponding to the messy code text;
and receiving a download path of a correct font library corresponding to the font information corresponding to the messy code text returned by the server.
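On the terminal side, the exchange in claim 20 amounts to serialising a query and reading the path out of the reply. The patent does not specify a wire format; the JSON layout and the field names `type`, `font_info` and `download_path` below are assumptions made for illustration.

```python
import json
from typing import Optional


def build_font_query(font_info: str) -> str:
    """Serialise a font query request carrying the font information of the messy code text."""
    return json.dumps({"type": "font_query", "font_info": font_info}, ensure_ascii=False)


def parse_download_path(response_body: str) -> Optional[str]:
    """Extract the download path of the correct font library from the server reply, if present."""
    reply = json.loads(response_body)
    return reply.get("download_path")
```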
21. The apparatus according to claim 16, wherein the messy code repairing module obtains the download path of the correct font library by:
and receiving a download path of the correct font library returned by the server after the download path of the correct font library is obtained, and downloading the correct font library from the network according to the download path.
22. The apparatus according to claim 16, wherein the messy code repairing module performs the document messy code repair on the document to be processed by specifically adopting the following steps:
and embedding the correct font library into the document to be processed, and performing messy code repair on the document.
23. The apparatus according to claim 16, wherein the messy code repairing module performs the document messy code repair on the document to be processed by specifically adopting the following steps:
installing the correct font library on the terminal equipment;
and coding the document to be processed by using the correct font library, and repairing the messy code of the document.
24. An apparatus for repairing messy codes of document contents, applied to a server, the apparatus comprising:
the receiving module is used for receiving the document to be processed uploaded by the terminal equipment and font information corresponding to the messy code text to be repaired;
a correct font library determining module, configured to determine a correct font library corresponding to the messy code text, and obtain and store a download path of the correct font library;
and the correct font library downloading path providing module is used for providing a downloading path of the correct font library for the terminal equipment so that the terminal equipment downloads the correct font library according to the downloading path to repair the document.
25. The apparatus of claim 24, wherein the correct font library determination module comprises:
the font library coding submodule is used for loading and coding different font libraries in a plurality of preset font libraries for each messy code text to be repaired one by one according to the corresponding font information, and respectively obtaining the coded text data of the messy code text to be repaired;
the messy code identification submodule is used for respectively identifying each coded text data obtained by coding different font libraries by adopting a preset messy code identification algorithm until the coded text data has no messy code;
and the correct font library determining submodule is used for determining the font library loaded when the messy codes do not exist in the text data as the correct font library corresponding to the messy code text to be repaired, and acquiring and storing a downloading path of the correct font library.
26. The apparatus according to claim 24, wherein the correct font library download path providing module is specifically configured to:
receiving a font query request sent by terminal equipment; the font inquiry request comprises: font information corresponding to the messy code text to be repaired;
and returning a downloading path of a correct font library corresponding to the font information corresponding to the messy code text to be repaired to the terminal equipment, so that the terminal equipment downloads the correct font library according to the downloading path to repair the document.
27. The apparatus according to claim 24, wherein the correct font library download path providing module is specifically configured to:
and after the downloading path of the correct font library is obtained, returning the downloading path of the correct font library to the terminal equipment.
28. The apparatus of claim 25, wherein the messy code identification submodule comprises:
a rarely-used word determining unit, configured to determine whether a rarely-used word exists in the encoded text data, for the encoded text data obtained by using the different font library codes;
the rarely-used word occupancy calculation unit is used for calculating the occupancy rate of rarely-used words in the messy code text to be repaired according to the word number of the rarely-used words in each messy code text to be repaired and the word number of the messy code text to be repaired after the rarely-used words are determined to exist;
and the messy code text judgment unit is used for judging whether the occupancy rate of the rarely-used word in the messy code text to be repaired is greater than a first preset threshold value or not, and if the occupancy rate of the rarely-used word in the messy code text to be repaired is greater than the first preset threshold value, recognizing that the coded text data contains messy code text data.
29. The apparatus according to claim 28, wherein the uncommon word determination unit is specifically configured to:
acquiring the word frequency of each messy code word to be repaired from a pre-stored word frequency table;
and if the word frequency of the messy code words to be repaired is lower than a preset word frequency threshold value of the uncommon words, determining that the messy code words to be repaired are the uncommon words.
30. The apparatus according to claim 28, wherein the messy code text judgment unit is specifically configured to:
after judging that the occupancy rate of the uncommon word in the messy code text to be repaired is not greater than a first preset threshold value, judging whether the occupancy rate of the uncommon word in the messy code text to be repaired is smaller than a second preset threshold value; the second preset threshold is smaller than the first preset threshold;
if the occupancy rate of the uncommon word in the target text is smaller than a second preset threshold value, recognizing that no messy code text data exists in the coded text data;
if the occupancy rate is not less than the second preset threshold value, outputting to the user a prompt asking whether the document needs to be repaired;
and after receiving a document repairing instruction input by a user, determining that the coded text data contains messy code text data.
31. A terminal device comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor, the processor being caused by the machine-executable instructions to carry out the method steps of any one of claims 1 to 8.
32. A server comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor, the processor being caused by the machine-executable instructions to carry out the method steps of any one of claims 9 to 15.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810782438.8A CN110728111B (en) 2018-07-17 2018-07-17 Document content messy code repairing method and device, terminal equipment and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810782438.8A CN110728111B (en) 2018-07-17 2018-07-17 Document content messy code repairing method and device, terminal equipment and server

Publications (2)

Publication Number Publication Date
CN110728111A (en) 2020-01-24
CN110728111B (en) 2024-06-25

Family

ID=69217009

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810782438.8A Active CN110728111B (en) 2018-07-17 2018-07-17 Document content messy code repairing method and device, terminal equipment and server

Country Status (1)

Country Link
CN (1) CN110728111B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113051235A (en) * 2021-04-22 2021-06-29 平安普惠企业管理有限公司 Document loading method and device, terminal and storage medium
CN113743051A (en) * 2021-08-10 2021-12-03 广州坚和网络科技有限公司 Font setting method, user terminal, server and system
CN113850050A (en) * 2020-06-28 2021-12-28 荣耀终端有限公司 Character display method, character display device and terminal equipment
CN114218318A (en) * 2022-02-21 2022-03-22 国网山东省电力公司乳山市供电公司 Data processing system and method for electric power big data
CN115086423A (en) * 2022-05-18 2022-09-20 深圳市科陆电子科技股份有限公司 Data transmission method, data transmission device, computer device, and storage medium
CN115622997A (en) * 2022-10-27 2023-01-17 广州市保伦电子有限公司 Method, device and storage medium for sharing host font library

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1924866A (en) * 2006-09-28 2007-03-07 北京理工大学 Static feature based web page malicious scenarios detection method
US20080181504A1 (en) * 2007-01-31 2008-07-31 International Business Machines Corporation Apparatus, method, and program for detecting garbled characters
US20130141457A1 (en) * 2011-12-06 2013-06-06 Hon Hai Precision Industry Co., Ltd. Electronic device capable of recovering garbled characters and method for recovering garbled characters
CN103425257A (en) * 2012-05-24 2013-12-04 北京搜狗科技发展有限公司 Method and device for prompting information of uncommon characters
CN104424165A (en) * 2013-09-06 2015-03-18 北大方正集团有限公司 Messy code detection method and system for text documents
CN104732228A (en) * 2015-04-16 2015-06-24 同方知网数字出版技术股份有限公司 Detection and correction method for messy codes of PDF (portable document format) document
CN106598923A (en) * 2016-12-26 2017-04-26 北京致远互联软件股份有限公司 Online document format conversion method and apparatus based on font object library loading
WO2017092151A1 (en) * 2015-12-03 2017-06-08 福建福昕软件开发股份有限公司 Method for creating garbled pdf text
CN106874263A (en) * 2017-01-17 2017-06-20 中译语通科技(北京)有限公司 A kind of Sino-British corpus proofreading method based on multi-dimensional data analysis and semanteme

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1924866A (en) * 2006-09-28 2007-03-07 北京理工大学 Static feature based web page malicious scenarios detection method
US20080181504A1 (en) * 2007-01-31 2008-07-31 International Business Machines Corporation Apparatus, method, and program for detecting garbled characters
US20130141457A1 (en) * 2011-12-06 2013-06-06 Hon Hai Precision Industry Co., Ltd. Electronic device capable of recovering garbled characters and method for recovering garbled characters
CN103425257A (en) * 2012-05-24 2013-12-04 北京搜狗科技发展有限公司 Method and device for prompting information of uncommon characters
CN104424165A (en) * 2013-09-06 2015-03-18 北大方正集团有限公司 Messy code detection method and system for text documents
CN104732228A (en) * 2015-04-16 2015-06-24 同方知网数字出版技术股份有限公司 Detection and correction method for messy codes of PDF (portable document format) document
WO2017092151A1 (en) * 2015-12-03 2017-06-08 福建福昕软件开发股份有限公司 Method for creating garbled pdf text
CN106598923A (en) * 2016-12-26 2017-04-26 北京致远互联软件股份有限公司 Online document format conversion method and apparatus based on font object library loading
CN106874263A (en) * 2017-01-17 2017-06-20 中译语通科技(北京)有限公司 A kind of Sino-British corpus proofreading method based on multi-dimensional data analysis and semanteme

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113850050A (en) * 2020-06-28 2021-12-28 荣耀终端有限公司 Character display method, character display device and terminal equipment
CN113051235A (en) * 2021-04-22 2021-06-29 平安普惠企业管理有限公司 Document loading method and device, terminal and storage medium
CN113743051A (en) * 2021-08-10 2021-12-03 广州坚和网络科技有限公司 Font setting method, user terminal, server and system
CN114218318A (en) * 2022-02-21 2022-03-22 国网山东省电力公司乳山市供电公司 Data processing system and method for electric power big data
CN114218318B (en) * 2022-02-21 2022-05-17 国网山东省电力公司乳山市供电公司 Data processing system and method for electric power big data
CN115086423A (en) * 2022-05-18 2022-09-20 深圳市科陆电子科技股份有限公司 Data transmission method, data transmission device, computer device, and storage medium
CN115622997A (en) * 2022-10-27 2023-01-17 广州市保伦电子有限公司 Method, device and storage medium for sharing host font library

Also Published As

Publication number Publication date
CN110728111B (en) 2024-06-25

Similar Documents

Publication Publication Date Title
CN110728111B (en) Document content messy code repairing method and device, terminal equipment and server
CN110795258B (en) Font library matching method, device and equipment
CN109062874B (en) Financial data acquisition method, terminal device and medium
CN110020424B (en) Contract information extraction method and device and text information extraction method
US7865355B2 (en) Fast text character set recognition
CN107122342B (en) Text code recognition method and device
CN107085568B (en) Text similarity distinguishing method and device
CN111831920A (en) User demand analysis method and device, computer equipment and storage medium
CN111339166A (en) Word stock-based matching recommendation method, electronic device and storage medium
US9658989B2 (en) Apparatus and method for extracting and manipulating the reading order of text to prepare a display document for analysis
CN108052686B (en) Abstract extraction method and related equipment
CN115422125A (en) Electronic document automatic filing method and system based on intelligent algorithm
CN114743012A (en) Text recognition method and device
CN110096478B (en) Document index generation method and device
CN110728115B (en) Document content messy code identification method and device and electronic equipment
CN114065762A (en) Text information processing method, device, medium and equipment
CN110874398B (en) Forbidden word processing method and device, electronic equipment and storage medium
CN109661779A (en) Method and system for compressed data
CN113627129B (en) Text copying method and device, electronic equipment and readable storage medium
CN110941704B (en) Text content similarity analysis method
CN115994210A (en) Method and device for quickly searching text in OFD document and electronic equipment
CN107066601A (en) File contrasts management method and system
CN114997137A (en) Document information extraction method, device and equipment and readable storage medium
CN111695327B (en) Method and device for repairing messy codes, electronic equipment and readable storage medium
CN109656821B (en) Test method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant