CN113595683A - Conversion processing method, device, terminal and medium based on various encoding files - Google Patents

Publication number
CN113595683A
CN113595683A CN202110769384.3A CN202110769384A CN113595683A CN 113595683 A CN113595683 A CN 113595683A CN 202110769384 A CN202110769384 A CN 202110769384A CN 113595683 A CN113595683 A CN 113595683A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110769384.3A
Other languages
Chinese (zh)
Inventor
关瑞
姜坤
卫宣安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Zhenyou Communication Technology Co ltd
Original Assignee
Xi'an Zhenyou Communication Technology Co ltd
Application filed by Xi'an Zhenyou Communication Technology Co ltd
Priority to CN202110769384.3A
Publication of CN113595683A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/004 Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0056 Systems characterized by the type of code used
    • H04L1/0057 Block codes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/0078 Avoidance of errors by organising the transmitted data in a format specifically designed to deal with errors, e.g. location
    • H04L1/0084 Formats for payload data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Transfer Between Computers (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention discloses a conversion processing method, device, terminal, and medium based on various encoded files. The method comprises: acquiring a file to be converted; extracting the text content of the file to be converted; identifying the character string encoding of the extracted text content through a preset converter class dictionary library, and automatically converting the text content into new text content in the corresponding format; and outputting the converted new text content in a uniform format. The invention provides an adaptive method for rapidly converting between various encodings: it automatically determines the encoding of a page and then decodes the page, thereby obtaining the desired content.

Description

Conversion processing method, device, terminal and medium based on various encoding files
Technical Field
The invention relates to the technical field of file encoding, and in particular to a conversion processing method and device based on various encoded files, an intelligent terminal, and a storage medium.
Background
Web pages and addresses in many different forms are often encountered at work, and they use different encodings. When page content has to be collected and the page's encoding must therefore be identified, the opened file often turns out to be garbled. Many approaches can be found online, such as detection based on the leading bytes of the file or on the charset declaration of the web page. However, these approaches usually solve only part of the problem and do not adapt to multiple cases.
The same problem often arises when writing web pages: if the author forgets to declare the encoding to the browser in the head, garbled text frequently appears, and it appears with no traceable pattern. Likewise, a downloaded file may be opened with anticipation only to show a screen full of garbled text.
The encoding format is often overlooked. Because Windows defaults to the GB18030/BIG5 encodings, text is typically saved in the default encoding, which further increases the probability that a user will encounter garbled text.
Thus, the prior art still needs improvement and development.
Disclosure of Invention
To address the above defects of the prior art, the invention provides a conversion processing method and device based on various encoded files, an intelligent terminal, and a storage medium.
The technical solution adopted by the invention to solve the problem is as follows:
A conversion processing method based on various encoded files, wherein the method comprises the following steps:
acquiring a file to be converted;
extracting the text content of the file to be converted based on the file to be converted;
identifying the character string encoding of the text content through a preset converter class dictionary library based on the extracted text content, and automatically converting the text content into new text content in a corresponding format;
and outputting the converted new text content in a uniform format.
In the conversion processing method based on various encoded files, before the step of acquiring the file to be converted, the method comprises:
presetting a converter class dictionary library that can automatically convert various types of encodings into corresponding recognizable formats.
In the conversion processing method based on various encoded files, the step of acquiring the file to be converted comprises:
selecting a file to be converted according to a selection instruction;
receiving a conversion instruction through a preset automatic conversion button, and acquiring the file to be converted from the selected file; the files to be converted include web pages, local files, downloaded files, and/or URL addresses.
In the conversion processing method based on various encoded files, the step of extracting the text content of the file to be converted based on the file to be converted comprises:
extracting the text content of the file to be converted item by item;
and performing error correction and recognition on the extracted text content, and organizing the corrected content into the text content of the file to be converted.
In the conversion processing method based on various encoded files, the step of identifying the character string encoding of the text content through a preset converter class dictionary library based on the extracted text content, and automatically converting the text content into new text content in a corresponding format, further comprises:
identifying the character string encoding from the extracted text content through an internal function of the preset converter class dictionary library;
and, using the identified character string encoding, automatically converting the text content into new text content in a preset corresponding format.
In the conversion processing method based on various encoded files, the step of identifying the character string encoding of the text content through a preset converter class dictionary library based on the extracted text content, and automatically converting the text content into new text content in a corresponding format, comprises:
acquiring the encoding information of the current web page from the extracted web page source code, either by reading the charset from the information in the meta tag or by reading the charset variable in the header returned by the server;
and then automatically identifying the encoding and converting the page into new web page content in the corresponding format.
In the conversion processing method based on various encoded files, the step of outputting the converted new text content in a uniform format comprises:
outputting the converted new text content in a preset uniform format.
A conversion processing apparatus based on various encoded files, wherein the apparatus comprises:
an acquisition module, used for acquiring a file to be converted;
an extraction module, used for extracting the text content of the file to be converted based on the file to be converted;
a conversion module, used for identifying the character string encoding of the text content through a preset converter class dictionary library based on the extracted text content, and automatically converting the text content into new text content in a corresponding format;
and an output control module, used for outputting the converted new text content in a uniform format.
An intelligent terminal comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to implement the steps of any of the methods when the one or more programs are executed by one or more processors.
A non-transitory computer readable storage medium having instructions therein which, when executed by a processor of an electronic device, enable the electronic device to perform any of the methods described herein.
The beneficial effects of the invention are as follows. The embodiment of the invention provides an adaptive method for rapidly converting between various encodings: the files to be converted are acquired, and the user then clicks to convert them automatically. The method obtains the content of the relevant files item by item, and an internal function then identifies the character string encoding from the content and automatically converts it into the correct format.
Faced with the many encodings used by web pages, the method can acquire the encoding information of the current web page from the web page source code, for example by reading the charset from the information in the meta tag or by reading the charset variable in the header returned by the server. The page is then converted into correct web page content through automatic identification, which is convenient for the user.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a conversion processing method based on various types of encoded files according to an embodiment of the present invention.
Fig. 2 is a schematic flowchart of a conversion processing method based on various types of encoded files according to a second embodiment of the present invention.
Fig. 3 is a schematic block diagram of a conversion processing apparatus based on various types of encoded files according to an embodiment of the present invention.
Fig. 4 is a schematic block diagram of an internal structure of an intelligent terminal according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it.
It should be noted that if directional indications (such as up, down, left, right, front, and back) are involved in the embodiment of the invention, they are only used to explain the relative positional relationship and movement of the components in a specific posture (as shown in the drawings); if that posture changes, the directional indications change accordingly.
Research shows that web pages and addresses in many different forms are encountered in daily work, and they use different encodings. When page content has to be collected and the page's encoding must therefore be identified, the opened file often turns out to be garbled. Many approaches can be found online, such as detection based on the leading bytes of the file or on the charset declaration of the web page. However, these approaches usually solve only part of the problem and do not adapt to multiple cases.
The same problem often arises, for example, when writing web pages: if the author forgets to declare the encoding to the browser in the head (the header of the web page document), garbled text frequently appears, and it appears irregularly. Likewise, a downloaded file may be opened with anticipation only to show a screen full of garbled text.
The encoding format is often overlooked. Because Windows defaults to the GB18030/BIG5 encodings, text is typically saved in the default encoding, which further increases the probability that a user will encounter garbled text.
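As a minimal illustration of how such garbled text arises (not part of the claimed method), the following Python sketch saves a string as GBK bytes and then reads the bytes back with the wrong and the right codec. The sample string and the choice of Latin-1 as the mistaken codec are assumptions made for the example.

```python
# Illustrative only: the sample string and the codec pairing are assumptions.
text = "中文"  # "Chinese" written in Chinese characters

# Bytes as an editor defaulting to a GB-family encoding would save them.
gbk_bytes = text.encode("gbk")

# A reader that wrongly assumes Latin-1 gets garbled characters.
garbled = gbk_bytes.decode("latin-1")

# Decoding with the codec that matches the bytes restores the text.
restored = gbk_bytes.decode("gbk")

# A reader that strictly assumes UTF-8 fails outright on these bytes.
try:
    gbk_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

The bytes on disk never change; only the codec chosen by the reader decides whether the text is legible.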
In practice, browsers have a built-in encoding recognition function and also offer encoding options to be set manually, but a browser cannot fully identify every encoding. Online conversion tools, likewise, are not fully suitable for every kind of garbled text.
In order to solve this technical problem, the invention provides an adaptive method for rapidly converting between various encodings, which automatically determines the encoding of a page and then decodes the page, thereby obtaining the desired content. With this method, the user uploads the files to be converted and then clicks to convert them automatically. The method obtains the content of the relevant files item by item, and an internal function then identifies the character string encoding from the content and automatically converts it into the correct format.
Faced with the many encodings used by web pages, the method can acquire the encoding information of the current web page from the web page source code, for example by reading the charset from the information in the meta tag or by reading the charset variable in the header returned by the server. The page is then converted into correct web page content through automatic identification. The META tag is an important HTML tag in HTML web page source code. META tags describe attributes of an HTML web document, such as author, date and time, page description, keywords, and page refresh.
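Reading the charset from a meta tag can be sketched as follows. This is a minimal example using Python's standard `html.parser`, not the patent's implementation; the class name `MetaCharsetParser`, the function name `sniff_meta_charset`, and the 4096-byte scan window are assumptions introduced for illustration.

```python
from html.parser import HTMLParser
from typing import Optional


class MetaCharsetParser(HTMLParser):
    """Collects the first charset declared in a <meta> tag (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        if "charset" in attr:
            # HTML5 form: <meta charset="gb2312">
            self.charset = attr["charset"].strip().lower() or None
        elif attr.get("http-equiv", "").lower() == "content-type":
            # Legacy form: content="text/html; charset=gb2312"
            for part in attr.get("content", "").split(";"):
                name, _, value = part.strip().partition("=")
                if name.strip().lower() == "charset" and value:
                    self.charset = value.strip().strip("\"'").lower()


def sniff_meta_charset(raw: bytes) -> Optional[str]:
    """Return the charset named in the page's meta tag, or None if absent."""
    parser = MetaCharsetParser()
    # Charset declarations are ASCII, so a lossless Latin-1 decode of the
    # first bytes is enough to locate them before the real codec is known.
    parser.feed(raw[:4096].decode("latin-1"))
    return parser.charset
```

The returned label is only a declaration; a page may declare one encoding and be saved in another, so a caller may still want to verify that the bytes decode cleanly.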
Of course, the invention also provides automatic conversion when address encodings are converted. For example, Baidu's web pages are encoded in GB2312 while Google's are encoded in UTF-8, and the URL-encoded results produced under the two page encodings differ. When a GB2312 page applies urlencode, the ANSI code of each character is obtained first and then written in hexadecimal; with UTF-8 the process is more complicated, since the urlencode algorithm is applied to the character's multi-byte UTF-8 encoding.
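The difference between the two URL-encoded results can be seen with Python's standard `urllib.parse.quote`, which accepts the byte encoding to apply before percent-encoding. The sample word is an assumption chosen for illustration.

```python
from urllib.parse import quote, unquote

word = "中文"  # sample text; any Chinese string behaves the same way

# Percent-encode the GB2312 bytes of the text (two bytes per character).
gb_encoded = quote(word, encoding="gb2312")

# Percent-encode the UTF-8 bytes of the same text (three bytes per character).
utf8_encoded = quote(word, encoding="utf-8")

# Decoding only round-trips when the same encoding is used on both sides.
gb_roundtrip = unquote(gb_encoded, encoding="gb2312")
utf8_roundtrip = unquote(utf8_encoded, encoding="utf-8")
```

Here `gb_encoded` is `%D6%D0%CE%C4` and `utf8_encoded` is `%E4%B8%AD%E6%96%87`: the same two characters yield different URL strings, which is why the receiving side has to know which encoding the sender used.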
That is, the method provides automatic encoding conversion for web pages, local files, downloaded files, URL addresses, and the like. Conversions between various types of encodings (UTF-8, Unicode, ASCII, URL, IP address) are supported. The method is convenient for developers, saves the time otherwise spent searching for information online, is suitable for various files, identifies them automatically, and makes decoding reliable and correct.
Exemplary method
As shown in fig. 1, an embodiment of the present invention provides a conversion processing method based on various encoded files. In the embodiment of the invention, the method comprises the following steps:
Step S100, acquiring a file to be converted;
In the embodiment of the invention, the various types of encoded files that need to be viewed use different encodings. In the prior art, when page content has to be collected and the page's encoding must be identified, the opened file often turns out to be garbled, so garbled text appears easily.
The embodiment of the invention solves this problem of garbled text.
Before the invention is implemented, a converter class dictionary library that can automatically convert various types of encodings into corresponding recognizable formats needs to be preset.
In the embodiment of the invention, the preset converter class dictionary library is in effect a dictionary library. It contains converters corresponding to the various types. After the client inputs a piece of text, the text is matched against the library by keyword, and processing then enters the corresponding converter class, i.e., the conversion flow.
A converter is actually a class in the code, called the converter class (converter class dictionary library); its responsibility is to convert addresses of various formats.
In the embodiment of the invention, this converter class dictionary library, which can automatically convert various types of encodings into corresponding recognizable formats, is set up in advance.
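One way to picture such a converter class dictionary library is a mapping from a type keyword to a converter object. The sketch below is a hypothetical layout, not the patent's actual code: the class names, the keywords, and the `convert` function are all assumptions made for illustration.

```python
from urllib.parse import quote


class UrlConverter:
    """Percent-encodes text for use in a URL (UTF-8 bytes assumed)."""

    def convert(self, text: str) -> str:
        return quote(text, encoding="utf-8")


class UnicodeEscapeConverter:
    """Writes non-ASCII characters as backslash-u escape sequences."""

    def convert(self, text: str) -> str:
        return text.encode("unicode_escape").decode("ascii")


class Utf8HexConverter:
    """Writes the UTF-8 bytes of the text as hexadecimal digits."""

    def convert(self, text: str) -> str:
        return text.encode("utf-8").hex()


# Hypothetical "dictionary library": type keyword -> converter instance.
CONVERTER_LIBRARY = {
    "url": UrlConverter(),
    "unicode": UnicodeEscapeConverter(),
    "utf-8": Utf8HexConverter(),
}


def convert(keyword: str, text: str) -> str:
    """Match the keyword in the library and run the corresponding converter."""
    converter = CONVERTER_LIBRARY.get(keyword.strip().lower())
    if converter is None:
        raise ValueError(f"no converter registered for {keyword!r}")
    return converter.convert(text)
```

For example, `convert("url", "中")` returns `%E4%B8%AD`. Supporting a further type under this layout only requires registering one more class in the dictionary.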
When various encoded files need to be converted, the file to be converted is acquired first.
In specific use, for example, the user selects a file to be converted, and the file is selected according to the resulting selection instruction. The user then clicks a preset automatic conversion button; the conversion instruction is received through that button, and the file to be converted is acquired from the selected file. The files to be converted include web pages, local files, downloaded files, and/or URL addresses.
Step S200, extracting the text content of the file to be converted based on the file to be converted;
In the embodiment of the invention, the text content of the file to be converted is extracted based on the file to be converted.
In other words, the text content of the file to be converted is extracted item by item. For example, when the file to be converted is a web page, the text content of the selected web page file is extracted. Preferably, the extracted text content is subjected to error correction and recognition, and the corrected content is organized into the text content of the file to be converted.
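Item-by-item extraction of text from a web page might look like the following sketch, which uses Python's standard `html.parser`. The names `TextExtractor` and `extract_text` are assumptions; the error correction step is not specified in the text and is therefore not modeled here.

```python
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Collects visible text pieces, skipping <script> and <style> bodies."""

    SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.pieces: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text that is not inside a skipped element.
        if text and not self._skip_depth:
            self.pieces.append(text)


def extract_text(html: str) -> List[str]:
    """Return the text pieces of an already-decoded HTML document, in order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.pieces
```

Each returned piece corresponds to one run of text between tags, so later steps can process the content one item at a time.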
Step S300, identifying the character string encoding of the text content through a preset converter class dictionary library based on the extracted text content, and automatically converting the text content into new text content in a corresponding format;
In this step, the character string encoding of the text content is identified through the preset converter class dictionary library based on the extracted text content, and the text content is automatically converted into new text content in the corresponding format. In the embodiment of the invention, the extracted text content is input into the preset converter class dictionary library, which identifies its character string encoding and automatically converts it into new text content in the corresponding format. Preferably, the invention automatically and uniformly converts files in the various encoding formats into files in URL-encoded format; that is, the character string is URL-encoded.
In the embodiment of the invention, converting the file into URL-encoded format means that the character string is URL-encoded, which effectively solves the problem of garbled Chinese characters in URLs.
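URL-encoding an address that contains Chinese characters can be sketched with Python's standard `urllib.parse`. The function name `encode_url`, the example host, and the choice of UTF-8 as the byte encoding are assumptions; the sketch also assumes the input is not already percent-encoded and has an ASCII host name.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def encode_url(url: str) -> str:
    """Percent-encode the path and query of a URL, leaving its structure intact.

    Assumes the input is not already percent-encoded; running it twice
    would encode the '%' signs again.
    """
    parts = urlsplit(url)
    path = quote(parts.path, safe="/", encoding="utf-8")
    query = quote(parts.query, safe="=&", encoding="utf-8")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


original = "http://example.com/中文?q=中文"
encoded = encode_url(original)
decoded = unquote(encoded, encoding="utf-8")
```

After encoding, the address contains only ASCII characters, so it cannot be garbled by an intermediary that assumes a different character encoding.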
Specifically, the character string encoding is identified from the extracted text content through an internal function of the preset converter class dictionary library; using the identified encoding, the text content is then automatically converted into new text content in a preset corresponding format.
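The text does not disclose how the internal function identifies the encoding. One common heuristic, shown here purely as an assumption, is to check for a byte order mark and then try candidate codecs in order until one decodes without error. The function name and the candidate list are hypothetical.

```python
from typing import Sequence, Tuple

# Hypothetical candidate order: strict UTF-8 first, then GB and Big5 families.
CANDIDATES = ("utf-8", "gb18030", "big5")


def recognize_and_decode(
    raw: bytes, candidates: Sequence[str] = CANDIDATES
) -> Tuple[str, str]:
    """Return (codec name, decoded text) for the first codec that fits."""
    # A UTF-8 byte order mark is an explicit signal; honor it first.
    if raw.startswith(b"\xef\xbb\xbf"):
        return "utf-8-sig", raw.decode("utf-8-sig")
    for name in candidates:
        try:
            return name, raw.decode(name)
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a character, so it never fails.
    return "latin-1", raw.decode("latin-1")
```

Trial decoding is only a heuristic: legacy double-byte codecs accept many byte sequences, so a clean decode does not prove the guess is right, and the candidate order strongly influences the result.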
For example, when the file to be converted is a web page file, the source code of the web page is extracted, and the encoding information of the current web page is acquired from it: the charset is read either from the information in a meta tag or from the charset variable in the header returned by the server. The encoding is then identified automatically and the page is converted into new web page content in the corresponding format.
The META tag is an important HTML tag in HTML web page source code. META tags describe attributes of an HTML web document, such as author, date and time, page description, keywords, and page refresh. In a meta tag, the charset attribute declares the character encoding of the HTML document. The header is a set of strings that the server sends over the HTTP protocol before sending the HTML data to the browser; a blank line separates the header from the HTML content.
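Reading the charset from the header returned by the server can be sketched as follows. The example works on a raw response held in memory, and the function names are assumptions; parameter parsing is delegated to the standard library's `email.message.Message.get_content_charset`.

```python
from email.message import Message
from typing import Optional, Tuple


def split_response(raw: bytes) -> Tuple[bytes, bytes]:
    """Split a raw HTTP response into (header block, body) at the blank line."""
    head, _, body = raw.partition(b"\r\n\r\n")
    return head, body


def header_charset(head: bytes) -> Optional[str]:
    """Return the charset parameter of the Content-Type header, if any."""
    lines = head.decode("latin-1").split("\r\n")
    for line in lines[1:]:  # lines[0] is the status line, not a header
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-type":
            msg = Message()
            msg["Content-Type"] = value.strip()
            return msg.get_content_charset()  # lowercased, or None
    return None


# Hypothetical response whose body is GB2312-encoded.
raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=GB2312\r\n"
    b"\r\n" + "中文".encode("gb2312")
)
head, body = split_response(raw_response)
charset = header_charset(head)
page_text = body.decode(charset or "utf-8")
```

When both a header charset and a meta charset are present, which one takes priority is a design choice the text leaves open; browsers generally prefer the HTTP header.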
In the embodiment of the invention, the preset converter class dictionary library is in effect a dictionary library. It contains converters corresponding to the various types. After the client inputs a piece of text, the text is matched against the library by keyword, and processing then enters the corresponding converter class, i.e., the conversion flow.
Step S400, outputting the converted new text content in a uniform format.
In the embodiment of the invention, the converted new text content is output in a uniform format, for example in a preset uniform format. The converter (i.e., the converter class dictionary library) adopted in the embodiment of the invention is actually a class in the code: the converter class, which is used to convert addresses of various formats and whose output is in a uniform format.
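The uniform output format itself is not fixed by the text. Assuming, for illustration only, that it is UTF-8 with LF line endings, the output step could be sketched as below; the function name `to_uniform_output` is hypothetical.

```python
def to_uniform_output(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from source_encoding into UTF-8 with LF line endings.

    UTF-8 plus LF is an assumed "uniform format", not one named by the source.
    """
    # Undecodable bytes become U+FFFD instead of aborting the whole file.
    text = raw.decode(source_encoding, errors="replace")
    # Normalize Windows (CRLF) and old Mac (CR) line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.encode("utf-8")


from_gb = to_uniform_output("中文\r\n".encode("gb2312"), "gb2312")
from_big5 = to_uniform_output("中文\n".encode("big5"), "big5")
```

Inputs that started in different encodings end up byte-for-byte identical, which is what allows downstream code to handle them uniformly.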
That is, in the embodiment of the invention, the file to be converted is uploaded and the user then clicks automatic conversion. The content of the relevant files is acquired item by item, and a function in the preset converter class dictionary library then identifies the character string encoding from the content and automatically converts it into the correct format.
Faced with the many encodings used by web pages, the invention can acquire the encoding information of the current web page from the web page source code, for example by reading the charset from the information in the meta tag or by reading the charset variable in the header returned by the server. The page is then converted into correct web page content through automatic identification.
In further embodiments, the invention also provides automatic conversion when address encodings are converted. For example, Baidu's web pages are encoded in GB2312 while Google's are encoded in UTF-8, and the URL-encoded results produced under the two page encodings differ. When a GB2312 page applies urlencode, the ANSI code of each character is obtained first and then written in hexadecimal; with UTF-8 the process is more complicated, since the urlencode algorithm is applied to the character's multi-byte UTF-8 encoding.
The GB code, formally the "Chinese Coded Character Set for Information Interchange, Basic Set" (GB2312-80), was published in 1980. It is a national standard for Chinese information processing and is the Chinese encoding whose use is mandatory in mainland China and in overseas regions that use simplified Chinese (such as Singapore).
UTF-8 (8-bit Universal Character Set/Unicode Transformation Format) is a variable-length character encoding for Unicode.
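The variable-length property of UTF-8 can be checked directly in Python. The four sample characters are assumptions chosen to cover the one-, two-, three-, and four-byte cases.

```python
# One sample character for each UTF-8 length class (escapes avoid ambiguity).
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]  # A, e-acute, 中, an emoji

# Number of bytes each character occupies once encoded as UTF-8.
byte_lengths = [len(ch.encode("utf-8")) for ch in samples]

# The same Chinese character is 3 bytes in UTF-8 but 2 bytes in GB2312.
utf8_bytes = "\u4e2d".encode("utf-8")
gb_bytes = "\u4e2d".encode("gb2312")
```

The lengths come out as 1, 2, 3, and 4 bytes respectively, whereas GB2312 always uses two bytes for a Chinese character; this difference is one reason the two encodings cannot be decoded interchangeably.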
URL encoding, i.e., encoding a character string for use in a URL, is an encoding scheme that can effectively solve the problem of garbled Chinese characters in URLs.
That is, the method of the invention provides automatic encoding conversion for web pages, local files, downloaded files, URL addresses, and the like. Conversions between various types of encodings (UTF-8, Unicode, ASCII, URL, IP address) are supported. The method is convenient for developers, saves the time otherwise spent searching for information online, is suitable for various files, identifies them automatically, and makes decoding reliable and correct.
A specific application embodiment is shown in fig. 2. The conversion processing method based on various encoded files provided in this embodiment includes the following steps:
Step S10, start;
Step S20, acquiring the files to be converted, which include local files, web pages, URL addresses, and other types of encoded files;
Step S30, automatic identification, performed through the preset converter class dictionary library;
Step S40, automatic conversion according to the identified encoding;
The method can acquire the encoding information of the current web page from the web page source code, for example by reading the charset from the information in the meta tag or by reading the charset variable in the header returned by the server. The page is then converted into correct web page content through automatic identification.
Step S50, outputting the correct format;
Step S60, end.
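The flow of steps S20 to S50 can be sketched end to end on bytes held in memory. This is an illustrative pipeline under stated assumptions, not the patented implementation: the function names, the regular expression used to find a declared charset, the fallback codec order, and UTF-8 as the output format are all hypothetical choices.

```python
import codecs
import re

# Matches charset=... in a meta tag or a Content-Type value near the start.
_DECLARED = re.compile(rb"charset\s*=\s*[\"']?\s*([A-Za-z0-9._:-]+)", re.IGNORECASE)


def identify(raw: bytes) -> str:
    """S30: pick a codec, preferring a declared charset over trial decoding."""
    match = _DECLARED.search(raw[:2048])
    if match:
        try:
            return codecs.lookup(match.group(1).decode("ascii")).name
        except LookupError:
            pass  # unknown label: fall through to trial decoding
    for name in ("utf-8", "gb18030"):
        try:
            raw.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return "latin-1"


def convert(raw: bytes, encoding: str) -> str:
    """S40: decode with the identified codec, replacing undecodable bytes."""
    return raw.decode(encoding, errors="replace")


def output(text: str) -> bytes:
    """S50: emit the text in the assumed uniform format (UTF-8)."""
    return text.encode("utf-8")


def run_pipeline(raw: bytes) -> bytes:
    """S20 to S50 for content that has already been acquired as bytes."""
    return output(convert(raw, identify(raw)))
```

Note that the converted page still carries its original charset declaration in the markup; a fuller implementation would likely rewrite that declaration to match the new encoding.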
That is, the embodiment of the invention provides automatic encoding conversion for web pages, local files, downloaded files, URL addresses, and the like. Conversions between various types of encodings (UTF-8, Unicode, ASCII, URL, IP address) are supported. The method is convenient for developers, saves the time otherwise spent searching for information online, is suitable for various files, identifies them automatically, and makes decoding reliable and correct.
Exemplary device
As shown in fig. 3, an embodiment of the present invention provides a conversion processing apparatus based on various encoded files, including:
an acquisition module 310, configured to acquire a file to be converted;
an extraction module 320, configured to extract the text content of the file to be converted based on the file to be converted;
a conversion module 330, configured to identify the character string encoding of the text content through a preset converter class dictionary library based on the extracted text content, and automatically convert the text content into new text content in a corresponding format;
an output control module 340, configured to output the converted new text content in a uniform format, as described in detail above.
Based on the above embodiment, the present invention further provides an intelligent terminal, a schematic block diagram of which may be as shown in fig. 4. The intelligent terminal comprises a processor, a memory, a network interface, and a display screen connected through a system bus. The processor of the intelligent terminal provides computing and control capabilities. The memory of the intelligent terminal comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the intelligent terminal is used to connect to and communicate with an external terminal through a network. The computer program, when executed by the processor, implements a conversion processing method based on various encoded files. The display screen of the intelligent terminal may be a liquid crystal display or an electronic ink display, and the temperature sensor of the intelligent terminal is arranged inside the terminal in advance and is used to detect the operating temperature of the internal equipment.
It will be understood by those skilled in the art that the block diagram shown in fig. 4 shows only the part of the structure related to the solution of the present invention and does not limit the intelligent terminal to which the solution is applied; a specific intelligent terminal may include more or fewer components than shown in the figure, combine some components, or arrange the components differently.
In one embodiment, an intelligent terminal is provided that includes a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:
acquiring a file to be converted;
extracting the text content of the file to be converted based on the file to be converted;
recognizing character string codes for the text contents of the file through a preset converter dictionary library based on the extracted text contents, and automatically converting the character string codes into new text contents in a corresponding format;
and outputting the converted new text content according to a uniform format.
Wherein, the step of obtaining the file to be converted comprises the following steps:
a converter class dictionary library which can automatically convert various types of codes into corresponding recognizable formats is preset.
Wherein, the step of obtaining the file to be converted comprises the following steps:
selecting a file to be converted according to a selection instruction;
receiving a conversion instruction through a preset automatic conversion button, and acquiring a file to be converted from the selected file; the files to be converted comprise web pages, local files, download files and/or url address types.
Wherein, the step of extracting the text content of the file to be converted based on the file to be converted comprises the following steps:
extracting the text content of the file to be converted one by one based on the file to be converted;
and carrying out error correction identification on the extracted text contents, and sorting the contents subjected to error correction identification into the text contents of the file needing to be converted.
Wherein, the step of recognizing the character string encoding of the text content of the file through a preset converter dictionary library based on the extracted text content, and automatically converting it into new text content in a corresponding format, further comprises:
recognizing the character string encoding of the extracted text content through an internal function of the preset converter dictionary library;
and, according to the recognized character string encoding, automatically converting the text content into new text content in a preset corresponding format.
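One way to picture a converter dictionary library is as a mapping from an encoding name to a detector and a converter. The sketch below is an assumption made for illustration, not the structure disclosed by the invention: the names `CONVERTERS` and `recognize_and_convert` are hypothetical, and only two of the encoding types mentioned in this description (Unicode escapes and URL percent-encoding) are registered.

```python
import re
from typing import Callable, Dict, Tuple
from urllib.parse import unquote


def _decode_unicode_escapes(text: str) -> str:
    # Replace each \uXXXX escape with the character it names.
    # Surrogate pairs are not recombined in this sketch.
    return re.sub(
        r"\\u([0-9a-fA-F]{4})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


# Hypothetical converter dictionary: encoding name -> (detector, converter).
# Entries are tried in insertion order; the first matching detector wins.
CONVERTERS: Dict[str, Tuple["re.Pattern[str]", Callable[[str], str]]] = {
    "unicode-escape": (re.compile(r"\\u[0-9a-fA-F]{4}"), _decode_unicode_escapes),
    "url": (re.compile(r"%[0-9a-fA-F]{2}"), unquote),
}


def recognize_and_convert(text: str) -> Tuple[str, str]:
    """Return (recognized encoding name, converted text)."""
    for name, (detector, convert) in CONVERTERS.items():
        if detector.search(text):
            return name, convert(text)
    # No registered detector matched: pass the text through unchanged.
    return "plain", text
```

For example, `recognize_and_convert("%E4%B8%AD%E6%96%87")` returns `("url", "中文")`. Further encoding types would be supported by adding entries to the dictionary rather than by changing the lookup function.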
Wherein, the step of recognizing the character string encoding of the text content of the file through a preset converter dictionary library based on the extracted text content, and automatically converting it into new text content in a corresponding format, comprises:
acquiring the encoding information of the current web page from the extracted web page source code, by reading the charset from the information in a meta tag, or by reading the charset variable in the header returned by the server;
and then automatically recognizing the encoding and converting the page into new web page content in the corresponding format.
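The two charset sources named above (the meta tag in the page source and the header returned by the server) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `detect_charset` and `decode_page` are hypothetical, the header is checked before the meta tag (the text does not fix an order), only the first 4096 bytes of the page are scanned, and UTF-8 is used as the fallback.

```python
import re
from typing import Optional

# Matches both <meta charset="gbk"> and
# <meta http-equiv="Content-Type" content="text/html; charset=gb2312">.
_META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)
_HEADER_CHARSET = re.compile(
    r"""charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)


def detect_charset(
    body: bytes, content_type: Optional[str] = None, default: str = "utf-8"
) -> str:
    """Return the charset declared for a web page, or `default` if none is found."""
    # 1. charset parameter of the Content-Type header returned by the server.
    if content_type:
        match = _HEADER_CHARSET.search(content_type)
        if match:
            return match.group(1).lower()
    # 2. charset declared in a meta tag near the start of the page source.
    match = _META_CHARSET.search(body[:4096])
    if match:
        return match.group(1).decode("ascii").lower()
    return default


def decode_page(body: bytes, content_type: Optional[str] = None) -> str:
    """Decode the page with the detected charset, falling back to UTF-8."""
    charset = detect_charset(body, content_type)
    try:
        return body.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset label: decode leniently instead of failing.
        return body.decode("utf-8", errors="replace")
```

A page may declare a charset that does not match its actual bytes, which is why the decoding step in this sketch catches errors and falls back instead of trusting the declaration unconditionally.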
Wherein, the step of outputting the converted new text content according to a uniform format comprises:
and outputting the converted new text content according to a preset uniform format.
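The description does not specify what the preset uniform format is. Purely as an assumption for illustration, the sketch below treats it as UTF-8 without a byte-order mark, with LF line endings and NFC-normalized characters; the function name `to_uniform_format` is hypothetical.

```python
import unicodedata


def to_uniform_format(text: str) -> bytes:
    """Serialize converted text in one assumed uniform format."""
    # Drop a leading byte-order mark, if one survived decoding.
    text = text.lstrip("\ufeff")
    # Normalize Windows and old Mac line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Use composed character forms so equal text yields equal bytes.
    text = unicodedata.normalize("NFC", text)
    return text.encode("utf-8")
```

Whatever format is actually chosen, applying it in a single place after conversion means that every input type (web page, local file, downloaded file, URL address) produces output with the same encoding and layout.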
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, databases, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
In summary, the invention discloses a conversion processing method, device, intelligent terminal and storage medium based on various types of encoded files: the user acquires the files to be converted and then clicks once to convert them automatically. The method reads the content of the relevant files one by one, and its internal functions identify the character string encoding from that content and automatically convert it into the correct format. The method provides automatic encoding conversion for web pages, local files, downloaded files, URL addresses and the like, and supports conversion of various encoding types (UTF-8, Unicode, ASCII, URL, IP address). It is convenient for developers, saves the time otherwise spent searching for information on the internet, applies to many kinds of files, identifies them automatically, and improves the reliability and correctness of decoding. The invention thus provides an adaptive method for rapidly converting various encodings, which automatically determines the encoding of a page and then decodes it to obtain the desired content.
It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.

Claims (10)

1. A conversion processing method based on various types of encoded files is characterized by comprising the following steps:
acquiring a file to be converted;
extracting the text content of the file to be converted based on the file to be converted;
recognizing the character string encoding of the text content of the file through a preset converter dictionary library based on the extracted text content, and automatically converting the text content into new text content in a corresponding format;
and outputting the converted new text content according to a uniform format.
2. The method of claim 1, wherein the step of obtaining the files to be converted comprises:
presetting a converter dictionary library capable of automatically converting various types of encodings into corresponding recognizable formats.
3. The method of claim 1, wherein the step of obtaining the files to be converted comprises:
selecting a file to be converted according to a selection instruction;
receiving a conversion instruction through a preset automatic conversion button, and acquiring the file to be converted from the selected file; the types of files to be converted include web pages, local files, downloaded files and/or URL addresses.
4. The method according to claim 1, wherein the step of extracting the text content of the file to be converted based on the file to be converted comprises:
extracting the text content of the file to be converted one by one based on the file to be converted;
and carrying out error correction identification on the extracted text content, and organizing the content that has undergone error correction identification into the text content of the file to be converted.
5. The method as claimed in claim 1, wherein the step of recognizing the character string encoding of the text content of the file through a preset converter dictionary library based on the extracted text content, and automatically converting it into new text content in a corresponding format, further comprises:
recognizing the character string encoding of the extracted text content through an internal function of the preset converter dictionary library;
and, according to the recognized character string encoding, automatically converting the text content into new text content in a preset corresponding format.
6. The method according to claim 1, wherein the step of recognizing the character string encoding of the text content of the file through a preset converter dictionary library based on the extracted text content, and automatically converting it into new text content in a corresponding format, comprises:
acquiring the encoding information of the current web page from the extracted web page source code, by reading the charset from the information in a meta tag, or by reading the charset variable in the header returned by the server;
and then automatically recognizing the encoding and converting the page into new web page content in the corresponding format.
7. The method of claim 1, wherein the step of outputting the converted new text content in a unified format comprises:
and outputting the converted new text content according to a preset uniform format.
8. A conversion processing apparatus based on various types of encoded files, the apparatus comprising:
the acquisition module is used for acquiring files needing conversion;
the extraction module is used for extracting the text content of the file to be converted based on the file to be converted;
the conversion module is used for recognizing the character string encoding of the text content of the file through a preset converter dictionary library based on the extracted text content, and automatically converting the text content into new text content in a corresponding format;
and the output control module is used for outputting the converted new text contents according to a uniform format.
9. An intelligent terminal comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to implement the steps of the method according to any one of claims 1-7 when the one or more programs are executed by one or more processors.
10. A non-transitory computer readable storage medium having instructions therein, which when executed by a processor of an electronic device, enable the electronic device to perform the method of any one of claims 1-7.
CN202110769384.3A 2021-07-07 2021-07-07 Conversion processing method, device, terminal and medium based on various encoding files Pending CN113595683A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110769384.3A CN113595683A (en) 2021-07-07 2021-07-07 Conversion processing method, device, terminal and medium based on various encoding files

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110769384.3A CN113595683A (en) 2021-07-07 2021-07-07 Conversion processing method, device, terminal and medium based on various encoding files

Publications (1)

Publication Number Publication Date
CN113595683A (en) 2021-11-02

Family

ID=78246222

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110769384.3A Pending CN113595683A (en) 2021-07-07 2021-07-07 Conversion processing method, device, terminal and medium based on various encoding files

Country Status (1)

Country Link
CN (1) CN113595683A (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20040057355A (en) * 2002-12-26 2004-07-02 주식회사 인지소프트 Method for encoding and decoding document
CN101110072A (en) * 2007-08-21 2008-01-23 无敌科技(西安)有限公司 Device and method for automatic identifying literal code
CN101526963A (en) * 2009-04-17 2009-09-09 深圳华为通信技术有限公司 Method for identifying web page coding, device and terminal equipment
CN102073623A (en) * 2009-11-25 2011-05-25 英业达股份有限公司 System and method for converting file coding
CN102567293A (en) * 2010-12-13 2012-07-11 汉王科技股份有限公司 Coded format detection method and coded format detection device for text files
CN104361021A (en) * 2014-10-21 2015-02-18 小米科技有限责任公司 Webpage encoding identifying method and device
CN104391993A (en) * 2014-12-15 2015-03-04 浪潮(北京)电子信息产业有限公司 Method and system for recognizing webpage codes
US20150113391A1 (en) * 2012-06-29 2015-04-23 SKK Ltd. Document processing system, document processing method and storage medium
CN104750663A (en) * 2013-12-27 2015-07-01 阿里巴巴集团控股有限公司 Identification method and device for text messy codes in page
CN104994128A (en) * 2015-05-15 2015-10-21 北京网康科技有限公司 Data coding type identifying and transcoding method and device
CN107122342A (en) * 2017-04-21 2017-09-01 东莞中国科学院云计算产业技术创新与育成中心 Text code recognition methods and device
CN110020343A (en) * 2017-09-01 2019-07-16 北京国双科技有限公司 The determination method and apparatus of web page coding format
CN110046331A (en) * 2019-03-20 2019-07-23 北京品友互动信息技术股份公司 Data-encoding scheme and device, storage medium, electronic device
CN110196968A (en) * 2019-06-06 2019-09-03 北京林业大学 A kind of simplified form of Chinese Character coding mode automatic recognition system and method searched based on specific character string
CN111898340A (en) * 2020-07-30 2020-11-06 北京字节跳动网络技术有限公司 File processing method and device and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20211102