CN113595683A

CN113595683A - Conversion processing method, device, terminal and medium based on various encoding files

Info

Publication number: CN113595683A
Application number: CN202110769384.3A
Authority: CN
Inventors: 关瑞; 姜坤; 卫宣安
Original assignee: Xi'an Zhenyou Communication Technology Co ltd
Current assignee: Xi'an Zhenyou Communication Technology Co ltd
Priority date: 2021-07-07
Filing date: 2021-07-07
Publication date: 2021-11-02

Abstract

The invention discloses a conversion processing method, device, terminal and medium based on various encoded files. The method comprises the following steps: acquiring a file to be converted; extracting the text content of the file to be converted; identifying the character-string encoding of the extracted text content by means of a preset converter dictionary library, and automatically converting the text content into new text content in a corresponding format; and outputting the converted new text content in a uniform format. The invention provides an adaptive method for rapidly converting between various encodings, which can automatically obtain the encoding of a page and then decode the page, thereby obtaining the desired content.

Description

Conversion processing method, device, terminal and medium based on various encoding files

Technical Field

The invention relates to the technical field of file encoding, and in particular to a conversion processing method and device based on various encoded files, an intelligent terminal and a storage medium.

Background

Web pages and addresses of many kinds are encountered in everyday work, and they use different encodings. When web page content has to be collected and its encoding identified, the opened file often turns out to be garbled. Many approaches can be found online, such as detection based on the leading bytes of the file or on the charset declaration of the web page. However, these approaches usually solve only part of the problem and cannot adapt to multiple cases.

The same problem often arises when writing web pages: if the author forgets to declare the encoding to the browser in the head, garbled text frequently appears, and it does so with no traceable pattern. Likewise, a user may eagerly open a downloaded file only to be greeted by a screen of garbled characters.

The encoding format is often overlooked. Because Windows defaults to the GB18030/BIG5 encoding, text is typically saved in that default encoding, which further increases the probability that a user will encounter garbled text.

Thus, there is still a need for improvement and development of the prior art.

Disclosure of Invention

The invention provides a conversion processing method, a conversion processing device, an intelligent terminal and a storage medium based on various coding files, aiming at the defects of the prior art.

The technical scheme adopted by the invention for solving the problems is as follows:

a conversion processing method based on various types of encoded files, wherein the method comprises the following steps:

acquiring a file to be converted;

extracting the text content of the file to be converted based on the file to be converted;

identifying, based on the extracted text content, the character-string encoding of the text content of the file through a preset converter dictionary library, and automatically converting the text content into new text content in a corresponding format;

and outputting the converted new text content according to a uniform format.

The conversion processing method based on various types of encoded files, wherein the step of acquiring the file to be converted is preceded by:

presetting a converter class dictionary library capable of automatically converting various types of encodings into corresponding recognizable formats.

The conversion processing method based on various types of encoded files, wherein the step of acquiring the files to be converted comprises the following steps:

selecting a file to be converted according to a selection instruction;

receiving a conversion instruction through a preset automatic conversion button, and acquiring a file to be converted from the selected file; the files to be converted comprise web pages, local files, download files and/or url address types.

The conversion processing method based on various types of encoded files, wherein the step of extracting the text content of the file to be converted based on the file to be converted comprises the following steps:

extracting the text content of the file to be converted one by one based on the file to be converted;

and carrying out error correction identification on the extracted text contents, and sorting the contents subjected to error correction identification into the text contents of the file needing to be converted.

The conversion processing method based on various types of encoded files, wherein the step of identifying the character-string encoding of the text content of the file through a preset converter dictionary library based on the extracted text content, and automatically converting the text content into new text content in a corresponding format, further comprises the following steps:

identifying the character-string encoding from the extracted text content through an internal function of the preset converter dictionary library;

and encoding the identified character string, and automatically converting it into new text content in a preset corresponding format.

The conversion processing method based on various types of encoded files, wherein the step of identifying the character-string encoding of the text content of the file through a preset converter dictionary library based on the extracted text content, and automatically converting the text content into new text content in a corresponding format, comprises the following steps:

acquiring the encoding information of the current web page from the extracted web page source code, either by obtaining the charset from the information in the meta tag or by obtaining the charset variable from the header returned by the server;

and then converting the page, through automatic identification, into new web page content in the corresponding format.

The conversion processing method based on various encoding files, wherein the step of outputting the converted new text content according to a uniform format comprises the following steps:

and outputting the converted new text content according to a preset uniform format.

A conversion processing apparatus based on various types of encoded files, wherein the apparatus comprises:

the acquisition module is used for acquiring files needing conversion;

the extraction module is used for extracting the text content of the file to be converted based on the file to be converted;

the conversion module is used for identifying, based on the extracted text content, the character-string encoding of the text content of the file through a preset converter dictionary library, and automatically converting the text content into new text content in a corresponding format;

and the output control module is used for outputting the converted new text contents according to a uniform format.

An intelligent terminal comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to implement the steps of any of the methods when the one or more programs are executed by one or more processors.

A non-transitory computer readable storage medium having instructions therein which, when executed by a processor of an electronic device, enable the electronic device to perform any of the methods described herein.

The invention has the following beneficial effects: the embodiment of the invention provides an adaptive method for rapidly converting between various encodings, in which the files to be converted are acquired and automatic conversion is then triggered with a click. The method obtains the content of the relevant files item by item; its internal function then identifies the character-string encoding from that content and automatically converts it into the correct format.

Faced with the many encodings used by web pages, the method can obtain the encoding information of the current web page from the web page source code, for example by reading the charset from the meta tag or from the charset variable in the header returned by the server. The page is then converted into the correct web page content through automatic identification, which makes the method convenient for the user.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a schematic flowchart of a conversion processing method based on various types of encoded files according to an embodiment of the present invention.

Fig. 2 is a schematic flowchart of a conversion processing method based on various types of encoded files according to a second embodiment of the present invention.

Fig. 3 is a schematic block diagram of a conversion processing apparatus based on various types of encoded files according to an embodiment of the present invention.

Fig. 4 is a schematic block diagram of an internal structure of an intelligent terminal according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it.

It should be noted that, if directional indications (such as up, down, left, right, front and back) are involved in the embodiment of the present invention, they are only used to explain the relative positional relationship, movement and the like of the components in a specific posture (as shown in the drawing); if the specific posture changes, the directional indications change accordingly.

Research has found that web pages and addresses of many kinds are encountered in daily work, and they use different encodings. When web page content has to be collected and its encoding identified, the opened file often turns out to be garbled. Many approaches can be found online, such as detection based on the leading bytes of the file or on the charset declaration of the web page. However, these approaches usually solve only part of the problem and cannot adapt to multiple cases.

The same problem often arises, for example, when writing web pages: if the author forgets to declare the encoding to the browser in the head (the header of the web page document), garbled text frequently appears, and it does so irregularly. Likewise, a user may eagerly open a downloaded file only to be greeted by a screen of garbled characters.

In practice, browsers have an encoding identification function and offer encoding options to set, but a browser cannot identify every encoding. Online conversion tools are likewise not suited to every kind of garbled text.

In order to solve this technical problem, the invention provides an adaptive method for rapidly converting between various encodings, which can automatically obtain the encoding of a page and then decode the page, thereby obtaining the desired content. With this method, the files to be converted are uploaded and automatic conversion is then triggered with a click. The method obtains the content of the relevant files item by item; its internal function then identifies the character-string encoding from that content and automatically converts it into the correct format.

Faced with the many encodings used by web pages, the method can obtain the encoding information of the current web page from the web page source code, for example by reading the charset from the meta tag or from the charset variable in the header returned by the server. The page is then converted into the correct web page content through automatic identification. The META tag is an important HTML tag in the source code of an HTML web page. META tags describe attributes of an HTML document such as the author, date and time, page description, keywords and page refresh.
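As an illustration only, the charset lookup described above might be sketched in Python as follows. The function name detect_charset, the regular expressions and the 4096-byte scan window are assumptions made for this sketch and are not taken from the invention.

```python
import re
from typing import Mapping, Optional

# Matches both <meta charset="gbk"> and
# <meta http-equiv="Content-Type" content="text/html; charset=gbk">.
_META_CHARSET = re.compile(rb"<meta[^>]+charset\s*=\s*[\"']?\s*([\w.:-]+)", re.IGNORECASE)
_HEADER_CHARSET = re.compile(r"charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)


def detect_charset(headers: Mapping[str, str], body: bytes) -> Optional[str]:
    """Return the declared charset: HTTP header first, then the meta tag."""
    for name, value in headers.items():
        if name.lower() == "content-type":
            match = _HEADER_CHARSET.search(value)
            if match:
                return match.group(1).lower()
    # Assumption: a meta declaration, if present, sits near the start of the page.
    match = _META_CHARSET.search(body[:4096])
    if match:
        return match.group(1).decode("ascii").lower()
    return None
```

When neither source declares a charset, the sketch returns None and leaves the choice of a fallback to the caller.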

The invention also provides automatic conversion for certain address encodings. For example, Baidu's web pages are encoded in gb2312 while Google's are encoded in utf8, and the URL-encoded output produced under the two page encodings differs. When a gb2312 page is URL-encoded (urlencode), the ANSI code of each character is obtained first and then converted into hexadecimal, whereas URL-encoding under utf8 is more involved and uses the urlencode algorithm.
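The difference between the two page encodings can be illustrated with the Python standard library. This is a minimal sketch; the helper names url_encode and url_decode are hypothetical and are not part of the invention.

```python
from urllib.parse import quote, unquote


def url_encode(text: str, page_charset: str) -> str:
    """Percent-encode text using the byte values of the given page charset."""
    return quote(text, safe="", encoding=page_charset)


def url_decode(encoded: str, page_charset: str) -> str:
    """Reverse url_encode; the same charset must be used or the result is garbled."""
    return unquote(encoded, encoding=page_charset)


word = "中文"
gb_form = url_encode(word, "gb2312")    # "%D6%D0%CE%C4": two bytes per character
utf8_form = url_encode(word, "utf-8")   # "%E4%B8%AD%E6%96%87": three bytes per character
```

Decoding gb_form with "gb2312" restores the original text, while decoding it as "utf-8" yields replacement characters, which is the garbled-text situation the method is meant to avoid.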

That is, the method provides automatic encoding conversion for various types of input, such as web pages, local files, downloaded files and url addresses. Conversion between various encodings (UTF-8, Unicode, ASCII, URL, IP address) is supported. The method is convenient for developers, saves the time otherwise spent searching for information online, is applicable to all kinds of files, identifies them automatically, and makes decoding reliable and correct.

Exemplary method

As shown in fig. 1, an embodiment of the present invention provides a method for converting various types of encoded files, where in the embodiment of the present invention, the method includes the following steps:

Step S100, acquiring a file to be converted;

In the embodiment of the invention, the various types of encoded files that need to be viewed use different encodings. In the prior art, when web page content has to be collected and its encoding identified, the opened file often turns out to be garbled.

The embodiment of the invention solves this garbled-text problem.

Before the invention is implemented, a converter class dictionary library capable of automatically converting various types of encodings into corresponding recognizable formats needs to be preset.

In the embodiment of the invention, the preset converter class dictionary library is in effect a dictionary. It contains the converters corresponding to the various types. After the client inputs a string of characters, the string is matched against the library by keyword, and processing then enters the corresponding conversion class, i.e., the conversion flow.

A converter is in fact a class in the code. This class, called the converter class (converter class dictionary library), is responsible for converting addresses in various formats.

The embodiment of the invention thus presets a converter class dictionary library that can automatically convert various types of encodings into the corresponding recognizable formats.
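One possible reading of the converter class dictionary library is a mapping from keywords to converter classes. The following Python sketch is offered under that assumption only; the names CONVERTERS, UrlConverter, UnicodeEscapeConverter and convert are hypothetical and do not come from the invention.

```python
from urllib.parse import quote


class UrlConverter:
    """Percent-encodes text as UTF-8 bytes for use in a URL."""

    def convert(self, text: str) -> str:
        return quote(text, safe="")


class UnicodeEscapeConverter:
    """Renders non-ASCII characters as Python-style Unicode escapes."""

    def convert(self, text: str) -> str:
        return text.encode("unicode_escape").decode("ascii")


# Hypothetical dictionary library: keyword -> converter class.
CONVERTERS = {
    "url": UrlConverter,
    "unicode": UnicodeEscapeConverter,
}


def convert(kind: str, text: str) -> str:
    """Match the keyword against the library and run the matching converter."""
    try:
        converter_cls = CONVERTERS[kind.lower()]
    except KeyError:
        raise ValueError(f"no converter registered for {kind!r}") from None
    return converter_cls().convert(text)
```

Under this layout, supporting a further type would only require registering another class in the dictionary, without changing the lookup logic.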

When various encoded files need to be converted, the files needing to be converted are firstly acquired.

In use, for example, the user selects the file to be converted, and the file is selected according to that selection instruction. The user then clicks a preset automatic conversion button; the conversion instruction is received through this button, and the file to be converted is obtained from the selected files. The files to be converted include web pages, local files, downloaded files and/or url addresses.

Step S200, extracting the text content of the file to be converted based on the file to be converted;

In the embodiment of the invention, the text content of the file to be converted is extracted item by item. For example, when the file to be converted is a web page, the text content of the selected web page file is extracted. Preferably, error-correction identification is performed on the extracted text content, and the content that has passed error-correction identification is collated into the text content of the file to be converted.
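For the web page case, item-by-item extraction of text content could be sketched with Python's standard HTML parser, as below. The class name TextExtractor and the decision to skip script and style elements are assumptions of this sketch; the error-correction identification step is not modelled here.

```python
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Collects visible text fragments one by one, ignoring script and style."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_text(html: str) -> List[str]:
    """Return the text fragments of an already-decoded HTML document, in order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks
```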

Step S300, identifying, based on the extracted text content, the character-string encoding of the text content of the file through a preset converter dictionary library, and automatically converting the text content into new text content in a corresponding format;

In this step, the extracted text content is input into the preset converter dictionary library, which identifies the character-string encoding of the text content of the file and automatically converts it into new text content in a corresponding format. Preferably, the invention automatically and uniformly converts files in the various encoding formats into files in url-encoding format; that is, the character string is URL-encoded.

In the embodiment of the invention, converting the file into url-encoding format means that the character string is URL-encoded, which effectively solves the problem of garbled Chinese characters in URLs.

Specifically, the character-string encoding is identified from the extracted text content through an internal function of the preset converter dictionary library; the identified character string is then encoded and automatically converted into new text content in a preset corresponding format.

For example, when the file to be converted is a web page file, the source code of the web page is extracted, and the encoding information of the current web page is obtained from that source code, either by reading the charset from the meta tag or from the charset variable in the header returned by the server. The page is then converted, through automatic identification, into new web page content in the corresponding format.
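Once a charset has been obtained, the raw bytes of the page can be decoded with it. The sketch below assumes a hypothetical helper decode_page and an illustrative list of fallback encodings that is tried when the declared charset is missing, unknown or wrong; neither is specified by the invention.

```python
from typing import Iterable, Optional


def decode_page(body: bytes, declared: Optional[str],
                fallbacks: Iterable[str] = ("utf-8", "gb18030", "big5")) -> str:
    """Decode with the declared charset, then with each fallback in turn."""
    candidates = ([declared] if declared else []) + list(fallbacks)
    for name in candidates:
        try:
            return body.decode(name)
        except (LookupError, UnicodeDecodeError):
            # Unknown codec name or bytes invalid for this codec: try the next one.
            continue
    # Last resort: keep what can be read and mark the rest as replacement characters.
    return body.decode("utf-8", errors="replace")
```

The order of the fallback list matters: a permissive encoding placed early can accept bytes that were written in another encoding, so stricter encodings are tried first in this sketch.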

The META tag is an important HTML tag in the source code of an HTML web page. META tags describe attributes of an HTML document such as the author, date and time, page description, keywords and page refresh. On a META tag, the charset attribute specifies the character encoding of the HTML document. The header is the string that the server sends over HTTP before the HTML data is sent to the browser; a blank line separates the header from the HTML body.

In the embodiment of the invention, the preset converter dictionary library is in effect a dictionary. It contains the converters corresponding to the various types. After the client inputs a string of characters, the string is matched against the library by keyword, and processing then enters the corresponding conversion class, i.e., the conversion flow.

Step S400, outputting the converted new text content in a uniform format.

In the embodiment of the invention, the converted new text content is output in a preset uniform format. The converter (i.e., the converter class dictionary library) used in the embodiment of the invention is in fact a class in the code; this converter class converts addresses in various formats and produces output in a unified format.

That is, in the embodiment of the invention, the file to be converted is uploaded and automatic conversion is then triggered with a click. The content of the relevant files is obtained item by item, and a function in the preset converter class dictionary library then identifies the character-string encoding from that content and automatically converts it into the correct format.

Faced with the many encodings used by web pages, the invention can obtain the encoding information of the current web page from the web page source code, for example by reading the charset from the meta tag or from the charset variable in the header returned by the server. The page is then converted into the correct web page content through automatic identification.

In further embodiments, the invention also provides automatic conversion for certain address encodings. For example, Baidu's web pages are encoded in gb2312 while Google's are encoded in utf8, and the URL-encoded output produced under the two page encodings differs. When a gb2312 page is URL-encoded (urlencode), the ANSI code of each character is obtained first and then converted into hexadecimal, whereas URL-encoding under utf8 is more involved and uses the urlencode algorithm.

GB code refers to GB2312-80, the "Chinese Character Coded Character Set for Information Interchange, Basic Set", published in 1980. It is a national standard for Chinese information processing and the sole Chinese encoding whose use is mandatory in mainland China and in overseas regions that use Simplified Chinese (such as Singapore).

UTF-8 (8-bit Universal Character Set/Unicode Transformation Format) is a variable-length character encoding for Unicode.
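The variable-length property can be seen by counting the bytes that UTF-8 uses for characters from different ranges. The sample characters below are chosen for illustration only.

```python
# Characters from four different Unicode ranges, written as escapes for clarity.
samples = {
    "A": "Latin letter",
    "\u00e9": "accented Latin letter",
    "\u4e2d": "Chinese character",
    "\U0001F600": "emoji outside the Basic Multilingual Plane",
}

# Number of bytes each character occupies once encoded as UTF-8.
byte_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, description in samples.items():
    print(f"U+{ord(ch):04X} ({description}): {byte_lengths[ch]} byte(s)")
```

The four samples occupy one, two, three and four bytes respectively.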

URL encoding, i.e., encoding a character string for use in a URL, is an encoding scheme that effectively solves the problem of garbled Chinese characters in URLs.

That is, the method of the invention provides automatic encoding conversion for various types of input, such as web pages, local files, downloaded files and url addresses. Conversion between various encodings (UTF-8, Unicode, ASCII, URL, IP address) is supported. The method is convenient for developers, saves the time otherwise spent searching for information online, is applicable to all kinds of files, identifies them automatically, and makes decoding reliable and correct.

A specific application embodiment is shown in fig. 2. The conversion processing method based on various types of encoded files provided in this embodiment includes the following steps:

Step S10, start;

Step S20, acquiring the files to be converted, where the files to be converted include local files, web pages, URL addresses and other types of encoded files;

Step S30, automatic identification through a preset converter dictionary library;

Step S40, automatic conversion according to the identified encoding;

The method can obtain the encoding information of the current web page from the web page source code, for example by reading the charset from the meta tag or from the charset variable in the header returned by the server. The page is then converted into the correct web page content through automatic identification.

Step S50, outputting the correct format;

Step S60, end.
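Steps S20 to S50 can be strung together as in the following Python sketch for the web page case. All function names are hypothetical, the charset search is deliberately simplified (any "charset=" text near the start of the page is accepted), and UTF-8 is assumed as the uniform output format; none of this is prescribed by the invention.

```python
import re
from typing import Mapping

_CHARSET = re.compile(r"charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)


def identify_encoding(headers: Mapping[str, str], body: bytes,
                      default: str = "utf-8") -> str:
    """S30: look for a charset declaration in the header, then in the page head."""
    # The header key lookup is case-sensitive in this sketch.
    head = body[:4096].decode("ascii", errors="ignore")
    for source in (headers.get("Content-Type", ""), head):
        match = _CHARSET.search(source)
        if match:
            return match.group(1).lower()
    return default


def convert(body: bytes, encoding: str) -> str:
    """S40: decode the bytes according to the identified encoding."""
    return body.decode(encoding, errors="replace")


def output(text: str) -> bytes:
    """S50: emit the result in one uniform format (UTF-8 is assumed here)."""
    return text.encode("utf-8")


def run(headers: Mapping[str, str], body: bytes) -> bytes:
    """S20 to S50 for a page that has already been acquired as raw bytes."""
    return output(convert(body, identify_encoding(headers, body)))
```

For instance, a page whose bytes are GBK-encoded and which declares charset="gbk" in its meta tag comes out of run as the same markup in UTF-8.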

That is, the embodiment of the invention provides automatic encoding conversion for various types of input, such as web pages, local files, downloaded files and url addresses. Conversion between various encodings (UTF-8, Unicode, ASCII, URL, IP address) is supported. The method is convenient for developers, saves the time otherwise spent searching for information online, is applicable to all kinds of files, identifies them automatically, and makes decoding reliable and correct.

Exemplary device

As shown in fig. 3, an embodiment of the present invention provides a conversion processing apparatus based on various types of encoded files, including:

an obtaining module 310, configured to obtain a file to be converted;

an extracting module 320, configured to extract, based on the file to be converted, text content of the file to be converted;

a conversion module 330, configured to identify, based on the extracted text content, the character-string encoding of the text content of the file through a preset converter dictionary library, and to automatically convert the text content into new text content in a corresponding format;

an output control module 340, configured to output the converted new text content in a uniform format, as described in detail above.

Based on the above embodiment, the invention further provides an intelligent terminal, a schematic block diagram of which may be as shown in fig. 4. The intelligent terminal comprises a processor, a memory, a network interface and a display screen connected through a system bus. The processor of the intelligent terminal provides computing and control capabilities. The memory of the intelligent terminal comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the intelligent terminal is used to communicate with an external terminal over a network connection. When executed by the processor, the computer program implements a conversion processing method based on various types of encoded files. The display screen of the intelligent terminal may be a liquid crystal display or an electronic ink display. A temperature sensor is arranged inside the intelligent terminal in advance and is used to detect the operating temperature of the internal equipment.

It will be understood by those skilled in the art that the block diagram shown in fig. 4 covers only the part of the structure related to the solution of the present invention and does not limit the intelligent terminal to which the solution is applied; a specific intelligent terminal may include more or fewer components than shown in the figure, combine some components, or arrange the components differently.

In one embodiment, an intelligent terminal is provided that includes a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:

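acquiring a file to be converted;

extracting the text content of the file to be converted based on the file to be converted;

identifying, based on the extracted text content, the character-string encoding of the text content of the file through a preset converter dictionary library, and automatically converting the text content into new text content in a corresponding format;

and outputting the converted new text content according to a uniform format.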

Wherein, the step of obtaining the file to be converted comprises the following steps:

selecting a file to be converted according to a selection instruction;

Wherein, the step of extracting the text content of the file to be converted based on the file to be converted comprises the following steps:

The step of recognizing character string codes for the literal contents of the file through a preset converter dictionary library based on the extracted literal contents, and automatically converting the character string codes into new literal contents in a corresponding format further comprises the following steps of:

The step of recognizing character string codes for the literal contents of the file through a preset converter dictionary library based on the extracted literal contents, and automatically converting the literal contents into new literal contents with corresponding formats comprises the following steps:

Wherein, the step of outputting the converted new text content according to a uniform format comprises:

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, databases or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).

In summary, the invention discloses a conversion processing method, device, intelligent terminal and storage medium based on various encoded files, in which the files to be converted are acquired and automatic conversion is then triggered with a click. The method obtains the content of the relevant files item by item; its internal function then identifies the character-string encoding from that content and automatically converts it into the correct format. The method of the invention provides automatic encoding conversion for various types of input, such as web pages, local files, downloaded files and url addresses. Conversion between various encodings (UTF-8, Unicode, ASCII, URL, IP address) is supported. The method is convenient for developers, saves the time otherwise spent searching for information online, is applicable to all kinds of files, identifies them automatically, and makes decoding reliable and correct. The invention provides an adaptive method for rapidly converting between various encodings, which can automatically obtain the encoding of a page and then decode the page, thereby obtaining the desired content.

It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.

Claims

1. A conversion processing method based on various types of encoded files is characterized by comprising the following steps:

acquiring a file to be converted;

and outputting the converted new text content according to a uniform format.

2. The method of claim 1, wherein the step of obtaining the files to be converted comprises:

3. The method of claim 1, wherein the step of obtaining the files to be converted comprises:

selecting a file to be converted according to a selection instruction;

4. The method according to claim 1, wherein the step of extracting the text content of the file to be converted based on the file to be converted comprises:

5. The method according to claim 1, wherein the step of identifying the character-string encoding of the text content of the file through a preset converter dictionary library based on the extracted text content, and automatically converting the text content into new text content in a corresponding format, further comprises:

6. The method according to claim 1, wherein the step of identifying the character-string encoding of the text content of the file through a preset converter dictionary library based on the extracted text content, and automatically converting the text content into new text content in a corresponding format, comprises:

7. The method of claim 1, wherein the step of outputting the converted new text content in a unified format comprises:

8. A conversion processing apparatus based on various types of encoded files, the apparatus comprising:

the acquisition module is used for acquiring files needing conversion;

9. An intelligent terminal comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to implement the steps of the method according to any one of claims 1-7 when the one or more programs are executed by one or more processors.

10. A non-transitory computer readable storage medium having instructions therein, which when executed by a processor of an electronic device, enable the electronic device to perform the method of any one of claims 1-7.