CN111368508A - Data processing method, device, equipment and medium - Google Patents

Data processing method, device, equipment and medium

Info

Publication number
CN111368508A
Authority
CN
China
Prior art keywords
preset
format
target
data
coding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010142908.1A
Other languages
Chinese (zh)
Other versions
CN111368508B (en)
Inventor
龚炜林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sangfor Technologies Co Ltd
Original Assignee
Sangfor Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sangfor Technologies Co Ltd filed Critical Sangfor Technologies Co Ltd
Priority to CN202010142908.1A priority Critical patent/CN111368508B/en
Publication of CN111368508A publication Critical patent/CN111368508A/en
Application granted granted Critical
Publication of CN111368508B publication Critical patent/CN111368508B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a data processing method, device, equipment and medium. The method comprises the following steps: acquiring a target coding format of data to be transmitted, and judging whether the target coding format is a preset first coding format or a preset second coding format, wherein the second number of bits of each second coding unit in the preset second coding format is less than or equal to the first number of bits of each first coding unit in the first coding format; if the target coding format is the preset first coding format, acquiring a format conversion rule between the preset first coding format and the preset second coding format; and converting the data to be transmitted into target data coded in the preset second coding format according to the format conversion rule. The invention addresses the technical problems of wasted resources and low transmission speed in existing data transmission.

Description

Data processing method, device, equipment and medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data processing method, apparatus, device, and medium.
Background
At present, computers usually transmit data using the international character coding standard Unicode. Unicode uses a fixed 2 bytes to represent each text, and each byte contains 8 bits. An English text generally needs no more than 8 bits, so transmitting data in Unicode easily wastes resources and reduces transmission efficiency.
Disclosure of Invention
The invention mainly aims to provide a data processing method, a data processing device, data processing equipment and a data processing medium, and aims to solve the technical problems of much resource waste and low transmission speed in the existing data transmission process.
In order to achieve the above object, an embodiment of the present invention provides a data processing method, where the data processing method includes:
acquiring a target coding format of data to be transmitted, and judging whether the target coding format is a preset first coding format or a preset second coding format, wherein the second number of bits of each second coding unit in the preset second coding format is less than or equal to the first number of bits of each first coding unit in the first coding format;
if the target coding format is a preset first coding format, acquiring a format conversion rule between the preset first coding format and the preset second coding format;
and converting the data to be transmitted into target data coded in the preset second coding format according to the format conversion rule.
Optionally, the step of determining whether the target encoding format is a preset first encoding format or a preset second encoding format includes:
determining a byte corresponding relation between the preset second coding format and the preset first coding format;
determining the maximum continuous number of continuous bytes contained in the second coding unit according to the byte correspondence;
acquiring a first text of a target type in the data to be transmitted, and determining a coding unit value of the first text of the target type according to the maximum continuous number;
judging whether the coding format of the first text of the target type is a preset second coding format or a preset first coding format according to the coding unit value and a preset mask operation rule so as to obtain a first judgment result that the first text of the target type is the preset second coding format or a second judgment result that the first text of the target type is the preset first coding format;
and acquiring all judgment results corresponding to the data to be transmitted, and judging whether the data to be transmitted is in a preset second coding format according to a first incidence relation between all the judgment results and the first judgment result and a second incidence relation between all the judgment results and the second judgment result.
Optionally, the step of determining, according to the coding unit value and a preset mask operation rule, whether the coding format of the first text of the target type is a preset second coding format or a preset first coding format, so as to obtain a first determination result that the first text of the target type is the preset second coding format, or a second determination result that the first text of the target type is the preset first coding format includes:
taking a first byte corresponding to the first text of the target type as a target index byte, and acquiring a target mask value corresponding to the target index byte and a first mask operation result corresponding to the target index byte from a preset mask table corresponding to the preset second coding format;
performing the preset AND operation in the mask operation rule on the coding unit value and the target mask value to obtain a second coding operation result;
if the first mask operation result is consistent with the second encoding operation result, judging that the encoding format of the first text of the target type is a preset second encoding format, and taking the encoding format of the first text of the target type as the preset second encoding format as a first judgment result;
if the first mask operation result is inconsistent with the second encoding operation result, the encoding format of the first text of the target type is judged to be a preset first encoding format, and the encoding format of the first text of the target type is the preset first encoding format and is used as a second judgment result.
Optionally, the step of obtaining all judgment results corresponding to the data to be transmitted, and judging whether the data to be transmitted is in a preset second coding format according to a first association relationship between the all judgment results and the first judgment result and a second association relationship between the all judgment results and the second judgment result includes:
acquiring first texts of a plurality of other target types corresponding to the data to be transmitted, wherein the first texts of the plurality of other target types refer to the texts in the data to be transmitted that successively move into the preset position and need to be judged after the judgment of the previous first text of the target type at the preset position is completed;
obtaining a plurality of judgment results of the first texts of the other target types to obtain all judgment results corresponding to the data to be transmitted;
and if the plurality of judgment results are all first judgment results, namely that the coding formats of the corresponding first texts of other target types are the preset second coding format, determining that the target coding format of the data to be transmitted is the preset second coding format.
Optionally, after the step of converting the data to be transmitted into target data encoded in the preset second encoding format according to the format conversion rule and transmitting the target data, the method includes:
and if a processing instruction of the target data is detected, restoring the target data into the data in the first coding format according to the byte corresponding relation.
Optionally, if a processing instruction of the target data is detected, the step of restoring the target data to the data in the first encoding format according to the byte correspondence includes:
if a processing instruction of the target data is detected, acquiring the composition of each second coding unit of the target data;
and restoring the target data into the data in the first coding format according to the composition of each second coding unit, the first corresponding relation and the second corresponding relation.
Optionally, the first encoding format is an international text coding standard Unicode encoding format, and the preset second encoding format is a variable length encoding format UTF-8.
The present invention also provides a data processing apparatus, comprising:
a first obtaining module, configured to obtain a target coding format of data to be transmitted and judge whether the target coding format is a preset first coding format or a preset second coding format, wherein the second number of bits of each second coding unit in the preset second coding format is less than or equal to the first number of bits of each first coding unit in the first coding format;
the second obtaining module is used for obtaining a format conversion rule between the preset first coding format and the preset second coding format if the target coding format is the preset first coding format;
and the conversion module is used for converting the data to be transmitted into target data coded in the preset second coding format according to the format conversion rule.
Optionally, the first obtaining module includes:
the first determining unit is used for determining the byte corresponding relation between the preset second coding format and the preset first coding format;
a second determining unit, configured to determine, according to the byte correspondence, a maximum consecutive number of consecutive bytes included in the second coding unit, where the consecutive bytes constitute a single text;
the first obtaining unit is used for obtaining a first text of a target type in the data to be transmitted and determining a coding unit value of the first text of the target type according to the maximum continuous number;
the judging unit is used for judging whether the coding format of the first text of the target type is a preset second coding format or a preset first coding format according to the coding unit value and a preset mask operation rule so as to obtain a first judgment result that the first text of the target type is the preset second coding format or a second judgment result that the first text of the target type is the preset first coding format;
and the second obtaining unit is used for obtaining all judgment results corresponding to the data to be transmitted, and judging whether the data to be transmitted is in a preset second coding format according to a first incidence relation between all the judgment results and the first judgment result and a second incidence relation between all the judgment results and the second judgment result.
Optionally, the determining unit includes:
a first obtaining subunit, configured to obtain, by using a first byte corresponding to the first text of the target type as a target index byte, a target mask value corresponding to the target index byte and a first mask operation result corresponding to the target index byte from a preset mask table corresponding to the preset second coding format;
an AND operation processing subunit, configured to perform the preset AND operation in the mask operation rule on the coding unit value and the target mask value to obtain a second coding operation result;
the first judging subunit is configured to, if the first mask operation result is consistent with the second encoding operation result, judge that the encoding format of the first text of the target type is a preset second encoding format, and take the encoding format of the first text of the target type as the preset second encoding format as a first judgment result;
and the second judging subunit is configured to, if the first mask operation result is inconsistent with the second encoding operation result, judge that the encoding format of the first text of the target type is the preset first encoding format, and take the encoding format of the first text of the target type as the preset first encoding format as a second judgment result.
Optionally, the second obtaining unit includes:
the second obtaining subunit is configured to obtain first texts of a plurality of other target types corresponding to the data to be transmitted, where the first texts of the plurality of other target types refer to the texts in the data to be transmitted that successively move into the preset position and need to be judged after the judgment of the previous first text of the target type at the preset position is completed;
the third obtaining subunit is configured to obtain multiple determination results of the first texts of multiple other target types, so as to obtain all determination results corresponding to the data to be transmitted;
and the determining subunit is configured to determine that the target coding format of the data to be transmitted is the preset second coding format if the plurality of judgment results are all first judgment results, namely that the coding formats of the corresponding first texts of other target types are the preset second coding format.
Optionally, the data processing apparatus further includes:
and the restoring module is used for restoring the target data into the data in the first coding format according to the byte corresponding relation if the processing instruction of the target data is detected.
Optionally, the restoring module includes:
the detection unit is used for acquiring the composition of each second coding unit of the target data when a processing instruction of the target data is detected;
and a restoring unit, configured to restore the target data to the data in the first encoding format according to the configuration of each second encoding unit, the first corresponding relationship, and the second corresponding relationship.
Optionally, the first encoding format is an international text coding standard Unicode encoding format, and the preset second encoding format is a variable length encoding format UTF-8.
The invention also provides a medium having stored thereon a data processing program which, when executed by a processor, implements the steps of the data processing method as described above.
The method obtains a target coding format of data to be transmitted and judges whether the target coding format is a preset first coding format or a preset second coding format, wherein the second number of bits of each second coding unit in the preset second coding format is less than or equal to the first number of bits of each first coding unit in the first coding format; if the target coding format is the preset first coding format, a format conversion rule between the preset first coding format and the preset second coding format is obtained; and the data to be transmitted is converted into target data coded in the preset second coding format according to the format conversion rule. In other words, after the data to be transmitted is received, it is judged whether its target coding format is the preset second coding format, whose coding units contain fewer bits. If it is not, format conversion is carried out first and the data is transmitted afterwards, so that resource waste and a reduction in transmission speed are avoided.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating a data processing method according to a first embodiment of the present invention;
FIG. 2 is a schematic view of a detailed flow of a step of determining whether the target encoding format is a preset first encoding format or a preset second encoding format according to a second embodiment of the data processing method of the present invention;
FIG. 3 is a schematic structural diagram of a device in the hardware operating environment related to the method according to the embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides a data processing method, in an embodiment of the data processing method, referring to fig. 1, the data processing method includes:
step S10, obtaining a target coding format of data to be transmitted, and judging whether the target coding format is a preset first coding format or a preset second coding format, wherein the second number of bits of each second coding unit in the preset second coding format is less than or equal to the first number of bits of each first coding unit in the first coding format;
step S20, if the target encoding format is a preset first encoding format, obtaining a format conversion rule between the preset first encoding format and the preset second encoding format;
step S30, according to the format conversion rule, convert the data to be transmitted into the target data encoded in the preset second encoding format.
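Steps S10 to S30 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `looks_like_utf8` and `prepare_for_transmission` are hypothetical, the first coding format is assumed to be 2-byte little-endian Unicode (decoded here as UTF-16-LE), and the format check of step S10 is replaced by Python's built-in UTF-8 decoder rather than the mask table described later.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Stand-in for step S10: True if the bytes decode as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def prepare_for_transmission(data: bytes) -> bytes:
    """Steps S10 to S30: return the data in the second format (UTF-8)."""
    if looks_like_utf8(data):
        # Already in the preset second coding format: nothing to convert.
        return data
    # Assumption: anything else is 2-byte little-endian Unicode (UTF-16-LE).
    # Caveat: pure-ASCII text stored in 2 bytes per character also passes
    # the UTF-8 check above, so this sketch cannot tell that case apart.
    return data.decode("utf-16-le").encode("utf-8")


two_byte = "中国".encode("utf-16-le")
print(prepare_for_transmission(two_byte) == "中国".encode("utf-8"))
```

Data that is already UTF-8 passes through unchanged, so calling the function twice yields the same bytes as calling it once.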
The method comprises the following specific steps:
step S10, obtaining a target coding format of data to be transmitted, and judging whether the target coding format is a preset first coding format or a preset second coding format, wherein the second number of bits of each second coding unit in the preset second coding format is less than or equal to the first number of bits of each first coding unit in the first coding format;
It should be noted that, in this embodiment, the data processing method is applied to a data processing device, which may be a PC or a mobile terminal. When the device receives data to be transmitted, it obtains the target encoding format of that data. The data to be transmitted includes data such as files, chat content and data packets. The target encoding format may be the Unicode encoding format (the international text coding standard, a rule for representing text in a computer) or the variable-length encoding format UTF-8 (8-bit Unicode Transformation Format). After obtaining the target encoding format, the device judges whether it is the preset first encoding format or the preset second encoding format, where the second number of bits of each second coding unit of the preset second encoding format is less than or equal to the first number of bits of each first coding unit of the first encoding format. Because a second coding unit uses no more bits than a first coding unit, and a typical text does not need many bits, transmitting data in the preset second encoding format reduces wasted resources. Specifically, each first coding unit has 16 bits (binary digits), whereas a text can often be determined with only 8 bits.
It should be noted that, in this embodiment, the preset first encoding format may be the international text coding standard Unicode encoding format, and the preset second encoding format may be the UTF-8 encoding format. This embodiment is described taking the Unicode encoding format as the first encoding format and the UTF-8 encoding format as the preset second encoding format. Here, Unicode represents each text with a fixed 2 bytes, and the value of those 2 bytes is called the codepoint (code point).
Referring to fig. 2, the step of determining whether the target encoding format is the preset first encoding format or the preset second encoding format includes:
step S11, determining a byte correspondence between the preset second encoding format and the preset first encoding format;
In this embodiment, a byte correspondence between the preset second encoding format and the first encoding format is first determined. Specifically, the byte correspondence includes a first correspondence of the number of bytes and a second correspondence of the byte content. The first correspondence may be pre-stored (for example, the UTF-8 encoding format uses 1 to 4 bytes in place of the 2 bytes of the Unicode encoding format), and the second correspondence of the byte content is determined according to the number of bytes. It should be noted that a byte of the Unicode encoding format contains 8 bits and the Unicode encoding format uses 2 bytes to represent a text, so two bytes can represent 256 × 256 = 65536 possible texts. Because an English letter can be represented by only one byte while the Unicode encoding format always uses 16 bits, the Unicode encoding format leaves space vacant and wastes resources; with the UTF-8 encoding format, a single byte can represent such a text, which saves resources. For the UTF-8 encoding format to express every datum of the Unicode encoding format, it needs 1 to 4 bytes to carry the 16 bits (2 bytes = 16 bits) of the 2 codepoint bytes, as shown in Table 1 (the xxx parts of the 1st to 4th bytes store the bits of the codepoint in sequence). In Table 1, if the codepoint is between 0 and 127, it is determined using one byte in the UTF-8 encoding format (since 127 < 2^8); if the codepoint is between 128 and 2047, it is determined using two bytes; and if the codepoint is in another range, it is determined using more bytes.
Total number of bytes   codepoint range       Byte 1     Byte 2     Byte 3     Byte 4
1                       U+0000 to U+007F      0xxxxxxx
2                       U+0080 to U+07FF      110xxxxx   10xxxxxx
3                       U+0800 to U+FFFF      1110xxxx   10xxxxxx   10xxxxxx
4                       U+10000 to U+10FFFF   11110xxx   10xxxxxx   10xxxxxx   10xxxxxx
TABLE 1
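The byte layout in Table 1 can be illustrated with a short sketch that packs a codepoint into 1 to 4 bytes. The function name `encode_codepoint` is hypothetical and is not taken from the patent; the sketch does not reject surrogate codepoints, which a complete encoder would.

```python
def encode_codepoint(cp: int) -> bytes:
    """Pack a codepoint into 1 to 4 bytes following the layout of Table 1."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("codepoint out of range")
    if cp <= 0x7F:
        # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


print(encode_codepoint(0x6211).hex())
```

For the codepoints checked here the result agrees with Python's own encoder, `chr(cp).encode("utf-8")`.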
Step S12, determining the maximum consecutive number of consecutive bytes contained in the second coding unit according to the byte correspondence, wherein the consecutive bytes constitute a single text;
After the byte correspondence is obtained, the maximum consecutive number of consecutive bytes contained in a second coding unit of the preset second coding format is determined. Consecutive bytes are the non-null bytes that together form a single text. For example, if the Chinese text "country" needs 4 bytes, those 4 bytes are consecutive bytes; if the Chinese text "home" needs 3 bytes, those 3 bytes are consecutive bytes. The maximum consecutive number is determined according to the byte correspondence, specifically through the multiple relationship between the number of bytes of each second coding unit and the number of bytes of each first coding unit (other means may also be used, which is not limited herein). For example, the maximum consecutive number of consecutive bytes contained in a coding unit of the preset second coding format is determined to be 4, since the UTF-8 coding format can represent any data of the Unicode coding format with at most 4 consecutive bytes (it should be noted that this embodiment uses 4 bytes only as an example, and a UTF-8 coding format of another byte length may also be used).
Step S13, acquiring a first text of a target type in the data to be transmitted, and determining a coding unit value of the first text of the target type according to the maximum continuous number;
After the data to be transmitted is obtained, the first text of the target type (an English letter or a Chinese character) in the data to be transmitted is obtained, and its coding unit value is determined according to the maximum consecutive number. If the maximum consecutive number is 4, the value of the first 4 bytes of all bytes corresponding to the data to be transmitted (the value formed by the digits in each bit of those bytes) is obtained and used as the coding unit value. For example, if the data to be transmitted corresponds to bytes 1 to 3333, bytes 1 to 4 are selected.
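Reading the coding unit value might look like the sketch below. Two points are assumptions rather than statements from the patent: the bytes are combined in little-endian order (inferred from the mask layout of Table 2, where the first byte occupies the lowest 8 bits), and a short tail is padded with zero bytes. The name `coding_unit_value` is hypothetical.

```python
def coding_unit_value(data: bytes, offset: int = 0, max_run: int = 4) -> int:
    """Combine up to max_run bytes starting at offset into one integer.

    Assumption: little-endian order, so the first byte lands in the lowest
    8 bits. A tail shorter than max_run is padded with zero bytes.
    """
    chunk = data[offset:offset + max_run].ljust(max_run, b"\x00")
    return int.from_bytes(chunk, "little")


print(hex(coding_unit_value(bytes.fromhex("E6889199"))))
```

Zero padding means a truncated multi-byte sequence at the end of the data cannot satisfy a continuation-byte check, because a zero byte never has the form 10xxxxxx.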
Step S14, determining, according to the coding unit value and a preset mask operation rule, whether the coding format of the first text of the target type is a preset second coding format or a preset first coding format, so as to obtain a first determination result that the first text of the target type is the preset second coding format, or a second determination result that the first text of the target type is the preset first coding format;
After the coding unit value is obtained, for example 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx, whether the coding format of the first text of the target type is the preset second coding format or the preset first coding format is judged according to that coding unit value and the preset mask operation rule, so as to obtain a first judgment result that the first text of the target type is in the preset second coding format or a second judgment result that the first text of the target type is in the preset first coding format. In this embodiment, the coding unit value is obtained once and operated on once according to the mask operation rule, instead of being processed byte by byte through conventional if-else branches, so the speed of judging the second coding format, such as UTF-8, during data transmission is increased, which solves the technical problem that UTF-8 judgment is too slow. In the prior art, whether the first byte is correct is first judged through if-else, then the second byte, then the third byte, and so on, which is clearly less efficient.
Specifically, the step of determining whether the coding format of the first text of the target type is a preset second coding format or a preset first coding format according to the coding unit value and a preset mask operation rule to obtain a first determination result that the first text of the target type is the preset second coding format, or a second determination result that the first text of the target type is the preset first coding format includes:
step S141, taking a first byte corresponding to the first text of the target type as a target index byte, and obtaining a pre-stored target mask value corresponding to the target index byte and a pre-stored first mask operation result corresponding to the target index byte from a pre-set mask table corresponding to the pre-set second encoding format;
The preset mask table pre-stores the association relationships among the index byte, the mask value, the mask operation result, and the consecutive number of consecutive bytes contained in each coding unit of the preset second coding format.
In this embodiment, a preset mask table exists for the second encoding format UTF-8. The preset mask table is pre-stored, has a length of 256 (the number of possible values of the first byte), and is indexed by the value of the first byte. As shown in Table 2, each element of the mask table includes a mask value mask, a mask operation result flag, and the total number of bytes cnt contained in the UTF-8 coding unit.
Index (first byte)   mask         flag         cnt   Valid UTF-8
00-7F                0x00000000   0x00000000   1
80-BF                0x00000080   0x00000000   1     ×
C0-C1                0x000000E0   0x00000000   1     ×
C2-DF                0x0000C000   0x00008000   2
E0                   0x00C0E000   0x0080A000   3
E1-EC                0x00C0C000   0x00808000   3
ED                   0x00C0E000   0x00808000   3
EE-EF                0x00C0C000   0x00808000   3
F0                   0xC0C0F000   0x80809000   4
F1-F3                0xC0C0C000   0x80808000   4
F4                   0xC0C0F000   0x80808000   4
F5-FF                0x000000F0   0x00000000   1     ×
TABLE 2
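One way to hold Table 2 in memory is a 256-entry list indexed by the first byte. The sketch below transcribes the rows as printed; the names `Entry`, `ROWS` and `build_table` are hypothetical and do not come from the patent.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    mask: int  # bits of the coding unit value to inspect
    flag: int  # expected result of value & mask
    cnt: int   # number of bytes in the coding unit


# (first index, last index, mask, flag, cnt), transcribed from Table 2.
ROWS = [
    (0x00, 0x7F, 0x00000000, 0x00000000, 1),
    (0x80, 0xBF, 0x00000080, 0x00000000, 1),  # marked invalid in Table 2
    (0xC0, 0xC1, 0x000000E0, 0x00000000, 1),  # marked invalid in Table 2
    (0xC2, 0xDF, 0x0000C000, 0x00008000, 2),
    (0xE0, 0xE0, 0x00C0E000, 0x0080A000, 3),
    (0xE1, 0xEC, 0x00C0C000, 0x00808000, 3),
    (0xED, 0xED, 0x00C0E000, 0x00808000, 3),
    (0xEE, 0xEF, 0x00C0C000, 0x00808000, 3),
    (0xF0, 0xF0, 0xC0C0F000, 0x80809000, 4),
    (0xF1, 0xF3, 0xC0C0C000, 0x80808000, 4),
    (0xF4, 0xF4, 0xC0C0F000, 0x80808000, 4),
    (0xF5, 0xFF, 0x000000F0, 0x00000000, 1),  # marked invalid in Table 2
]


def build_table() -> list:
    """Expand the row ranges into one Entry per possible first byte."""
    table = [None] * 256
    for first, last, mask, flag, cnt in ROWS:
        for index in range(first, last + 1):
            table[index] = Entry(mask, flag, cnt)
    return table


TABLE = build_table()
print(TABLE[0xE6])
```

For the rows marked invalid, the mask covers the first byte itself and the flag is zero, so the comparison can never succeed for any first byte in that range.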
In this embodiment, the first byte corresponding to the first text of the target type, e.g., E6, is used as the target index byte. The pre-stored target mask value mask corresponding to the target index byte (which may also be obtained through the byte correspondence) and the pre-stored first mask operation result corresponding to the target index byte, e.g., 0x00808000, are obtained from the preset mask table corresponding to the preset second coding format.
Step S142, performing the preset AND operation in the mask operation rule on the coding unit value and the target mask value to obtain a second coding operation result;
The preset AND operation is performed on the coding unit value and the mask value to obtain the coding operation result. Specifically, let value be the value of the n consecutive bytes of the data to be transmitted (including the target index byte). If value & table[n].mask == table[n].flag, the text corresponding to the n consecutive bytes is a UTF-8 coding unit. For example, for the 4 consecutive bytes E6889199 of the data to be transmitted, if value & table[n].mask == table[n].flag is satisfied, E68891 is a UTF-8 coding unit. Note that from table[E6], mask = 0x00C0C000 and flag = 0x00808000, and n = 3 can be determined.
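The E6889199 example can be worked through in a few lines. The constants are the mask, flag and cnt that Table 2 gives for index E6; the little-endian byte order and the name `matches` are assumptions made for this sketch.

```python
MASK_E6 = 0x00C0C000  # checks the top two bits of bytes 2 and 3
FLAG_E6 = 0x00808000  # both bytes must have the form 10xxxxxx
CNT_E6 = 3            # the coding unit is 3 bytes long


def matches(value: int, mask: int, flag: int) -> bool:
    """Single AND-and-compare test of the mask operation rule."""
    return (value & mask) == flag


data = bytes.fromhex("E6889199")
# Assumption: little-endian, so byte E6 occupies the lowest 8 bits.
value = int.from_bytes(data, "little")

if matches(value, MASK_E6, FLAG_E6):
    unit = data[:CNT_E6]
    print(unit.hex(), unit.decode("utf-8"))
```

The fourth byte, 99, is not covered by the mask, so it does not affect the result; it is examined only when the next coding unit is judged.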
Step S143, if the first mask operation result is consistent with the second encoding operation result, determining that the encoding format of the first text of the target type is a preset second encoding format, and taking the encoding format of the first text of the target type as the preset second encoding format as a first determination result;
Step S144, if the first mask operation result is inconsistent with the second encoding operation result, determining that the encoding format of the first text of the target type is a preset first encoding format, and taking the encoding format of the first text of the target type as the preset first encoding format as a second determination result.
That is, if the first mask operation result is consistent with the second encoding operation result, the encoding format of the first text of the target type is judged to be the preset second encoding format, and this conclusion is recorded as the first judgment result. If the first mask operation result is inconsistent with the second encoding operation result, the encoding format of the first text of the target type is judged to be the preset first encoding format rather than the preset second encoding format, and this conclusion is recorded as the second judgment result.
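Steps S141 to S144 can be illustrated with a minimal Python sketch. It assumes the coding unit value is read as a big-endian 4-byte window that starts at the index byte. The names `MASK_TABLE` and `is_second_format_unit` are hypothetical, and only the E6 entry (mask 0x00C0C000, first mask operation result 0x00808000, n = 3) is taken from the example above; a full table would need one entry per possible index byte.

```python
# Hypothetical mask table keyed by the target index (first) byte.
# Only the 0xE6 entry comes from the text; other entries are omitted here.
MASK_TABLE = {
    0xE6: {"mask": 0x00C0C000, "flag": 0x00808000, "n": 3},
}


def is_second_format_unit(window: bytes) -> bool:
    """Apply the mask check to a 4-byte window starting at the index byte."""
    if len(window) != 4:
        raise ValueError("expected a 4-byte window")
    entry = MASK_TABLE.get(window[0])
    if entry is None:
        return False  # index byte not covered by this truncated table
    value = int.from_bytes(window, "big")   # coding unit value
    result = value & entry["mask"]          # second coding operation result
    return result == entry["flag"]          # compare with first mask operation result


# E6 88 91 99: bytes 2 and 3 both have the form 10xxxxxx, so the check passes.
print(is_second_format_unit(bytes.fromhex("E6889199")))
# E6 00 41 00: bytes 2 and 3 do not have the form 10xxxxxx, so the check fails.
print(is_second_format_unit(bytes.fromhex("E6004100")))
```

The AND with the mask keeps only the two high bits of each continuation byte, so one integer comparison replaces a per-byte test.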
Step S15, acquiring all judgment results corresponding to the data to be transmitted, and judging whether the data to be transmitted is in a preset second encoding format according to a first association between the all judgment results and the first judgment result and a second association between the all judgment results and the second judgment result;
All judgment results corresponding to the data to be transmitted are acquired, and whether the data to be transmitted is in the preset second coding format is judged according to the first association between all the judgment results and the first judgment result and the second association between all the judgment results and the second judgment result. That is, the data to be transmitted includes a plurality of first texts of other target types (determined from the data remaining after the first text of the target type is removed), and the judgment results corresponding to these first texts are obtained to form all the judgment results. If all the judgment results are the first judgment result, the data to be transmitted is judged to be in the preset second coding format; if the judgment results include the second judgment result, the data to be transmitted is judged to be in the preset first coding format.
Step S20, if the target encoding format is a preset first encoding format, obtaining a format conversion rule between the preset first encoding format and the preset second encoding format;
If the target encoding format is the preset first encoding format, the format conversion rule between the preset first encoding format and the preset second encoding format (pre-stored, as shown in Table 1) is obtained. If the target encoding format of the data to be transmitted is already the preset second encoding format, the data to be transmitted can be transmitted directly.
Step S30, according to the format conversion rule, convert the data to be transmitted into the target data encoded in the preset second encoding format.
The data to be transmitted is converted into target data coded in the preset second coding format according to the format conversion rule, and the target data is then transmitted, which improves transmission efficiency.
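The conversion in step S30 can be sketched as follows. Table 1 is not reproduced in this section, so the sketch assumes the rule is the standard UTF-8 bit layout and that the data in the first coding format has already been split into integer code points. The function names `encode_code_point` and `convert_to_second_format` are hypothetical.

```python
from typing import Iterable


def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point using the standard UTF-8 bit layout."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {cp:#x}")
    if cp < 0x80:        # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def convert_to_second_format(code_points: Iterable[int]) -> bytes:
    """Convert a sequence of code points into variable-length coded bytes."""
    return b"".join(encode_code_point(cp) for cp in code_points)


# U+6211 becomes the three bytes E6 88 91 used in the examples of this document.
print(convert_to_second_format([0x6211]).hex().upper())
```

Code points below 0x80 occupy a single byte after conversion, which is where the saving over a fixed-width coding unit comes from.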
In summary, the target coding format of the data to be transmitted is acquired, and whether the target coding format is a preset first coding format or a preset second coding format is judged, wherein the second number of bits of each second coding unit in the preset second coding format is less than or equal to the first number of bits of each first coding unit in the first coding format; if the target coding format is the preset first coding format, a format conversion rule between the preset first coding format and the preset second coding format is acquired; and the data to be transmitted is converted into target data coded in the preset second coding format according to the format conversion rule. In other words, after the data to be transmitted is received, it is judged whether its target coding format is the preset second coding format, whose coding units contain fewer bits. If it is not, format conversion is performed first and transmission follows, which avoids wasting resources and slowing down transmission.
Further, based on the foregoing embodiment, the present invention provides another embodiment of the data processing method, in which the step of acquiring all judgment results corresponding to the data to be transmitted and judging whether the data to be transmitted is in the preset second coding format includes:
Step A1, obtaining a plurality of first texts of other target types corresponding to the data to be transmitted, wherein the first texts of the other target types refer to the first texts in the data to be transmitted that successively occupy a preset position and need to be judged next, after the judgment of the previous first text of the target type at the preset position is completed;
In this embodiment, if the judgment result for the first text of the target type is that its coding format is the preset second coding format, a plurality of first texts of other target types corresponding to the data to be transmitted are acquired. Specifically, after the judgment of the previous first text of the target type at the preset position is completed (the previous first text of the target type may be the first character to be judged), the first texts that successively occupy the preset position and need to be judged next are taken from the data to be transmitted. That is, the leading bytes of the data to be transmitted that have already been judged are removed to obtain the remaining data to be transmitted, and the new first byte of the remaining data is determined; in other words, the first texts of other target types are the texts that in turn become the leading text once the preceding first character has been judged. If the judgment result for the first text of the target type is that its coding format is not the preset second coding format, it is determined that the coding format of the data to be transmitted is not the preset second coding format, and coding format conversion is required.
Step A2, obtaining a plurality of judgment results of the first texts of the other target types to obtain all judgment results corresponding to the data to be transmitted;
Step A3, if the plurality of judgment results all indicate that the coding formats of the corresponding first texts of other target types are the preset second coding format, determining that the target coding format of the data to be transmitted is the preset second coding format.
If the judgment results show that the coding formats of the first texts of other target types are all the preset second coding format, the target coding format of the data to be transmitted is determined to be the preset second coding format; if the judgment results show that the coding formats of the first texts of other target types are not all the preset second coding format, the target coding format of the data to be transmitted is determined not to be the preset second coding format.
In this embodiment, a plurality of first texts of other target types corresponding to the data to be transmitted are obtained, where these first texts are the first texts in the data to be transmitted that successively occupy a preset position and need to be judged next, after the judgment of the previous first text of the target type at the preset position is completed; the judgment results of the first texts of the other target types are obtained to form all judgment results corresponding to the data to be transmitted; and if the plurality of judgment results all indicate that the coding formats of the corresponding first texts of other target types are the preset second coding format, the target coding format of the data to be transmitted is determined to be the preset second coding format. In this way, whether the target coding format of the data to be transmitted is the preset second coding format is determined accurately.
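Steps A1 to A3 amount to a loop over the data: judge the unit at the front, drop it, and judge the unit that becomes the new front. The Python sketch below shows that loop under stated assumptions. The full mask table built by the hypothetical `build_mask_table` is an illustration derived from the UTF-8 lead-byte patterns (only the 3-byte mask and flag values appear in this document), short tails are padded with zero bytes, and the function names are not from the original.

```python
def build_mask_table() -> dict:
    """Map each accepted lead byte to (n, mask, flag) for a 4-byte window."""
    table = {}
    for lead in range(0x00, 0x80):   # 0xxxxxxx: single byte, nothing to check
        table[lead] = (1, 0x00000000, 0x00000000)
    for lead in range(0xC2, 0xE0):   # 110xxxxx: one continuation byte
        table[lead] = (2, 0x00C00000, 0x00800000)
    for lead in range(0xE0, 0xF0):   # 1110xxxx: two continuation bytes
        table[lead] = (3, 0x00C0C000, 0x00808000)
    for lead in range(0xF0, 0xF5):   # 11110xxx: three continuation bytes
        table[lead] = (4, 0x00C0C0C0, 0x00808080)
    return table


TABLE = build_mask_table()


def is_all_second_format(data: bytes) -> bool:
    """Return True only if every unit in data passes the mask check."""
    pos = 0
    while pos < len(data):
        entry = TABLE.get(data[pos])
        if entry is None:
            return False                 # byte cannot start a coding unit
        n, mask, flag = entry
        # Pad a short tail with zeros; a truncated unit then fails the check.
        window = data[pos:pos + 4].ljust(4, b"\x00")
        if (int.from_bytes(window, "big") & mask) != flag:
            return False                 # one failing unit decides the result
        pos += n                         # the next unit becomes the new front
    return True


print(is_all_second_format("我们".encode("utf-8")))
print(is_all_second_format("我们".encode("utf-16-le")))
```

This simplified check only tests the continuation-byte pattern; it does not reject overlong forms such as E0 80 80 or encoded surrogates, which a strict validator would also exclude.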
Further, based on the foregoing embodiment, the present invention provides another embodiment of the data processing method, in which, after the step of converting the data to be transmitted into target data coded in the preset second coding format according to the format conversion rule and transmitting the target data, the method further includes:
Step S40, if a processing instruction of the target data is detected, restoring the target data to the data in the first encoding format according to the byte correspondence.
In this embodiment, data in the second encoding format often needs to be restored to data in the first encoding format to facilitate data processing. Specifically, if a processing instruction of the target data is detected, the target data is restored to the data in the first encoding format according to the byte correspondence.
The step of restoring the target data to the data in the first encoding format according to the target data and the byte correspondence includes:
Step S41, if a processing instruction of the target data is detected, acquiring the composition of each second coding unit of the target data;
Step S42, restoring the target data to the data in the first encoding format according to the composition of each second coding unit, the first correspondence and the second correspondence.
In this embodiment, when a processing instruction of the target data is detected, the composition of each second coding unit of the target data is acquired; specifically, the composition of each second coding unit includes a byte content composition and a byte number composition. The target data is then restored to the data in the first coding format according to the composition of each second coding unit, the first correspondence and the second correspondence. First, the content to be decoded and the number of bytes are determined according to the byte content composition and the second correspondence; the target data is then restored to the data in the first coding format according to the first correspondence, the number of bytes and the content to be decoded. For example, for E6889199, the number of bytes and the content to be decoded in the second coding unit are determined by checking whether the first byte has the form 0xxxxxxx, 110xxxxx, and so on. Once the number of bytes is determined, the bytes are decoded with the formula codepoint = ((E6 & 0F) << 12) + ((88 & 3F) << 6) + (91 & 3F), where << denotes a left shift and all values are hexadecimal; this yields 0x6211 and completes the decoding operation.
In this embodiment, if a processing instruction of target data is detected, the configuration of each second coding unit of the target data is acquired; and restoring the target data into the data in the first coding format according to the composition of each second coding unit, the first corresponding relation and the second corresponding relation. In this embodiment, when the operation is required, the fast decoding operation is realized, and the judgment is not required, so that the decoding efficiency is improved.
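The restoration in steps S41 and S42 can be sketched in Python as follows. The sketch assumes the target data has already passed the format judgment, so continuation bytes are not re-checked, and it returns integer code points rather than any particular fixed-width layout of the first coding format. The names `decode_unit` and `restore_code_points` are hypothetical.

```python
from typing import List, Tuple


def decode_unit(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode the unit starting at pos; return (code point, byte count)."""
    lead = data[pos]
    if (lead & 0x80) == 0x00:      # 0xxxxxxx: 1 byte
        n, cp = 1, lead
    elif (lead & 0xE0) == 0xC0:    # 110xxxxx: 2 bytes
        n, cp = 2, lead & 0x1F
    elif (lead & 0xF0) == 0xE0:    # 1110xxxx: 3 bytes
        n, cp = 3, lead & 0x0F
    elif (lead & 0xF8) == 0xF0:    # 11110xxx: 4 bytes
        n, cp = 4, lead & 0x07
    else:
        raise ValueError(f"byte {lead:#04x} cannot start a coding unit")
    if pos + n > len(data):
        raise ValueError("truncated coding unit")
    # Each continuation byte contributes its low 6 bits.
    for b in data[pos + 1:pos + n]:
        cp = (cp << 6) | (b & 0x3F)
    return cp, n


def restore_code_points(data: bytes) -> List[int]:
    """Decode every unit in data into a list of code points."""
    out, pos = [], 0
    while pos < len(data):
        cp, n = decode_unit(data, pos)
        out.append(cp)
        pos += n
    return out


# E6 88 91 decodes to ((0xE6 & 0x0F) << 12) + ((0x88 & 0x3F) << 6) + (0x91 & 0x3F).
cp, n = decode_unit(bytes.fromhex("E6889199"))
print(hex(cp), n)
```

Because the lead byte alone fixes the byte count, the loop needs no further judgment per unit, which matches the efficiency argument made above.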
Referring to fig. 3, fig. 3 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present invention.
The data processing equipment of the embodiment of the invention can be a PC, and can also be terminal equipment such as a smart phone, a tablet computer, a portable computer and the like.
As shown in fig. 3, the data processing apparatus may include: a processor 1001, such as a CPU, a memory 1005, and a communication bus 1002. The communication bus 1002 is used for realizing connection communication between the processor 1001 and the memory 1005. The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a memory device separate from the processor 1001 described above.
Optionally, the data processing device may further include a target user interface, a network interface, a camera, RF (radio frequency) circuitry, a sensor, audio circuitry, a WiFi module, and so forth. The target user interface may include a Display screen (Display), an input unit such as a Keyboard (Keyboard), and the selectable target user interfaces may also include standard wired interfaces, wireless interfaces. The network interface optionally may include a standard wired interface, a wireless interface (e.g., WI-FI interface).
Those skilled in the art will appreciate that the data processing device architecture shown in fig. 3 does not constitute a limitation of the data processing device and may include more or fewer components than shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 3, a memory 1005, which is a kind of computer storage medium, may include an operating system, a network communication module, and a data processing program therein. An operating system is a program that manages and controls the hardware and software resources of the data processing device, supporting the operation of the data processing program as well as other software and/or programs. The network communication module is used to enable communication between components within the memory 1005, as well as with other hardware and software within the data processing device.
In the data processing apparatus shown in fig. 3, the processor 1001 is configured to execute a data processing program stored in the memory 1005, and implement the steps of the data processing method according to any one of the above.
The specific implementation of the data processing apparatus of the present invention is substantially the same as the embodiments of the data processing method described above, and will not be described herein again.
In addition, an embodiment of the present invention further provides a data processing apparatus, where the data processing apparatus includes:
a first obtaining module, configured to obtain a target coding format of data to be transmitted and judge whether the target coding format is a preset first coding format or a preset second coding format, wherein the second number of bits of each second coding unit in the preset second coding format is less than or equal to the first number of bits of each first coding unit in the first coding format;
a second obtaining module, configured to obtain a format conversion rule between the preset first coding format and the preset second coding format if the target coding format is the preset first coding format;
and a conversion module, configured to convert the data to be transmitted into target data coded in the preset second coding format according to the format conversion rule.
Optionally, the first obtaining module includes:
the first determining unit is used for determining the byte corresponding relation between the preset second coding format and the preset first coding format;
a second determining unit, configured to determine, according to the byte correspondence, a maximum consecutive number of consecutive bytes included in the second coding unit, where the consecutive bytes constitute a single text;
the first obtaining unit is used for obtaining a first text of a target type in the data to be transmitted and determining a coding unit value of the first text of the target type according to the maximum continuous number;
the judging unit is used for judging whether the coding format of the first text of the target type is a preset second coding format or a preset first coding format according to the coding unit value and a preset mask operation rule so as to obtain a first judgment result that the first text of the target type is the preset second coding format or a second judgment result that the first text of the target type is the preset first coding format;
and a second obtaining unit, configured to obtain all judgment results corresponding to the data to be transmitted, and judge whether the data to be transmitted is in a preset second coding format according to a first association between all the judgment results and the first judgment result and a second association between all the judgment results and the second judgment result.
Optionally, the determining unit includes:
a first obtaining subunit, configured to obtain, by using a first byte corresponding to the first text of the target type as a target index byte, a target mask value corresponding to the target index byte and a first mask operation result corresponding to the target index byte from a preset mask table corresponding to the preset second coding format;
an AND operation processing subunit, configured to perform the preset AND operation in the mask operation rule on the coding unit value and the target mask value to obtain a second coding operation result;
the first judging subunit is configured to, if the first mask operation result is consistent with the second encoding operation result, judge that the encoding format of the first text of the target type is a preset second encoding format, and take the encoding format of the first text of the target type as the preset second encoding format as a first judgment result;
and the second judging subunit is configured to, if the first mask operation result is inconsistent with the second encoding operation result, judge that the encoding format of the first text of the target type is the preset first encoding format, and take the encoding format of the first text of the target type as the preset first encoding format as a second judgment result.
Optionally, the second obtaining unit includes:
a second obtaining subunit, configured to obtain a plurality of first texts of other target types corresponding to the data to be transmitted, where the plurality of first texts of other target types refer to the first texts in the data to be transmitted that successively occupy a preset position and need to be judged next, after the judgment of the previous first text of the target type at the preset position is completed;
a third obtaining subunit, configured to obtain a plurality of judgment results of the first texts of the other target types, so as to obtain all judgment results corresponding to the data to be transmitted;
and a determining subunit, configured to determine that the target coding format of the data to be transmitted is the preset second coding format if the plurality of judgment results all indicate that the coding formats of the corresponding first texts of other target types are the preset second coding format.
Optionally, the data processing apparatus further includes:
and the restoring module is used for restoring the target data into the data in the first coding format according to the byte corresponding relation if the processing instruction of the target data is detected.
Optionally, the restoring module comprises:
the detection unit is used for acquiring the composition of each second coding unit of the target data when a processing instruction of the target data is detected;
and a restoring unit, configured to restore the target data to the data in the first encoding format according to the configuration of each second encoding unit, the first corresponding relationship, and the second corresponding relationship.
Optionally, the first encoding format is an international text coding standard Unicode encoding format, and the preset second encoding format is a variable length encoding format UTF-8.
The specific implementation of the data processing apparatus is substantially the same as that of each of the embodiments of the data processing method, and is not described herein again.
Furthermore, the present invention also provides a computer medium, in which one or more programs are stored, and the one or more programs are also executable by one or more processors for implementing the steps of the embodiments of the data processing method.
The specific implementation of the device and medium (i.e., computer medium) of the present invention is basically the same as the embodiments of the data processing method described above, and is not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. A data processing method, characterized in that the data processing method comprises:
acquiring a target coding format of data to be transmitted, and judging whether the target coding format is a preset first coding format or a preset second coding format, wherein the second number of bits of each second coding unit in the preset second coding format is less than or equal to the first number of bits of each first coding unit in the first coding format;
if the target coding format is a preset first coding format, acquiring a format conversion rule between the preset first coding format and the preset second coding format;
and converting the data to be transmitted into target data coded in the preset second coding format according to the format conversion rule.
2. The data processing method of claim 1, wherein the determining whether the target encoding format is a preset first encoding format or a preset second encoding format comprises:
determining a byte corresponding relation between the preset second coding format and the preset first coding format;
determining the maximum continuous number of continuous bytes contained in the second coding unit according to the byte correspondence;
acquiring a first text of a target type in the data to be transmitted, and determining a coding unit value of the first text of the target type according to the maximum continuous number;
judging whether the coding format of the first text of the target type is a preset second coding format or a preset first coding format according to the coding unit value and a preset mask operation rule so as to obtain a first judgment result that the first text of the target type is the preset second coding format or a second judgment result that the first text of the target type is the preset first coding format;
and acquiring all judgment results corresponding to the data to be transmitted, and judging whether the data to be transmitted is in a preset second coding format according to a first association between all the judgment results and the first judgment result and a second association between all the judgment results and the second judgment result.
3. The data processing method as claimed in claim 2, wherein the step of determining whether the encoding format of the first text of the target type is the predetermined second encoding format or the predetermined first encoding format according to the encoding unit value and the predetermined mask operation rule to obtain a first determination result that the first text of the target type is the predetermined second encoding format, or the second determination result that the first text of the target type is the predetermined first encoding format comprises:
taking a first byte corresponding to the first text of the target type as a target index byte, and acquiring a target mask value corresponding to the target index byte and a first mask operation result corresponding to the target index byte from a preset mask table corresponding to the preset second coding format;
performing the preset AND operation in the mask operation rule on the coding unit value and the target mask value to obtain a second coding operation result;
if the first mask operation result is consistent with the second encoding operation result, judging that the encoding format of the first text of the target type is a preset second encoding format, and taking the encoding format of the first text of the target type as the preset second encoding format as a first judgment result;
if the first mask operation result is inconsistent with the second encoding operation result, the encoding format of the first text of the target type is judged to be a preset first encoding format, and the encoding format of the first text of the target type is the preset first encoding format and is used as a second judgment result.
4. The data processing method according to claim 3, wherein the step of obtaining a plurality of second determination results corresponding to the data to be transmitted except for the first determination result, and determining whether the data to be transmitted is in a preset second encoding format according to the first determination result and the plurality of second determination results comprises:
if the first judgment result is that the coding format of the first text of the target type is a preset second coding format, acquiring a plurality of first texts of other target types corresponding to the data to be transmitted, wherein the plurality of first texts of other target types refer to first texts in the data to be transmitted which are successively arranged on a preset position and need to be judged subsequently after the judgment of the previous first text of the target type at the preset position is completed;
obtaining a plurality of judgment results of the first texts of the other target types;
and if the plurality of judgment results all indicate that the coding formats of the corresponding first texts of other target types are the preset second coding format, determining that the target coding format of the data to be transmitted is the preset second coding format.
5. The data processing method according to claim 3, wherein the step of converting the data to be transmitted into target data encoded in the preset second encoding format according to the format conversion rule and transmitting the target data comprises:
and if a processing instruction of the target data is detected, restoring the target data into the data in the first coding format according to the byte corresponding relation.
6. The data processing method according to claim 5, wherein the step of restoring the target data to the data of the first encoding format according to the byte correspondence if the processing instruction of the target data is detected comprises:
if a processing instruction of the target data is detected, acquiring the composition of each second coding unit of the target data;
and restoring the target data into the data in the first coding format according to the composition of each second coding unit, the first corresponding relation and the second corresponding relation.
7. The data processing method according to any of claims 1 to 6, wherein the first encoding format is the Unicode encoding format and the predetermined second encoding format is the variable length encoding format UTF-8.
8. A data processing apparatus, characterized in that the data processing apparatus comprises:
a first obtaining module, configured to judge whether a target coding format of data to be transmitted is a preset first coding format or a preset second coding format, wherein the second number of bits of each second coding unit in the preset second coding format is less than or equal to the first number of bits of each first coding unit in the first coding format;
the second obtaining module is used for obtaining a format conversion rule between the preset first coding format and the preset second coding format if the target coding format is the preset first coding format;
and the conversion module is used for converting the data to be transmitted into target data coded in the preset second coding format according to the format conversion rule.
9. A data processing apparatus, characterized in that the apparatus comprises: memory, processor and data processing program stored on the memory and executable on the processor, which when executed by the processor implements the steps of the data processing method according to any one of claims 1 to 7.
10. A medium, characterized in that it has stored thereon a data processing program which, when executed by a processor, implements the steps of the data processing method according to any one of claims 1 to 7.
CN202010142908.1A 2020-03-03 2020-03-03 Data processing method, device, equipment and medium Active CN111368508B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010142908.1A CN111368508B (en) 2020-03-03 2020-03-03 Data processing method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN111368508A true CN111368508A (en) 2020-07-03
CN111368508B CN111368508B (en) 2024-04-09

Family

ID=71208463

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010142908.1A Active CN111368508B (en) 2020-03-03 2020-03-03 Data processing method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN111368508B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102043801A (en) * 2009-10-16 2011-05-04 无锡华润上华半导体有限公司 Inter-database data interaction method and system, database of transmitter and database of receiver
CN104052577A (en) * 2014-06-23 2014-09-17 硅谷数模半导体(北京)有限公司 Signal transmission processing method and device and video data transmission method and system
CN105391514A (en) * 2014-09-05 2016-03-09 北京奇虎科技有限公司 Character coding and decoding method and device
CN105468753A (en) * 2015-11-27 2016-04-06 北京金和网络股份有限公司 Multi-coding-format data display system and method
CN106775909A (en) * 2016-11-22 2017-05-31 中国银行股份有限公司 The determination methods and device of the coded format of a kind of JAVA files and byte stream
CN107786331A (en) * 2017-09-28 2018-03-09 平安普惠企业管理有限公司 Data processing method, device, system and computer-readable recording medium
CN109542965A (en) * 2018-11-07 2019-03-29 平安医疗健康管理股份有限公司 A kind of data processing method, electronic equipment and storage medium

Non-Patent Citations (2)

Title
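孟宪昊: "Remote intelligent interactive control platform for on-site monitoring of special-domain robots and its key technologies" *
孟宪昊: "Remote intelligent interactive control platform for on-site monitoring of special-domain robots and its key technologies", China Master's Theses Full-text Database, Information Science and Technology, no. 09, pages 140 - 69 *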

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN112685414A (en) * 2020-12-29 2021-04-20 勤智数码科技股份有限公司 Method and device for associating information resource catalog with data resource
CN112685414B (en) * 2020-12-29 2023-04-25 勤智数码科技股份有限公司 Method and device for associating information resource catalog with data resource
CN113033150A (en) * 2021-03-18 2021-06-25 深圳市元征科技股份有限公司 Method and device for coding program text and storage medium

Also Published As

Publication number Publication date
CN111368508B (en) 2024-04-09

Similar Documents

Publication Publication Date Title
CN110932822B (en) Data encoding method, data decoding method, device, equipment and storage medium
CN106407201B (en) Data processing method and device and computer readable storage medium
CN111368508B (en) Data processing method, device, equipment and medium
US11581903B2 (en) Data compression method and apparatus, computer-readable storage medium, and electronic device
CN103023511A (en) Applied compressed encoding method and device
CN112995199B (en) Data encoding and decoding method, device, transmission system, terminal equipment and storage medium
CN110888862A (en) Data storage method, data query method, data storage device, data query device, server and storage medium
CN101534124A (en) Compression algorithm for short natural language
CN1322401C (en) Communications terminal apparatus, reception apparatus, and method therefor
CN101345952B (en) Data storage and reading method, device and system for customer identity identification card
US7023365B1 (en) System and method for compression of words and phrases in text based on language features
CN112016270B (en) Logistics information coding method, device and equipment of Chinese-character codes
CN114070470A (en) Encoding and decoding method and device
CN110287147B (en) Character string sorting method and device
CN114626338B (en) Method, system, equipment and storage medium for encoding and decoding characters of data
CN111629020A (en) Remote input method, device, PC (personal computer) terminal, android device and system
CN1310561A (en) Character display technique
CN100440778C (en) Device and method for recognizing quick response codes run on mobile terminals
CN114500670A (en) Encoding compression method, decoding method and device
EP2113845A1 (en) Character conversion method and apparatus
CN114513209A (en) Data compression method, device, equipment and storage medium
CN106502971B (en) Input information processing method and device and mobile terminal
CN111625372A (en) Text pasting method, device, PC (personal computer) terminal, mobile terminal and system
CN111178008A (en) Digital character-oriented data encoding method, digital character-oriented data analyzing method and digital character-oriented data encoding system
US8090362B2 (en) Mobile electronic device and method for displaying characters on a bluetooth device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant