CN108197087B - Character code recognition method and device - Google Patents


Info

Publication number
CN108197087B
Authority
CN
China
Prior art keywords
text
coding mode
recognized
probability value
conforming
Prior art date
Legal status
Active
Application number
CN201810050150.1A
Other languages
Chinese (zh)
Other versions
CN108197087A (en)
Inventor
王占一
Current Assignee
Qianxin Technology Group Co Ltd
Original Assignee
Qianxin Technology Group Co Ltd
Priority date
Filing date
Publication date
Application filed by Qianxin Technology Group Co Ltd
Priority to CN201810050150.1A
Publication of CN108197087A
Application granted
Publication of CN108197087B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding

Abstract

The invention provides a character code recognition method and device. The method comprises: acquiring a text to be recognized; obtaining, according to the text to be recognized and a preset coding mode recognition model, a coding mode that the text to be recognized conforms to; and decoding the text to be recognized according to the obtained coding mode to obtain a decoding result. In the embodiments, the coding mode recognition model yields a conformity probability value of the text to be recognized for each preset coding mode; the coding mode the text conforms to is determined from these values, and the text is then decoded to obtain a decoding result. As a result, neither the coding modes nor the characteristic sequences needed to match them have to be set manually, which reduces the workload and gives strong flexibility.

Description

Character code recognition method and device
Technical Field
The embodiments of the invention relate to the technical field of information processing, and in particular to a character code recognition method and device.
Background
In computer information technology, character encoding is a basic technique. Character encoding, also called character set encoding, maps the characters of a character set to objects in a specified set so that text can be stored in a computer and transmitted over a communications network. Information stored in a computer is represented as binary numbers; to be understood by a user, it must be converted by the character encoding of some character set. Common encodings include UTF-8, GB2312, GBK and BIG5. Different languages generally have their own applicable encodings: ISO-8859-1 is mainly used for Latin characters, GBK and GB2312 are commonly used for Simplified Chinese, and BIG5 is commonly used for Traditional Chinese.
When a computer stores or displays information, the correct coding mode sometimes cannot be determined because the information is missing or has been modified, and the information then cannot be used normally. A method and system for recognizing character encodings is therefore important. There are three common recognition methods: (1) Code-range checking: each encoding has its own range of valid codes, so the encoding is judged from the range the data falls into; this fails when the code ranges of different encodings overlap heavily. (2) Feature matching: the current information is matched against keywords in a dictionary or against manually defined features, and the encoding is determined once a match succeeds; if no match succeeds, the encoding cannot be determined. (3) Character distribution: a probability model of characters is built in advance, and the encoding is judged by computing, under that model, the probability of the observed character distribution; this method has limited effect on short texts and on texts that habitually use particular words.
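Why method (1) breaks down can be seen with Python's standard codecs. The sketch below is a minimal illustration assuming the candidate list shown (it is not taken from the patent): it tries each codec in strict mode on the GBK bytes of "计算" and keeps every codec that decodes them without error.

```python
# Candidate codecs to try; this list is an illustrative assumption, not from the patent.
CANDIDATES = ["utf-8", "gbk", "big5", "latin-1"]


def decodable_as(data: bytes, candidates=CANDIDATES) -> list:
    """Return every candidate codec that decodes `data` without raising an error."""
    accepted = []
    for name in candidates:
        try:
            data.decode(name)  # strict error handling is the default
        except UnicodeDecodeError:
            continue
        accepted.append(name)
    return accepted


sample = "计算".encode("gbk")  # the GBK bytes bc c6 cb e3
# The result includes "gbk" and "latin-1" (Latin-1 accepts any byte) but not "utf-8".
print(decodable_as(sample))
```

Because Latin-1 assigns a character to every byte value, any byte string passes its range check, so validity alone cannot separate the candidates.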
Disclosure of Invention
The embodiments of the invention provide a character code recognition method and device, which solve the problem in the prior art that the coding mode depends on manual setting and flexibility is therefore poor.
In a first aspect, an embodiment of the present invention provides a character code recognition method, comprising:
acquiring a text to be recognized;
obtaining, according to the text to be recognized and a preset coding mode recognition model, a coding mode that the text to be recognized conforms to;
and decoding the text to be recognized according to the obtained coding mode to obtain a decoding result.
Optionally, obtaining the coding mode that the text to be recognized conforms to, according to the text to be recognized and the preset coding mode recognition model, comprises:
sending the text to be recognized to the coding mode recognition model for calculation, to obtain a conformity probability value of the text to be recognized for each preset coding mode;
and determining, according to the conformity probability values, the coding mode that the text to be recognized conforms to.
Optionally, obtaining the coding mode that the text to be recognized conforms to, according to the text to be recognized and the preset coding mode recognition model, comprises:
selecting a plurality of text segments from the text to be recognized;
sending each text segment to the coding mode recognition model for calculation, to obtain a conformity probability value of each text segment for each preset coding mode, and determining, according to these values, the coding mode that each text segment conforms to;
and determining the coding mode of the text to be recognized according to the coding modes of the text segments.
Optionally, determining, according to the conformity probability values, the coding mode that the text to be recognized conforms to comprises: selecting the maximum of the conformity probability values; and taking the coding mode corresponding to the maximum probability value as the coding mode that the text to be recognized conforms to.
In a second aspect, an embodiment of the present invention provides a character code recognition apparatus, comprising:
an acquisition module, configured to acquire a text to be recognized;
a processing module, configured to obtain, according to the text to be recognized and a preset coding mode recognition model, a coding mode that the text to be recognized conforms to;
and a decoding module, configured to decode the text to be recognized according to the obtained coding mode to obtain a decoding result.
Optionally, the processing module is specifically configured to:
send the text to be recognized to the coding mode recognition model for calculation, to obtain a conformity probability value of the text to be recognized for each preset coding mode;
and determine, according to the conformity probability values, the coding mode that the text to be recognized conforms to.
Optionally, the processing module is specifically configured to:
select a plurality of text segments from the text to be recognized;
send each text segment to the coding mode recognition model for calculation, to obtain a conformity probability value of each text segment for each preset coding mode, and determine, according to these values, the coding mode that each text segment conforms to;
and determine the coding mode of the text to be recognized according to the coding modes of the text segments.
Optionally, the processing module comprises a calculation unit and a determining unit, wherein:
the calculation unit is configured to send the text to be recognized to the coding mode recognition model for calculation, to obtain a conformity probability value of the text to be recognized for each preset coding mode;
and the determining unit is configured to select the maximum of the conformity probability values and take the coding mode corresponding to the maximum probability value as the coding mode that the text to be recognized conforms to.
In a third aspect, an embodiment of the present invention provides an electronic device, comprising: a processor, a memory, a bus, and a computer program stored on the memory and executable on the processor;
the processor and the memory communicate with each other through the bus;
the processor, when executing the computer program, implements the method described above.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the method described above.
According to the above technical solutions, the character code recognition method and device obtain, from the acquired text to be recognized and the coding mode recognition model, a conformity probability value of the text for each preset coding mode, determine from these values the coding mode the text conforms to, and then decode the text to obtain a decoding result. Neither the coding modes nor the characteristic sequences needed to match them have to be set manually, which reduces the workload and gives high flexibility.
Drawings
Fig. 1 is a schematic flowchart of a character code recognition method according to an embodiment of the present invention;
Fig. 2 is a diagram of a learning framework according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a character code recognition apparatus according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
As shown in Fig. 1, an embodiment of the present invention provides a character code recognition method, comprising:
S11, acquiring a text to be recognized;
S12, obtaining, according to the text to be recognized and a preset coding mode recognition model, a coding mode that the text to be recognized conforms to;
S13, decoding the text to be recognized according to the obtained coding mode to obtain a decoding result.
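Steps S11 to S13 can be sketched as a small Python pipeline, assuming the recognition model is any callable that maps raw bytes to a dictionary of per-mode probability values and that Python codec names stand in for the preset coding modes. `Recognizer`, `acquire_text`, `recognize_encoding`, `decode_text`, `recognize_and_decode` and `fake_model` are hypothetical names for illustration only.

```python
from typing import Callable, Dict

# Hypothetical interface: a recognizer maps raw bytes to {coding mode: probability value}.
Recognizer = Callable[[bytes], Dict[str, float]]


def acquire_text(path: str) -> bytes:
    """S11: read the text to be recognized as raw bytes, without decoding it."""
    with open(path, "rb") as f:
        return f.read()


def recognize_encoding(data: bytes, model: Recognizer) -> str:
    """S12: get per-mode probability values from the model and keep the largest."""
    probs = model(data)
    return max(probs, key=probs.get)


def decode_text(data: bytes, encoding: str) -> str:
    """S13: decode with the recognized mode; undecodable bytes become U+FFFD."""
    return data.decode(encoding, errors="replace")


def recognize_and_decode(data: bytes, model: Recognizer) -> str:
    return decode_text(data, recognize_encoding(data, model))


# Stand-in model that always favours GBK, for illustration only.
def fake_model(data: bytes) -> Dict[str, float]:
    return {"utf-8": 0.01, "gbk": 0.98, "latin-1": 0.01}


print(recognize_and_decode("计算机".encode("gbk"), fake_model))  # 计算机
```

Decoding with `errors="replace"` is a design choice of this sketch: a mis-predicted mode then yields visible replacement characters instead of an exception. A caller that prefers to fail fast could use the default strict mode.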
It should be noted that, in steps S11 to S13 of this embodiment, the text to be recognized is the byte sequence produced when data is encoded with some coding mode.
For example, "计算机技术快速发展" ("rapid development of computer technology") encoded in UTF-8 is, in hexadecimal: e8aea1e7ae97e69cbae68a80e69cafe5bfabe9809fe58f91e5b195; encoded in GBK it is, in hexadecimal: bcc6cbe3bbfabcbccaf5bfeccbd9b7a2d5b9. The sequence length is limited to at most L characters (L can be set flexibly, e.g. 128).
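The hexadecimal sequences above can be reproduced with Python's built-in codecs. In this sketch the function name `to_hex_sequence` is hypothetical, and keeping the first L = 128 hex characters is an assumption, since the text does not say which part of a longer sequence is kept.

```python
def to_hex_sequence(text: str, encoding: str, max_len: int = 128) -> str:
    """Encode `text`, render the bytes as lowercase hex, keep at most `max_len` hex characters."""
    return text.encode(encoding).hex()[:max_len]


phrase = "计算机技术快速发展"  # "rapid development of computer technology"
print(to_hex_sequence(phrase, "utf-8"))  # e8aea1e7ae97e69cbae68a80e69cafe5bfabe9809fe58f91e5b195
print(to_hex_sequence(phrase, "gbk"))    # bcc6cbe3bbfabcbccaf5bfeccbd9b7a2d5b9
```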
In this embodiment, it should further be noted that the coding mode recognition model may be obtained through deep learning training, specifically:
deep learning is iterated repeatedly over hundreds of thousands or even more sequence samples until the training error and the accuracy reach an acceptable level. The model may use a deep learning structure such as LSTM (Long Short-Term Memory, a recurrent neural network) or TextCNN (Convolutional Neural Networks for Sentence Classification).
Fig. 2 is a diagram of the learning framework according to an embodiment of the present invention.
(1) The input layer input_1 is connected to the embedding layer embedding_1 (also called the representation layer); the parameter values of the embedding layer are learned automatically by the model.
After a sequence is read in, each hexadecimal character is first converted into a positive-integer index number for ease of calculation, according to the following mapping table (index 0 is reserved):
Character:  (reserved)  a  b  c  d  e  f  0  1  2   3   4   5   6   7   8   9
Index:      0           1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16
For example, "abc123" is converted into 1, 2, 3, 8, 9, 10. These index sequences are the input-layer data received by the embedding layer of the model. A sequence shorter than L is padded with 0.
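The mapping table and zero padding can be sketched as follows; `to_indices` is a hypothetical helper, and padding on the right (after the data) is an assumption, since the text only says that short sequences are padded with 0.

```python
# Mapping table from the description: 0 is reserved (padding),
# a-f map to 1-6 and the digits 0-9 map to 7-16.
SYMBOLS = "abcdef0123456789"
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(SYMBOLS)}
PAD_INDEX = 0


def to_indices(hex_seq: str, max_len: int = 128) -> list:
    """Map each hex character to its index, truncate to max_len, then right-pad with 0."""
    indices = [CHAR_TO_INDEX[ch] for ch in hex_seq.lower()[:max_len]]
    return indices + [PAD_INDEX] * (max_len - len(indices))


print(to_indices("abc123", max_len=8))  # [1, 2, 3, 8, 9, 10, 0, 0]
```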
After receiving the index sequence, the embedding layer converts it into a matrix on which operations such as convolution can be performed, that is, each index number in the sequence is initialized as a vector. Common conversion methods include random initialization, one-hot encoding, and word embeddings based on word2vec; one-hot encoding is taken as the example here. Its basic idea is that exactly one bit of the vector corresponding to a character is 1 and all other bits are 0. For example, "abc123" is converted into:
[Figure: one-hot matrix of "abc123"]
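The one-hot conversion of "abc123" can be reproduced in a few lines. This sketch assumes vectors of length 17, one position per entry of the mapping table including the reserved index 0; whether the original figure also keeps a column for index 0 cannot be recovered from the text.

```python
def one_hot(indices: list, vocab_size: int = 17) -> list:
    """Turn each index into a vector of length vocab_size with a single 1 at that position."""
    return [[1 if j == i else 0 for j in range(vocab_size)] for i in indices]


# "abc123" -> indices 1, 2, 3, 8, 9, 10 -> a 6 x 17 matrix
for row in one_hot([1, 2, 3, 8, 9, 10]):
    print(row)
```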
(2) The embedding layer is followed by three parallel one-dimensional convolutional layers with kernels of different sizes: conv1d_1, conv1d_2 and conv1d_3. The parameters of the convolutional layers are learned automatically by the model.
(3) The three outputs are merged in the concatenation layer concatenate_1.
(4) After the flattening layer flatten_1, the network is connected through the dropout layer dropout_1 and the fully connected layer dense_1 to a set of output nodes, one for each coding mode. Over repeated iterations, the output value of the loss function (a measure of the difference between the predicted and true values) gradually decreases until an acceptable minimum is reached. Meanwhile, the model's performance can be checked by its accuracy on a validation set.
When the model performs satisfactorily, its structure and weight values are saved for use by the system.
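To make the layer order of Fig. 2 concrete, the following inference-only sketch in plain Python (no deep learning framework) runs embedding_1, three parallel conv1d branches, concatenate_1, flatten_1 and dense_1 followed by a softmax. The sizes (sequence length 16, embedding dimension 4, kernel widths 2, 3 and 4, three filters per branch), the ReLU and softmax activations, concatenation along the sequence axis and the class name `TinyEncodingCNN` are all assumptions. The weights are random, not trained, so the printed probabilities only demonstrate the data flow; in practice the same layers would be built and trained in a framework such as Keras and the saved weights loaded.

```python
import math
import random


def conv1d_relu(x, kernels, biases):
    """'Valid' 1-D convolution of x (length x channels); one output channel per kernel, then ReLU."""
    width, channels = len(kernels[0]), len(x[0])
    out = []
    for t in range(len(x) - width + 1):
        out.append([
            max(0.0, b + sum(k[j][c] * x[t + j][c]
                             for j in range(width) for c in range(channels)))
            for k, b in zip(kernels, biases)
        ])
    return out


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


class TinyEncodingCNN:
    """Inference-only sketch of the Fig. 2 layer order, with random (untrained) weights."""

    def __init__(self, encodings, seq_len=16, vocab=17, emb_dim=4,
                 widths=(2, 3, 4), filters=3, seed=0):
        rng = random.Random(seed)

        def rand():
            return rng.uniform(-0.5, 0.5)

        self.encodings, self.seq_len = list(encodings), seq_len
        # embedding_1: one vector per index in the mapping table
        self.emb = [[rand() for _ in range(emb_dim)] for _ in range(vocab)]
        # conv1d_1..conv1d_3: parallel branches, each with its own kernel width
        self.convs = [
            ([[[rand() for _ in range(emb_dim)] for _ in range(w)] for _ in range(filters)],
             [0.0] * filters)
            for w in widths
        ]
        flat_size = sum((seq_len - w + 1) * filters for w in widths)
        # dense_1: one output node per coding mode
        self.weights = [[rand() for _ in range(flat_size)] for _ in self.encodings]
        self.bias = [0.0] * len(self.encodings)

    def predict(self, indices):
        if len(indices) != self.seq_len:
            raise ValueError("pad or truncate the index sequence to seq_len first")
        x = [self.emb[i] for i in indices]
        branches = [conv1d_relu(x, k, b) for k, b in self.convs]
        # concatenate_1 (along the sequence axis) followed by flatten_1
        flat = [v for branch in branches for step in branch for v in step]
        # dropout_1 only acts during training, so it is skipped at inference
        logits = [bj + sum(w * v for w, v in zip(row, flat))
                  for row, bj in zip(self.weights, self.bias)]
        return dict(zip(self.encodings, softmax(logits)))


model = TinyEncodingCNN(["utf-8", "gbk", "latin-1"])
seq = [3, 11, 1, 14, 3, 1, 4, 5, 3, 1, 4, 14, 4, 7, 0, 0]  # "c4a7cadecad7d0" padded to 16
print(model.predict(seq))  # untrained: arbitrary values that sum to 1
```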
Obtaining the coding mode recognition model through deep learning is an established technique.
In this embodiment, the system obtains the coding mode that the text to be recognized conforms to, according to the text to be recognized and the preset coding mode recognition model, specifically as follows:
11) sending the text to be recognized to the coding mode recognition model for calculation, to obtain a conformity probability value of the text to be recognized for each preset coding mode;
12) determining, according to the conformity probability values, the coding mode that the text to be recognized conforms to.
Regarding steps 11) and 12): the text to be recognized is sent to the coding mode recognition model and processed in the same way as during deep learning training, i.e. it is converted into a sequence of index numbers. For example, if the text to be recognized before conversion is c4a7cadecad7d0…, it is converted into 3, 11, 1, 14, 3, 1, 4, 5, 3, 1, 4, 14, 4, 7, …. The stored weight values are then used as the parameters of the embedding layer and the convolutional layers, and the probability value of the text to be recognized for each coding mode is calculated. The maximum of these conformity probability values is selected, and the coding mode corresponding to it is taken as the coding mode that the text to be recognized conforms to.
For example, given UTF-8: 0.01, GBK: 0.98 and Latin1: 0.01, GBK is taken as the predicted coding mode because 0.98 is the largest value.
The text to be recognized is then decoded according to the obtained coding mode to obtain a decoding result.
This embodiment of the invention provides a character code recognition method in which, for the acquired text to be recognized, a conformity probability value for each preset coding mode is obtained from the text and the coding mode recognition model, the coding mode the text conforms to is determined from these values, and the text is then decoded to obtain a decoding result. Neither the coding modes nor the characteristic sequences needed to match them have to be set manually, which reduces the workload and gives strong flexibility.
Another embodiment of the present invention provides a character code recognition method, comprising:
S21, acquiring a text to be recognized;
S22, obtaining, according to the text to be recognized and a preset coding mode recognition model, a coding mode that the text to be recognized conforms to;
S23, decoding the text to be recognized according to the obtained coding mode to obtain a decoding result.
It should be noted that, in steps S21 to S23 of this embodiment, the text to be recognized is the byte sequence produced when data is encoded with some coding mode.
The system obtains the coding mode that the text to be recognized conforms to, according to the text to be recognized and the preset coding mode recognition model, which may specifically include:
21) selecting a plurality of text segments from the text to be recognized;
22) sending each text segment to the coding mode recognition model for calculation, to obtain a conformity probability value of each text segment for each preset coding mode, and determining, according to these values, the coding mode that each text segment conforms to;
23) determining the coding mode of the text to be recognized according to the coding modes of the text segments.
Regarding steps 21) to 23): a plurality of text segments are selected from the text to be recognized and sent to the coding mode recognition model. Each text segment is processed in the same way as during deep learning training, i.e. converted into a sequence of index numbers according to the mapping table. For example, if a text segment before conversion is c4a7cadecad7d0…, it is converted into 3, 11, 1, 14, 3, 1, 4, 5, 3, 1, 4, 14, 4, 7, ….
The stored weight values are then used as the parameters of the embedding layer and the convolutional layers, and the conformity probability value of each text segment for each coding mode is calculated. For each text segment, the maximum conformity probability value is selected and the corresponding coding mode is taken as the coding mode that the segment conforms to. Finally, the coding mode that occurs most often among the segments is taken as the coding mode of the text to be recognized.
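Steps 21) to 23) amount to a majority vote over per-segment predictions. This sketch assumes segments are consecutive, non-overlapping slices of the hexadecimal sequence, capped at a hypothetical `max_segments` (the text does not specify how segments are selected), and that `classify` is a hypothetical wrapper around the trained model. An even `seg_len` keeps every segment aligned to whole bytes.

```python
from collections import Counter


def split_segments(hex_seq: str, seg_len: int = 128, max_segments: int = 10) -> list:
    """Cut the hex sequence into consecutive, non-overlapping slices of seg_len characters."""
    segments = [hex_seq[i:i + seg_len] for i in range(0, len(hex_seq), seg_len)]
    return segments[:max_segments]


def vote_encoding(hex_seq: str, classify, seg_len: int = 128) -> str:
    """Classify every segment and return the coding mode that occurs most often.

    `classify` is a hypothetical callable wrapping the trained model; it maps one
    segment to {coding mode: probability value}. On a tie, Counter.most_common
    returns the mode that was counted first.
    """
    votes = Counter()
    for segment in split_segments(hex_seq, seg_len):
        probs = classify(segment)
        votes[max(probs, key=probs.get)] += 1
    if not votes:
        raise ValueError("empty input: no segments to classify")
    return votes.most_common(1)[0][0]


# Stand-in classifier for illustration: segments starting with "b" look like GBK.
def fake_classify(segment: str) -> dict:
    return {"gbk": 0.9, "utf-8": 0.1} if segment.startswith("b") else {"gbk": 0.2, "utf-8": 0.8}


print(vote_encoding("bbbbeeeebbbb", fake_classify, seg_len=4))  # gbk
```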
The text to be recognized is then decoded according to the obtained coding mode to obtain a decoding result.
This embodiment of the invention provides a character code recognition method in which a plurality of text segments are selected from the acquired text to be recognized; a conformity probability value of each text segment for each preset coding mode is obtained from the segment and the coding mode recognition model; the coding mode each segment conforms to is determined from these values; the coding mode of the text to be recognized is then determined, and the text is decoded to obtain a decoding result. Neither the coding modes nor the characteristic sequences needed to match them have to be set manually, which reduces the workload and gives strong flexibility.
As shown in Fig. 3, an embodiment of the present invention provides a character code recognition apparatus comprising an acquisition module 31, a processing module 32 and a decoding module 33, wherein:
the acquisition module 31 is configured to acquire a text to be recognized;
the processing module 32 is configured to obtain, according to the text to be recognized and a preset coding mode recognition model, a coding mode that the text to be recognized conforms to;
and the decoding module 33 is configured to decode the text to be recognized according to the obtained coding mode to obtain a decoding result.
The processing module is specifically configured to:
send the text to be recognized to the coding mode recognition model for calculation, to obtain a conformity probability value of the text to be recognized for each preset coding mode;
and determine, according to the conformity probability values, the coding mode that the text to be recognized conforms to.
The processing module comprises a calculation unit and a determining unit, wherein:
the calculation unit is configured to send the text to be recognized to the coding mode recognition model for calculation, to obtain a conformity probability value of the text to be recognized for each preset coding mode;
and the determining unit is configured to select the maximum of the conformity probability values and take the coding mode corresponding to the maximum probability value as the coding mode that the text to be recognized conforms to.
Since the apparatus of this embodiment works on the same principle as the method of the above embodiment, the details are not repeated here.
It should be noted that, in this embodiment, the relevant functional modules may be implemented by a hardware processor.
This embodiment of the invention provides a character code recognition apparatus which, for the acquired text to be recognized, obtains from the text and the coding mode recognition model a conformity probability value for each preset coding mode, determines from these values the coding mode the text conforms to, and then decodes the text to obtain a decoding result. Neither the coding modes nor the characteristic sequences needed to match them have to be set manually, which reduces the workload and gives strong flexibility.
Another embodiment of the present invention provides a character code recognition apparatus comprising an acquisition module, a processing module and a decoding module, wherein:
the acquisition module is configured to acquire a text to be recognized;
the processing module is configured to obtain, according to the text to be recognized and a preset coding mode recognition model, a coding mode that the text to be recognized conforms to;
and the decoding module is configured to decode the text to be recognized according to the obtained coding mode to obtain a decoding result.
The processing module is specifically configured to:
select a plurality of text segments from the text to be recognized;
send each text segment to the coding mode recognition model for calculation, to obtain a conformity probability value of each text segment for each preset coding mode, and determine, according to these values, the coding mode that each text segment conforms to;
and determine the coding mode of the text to be recognized according to the coding modes of the text segments.
Since the apparatus of this embodiment works on the same principle as the method of the above embodiment, the details are not repeated here.
It should be noted that, in this embodiment, the relevant functional modules may be implemented by a hardware processor.
This embodiment of the invention provides a character code recognition apparatus which selects a plurality of text segments from the acquired text to be recognized, obtains from each segment and the coding mode recognition model a conformity probability value of the segment for each preset coding mode, determines from these values the coding mode each segment conforms to, then determines the coding mode of the text to be recognized, and decodes the text to obtain a decoding result. Neither the coding modes nor the characteristic sequences needed to match them have to be set manually, which reduces the workload and gives strong flexibility.
As shown in Fig. 4, an embodiment of the present invention provides an electronic device, comprising: a processor 401, a memory 402, a bus 403, and a computer program stored on the memory and executable on the processor;
the processor and the memory communicate with each other through the bus;
the processor, when executing the computer program, implements the method described above, for example: acquiring a text to be recognized; obtaining, according to the text to be recognized and a preset coding mode recognition model, a coding mode that the text to be recognized conforms to; and decoding the text to be recognized according to the obtained coding mode to obtain a decoding result.
An embodiment of the present invention provides a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the method described above, for example: acquiring a text to be recognized; obtaining, according to the text to be recognized and a preset coding mode recognition model, a coding mode that the text to be recognized conforms to; and decoding the text to be recognized according to the obtained coding mode to obtain a decoding result.
Furthermore, those skilled in the art will appreciate that, although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.
Those of ordinary skill in the art will understand that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions and scope of the present invention as defined in the appended claims.

Claims (8)

1. A character code recognition method, comprising:
acquiring a text to be recognized;
obtaining, according to the text to be recognized and a preset coding mode recognition model, a coding mode that the text to be recognized conforms to, wherein the coding mode recognition model is obtained through deep learning training;
decoding the text to be recognized according to the obtained coding mode to obtain a decoding result;
wherein obtaining the coding mode that the text to be recognized conforms to, according to the text to be recognized and the preset coding mode recognition model, comprises:
selecting a plurality of text segments from the text to be recognized;
sending each text segment to the coding mode recognition model for calculation, to obtain a conformity probability value of each text segment for each preset coding mode, and determining, according to these values, the coding mode that each text segment conforms to;
determining the coding mode of the text to be recognized according to the frequency of occurrence of the coding modes of the text segments;
and wherein the coding mode recognition model is connected through a fully connected layer to a plurality of nodes, one for each coding mode, to obtain the conformity probability value of each coding mode.
2. The method according to claim 1, wherein obtaining the coding mode that the text to be recognized conforms to, according to the text to be recognized and the preset coding mode recognition model, comprises:
sending the text to be recognized to the coding mode recognition model for calculation, to obtain a conformity probability value of the text to be recognized for each preset coding mode;
and determining, according to the conformity probability values, the coding mode that the text to be recognized conforms to.
3. The method according to claim 2, wherein determining, according to the conformity probability values, the coding mode that the text to be recognized conforms to comprises: selecting the maximum of the conformity probability values; and taking the coding mode corresponding to the maximum probability value as the coding mode that the text to be recognized conforms to.
4. A character code recognition apparatus, comprising:
the acquisition module is used for acquiring a text to be recognized;
the processing module is used for acquiring a coding mode conforming to the text to be recognized according to the text to be recognized and a preset coding mode recognition model, and the coding mode recognition model is obtained through deep learning training;
the decoding module is used for decoding the text to be recognized according to the obtained coding mode to obtain a decoding result;
wherein the processing module is specifically configured to:
selecting a plurality of text segments from the text to be recognized;
sending each text segment to the coding mode recognition model for calculation to obtain a conformity probability value of each text segment for each preset coding mode, and determining the coding mode conforming to each text segment according to the conformity probability values;
determining the coding mode of the text to be recognized according to how frequently each coding mode occurs among the coding modes determined for the text segments;
wherein the coding mode recognition model is connected, through fully connected layers, to a plurality of nodes corresponding to the respective coding modes to obtain the conformity probability value of each coding mode.
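The apparatus of claim 4 splits the same method into an acquisition module, a processing module and a decoding module. One possible arrangement in Python, assuming the processing module is any callable that returns a Python codec name (for instance, a wrapper around the trained recognition model); the class and method names are hypothetical:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Tuple


@dataclass
class CharacterCodeRecognizer:
    """Hypothetical arrangement of the acquisition, processing and decoding modules."""
    # Processing module: maps raw bytes to a Python codec name (e.g. "gbk").
    recognize: Callable[[bytes], str]

    def acquire(self, path: str) -> bytes:
        """Acquisition module: read the text to be recognized as raw bytes."""
        return Path(path).read_bytes()

    def decode(self, data: bytes) -> Tuple[str, str]:
        """Decoding module: decode with the coding mode the processing module chose."""
        encoding = self.recognize(data)
        # errors="replace" keeps a wrong prediction from raising; undecodable
        # bytes become U+FFFD instead.
        return encoding, data.decode(encoding, errors="replace")
```

Replacing undecodable bytes rather than raising is a choice of this sketch; a stricter caller could pass `errors="strict"` to surface a misrecognized coding mode immediately.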
5. The apparatus of claim 4, wherein the processing module is specifically configured to:
sending the text to be recognized to the coding mode recognition model for calculation to obtain the conformity probability values of the text to be recognized for the preset coding modes;
and determining the coding mode conforming to the text to be recognized according to the conformity probability values.
6. The apparatus of claim 5, wherein the processing module comprises a calculation unit and a determining unit, wherein:
the calculation unit is used for sending the text to be recognized to the coding mode recognition model for calculation to obtain the conformity probability values of the text to be recognized for the preset coding modes;
and the determining unit is used for selecting the maximum of the conformity probability values and taking the coding mode corresponding to the maximum probability value as the coding mode conforming to the text to be recognized.
7. An electronic device, comprising: a processor, a memory, a bus, and a computer program stored on the memory and executable on the processor;
the processor and the memory communicate with each other through the bus;
the processor, when executing the computer program, implements the method of any of claims 1-3.
8. A non-transitory computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, implements the method of any one of claims 1-3.
CN201810050150.1A 2018-01-18 2018-01-18 Character code recognition method and device Active CN108197087B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201810050150.1A (CN108197087B) | 2018-01-18 | 2018-01-18 | Character code recognition method and device

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201810050150.1A (CN108197087B) | 2018-01-18 | 2018-01-18 | Character code recognition method and device

Publications (2)

Publication Number | Publication Date
CN108197087A | 2018-06-22
CN108197087B | 2021-11-16

Family

Family ID: 62589725

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201810050150.1A (Active, CN108197087B) | Character code recognition method and device | 2018-01-18 | 2018-01-18

Country Status (1)

Country | Link
CN (1) | CN108197087B

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109064733A * | 2018-09-30 | 2018-12-21 | Allwinner Technology Co., Ltd. (Zhuhai) | Adaptive infrared signal coding/decoding method, computer device and control device thereof
CN111681670B * | 2019-02-25 | 2023-05-12 | Beijing Didi Infinity Technology and Development Co., Ltd. | Information identification method, device, electronic equipment and storage medium
US11139827B2 | 2019-03-15 | 2021-10-05 | Samsung Electronics Co., Ltd. | Conditional transcoding for encoded data
TWI825305B * | 2019-04-16 | 2023-12-11 | Samsung Electronics Co., Ltd. | Transcoder and method and article for transcoding
CN113064863B * | 2019-04-19 | 2022-06-07 | Fujian Tianqing Digital Co., Ltd. | Method for automatically recognizing file encoding and computer-readable storage medium
CN110113327A * | 2019-04-26 | 2019-08-09 | Beijing Qianxin Technology Co., Ltd. | Method and device for detecting DGA domain names
CN110135566A * | 2019-05-21 | 2019-08-16 | Sichuan Changhong Electric Co., Ltd. | Registered user name detection method based on an LSTM binary classification neural network model
CN111428484B * | 2020-04-14 | 2022-02-18 | Guangzhou Yuncong Dingwang Technology Co., Ltd. | Information management method, system, device and medium
CN113627173A * | 2021-08-16 | 2021-11-09 | Shenzhen Yuncai Network Technology Co., Ltd. | Manufacturer name identification method and device, electronic equipment and readable medium
CN113807807A * | 2021-08-16 | 2021-12-17 | Shenzhen Yuncai Network Technology Co., Ltd. | Component parameter identification method and device, electronic equipment and readable medium
CN117391070B * | 2023-12-08 | 2024-03-22 | Heyuanda Information Technology Co., Ltd. | Method and system for adjusting random characters

Citations (2)

Publication number | Priority date | Publication date | Assignee | Title
CN104360988A * | 2014-10-17 | 2015-02-18 | Beijing Ruian Technology Co., Ltd. | Method and device for identifying the coding mode of Chinese characters
CN104750666A * | 2015-03-12 | 2015-07-01 | Mingbo Education Technology Co., Ltd. | Text character encoding mode identification method and system

Family Cites Families (3)

Publication number | Priority date | Publication date | Assignee | Title
US20040078191A1 * | 2002-10-22 | 2004-04-22 | Nokia Corporation | Scalable neural network-based language identification from written text
CN106354701B * | 2016-08-30 | 2019-06-21 | Tencent Technology (Shenzhen) Co., Ltd. | Chinese character processing method and device
CN107480723B * | 2017-08-22 | 2019-11-08 | Wuhan University | Texture recognition method based on a local binary threshold learning network

Non-Patent Citations (1)

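Title
High-speed Chinese character encoding recognition system based on the N-gram model; Li Jifeng et al.; Computer Engineering and Applications; 2004-03-31; description, paragraph 12 *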

Also Published As

Publication number | Publication date
CN108197087A | 2018-06-22

Similar Documents

Publication Publication Date Title
CN108197087B (en) Character code recognition method and device
JP6594988B2 (en) Method and apparatus for processing address text
CN109450596B (en) Encoding method, decoding method, encoding device, decoding device, storage medium, and terminal
CN109697451B (en) Similar image clustering method and device, storage medium and electronic equipment
EP3244540A1 (en) Data processing method and device
CN108108436B (en) Data storage method and device, storage medium and electronic equipment
CN112579462B (en) Test case acquisition method, system, equipment and computer readable storage medium
CN112632912A (en) Text error correction method, device and equipment and readable storage medium
CN116978011B (en) Image semantic communication method and system for intelligent target recognition
CN116579618B (en) Data processing method, device, equipment and storage medium based on risk management
CN115496970A (en) Training method of image task model, image recognition method and related device
CN108596001B (en) Two-dimensional code error correction decoding method and device, electronic equipment and computer readable medium
CN110276811B (en) Image conversion method and device, electronic equipment and readable storage medium
CN104065460A (en) Encoding method and device based on binary tree
CN112995199B (en) Data encoding and decoding method, device, transmission system, terminal equipment and storage medium
CN111046631A (en) Name storage method and device based on character conversion and computer equipment
CN111126420A (en) Method and device for establishing recognition model
CN114818695A (en) Text style migration method, device, equipment and storage medium
CN109802690B (en) Decoding method, device and computer readable storage medium
CN108039935B (en) Channel coding identification method based on maximum likelihood decoding
CN113761845A (en) Text generation method and device, storage medium and electronic equipment
CN112749532A (en) Address text processing method, device and equipment
CN111859917A (en) Topic model construction method and device and computer readable storage medium
US20170117918A1 (en) Method and Apparatus for Calculating Estimated Data Compression Ratio
CN105634668B (en) Empty-inspection screening method and device for DCI0 signals

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 332, 3/F, Building 102, 28 Xinjiekouwai Street, Xicheng District, Beijing 100088

Applicant after: Qianxin Technology Group Co.,Ltd.

Address before: 100015 15, 17 floor 1701-26, 3 building, 10 Jiuxianqiao Road, Chaoyang District, Beijing.

Applicant before: BEIJING QIANXIN TECHNOLOGY Co.,Ltd.

GR01 Patent grant