CN108197087A - Character code recognition methods and device - Google Patents


Info

Publication number
CN108197087A
Authority
CN
China
Prior art keywords
text
coding mode
identified
probability value
identification model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810050150.1A
Other languages
Chinese (zh)
Other versions
CN108197087B (en)
Inventor
王占
王占一
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qianxin Technology Co Ltd
Original Assignee
Beijing Qianxin Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qianxin Technology Co Ltd
Priority to CN201810050150.1A
Publication of CN108197087A
Application granted
Publication of CN108197087B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding

Abstract

The present invention provides a character code recognition method and device. The method includes: obtaining text to be identified; obtaining, according to the text to be identified and a preset encoding mode identification model, the encoding mode that matches the text to be identified; and decoding the file to be identified according to the obtained encoding mode to obtain a decoding result. In the embodiments, the text to be identified is fed to the encoding mode identification model to obtain a match probability value for each preset encoding mode, the encoding mode that matches the text is determined from those match probability values, and decoding is then performed to obtain the decoding result. As a result, neither the encoding mode nor the feature sequences needed to match an encoding mode have to be set manually, which reduces workload and improves flexibility.

Description

Character code recognition methods and device
Technical field
Embodiments of the present invention relate to the technical field of information processing, and in particular to a character code recognition method and device.
Background art
In the field of computer information technology, character encoding is a basic technique. Character encoding, also called character set coding, encodes the characters of a character set as objects in a specified set so that text can be stored in a computer and transmitted over communication networks. Information stored in a computer is represented in binary; to make it readable to users, it must be converted through a character encoding according to some character set. Common encoding modes include UTF-8, GB2312, GBK and BIG5. Different languages usually have their own applicable encodings: for example, ISO-8859-1 is mainly used for Latin characters, GBK and GB2312 are commonly used for Simplified Chinese, and BIG5 is commonly used for Traditional Chinese.
When a computer stores and displays information, the correct encoding mode may be unobtainable because information has been lost or modified, so the information cannot be used normally. Methods and systems for recognizing character encodings are therefore very important. There are three common recognition methods: (1) Determination by code range. Each encoding has its own range of use, but this method fails when the ranges of the encodings overlap heavily. (2) Feature matching. Keywords in a dictionary or manually defined features are matched against the current information, and the encoding is determined once a match succeeds; if no match succeeds, the encoding cannot be determined. (3) Character distribution. A probability model of characters is built in advance, the probability of the current character distribution is computed from the model, and the encoding is judged from it. This method has limited effect on encoded information that follows particular word-usage habits or that is too short.
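The weakness of range-based determination described in method (1) can be illustrated with a minimal Python sketch. It is not part of the patent: the helper name `codecs_that_decode` and the candidate list are hypothetical, and the sample bytes are simply the first six bytes of the GBK example given later in the description. The sketch only checks which candidate codecs accept the bytes without error, which is a rough stand-in for a range check.

```python
def codecs_that_decode(data, candidates=("utf-8", "gbk", "big5", "latin-1")):
    """Return the candidate codecs that decode `data` without raising an error.

    Hypothetical helper for illustration only; it approximates a
    "coding range" check by attempting a strict decode.
    """
    accepted = []
    for name in candidates:
        try:
            data.decode(name)  # strict decoding: invalid byte sequences raise
        except UnicodeDecodeError:
            continue
        accepted.append(name)
    return accepted


# First three characters of the GBK example from the description.
sample = bytes.fromhex("bcc6cbe3bbfa")
print(codecs_that_decode(sample))
```

When more than one codec accepts the same bytes, a range check alone cannot pick the right one, which is the gap the model-based method is meant to close.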
Summary of the invention
Embodiments of the present invention provide a character code recognition method and device to solve the problem in the prior art that determining the encoding mode relies on manual settings and is inflexible.
In a first aspect, an embodiment of the present invention provides a character code recognition method, including:
obtaining text to be identified;
obtaining, according to the text to be identified and a preset encoding mode identification model, the encoding mode that matches the text to be identified;
decoding the file to be identified according to the obtained encoding mode to obtain a decoding result.
Optionally, obtaining the encoding mode that matches the text to be identified according to the text to be identified and the preset encoding mode identification model includes:
feeding the text to be identified into the encoding mode identification model and computing a match probability value of the text to be identified for each preset encoding mode;
determining the encoding mode that matches the text to be identified according to the match probability values.
Optionally, obtaining the encoding mode that matches the text to be identified according to the text to be identified and the preset encoding mode identification model includes:
selecting multiple text segments from the text to be identified;
feeding each text segment into the encoding mode identification model, computing a match probability value of each text segment for each preset encoding mode, and determining the encoding mode that matches each text segment according to the match probability values;
determining the encoding mode of the text to be identified according to the encoding modes of the text segments.
Optionally, determining the encoding mode that matches the text to be identified according to the match probability values includes: selecting the maximum probability value from the match probability values, and taking the encoding mode corresponding to the maximum probability value as the encoding mode that matches the text to be identified.
In a second aspect, an embodiment of the present invention provides a character code identification device, including:
an acquisition module, configured to obtain text to be identified;
a processing module, configured to obtain, according to the text to be identified and a preset encoding mode identification model, the encoding mode that matches the text to be identified;
a decoding module, configured to decode the file to be identified according to the obtained encoding mode to obtain a decoding result.
Optionally, the processing module is specifically configured to:
feed the text to be identified into the encoding mode identification model and compute a match probability value of the text to be identified for each preset encoding mode;
determine the encoding mode that matches the text to be identified according to the match probability values.
Optionally, the processing module is specifically configured to:
select multiple text segments from the text to be identified;
feed each text segment into the encoding mode identification model, compute a match probability value of each text segment for each preset encoding mode, and determine the encoding mode that matches each text segment according to the match probability values;
determine the encoding mode of the text to be identified according to the encoding modes of the text segments.
Optionally, the processing module includes a computing unit and a determination unit, wherein:
the computing unit is configured to feed the text to be identified into the encoding mode identification model and compute a match probability value of the text to be identified for each preset encoding mode;
the determination unit is configured to select the maximum probability value from the match probability values and take the encoding mode corresponding to the maximum probability value as the encoding mode that matches the text to be identified.
In a third aspect, an embodiment of the present invention provides an electronic device, including: a processor, a memory, a bus, and a computer program stored in the memory and executable on the processor;
wherein the processor and the memory communicate with each other through the bus;
and the processor implements the method described above when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium on which a computer program is stored, the computer program implementing the method described above when executed by a processor.
As can be seen from the above technical solutions, the embodiments of the present invention provide a character code recognition method and device. For the obtained text to be identified, a match probability value for each preset encoding mode is obtained according to the text to be identified and the encoding mode identification model, the encoding mode that matches the text to be identified is determined from the match probability values, and decoding is then performed to obtain a decoding result. As a result, neither the encoding mode nor the feature sequences needed to match an encoding mode have to be set manually, which reduces workload and improves flexibility.
Description of the drawings
Fig. 1 is a flow diagram of the character code recognition method provided by an embodiment of the present invention;
Fig. 2 is a framework diagram of a learning structure provided by an embodiment of the present invention;
Fig. 3 is a structural diagram of the character code identification device provided by an embodiment of the present invention;
Fig. 4 is a structural diagram of the electronic device provided by an embodiment of the present invention.
Detailed description
Specific embodiments of the present invention are described in further detail below with reference to the drawings and examples. The following embodiments illustrate the present invention and are not intended to limit its scope.
Fig. 1 shows a character code recognition method provided by an embodiment of the present invention, including:
S11, obtaining text to be identified;
S12, obtaining, according to the text to be identified and a preset encoding mode identification model, the encoding mode that matches the text to be identified;
S13, decoding the file to be identified according to the obtained encoding mode to obtain a decoding result.
Regarding steps S11 to S13, it should be noted that, in the embodiments of the present invention, data encoded with a given encoding mode produces a particular sequence text.
For example, "computer technology is developing rapidly" (计算机技术快速发展) encoded in UTF-8 is, in hexadecimal: e8aea1e7ae97e69cbae68a80e69cafe5bfabe9809fe58f91e5b195; encoded in GBK it is, in hexadecimal: bcc6cbe3bbfabcbccaf5bfeccbd9b7a2d5b9. The sequence length is limited to at most L characters (L can be set flexibly, for example 128).
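The hexadecimal sequences in this example can be reproduced with a few lines of Python. This is a minimal sketch rather than the patent's implementation: the Chinese source string is the one obtained by decoding the hexadecimal values above, and the helper name `to_hex_sequence` and its truncation to the first L hexadecimal characters are assumptions about how the length limit might be applied.

```python
TEXT = "计算机技术快速发展"  # "computer technology is developing rapidly"
L = 128  # maximum sequence length; the description gives 128 as an example


def to_hex_sequence(data, max_len=L):
    """Render bytes as lowercase hexadecimal and keep at most max_len characters.

    Hypothetical helper: the source only says the length is limited to L.
    """
    return data.hex()[:max_len]


utf8_hex = to_hex_sequence(TEXT.encode("utf-8"))
gbk_hex = to_hex_sequence(TEXT.encode("gbk"))
print(utf8_hex)
print(gbk_hex)
```

The same nine characters give 54 hexadecimal digits in UTF-8 and 36 in GBK, so both the length and the digit patterns of a sequence carry information about the encoding mode.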
It should also be noted that, in the embodiments of the present invention, the encoding mode identification model can be obtained through deep learning training, specifically:
Deep learning is iterated over 100,000 or even several hundred thousand sequence data items until the training error and accuracy reach an acceptable level. The model can use deep learning structures such as LSTM (long short-term memory, a recurrent neural network) or Text-CNN (Convolutional Neural Networks for Sentence Classification).
Fig. 2 shows a framework diagram of a learning structure provided by an embodiment of the present invention.
(1) The input layer input_1 is followed by the embedding layer embedding_1 (also called the representation layer); the parameter values of the embedding layer are learned automatically by the model.
After a sequence is read in, each hexadecimal digit is first converted to a positive-integer index number to simplify computation. A mapping table is built, as shown below:
Code:   reserved  a  b  c  d  e  f  0  1  2  3   4   5   6   7   8   9
Index:  0         1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16
For example, abc123 is converted to 1, 2, 3, 8, 9, 10. The sequence of index numbers serves as input-layer data and is received by the embedding layer of the model. Sequences shorter than L are padded with 0.
After the embedding layer receives the sequence of index numbers, it converts the sequence into a matrix form on which operations such as convolution can be performed; that is, each index number in the sequence is initialized as a vector. Common conversion methods include random initialization, the one-hot method, and word embedding based on word2vec; the one-hot method is taken as the example here. Its basic idea is that, in the vector corresponding to a given character, exactly one position is 1 and all others are 0. For example, abc123 is converted to a sequence of six such vectors, one per index number.
(2) The embedding layer is followed by three one-dimensional convolutional layers with convolution kernels of different sizes, conv1d_1, conv1d_2 and conv1d_3; the three convolutional layers are in parallel. The convolutional layer parameters are learned automatically by the model.
(3) The three results are merged in the concatenation layer concatenate_1.
(4) After processing by the flatten layer flatten_1, the data passes through the dropout layer dropout_1 and the fully connected layer dense_1, which connects to multiple nodes representing the various encoding modes. Over many rounds of iteration, the loss function value (a measure of the difference between predicted and actual values) gradually decreases until it reaches an acceptable minimum. Meanwhile, the model's effectiveness can be checked with the accuracy on a validation set.
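The data flow through layers (1) to (4) can be sketched as a forward pass in plain Python. This is an illustration under stated assumptions, not the patent's model: the weights are random and untrained, the embedding width, filter count, kernel sizes and the softmax output are all assumed because the description does not give them, and the sequence length is kept at 16 instead of L so the example stays small. The layer names in the comments refer to the names used in Fig. 2.

```python
import math
import random

ENCODINGS = ["UTF-8", "GBK", "Latin1"]  # output nodes, one per encoding mode
VOCAB_SIZE = 17           # reserved 0 plus sixteen hexadecimal digits
SEQ_LEN = 16              # stands in for L; kept small for the demo
EMBED_DIM = 4             # assumed embedding width
NUM_FILTERS = 2           # assumed number of filters per convolution branch
KERNEL_SIZES = (2, 3, 4)  # assumed sizes of the three parallel kernels


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def init_weights(seed=0):
    """Create untrained weights; a real model would load saved weights."""
    rng = random.Random(seed)
    flat_len = sum((SEQ_LEN - k + 1) * NUM_FILTERS for k in KERNEL_SIZES)
    return {
        "embedding": random_matrix(rng, VOCAB_SIZE, EMBED_DIM),
        "conv": {k: random_matrix(rng, NUM_FILTERS, k * EMBED_DIM)
                 for k in KERNEL_SIZES},
        "dense": random_matrix(rng, len(ENCODINGS), flat_len),
    }


def conv1d(seq, kernel_size, filters):
    """Valid 1-D convolution over the sequence axis, followed by ReLU."""
    out = []
    for start in range(len(seq) - kernel_size + 1):
        window = [x for vec in seq[start:start + kernel_size] for x in vec]
        out.append([max(0.0, sum(w * x for w, x in zip(f, window)))
                    for f in filters])
    return out


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(indices, weights):
    embedded = [weights["embedding"][i] for i in indices]       # embedding_1
    branches = [conv1d(embedded, k, weights["conv"][k])         # conv1d_1..3
                for k in KERNEL_SIZES]
    # concatenate_1 then flatten_1; dropout_1 is only active during training
    flat = [x for branch in branches for row in branch for x in row]
    scores = [sum(w * x for w, x in zip(row, flat))             # dense_1
              for row in weights["dense"]]
    return dict(zip(ENCODINGS, softmax(scores)))


weights = init_weights()
probs = predict([3, 11, 1, 14, 3, 1, 4, 5, 3, 1, 4, 14, 4, 7, 0, 0], weights)
print(probs)
```

Because the weights are random, the printed probabilities are meaningless as predictions; the point is only that one index sequence yields one probability per encoding mode. In practice a deep learning framework would be used for both training and inference.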
Once the model reaches a satisfactory result, the model structure and weight values are saved for use by the system.
Obtaining an encoding mode identification model through deep learning is a relatively mature technique.
In the embodiments of the present invention, the system obtaining the encoding mode that matches the text to be identified according to the text to be identified and the preset encoding mode identification model may specifically include:
11) feeding the text to be identified into the encoding mode identification model and computing a match probability value of the text to be identified for each preset encoding mode;
12) determining the encoding mode that matches the text to be identified according to the match probability values.
Regarding steps 11) and 12), it should be noted that the text to be identified is fed into the encoding mode identification model and processed in the same way as during deep learning training: it is first converted into a sequence of index numbers. For example, if the sequence of the text to be identified before conversion is c4a7cadecad7d0 ..., it is converted according to the mapping table to 3, 11, 1, 14, 3, 1, 4, 5, 3, 1, 4, 14, 4, 7, .... The saved weight values are then used as the parameters of the embedding layer and convolutional layers, and the match probability value of the text to be identified for each encoding mode is obtained by computation. The maximum probability value is selected from the match probability values, and the corresponding encoding mode is taken as the encoding mode that matches the text to be identified.
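The preprocessing described here, from hexadecimal text to index numbers and then to one-hot vectors, can be sketched in Python as follows. The function names are hypothetical, padding at the end of the sequence is an assumption (the description only says that the missing part is filled with 0), and the one-hot width of 17 assumes the reserved index 0 gets its own position.

```python
# Mapping table from the description: reserved -> 0, a-f -> 1-6, 0-9 -> 7-16.
HEX_TO_INDEX = {ch: i for i, ch in enumerate("abcdef0123456789", start=1)}
VOCAB_SIZE = 17  # assumed one-hot width: reserved 0 plus sixteen digits


def hex_to_indices(hex_text, length):
    """Convert hexadecimal text to index numbers, truncated or zero-padded."""
    indices = [HEX_TO_INDEX[ch] for ch in hex_text.lower()[:length]]
    return indices + [0] * (length - len(indices))  # trailing padding assumed


def one_hot(indices, vocab_size=VOCAB_SIZE):
    """Turn each index number into a vector with a single 1."""
    rows = []
    for idx in indices:
        row = [0] * vocab_size
        row[idx] = 1
        rows.append(row)
    return rows


print(hex_to_indices("abc123", 8))
print(hex_to_indices("c4a7cadecad7d0", 14))
```

Calling `one_hot(hex_to_indices("abc123", 6))` gives six rows of seventeen values each, which is the matrix form that the convolutional layers operate on.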
For example, with UTF-8: 0.01, GBK: 0.98 and Latin1: 0.01, the value 0.98 is the largest, so GBK is taken as the predicted encoding mode.
The file to be identified is then decoded according to the obtained encoding mode to obtain a decoding result.
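Selecting the maximum probability value and decoding with the result can be sketched as follows. The probability values are the ones from the example above and the bytes are the GBK example from earlier in the description; the function names and the table that maps the model's labels to Python codec names are assumptions made for illustration.

```python
# Assumed mapping from the model's output labels to Python codec names.
PYTHON_CODECS = {"UTF-8": "utf-8", "GBK": "gbk", "Latin1": "latin-1"}


def pick_encoding(probabilities):
    """Return the encoding mode with the maximum match probability value."""
    return max(probabilities, key=probabilities.get)


def decode_with_prediction(data, probabilities):
    """Decode bytes using the encoding mode the model rates most likely."""
    encoding = pick_encoding(probabilities)
    # errors="replace" keeps decoding going if the prediction is wrong.
    return encoding, data.decode(PYTHON_CODECS[encoding], errors="replace")


probs = {"UTF-8": 0.01, "GBK": 0.98, "Latin1": 0.01}
raw = bytes.fromhex("bcc6cbe3bbfabcbccaf5bfeccbd9b7a2d5b9")
encoding, text = decode_with_prediction(raw, probs)
print(encoding, text)
```

How ties between equal probability values should be resolved is not stated in the description; `max` here simply returns the first maximum it encounters.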
This embodiment of the present invention provides a character code recognition method. For the obtained text to be identified, a match probability value for each preset encoding mode is obtained according to the text to be identified and the encoding mode identification model, the encoding mode that matches the text to be identified is determined from the match probability values, and decoding is then performed to obtain a decoding result. As a result, neither the encoding mode nor the feature sequences needed to match an encoding mode have to be set manually, which reduces workload and improves flexibility.
Another embodiment of the present invention provides a character code recognition method, including:
S21, obtaining text to be identified;
S22, obtaining, according to the text to be identified and a preset encoding mode identification model, the encoding mode that matches the text to be identified;
S23, decoding the file to be identified according to the obtained encoding mode to obtain a decoding result.
Regarding steps S21 to S23, it should be noted that, in the embodiments of the present invention, data encoded with a given encoding mode produces a particular sequence text.
The system obtaining the encoding mode that matches the text to be identified according to the text to be identified and the preset encoding mode identification model may specifically include:
21) selecting multiple text segments from the text to be identified;
22) feeding each text segment into the encoding mode identification model, computing a match probability value of each text segment for each preset encoding mode, and determining the encoding mode that matches each text segment according to the match probability values;
23) determining the encoding mode of the text to be identified according to the encoding modes of the text segments.
Regarding steps 21) to 23), it should be noted that multiple text segments are selected from the text to be identified and fed into the encoding mode identification model. Each text segment is processed in the same way as during deep learning training: it is first converted into a sequence of index numbers. For example, if the sequence before conversion is c4a7cadecad7d0 ..., it is converted according to the mapping table to 3, 11, 1, 14, 3, 1, 4, 5, 3, 1, 4, 14, 4, 7, ....
The saved weight values are then used as the parameters of the embedding layer and convolutional layers, and the match probability value of each text segment for each encoding mode is obtained by computation. For each text segment, the maximum probability value is selected from its match probability values, and the corresponding encoding mode is taken as the encoding mode that matches that text segment. The encoding mode that occurs most often among the text segments is then taken as the encoding mode of the text to be identified.
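The segment-based decision can be sketched as a majority vote in Python. The description does not say how text segments are chosen, so the fixed-length, non-overlapping split below is an assumption, as are the function names; the per-segment probability values passed to `vote` are made-up inputs standing in for model output.

```python
from collections import Counter


def split_segments(hex_text, segment_len, max_segments):
    """Cut the sequence into fixed-length segments (assumed selection rule)."""
    segments = [hex_text[i:i + segment_len]
                for i in range(0, len(hex_text), segment_len)]
    return segments[:max_segments]


def vote(segment_probabilities):
    """Pick each segment's most probable encoding, then the most frequent one."""
    winners = [max(p, key=p.get) for p in segment_probabilities]
    return Counter(winners).most_common(1)[0][0]


# Hypothetical model outputs for three segments.
per_segment = [
    {"UTF-8": 0.2, "GBK": 0.8},
    {"UTF-8": 0.6, "GBK": 0.4},
    {"UTF-8": 0.1, "GBK": 0.9},
]
print(split_segments("bcc6cbe3bbfabcbc", 4, 3))
print(vote(per_segment))
```

Voting over several segments makes the result less sensitive to a single segment that happens to look like another encoding mode. Tie handling is not specified in the description; `Counter.most_common` returns the encoding that was encountered first among equally frequent ones.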
The file to be identified is then decoded according to the obtained encoding mode to obtain a decoding result.
This embodiment of the present invention provides a character code recognition method. Multiple text segments are selected from the obtained text to be identified, a match probability value of each text segment for each preset encoding mode is obtained according to the text segment and the encoding mode identification model, the encoding mode that matches each text segment is determined from the match probability values, the encoding mode of the text to be identified is then determined, and decoding is performed to obtain a decoding result. As a result, neither the encoding mode nor the feature sequences needed to match an encoding mode have to be set manually, which reduces workload and improves flexibility.
Fig. 3 shows a character code identification device provided by an embodiment of the present invention, including an acquisition module 31, a processing module 32 and a decoding module 33, wherein:
the acquisition module 31 is configured to obtain text to be identified;
the processing module 32 is configured to obtain, according to the text to be identified and a preset encoding mode identification model, the encoding mode that matches the text to be identified;
the decoding module 33 is configured to decode the file to be identified according to the obtained encoding mode to obtain a decoding result.
The processing module is specifically configured to:
feed the text to be identified into the encoding mode identification model and compute a match probability value of the text to be identified for each preset encoding mode;
determine the encoding mode that matches the text to be identified according to the match probability values.
The processing module includes a computing unit and a determination unit, wherein:
the computing unit is configured to feed the text to be identified into the encoding mode identification model and compute a match probability value of the text to be identified for each preset encoding mode;
the determination unit is configured to select the maximum probability value from the match probability values and take the encoding mode corresponding to the maximum probability value as the encoding mode that matches the text to be identified.
Since the device of this embodiment works on the same principle as the method of the above embodiments, a more detailed explanation is not repeated here.
It should be noted that, in the embodiments of the present invention, the relevant functional modules can be implemented by a hardware processor.
This embodiment of the present invention provides a character code identification device. For the obtained text to be identified, a match probability value for each preset encoding mode is obtained according to the text to be identified and the encoding mode identification model, the encoding mode that matches the text to be identified is determined from the match probability values, and decoding is then performed to obtain a decoding result. As a result, neither the encoding mode nor the feature sequences needed to match an encoding mode have to be set manually, which reduces workload and improves flexibility.
Another embodiment of the present invention provides a character code identification device, including an acquisition module, a processing module and a decoding module, wherein:
the acquisition module is configured to obtain text to be identified;
the processing module is configured to obtain, according to the text to be identified and a preset encoding mode identification model, the encoding mode that matches the text to be identified;
the decoding module is configured to decode the file to be identified according to the obtained encoding mode to obtain a decoding result.
The processing module is specifically configured to:
select multiple text segments from the text to be identified;
feed each text segment into the encoding mode identification model, compute a match probability value of each text segment for each preset encoding mode, and determine the encoding mode that matches each text segment according to the match probability values;
determine the encoding mode of the text to be identified according to the encoding modes of the text segments.
Since the device of this embodiment works on the same principle as the method of the above embodiments, a more detailed explanation is not repeated here.
It should be noted that, in the embodiments of the present invention, the relevant functional modules can be implemented by a hardware processor.
This embodiment of the present invention provides a character code identification device. Multiple text segments are selected from the obtained text to be identified, a match probability value of each text segment for each preset encoding mode is obtained according to the text segment and the encoding mode identification model, the encoding mode that matches each text segment is determined from the match probability values, the encoding mode of the text to be identified is then determined, and decoding is performed to obtain a decoding result. As a result, neither the encoding mode nor the feature sequences needed to match an encoding mode have to be set manually, which reduces workload and improves flexibility.
Fig. 4 shows an electronic device provided by an embodiment of the present invention, including: a processor 401, a memory 402, a bus 403, and a computer program stored in the memory and executable on the processor;
wherein the processor and the memory communicate with each other through the bus;
and the processor implements the method described above when executing the computer program, for example including: obtaining text to be identified; obtaining, according to the text to be identified and a preset encoding mode identification model, the encoding mode that matches the text to be identified; and decoding the file to be identified according to the obtained encoding mode to obtain a decoding result.
An embodiment of the present invention provides a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the method described above, for example including: obtaining text to be identified; obtaining, according to the text to be identified and a preset encoding mode identification model, the encoding mode that matches the text to be identified; and decoding the file to be identified according to the obtained encoding mode to obtain a decoding result.
In addition, those skilled in the art will appreciate that, although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the present invention and to form different embodiments. For example, in the following claims, any one of the claimed embodiments can be used in any combination.
It should be noted that the above embodiments illustrate rather than limit the present invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices can be embodied by one and the same item of hardware. The use of the words first, second and third does not indicate any order; these words may be interpreted as names.
Those of ordinary skill in the art will understand that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope defined by the claims of the present invention.

Claims (10)

1. A character code recognition method, characterized by including:
obtaining text to be identified;
obtaining, according to the text to be identified and a preset encoding mode identification model, the encoding mode that matches the text to be identified;
decoding the file to be identified according to the obtained encoding mode to obtain a decoding result.
2. The method according to claim 1, characterized in that obtaining the encoding mode that matches the text to be identified according to the text to be identified and the preset encoding mode identification model includes:
feeding the text to be identified into the encoding mode identification model and computing a match probability value of the text to be identified for each preset encoding mode;
determining the encoding mode that matches the text to be identified according to the match probability values.
3. The method according to claim 1, characterized in that obtaining the encoding mode that matches the text to be identified according to the text to be identified and the preset encoding mode identification model includes:
selecting multiple text segments from the text to be identified;
feeding each text segment into the encoding mode identification model, computing a match probability value of each text segment for each preset encoding mode, and determining the encoding mode that matches each text segment according to the match probability values;
determining the encoding mode of the text to be identified according to the encoding modes of the text segments.
4. The method according to claim 2, characterized in that determining the encoding mode that matches the text to be identified according to the match probability values includes: selecting the maximum probability value from the match probability values; and taking the encoding mode corresponding to the maximum probability value as the encoding mode that matches the text to be identified.
5. a kind of character code identification device, which is characterized in that including:
Acquisition module, for obtaining text to be identified;
Processing module, it is described to be identified for being met according to the text to be identified and the acquisition of preset coding mode identification model The coding mode of text;
Decoder module is decoded the file to be identified for the coding mode according to acquisition, obtains decoding result.
6. device according to claim 5, which is characterized in that the processing module is specifically used for:
The text to be identified is sent in the coding mode identification model calculate and obtains the text pair to be identified Probability value should be met in preset each coding mode;
According to the coding mode for meeting probability value and being determined for compliance with the text to be identified.
7. The device according to claim 5, characterized in that the processing module is specifically configured to:
Select multiple text segments from the text to be identified;
Feed each text segment into the coding mode identification model, which computes a conformity probability value for that segment with respect to each preset coding mode, and determine the coding mode of each text segment according to these conformity probability values;
Determine the coding mode of the text to be identified according to the coding modes of the individual text segments.
8. The device according to claim 6, characterized in that the processing module comprises a computing unit and a determination unit, wherein:
The computing unit is configured to feed the text to be identified into the coding mode identification model to compute the conformity probability value of the text to be identified for each preset coding mode;
The determination unit is configured to select the maximum probability value from the conformity probability values and to take the coding mode corresponding to the maximum probability value as the coding mode of the text to be identified.
9. An electronic device, characterized by comprising: a processor, a memory, a bus, and a computer program stored on the memory and executable on the processor;
Wherein the processor and the memory communicate with each other via the bus;
The processor implements the method according to any one of claims 1-4 when executing the computer program.
10. A non-transitory computer-readable storage medium, characterized in that a computer program is stored on the non-transitory computer-readable storage medium, and the computer program, when executed by a processor, implements the method according to any one of claims 1-4.
CN201810050150.1A 2018-01-18 2018-01-18 Character code recognition method and device Active CN108197087B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810050150.1A CN108197087B (en) 2018-01-18 2018-01-18 Character code recognition method and device

Publications (2)

Publication Number Publication Date
CN108197087A (en) 2018-06-22
CN108197087B (en) 2021-11-16

Family

ID=62589725

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810050150.1A Active CN108197087B (en) 2018-01-18 2018-01-18 Character code recognition method and device

Country Status (1)

Country Link
CN (1) CN108197087B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040078191A1 (en) * 2002-10-22 2004-04-22 Nokia Corporation Scalable neural network-based language identification from written text
CN104360988A (en) * 2014-10-17 2015-02-18 北京锐安科技有限公司 Method and device for identifying coding mode of Chinese characters
CN104750666A (en) * 2015-03-12 2015-07-01 明博教育科技有限公司 Text character encoding mode identification method and system
CN106354701A (en) * 2016-08-30 2017-01-25 腾讯科技(深圳)有限公司 Chinese character processing method and device
CN107480723A (en) * 2017-08-22 2017-12-15 武汉大学 Texture Recognition based on partial binary threshold learning network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
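ARDEN DERTAT: "Applied Deep Learning - Part 3: Autoencoders", 《TOWARDS DATA SCIENCE》 *
Li Jifeng et al.: "High-speed Chinese character encoding recognition system based on an N-gram model", 《Computer Engineering and Applications》 *
Yang Yan et al.: "An improved algorithm for offline handwritten Chinese character recognition", 《Microcomputer Information》 *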

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109064733A (en) * 2018-09-30 2018-12-21 珠海全志科技股份有限公司 Adaptive infrared signal coding/decoding method, computer installation and its control device
CN111681670B (en) * 2019-02-25 2023-05-12 北京嘀嘀无限科技发展有限公司 Information identification method, device, electronic equipment and storage medium
CN111681670A (en) * 2019-02-25 2020-09-18 北京嘀嘀无限科技发展有限公司 Information identification method and device, electronic equipment and storage medium
US11838035B2 (en) 2019-03-15 2023-12-05 Samsung Electronics Co., Ltd. Using predicates in conditional transcoder for column store
CN111832257A (en) * 2019-04-16 2020-10-27 三星电子株式会社 Conditional transcoding of encoded data
CN111832257B (en) * 2019-04-16 2023-02-28 三星电子株式会社 Conditional transcoding of encoded data
CN113064862B (en) * 2019-04-19 2022-06-07 福建天晴数码有限公司 File code identification method based on forward and reverse word stock and storage medium
CN110096481B (en) * 2019-04-19 2021-03-23 福建天晴数码有限公司 Method for identifying file code and computer readable storage medium
CN113064863A (en) * 2019-04-19 2021-07-02 福建天晴数码有限公司 Method for automatically recognizing file code and computer readable storage medium
CN113064862A (en) * 2019-04-19 2021-07-02 福建天晴数码有限公司 File code identification method based on forward and reverse word stock and storage medium
CN113064863B (en) * 2019-04-19 2022-06-07 福建天晴数码有限公司 Method for automatically recognizing file code and computer readable storage medium
CN110096481A (en) * 2019-04-19 2019-08-06 福建天晴数码有限公司 The recognition methods of document No. and computer readable storage medium
CN110113327A (en) * 2019-04-26 2019-08-09 北京奇安信科技有限公司 A kind of method and device detecting DGA domain name
CN110135566A (en) * 2019-05-21 2019-08-16 四川长虹电器股份有限公司 Registration user name detection method based on bis- Classification Neural model of LSTM
CN111428484A (en) * 2020-04-14 2020-07-17 广州云从鼎望科技有限公司 Information management method, system, device and medium
CN113627173A (en) * 2021-08-16 2021-11-09 深圳市云采网络科技有限公司 Manufacturer name identification method and device, electronic equipment and readable medium
CN113807807A (en) * 2021-08-16 2021-12-17 深圳市云采网络科技有限公司 Component parameter identification method and device, electronic equipment and readable medium
CN117391070A (en) * 2023-12-08 2024-01-12 和元达信息科技有限公司 Method and system for adjusting random character
CN117391070B (en) * 2023-12-08 2024-03-22 和元达信息科技有限公司 Method and system for adjusting random character

Also Published As

Publication number Publication date
CN108197087B (en) 2021-11-16

Similar Documents

Publication Publication Date Title
CN108197087A (en) Character code recognition methods and device
CN107798235B (en) Unsupervised abnormal access detection method and unsupervised abnormal access detection device based on one-hot coding mechanism
JP6955580B2 (en) Document summary automatic extraction method, equipment, computer equipment and storage media
CN107704625B (en) Method and device for field matching
CN111476023B (en) Method and device for identifying entity relationship
CN110825857B (en) Multi-round question and answer identification method and device, computer equipment and storage medium
WO2021189844A1 (en) Detection method and apparatus for multivariate kpi time series, and device and storage medium
CN108959312A (en) A kind of method, apparatus and terminal that multi-document summary generates
CN113011189A (en) Method, device and equipment for extracting open entity relationship and storage medium
CN112257437B (en) Speech recognition error correction method, device, electronic equipment and storage medium
CN111382271B (en) Training method and device of text classification model, text classification method and device
CN109522395A (en) Automatic question-answering method and device
CN107451106A (en) Text method and device for correcting, electronic equipment
CN116579618B (en) Data processing method, device, equipment and storage medium based on risk management
CN108320778A (en) Medical record ICD coding methods and system
CN114722091A (en) Data processing method, data processing device, storage medium and processor
CN111340638A (en) Abnormal medical insurance document identification method and device, computer equipment and storage medium
CN111126056B (en) Method and device for identifying trigger words
CN112765330A (en) Text data processing method and device, electronic equipment and storage medium
CN109033078B (en) The recognition methods of sentence classification and device, storage medium, processor
CN110705258A (en) Text entity identification method and device
CN111241843A (en) Semantic relation inference system and method based on composite neural network
CN115019316A (en) Training method of text recognition model and text recognition method
CN112749532A (en) Address text processing method, device and equipment
CN110378457A (en) A kind of yard of target generation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 332, 3 / F, Building 102, 28 xinjiekouwei street, Xicheng District, Beijing 100088

Applicant after: Qianxin Technology Group Co.,Ltd.

Address before: 100015 15, 17 floor 1701-26, 3 building, 10 Jiuxianqiao Road, Chaoyang District, Beijing.

Applicant before: BEIJING QIANXIN TECHNOLOGY Co.,Ltd.

GR01 Patent grant