WO2009039689A1 - Abstracting method for computer plain text character - Google Patents

Publication number
WO2009039689A1
Authority
WO
WIPO (PCT)
Prior art keywords
character
computer
code
plain text
characters
Prior art date
Application number
PCT/CN2007/003394
Other languages
French (fr)
Chinese (zh)
Inventor
Jianming Wu
Original Assignee
Jianming Wu
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jianming Wu
Publication of WO2009039689A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G06F40/129 Handling non-Latin characters, e.g. kana-to-kanji conversion

Abstract

A method for extracting computer plain text characters is disclosed. When plain text characters are extracted from a computer electronic file, computer character codes are used to filter and replace, for consistency, the multiple codes that represent a visual character with a single meaning in the plain text of the original file, so that each visual character with a single meaning in the extracted plain text has only one location code (internal code) or one computer character code representation.

Description

Method for extracting computer plain text characters
Technical Field
The present invention relates to a method for extracting computer plain text characters, and in particular to a method for extracting the plain text characters of an electronic file that is edited on a computer and printed as a paper document.
Background Art
With the widespread use of computer technology, people rely heavily on electronic files in many formats to express ideas and intentions. They also use computer-controlled digital printing machines (such as printers) to print these files onto paper for preservation and exchange.
For example, when two companies sign a commercial contract, they first draft an electronic file containing the text on a computer. Once both parties agree, the contract is printed on paper with a digital printing machine (e.g., a laser printer), and each party stamps and signs the paper document, which then serves as a valid credit document.
When people express ideas and intentions to one another, they need to prove that the expression is authentic, for example by sealing and signing the contract described above.
In the information age, the large volume of computer electronic files in use also requires authenticity verification. Since the mid-1990s, digital signature technology built on hash functions and asymmetric key systems has been widely used internationally to verify the authenticity of electronic files. Countries have accordingly enacted laws on digital signature authentication of electronic files. In 2005, China's Electronic Signature Law was promulgated and put into effect, promoting the use of digital signature technology in verifying the authenticity of electronic files.
An electronic file is the computer representation of people's ideas and intentions. Electronic files are easily lost: computer viruses, poor-quality or short-lived storage media, human damage, and other unforeseeable factors can all destroy them. In many cases, therefore, a paper document is needed to record the same ideas and intentions.
Both electronic files and the paper documents printed by computer-controlled digital printing machines present their content in the written symbols that human languages have developed over thousands of years.
Computers represent these written symbols with codes, for example the decimal ASCII character codes, the hexadecimal Unicode character codes, and the hexadecimal national standard (GB) Simplified Chinese character codes.
Authenticating an electronic file with a digital signature amounts to authenticating the sequence of computer character codes that make up the file. For example, "⑴ 我是中国人" ("(1) I am Chinese") consists of 7 characters: "⑴" (a single parenthesized-digit character), a half-width space, and the 5 Chinese characters "我是中国人". For authentication, the computer only needs to assemble the codes that represent these characters: "⑴" is the parenthesized-number code "A2C5", the half-width space is "0020", "我" is "CED2", "是" is "CAC7", "中" is "D6D0", "国" is "B9FA", and "人" is "C8CB", which together give the hexadecimal sequence "A2C50020CED2CAC7D6D0B9FAC8CB".
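The code sequence quoted above can be reproduced with a short Python sketch. It assumes that the single-code parenthesized digit is Unicode U+2474 and that Python's built-in `gb2312` codec matches the GB code table the text refers to; the helper name `gb_codes` is introduced here for illustration only. ASCII characters are padded to four hex digits to follow the text's notation ("0020" for the space).

```python
# Illustrative only: reproduce the code sequence quoted in the text.
SENTENCE = "\u2474 我是中国人"  # parenthesized one, half-width space, five characters


def gb_codes(text):
    """Return one 4-digit hex code per character, in the text's notation."""
    codes = []
    for ch in text:
        raw = ch.encode("gb2312")                 # 1 byte for ASCII, 2 bytes otherwise
        codes.append(raw.hex().upper().zfill(4))  # pad ASCII codes: 20 -> 0020
    return codes


codes = gb_codes(SENTENCE)
print(len(codes), " ".join(codes))
print("".join(codes))
```

Note that the real GB2312 byte stream stores the space as the single byte 20; the four-digit form is only a display convention.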
Because computer technology spread from English-speaking countries to countries with other writing systems, computer character sets contain many cases in which a visual character with one meaning has several location codes. For example, English letters and digits can be entered and represented in both half-width and full-width forms, so two location codes (internal codes) correspond to a letter or digit with one meaning. In addition, to make input easier and faster, input-method designs sometimes combine several symbols into a single symbol.
A search of the GB Simplified Chinese character set reveals several categories of easily confused character codes:
The full-width "．" has the code "A3AE", while the half-width "." has the code "002E".
"⒊" is a single character represented by the one code "A2B3", while "3." consists of two characters: "3" (code "0033") and "." (code "002E").
In the sentence "⑴ 我是中国人", "⑴" is represented by the single code "A2C5", whereas "(1)" can be written as the three half-width characters "(", "1", and ")" with the codes "0028 0031 0029".
A file contains many spaces, and in computer character sets a space takes two forms, the half-width space and the full-width space, represented by the codes "0020" and "A1A1" respectively.
Furthermore, characters entered on a computer may come from either of two code tables, "Unicode (hexadecimal)" and "Simplified Chinese GB (hexadecimal)", and the same Chinese character has a different code in each: "我" is "6211" in Unicode (hexadecimal) but "CED2" in Simplified Chinese GB (hexadecimal).
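The confusable pairs listed above can be checked with a small sketch. The Unicode code points chosen for the full-width and composite characters (U+FF0E, U+248A, U+2474, U+3000) are my reading of the GB codes in the text, and `hex_codes` is a hypothetical helper, not part of the described method.

```python
# Pairs the text calls easy to confuse: (label, one form, another form).
PAIRS = [
    ("full stop", "\uff0e", "."),   # full-width vs half-width
    ("item 3", "\u248a", "3."),     # one composite character vs two characters
    ("item (1)", "\u2474", "(1)"),  # one composite character vs three characters
    ("space", "\u3000", " "),       # full-width vs half-width
]


def hex_codes(text, encoding=None):
    """4-digit hex code per character: Unicode code point, or bytes in `encoding`."""
    if encoding is None:
        return " ".join(f"{ord(ch):04X}" for ch in text)
    return " ".join(ch.encode(encoding).hex().upper().zfill(4) for ch in text)


for label, a, b in PAIRS:
    print(f"{label:9} GB {hex_codes(a, 'gb2312')} vs {hex_codes(b, 'gb2312')}")

# The same character also has different codes in different code tables.
print("我: Unicode", hex_codes("我"), "/ GB", hex_codes("我", "gb2312"))
```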
In a computer electronic file, these characters are easily distinguished by their location codes or internal codes. Once they are printed on paper by a computer-controlled digital printing machine, however, as in this very document, the visual form of a character no longer reveals which code represents it. Likewise, on a printed page it is impossible to tell how many space characters there are, or how many of them are full-width and how many are half-width.
Suppose the sentence "⑴ 我是中国人" is printed on paper. To verify its authenticity with digital signature technology, the character codes of the sentence must be re-entered into a computer and run through a hash function. Because "(1)" has two computer representations, two different code sequences are possible:
One is "A2C5 0020 CED2 CAC7 D6D0 B9FA C8CB", 7 codes;
the other is "0028 0031 0029 0020 CED2 CAC7 D6D0 B9FA C8CB", 9 codes.
With a different character set, the codes also change. In "Unicode (hexadecimal)" the codes of the 5 characters "我是中国人" are "6211 662F 4E2D 56FD 4EBA",
whereas in "Simplified Chinese GB (hexadecimal)" they are "CED2 CAC7 D6D0 B9FA C8CB".
When ambiguously entered symbols are combined with different character sets, the sentence "⑴ 我是中国人" yields 4 different code combinations. For a file with many characters, the number of possible combinations is practically unlimited.
Different sets of character codes (internal codes), however, produce completely different results when run through a hash function, which frustrates authenticity verification of the text content.
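A minimal sketch of the effect, using Python's standard `hashlib`. The text does not say how the codes are serialised before hashing, so the byte forms below (raw GB2312 bytes, and UTF-16 big-endian for the Unicode codes) are assumptions made for illustration.

```python
import hashlib

REST = " 我是中国人"  # half-width space followed by the five characters
VARIANTS = {
    "composite, GB": ("\u2474" + REST).encode("gb2312"),
    "split, GB": ("(1)" + REST).encode("gb2312"),
    "composite, Unicode": ("\u2474" + REST).encode("utf-16-be"),
    "split, Unicode": ("(1)" + REST).encode("utf-16-be"),
}

# One SHA-1 digest per byte sequence.
digests = {name: hashlib.sha1(data).hexdigest() for name, data in VARIANTS.items()}
for name, digest in digests.items():
    print(f"{name:20} {digest}")

# Number of distinct digests among the four visually identical inputs.
print(len(set(digests.values())))
```

All four inputs look the same on paper, yet each produces its own digest, so a hash recorded for one form cannot be matched by re-typing another.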
The historical development of computer technology has produced this mixture of character code representations in computer information processing, which makes it difficult or even impossible for digital signature technology to authenticate the text recorded on a paper document. Authentication of an electronic file is therefore completely disconnected from authentication of the paper document printed from it, and this has created the now widespread difficulty of authenticating the content of paper documents. When altered or forged credit documents appear, only conventional physical and chemical examination can be used; digital signature technology cannot be applied to authenticate the content of the paper document.
This is a worldwide problem that has accompanied computing. How can digital signature technology be applied to the authenticity verification of computer-printed paper documents? Unifying the two forms of a single expression of intent, the electronic file and the paper document, under one digital signature authentication system would give modern computer information the same security authentication guarantee as traditional human credit instruments, and would provide technical support for document authenticity verification in building an orderly social, political, and economic order. This is the worldwide document security authentication problem that needs to be solved.
Summary of the Invention
To solve the above problems and technical deficiencies, the present invention provides a method for extracting computer plain text characters. By mapping, screening, and filtering the character codes in computer character sets, the method reduces the multiple code representations of a character to a single code representation when the plain text characters of an electronic file are extracted. The filtered plain text characters serve as the data source for a digital signature, and the digital signature data is recorded on the paper document. This improves the fault tolerance of re-entering the text of the paper document. Without changing people's existing habits of computer use or the established international character standards, it becomes possible to verify effectively, by digital signature, the authenticity of the text recorded in paper documents that are edited and printed by computer.
The invention is achieved through the following technical solution:
When plain text characters are extracted from a computer electronic file, computer character codes are used to filter and replace, for consistency, the multiple location codes of any visual character with a single meaning in the plain text of the original file, so that in the extracted plain text each visual character with a single meaning has only one location code (internal code) or code.
When the plain text characters of an electronic file are extracted, the following processing may be applied:
Visual characters with a single meaning that can be represented by both full-width and half-width character codes are all represented by the half-width character code.
Visual characters with a single meaning that can be represented by both full-width and half-width character codes are all represented by the full-width character code.
A visual character represented by one code that combines two or more visual characters is split into those two or more visual characters, each represented by its own character code.
All half-width space characters and full-width space characters in the electronic file are deleted.
Detailed Description
The present invention relates to a method for extracting computer plain text characters. Its implementation is described in detail below.
A character filtering correspondence table is established according to the technical solution of the invention.
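A correspondence table of this kind could look like the following sketch. It covers only the example characters mentioned in this document, picks the half-width option rather than the full-width one, and uses hypothetical names (`FILTER_TABLE`, `filter_plain_text`); the document does not publish an actual table.

```python
# Hypothetical correspondence table covering only the examples in the text.
# Keys are characters to replace; values are their canonical replacements.
FILTER_TABLE = {
    "\uff0e": ".",    # full-width full stop -> half-width full stop
    "\u248a": "3.",   # composite "3." -> "3" followed by "."
    "\u2474": "(1)",  # composite "(1)" -> "(", "1", ")"
    " ": "",          # half-width space -> deleted
    "\u3000": "",     # full-width space -> deleted
}


def filter_plain_text(text, table=FILTER_TABLE):
    """Replace every character found in `table`; leave all others unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


print(filter_plain_text("\u2474 我是中国人"))    # composite form, half-width space
print(filter_plain_text("(1)\u3000我是中国人"))  # split form, full-width space
```

Both inputs reduce to the same filtered string, which is the property the later hashing step depends on.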
Technical steps for printing a paper document:
Step 1: Following the character filtering method of the invention, analyze and screen all characters of the different computer character sets and build a filtering correspondence table for the character sets.
Step 2: Obtain an electronic file and extract all of its plain text characters.
Step 3: The computer automatically looks up the character codes and filters the characters for consistency according to the character filtering correspondence table, producing new plain text characters.
Step 4: Using the new plain text characters as the data source, compute a hash value with a hash function (e.g., SHA-1).
Step 5: Digitally sign the hash value with an asymmetric key, such as a digital certificate in the RSA or ECC key system, to obtain the digital signature value.
Step 6: Record the hash value and the digital signature value in a selected area of the paper document to be printed, for example on the seal.
Step 7: Print the paper document in the original format of the electronic file.
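Steps 1, 3, and 4 above can be sketched as follows. Two substitutions are worth stating plainly: the table is derived from Unicode NFKC compatibility folding (a standard technique that happens to fold full-width forms and split composite characters, not the table described in this document), and the filtered text is serialised as UTF-8 before hashing, which is an assumption. Steps 5 to 7 are omitted because signing needs a key pair and a cryptography library outside the standard library. All function names are hypothetical.

```python
import hashlib
import unicodedata


def build_filter_table():
    """Step 1: map each GB2312 character to a canonical form, if it has one."""
    table = {" ": ""}                      # half-width space is deleted
    for hi in range(0xA1, 0xF8):           # GB2312 lead bytes
        for lo in range(0xA1, 0xFF):       # GB2312 trail bytes
            try:
                ch = bytes([hi, lo]).decode("gb2312")
            except UnicodeDecodeError:
                continue                   # unassigned code
            folded = unicodedata.normalize("NFKC", ch).replace(" ", "")
            if folded != ch:
                table[ch] = folded
    return table


def digest_for_printing(plain_text, table):
    """Steps 3 and 4: filter the text, then hash it with SHA-1."""
    filtered = "".join(table.get(ch, ch) for ch in plain_text)
    # Assumption: the filtered text is serialised as UTF-8 before hashing.
    return filtered, hashlib.sha1(filtered.encode("utf-8")).hexdigest()


table = build_filter_table()
filtered, digest = digest_for_printing("\u2474 我是中国人", table)
print(filtered, digest)
```

NFKC folds more than the examples in the text (for instance Roman numeral characters), so a production table would need to be reviewed character by character.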
The authenticity of a paper document printed by the above method is verified as follows:
Precondition: The electronic file corresponding to the paper document has been lost, so only the paper document exists and it cannot be matched against the electronic file; or the matching electronic file cannot be obtained.
Step 1: The operator re-enters all the characters of the paper document into a computer as plain text, following his or her own input habits.
Step 2: The computer automatically looks up the character codes and filters the characters for consistency according to the character filtering correspondence table, producing new plain text characters.
Step 3: Extract the hash value and the digital signature value recorded on the paper document.
Step 4: Compute a hash value over the new plain text characters using the same hash function.
Step 5: Compare the hash value recorded on the paper document with the newly computed hash value. If they match, the content of the document has not been altered.
Step 6: Decrypt the digital signature value with the public key of the original digital certificate holder. The result should be the hash value obtained in Step 4 of the printing procedure. Compare the two hash values from Steps 3 and 4 of the verification procedure: if they match, the document is genuine; if not, it is forged.
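The verification steps can be sketched end to end under several loud assumptions: the filter is approximated by Unicode NFKC folding plus space removal, the hash input is UTF-8, and the signature uses a textbook toy RSA key (n = 61 × 53) with the digest reduced modulo n. The toy key is insecure and stands in for a real RSA or ECC certificate only to show the data flow; none of these names come from the document.

```python
import hashlib
import unicodedata

# Toy RSA key (textbook numbers, insecure): N = 61 * 53, E public, D private.
N, E, D = 3233, 17, 2753


def text_hash(text):
    """Filter (NFKC fold, drop spaces), then SHA-1; returns the hex digest."""
    folded = unicodedata.normalize("NFKC", text).replace(" ", "")
    return hashlib.sha1(folded.encode("utf-8")).hexdigest()


def toy_sign(hex_digest):
    """Stand-in for printing Step 5: sign the digest reduced modulo N."""
    return pow(int(hex_digest, 16) % N, D, N)


def verify(retyped_text, printed_hash, printed_signature):
    """Verification Steps 2, 4, 5 and 6 for a re-entered paper document."""
    new_hash = text_hash(retyped_text)             # Steps 2 and 4
    content_unchanged = new_hash == printed_hash   # Step 5
    recovered = pow(printed_signature, E, N)       # Step 6
    signature_ok = recovered == int(printed_hash, 16) % N
    return content_unchanged and signature_ok


# Values that would have been recorded on the paper at printing time.
printed_hash = text_hash("\u2474 我是中国人")
printed_signature = toy_sign(printed_hash)

print(verify("(1)\u3000我是中国人", printed_hash, printed_signature))  # other keystrokes
print(verify("(2)\u3000我是中国人", printed_hash, printed_signature))  # altered content
```

The first call re-types the sentence with different keystrokes and still verifies; the second changes the content and fails the hash comparison.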
The advantage of the invention is that it resolves the character code inconsistencies that can arise when text is entered by different people or on different occasions, so that paper documents can be verified with digital signature technology on the basis of consistent characters. This provides technical support for effective anti-counterfeiting authentication of paper credit documents.
The above is merely a preferred embodiment of the invention, and the scope of protection of the invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection shall therefore be determined by the claims.

Claims

1. A method for extracting computer plain text characters, characterized in that, when plain text characters are extracted from a computer electronic file, computer character codes are used to filter and replace, for consistency, the multiple location codes of a visual character with a single meaning in the plain text characters of the original electronic file, so that each visual character with a single meaning in the extracted plain text characters has only one location code or one computer character code representation.
2. The method for extracting computer plain text characters according to claim 1, characterized in that visual characters with a single meaning that are represented by both full-width and half-width character codes are all represented by the half-width character code.
3. The method for extracting computer plain text characters according to claim 1, characterized in that visual characters with a single meaning that are represented by both full-width and half-width character codes are all represented by the full-width character code.
4. The method for extracting computer plain text characters according to claim 1, characterized in that a visual character represented by one code that combines two or more visual characters, each having an independent code, is split into the original independent visual characters, each represented by its original character code.
5. The method for extracting computer plain text characters according to claim 1, characterized in that all half-width space characters and full-width space characters in the electronic file are deleted.
PCT/CN2007/003394 2007-09-24 2007-11-30 Abstracting method for computer plain text character WO2009039689A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CNA2007101222220A CN101122897A (en) 2007-09-24 2007-09-24 Computer pure words character extraction method
CN200710122222.0 2007-09-24

Publications (1)

Publication Number Publication Date
WO2009039689A1 true WO2009039689A1 (en) 2009-04-02

Family

ID=39085234

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2007/003394 WO2009039689A1 (en) 2007-09-24 2007-11-30 Abstracting method for computer plain text character

Country Status (2)

Country Link
CN (1) CN101122897A (en)
WO (1) WO2009039689A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108861051A (en) * 2018-03-22 2018-11-23 贵州卓霖科技有限公司 A kind of electric appliance method for anti-counterfeit and electric appliance false proof device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6400287B1 (en) * 2000-07-10 2002-06-04 International Business Machines Corporation Data structure for creating, scoping, and converting to unicode data from single byte character sets, double byte character sets, or mixed character sets comprising both single byte and double byte character sets
CN1834966A (en) * 2005-03-16 2006-09-20 日本电气株式会社 Display conversion device,display conversion method and program of watch

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108861051A (en) * 2018-03-22 2018-11-23 贵州卓霖科技有限公司 A kind of electric appliance method for anti-counterfeit and electric appliance false proof device
CN108861051B (en) * 2018-03-22 2024-01-05 贵州卓霖科技有限公司 Anti-counterfeiting method and anti-counterfeiting device for electrical appliance

Also Published As

Publication number Publication date
CN101122897A (en) 2008-02-13

Similar Documents

Publication Publication Date Title
US20080091954A1 (en) Method and system for facilitating printed page authentication, unique code generation and content integrity verification of documents
EP2786270A1 (en) System, process and method for the detection of common content in multiple documents in an electronic system
CN101281581A (en) Method for checking whether contents of paper file is distorted or not
CN103150517A (en) Secure secret electronic file storage and archiving method and method for checking matching of user permission and file open permission
JP2018170036A (en) Snippet matching in file sharing network
US20040221162A1 (en) Method and systems to facilitate online electronic notary, signatures and time stamping
CN101938475B (en) Identity authentication method of internet information publisher and system thereof
Mthethwa et al. Proposing a blockchain-based solution to verify the integrity of hardcopy documents
CN114329634A (en) Anti-counterfeiting method for electronic signature document
Madhusudhan et al. A secure and enhanced elliptic curve cryptography‐based dynamic authentication scheme using smart card
WO2009039689A1 (en) Abstracting method for computer plain text character
Dlamini et al. Mitigating the challenge of hardcopy document forgery
CN107392060A (en) A kind of hard disk, duplicator safety detection method, system
Khadam et al. Data aggregation and privacy preserving using computational intelligence
WO2009146333A1 (en) Systems and methods for secure data entry and storage
CN110489265A (en) A kind of JSON data validation and storage method based on metadata
EP3611647A1 (en) Method for processing and verifying a document
CN103218766A (en) Method and system of encryption printing and original judgment of electronic medical record
CN103092940A (en) File structure, digital signature method and digital signature validation method with verifiable reconfiguration
CN117667842A (en) Electronic file full life cycle management method and system based on block chain technology
Wanli et al. A Novel Image Content Authentication Algorithm Based on Laplace Spectra Feature
Iwakura et al. Big Community Data before World Wide Web Era
TWM440493U (en) Authentication system for electronic document circulation
Leszczynska POLYPHONIC ARRANGEMENTS OF PROPRIUM AND ORDINARIUM MISSAE FROM THE BRANIEWO MANUSCRIPT (UPPSU 76F) IN THE CONTEXT OF EUROPEAN TRADITION
Cuilin et al. E-mail System Based on Dynamic Password and Fingerprint Recognition

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07845757

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 07/07/2010)

122 Ep: pct application non-entry in european phase

Ref document number: 07845757

Country of ref document: EP

Kind code of ref document: A1