WO2021072872A1 - Name storage method, device, and computer equipment based on character conversion - Google Patents

Name storage method, device, and computer equipment based on character conversion

Info

Publication number
WO2021072872A1
Authority
WO
WIPO (PCT)
Prior art keywords
character
replacement
name
replaced
mapping
Prior art date
Application number
PCT/CN2019/118235
Other languages
English (en)
French (fr)
Inventor
甘丽婷
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021072872A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5846 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text

Definitions

  • This application relates to the field of computer technology, in particular to a name storage method, device, and computer equipment based on character conversion.
  • the utf-8 encoding method has strong compatibility and a wide range of applications, and the information recorded in the utf-8 encoding method can be applied to all terminals.
  • When processing recorded rare characters, some terminals report errors caused by processing exceptions that arise from encoding compatibility, so that some terminals or systems are unable to handle the rare characters.
  • Because the rare characters appearing in user names written in Chinese are difficult to enumerate, it is impossible to establish a database covering all rare characters.
  • If an encoding method for rare characters were used to record rare characters on all terminals, information processing speed would be greatly reduced, and this would be difficult to implement. In addition, new rare characters are continuously received in actual use, which further increases the difficulty of recording and storing names containing rare Chinese characters. Therefore, the existing information storage method has the problem that it cannot store names containing rare characters.
  • the embodiments of the present application provide a name storage method, device, computer equipment, and storage medium based on character conversion, aiming to solve the problem that the information storage method in the prior art method cannot store names containing rare characters.
  • In a first aspect, an embodiment of the present application provides a name storage method based on character conversion, which includes: if a newly added name input by a user is received, verifying the newly added name according to a preset character verification model to obtain a verification result of whether it passes; if the verification result is a failure, obtaining the character that failed verification in the newly added name as the character to be replaced, and determining whether a preset replacement character mapping set contains character mapping information corresponding to the character to be replaced; if the replacement character mapping set contains character mapping information corresponding to the character to be replaced, converting the character to be replaced into a replacement character according to the replacement character mapping set, so as to obtain a first replacement name and store the first replacement name; if the replacement character mapping set does not contain character mapping information corresponding to the character to be replaced, obtaining the replacement character corresponding to the character to be replaced according to a preset character replacement model, so as to obtain a second replacement name and store the second replacement name.
  • an embodiment of the present application provides a name storage device based on character conversion, which includes: a name verification unit, configured to, if a newly added name input by a user is received, verify the newly added name according to a preset character verification model to obtain a verification result of whether it passes; a judging unit, configured to, if the verification result is a failure, obtain the character that fails verification in the newly added name as the character to be replaced, and determine whether the preset replacement character mapping set contains character mapping information corresponding to the character to be replaced; a first replacement name acquiring unit, configured to, if the replacement character mapping set contains character mapping information corresponding to the character to be replaced, convert the character to be replaced into a replacement character according to the replacement character mapping set to obtain a first replacement name and store the first replacement name; a second replacement name acquiring unit, configured to, if the replacement character mapping set does not contain character mapping information corresponding to the character to be replaced, obtain the replacement character corresponding to the character to be replaced according to
  • an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the name storage method based on character conversion described in the first aspect above.
  • the embodiments of the present application also provide a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the processor executes the name storage method based on character conversion described in the first aspect above.
  • The embodiments of the present application provide a name storage method, device, computer equipment, and storage medium based on character conversion. The newly added name is verified according to the character verification model to obtain a verification result. If the verification fails, it is determined whether the replacement character mapping set contains character mapping information corresponding to the character to be replaced. If it does, the character to be replaced is converted into a replacement character according to the replacement character mapping set to obtain the first replacement name, which is stored; if it does not, the replacement character corresponding to the character to be replaced is obtained according to the preset character replacement model to obtain the second replacement name, which is stored.
  • With this method, replacement characters similar to the rare characters can be obtained and the rare characters contained in the name replaced to obtain the first or second replacement name, which makes it easier to store names containing rare characters and ensures that the stored names have good compatibility.
  • FIG. 1 is a schematic flowchart of a name storage method based on character conversion provided by an embodiment of the application;
  • FIG. 2 is a schematic diagram of a sub-flow of a name storage method based on character conversion provided by an embodiment of the application;
  • FIG. 3 is a schematic diagram of another sub-flow of the name storage method based on character conversion provided by an embodiment of the application;
  • FIG. 4 is a schematic diagram of another sub-flow of the name storage method based on character conversion provided by an embodiment of the application;
  • FIG. 5 is a schematic diagram of another flow chart of a name storage method based on character conversion provided by an embodiment of the application;
  • FIG. 6 is a schematic diagram of another sub-flow of the name storage method based on character conversion provided by an embodiment of the application;
  • FIG. 7 is a schematic diagram of another sub-flow of a name storage method based on character conversion provided by an embodiment of the application.
  • FIG. 8 is a schematic block diagram of a name storage device based on character conversion provided by an embodiment of the application.
  • FIG. 9 is a schematic block diagram of a computer device provided by an embodiment of the application.
  • FIG. 1 is a schematic flowchart of a name storage method based on character conversion provided by an embodiment of the present application.
  • the name storage method based on character conversion is applied to a user terminal, and the method is executed by application software installed in the user terminal.
  • The user terminal is a terminal device, such as a desktop computer, laptop, tablet, or mobile phone, used to perform the name storage method based on character conversion to complete the storage of names.
  • the method includes steps S110 to S140.
  • The character verification model includes code conversion rules and regular expressions. Because Chinese characters differ in form from English letters, digits, and other characters, when a Chinese character is stored in the terminal device it is converted to its corresponding character code, and the character code is stored in binary form. To read the Chinese character back, the stored character code must be obtained and parsed through the correspondence between character codes and Chinese characters.
  • the code conversion rules can convert the characters contained in the newly added name to obtain the character code corresponding to each character.
  • The regular expression can be used to verify the converted character codes. When any character fails the check, the verification result is a failure; when all characters pass the check, the verification result is a pass.
  • step S110 includes sub-steps S111 and S112.
  • The code conversion rules include a rule for converting each Chinese character and symbol, that is, each character corresponds to one character code.
  • The code conversion rules are rules for converting characters using Unicode character set encodings, including the utf-8 encoding method, the utf-16 encoding method, and other conversion rules.
  • The utf-8 encoding method corresponds to the character codes of commonly used Chinese characters.
  • UTF-8 makes it convenient for different computers to transmit text in different languages and encodings over a network.
  • The utf-16 encoding method corresponds to the character codes of other, less commonly used Chinese characters not covered by utf-8.
  • the encoding conversion rules also include the character encoding corresponding to the symbol, and the character encoding is represented by a hexadecimal number.
  • For example, the character "勇" is converted to the character code "\u52c7" through the code conversion rule.
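The conversion from characters to hexadecimal character codes can be illustrated with a minimal Python sketch. The function name `to_char_codes` is hypothetical, and the sketch assumes that a "character code" is simply the Unicode code point written as `\u` followed by lowercase hexadecimal digits, as in the example above; the text does not give the full conversion rules.

```python
def to_char_codes(name):
    """Return one "\\uXXXX"-style character code per character of the name."""
    # ord() gives the Unicode code point; it is written as lowercase hex,
    # zero-padded to at least four digits. Characters outside the Basic
    # Multilingual Plane simply get more digits in this sketch.
    return [f"\\u{ord(ch):04x}" for ch in name]


# The character quoted in the text, written as an escape for portability.
print(to_char_codes("\u52c7"))
```

Each character is converted independently, so the codes of a whole name are just the per-character codes in order.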
  • S112 Check the character code corresponding to the newly added name according to the regular expression to obtain a check result of whether the newly added name passes.
  • Regular expressions can be used to verify the obtained character codes.
  • the obtained character codes can be used to check whether the newly-added name conforms to the conventional Chinese name encoding.
  • Specifically, it is checked whether each character code corresponding to the newly added name belongs to the set "\u3400-\u4dbf + \u00B7", where "\u3400-\u4dbf" is the code range corresponding to the utf-8 encoding method and "\u00B7" is the character code corresponding to the symbol "·" (some Chinese names include the symbol "·", for example: Maimat·Aili). If every character code corresponding to the newly added name belongs to this set, the newly added name passes verification; if the newly added name contains a character whose character code does not belong to this set, the newly added name fails verification. If the newly added name passes verification, it conforms to the encoding of conventional Chinese names and can be stored directly.
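A minimal sketch of this check in Python follows. The names `ALLOWED` and `verify_name` are hypothetical. The character class copies the range and the separator exactly as quoted in the text, and the sketch tests the characters directly rather than their escaped codes, which is an assumption made for brevity.

```python
import re

# Range and separator as quoted in the text; pass a different pattern to
# verify against another set of accepted character codes.
ALLOWED = re.compile(r"[\u3400-\u4dbf\u00b7]")


def verify_name(name, allowed=ALLOWED):
    """Return (passed, failed_chars) for a newly added name."""
    # A character fails when it does not belong to the accepted set.
    failed = [ch for ch in name if not allowed.fullmatch(ch)]
    return (not failed, failed)
```

The list of failed characters is returned alongside the pass/fail flag because the later steps need to know which characters are to be replaced.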
  • the replacement character mapping set contains multiple character mapping information.
  • Because the replacement character mapping set contains only some rare characters, when the newly added name fails verification, the character that failed the check is taken as the character to be replaced, and it is determined whether the replacement character mapping set contains character mapping information corresponding to the character to be replaced.
  • If the replacement character mapping set contains character mapping information corresponding to the character to be replaced, the character to be replaced is converted into a replacement character according to the replacement character mapping set to obtain a first replacement name, and the first replacement name is stored.
  • In this case, to prevent other terminal devices from reporting errors due to processing exceptions when they process the rare characters in the newly added name, the rare characters contained in the newly added name, that is, the characters to be replaced, are replaced so that the newly added name is converted into a name that conforms to the encoding of conventional Chinese names.
  • step S130 includes sub-steps S131, S132, and S133.
  • the character mapping information contained in the replacement character mapping set is retrieved according to the character code corresponding to the character to be replaced to obtain target character mapping information.
  • The replacement character mapping set contains multiple pieces of character mapping information. Each piece of character mapping information is a mapping between the encoding information of a rare character and the encoding information of a mapped character. The rare characters contained in the newly added name are the characters to be replaced, and the character mapping information is searched using the encoding information of each rare character to obtain the target character mapping information corresponding to that rare character.
  • The rare character and its mapped character are highly similar, so replacing the rare character with the mapped character has little impact on the newly added name, and the result can be processed compatibly by all terminal devices.
  • S132 Map the character to be replaced according to the target character mapping information to obtain a corresponding mapped character.
  • The character code that fails verification corresponds to a rare character in the newly added name, and the mapped character corresponding to that rare character can be obtained from the target character mapping information.
  • The rare character in the newly added name is then replaced with its mapped character to obtain the first replacement name.
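The lookup and replacement described in sub-steps S131 to S133 can be sketched as below. This is an illustration under stated assumptions, not the patent's implementation: `apply_mapping` is a hypothetical name, the mapping set is modelled as a plain dictionary from character code to character code, and the sample entry in the usage lines is invented for the example rather than taken from the patent's mapping table.

```python
def apply_mapping(name, failed_chars, mapping_set):
    """Replace the characters to be replaced that appear in the mapping set.

    Returns (new_name, unmapped), where unmapped lists the characters to be
    replaced that have no character mapping information.
    """
    def code(ch):
        # Same assumed code format as the text: \u + lowercase hex.
        return f"\\u{ord(ch):04x}"

    unmapped = [ch for ch in failed_chars if code(ch) not in mapping_set]
    out = []
    for ch in name:
        target = mapping_set.get(code(ch)) if ch in failed_chars else None
        if target is None:
            out.append(ch)
        else:
            # Decode the mapped character code back into a character.
            out.append(chr(int(target[2:], 16)))
    return "".join(out), unmapped


# Hypothetical mapping entry: a placeholder rare character mapped to \u7199.
mapping = {"\\u20000": "\\u7199"}
first_name, unmapped = apply_mapping("\u738b\U00020000", ["\U00020000"], mapping)
```

Returning the unmapped characters separately lets the caller decide whether the mapping-set path or the character replacement model path applies.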
  • The first replacement name is converted according to the code conversion rules to obtain the corresponding character codes, and all of the obtained character codes are stored in binary form, that is, the first replacement name is stored.
  • For example, if the obtained first replacement name is "Wang Xi" (王熙), the character code obtained after converting the first replacement name with the code conversion rules is "\u738b\u7199", and this hexadecimal character code is stored in binary form.
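As a small illustration of storing the character codes as bytes and reading them back, the sketch below uses Python's built-in `unicode_escape` codec. The text does not specify the binary layout, so the choice of codec is an assumption made only for this example.

```python
# The two characters whose codes are quoted in the text, written as escapes.
name = "\u738b\u7199"

# unicode_escape writes each of these characters as a \uXXXX sequence,
# giving a bytes object that can be written to storage.
codes = name.encode("unicode_escape")
print(codes)

# Reading: the stored bytes are parsed back into the original characters.
restored = codes.decode("unicode_escape")
```

The round trip mirrors the earlier description: the character code is stored, and the character is recovered by parsing that code.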
  • If the replacement character mapping set does not contain character mapping information corresponding to the character to be replaced, the replacement character corresponding to the character to be replaced is obtained according to the preset character replacement model to obtain a second replacement name, and the second replacement name is stored.
  • The character replacement model includes a character picture generation model and a character picture analysis model. The character picture generation model is a model for generating the character picture corresponding to a rare character, and the character picture analysis model is a model for analyzing the generated character picture to obtain the corresponding replacement character.
  • If the replacement character mapping set does not contain the character mapping information corresponding to the character to be replaced, the newly added name contains rare characters that are not included in the replacement character mapping set. In this case, the replacement character corresponding to the character to be replaced can be obtained through the character replacement model.
  • Because the newly added name and the character codes are both text information, the character picture corresponding to the rare character needs to be generated first.
  • step S140 includes sub-steps S141, S142, and S143.
  • S141 Generate a character picture corresponding to the character to be replaced according to the character picture generation model.
  • A character picture corresponding to the character to be replaced is generated. Specifically, a div element with black characters on a white background is first created through the character picture generation model.
  • The div element provides structure and background for block-level content in an HTML (an application of the Standard Generalized Markup Language) document.
  • The text information corresponding to the character to be replaced is added to the div element, and a character picture corresponding to the character to be replaced can then be generated through the Canvas plug-in.
  • The Canvas plug-in is a plug-in used to generate images in real time from the web page content in an HTML document.
  • the character picture is parsed according to the character picture analysis model to obtain a replacement character corresponding to the character to be replaced.
  • The character picture analysis model includes a feature vector extraction formula and a matching degree threshold.
  • The character picture analysis model also includes the feature vector corresponding to each standard character in the encoding range "\u3400-\u4dbf", that is, the picture feature vector corresponding to the character picture of each standard character.
  • step S142 includes sub-steps S1421, S1422, S1423, S1424, and S1425.
  • S1421: calculate the character picture of the character to be replaced according to the feature vector extraction formula to obtain the feature vector of the character to be replaced; S1422: calculate the matching degree between the feature vector of the character to be replaced and the feature vector corresponding to each standard character; S1423.
  • The resolution of the character picture corresponding to the character to be replaced is 100×100.
  • A convolution operation is performed with a 20×20 window and a step size of 1, and down-sampling is then performed with a step size of 9 to obtain a vector matrix of size 9×9, which is the deep-level feature of the picture.
  • A convolution operation is then performed with a 3×3 window and a step size of 2 to obtain five vector matrices of size 4×4.
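The matrix sizes quoted above can be checked with the usual sliding-window size formula. The sketch below assumes no padding and assumes the down-sampling window equals its step size of 9; neither detail is stated in the text, and `out_size` is a hypothetical helper name.

```python
def out_size(n, window, step):
    """Output side length of a sliding window over n pixels with no padding."""
    return (n - window) // step + 1


conv1 = out_size(100, 20, 1)    # 20x20 window, step size 1
pooled = out_size(conv1, 9, 9)  # down-sampling, step size 9 (window of 9 assumed)
conv2 = out_size(pooled, 3, 2)  # 3x3 window, step size 2
print(conv1, pooled, conv2)     # prints: 81 9 4
```

Under these assumptions the intermediate 81×81 result is what makes a step size of 9 produce the 9×9 matrix, and the final 4×4 size matches the text.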
  • The five 4×4 vector matrices obtained are calculated through the first fully connected calculation formula.
  • The first fully connected calculation formula contains five nodes, each associated with one 4×4 vector matrix; that is, the values of the five nodes associated with the five 4×4 vector matrices are calculated through five calculation formulas.
  • Using the preset parameter values in the five calculation formulas, the values of the five nodes associated with the corresponding vector matrices can be calculated; the second fully connected calculation formula is then used to calculate on the values of the five nodes to obtain the final character picture
  • The feature vector of the standard character is also a 1×16-dimensional vector matrix calculated by the above method.
  • the matching degree between the feature vector of the character to be replaced and the corresponding feature vector of each standard character can be calculated by the calculation formula.
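The matching and selection logic can be sketched as follows. The matching degree formula itself is not reproduced in the text, so cosine similarity is used here purely as an assumed stand-in. The names `cosine` and `pick_replacement` are hypothetical, and an empty string stands for the null character returned when no standard character exceeds the matching degree threshold.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_replacement(vec, standard_vectors, threshold):
    """Return the best-matching standard character, or "" if none qualifies."""
    best_char, best_score = "", threshold
    for ch, std_vec in standard_vectors.items():
        score = cosine(vec, std_vec)
        # Only scores strictly above the threshold (and above the best so
        # far) can become the replacement character.
        if score > best_score:
            best_char, best_score = ch, score
    return best_char
```

In the described method the vectors would be the 1×16 feature vectors; the function works for any matching lengths, which keeps small test inputs easy to write.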
  • If none of the replacement characters is a null character, the characters to be replaced in the newly added name are replaced with the replacement characters to obtain a second replacement name.
  • If none of the replacement characters is a null character, each character to be replaced corresponds to a non-null replacement character.
  • The corresponding characters to be replaced in the newly added name are replaced with the replacement characters to obtain the second replacement name; the second replacement name is converted according to the code conversion rules to obtain the corresponding character codes, and the obtained character codes are stored in binary form. If the replacement characters include a null character, the replacement character corresponding to one or more characters to be replaced is null, and a prompt message reporting a processing exception can be fed back to the user.
  • step S150 is further included after step S140.
  • step S150 includes sub-steps S151, S152, and S153.
  • new character mapping information corresponding to the replacement character is generated.
  • the generated new character mapping information is the same as the character mapping information in Table 1.
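Adding the new character mapping information to the replacement character mapping set might look like the sketch below. It assumes the mapping set is a dictionary from the first encoding information (the code of the character to be replaced) to the second encoding information (the code of the replacement character). The function name `add_mapping` and the sample characters are hypothetical.

```python
def add_mapping(mapping_set, char_to_replace, replacement_char):
    """Record a new rare-character to replacement-character mapping."""
    first_code = f"\\u{ord(char_to_replace):04x}"    # first encoding information
    second_code = f"\\u{ord(replacement_char):04x}"  # second encoding information
    mapping_set[first_code] = second_code
    return mapping_set


# Hypothetical example: a placeholder rare character mapped to \u7199.
updated = add_mapping({}, "\U00020000", "\u7199")
```

Once the entry is stored, later names containing the same rare character can be handled through the mapping set without running the character replacement model again.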
  • In the name storage method based on character conversion provided by the embodiments of the present application, the newly added name is verified according to the character verification model to obtain a verification result. If the verification fails, it is determined whether the replacement character mapping set contains character mapping information corresponding to the character to be replaced. If it does, the character to be replaced is converted into a replacement character according to the replacement character mapping set to obtain the first replacement name, which is stored; if it does not, the replacement character corresponding to the character to be replaced is obtained according to the preset character replacement model to obtain the second replacement name, which is stored.
  • With this method, replacement characters similar to the rare characters can be obtained and the rare characters contained in the name replaced to obtain the first or second replacement name, which makes it easier to store names containing rare characters and ensures that the stored names have good compatibility.
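The overall flow summarized above can be sketched end to end. Everything in this block is illustrative: `store_name` is a hypothetical name, the verification function and the character replacement model are passed in as plain callables, the mapping set is keyed by characters rather than character codes for brevity, and storage is modelled as a list.

```python
def store_name(name, verify, mapping_set, replace_model, storage):
    """Verify a newly added name, replace rare characters, and store it."""
    passed, failed = verify(name)
    if passed:
        storage.append(name)  # conventional name: store directly
        return name

    replaced = []
    for ch in name:
        if ch not in failed:
            replaced.append(ch)
        elif ch in mapping_set:
            # Mapping set path (leads to the first replacement name).
            replaced.append(mapping_set[ch])
        else:
            # Character replacement model path (second replacement name).
            candidate = replace_model(ch)
            if not candidate:
                # Null replacement character: report a processing exception.
                raise ValueError(f"no replacement found for {ch!r}")
            mapping_set[ch] = candidate  # remember it for later names
            replaced.append(candidate)

    new_name = "".join(replaced)
    storage.append(new_name)
    return new_name
```

Passing the verifier and the model in as parameters keeps the sketch independent of how either one is actually built.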
  • the embodiment of the present application also provides a name storage device based on character conversion.
  • the name storage device based on character conversion is used to perform any embodiment of the aforementioned name storage method based on character conversion.
  • FIG. 8 is a schematic block diagram of a name storage device based on character conversion provided by an embodiment of the present application.
  • the name storage device based on character conversion can be configured in a user terminal.
  • the name storage device 100 based on character conversion includes a name verification unit 110, a judgment unit 120, a first replacement name acquisition unit 130, and a second replacement name acquisition unit 140.
  • the name verification unit 110 is configured to, if the newly added name input by the user is received, verify the newly added name according to a preset character verification model to obtain a verification result of whether it passes.
  • the name verification unit 110 includes: a character code acquisition unit and a character code verification unit.
  • the character code acquisition unit is configured to convert each character in the newly added name into a corresponding character code according to the code conversion rule.
  • the character code checking unit is configured to check the character code corresponding to the newly added name according to the regular expression to obtain a check result of whether the newly added name passes.
  • the judging unit 120 is configured to, if the verification result is a failure, obtain the character that fails verification in the newly added name as the character to be replaced, and determine whether the preset replacement character mapping set contains character mapping information corresponding to the character to be replaced.
  • the first replacement name obtaining unit 130 is configured to, if the replacement character mapping set contains character mapping information corresponding to the character to be replaced, convert the character to be replaced into a replacement character according to the replacement character mapping set, so as to obtain the first replacement name and store the first replacement name.
  • the first replacement name obtaining unit 130 includes: a target character mapping information obtaining unit, a mapping character obtaining unit, and a mapping character replacing unit.
  • the target character mapping information acquiring unit is configured to retrieve the character mapping information contained in the replacement character mapping set according to the character code corresponding to the character to be replaced to obtain the target character mapping information.
  • the mapped character acquiring unit is used to map the character to be replaced according to the target character mapping information to obtain the corresponding mapped character.
  • the mapping character replacement unit is used to replace the character corresponding to the mapping character in the newly added name with the mapping character to obtain the first replacement name.
  • the second replacement name acquiring unit 140 is configured to, if the replacement character mapping set does not include character mapping information corresponding to the character to be replaced, obtain the replacement character corresponding to the character to be replaced according to a preset character replacement model, so as to obtain the second replacement name and store the second replacement name.
  • the second replacement name obtaining unit 140 includes: a character picture obtaining unit, a character picture parsing unit, and a character replacing unit.
  • the character picture obtaining unit is configured to generate a character picture corresponding to the character to be replaced according to the character picture generation model.
  • the character picture analysis unit is configured to parse the character picture according to the character picture analysis model to obtain a replacement character corresponding to the character to be replaced.
  • the character replacement unit is configured to, if none of the replacement characters are empty characters, replace the newly added name according to the replacement characters to obtain a second replacement name.
  • the character picture analysis unit includes: a feature vector acquisition unit, a matching degree calculation unit, a standard character quantity judgment unit, a first replacement character acquisition unit, and a second replacement character acquisition unit.
  • the feature vector acquiring unit is configured to calculate the character picture of the character to be replaced according to the feature vector extraction formula to obtain the feature vector of the character to be replaced.
  • the matching degree calculation unit is used to calculate the matching degree between the feature vector of the character to be replaced and the feature vector corresponding to each standard character.
  • the standard character quantity judgment unit is used to judge whether the number of standard characters whose matching degree with the character to be replaced exceeds the matching degree threshold is greater than zero.
  • the first replacement character acquiring unit is configured to, if the number of standard characters whose matching degree with the character to be replaced exceeds the matching degree threshold is greater than zero, take the standard character with the highest matching degree as the replacement character corresponding to the character to be replaced.
  • the second replacement character acquiring unit is configured to, if the number of standard characters whose matching degree with the character to be replaced exceeds the matching degree threshold is not greater than zero, use a null character as the replacement character corresponding to the character to be replaced.
  • the name storage device 100 based on character conversion further includes: a character mapping information adding unit.
  • the character mapping information adding unit is used to generate new character mapping information corresponding to the replacement characters contained in the second replacement name and add it to the replacement character mapping set.
  • the character mapping information adding unit includes a first coding information obtaining unit, a second coding information obtaining unit, and a new character mapping information generating unit.
  • the first encoding information acquiring unit is configured to acquire the character encoding corresponding to the character to be replaced in the newly added name as the first encoding information according to the encoding conversion rule.
  • the second encoding information acquiring unit is configured to acquire the character encoding corresponding to the replacement character as the second encoding information according to the encoding conversion rule.
  • the newly added character mapping information generating unit is configured to generate newly added character mapping information corresponding to the replacement character according to the correspondence between the first encoding information and the second encoding information.
  • the name storage device based on character conversion provided in the embodiment of the present application is used to perform the above-mentioned name storage method based on character conversion. It can obtain replacement characters similar to rare characters and use them to replace the rare characters contained in a name, yielding the first replacement name or the second replacement name, which makes it easier to store names containing rare characters and ensures that the stored names have good compatibility.
  • the above-mentioned name storage device based on character conversion can be implemented in the form of a computer program, and the computer program can be run on a computer device as shown in FIG. 9.
  • FIG. 9 is a schematic block diagram of a computer device according to an embodiment of the present application.
  • the computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
  • the non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032.
  • when executed, the computer program 5032 can cause the processor 502 to execute the name storage method based on character conversion.
  • the processor 502 is used to provide computing and control capabilities, and support the operation of the entire computer device 500.
  • the internal memory 504 provides an environment for running the computer program 5032 stored in the non-volatile storage medium 503; when the computer program 5032 is executed by the processor 502, it can cause the processor 502 to execute the name storage method based on character conversion.
  • the network interface 505 is used for network communication, such as providing data information transmission.
  • FIG. 9 is only a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device 500 to which the solution of the present application is applied.
  • the specific computer device 500 may include more or fewer components than shown in the figure, or combine certain components, or have a different component arrangement.
  • the processor 502 is configured to run a computer program 5032 stored in the memory to implement the name storage method based on character conversion in this embodiment.
  • the embodiment of the computer device shown in FIG. 9 does not constitute a limitation on the specific configuration of the computer device.
  • in other embodiments, the computer device may include more or fewer components than those shown in the figure, combine certain components, or arrange the components differently.
  • the computer device may only include a memory and a processor. In such an embodiment, the structures and functions of the memory and the processor are consistent with the embodiment shown in FIG. 9 and will not be repeated here.
  • the processor 502 may be a central processing unit (CPU), and the processor 502 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
  • another embodiment of the present application provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium.
  • the computer-readable storage medium stores a computer program, where the computer program is executed by a processor to implement the name storage method based on character conversion in the embodiment of the present application.
  • the storage medium may be an internal storage unit of the aforementioned device, such as a hard disk or memory of the device.
  • the storage medium may also be an external storage device of the device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the device.
  • the storage medium may also include both an internal storage unit of the device and an external storage device.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Software Systems (AREA)
  • Document Processing Apparatus (AREA)

Abstract

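The present application discloses a name storage method and apparatus based on character conversion, and a computer device. The method includes: verifying a newly added name according to a character verification model to obtain a verification result; if the verification fails, determining whether a replacement character mapping set contains character mapping information corresponding to the character to be replaced; if it does, converting the character to be replaced into a replacement character according to the replacement character mapping set to obtain and store a first replacement name; if it does not, obtaining a replacement character corresponding to the character to be replaced according to a preset character replacement model to obtain and store a second replacement name.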

Description

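Name storage method and apparatus based on character conversion, and computer device
This application claims priority to the Chinese patent application No. 201910983727.9, entitled "Name storage method and apparatus based on character conversion, and computer device", filed with the Chinese Patent Office on October 16, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a name storage method and apparatus based on character conversion, and a computer device.
Background
When information is recorded, an encoding supported by all terminals must be used so that the recorded information can circulate between terminals and systems. For example, because UTF-8 is highly compatible and widely used, information recorded in UTF-8 can be used on all terminals. However, when an input name containing rare Chinese characters is recorded, some terminals report errors when processing the recorded rare characters because of encoding compatibility problems, so some terminals or systems cannot process rare characters. Because the rare characters that appear in user names written in Chinese are hard to enumerate, a database covering all rare characters cannot be built. Recording rare characters on every terminal with an encoding suited to rare characters would greatly reduce information processing speed and is difficult to implement, and new rare characters keep arriving in practice, which further increases the difficulty of recording and storing names that contain rare Chinese characters. Existing information storage methods therefore cannot store names that contain rare characters.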
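Summary
Embodiments of the present application provide a name storage method and apparatus based on character conversion, a computer device, and a storage medium, which aim to solve the problem that information storage methods in the prior art cannot store names containing rare characters.
In a first aspect, an embodiment of the present application provides a name storage method based on character conversion, including: if a newly added name input by a user is received, verifying the newly added name according to a preset character verification model to obtain a verification result indicating whether it passes; if the verification result is a failure, taking the characters in the newly added name that failed verification as characters to be replaced, and determining whether a preset replacement character mapping set contains character mapping information corresponding to the characters to be replaced; if the replacement character mapping set contains such character mapping information, converting the characters to be replaced into replacement characters according to the replacement character mapping set to obtain a first replacement name, and storing the first replacement name; if the replacement character mapping set does not contain such character mapping information, obtaining replacement characters corresponding to the characters to be replaced according to a preset character replacement model to obtain a second replacement name, and storing the second replacement name.
In a second aspect, an embodiment of the present application provides a name storage apparatus based on character conversion, including: a name verification unit, configured to, if a newly added name input by a user is received, verify the newly added name according to a preset character verification model to obtain a verification result indicating whether it passes; a determining unit, configured to, if the verification result is a failure, take the characters in the newly added name that failed verification as characters to be replaced, and determine whether a preset replacement character mapping set contains character mapping information corresponding to the characters to be replaced; a first replacement name acquiring unit, configured to, if the replacement character mapping set contains such character mapping information, convert the characters to be replaced into replacement characters according to the replacement character mapping set to obtain a first replacement name, and store the first replacement name; and a second replacement name acquiring unit, configured to, if the replacement character mapping set does not contain such character mapping information, obtain replacement characters corresponding to the characters to be replaced according to a preset character replacement model to obtain a second replacement name, and store the second replacement name.
In a third aspect, an embodiment of the present application further provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the name storage method based on character conversion of the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to execute the name storage method based on character conversion of the first aspect.
Embodiments of the present application provide a name storage method and apparatus based on character conversion, a computer device, and a storage medium. A newly added name is verified according to a character verification model to obtain a verification result. If the verification fails, it is determined whether the replacement character mapping set contains character mapping information corresponding to the character to be replaced. If it does, the character to be replaced is converted into a replacement character according to the replacement character mapping set, and the resulting first replacement name is stored. If it does not, a replacement character corresponding to the character to be replaced is obtained according to a preset character replacement model, and the resulting second replacement name is stored. In this way, replacement characters similar to rare characters are obtained and used to replace the rare characters contained in a name, which makes it easier to store names containing rare characters and ensures that the stored names have good compatibility.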
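Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present application more clearly, the drawings needed for the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of the name storage method based on character conversion provided by an embodiment of the present application;
FIG. 2 is a schematic sub-flowchart of the name storage method based on character conversion provided by an embodiment of the present application;
FIG. 3 is another schematic sub-flowchart of the name storage method based on character conversion provided by an embodiment of the present application;
FIG. 4 is another schematic sub-flowchart of the name storage method based on character conversion provided by an embodiment of the present application;
FIG. 5 is another schematic flowchart of the name storage method based on character conversion provided by an embodiment of the present application;
FIG. 6 is another schematic sub-flowchart of the name storage method based on character conversion provided by an embodiment of the present application;
FIG. 7 is another schematic sub-flowchart of the name storage method based on character conversion provided by an embodiment of the present application;
FIG. 8 is a schematic block diagram of the name storage apparatus based on character conversion provided by an embodiment of the present application;
FIG. 9 is a schematic block diagram of the computer device provided by an embodiment of the present application.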
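Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present application.
It should be understood that, when used in this specification and the appended claims, the terms "including" and "comprising" indicate the presence of the described features, wholes, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components and/or collections thereof.
It should also be understood that the terms used in this specification are for the purpose of describing particular embodiments only and are not intended to limit the present application. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" used in this specification and the appended claims refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.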
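Please refer to FIG. 1, which is a schematic flowchart of the name storage method based on character conversion provided by an embodiment of the present application. The method is applied to a user terminal and is executed by application software installed on the user terminal. The user terminal is the terminal device that executes the method to store names, for example a desktop computer, laptop, tablet or mobile phone.
As shown in FIG. 1, the method includes steps S110 to S140.
S110. If a newly added name input by a user is received, verify the newly added name according to a preset character verification model to obtain a verification result indicating whether it passes.
The character verification model includes an encoding conversion rule and a regular expression. Because Chinese characters differ in form from characters such as English letters and digits, a terminal device stores a Chinese character by converting it into its character code and storing that code in binary. To read the character back, the stored character code is fetched and parsed into the Chinese character through the correspondence between character codes and Chinese characters. The encoding conversion rule converts each character in the newly added name into its character code, and the regular expression verifies the resulting codes. If any character fails verification, the verification result is a failure; if all characters pass, the verification result is a pass.
In an embodiment, as shown in FIG. 2, step S110 includes sub-steps S111 and S112.
S111. Convert each character in the newly added name into a corresponding character code according to the encoding conversion rule.
Specifically, the encoding conversion rule contains rules for converting every Chinese character and symbol, that is, each character corresponds to one character code. The encoding conversion rule converts characters using the Unicode character set and covers several conversion rules such as UTF-8 and UTF-16. UTF-8 corresponds to the character codes of commonly used Chinese characters and makes it easy to transmit text in different languages and encodings between computers over a network, while UTF-16 corresponds to the character codes of the other, uncommon Chinese characters outside UTF-8. The rule also includes the character codes of symbols. Character codes are expressed in hexadecimal.
For example, if the newly added name contains the character "勇", converting it with the encoding conversion rule yields the character code "\u52c7".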
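The conversion in S111 can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `to_char_codes` is hypothetical, and the sketch simply formats each character's Unicode code point in hexadecimal, which reproduces the "\u52c7" notation used in the text.

```python
from typing import List


def to_char_codes(name: str) -> List[str]:
    """Return one hexadecimal '\\uXXXX'-style code per character of the name.

    Hypothetical helper: it formats the Unicode code point of each character,
    which is an assumed reading of the "encoding conversion rule".
    Characters outside the Basic Multilingual Plane yield five hex digits.
    """
    return [f"\\u{ord(ch):04x}" for ch in name]


if __name__ == "__main__":
    print(to_char_codes("勇"))
    print(to_char_codes("王熙"))
```

The list form keeps a one-to-one link between each character and its code, which the later verification and replacement steps rely on.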
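S112. Verify the character codes corresponding to the newly added name according to the regular expression to obtain the verification result indicating whether the newly added name passes.
The regular expression is used to verify the obtained character codes, that is, to check whether the newly added name conforms to the encoding of conventional Chinese names.
Specifically, each character code of the newly added name can be checked for membership in the set "\u3400-\u4dbf + \u00B7", where "\u3400-\u4dbf" is the code range corresponding to the UTF-8 encoding and "\u00B7" is the character code of the symbol "·" (some Chinese names contain this symbol, for example 买买提·艾力). If every character code of the newly added name belongs to this set, the name passes verification; if the code of any character does not belong to this set, the name fails verification. A pass indicates that the newly added name conforms to the encoding of conventional Chinese names and can be stored directly.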
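A minimal sketch of the S112 check follows, assuming Python's `re` module as the regular-expression engine. The names `TEXT_RANGE`, `CJK_BASIC_RANGE`, `find_failed_chars` and `passes_check` are hypothetical. The first pattern copies the range quoted in the text; note that the sample characters used in the text, such as 勇 (U+52C7), lie in the CJK Unified Ideographs block U+4E00 to U+9FFF, so the sketch keeps the pattern configurable rather than fixing one range.

```python
import re
from typing import List, Pattern

# Range exactly as quoted in the text, plus the middle dot U+00B7.
TEXT_RANGE = re.compile(r"[\u3400-\u4dbf\u00b7]")
# Assumption for illustration: the basic CJK Unified Ideographs block.
CJK_BASIC_RANGE = re.compile(r"[\u4e00-\u9fff\u00b7]")


def find_failed_chars(name: str, allowed: Pattern[str]) -> List[str]:
    """Return the characters of the name that fall outside the allowed set."""
    return [ch for ch in name if not allowed.fullmatch(ch)]


def passes_check(name: str, allowed: Pattern[str]) -> bool:
    """A name passes only if every one of its characters is allowed."""
    return not find_failed_chars(name, allowed)


if __name__ == "__main__":
    print(passes_check("买买提·艾力", CJK_BASIC_RANGE))
    print(find_failed_chars("\U00020bb7田", CJK_BASIC_RANGE))
```

Returning the list of failed characters, not just a boolean, gives step S120 its "characters to be replaced" directly.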
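S120. If the verification result is a failure, take the characters in the newly added name that failed verification as characters to be replaced, and determine whether the preset replacement character mapping set contains character mapping information corresponding to the characters to be replaced.
If the newly added name fails verification, that is, at least one of its character codes fails, the name does not conform to the encoding of conventional Chinese names: it contains rare characters that are not compatible with all terminal devices, namely the characters whose codes fall outside the above set. Specifically, the replacement character mapping set contains multiple pieces of character mapping information. Because the rare characters encountered in practice are hard to enumerate, the set covers only some of them. The characters that failed verification are taken as the characters to be replaced, and it is determined whether the set contains character mapping information corresponding to them.
S130. If the replacement character mapping set contains character mapping information corresponding to the characters to be replaced, convert the characters to be replaced into replacement characters according to the replacement character mapping set to obtain a first replacement name, and store the first replacement name.
If the replacement character mapping set contains character mapping information corresponding to the characters to be replaced, the rare characters in the newly added name can be replaced through the replacement character mapping set, so that other terminal devices do not report errors when processing them. In other words, the characters to be replaced are replaced so that the newly added name is converted into a name that conforms to the encoding of conventional Chinese names.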
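In an embodiment, as shown in FIG. 3, step S130 includes sub-steps S131, S132 and S133.
S131. Search the character mapping information contained in the replacement character mapping set according to the character code of the character to be replaced to obtain target character mapping information.
The replacement character mapping set contains multiple pieces of character mapping information. Each piece is a mapping between the encoding information of one rare character and the encoding information of one mapped character. The rare characters contained in the newly added name are the characters to be replaced, and searching the character mapping information by a rare character's encoding information yields the target character mapping information for that character. The rare character and its mapped character are highly similar, so replacing the rare character with the mapped character has little effect on the newly added name, and the result can be processed by all terminal devices.
For example, part of the character mapping information in the replacement character mapping set is shown in Table 1.
Figure PCTCN2019118235-appb-000001
Table 1
S132. Map the character to be replaced according to the target character mapping information to obtain the corresponding mapped character.
A character code that failed verification corresponds to a rare character in the newly added name, and the mapped character for that rare character is obtained from the target character mapping information.
S133. Replace the character in the newly added name that corresponds to the mapped character with the mapped character to obtain the first replacement name.
Replacing each rare character in the newly added name with its mapped character yields the first replacement name. The first replacement name is converted into character codes according to the encoding conversion rule, and the codes are stored in binary, which stores the first replacement name.
For example, if the first replacement name is "王熙", converting it with the encoding conversion rule yields the character codes "\u738b\u7199", and these hexadecimal character codes are stored in binary.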
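Sub-steps S131 to S133 amount to a dictionary lookup followed by a substitution. The sketch below is illustrative only: Table 1 is not reproduced in this text, so the single mapping entry (U+20BB7 to 吉, U+5409) is a hypothetical example, as are the names `EXAMPLE_MAPPING`, `first_replacement_name` and `to_binary`. Storing "in binary" is modelled here as UTF-8 bytes, which is an assumption.

```python
from typing import Dict, Iterable, Optional

# Hypothetical mapping set: rare code point -> mapped code point.
EXAMPLE_MAPPING: Dict[int, int] = {0x20BB7: 0x5409}  # illustrative entry only


def first_replacement_name(name: str, failed: Iterable[str],
                           mapping: Dict[int, int]) -> Optional[str]:
    """Replace every failed character via the mapping set (S131 to S133).

    Returns None when some failed character has no mapping entry, in which
    case the caller would fall back to the character replacement model.
    """
    failed_set = set(failed)
    out = []
    for ch in name:
        if ch in failed_set:
            target = mapping.get(ord(ch))      # S131: look up by character code
            if target is None:
                return None
            out.append(chr(target))            # S132/S133: map and substitute
        else:
            out.append(ch)
    return "".join(out)


def to_binary(name: str) -> bytes:
    """Assumed storage form: the UTF-8 bytes of the replacement name."""
    return name.encode("utf-8")


if __name__ == "__main__":
    print(first_replacement_name("\U00020bb7田", ["\U00020bb7"], EXAMPLE_MAPPING))
    print(to_binary("王熙").hex())
```

Keying the mapping by code point mirrors the text's description of mapping information as a relation between two pieces of encoding information.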
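S140. If the replacement character mapping set does not contain character mapping information corresponding to the characters to be replaced, obtain replacement characters corresponding to the characters to be replaced according to a preset character replacement model to obtain a second replacement name, and store the second replacement name.
The character replacement model includes a character image generation model and a character image parsing model. If the replacement character mapping set does not contain character mapping information corresponding to a character to be replaced, the newly added name contains a rare character that is not in the replacement character mapping set; the replacement character for that character can then be obtained through the character replacement model to produce the second replacement name for the newly added name. The newly added name and its character codes are text information. To recognize and replace a rare character that is not in the replacement character mapping set, a character image of the rare character must first be generated. The character image generation model is the model that generates the character image of a rare character, and the character image parsing model is the model that parses the generated character image to obtain the corresponding replacement character.
In an embodiment, as shown in FIG. 4, step S140 includes sub-steps S141, S142 and S143.
S141. Generate a character image corresponding to the character to be replaced according to the character image generation model.
Specifically, the character image generation model first creates a div element with black text on a white background. A div element is an element that provides structure and background for block-level content in an HTML document (HTML being an application of the Standard Generalized Markup Language). The text information corresponding to the character to be replaced (for example
Figure PCTCN2019118235-appb-000002
) is added to the div element, and the Canvas plug-in then generates the character image corresponding to the character to be replaced. The Canvas plug-in is a plug-in used in an HTML document to generate images in real time from web page content.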
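S142. Parse the character image according to the character image parsing model to obtain the replacement character corresponding to the character to be replaced.
Specifically, the character image parsing model includes a feature vector extraction formula and a matching degree threshold. It also includes the feature vector of each standard character in the code range "\u3400-\u4dbf", that is, the image feature vector of each standard character's character image.
In an embodiment, as shown in FIG. 7, step S142 includes sub-steps S1421, S1422, S1423, S1424 and S1425.
S1421. Calculate the character image of the character to be replaced according to the feature vector extraction formula to obtain the feature vector of the character to be replaced;
S1422. Calculate the matching degree between the feature vector of the character to be replaced and the feature vector of each standard character;
S1423. Determine whether the number of standard characters whose matching degree with the character to be replaced is greater than the matching degree threshold is greater than zero;
S1424. If that number is greater than zero, take the standard character with the highest matching degree as the replacement character corresponding to the character to be replaced;
S1425. If that number is not greater than zero, use a null character as the replacement character corresponding to the character to be replaced.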
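The selection logic of S1423 to S1425 can be sketched as follows, assuming the matching degrees have already been computed. The function name `pick_replacement` and the sample scores are hypothetical, and the null character is modelled as an empty string.

```python
from typing import Dict


def pick_replacement(scores: Dict[str, float], threshold: float) -> str:
    """Choose the replacement character from precomputed matching degrees.

    scores maps each standard character to its matching degree with the
    character to be replaced. Returns "" (the null character in this sketch)
    when no standard character exceeds the threshold.
    """
    above = {ch: s for ch, s in scores.items() if s > threshold}  # S1423
    if not above:
        return ""                                                 # S1425
    return max(above, key=above.get)                              # S1424


if __name__ == "__main__":
    sample = {"吉": 0.93, "古": 0.81}  # illustrative values, not from the source
    print(pick_replacement(sample, 0.9))
    print(repr(pick_replacement(sample, 0.95)))
```

Returning an empty string rather than raising keeps the "null character" outcome visible to step S143, which decides whether to report an error to the user.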
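For example, suppose the character image of the character to be replaced has a resolution of 100×100. Using the formula of the first convolution kernel in the feature vector extraction formula, a convolution with a 20×20 window and a stride of 1 yields an 81×81 matrix, which represents the shallow features of the image. Using the pooling formula, down-sampling with a 9×9 window and a stride of 9 yields a 9×9 matrix, which represents the deep features of the image. Using the formulas of the five second convolution kernels, a convolution with a 3×3 window and a stride of 2 yields five 4×4 matrices. The first fully connected formula is then applied to the five 4×4 matrices. It contains five nodes, each associated with one 4×4 matrix, so five formulas give the values of the five nodes. The first formula can be written as $C_1 = w_1 \times X_1 + b_1$, where $C_1$ is the computed value of the first node, $X_1$ is the values in the first matrix of the image, and $w_1$ and $b_1$ are the preset parameters of the first formula, which links the first node to the first matrix. The second fully connected formula combines the five node values into the final feature vector of the character image: $F_1 = a_1 \times C_1 + a_2 \times C_2 + a_3 \times C_3 + a_4 \times C_4 + a_5 \times C_5$, where $C_1, C_2, C_3, C_4, C_5$ are the values of the five nodes associated with the matrices of the image, and $a_1, a_2, a_3, a_4, a_5$ are the preset parameters from the five nodes to the final output node. Because a 4×4 matrix contains 16 values, the resulting feature vector of the character image is a 1×16 vector, which can be written as $F_x = (f_1, f_2, \dots, f_{16})$. The feature vector of each standard character is a 1×16 vector computed in the same way, and the matching degree between the feature vector of the character to be replaced and the feature vector of each standard character is then calculated. Specifically, the matching degree can be computed as $P = 1 - \frac{(f_1 - g_1)^2 + (f_2 - g_2)^2 + \dots + (f_{16} - g_{16})^2}{g_1^2 + g_2^2 + \dots + g_{16}^2}$, where the feature vector of the character image is $F_x = (f_1, f_2, \dots, f_{16})$ and the feature vector of the standard character is $G = (g_1, g_2, \dots, g_{16})$.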
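The arithmetic in this example can be checked with a short sketch. It assumes the usual no-padding output-size rule, floor((n - k) / s) + 1, which matches the 81, 9 and 4 quoted in the text, and it implements the matching degree P exactly as the formula is written. The function names are hypothetical, and the learned parameters (kernels, w, b, a) are not modelled.

```python
from typing import Sequence


def window_output_size(size: int, window: int, stride: int) -> int:
    """Output side length of a sliding window without padding (assumed rule)."""
    return (size - window) // stride + 1


def matching_degree(f: Sequence[float], g: Sequence[float]) -> float:
    """P = 1 - sum((f_i - g_i)^2) / sum(g_i^2), as written in the text.

    f is the feature vector of the character image and g that of a standard
    character; g must not be the all-zero vector.
    """
    if len(f) != len(g):
        raise ValueError("feature vectors must have the same length")
    numerator = sum((fi - gi) ** 2 for fi, gi in zip(f, g))
    denominator = sum(gi ** 2 for gi in g)
    return 1 - numerator / denominator


if __name__ == "__main__":
    side = window_output_size(100, 20, 1)   # first convolution
    side = window_output_size(side, 9, 9)   # pooling
    side = window_output_size(side, 3, 2)   # second convolution
    print(side, side * side)                # side length and vector dimension
```

Identical vectors give P = 1, and P decreases as the squared distance grows relative to the norm of the standard character's vector, so a higher P means a closer match.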
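S143. If none of the replacement characters is a null character, replace the characters in the newly added name according to the replacement characters to obtain the second replacement name.
If none of the replacement characters is a null character, every character to be replaced has a non-null replacement character. Replacing each character to be replaced in the newly added name with its replacement character yields the second replacement name. The second replacement name is converted into character codes according to the encoding conversion rule, and the codes are stored in binary. If the replacement characters include a null character, one or more characters to be replaced have no replacement character, and a prompt indicating a processing exception can be fed back to the user.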
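A minimal sketch of S143 follows. The names `ReplacementError` and `second_replacement_name` are hypothetical, the null character is modelled as an empty string, and raising an exception stands in for the "processing exception" prompt described in the text.

```python
from typing import Dict


class ReplacementError(Exception):
    """Raised when some character to be replaced has no usable replacement."""


def second_replacement_name(name: str, replacements: Dict[str, str]) -> str:
    """Build the second replacement name from per-character replacements.

    replacements maps each character to be replaced to the character chosen
    by the parsing model, or to "" when no standard character matched.
    """
    missing = [ch for ch, rep in replacements.items() if rep == ""]
    if missing:
        raise ReplacementError(f"no replacement found for {len(missing)} character(s)")
    return "".join(replacements.get(ch, ch) for ch in name)


if __name__ == "__main__":
    print(second_replacement_name("\U00020bb7田", {"\U00020bb7": "吉"}))
```

Checking for null replacements before building the name means a partially replaced name is never produced or stored.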
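In an embodiment, as shown in FIG. 5, step S150 is further included after step S140.
S150. Generate newly added character mapping information corresponding to the replacement characters contained in the second replacement name and add it to the replacement character mapping set.
Because the newly added name contains a rare character that is not in the replacement character mapping set, the same rare character would again have no target character mapping information the next time it is encountered. To avoid this, newly added character mapping information is derived from the obtained replacement character and added to the replacement character mapping set. When the rare character corresponding to that replacement character is encountered again, its mapped character can be obtained from the updated replacement character mapping set.
In an embodiment, as shown in FIG. 6, step S150 includes sub-steps S151, S152 and S153.
S151. Obtain, according to the encoding conversion rule, the character code of the character to be replaced in the newly added name as first encoding information.
That is, the character code of the rare character in the newly added name that corresponds to the replacement character is obtained as the first encoding information. The first encoding information is obtained in the same way as the character codes described above, which is not repeated here.
S152. Obtain, according to the encoding conversion rule, the character code of the replacement character as second encoding information.
That is, the character code of the replacement character corresponding to the rare character in the newly added name is obtained as the second encoding information. The second encoding information is obtained in the same way as the character codes described above, which is not repeated here.
S153. Generate newly added character mapping information corresponding to the replacement character according to the correspondence between the first encoding information and the second encoding information.
Specifically, the generated newly added character mapping information has the same form as the character mapping information in Table 1.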
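Sub-steps S151 to S153 can be sketched as below. The layout of Table 1 is not reproduced in this text, so storing each entry as a pair of hexadecimal code strings is an assumption, and the names `char_code` and `add_mapping_info` are hypothetical.

```python
from typing import Dict, Tuple


def char_code(ch: str) -> str:
    """Hexadecimal code of a single character, in the '\\uXXXX' style of the text."""
    return f"\\u{ord(ch):04x}"


def add_mapping_info(mapping: Dict[str, str], rare_char: str,
                     replacement: str) -> Tuple[str, str]:
    """Create one new mapping entry and add it to the mapping set (S151 to S153)."""
    first_code = char_code(rare_char)       # S151: code of the character to be replaced
    second_code = char_code(replacement)    # S152: code of the replacement character
    mapping[first_code] = second_code       # S153: record the correspondence
    return first_code, second_code


if __name__ == "__main__":
    mapping_set: Dict[str, str] = {}
    print(add_mapping_info(mapping_set, "\U00020bb7", "吉"))
    print(mapping_set)
```

After this update, the same rare character is resolved by the cheaper lookup of step S130 instead of the image-based model.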
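In the name storage method based on character conversion provided by the embodiments of the present application, a newly added name is verified according to a character verification model to obtain a verification result. If the verification fails, it is determined whether the replacement character mapping set contains character mapping information corresponding to the character to be replaced. If it does, the character to be replaced is converted into a replacement character according to the replacement character mapping set, and the resulting first replacement name is stored. If it does not, a replacement character corresponding to the character to be replaced is obtained according to a preset character replacement model, and the resulting second replacement name is stored. In this way, replacement characters similar to rare characters are obtained and used to replace the rare characters contained in a name, which makes it easier to store names containing rare characters and ensures that the stored names have good compatibility.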
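The overall flow S110 to S150 can be summarised in one hedged sketch. Several things here are assumptions rather than the source's implementation: the allowed range is the basic CJK block (configurable in practice), the mapping set is a plain dictionary keyed by character, the image-based character replacement model is passed in as a callable, and characters are handled one at a time. The names `ALLOWED`, `store_name` and `replace_model` are hypothetical.

```python
import re
from typing import Callable, Dict

# Assumed "commonly supported" range plus the middle dot; adjust as needed.
ALLOWED = re.compile(r"[\u4e00-\u9fff\u00b7]")


def store_name(name: str, mapping: Dict[str, str],
               replace_model: Callable[[str], str]) -> str:
    """Return the name to store, replacing rare characters where required."""
    failed = [ch for ch in name if not ALLOWED.fullmatch(ch)]   # S110
    if not failed:
        return name                                             # passes: store as is
    result = name
    for ch in failed:                                           # S120
        replacement = mapping.get(ch)                           # S130: mapping set hit
        if replacement is None:
            replacement = replace_model(ch)                     # S140: model fallback
            if replacement == "":
                raise ValueError("no replacement character found")
            mapping[ch] = replacement                           # S150: remember it
        result = result.replace(ch, replacement)
    return result


if __name__ == "__main__":
    known: Dict[str, str] = {}
    # Stand-in for the image-based model; always answers 吉 in this demo.
    print(store_name("\U00020bb7田", known, lambda ch: "吉"))
    print(known)
```

Passing the model as a callable keeps the sketch independent of how character images are rendered and compared.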
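An embodiment of the present application further provides a name storage apparatus based on character conversion, which is used to execute any embodiment of the foregoing name storage method based on character conversion. Specifically, please refer to FIG. 8, which is a schematic block diagram of the name storage apparatus based on character conversion provided by an embodiment of the present application. The apparatus can be configured in a user terminal.
As shown in FIG. 8, the name storage apparatus 100 based on character conversion includes a name verification unit 110, a determining unit 120, a first replacement name acquiring unit 130 and a second replacement name acquiring unit 140.
The name verification unit 110 is configured to, if a newly added name input by a user is received, verify the newly added name according to a preset character verification model to obtain a verification result indicating whether it passes.
In an embodiment, the name verification unit 110 includes a character code acquiring unit and a character code verification unit.
The character code acquiring unit is configured to convert each character in the newly added name into a corresponding character code according to the encoding conversion rule. The character code verification unit is configured to verify the character codes corresponding to the newly added name according to the regular expression to obtain the verification result indicating whether the newly added name passes.
The determining unit 120 is configured to, if the verification result is a failure, take the characters in the newly added name that failed verification as characters to be replaced, and determine whether the preset replacement character mapping set contains character mapping information corresponding to the characters to be replaced.
The first replacement name acquiring unit 130 is configured to, if the replacement character mapping set contains character mapping information corresponding to the characters to be replaced, convert the characters to be replaced into replacement characters according to the replacement character mapping set to obtain a first replacement name, and store the first replacement name.
In an embodiment, the first replacement name acquiring unit 130 includes a target character mapping information acquiring unit, a mapped character acquiring unit and a mapped character replacing unit.
The target character mapping information acquiring unit is configured to search the character mapping information contained in the replacement character mapping set according to the character code of the character to be replaced to obtain target character mapping information. The mapped character acquiring unit is configured to map the character to be replaced according to the target character mapping information to obtain the corresponding mapped character. The mapped character replacing unit is configured to replace the character in the newly added name that corresponds to the mapped character with the mapped character to obtain the first replacement name.
The second replacement name acquiring unit 140 is configured to, if the replacement character mapping set does not contain character mapping information corresponding to the characters to be replaced, obtain replacement characters corresponding to the characters to be replaced according to a preset character replacement model to obtain a second replacement name, and store the second replacement name.
In an embodiment, the second replacement name acquiring unit 140 includes a character image acquiring unit, a character image parsing unit and a character replacing unit.
The character image acquiring unit is configured to generate a character image corresponding to the character to be replaced according to the character image generation model. The character image parsing unit is configured to parse the character image according to the character image parsing model to obtain the replacement character corresponding to the character to be replaced. The character replacing unit is configured to, if none of the replacement characters is a null character, replace the characters in the newly added name according to the replacement characters to obtain the second replacement name.
In an embodiment, the character image parsing unit includes a feature vector acquiring unit, a matching degree calculating unit, a standard character number determining unit, a first replacement character acquiring unit and a second replacement character acquiring unit.
The feature vector acquiring unit is configured to calculate the character image of the character to be replaced according to the feature vector extraction formula to obtain the feature vector of the character to be replaced. The matching degree calculating unit is configured to calculate the matching degree between the feature vector of the character to be replaced and the feature vector of each standard character. The standard character number determining unit is configured to determine whether the number of standard characters whose matching degree with the character to be replaced is greater than the matching degree threshold is greater than zero. The first replacement character acquiring unit is configured to, if that number is greater than zero, take the standard character with the highest matching degree as the replacement character corresponding to the character to be replaced. The second replacement character acquiring unit is configured to, if that number is not greater than zero, use a null character as the replacement character corresponding to the character to be replaced.
In an embodiment, the name storage apparatus 100 based on character conversion further includes a character mapping information adding unit.
The character mapping information adding unit is configured to generate newly added character mapping information corresponding to the replacement characters contained in the second replacement name and add it to the replacement character mapping set.
In an embodiment, the character mapping information adding unit includes a first encoding information acquiring unit, a second encoding information acquiring unit and a newly added character mapping information generating unit.
The first encoding information acquiring unit is configured to obtain, according to the encoding conversion rule, the character code of the character to be replaced in the newly added name as first encoding information. The second encoding information acquiring unit is configured to obtain, according to the encoding conversion rule, the character code of the replacement character as second encoding information. The newly added character mapping information generating unit is configured to generate newly added character mapping information corresponding to the replacement character according to the correspondence between the first encoding information and the second encoding information.
The name storage apparatus based on character conversion provided by the embodiments of the present application is used to execute the above name storage method based on character conversion. It obtains replacement characters similar to rare characters and uses them to replace the rare characters contained in a name, yielding the first replacement name or the second replacement name, which makes it easier to store names containing rare characters and ensures that the stored names have good compatibility.
The above name storage apparatus based on character conversion can be implemented in the form of a computer program, and the computer program can run on a computer device as shown in FIG. 9.
Please refer to FIG. 9, which is a schematic block diagram of the computer device provided by an embodiment of the present application.
Referring to FIG. 9, the computer device 500 includes a processor 502, a memory and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504. The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032. When executed, the computer program 5032 can cause the processor 502 to execute the name storage method based on character conversion. The processor 502 provides computing and control capabilities and supports the operation of the entire computer device 500. The internal memory 504 provides an environment for running the computer program 5032 stored in the non-volatile storage medium 503; when the computer program 5032 is executed by the processor 502, it can cause the processor 502 to execute the name storage method based on character conversion. The network interface 505 is used for network communication, such as transmitting data. A person skilled in the art can understand that the structure shown in FIG. 9 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device 500 to which the solution is applied. A specific computer device 500 may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
The processor 502 is configured to run the computer program 5032 stored in the memory to implement the name storage method based on character conversion of this embodiment.
A person skilled in the art can understand that the embodiment of the computer device shown in FIG. 9 does not limit the specific configuration of the computer device. In other embodiments, the computer device may include more or fewer components than shown, combine certain components, or arrange the components differently. For example, in some embodiments the computer device may include only a memory and a processor; in such embodiments, the structures and functions of the memory and the processor are consistent with the embodiment shown in FIG. 9 and are not repeated here.
It should be understood that, in the embodiments of the present application, the processor 502 may be a central processing unit (CPU), and may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
Another embodiment of the present application provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the name storage method based on character conversion of the embodiments of the present application.
The storage medium may be an internal storage unit of the aforementioned device, such as a hard disk or memory of the device. The storage medium may also be an external storage device of the device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card equipped on the device. Further, the storage medium may include both an internal storage unit of the device and an external storage device. A person skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the devices, apparatuses and units described above can refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
The above are only specific implementations of the present application, but the scope of protection of the present application is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed in the present application, and these modifications or replacements shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.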

Claims (20)

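  1. A name storage method based on character conversion, applied to a user terminal, the method comprising:
    if a newly added name input by a user is received, verifying the newly added name according to a preset character verification model to obtain a verification result indicating whether it passes;
    if the verification result is a failure, taking the characters in the newly added name that failed verification as characters to be replaced, and determining whether a preset replacement character mapping set contains character mapping information corresponding to the characters to be replaced;
    if the replacement character mapping set contains character mapping information corresponding to the characters to be replaced, converting the characters to be replaced into replacement characters according to the replacement character mapping set to obtain a first replacement name, and storing the first replacement name;
    if the replacement character mapping set does not contain character mapping information corresponding to the characters to be replaced, obtaining replacement characters corresponding to the characters to be replaced according to a preset character replacement model to obtain a second replacement name, and storing the second replacement name.
  2. The name storage method based on character conversion according to claim 1, wherein the character verification model comprises an encoding conversion rule and a regular expression, and the verifying the newly added name according to a preset character verification model to obtain a verification result indicating whether it passes comprises:
    converting each character in the newly added name into a corresponding character code according to the encoding conversion rule;
    verifying the character codes corresponding to the newly added name according to the regular expression to obtain the verification result indicating whether the newly added name passes.
  3. The name storage method based on character conversion according to claim 1, wherein the converting the characters to be replaced into replacement characters according to the replacement character mapping set to obtain a first replacement name, and storing the first replacement name comprises:
    searching the character mapping information contained in the replacement character mapping set according to the character code corresponding to the character to be replaced to obtain target character mapping information;
    mapping the character to be replaced according to the target character mapping information to obtain a corresponding mapped character;
    replacing the character in the newly added name that corresponds to the mapped character with the mapped character to obtain the first replacement name.
  4. The name storage method based on character conversion according to claim 1, wherein the character replacement model comprises a character image generation model and a character image parsing model, and the obtaining replacement characters corresponding to the characters to be replaced according to a preset character replacement model to obtain a second replacement name, and storing the second replacement name comprises:
    generating a character image corresponding to the character to be replaced according to the character image generation model;
    parsing the character image according to the character image parsing model to obtain the replacement character corresponding to the character to be replaced;
    if none of the replacement characters is a null character, replacing the characters in the newly added name according to the replacement characters to obtain the second replacement name.
  5. The name storage method based on character conversion according to claim 4, further comprising:
    generating newly added character mapping information corresponding to the replacement characters contained in the second replacement name and adding it to the replacement character mapping set.
  6. The name storage method based on character conversion according to claim 5, wherein the generating newly added character mapping information corresponding to the replacement characters contained in the second replacement name and adding it to the replacement character mapping set comprises:
    obtaining, according to the encoding conversion rule, the character code corresponding to the character to be replaced in the newly added name as first encoding information;
    obtaining, according to the encoding conversion rule, the character code corresponding to the replacement character as second encoding information;
    generating newly added character mapping information corresponding to the replacement character according to the correspondence between the first encoding information and the second encoding information.
  7. The name storage method based on character conversion according to claim 4, wherein the character image parsing model comprises a feature vector extraction formula, a matching degree threshold and a feature vector corresponding to each standard character, and the parsing the character image according to the character image parsing model to obtain the replacement character corresponding to the character to be replaced comprises:
    calculating the character image of the character to be replaced according to the feature vector extraction formula to obtain the feature vector of the character to be replaced;
    calculating the matching degree between the feature vector of the character to be replaced and the feature vector corresponding to each standard character;
    determining whether the number of standard characters whose matching degree with the character to be replaced is greater than the matching degree threshold is greater than zero;
    if the number of standard characters whose matching degree with the character to be replaced is greater than the matching degree threshold is greater than zero, taking the standard character with the highest matching degree as the replacement character corresponding to the character to be replaced;
    if the number of standard characters whose matching degree with the character to be replaced is greater than the matching degree threshold is not greater than zero, using a null character as the replacement character corresponding to the character to be replaced.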
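  8. A name storage apparatus based on character conversion, comprising:
    a name verification unit, configured to, if a newly added name input by a user is received, verify the newly added name according to a preset character verification model to obtain a verification result indicating whether it passes;
    a determining unit, configured to, if the verification result is a failure, take the characters in the newly added name that failed verification as characters to be replaced, and determine whether a preset replacement character mapping set contains character mapping information corresponding to the characters to be replaced;
    a first replacement name acquiring unit, configured to, if the replacement character mapping set contains character mapping information corresponding to the characters to be replaced, convert the characters to be replaced into replacement characters according to the replacement character mapping set to obtain a first replacement name, and store the first replacement name;
    a second replacement name acquiring unit, configured to, if the replacement character mapping set does not contain character mapping information corresponding to the characters to be replaced, obtain replacement characters corresponding to the characters to be replaced according to a preset character replacement model to obtain a second replacement name, and store the second replacement name.
  9. The name storage apparatus based on character conversion according to claim 8, wherein the name verification unit comprises:
    a character code acquiring unit, configured to convert each character in the newly added name into a corresponding character code according to the encoding conversion rule;
    a character code verification unit, configured to verify the character codes corresponding to the newly added name according to the regular expression to obtain the verification result indicating whether the newly added name passes.
  10. The name storage apparatus based on character conversion according to claim 8, wherein the first replacement name acquiring unit comprises:
    a target character mapping information acquiring unit, configured to search the character mapping information contained in the replacement character mapping set according to the character code corresponding to the character to be replaced to obtain target character mapping information;
    a mapped character acquiring unit, configured to map the character to be replaced according to the target character mapping information to obtain a corresponding mapped character;
    a mapped character replacing unit, configured to replace the character in the newly added name that corresponds to the mapped character with the mapped character to obtain the first replacement name.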
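  11. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer program:
    if a newly added name input by a user is received, verifying the newly added name according to a preset character verification model to obtain a verification result indicating whether it passes;
    if the verification result is a failure, taking the characters in the newly added name that failed verification as characters to be replaced, and determining whether a preset replacement character mapping set contains character mapping information corresponding to the characters to be replaced;
    if the replacement character mapping set contains character mapping information corresponding to the characters to be replaced, converting the characters to be replaced into replacement characters according to the replacement character mapping set to obtain a first replacement name, and storing the first replacement name;
    if the replacement character mapping set does not contain character mapping information corresponding to the characters to be replaced, obtaining replacement characters corresponding to the characters to be replaced according to a preset character replacement model to obtain a second replacement name, and storing the second replacement name.
  12. The name storage method based on character conversion according to claim 11, wherein the character verification model comprises an encoding conversion rule and a regular expression, and the verifying the newly added name according to a preset character verification model to obtain a verification result indicating whether it passes comprises:
    converting each character in the newly added name into a corresponding character code according to the encoding conversion rule;
    verifying the character codes corresponding to the newly added name according to the regular expression to obtain the verification result indicating whether the newly added name passes.
  13. The name storage method based on character conversion according to claim 11, wherein the converting the characters to be replaced into replacement characters according to the replacement character mapping set to obtain a first replacement name, and storing the first replacement name comprises:
    searching the character mapping information contained in the replacement character mapping set according to the character code corresponding to the character to be replaced to obtain target character mapping information;
    mapping the character to be replaced according to the target character mapping information to obtain a corresponding mapped character;
    replacing the character in the newly added name that corresponds to the mapped character with the mapped character to obtain the first replacement name.
  14. The name storage method based on character conversion according to claim 11, wherein the character replacement model comprises a character image generation model and a character image parsing model, and the obtaining replacement characters corresponding to the characters to be replaced according to a preset character replacement model to obtain a second replacement name, and storing the second replacement name comprises:
    generating a character image corresponding to the character to be replaced according to the character image generation model;
    parsing the character image according to the character image parsing model to obtain the replacement character corresponding to the character to be replaced;
    if none of the replacement characters is a null character, replacing the characters in the newly added name according to the replacement characters to obtain the second replacement name.
  15. The name storage method based on character conversion according to claim 14, further comprising:
    generating newly added character mapping information corresponding to the replacement characters contained in the second replacement name and adding it to the replacement character mapping set.
  16. The name storage method based on character conversion according to claim 15, wherein the generating newly added character mapping information corresponding to the replacement characters contained in the second replacement name and adding it to the replacement character mapping set comprises:
    obtaining, according to the encoding conversion rule, the character code corresponding to the character to be replaced in the newly added name as first encoding information;
    obtaining, according to the encoding conversion rule, the character code corresponding to the replacement character as second encoding information;
    generating newly added character mapping information corresponding to the replacement character according to the correspondence between the first encoding information and the second encoding information.
  17. The name storage method based on character conversion according to claim 14, wherein the character image parsing model comprises a feature vector extraction formula, a matching degree threshold and a feature vector corresponding to each standard character, and the parsing the character image according to the character image parsing model to obtain the replacement character corresponding to the character to be replaced comprises:
    calculating the character image of the character to be replaced according to the feature vector extraction formula to obtain the feature vector of the character to be replaced;
    calculating the matching degree between the feature vector of the character to be replaced and the feature vector corresponding to each standard character;
    determining whether the number of standard characters whose matching degree with the character to be replaced is greater than the matching degree threshold is greater than zero;
    if the number of standard characters whose matching degree with the character to be replaced is greater than the matching degree threshold is greater than zero, taking the standard character with the highest matching degree as the replacement character corresponding to the character to be replaced;
    if the number of standard characters whose matching degree with the character to be replaced is greater than the matching degree threshold is not greater than zero, using a null character as the replacement character corresponding to the character to be replaced.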
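  18. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the following operations:
    if a newly added name input by a user is received, verifying the newly added name according to a preset character verification model to obtain a verification result indicating whether it passes;
    if the verification result is a failure, taking the characters in the newly added name that failed verification as characters to be replaced, and determining whether a preset replacement character mapping set contains character mapping information corresponding to the characters to be replaced;
    if the replacement character mapping set contains character mapping information corresponding to the characters to be replaced, converting the characters to be replaced into replacement characters according to the replacement character mapping set to obtain a first replacement name, and storing the first replacement name;
    if the replacement character mapping set does not contain character mapping information corresponding to the characters to be replaced, obtaining replacement characters corresponding to the characters to be replaced according to a preset character replacement model to obtain a second replacement name, and storing the second replacement name.
  19. The name storage method based on character conversion according to claim 18, wherein the character verification model comprises an encoding conversion rule and a regular expression, and the verifying the newly added name according to a preset character verification model to obtain a verification result indicating whether it passes comprises:
    converting each character in the newly added name into a corresponding character code according to the encoding conversion rule;
    verifying the character codes corresponding to the newly added name according to the regular expression to obtain the verification result indicating whether the newly added name passes.
  20. The name storage method based on character conversion according to claim 18, wherein the converting the characters to be replaced into replacement characters according to the replacement character mapping set to obtain a first replacement name, and storing the first replacement name comprises:
    searching the character mapping information contained in the replacement character mapping set according to the character code corresponding to the character to be replaced to obtain target character mapping information;
    mapping the character to be replaced according to the target character mapping information to obtain a corresponding mapped character;
    replacing the character in the newly added name that corresponds to the mapped character with the mapped character to obtain the first replacement name.
PCT/CN2019/118235 2019-10-16 2019-11-14 Name storage method and apparatus based on character conversion, and computer device WO2021072872A1 (zh)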

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910983727.9 2019-10-16
CN201910983727.9A CN111046631A (zh) 2019-10-16 2019-10-16 Name storage method and apparatus based on character conversion, and computer device

Publications (1)

Publication Number Publication Date
WO2021072872A1 true WO2021072872A1 (zh) 2021-04-22

Family

ID=70232308

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118235 WO2021072872A1 (zh) 2019-10-16 2019-11-14 Name storage method and apparatus based on character conversion, and computer device

Country Status (2)

Country Link
CN (1) CN111046631A (zh)
WO (1) WO2021072872A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
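CN111444680B (zh) * 2020-04-25 2023-05-16 中信银行股份有限公司 Encoding extension method and apparatus for rare characters, storage medium, and electronic device
CN113850050B (zh) * 2020-06-28 2022-09-23 荣耀终端有限公司 Character display method, character display apparatus, and terminal device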

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831232A (zh) * 2012-08-30 2012-12-19 山石网科通信技术(北京)有限公司 String matching method and apparatus
CN108287811A (zh) * 2017-01-10 2018-07-17 阿里巴巴集团控股有限公司 Character verification method and apparatus
CN108629046A (zh) * 2018-05-14 2018-10-09 平安科技(深圳)有限公司 Field matching method and terminal device
CN109800339A (zh) * 2018-12-13 2019-05-24 平安普惠企业管理有限公司 Regular expression generation method and apparatus, computer device, and storage medium
CN110222617A (zh) * 2019-05-29 2019-09-10 四川译讯信息科技有限公司 PDF file repair method and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
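CN105608064A (zh) * 2015-11-24 2016-05-25 小米科技有限责任公司 Character replacement method and apparatus
CN110135530B (zh) * 2019-05-16 2021-08-13 京东方科技集团股份有限公司 Method and system for converting Chinese character fonts in images, computer device, and medium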

Also Published As

Publication number Publication date
CN111046631A (zh) 2020-04-21

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19949035

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19949035

Country of ref document: EP

Kind code of ref document: A1