CN116303889A - Rarely-used word detection method and device, storage medium and terminal - Google Patents

Info

Publication number
CN116303889A
Authority
CN
China
Prior art keywords
character
name
object name
determining
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310096133.2A
Other languages
Chinese (zh)
Inventor
谢涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202310096133.2A
Publication of CN116303889A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/3332: Query translation
    • G06F16/3338: Query expansion

Abstract

The embodiments of this specification disclose a method, a device, a storage medium, and a terminal for detecting rare words, and relate to the technical field of information processing. First, an object name to be detected is acquired, and it is judged whether a substitute character or a split character exists in the object name; if so, it is determined that a rare word exists in the object name. In practice, when an object name contains a rare word, that rare word has in most cases been replaced or split. A rare word can therefore be determined to exist in the object name whenever a substitute character or a split character exists in it, so rare words can be detected without manual participation by the user, which greatly improves detection efficiency.

Description

Rarely-used word detection method and device, storage medium and terminal
Technical Field
The present disclosure relates to the field of information processing technologies, and in particular, to a method, an apparatus, a storage medium, and a terminal for detecting rarely used words.
Background
With the rapid development of communication technology, information processing has gradually entered the digital age, and information such as personal names, place names, and addresses is gradually moving away from paper-based recording and storage, which greatly improves efficiency. Because Chinese characters have a long history, many special characters, such as rare characters, variant characters, and regional characters, remain in use today.
Disclosure of Invention
In a first aspect, embodiments of the present disclosure provide a method for detecting a rare word, the method including:
acquiring an object name to be detected, and judging whether a substitute character or a split character exists in the object name;
if so, determining that a rare word exists in the object name.
In one possible implementation, before judging whether a substitute character or a split character exists in the object name, the method further includes: detecting whether each name character of the object name belongs to the common characters of a common character code library; if every name character belongs to the common characters of the common character code library, determining that no rare word exists in the object name; and if at least one name character does not belong to the common characters of the common character code library, determining that a rare word exists in the object name.
In one possible implementation, detecting whether each name character of the object name belongs to the common characters of the common character code library includes: acquiring a common character recognition regular expression for the common character code library; and using the common character recognition regular expression to detect whether each name character of the object name falls within the common character code point range corresponding to the common character code library.
In one possible implementation, judging whether a substitute character or a split character exists in the object name includes: judging whether each name character of the object name belongs to a substitute character range, where the substitute character range includes, but is not limited to, a digit character encoding range and a pinyin character encoding range; if no name character belongs to the substitute character range, determining that no substitute character exists in the object name; and if at least one name character belongs to the substitute character range, determining that a substitute character exists in the object name.
In one possible implementation, after determining that no substitute character exists in the object name, the method further includes: acquiring a card face image of the identity information card corresponding to the object name, and recognizing the number of words in the card face name in the card face image to obtain a first word count of the card face object name; if a second word count, corresponding to the name characters of the object name, is inconsistent with the first word count, determining that a split character exists in the object name; and if the second word count is consistent with the first word count, determining that no split character exists in the object name.
In one possible implementation, after determining that a rare word exists in the object name, the method further includes: issuing a prompt for the rare word in the object name, and displaying the rare word in the object name.
In a second aspect, embodiments of the present disclosure provide a rare word detection device, the device comprising:
a character judging module, configured to acquire an object name to be detected and judge whether a substitute character or a split character exists in the object name;
and a rare word determining module, configured to determine that a rare word exists in the object name if the substitute character or the split character exists.
In a third aspect, the present description provides a computer program product comprising instructions which, when run on a computer or a processor, cause the computer or the processor to perform the steps of the method described above.
In a fourth aspect, the present description provides a computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the steps of the method described above.
In a fifth aspect, embodiments of the present description provide a terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program being adapted to be loaded by the processor and to perform the steps of the method described above.
The technical scheme provided by some embodiments of the present specification has the following beneficial effects:
The embodiments of this specification provide a method for detecting rare words. First, an object name to be detected is acquired, and it is judged whether a substitute character or a split character exists in the object name; if so, it is determined that a rare word exists in the object name. In practice, when an object name contains a rare word, that rare word has in most cases been replaced or split. A rare word can therefore be determined to exist in the object name whenever a substitute character or a split character exists in it, so rare words can be detected without manual participation by the user, which greatly improves detection efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present description or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present description, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a system for detecting uncommon words according to one or more embodiments of the present disclosure;
FIG. 2 is a schematic flow diagram of a method for detecting uncommon words according to one or more embodiments of the present disclosure;
FIG. 3 is a flow chart of another method for detecting uncommon words according to one or more embodiments of the present disclosure;
FIG. 4 is a block diagram of a device for detecting uncommon words according to one or more embodiments of the present disclosure;
FIG. 5 is a schematic structural diagram of a terminal according to one or more embodiments of the present disclosure.
Detailed Description
The technical solutions in the embodiments of this specification are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained by a person of ordinary skill in the art from the present disclosure without creative effort are intended to fall within the scope of the present disclosure.
In the description of this specification, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. It should also be noted that, unless expressly specified and limited otherwise, "comprise" and "have" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to the listed steps or elements but may include other steps or elements that are not listed or that are inherent to such a process, method, article, or apparatus. Those of ordinary skill in the art will understand the specific meaning of these terms in this specification according to the specific circumstances. In addition, unless otherwise indicated, "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may indicate that A exists alone, that A and B both exist, or that B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
In the related art, when a financial institution such as a bank encounters a user whose name contains a rare word that its system does not support, some banks replace the rare word with pinyin or digits, and others split the rare word into parts, producing a substitute name for the user. Because the related system uses the square character library, such substitute names cannot pass verification in the related system, which creates practical difficulties for due diligence investigations. A method is therefore needed to detect, in advance, rare words in object names such as personal names.
The present specification is described in detail below with reference to specific examples.
Referring to FIG. 1, a schematic diagram of a rare word detection system according to one or more embodiments of this specification is provided. As shown in FIG. 1, the rare word detection system may include at least a client cluster and a service platform 100.
The client cluster may include at least one client. As shown in FIG. 1, it specifically includes client 1 corresponding to user 1, client 2 corresponding to user 2, …, and client n corresponding to user n, where n is an integer greater than 0.
Each client in the client cluster may be a terminal with a communication function, where the terminal includes, but is not limited to: wearable devices, handheld devices, personal computers, tablet computers, vehicle-mounted devices, smart phones, computing devices, or other processing devices connected to a wireless modem. Terminals may go by different names in different networks, for example: user equipment, access terminal, subscriber unit, subscriber station, mobile station, remote terminal, mobile device, user terminal, wireless communication device, user agent, cellular telephone, cordless telephone, personal digital assistant (PDA), or a terminal in a 5G network or a future evolved network.
The service platform 100 may be a standalone server device, for example a rack-mounted, blade, tower, or cabinet server, or hardware with stronger computing capacity such as a workstation or mainframe computer. It may also be a server cluster composed of multiple servers. The servers in the cluster may be arranged symmetrically, so that each server is functionally equivalent and has equal standing in the transaction link, and each server can provide services externally on its own, that is, without assistance from other servers.
In one or more embodiments of this specification, the service platform 100 may establish a communication connection with at least one client in the client cluster and complete the data interaction of the rare word detection process over that connection.
Illustratively, the service platform 100 may, based on the rare word detection method of this specification, acquire the object name to be detected and judge whether a substitute character or a split character exists in the object name; if so, it determines that a rare word exists in the object name.
Illustratively, the service platform 100 may instead send data relevant to rare word detection (such as a multi-code rare word library) to the client; the client then acquires the object name to be detected and judges whether a substitute character or a split character exists in the object name; if so, it determines that a rare word exists in the object name.
It should be noted that the service platform 100 communicates with at least one client in the client cluster over a network, which may be wireless or wired. Wireless networks include, but are not limited to, cellular networks, wireless local area networks, infrared networks, and Bluetooth networks; wired networks include, but are not limited to, Ethernet, universal serial bus (USB), and controller area networks. In one or more embodiments of this specification, data exchanged over the network (e.g., target compression packages) is represented using technologies and/or formats including HyperText Markup Language (HTML), Extensible Markup Language (XML), and the like. All or some of the links may also be encrypted using conventional encryption techniques such as Secure Socket Layer (SSL), Transport Layer Security (TLS), virtual private network (VPN), and Internet Protocol Security (IPsec). In other embodiments, custom and/or dedicated data communication techniques may be used in place of, or in addition to, the techniques described above.
The embodiments of the rare word detection system provided in this specification and the rare word detection method in one or more embodiments belong to the same concept. The execution subject of the rare word detection method in one or more embodiments of this specification may be a terminal, and the terminal may be the service platform 100 described above; the execution subject may also be the terminal corresponding to a client, as determined by the actual application environment. The implementation of the rare word detection system embodiments is described in detail in the method embodiments below and is not repeated here.
Based on the schematic view of the scenario shown in fig. 1, the detailed description of the method for detecting the uncommon word provided in one or more embodiments of the present disclosure is provided below.
Referring to FIG. 2, a schematic flow diagram of a rare word detection method according to one or more embodiments of this specification is provided. The method may be implemented by a computer program and may run on a rare word detection device based on the von Neumann architecture. The computer program may be integrated into an application or may run as a standalone tool application. The rare word detection device may be a terminal.
Specifically, the method for detecting the uncommon words comprises the following steps:
S202, acquiring an object name to be detected, and judging whether a substitute character or a split character exists in the object name.
The object name may be a user name (e.g., a personal name), an account name, a geographic name, an address name, and so forth.
The object name consists of a plurality of name characters. A name character can be understood as a general term for all kinds of letters and symbols, including the scripts of various countries, punctuation marks, graphic symbols, digits, and the like.
It will be appreciated that, in daily life, some users have names that require rare words, so rare words may exist in the object names of those users.
In the related art, when a financial institution such as a bank encounters a user whose name contains a rare word that its system does not support, some financial institutions replace the rare word with pinyin or digits, and others split the rare word into parts, producing a substitute name for the user. Therefore, in the embodiments of this specification, whether a rare word exists in the object name may be determined based on whether a substitute character or a split character exists among its name characters; the method used to judge whether a substitute character or a split character exists is not limited.
S204, if so, determining that a rare word exists in the object name.
It will be appreciated that, because of the particular nature of rare words, the presence of a substitute character or a split character among the name characters indicates that a replacement or split operation was performed on a name character of the object name, so it can be determined that a rare word exists in the object name.
S206, if not, determining that no rare word exists in the object name.
Correspondingly, if no substitute character or split character exists among the name characters, that is, no replacement or split operation was performed on the name characters of the object name, it can be determined that no rare word exists in the object name.
In the embodiments of this specification, an object name to be detected is first acquired, and it is judged whether a substitute character or a split character exists in the object name; if so, it is determined that a rare word exists in the object name. In practice, when an object name contains a rare word, that rare word has in most cases been replaced or split. A rare word can therefore be determined to exist in the object name whenever a substitute character or a split character exists in it, so rare words can be detected without manual participation by the user, which greatly improves detection efficiency.
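The flow of steps S202 to S206 can be illustrated with a minimal Python sketch. Because the specification does not limit how substitute or split characters are judged, the two detectors are passed in as functions. The names `has_rare_word`, `toy_substitute`, and `toy_split` are hypothetical, and the toy detectors are illustrative assumptions rather than the method of the embodiments.

```python
from typing import Callable

# Hypothetical type alias: a detector takes the object name and reports
# whether it found a substitute (or split) character.
Detector = Callable[[str], bool]


def has_rare_word(name: str, has_substitute: Detector, has_split: Detector) -> bool:
    """S202: judge both conditions. S204/S206: the result follows directly."""
    return has_substitute(name) or has_split(name)


# Toy detectors for illustration only (assumptions, not the patented logic).
def toy_substitute(name: str) -> bool:
    # Treat any ASCII letter or digit as a substitute character.
    return any(ch.isascii() and ch.isalnum() for ch in name)


def toy_split(name: str) -> bool:
    # A real split check needs extra evidence; this stub never fires.
    return False


print(has_rare_word("张san", toy_substitute, toy_split))
print(has_rare_word("张三", toy_substitute, toy_split))
```

Passing the detectors as parameters keeps the top-level decision independent of any particular judging technique.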
Referring to fig. 3, a flow chart of another method for detecting rare words according to one or more embodiments of the present disclosure is shown.
As shown in fig. 3, the method includes:
S302, detecting whether each name character of the object name belongs to the common characters of a common character code library.
In one or more embodiments of this specification, character encoding, also known as character set encoding, maps the characters of a character set to objects in a specified set (e.g., bit patterns, sequences of natural numbers, octets, or electrical pulses) so that text can be stored in a computer and transmitted over a communication network. Common character encodings include ASCII, GB2312, GBK, GB18030, UTF-8, and UTF-16.
A common character code library is a set of characters. There are many kinds of common character code libraries, and each contains a different number of characters. The common character code library may be, for example, an ASCII character code library, a GB2312 character code library, a BIG5 character code library, a GB18030 character code library, a Unicode character code library, or a GBK character code library.
Illustratively, the common character code library may be chosen according to the actual application environment; for example, it may be the GBK character code library.
A common character can be understood as a character contained in the common character code library.
It can be understood that the object name is composed of a plurality of name characters, and whether the object name contains a rare word can be determined in advance by detecting whether each name character of the object name belongs to the common characters of the common character code library.
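One way to picture membership in a common character code library is to test whether a character can be encoded with the corresponding codec. The sketch below assumes GBK as the library and uses Python's built-in `gbk` codec. The function names `is_common_char` and `uncommon_chars` are hypothetical, and this is an illustrative alternative to the regular expression approach described in the embodiments, not a statement of how they are implemented.

```python
from typing import List


def is_common_char(ch: str, codec: str = "gbk") -> bool:
    """Return True if the character can be encoded in the given codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def uncommon_chars(name: str, codec: str = "gbk") -> List[str]:
    """Collect the name characters that fall outside the code library."""
    return [ch for ch in name if not is_common_char(ch, codec)]


# U+20000 lies in CJK Extension B, which GBK does not cover.
print(uncommon_chars("张三"))
print(uncommon_chars("张\U00020000"))
```

Note that GBK also encodes ASCII letters and digits, so under this assumption such characters count as common and would have to be caught by a separate substitute character check.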
In one possible implementation, detecting whether each name character of the object name belongs to the common characters of the common character code library may be implemented by constructing or acquiring a common character recognition regular expression for that library.
Illustratively, the common character recognition regular expression is a regular expression for detecting common characters. Using it, one can detect whether a name character falls within the common character code point range corresponding to the common character code library: if the name character falls within that range, it is a common character; if it does not, it may be a rare word.
For example, the following regular expression fragment may be used to detect whether every name character in the object name is a common character in the common character code library:
Regular expression for GBK Chinese characters
"[\u4e00-\u9fa5]"
Here "\u" introduces a Unicode code point, 4e00 is the start code point of the range, and 9fa5 is the end code point. The start and end code points together define the common character code point range for the common character code library. The common character recognition regular expression checks whether the code point of a name character falls within this range, and thereby whether the name character is a common character of the common character code library.
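The code point range check described above can be sketched in Python with the standard `re` module. The pattern is the range quoted in the text; the names `COMMON_CHAR_RE`, `all_common`, and `first_uncommon` are hypothetical and chosen only for this illustration.

```python
import re
from typing import Optional

# Code point range quoted in the text: U+4E00 through U+9FA5.
COMMON_CHAR_RE = re.compile(r"[\u4e00-\u9fa5]")


def all_common(name: str) -> bool:
    """True if every name character falls inside the code point range."""
    return all(COMMON_CHAR_RE.fullmatch(ch) for ch in name)


def first_uncommon(name: str) -> Optional[str]:
    """Return the first name character outside the range, if any."""
    for ch in name:
        if not COMMON_CHAR_RE.fullmatch(ch):
            return ch
    return None


print(all_common("张三"))
print(first_uncommon("张\U00020000"))
```

Checking one character at a time makes it easy to report which character failed, which is useful if the rare word is later shown to the user.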
S304, if each name character belongs to a common character of the common character code library, determining that the object name has no uncommon character.
It can be understood that, after the foregoing common character detection is performed on each name character, if every name character belongs to the common characters of the common character code library, the electronic device can determine that no rare word exists in the object name.
S306, if at least one name character does not belong to the common characters of the common character code library, determining that a rare word exists in the object name.
It can be understood that, after the foregoing common character detection is performed on each name character, if at least one name character does not belong to the common characters of the common character code library, the electronic device can determine that a rare word exists in the object name.
In one or more embodiments of this specification, common character detection can be carried out quickly by applying the common character recognition regular expression for the common character code library to the name characters one by one, which improves character processing efficiency.
S308, judging whether each name character of the object name belongs to a substitute character range, where the substitute character range includes, but is not limited to, a digit character encoding range and a pinyin character encoding range.
It can be understood that when an object name contains a rare word that cannot be input, one workaround is to replace the rare word with other recognizable characters. One way to judge whether a substitute character exists in the object name is therefore to judge whether each name character belongs to the substitute character range, which includes, but is not limited to, a digit character encoding range and a pinyin character encoding range.
S310, if no name character belongs to the substitute character range, determining that no substitute character exists in the object name.
It will be appreciated that if no name character belongs to the substitute character range, it is determined that no substitute character exists in the object name, that is, no replacement operation was performed on the words in the object name.
S312, if at least one name character belongs to the substitute character range, determining that a substitute character exists in the object name.
It will be appreciated that if at least one name character belongs to the substitute character range, it is determined that a substitute character exists in the object name, that is, a replacement operation was performed on at least one word in the object name.
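Steps S308 to S312 can be sketched as follows. The sketch assumes that the substitute character range consists of ASCII digits and ASCII letters (pinyin written without tone marks); the specification leaves the exact range open, so this range and the names `SUBSTITUTE_RE`, `find_substitute_chars`, and `has_substitute_char` are illustrative assumptions.

```python
import re
from typing import List

# Assumed substitute character range: ASCII digits and ASCII letters.
SUBSTITUTE_RE = re.compile(r"[0-9A-Za-z]")


def find_substitute_chars(name: str) -> List[str]:
    """Return every name character that falls in the substitute range."""
    return SUBSTITUTE_RE.findall(name)


def has_substitute_char(name: str) -> bool:
    """S310/S312: True as soon as one name character is in the range."""
    return SUBSTITUTE_RE.search(name) is not None


print(has_substitute_char("王yan"))
print(has_substitute_char("王伟"))
```

A production range would likely also need to cover full-width digits and pinyin letters with tone marks, depending on how the upstream institutions record substitute names.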
S314, acquiring a card face image of the identity information card corresponding to the object name, and recognizing the number of words in the card face name in the card face image to obtain a first word count of the card face object name.
It can be understood that, after it is determined that no substitute character exists in the object name, a card face image of the identity information card corresponding to the object name can be acquired. The user's identity information is printed on the card face of the identity information card, so a card face image containing the object name can be captured by photographing or filming the card, and the card face object name can be obtained from that image.
Further, the number of words in the card face name in the card face image can be recognized. The content of the image can be recognized using optical character recognition (OCR), which yields the first word count of the card face object name.
S316, if the second word count, corresponding to the name characters of the object name, is inconsistent with the first word count, determining that a split character exists in the object name.
It can be understood that the name characters of the object name can be counted to obtain the second word count of the object name, and the first word count and the second word count can then be compared. If the second word count is inconsistent with the first word count, a split operation was performed on a word in the object name, and it can be determined that a split character exists in the object name.
S318, if the second word count, corresponding to the name characters of the object name, is consistent with the first word count, determining that no split character exists in the object name.
Accordingly, if the second word count is consistent with the first word count, no split operation was performed on the words in the object name, and it can be determined that no split character exists in the object name.
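The word count comparison of steps S316 and S318 can be sketched as below. The sketch assumes that an OCR step (not shown) has already produced the first word count from the card face image; the function name `has_split_char` and the sample names are hypothetical.

```python
def has_split_char(object_name: str, first_word_count: int) -> bool:
    """Compare the card face word count with the object name's word count.

    first_word_count is assumed to come from an OCR step that is not shown.
    """
    # Second word count: number of name characters in the object name.
    second_word_count = len(object_name)
    # S316: inconsistent counts indicate a split. S318: consistent counts do not.
    return second_word_count != first_word_count


# Hypothetical case: the card face shows two characters, but the recorded
# name has three because one character was entered as two components.
print(has_split_char("王石石", 2))
print(has_split_char("王伟", 2))
```

In Python, `len` on a string counts code points, so a character outside the Basic Multilingual Plane still counts as one word.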
S320, if the substitute character or the split character exists in the object name, determining that the uncommon word exists in the object name.
It will be appreciated that, due to the specificity of the rarely used word, the presence of the substituted or split character in the name character, which represents the substitution or split operation of the name character in the object name, can determine that the rarely used word is present in the object name.
S322, sending out a prompt aiming at the uncommon words in the object names and displaying the uncommon words in the object names.
Further, after the fact that the uncommon words exist in the object names is determined, reminding can be sent out for the uncommon words in the object names, and the uncommon words in the object names are displayed, so that a user can know the uncommon words in the object names in time.
S324, if the substitute character or the split character does not exist in the object name, determining that the rarely used word does not exist in the object name.
In the embodiments of this specification, whether a rarely used word exists in the object name can be determined by judging whether each name character of the object name belongs to the substitute character range, and by comparing the word number of the object name with the word number recognized from the card face image of the corresponding identity information card. The rarely used word can therefore be detected without manual participation by the user, which greatly improves the efficiency of rarely used word detection.
Referring to fig. 4, fig. 4 is a block diagram of a device for detecting rarely used words according to one or more embodiments of the present disclosure. As shown in fig. 4, the rare word detecting apparatus 400 includes:
the character judging module 420 is configured to obtain an object name to be detected, and judge whether a substitute character or a split character exists in the object name;
the rarely used word determining module 440 is configured to determine that a rarely used word exists in the object name if a substitute character or a split character exists in the object name.
Optionally, the rare word detection device 400 further includes:
the rarely used word pre-checking module is used for detecting whether each name character of the object name belongs to a common character of the common character coding library; if each name character belongs to a common character of a common character coding library, determining that the object name does not have a rare word, and executing the step of judging whether the object name has a substitute character or a split character; if at least one name character does not belong to the common characters of the common character code library, determining that the object name has the uncommon word.
The rarely used word pre-checking module is further configured to acquire a common character recognition regular expression for the common character code library, and to detect, using the common character recognition regular expression, whether each name character of the object name belongs to the common character code point range corresponding to the common character code library.
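A regular-expression pre-check of this kind might look like the following Python sketch. The specification does not state which code point range the common character code library covers, so the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) is used here purely as an assumed stand-in; the names `COMMON_CHAR_PATTERN` and `find_uncommon_characters` are likewise hypothetical.

```python
import re

# Assumption: the basic CJK Unified Ideographs block stands in for the
# "common character code point range"; the real library may differ.
COMMON_CHAR_PATTERN = re.compile(r"[\u4e00-\u9fff]")


def find_uncommon_characters(object_name: str) -> list[str]:
    """Return the name characters that fall outside the assumed common range."""
    return [
        ch for ch in object_name
        if COMMON_CHAR_PATTERN.fullmatch(ch) is None
    ]
```

An empty result means every name character lies in the assumed common range; a non-empty result corresponds to the case where at least one name character does not belong to the common characters, for instance a character encoded in a CJK extension block.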
Optionally, the character judging module 420 is further configured to judge whether each name character of the object name belongs to a substitute character range, where the substitute character range includes, but is not limited to, a numeric character encoding range and a pinyin character encoding range; if no name character belongs to the substitute character range, determining that no substitute character exists in the object name; if at least one name character belongs to the substitute character range, determining that a substitute character exists in the object name.
Optionally, the character judging module 420 is further configured to acquire a card face image of the identity information card corresponding to the object name, and to recognize the word number of the card face name in the card face image, so as to obtain a first word number of the card face object name; if the first word number is inconsistent with the second word number corresponding to the name characters of the object name, determining that split characters exist in the object name; and if the first word number is consistent with the second word number corresponding to the name characters of the object name, determining that no split character exists in the object name.
Optionally, the rare word detection device 400 further includes: the display module is used for sending out a prompt aiming at the uncommon words in the object names and displaying the uncommon words in the object names.
In the embodiments of this specification, the device for detecting rarely used words includes: the character judging module, configured to acquire the object name to be detected and judge whether a substitute character or a split character exists in the object name; and the rarely used word determining module, configured to determine that a rarely used word exists in the object name if a substitute character or a split character exists. In practice, when an object name contains a rarely used word, that word has in most cases been replaced or split. It can therefore be determined that a rarely used word exists in the object name whenever a substitute character or a split character exists in it, so the rarely used word can be detected without manual participation by the user, which greatly improves the efficiency of rarely used word detection.
The present description provides a computer program product comprising instructions which, when run on a computer or processor, cause the computer or processor to perform the steps of the method of any of the above embodiments.
The present description also provides a computer storage medium having stored thereon a plurality of instructions adapted to be loaded by a processor and to carry out the steps of the method according to any of the embodiments described above.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a terminal according to one or more embodiments of the present disclosure. As shown in fig. 5, the terminal 500 may include: at least one terminal processor 501, at least one network interface 504, a user interface 503, a memory 505, at least one communication bus 502.
Wherein a communication bus 502 is used to enable connected communications between these components.
The user interface 503 may include a display screen (Display) and a camera (Camera); optionally, the user interface 503 may further include a standard wired interface and a standard wireless interface.
The network interface 504 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface).
The terminal processor 501 may comprise one or more processing cores. The terminal processor 501 connects the various parts of the entire terminal 500 using various interfaces and lines, and performs the various functions of the terminal 500 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 505 and by invoking data stored in the memory 505. Optionally, the terminal processor 501 may be implemented in at least one hardware form of digital signal processing (Digital Signal Processing, DSP), field programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA). The terminal processor 501 may integrate one or a combination of several of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing the content to be displayed on the display screen; the modem handles wireless communication. It will be appreciated that the modem may also not be integrated into the terminal processor 501 and may instead be implemented by a separate chip.
The Memory 505 may include a random access Memory (Random Access Memory, RAM) or a Read-Only Memory (ROM). Optionally, the memory 505 comprises a non-transitory computer readable medium (non-transitory computer-readable storage medium). Memory 505 may be used to store instructions, programs, code sets, or instruction sets. The memory 505 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the above-described various method embodiments, etc.; the storage data area may store data or the like referred to in the above respective method embodiments. The memory 505 may also optionally be at least one storage device located remotely from the aforementioned terminal processor 501. As shown in fig. 5, the memory 505, which is a kind of computer storage medium, may include an operating system, a network communication module, a user interface module, and a rare word detection program.
In the terminal 500 shown in fig. 5, the user interface 503 is mainly used for providing an input interface for a user, and acquiring data input by the user; the terminal processor 501 may be configured to invoke the uncommon word detection program stored in the memory 505, and specifically perform the following operations:
acquiring an object name to be detected, and judging whether a substitute character or a split character exists in the object name;
if so, determining that a rarely used word exists in the object name.
In some embodiments, before judging whether a substitute character or a split character exists in the object name, the operations further comprise: detecting whether each name character of the object name belongs to the common characters of a common character code library; if each name character belongs to the common characters of the common character code library, determining that the object name does not contain a rarely used word, and executing the step of judging whether a substitute character or a split character exists in the object name; if at least one name character does not belong to the common characters of the common character code library, determining that a rarely used word exists in the object name.
In some embodiments, detecting whether each name character of the object name belongs to the common characters of the common character code library comprises: acquiring a common character recognition regular expression for the common character code library; and detecting, using the common character recognition regular expression, whether each name character of the object name belongs to the common character code point range corresponding to the common character code library.
In some embodiments, judging whether a substitute character or a split character exists in the object name includes: judging whether each name character of the object name belongs to a substitute character range, where the substitute character range includes, but is not limited to, a numeric character encoding range and a pinyin character encoding range; if no name character belongs to the substitute character range, determining that no substitute character exists in the object name; if at least one name character belongs to the substitute character range, determining that a substitute character exists in the object name.
In some embodiments, after determining that no substitute character exists in the object name, the operations further comprise: acquiring a card face image of the identity information card corresponding to the object name, and recognizing the word number of the card face name in the card face image to obtain a first word number of the card face object name; if the first word number is inconsistent with the second word number corresponding to the name characters of the object name, determining that split characters exist in the object name; and if the first word number is consistent with the second word number corresponding to the name characters of the object name, determining that no split character exists in the object name.
In some embodiments, after determining that the uncommon word exists in the object name, further comprising: and sending a prompt aiming at the uncommon words in the object names, and displaying the uncommon words in the object names.
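The order of the operations above (substitute character check first, word number comparison only when no substitute character is found, then the reminder) can be sketched in Python as follows. This is an illustrative sketch, not the program of the specification: the optional common character pre-check is omitted, ASCII digits and letters are assumed as the substitute character range, the card face word number is assumed to come from an OCR step that is not shown, and the function names and reminder wording are hypothetical.

```python
import re
from typing import Optional

# Assumption: ASCII digits and letters stand in for the numeric and pinyin
# character encoding ranges, which the text does not spell out.
SUBSTITUTE = re.compile(r"[0-9A-Za-z]")


def detect_rare_word(object_name: str, card_face_count: int) -> Optional[str]:
    """Return the reason a rarely used word is inferred, or None.

    card_face_count is the first word number, assumed to come from an OCR
    step on the identity information card image that is not shown here.
    """
    # Step 1: substitute character check.
    if SUBSTITUTE.search(object_name):
        return "substitute character"
    # Step 2: only when no substitute character exists, compare word numbers.
    if len(object_name) != card_face_count:
        return "split character"
    return None


def remind(object_name: str, card_face_count: int) -> str:
    """Build the reminder text shown to the user (hypothetical wording)."""
    reason = detect_rare_word(object_name, card_face_count)
    if reason is None:
        return f"No rarely used word detected in {object_name}"
    return f"Rarely used word suspected in {object_name}: {reason} found"
```

Returning the reason rather than a bare boolean is a design choice of this sketch; it lets the reminder tell the user whether a substitution or a split triggered the detection.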
In the several embodiments provided in this specification, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules is merely a division by logical function, and other divisions are possible in an actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces, devices, or modules, and may be in electrical, mechanical, or other forms.
The modules illustrated as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this specification are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted via a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line (Digital Subscriber Line, DSL)) or a wireless manner (e.g., infrared, radio, microwave, etc.). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital versatile disc (Digital Versatile Disc, DVD)), or a semiconductor medium (e.g., a solid state disk (Solid State Disk, SSD)), or the like.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of combinations of actions, but it should be understood by those skilled in the art that the embodiments are not limited by the order of actions described, as some steps may be performed in other order or simultaneously according to the embodiments of the present disclosure. Further, those skilled in the art will appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily all required for the embodiments described in the specification.
In addition, it should be noted that, information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals according to the embodiments of the present disclosure are all authorized by the user or are fully authorized by the parties, and the collection, use, and processing of relevant data is required to comply with relevant laws and regulations and standards of relevant countries and regions. For example, user behavior data information, digital service usage information, and the like referred to in this specification are all acquired with sufficient authorization.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The foregoing is a description of a method, an apparatus, a storage medium, and a terminal for detecting a rare word provided in the embodiments of the present specification, where those skilled in the art may change in terms of specific implementation and application scope according to the concepts of the embodiments of the present specification, and in summary, the disclosure should not be construed as limiting the embodiments of the present specification.

Claims (10)

1. A method for detecting rarely used words, the method comprising:
acquiring an object name to be detected, and judging whether a substitute character or a split character exists in the object name;
if so, determining that the uncommon word exists in the object name.
2. The method of claim 1, before the judging whether a substitute character or a split character exists in the object name, further comprising:
detecting whether each name character of the object name belongs to a common character of a common character code library;
if each name character belongs to a common character of the common character code library, determining that the object name does not have a rare word, and executing the step of judging whether the object name has a substitute character or a split character;
and if at least one name character does not belong to the common characters of the common character code library, determining that a rarely used word exists in the object name.
3. The method of claim 2, the detecting whether each name character of the object name belongs to a common character of the common character code library, comprising:
acquiring a common character recognition regular expression aiming at the common character coding library;
and detecting whether each name character of the object name belongs to a common character code point range corresponding to the common character code library by adopting the common character recognition regular expression.
4. The method of any one of claims 1 to 3, the judging whether a substitute character or a split character exists in the object name comprising:
judging whether each name character of the object name belongs to a substitute character range, wherein the substitute character range comprises, but is not limited to, a numeric character encoding range and a pinyin character encoding range;
if no name character belongs to the substitute character range, determining that no substitute character exists in the object name;
and if at least one name character belongs to the substitute character range, determining that a substitute character exists in the object name.
5. The method of claim 4, after the determining that no substitute character exists in the object name, further comprising:
acquiring a card face image of an identity information card corresponding to the object name, and recognizing the number of words of the card face name in the card face image to obtain a first number of words of the card face object name;
if the first word number is inconsistent with the second word number corresponding to the name characters of the object name, determining that split characters exist in the object name;
and if the first word number is consistent with the second word number corresponding to the name characters of the object name, determining that no split character exists in the object name.
6. The method of claim 1, after the determining that the uncommon word exists in the object name, further comprising:
and sending out a prompt aiming at the uncommon words in the object names, and displaying the uncommon words in the object names.
7. A device for detecting rarely used words, the device comprising:
the character judging module is used for acquiring the object name to be detected and judging whether a substitute character or a split character exists in the object name;
and the rarely used word determining module is used for determining that a rarely used word exists in the object name if a substitute character or a split character exists in the object name.
8. A computer program product comprising instructions which, when run on a computer or processor, cause the computer or processor to perform the steps of the method of any of claims 1 to 6.
9. A computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the steps of the method according to any one of claims 1 to 6.
10. A terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method according to any one of claims 1 to 6 when the program is executed.
CN202310096133.2A 2023-01-31 2023-01-31 Rarely-used word detection method and device, storage medium and terminal Pending CN116303889A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310096133.2A CN116303889A (en) 2023-01-31 2023-01-31 Rarely-used word detection method and device, storage medium and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310096133.2A CN116303889A (en) 2023-01-31 2023-01-31 Rarely-used word detection method and device, storage medium and terminal

Publications (1)

Publication Number Publication Date
CN116303889A true CN116303889A (en) 2023-06-23

Family

ID=86826588

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310096133.2A Pending CN116303889A (en) 2023-01-31 2023-01-31 Rarely-used word detection method and device, storage medium and terminal

Country Status (1)

Country Link
CN (1) CN116303889A (en)

Similar Documents

Publication Publication Date Title
CN107785021B (en) Voice input method, device, computer equipment and medium
CN109345417B (en) Online assessment method and terminal equipment for business personnel based on identity authentication
CN108090351A (en) For handling the method and apparatus of request message
CN107958215A (en) A kind of antifraud recognition methods, device, server and storage medium
CN111783138A (en) Sensitive data detection method and device, computer equipment and storage medium
CN112307512A (en) Log desensitization method, device and storage medium
CN111787154A (en) Information processing method and electronic equipment
CN112784112A (en) Message checking method and device
CN111353891A (en) Auxiliary method and device for identifying suspicious groups in fund transaction data
CN116303888A (en) Rarely used word processing method and device, storage medium and electronic equipment
CN116303889A (en) Rarely-used word detection method and device, storage medium and terminal
CN116319089A (en) Dynamic weak password detection method, device, computer equipment and medium
CN106204129A (en) The terminal unit recognition prize drawing image that authority is different carries out the method and system drawn a lottery
CN110543457A (en) Track type document processing method and device, storage medium and electronic device
CN113220949B (en) Construction method and device of private data identification system
CN110045844B (en) Position coding form data processing system
CN113674083A (en) Internet financial platform credit risk monitoring method, device and computer system
CN112308678A (en) Price information processing method, device, equipment and medium based on image recognition
CN106936840B (en) Information prompting method and device
CN114330263A (en) Message identification method, device, equipment and storage medium
CN116501833A (en) Rarely used word processing method and device, storage medium and electronic equipment
JP2020184209A (en) Information processing method, information processing device, and information processing program
US11620659B2 (en) System and method for applying image recognition and invisible watermarking to mitigate and address fraud
CN116204602A (en) Word mapping method, device, storage medium and terminal
JP6662200B2 (en) Information processing apparatus and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination