CN117764694A - Risk account identification method, apparatus, electronic device and storage medium - Google Patents


Info

Publication number
CN117764694A
Authority
CN
China
Prior art keywords
letter
account
risk
character representation
phoneme
Prior art date
Legal status
Pending
Application number
CN202311589649.7A
Other languages
Chinese (zh)
Inventor
汪盛
Current Assignee
Bank of China Financial Technology Co Ltd
Original Assignee
Bank of China Financial Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Bank of China Financial Technology Co Ltd
Priority to CN202311589649.7A
Publication of CN117764694A
Legal status: Pending (current)


Landscapes

  • Machine Translation (AREA)

Abstract

The application provides a risk account identification method, apparatus, electronic device and storage medium. The method comprises: acquiring a character representation result of the account name of an account to be identified in a current language; encoding the character representation result based on a single-phoneme coding rule corresponding to the current language to obtain a voice code corresponding to the account name; matching the voice code corresponding to the account name against the voice codes corresponding to risk account names; and determining a risk identification result of the account to be identified based on the voice code matching result. The single-phoneme coding rule corresponding to the current language is determined based on the phoneme variants that each letter of the current language takes in different character representation results. The method and apparatus improve the accuracy of risk account identification.

Description

Risk account identification method, apparatus, electronic device and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a risk account identification method, apparatus, electronic device, and storage medium.
Background
Currently, commercial banks identify potential risk accounts by fuzzy matching account names against anti-money-laundering lists. In cross-border transactions, account holders are located in different countries and use different languages, so a translated account name may have several aliases or spelling variants. This lowers the accuracy of risk account identification and fails to meet the risk prevention and control requirements of commercial banks.
Therefore, improving the accuracy of risk account identification so as to meet commercial banks' risk prevention and control requirements is a technical problem to be solved in the industry.
Disclosure of Invention
The application provides a risk account identification method, apparatus, electronic device and storage medium to solve the technical problem of improving the accuracy of risk account identification so as to meet commercial banks' risk prevention and control requirements.
The application provides a risk account identification method, which comprises the following steps:
acquiring a character representation result of the account name of an account to be identified in a current language;
encoding the character representation result based on the single-phoneme coding rule corresponding to the current language to obtain a voice code corresponding to the account name;
matching the voice code corresponding to the account name with the voice codes corresponding to the risk account names;
determining a risk identification result of the account to be identified based on the voice code matching result;
wherein the single-phoneme coding rule corresponding to the current language is determined based on the phoneme variants that each letter of the current language takes in different character representation results.
In some embodiments, before the voice code corresponding to the account name is matched with the voice code corresponding to each risk account name, the method includes:
acquiring a plurality of risk account names, and determining the character representation results of the risk account names in the current language;
and encoding the character representation results of the risk account names based on the single-phoneme coding rule corresponding to the current language to obtain the voice codes corresponding to the risk account names.
In some embodiments, after the character representation result of the account name of the account to be identified in the current language is obtained, the method includes:
detecting each character in the character representation result, and determining that the character representation result contains a first letter that does not belong to the current language;
deleting the first letter from the character representation result, and converting each letter in the character representation result into uppercase or lowercase format.
In some embodiments, after the first letter is deleted from the character representation result and each letter in the character representation result is converted into uppercase or lowercase format, the method includes:
determining the letter combination at the beginning of each word in the character representation result;
deleting a second letter in the letter combination if the letter combination matches a preset letter combination;
wherein the second letter is silent in the preset letter combination.
In some embodiments, after the second letter in the letter combination is deleted because the letter combination matches a preset letter combination, the method includes:
traversing each letter in each word segment, and determining the current letter and the letter following it;
and deleting the next letter from the word segment if the current letter is the same as the next letter.
In some embodiments, after the next letter is deleted from the word segment because the current letter is the same as the next letter, the method includes:
determining the phoneme type and position of each letter in each word segment;
retaining a letter in the word segment if its phoneme type is vowel and it is located at the beginning of the word;
and deleting a letter from the word segment if its phoneme type is vowel and it is located in the middle or at the end of the word.
In some embodiments, encoding the character representation result based on the single-phoneme coding rule corresponding to the current language to obtain the voice code corresponding to the account name includes:
detecting, in each word segment, the letters whose phoneme type is consonant;
if a letter matches a first consonant letter, using the character representation of that letter as its voice code, where the character representation of the allophone (phoneme variant) of a first consonant letter is the same as the letter itself;
if a letter matches a second consonant letter, using the character representation of its allophone as its voice code, where the character representation of the allophone of a second consonant letter differs from the letter itself;
if a letter matches a third consonant letter, determining the letter combination in which that letter appears in the word segment, determining the phoneme variant of the letter in that combination, and using the character representation of that phoneme variant as the letter's voice code, where the allophone of a third consonant letter is determined by the letter combination in which it appears in the word segment.
The application provides a risk account identification device, including:
the acquisition unit is used for acquiring a character representation result of the account name of the account to be identified in the current language;
the coding unit is used for coding the character representation result based on the single-phoneme coding rule corresponding to the current language to obtain a voice code corresponding to the account name;
the matching unit is used for matching the voice codes corresponding to the account names with the voice codes corresponding to the risk account names;
the identification unit is used for determining a risk identification result of the account to be identified based on the voice code matching result;
the single-phoneme coding rule corresponding to the current language is determined based on the corresponding phoneme variants of all the letters in the current language in different character representation results.
The application provides an electronic device comprising a memory, in which a computer program is stored, and a processor arranged to execute the risk account identification method by means of the computer program.
The application provides a computer-readable storage medium storing a program, wherein the program, when run, performs the risk account identification method.
According to the risk account identification method, apparatus, electronic device and storage medium provided in the present application, the character representation result of the account name of the account to be identified in the current language is obtained; the character representation result is encoded based on the single-phoneme coding rule corresponding to the current language to obtain the voice code of the account name; the voice code of the account name is matched against the voice codes of the risk account names; and the risk identification result of the account to be identified is determined based on the voice code matching result. Because the single-phoneme coding rule is determined from the phoneme variants that each letter of the current language takes in different character representation results, converting account names into voice codes before comparing them lets account names be identified phoneme by phoneme, from the perspective of pronunciation. Pronunciation-induced misspellings of account names in the current language can thus be recognized effectively, reducing the risk of missed detections, improving the accuracy of risk account identification, and meeting commercial banks' risk prevention and control requirements.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the technical solutions of the present application or the prior art, the following description will briefly introduce the drawings used in the embodiments or the description of the prior art, and it is obvious that, in the following description, the drawings are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a first flow chart of the risk account identification method provided in the present application;
Fig. 2 is a second flow chart of the risk account identification method provided in the present application;
Fig. 3 is a schematic structural diagram of the risk account identification apparatus provided in the present application;
Fig. 4 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
To enable those skilled in the art to better understand the solution of the present application, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without inventive effort shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like herein are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or units or modules is not necessarily limited to those steps or units or modules that are expressly listed or inherent to such process, method, article, or apparatus.
In the technical solution of the present application, the collection, storage, use, processing, transmission, provision and disclosure of clients' personal information comply with relevant laws and regulations, necessary security measures are taken, and public order and good morals are not violated.
Fig. 1 is a schematic flow chart of the risk account identification method provided in the present application. As shown in Fig. 1, the method includes steps 110, 120, 130 and 140.
Step 110, obtaining a character representation result of the account name of the account to be identified in the current language.
Specifically, the execution subject of the risk account identification method provided in the embodiments of the present application is a risk account identification apparatus. The apparatus may be implemented in software, for example as a risk account identification program running on a computer, or as a device such as a mobile terminal, a tablet computer, a desktop computer or a server.
The account to be identified is an account that requires risk identification. The risk here is the risk of abnormal transactions, such as money laundering. The account holder may be an individual, an organization, or the like. Account holders are located in different countries, so account names may be in different languages. The account name is the name of the individual or organization recorded when the account is opened at the bank.
Banks typically identify potential risk accounts by fuzzy matching account names against a risk list. When the account names of different accounts are processed, all names generally need to be unified into the same language before fuzzy matching. The current language is the language used for risk list matching, for example Chinese, English, French or Russian. In the embodiments of the present application, English is used as the current language for illustration.
The character representation result is the account name expressed in the characters, such as letters or digits, of the current language. For example, in an English environment, the character representation result is the English form of the account name; an account name in another language should first be translated into the corresponding English words.
Step 120, encoding the character representation result based on the single-phoneme coding rule corresponding to the current language to obtain the voice code corresponding to the account name. The single-phoneme coding rule corresponding to the current language is determined based on the phoneme variants that each letter of the current language takes in different character representation results.
Specifically, taking English as the current language, misspellings of English words fall into two broad categories: keyboard input errors and phonetic errors. The former relate to the keyboard layout and the typist's keystroke mistakes; the latter are caused by unfamiliarity with, or misunderstanding of, pronunciation rules.
Therefore, to improve the accuracy of risk list identification by handling pronunciation-related English misspellings, a voice code matching method can be used. Voice code matching encodes the pronunciation of the account name in the current language and matches names by comparing the codes. A voice code (phonetic code) represents the pronunciation of a word from the perspective of language pronunciation, and may be expressed with letters or digits.
Within the current language, the smallest written unit is the letter, and letters combine to form words. A phoneme is the smallest unit of sound in a language that distinguishes meaning. A phoneme variant (allophone) is a concrete realization of a phoneme in a particular phonetic environment; the phoneme and its variants stand in a class-to-member relationship, and all members are variants, none of which is the canonical form. The single-phoneme coding rule for the current language can be determined from the phoneme variants that each letter takes in different character representation results. For example, when the letter P is combined with the letter H, the pair is pronounced like the letter F, so the voice code of the combination P + H may be set to F. Here, a single phoneme refers to the phoneme variant of a single letter.
According to the single-phoneme coding rule corresponding to the current language, each letter in the character representation result is converted in turn to obtain the voice code corresponding to the account name.
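The sequential conversion described above can be sketched in Python as follows. This is a minimal illustration, not the application's rule set: the only rule encoded is the P + H → F example given in the text, held in a hypothetical `PAIR_RULES` table, and every other letter is assumed to code as itself.

```python
# Hypothetical rule table: a letter pair pronounced as one phoneme, mapped to its code.
# Only the P + H -> F example from the text is included; a real rule set would be larger.
PAIR_RULES = {"PH": "F"}


def encode_word(word: str) -> str:
    """Convert an uppercase word to a voice code, scanning letters left to right."""
    code = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in PAIR_RULES:
            code.append(PAIR_RULES[pair])  # the pair is coded as a single phoneme
            i += 2
        else:
            code.append(word[i])  # assumption: all other letters code as themselves
            i += 1
    return "".join(code)


print(encode_word("PHILIP"))  # FILIP
print(encode_word("FILIP"))   # FILIP
```

Because "PHILIP" and "FILIP" receive the same code, a spelling difference caused only by pronunciation no longer prevents a match.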
Step 130, matching the voice code corresponding to the account name with the voice codes corresponding to the risk account names.
Specifically, once the voice code corresponding to the account name is obtained, it can be matched against the voice code of each risk account name, for example by comparing the similarity of the voice codes.
The matching method can adopt a fuzzy matching method or an accurate matching method.
Fuzzy matching may use the Levenshtein algorithm (i.e., edit distance). As a similarity measure, the edit distance between the voice code of an account name and the voice code of a risk account name is the minimum number of single-character edits (insertions, deletions and substitutions) needed to turn one voice code (character string) into the other. The smaller the edit distance, the more similar the two voice codes. A distance threshold may be set: if the edit distance is less than the threshold, the account name matches the corresponding risk account name.
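A minimal Python sketch of this fuzzy matching step follows. `edit_distance` is the standard dynamic-programming Levenshtein distance; the threshold value `DISTANCE_THRESHOLD = 2` and the sample voice codes are hypothetical, since the text does not specify them.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep on match)
        prev = curr
    return prev[-1]


DISTANCE_THRESHOLD = 2  # hypothetical value; the text does not give one


def fuzzy_match(code: str, risk_code: str, threshold: int = DISTANCE_THRESHOLD) -> bool:
    """Match when the edit distance is strictly less than the threshold, as described above."""
    return edit_distance(code, risk_code) < threshold


print(edit_distance("MHMT", "MHMD"))  # 1
print(fuzzy_match("MHMT", "MHMD"))    # True
```

Keeping only the previous row of the dynamic-programming table holds memory at O(len(b)) while time stays O(len(a) · len(b)).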
Exact matching compares the characters of the two voice codes position by position and, based on the results for all characters, determines whether the two voice codes are identical; if they are, they match.
Step 140, determining the risk identification result of the account to be identified based on the voice code matching result.
Specifically, if the voice code matching result shows that the voice code of the account name matches the voice code of any risk account name, the account name is regarded as the same as that risk account name, and the risk identification result of the account to be identified is a risk account. If the voice code of the account name matches none of the voice codes of the risk account names, the account name differs from every risk account name, and the risk identification result of the account to be identified is a risk-free account (a normal account).
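The decision in step 140 reduces to an "any match" test over the risk list. The sketch below uses a hypothetical `identify` helper whose matching predicate defaults to exact matching (string equality); a fuzzy predicate, such as an edit-distance check under a threshold, could be passed in instead. The sample voice codes are made up for illustration.

```python
from typing import Callable, Iterable


def exact_match(a: str, b: str) -> bool:
    """Exact matching: same length and the same character at every position."""
    return a == b


def identify(account_code: str,
             risk_codes: Iterable[str],
             matches: Callable[[str, str], bool] = exact_match) -> str:
    """Return 'risk account' if the code matches any risk code, else 'risk-free account'."""
    if any(matches(account_code, rc) for rc in risk_codes):
        return "risk account"
    return "risk-free account"


risk_codes = ["FILIP STN", "IFN PTRF"]   # hypothetical voice codes of risk account names
print(identify("FILIP STN", risk_codes))  # risk account
print(identify("JN SMT", risk_codes))     # risk-free account
```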
According to the risk account identification method provided in the embodiments of the present application, the character representation result of the account name of the account to be identified in the current language is obtained; the character representation result is encoded based on the single-phoneme coding rule corresponding to the current language to obtain the voice code of the account name; the voice code of the account name is matched against the voice codes of the risk account names; and the risk identification result of the account to be identified is determined based on the voice code matching result. Because the single-phoneme coding rule is determined from the phoneme variants that each letter of the current language takes in different character representation results, converting account names into voice codes before comparing them lets account names be identified phoneme by phoneme, from the perspective of pronunciation. Pronunciation-induced misspellings of account names in the current language can thus be recognized effectively, reducing the risk of missed detections, improving the accuracy of risk account identification, and meeting commercial banks' risk prevention and control requirements.
It should be noted that the embodiments of the present application may be freely combined, permuted or executed separately, and do not depend on a fixed execution order.
In some embodiments, step 130 is preceded by:
acquiring a plurality of risk account names, and determining the character representation results of the risk account names in the current language;
and encoding the character representation results of the risk account names based on the single-phoneme coding rule corresponding to the current language to obtain the voice codes corresponding to the risk account names.
Specifically, a plurality of risk account names may be obtained in advance. These names may be written in different languages and can be uniformly converted into character representation results in the current language, for example all represented as English words.
And coding character representation results of the names of the risk accounts according to the single-phoneme coding rule corresponding to English (current language), so as to obtain voice codes corresponding to the names of the risk accounts.
A voice list library is then built from the voice codes of the risk account names. For example, each risk account name can be used as a key and its voice code as the value, and the pairs stored as key-value entries to form the voice list library. After the voice code of the account name of the account to be identified is obtained, it is looked up in the voice list library: if there is a hit, the account to be identified is a risk account; otherwise it is a risk-free account.
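The voice list library could be held in a plain Python dictionary, as sketched below. `toy_encode` is a hypothetical stand-in for the single-phoneme coding rule (it only applies P + H → F after uppercasing), and `build_library` and `lookup` are illustrative names, not part of the application.

```python
from typing import Callable, Dict, Iterable, Optional


def toy_encode(name: str) -> str:
    """Hypothetical stand-in for the single-phoneme coding rule (only PH -> F)."""
    return name.upper().replace("PH", "F")


def build_library(risk_names: Iterable[str],
                  encode: Callable[[str], str]) -> Dict[str, str]:
    """Voice list library: risk account name (key) -> voice code (value)."""
    return {name: encode(name) for name in risk_names}


def lookup(library: Dict[str, str], account_code: str) -> Optional[str]:
    """Return a risk account name whose voice code equals account_code (a hit), else None."""
    for name, code in library.items():
        if code == account_code:
            return name
    return None


library = build_library(["Philip Stone", "Ivan Petrov"], toy_encode)
hit = lookup(library, toy_encode("Filip Stone"))
print(hit if hit else "risk-free account")  # Philip Stone
```

Since lookups go by voice code while the library is keyed by name, a reverse index (voice code → list of names) would avoid the linear scan for large lists; that is a design choice, not something the application specifies.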
According to this risk account identification method, the risk account names are encoded according to the single-phoneme coding rule corresponding to the current language to obtain their voice codes. This makes it convenient to identify risk accounts among the accounts to be identified by voice code matching, and improves the accuracy of risk account identification.
In some embodiments, step 110 is followed by:
detecting each character in the character representation result, and determining that the character representation result contains a first letter that does not belong to the current language;
and deleting the first letter from the character representation result, and converting each letter in the character representation result into uppercase or lowercase format.
Specifically, after the account name of the account to be identified is represented in characters of the current language, the character representation result may contain first letters that do not belong to the current language. For example, when an account name in some languages is converted into English words, digits, punctuation marks or non-English letters may appear in the character representation result because of errors in translation software or input errors by the translator.
The computer code of each character in the character representation result, such as its ASCII (American Standard Code for Information Interchange) code, can be obtained, and first letters that do not belong to the current language are identified by comparing these codes with the codes of the letters of the current language.
Once identified, the first letters can be deleted from the character representation result so that they do not interfere with voice code matching. Each letter in the character representation result can then be converted to uppercase or lowercase. Unifying the letter case makes letter comparison accurate and reduces the error rate.
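A minimal sketch of this cleaning step, assuming English is the current language so that its letters are exactly the ASCII letters. Set membership against `string.ascii_letters` stands in for the code comparison described above; keeping whitespace as a word separator is an assumption made because the later steps operate on individual word segments.

```python
import string

# Assumption: English is the current language, so its letters are the ASCII letters A-Z and a-z.
CURRENT_LANGUAGE_LETTERS = frozenset(string.ascii_letters)


def clean(text: str) -> str:
    """Delete characters that are not letters of the current language, then uppercase.

    Whitespace is kept as a word separator (an assumption) because later steps work per word.
    """
    words = []
    for raw in text.split():
        kept = "".join(ch for ch in raw if ch in CURRENT_LANGUAGE_LETTERS)
        if kept:  # a token made only of digits or punctuation disappears entirely
            words.append(kept.upper())
    return " ".join(words)


print(clean("J0hn  Sm1th, Ltd."))  # JHN SMTH LTD
```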
According to this risk account identification method, first letters that do not belong to the current language are deleted from the character representation result, which prevents them from interfering with voice code matching and improves the accuracy of risk account identification.
In some embodiments, after the first letter is deleted from the character representation result and each letter in the character representation result is converted into uppercase or lowercase format, the method includes:
determining the letter combination at the beginning of each word in the character representation result;
deleting a second letter in the letter combination if the letter combination matches a preset letter combination;
wherein the second letter is silent in the preset letter combination.
Specifically, the letter or letter combination at the beginning of each word in the character representation result may be detected.
For example, the letters in each word segment can be traversed to obtain the current letter and the letter following it, and the two-letter combination they form is matched against the preset letter combinations. Each preset letter combination contains a silent second letter; to improve the accuracy and efficiency of voice code matching, that second letter can be identified and deleted from the combination.
Taking English as the current language, there are preset combinations of two or more letters that contain a silent letter (the second letter). For example, in the word-initial combinations "AE", "GN", "KN", "PN" and "WR", the first letter is silent and may be deleted before conversion to a voice code. As another example, in the combination "WH", the letter "H" is silent and may be deleted.
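The word-initial check can be sketched with a small table built from the combinations listed above. The table name `SILENT_INITIALS` and its encoding (each combination mapped to the index of the silent letter) are illustrative assumptions; the function expects one uppercase word segment.

```python
# Word-initial combinations from the text, mapped to the index of the silent letter to delete.
SILENT_INITIALS = {
    "AE": 0, "GN": 0, "KN": 0, "PN": 0, "WR": 0,  # first letter is silent
    "WH": 1,                                      # H is silent
}


def drop_silent_initial(word: str) -> str:
    """Delete the silent letter if the word starts with a preset combination."""
    idx = SILENT_INITIALS.get(word[:2])
    if idx is None:
        return word  # no preset combination at the start of the word
    return word[:idx] + word[idx + 1:]


print(drop_silent_initial("KNIGHT"))  # NIGHT
print(drop_silent_initial("WHITE"))   # WITE
```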
According to this risk account identification method, the letter combination at the beginning of each word segment in the character representation result is detected, and if it matches a preset letter combination, the silent second letter in it is deleted. This prevents the second letter from interfering with voice code matching and improves the accuracy of risk account identification.
In some embodiments, after the second letter in the letter combination is deleted because the letter combination matches the preset letter combination, the method includes:
traversing each letter in each word segment, and determining the current letter and the letter following it;
if the current letter is the same as the next letter, deleting the next letter from the word segment.
Specifically, each letter in each word segment may be traversed to obtain the current letter and the next letter. If they are the same, the word segment contains a repeated letter, and the next letter can be deleted from the word segment.
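A sketch of the repeated-letter removal, assuming uppercase word segments. It compares each letter with the last letter kept, so a run of three or more identical letters also collapses to one; the text only describes adjacent pairs, so the handling of longer runs is an assumption.

```python
def drop_repeats(word: str) -> str:
    """Delete a letter when it equals the letter kept just before it (e.g. 'LL' -> 'L')."""
    out = []
    for ch in word:
        if not out or out[-1] != ch:
            out.append(ch)  # keep the first letter of every run of identical letters
    return "".join(out)


print(drop_repeats("RUSSELL"))  # RUSEL
print(drop_repeats("ANNA"))     # ANA
```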
According to the risk account identification method, repeated letters are deleted from each word segmentation, interference caused by the repeated letters to voice code matching is avoided, and identification accuracy of the risk account is improved.
In some embodiments, after the next letter is deleted from each word segment because the current letter is the same as the next letter, the method includes:
determining the phoneme type and position of each letter in each word segment;
retaining a letter in the word segment if its phoneme type is vowel and it is located at the beginning of the word;
deleting a letter from the word segment if its phoneme type is vowel and it is located in the middle or at the end of the word.
Specifically, the respective letters may be classified into vowels and consonants according to phoneme types. Vowels are sounds in which the flow of air through the mouth during sound production is not impeded; consonants are sounds in which air flow through the mouth during sound production is impeded.
In the voice code, a vowel at the beginning of a word helps identify the word and can be retained, whereas the pronunciation of vowels in the middle or at the end of a word varies widely, so these vowels can be deleted or ignored to simplify matching.
The phoneme type and position of each letter can be determined within each word segment. The position can be the beginning, the middle or the end of the word: the first letter is at the beginning, the last letter is at the end, and any letter that is neither first nor last is in the middle.
If the phoneme type corresponding to any letter is a vowel letter and the arrangement position is a word head, the letter can be reserved in word segmentation; if the phoneme type corresponding to any letter is a vowel letter and the arrangement position is in a word or a word tail, the letter can be deleted in the word segmentation.
According to the risk account identification method, the vowels are reserved or deleted according to different arrangement positions of the vowels in the word segmentation, so that interference of the vowels to voice code matching is avoided, and the identification accuracy of the risk account is improved.
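The vowel rule can be sketched in Python as below, again on an uppercase word segment; `apply_vowel_rule` is a hypothetical name. The sketch shows the rule in isolation. In a full encoder, consonant rules that look for a following vowel (such as those for W and Y) would need that context before the vowels are removed, which is an implementation consideration the text does not spell out:

```python
VOWELS = frozenset("AEIOU")


def apply_vowel_rule(segment: str) -> str:
    """Retain a vowel only at the word head; delete it in the middle or at the tail."""
    return "".join(
        letter
        for position, letter in enumerate(segment)
        if letter not in VOWELS or position == 0
    )


print(apply_vowel_rule("ANDERSON"))  # ANDRSN
print(apply_vowel_rule("BROWNE"))    # BRWN
```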
In some embodiments, step 120 comprises:
detecting the letters in each word segment whose phoneme type is consonant;
when a letter matches a first-class consonant letter, using the character representation result of that letter as its voice code, where the character representation result of the phoneme variant (allophone) of a first-class consonant letter is the same as the letter itself;
when a letter matches a second-class consonant letter, using the character representation result of the letter's phoneme variant as its voice code, where the character representation result of the phoneme variant of a second-class consonant letter differs from the letter itself;
when a letter matches a third-class consonant letter, determining the letter combination in which that letter appears in the word segment, determining the letter's phoneme variant within that combination, and using the character representation result of that phoneme variant as its voice code, where the phoneme variant of a third-class consonant letter is determined by the letter combination in which it appears in the word segment.
Specifically, consonant letters in each word segment are handled as follows.
When a letter matches a first-class consonant letter, its character representation result can be used directly as its voice code, because the character representation result of the phoneme variant of a first-class consonant letter is the same as the letter itself. Taking English as the current language, the first-class consonant letters are F, J, L, M, N and R; that is, they are kept unchanged during code conversion.
When a letter matches a second-class consonant letter, the character representation result of its phoneme variant is used as its voice code, because this differs from the letter itself. Taking English as the current language, the second-class consonant letters are Q, V, X and Z, whose phoneme variants are written K, F, KS and S respectively. During code conversion, the voice code of a second-class consonant letter is therefore the letter or letters of its phoneme variant.
When a letter matches a third-class consonant letter, the letter combination it forms within the word segment is detected, the letter's phoneme variant is determined from that combination, and the voice code is then determined from the phoneme variant. Taking English as the current language, the third-class consonant letters are B, C, D, G, H, K, P, S, T, W and Y. For example, C has the voice code S in the letter combination CI, whereas in the combination SCI it is deleted.
By classifying consonant letters and determining their voice codes class by class, the risk account identification method provided in the embodiments of this application makes the voice codes of consonant letters more accurate and thereby improves the accuracy of risk account identification.
Fig. 2 is the second flowchart of the risk account identification method provided in this application. As shown in Fig. 2, the method includes:
Step 210: when the list (the risk account names) is batch-processed daily, all English words (the current language being English) are converted according to the voice coding rules, and a transformed voice list library (containing the voice codes of the risk account names) is output.
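For English, the three-way consonant split can be expressed as a lookup, as in this hypothetical Python sketch. First- and second-class letters can be encoded without looking at their neighbours, so `context_free_code` returns `None` for third-class letters (and vowels) to signal that the surrounding letter combination must be examined:

```python
from typing import Optional

FIRST_CLASS = frozenset("FJLMNR")                          # written unchanged
SECOND_CLASS = {"Q": "K", "V": "F", "X": "KS", "Z": "S"}   # fixed replacement
THIRD_CLASS = frozenset("BCDGHKPSTWY")                     # depends on letter combination
VOWELS = frozenset("AEIOU")


def consonant_class(letter: str) -> str:
    """Classify an uppercase English letter by the three-way consonant split."""
    if letter in FIRST_CLASS:
        return "first"
    if letter in SECOND_CLASS:
        return "second"
    if letter in THIRD_CLASS:
        return "third"
    if letter in VOWELS:
        return "vowel"
    raise ValueError(f"not an uppercase English letter: {letter!r}")


def context_free_code(letter: str) -> Optional[str]:
    """Voice code for first/second-class consonants; None if context is needed."""
    kind = consonant_class(letter)
    if kind == "first":
        return letter
    if kind == "second":
        return SECOND_CLASS[letter]
    return None  # third class or vowel: look at the surrounding letters


print(context_free_code("M"), context_free_code("X"), context_free_code("C"))
# M KS None
```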
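The daily batch step can be pictured as building an index from voice code to risk-list names. The following Python sketch is an assumption about one reasonable data structure, not the application's implementation; `build_voice_list_library` and the stand-in encoder `toy_encode` are hypothetical, and a real deployment would pass in an encoder implementing the full coding rules:

```python
from collections import defaultdict
from typing import Callable, Dict, List


def build_voice_list_library(
    risk_names: List[str], encode_word: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Map each voice code to the risk account names containing a word with that code."""
    library: Dict[str, List[str]] = defaultdict(list)
    for name in risk_names:
        for word in name.split():
            code = encode_word(word)
            if name not in library[code]:  # list each name once per code
                library[code].append(name)
    return dict(library)


def toy_encode(word: str) -> str:
    """Hypothetical stand-in encoder: upper-case, keep only a word-initial vowel."""
    w = word.upper()
    return w[:1] + "".join(c for c in w[1:] if c not in "AEIOU")


library = build_voice_list_library(["Karl Berg", "Carl Borg"], toy_encode)
print(library)
# {'KRL': ['Karl Berg'], 'BRG': ['Karl Berg', 'Carl Borg'], 'CRL': ['Carl Borg']}
```

Keying the library by voice code lets the later comparison step look up candidates directly instead of scanning the whole list.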
The coding rules are single-phoneme voice conversion rules that convert consonant letters or letter combinations into consonant-class codes, with the vowels (A, E, I, O, U) deleted or ignored.
Before conversion, the following preprocessing is required:
deleting or ignoring characters in the word to be encoded that are not English letters, and converting all letters to upper case;
preprocessing word-initial letters or letter combinations before encoding: when the letter combination AE, GN, KN, PN or WR is at the beginning of a word, deleting the first letter; deleting H in the word-initial combination WH; and replacing a word-initial X with S;
removing adjacent repeated letters.
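A Python sketch of this preprocessing for a whole name, assuming that words are separated by whitespace and that "English letters" means A to Z; the function names are hypothetical. Splitting before stripping characters keeps a name such as O'Brien as one word:

```python
import re
from typing import List

SILENT_FIRST = ("AE", "GN", "KN", "PN", "WR")  # first letter is not pronounced


def preprocess_word(word: str) -> str:
    # Keep only English letters, in upper case.
    word = re.sub(r"[^A-Z]", "", word.upper())
    # Word-initial adjustments.
    if word[:2] in SILENT_FIRST:
        word = word[1:]
    elif word.startswith("WH"):
        word = "W" + word[2:]
    elif word.startswith("X"):
        word = "S" + word[1:]
    # Collapse runs of adjacent repeated letters ("LL" -> "L").
    return re.sub(r"(.)\1+", r"\1", word)


def preprocess_name(name: str) -> List[str]:
    """Split on whitespace, preprocess each word, drop words that end up empty."""
    words = (preprocess_word(token) for token in name.split())
    return [w for w in words if w]


print(preprocess_name("Knight O'Brien"))  # ['NIGHT', 'OBRIEN']
print(preprocess_name("Xavier Wharton"))  # ['SAVIER', 'WARTON']
```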
The conversion rules are as follows:
First class: the consonant letters F, J, L, M, N and R are left unchanged;
Second class: the consonant letters Q, V, X and Z are replaced by K, F, KS and S respectively;
Third class: the remaining consonant letters are converted according to the letter combinations in which they appear, as follows:
Letter B: B is deleted when the letter combination MB is at the end of the word; otherwise B is retained.
Letter C: C in the letter combinations CIA and CH is converted to X; C in the combinations CI, CE and CY is converted to S; C in SCI, SCE and SCY is deleted; in all other cases (including the combination SCH), C is converted to K.
Letter D: D in the letter combinations DGE, DGY and DGI is converted to J; otherwise D is converted to T.
Letter G: G is deleted in the combination GH when GH is neither at the end of the word nor followed by a vowel; G is deleted in the combinations GN and GNED; G is deleted in the combination DGE (see letter D); otherwise, G before I, E or Y is converted to J when it is not part of GG, and in all remaining cases G is converted to K.
Letter H: H is deleted in the letter combinations CH, SH, PH, TH and GH; otherwise H is retained.
Letter K: K is deleted when it follows the letter C; otherwise K is retained.
Letter P: the letter combination PH is converted to F; otherwise P is retained.
Letter S: the letter combination SH is converted to X; S is deleted in SIO or SIA; otherwise S is retained.
Letter T: the letter combination TH is converted to the digit 0; T is deleted in the letter combination TCH; otherwise T is retained.
Letter W: W is deleted when no vowel follows it; W is retained when a vowel follows it.
Letter Y: Y is deleted when no vowel follows it; Y is retained when a vowel follows it.
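Taken together, the preprocessing and the three consonant classes amount to a Metaphone-style encoder. The Python sketch below applies the rules as listed to one word at a time; `encode_word` and `encode_name` are hypothetical names. Two assumptions are flagged in the comments: letter context (such as whether a vowel follows W or Y) is read from the preprocessed word before vowels are dropped, and G after D is treated as silent before E, Y or I, mirroring the rule for letter D:

```python
import re
from typing import List

VOWELS = frozenset("AEIOU")
SECOND_CLASS = {"Q": "K", "V": "F", "X": "KS", "Z": "S"}


def preprocess(word: str) -> str:
    """Letters only, upper case, word-initial rules, adjacent duplicates removed."""
    word = re.sub(r"[^A-Z]", "", word.upper())
    if word[:2] in ("AE", "GN", "KN", "PN", "WR"):
        word = word[1:]
    elif word.startswith("WH"):
        word = "W" + word[2:]
    elif word.startswith("X"):
        word = "S" + word[1:]
    return re.sub(r"(.)\1+", r"\1", word)


def encode_word(raw: str) -> str:
    """Voice code of one word; assumed: context is read from the preprocessed word."""
    w = preprocess(raw)
    out: List[str] = []
    for i, ch in enumerate(w):
        prev = w[i - 1] if i > 0 else ""
        nxt = w[i + 1] if i + 1 < len(w) else ""
        nxt2 = w[i + 2] if i + 2 < len(w) else ""
        if ch in VOWELS:
            if i == 0:                                  # keep only a word-initial vowel
                out.append(ch)
        elif ch in "FJLMNR":                            # first class: unchanged
            out.append(ch)
        elif ch in SECOND_CLASS:                        # second class: fixed replacement
            out.append(SECOND_CLASS[ch])
        elif ch == "B":
            if not (prev == "M" and nxt == ""):         # silent in word-final MB
                out.append("B")
        elif ch == "C":
            if nxt == "I" and nxt2 == "A":              # CIA -> X
                out.append("X")
            elif nxt == "H":                            # SCH -> K, CH -> X
                out.append("K" if prev == "S" else "X")
            elif nxt in ("I", "E", "Y"):                # SCI/SCE/SCY silent, else S
                if prev != "S":
                    out.append("S")
            else:
                out.append("K")
        elif ch == "D":                                 # DGE/DGY/DGI -> J, else T
            out.append("J" if nxt == "G" and nxt2 in ("E", "Y", "I") else "T")
        elif ch == "G":
            if nxt == "H" and nxt2 != "" and nxt2 not in VOWELS:
                continue                                # GH neither final nor before a vowel
            if nxt == "N":
                continue                                # GN, GNED
            if prev == "D" and nxt in ("E", "Y", "I"):
                continue                                # assumed: silent after D (D -> J)
            out.append("J" if nxt in ("I", "E", "Y") else "K")
        elif ch == "H":
            if prev not in ("C", "S", "P", "T", "G"):   # silent in CH, SH, PH, TH, GH
                out.append("H")
        elif ch == "K":
            if prev != "C":                             # silent after C
                out.append("K")
        elif ch == "P":
            out.append("F" if nxt == "H" else "P")      # PH -> F
        elif ch == "S":
            if nxt == "H":                              # SH -> X
                out.append("X")
            elif not (nxt == "I" and nxt2 in ("O", "A")):
                out.append("S")                         # deleted in SIO/SIA, else kept
        elif ch == "T":
            if nxt == "H":                              # TH -> digit zero
                out.append("0")
            elif not (nxt == "C" and nxt2 == "H"):
                out.append("T")                         # deleted in TCH, else kept
        elif ch in ("W", "Y"):
            if nxt in VOWELS:                           # kept only before a vowel
                out.append(ch)
    return "".join(out)


def encode_name(name: str) -> List[str]:
    """Voice codes of all words in a name, skipping words that encode to nothing."""
    return [code for code in (encode_word(t) for t in name.split()) if code]


print(encode_word("Smith"), encode_word("Smyth"))          # SM0 SM0
print(encode_word("Catherine"), encode_word("Katherine"))  # K0RN K0RN
print(encode_name("Phillips Knight"))                      # ['FLPS', 'NT']
```

Because Smith and Smyth share the code SM0, a spelling variant of this kind reaches the matching step as a voice-code hit rather than a miss.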
Step 220: the input screened string (the account name of the account to be identified) is converted according to the same voice coding rules to obtain its voice code.
After converting the English words in the screened string according to the conversion rules, the system outputs the voice code of each English word for subsequent comparison.
Step 230: the converted screened string is compared with the voice list library.
The voice codes of the screened information from the previous step are compared with the voice codes generated from the list entries in the database. Identical voice codes do not guarantee identical original words, because a voice code can merge different spellings and pronunciations; a hit on the voice code alone can therefore be assigned a reduced score (for example, 0.7). A suitable balance point between error rate and accuracy can be found through extensive experiments.
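A hypothetical Python sketch of the comparison and scoring. The 0.7 discount for a voice-code-only hit comes from the example above; scoring an exact spelling match as 1.0 and averaging each listed word's best score into a name score are assumptions made for illustration, not rules stated in the text. Words and voice codes are supplied precomputed so the sketch stands alone:

```python
from typing import Dict, List


def word_score(query_word: str, list_word: str,
               query_code: str, list_code: str,
               phonetic_score: float = 0.7) -> float:
    """Exact spelling match scores 1.0; a voice-code-only match gets a reduced score."""
    if query_word == list_word:
        return 1.0
    if query_code == list_code:
        return phonetic_score  # same code, possibly different words: discounted
    return 0.0


def name_score(query: Dict[str, str], listed: Dict[str, str],
               phonetic_score: float = 0.7) -> float:
    """Average, over the listed name's words, of the best score against any query word.

    Both dicts map a preprocessed word to its voice code.
    """
    if not listed:
        return 0.0
    best: List[float] = []
    for lw, lc in listed.items():
        best.append(max((word_score(qw, lw, qc, lc, phonetic_score)
                         for qw, qc in query.items()), default=0.0))
    return sum(best) / len(best)


query = {"SMYTH": "SM0", "JON": "JN"}
listed = {"SMITH": "SM0", "JON": "JN"}
print(round(name_score(query, listed), 2))  # 0.85
```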
Step 240: possible matching results are listed according to parameters such as the matching threshold, and the risk identification result is then determined from the matching results.
The hit results are returned as required by the upper layer, a score is calculated according to the original scoring rules to decide whether a hit has occurred, and the subsequent flow continues.
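Step 240 can be sketched as a threshold filter over name scores. The threshold of 0.8 and the function name `list_candidates` are hypothetical; the text leaves the matching threshold and other parameters to the implementation:

```python
from typing import Dict, List, Tuple


def list_candidates(scores: Dict[str, float],
                    threshold: float = 0.8) -> List[Tuple[str, float]]:
    """Return (risk name, score) pairs at or above the threshold, best first."""
    hits = [(name, s) for name, s in scores.items() if s >= threshold]
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(hits, key=lambda item: (-item[1], item[0]))


scores = {"JON SMITH": 0.85, "JANE SMYTHE": 0.35, "JOHN SMITH": 1.0}
print(list_candidates(scores))  # [('JOHN SMITH', 1.0), ('JON SMITH', 0.85)]
```

An account could then be flagged as a risk account whenever the candidate list is non-empty, leaving the final decision to the upper layer's scoring rules.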
The apparatus provided in the embodiments of this application is described below; the apparatus described below and the method described above may be cross-referenced.
Fig. 3 is a schematic structural diagram of the risk account identification apparatus provided in this application. As shown in Fig. 3, the apparatus includes:
an acquisition unit 310, configured to acquire the character representation result of the account name of the account to be identified in the current language;
an encoding unit 320, configured to encode the character representation result based on the single-phoneme coding rule corresponding to the current language to obtain the voice code corresponding to the account name;
a matching unit 330, configured to match the voice code corresponding to the account name against the voice code corresponding to each risk account name;
an identifying unit 340, configured to determine the risk identification result of the account to be identified based on the voice code matching result;
wherein the single-phoneme coding rule corresponding to the current language is determined based on the phoneme variants that each letter of the current language takes in different character representation results.
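One way to map units 310 to 340 onto code is a small class with one method per unit, sketched below in Python. The class, its method names, the pluggable per-word encoder, and the rule that a risk name matches only when all of its word codes occur in the query are assumptions for illustration; `toy_encode` is a hypothetical stand-in for the real coding rule:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RiskAccountIdentifier:
    """Hypothetical mapping of units 310-340 onto methods; the encoder is pluggable."""
    encode_word: Callable[[str], str]
    risk_codes: Dict[str, List[str]] = field(default_factory=dict)

    def add_risk_name(self, name: str) -> None:
        self.risk_codes[name] = self.encode(self.acquire(name))

    def acquire(self, account_name: str) -> List[str]:     # unit 310 (acquisition)
        return [w.upper() for w in account_name.split()]

    def encode(self, words: List[str]) -> List[str]:       # unit 320 (encoding)
        return [self.encode_word(w) for w in words]

    def match(self, codes: List[str]) -> Dict[str, bool]:  # unit 330 (matching)
        # Assumed rule: all word codes of the risk name must occur in the query.
        return {name: all(c in codes for c in rcodes)
                for name, rcodes in self.risk_codes.items()}

    def identify(self, account_name: str) -> List[str]:    # unit 340 (identification)
        codes = self.encode(self.acquire(account_name))
        return [name for name, hit in self.match(codes).items() if hit]


def toy_encode(word: str) -> str:
    """Hypothetical stand-in encoder: keep only a word-initial vowel."""
    w = word.upper()
    return w[:1] + "".join(c for c in w[1:] if c not in "AEIOU")


ident = RiskAccountIdentifier(toy_encode)
ident.add_risk_name("Karl Berg")
print(ident.identify("KARL BORG"))  # ['Karl Berg']
print(ident.identify("Anna Berg"))  # []
```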
The risk account identification apparatus provided in the embodiments of this application acquires the character representation result of the account name of the account to be identified in the current language; encodes it based on the single-phoneme coding rule corresponding to the current language to obtain the voice code of the account name; matches this voice code against the voice codes of the risk account names; and determines the risk identification result of the account to be identified from the matching result. Because the single-phoneme coding rule of the current language is determined from the phoneme variants that each letter takes in different character representation results, converting account names into voice codes before comparing them allows the names to be identified phoneme by phoneme, from the standpoint of pronunciation. Misspellings of account names that arise from pronunciation in the current language can thus be detected, reducing the risk of missed detections, improving the accuracy of risk account identification, and meeting commercial banks' requirements for risk prevention and control.
In some embodiments, the encoding unit is configured to:
acquire a plurality of risk account names and determine the character representation result of each risk account name in the current language;
encode the character representation result of each risk account name based on the single-phoneme coding rule corresponding to the current language to obtain the voice code corresponding to each risk account name.
In some embodiments, the acquisition unit is configured to:
detect each character in the character representation result and determine that the character representation result contains a first letter that does not belong to the corresponding first language;
delete the first letter from the character representation result and convert each letter in the character representation result to uppercase or lowercase format.
In some embodiments, the acquisition unit is configured to:
determine the letter combination at the beginning of each word in the character representation result;
delete a second letter in the letter combination when the letter combination matches a preset letter combination;
wherein the second letter is not pronounced in the preset letter combination.
In some embodiments, the acquisition unit is configured to:
traverse each letter in each word segment and determine the current letter and the next letter following it;
delete the next letter from the word segment when the current letter is the same as the next letter.
In some embodiments, the acquisition unit is configured to:
determine the phoneme type and arrangement position of each letter in each word segment;
retain a letter in the word segment when its phoneme type is vowel and its arrangement position is the word head;
delete a letter from the word segment when its phoneme type is vowel and its arrangement position is the middle or the tail of the word.
In some embodiments, the encoding unit is configured to:
detect the letters in each word segment whose phoneme type is consonant;
when a letter matches a first-class consonant letter, use the character representation result of that letter as its voice code, where the character representation result of the phoneme variant of a first-class consonant letter is the same as the letter itself;
when a letter matches a second-class consonant letter, use the character representation result of the letter's phoneme variant as its voice code, where the character representation result of the phoneme variant of a second-class consonant letter differs from the letter itself;
when a letter matches a third-class consonant letter, determine the letter combination in which that letter appears in the word segment, determine the letter's phoneme variant within that combination, and use the character representation result of that phoneme variant as its voice code, where the phoneme variant of a third-class consonant letter is determined by the letter combination in which it appears in the word segment.
Fig. 4 is a schematic structural diagram of an electronic device provided in this application. As shown in Fig. 4, the electronic device may include a processor 410, a communications interface 420, a memory 430 and a communication bus 440, where the processor 410, the communications interface 420 and the memory 430 communicate with one another via the communication bus 440. The processor 410 may invoke logic instructions in the memory 430 to perform the following method:
acquiring a character representation result of an account name of an account to be identified in a current language; coding the character representation result based on the single-phoneme coding rule corresponding to the current language to obtain a voice code corresponding to the account name; matching the voice codes corresponding to the account names with the voice codes corresponding to the risk account names; determining a risk identification result of the account to be identified based on the voice code matching result; wherein, the single-phoneme coding rule corresponding to the current language is determined based on the corresponding phoneme variants of each letter in the current language in different character representation results.
In addition, the logic instructions in the memory may be implemented as software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied as a software product stored in a storage medium and comprising several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or some of the steps of the methods described in the embodiments of this application. The storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The processor in the electronic device provided by the embodiment of the present application may call the logic instruction in the memory to implement the above method, and the specific implementation manner of the processor is consistent with the implementation manner of the foregoing method, and may achieve the same beneficial effects, which are not described herein again.
This application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the methods provided in the above embodiments.
Its specific implementation is consistent with the foregoing method embodiments and achieves the same beneficial effects, which are not repeated here.
Embodiments of the present application provide a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (10)

1. A method for identifying a risk account, comprising:
acquiring a character representation result of an account name of an account to be identified in a current language;
coding the character representation result based on the single-phoneme coding rule corresponding to the current language to obtain a voice code corresponding to the account name;
matching the voice codes corresponding to the account names with the voice codes corresponding to the risk account names;
determining a risk identification result of the account to be identified based on the voice code matching result;
the single-phoneme coding rule corresponding to the current language is determined based on the corresponding phoneme variants of all the letters in the current language in different character representation results.
2. The method for identifying a risk account according to claim 1, wherein before matching the voice code corresponding to the account name with the voice code corresponding to each risk account name, the method comprises:
acquiring a plurality of risk account names, and determining character representation results of the risk account names in the current language;
and coding character representation results of the names of the risk accounts based on the single-phoneme coding rule corresponding to the current language to obtain voice codes corresponding to the names of the risk accounts.
3. The risk account identification method according to claim 1, wherein after the obtaining of the character representation result of the account name of the account to be identified in the current language, the method comprises:
detecting each character in the character representation result, and determining that the character representation result contains a first letter that does not belong to the corresponding first language;
deleting the first letter from the character representation result, and converting each letter in the character representation result into an uppercase or lowercase format.
4. The risk account identification method according to claim 3, wherein after deleting the first letter from the character representation result and converting each letter in the character representation result into an uppercase or lowercase format, the method comprises:
determining the letter combination located at the beginning of each word in the character representation result;
deleting a second letter in the letter combination when the letter combination matches a preset letter combination;
wherein the second letter is not pronounced in the preset letter combination.
5. The risk account identification method according to claim 4, wherein after deleting the second letter in the letter combination when the letter combination matches the preset letter combination, the method comprises:
traversing each letter in each word segment, and determining a current letter and the next letter following the current letter;
deleting the next letter from the word segment when the current letter is the same as the next letter.
6. The risk account identification method according to claim 5, wherein after deleting the next letter from the word segment when the current letter is the same as the next letter, the method comprises:
determining the phoneme type and arrangement position of each letter in each word segment;
retaining a letter in the word segment when the phoneme type of that letter is vowel and its arrangement position is the word head;
deleting a letter from the word segment when the phoneme type of that letter is vowel and its arrangement position is the middle or the tail of the word.
7. The risk account identification method according to claim 6, wherein encoding the character representation result based on the single-phoneme coding rule corresponding to the current language to obtain the voice code corresponding to the account name comprises:
detecting the letters in each word segment whose phoneme type is consonant;
when a letter matches a first-class consonant letter, using the character representation result of that letter as its voice code, wherein the character representation result of the phoneme variant of the first-class consonant letter is the same as the first-class consonant letter;
when a letter matches a second-class consonant letter, using the character representation result of the phoneme variant of that letter as its voice code, wherein the character representation result of the phoneme variant of the second-class consonant letter differs from the second-class consonant letter;
when a letter matches a third-class consonant letter, determining the letter combination of that letter in the word segment, determining the phoneme variant of that letter in the letter combination, and using the character representation result of the phoneme variant as the voice code of that letter, wherein the phoneme variant of the third-class consonant letter is determined by the letter combination in which the third-class consonant letter appears in the word segment.
8. A risk account identification device, comprising:
the acquisition unit is used for acquiring a character representation result of the account name of the account to be identified in the current language;
the coding unit is used for coding the character representation result based on the single-phoneme coding rule corresponding to the current language to obtain a voice code corresponding to the account name;
the matching unit is used for matching the voice codes corresponding to the account names with the voice codes corresponding to the risk account names;
the identification unit is used for determining a risk identification result of the account to be identified based on the voice code matching result;
the single-phoneme coding rule corresponding to the current language is determined based on the corresponding phoneme variants of all the letters in the current language in different character representation results.
9. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, the processor being arranged to perform the risk account identification method of any of claims 1 to 7 by means of the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored program, wherein the program when run performs the risk account identification method of any one of claims 1 to 7.
CN202311589649.7A 2023-11-24 2023-11-24 Risk account identification method, apparatus, electronic device and storage medium Pending CN117764694A (en)

Publications (1)

Publication Number Publication Date
CN117764694A true CN117764694A (en) 2024-03-26

Family

ID=90319084

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination