CN110516125A - Method, apparatus, device and readable storage medium for identifying abnormal character strings - Google Patents

Info

Publication number
CN110516125A
Authority
CN
China
Prior art keywords
character string
deep learning
feature vector
standardized
learning feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910802851.0A
Other languages
Chinese (zh)
Other versions
CN110516125B (en)
Inventor
陆青
姜敏华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rajax Network Technology Co Ltd
Original Assignee
Rajax Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rajax Network Technology Co Ltd
Priority to CN201910802851.0A
Publication of CN110516125A
Application granted
Publication of CN110516125B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification

Abstract

A method, apparatus, device and readable storage medium for identifying abnormal character strings. The method comprises: obtaining an original character string and converting it into a corresponding picture and a corresponding phonetic symbol string; inputting the original character string, the picture and the phonetic symbol string into a first deep learning model, a second deep learning model and a third deep learning model respectively, to obtain a corresponding first, second and third deep learning feature vector; determining, based on the first, second and third deep learning feature vectors, the standardized character string corresponding to the original character string; and matching the standardized character string against the character strings in a preset abnormal database, identifying any abnormal character string in the standardized character string, and outputting the recognition result. The scheme identifies abnormal character strings automatically, which improves recognition efficiency, accuracy and precision.

Description

Method, apparatus, device and readable storage medium for identifying abnormal character strings
Technical field
Embodiments of the present invention relate to the technical field of data processing, and in particular to a method, apparatus, device and readable storage medium for identifying abnormal character strings.
Background
Today people rely on the internet in daily life. Users generate text content in scenarios such as shopping, chatting, studying and working, and while writing they often enter abnormal content, whether deliberately or unintentionally. To limit the spread of such content, user input needs to be checked. Two methods are currently in general use: 1. manual identification; 2. regular-expression matching.
However, with the rapid development of technology, internet usage has risen sharply, and ever more manpower and time are needed to identify abnormal content. Relying on manual identification alone is costly and slow, and cannot meet the massive data-processing demands of internet services. Regular-expression matching compares the obtained text content for similarity with characters designated as abnormal, in order to identify abnormal text or symbols. This method, however, recognizes deformed characters poorly, and has difficulty identifying character strings that a user deliberately enters using deformed characters.
Summary of the invention
In view of this, embodiments of the present invention provide a method, apparatus, device and readable storage medium for identifying abnormal character strings, which can identify abnormal character strings automatically, improve the efficiency of abnormal character string identification, and improve the accuracy and precision of identification.
An embodiment of the present invention provides a method for identifying abnormal character strings, the method comprising:
obtaining an original character string; converting the original character string into a corresponding picture and a corresponding phonetic symbol string; inputting the original character string into a preset first deep learning model to obtain a first deep learning feature vector, inputting the picture into a preset second deep learning model to obtain a second deep learning feature vector, and inputting the phonetic symbol string into a preset third deep learning model to obtain a third deep learning feature vector; determining, based on the first, second and third deep learning feature vectors, the standardized character string corresponding to the original character string; matching the standardized character string against the character strings in a preset abnormal database to identify any abnormal character string in the standardized character string; and outputting the recognition result.
Further, determining the standardized character string corresponding to the original character string based on the first, second and third deep learning feature vectors comprises: fusing the first, second and third deep learning feature vectors to obtain a fusion feature vector; and inputting the fusion feature vector into a preset fourth deep learning model to obtain the standardized character string corresponding to the original character string.
Further, determining the standardized character string corresponding to the original character string based on the first, second and third deep learning feature vectors comprises: obtaining, based on the first, second and third deep learning feature vectors respectively, a first standardized character string, a second standardized character string and a third standardized character string corresponding to the original character string;
matching the standardized character string against the character strings in the preset abnormal database to identify any abnormal character string in the standardized character string then comprises: matching the first, second and third standardized character strings separately against the character strings in the preset abnormal database, and identifying any abnormal character string in the first, second and third standardized character strings.
Further, determining the standardized character string corresponding to the original character string based on the first, second and third deep learning feature vectors further comprises: fusing the first, second and third deep learning feature vectors to obtain a fusion feature vector; and inputting the fusion feature vector into a preset fourth deep learning model to obtain a fourth standardized character string corresponding to the original character string;
matching the standardized character string against the character strings in the preset abnormal database to identify any abnormal character string in the standardized character string then further comprises: matching the fourth standardized character string against the character strings in the preset abnormal database, and identifying any abnormal character string in the fourth standardized character string.
Further, fusing the first, second and third deep learning feature vectors comprises: concatenating the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector end to end.
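The end-to-end concatenation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `fuse_feature_vectors` and the sample values are invented for this example and do not come from the patent.

```python
def fuse_feature_vectors(first, second, third):
    """Join three feature vectors end to end into one fusion feature vector."""
    return list(first) + list(second) + list(third)


# Hypothetical feature vectors of dimensions N1 = 2, N2 = 1 and N3 = 3.
x = [0.1, 0.9]
y = [0.5]
z = [0.2, 0.3, 0.7]

fused = fuse_feature_vectors(x, y, z)
print(len(fused))  # N1 + N2 + N3
```

The fused vector has dimension N1 + N2 + N3 and keeps each source vector's values in their original order.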
Further, the first deep learning model includes a first recurrent neural network model, the second deep learning model includes a convolutional neural network model, and the third deep learning model includes a second recurrent neural network model.
Further, converting the original character string into a phonetic symbol string comprises: based on the dominant language type of the original character string, converting the original character string into the phonetic symbol string corresponding to that dominant language type.
An embodiment of the present invention also provides an apparatus for identifying abnormal character strings, the apparatus comprising: an original character string acquisition unit, adapted to obtain an original character string; a first original character string conversion unit, adapted to convert the original character string into a corresponding picture; a second original character string conversion unit, adapted to convert the original character string into a corresponding phonetic symbol string; a first deep learning unit, adapted to input the original character string into a preset first deep learning model to obtain a first deep learning feature vector; a second deep learning unit, adapted to input the picture into a preset second deep learning model to obtain a second deep learning feature vector; a third deep learning unit, adapted to input the phonetic symbol string into a preset third deep learning model to obtain a third deep learning feature vector; a standardized character string generation unit, adapted to determine, according to the first, second and third deep learning feature vectors, the standardized character string corresponding to the original character string; an abnormal character string recognition unit, adapted to match the standardized character string against the character strings in a preset abnormal database and identify any abnormal character string in the standardized character string; and a result output unit, adapted to output the recognition result.
An embodiment of the present invention also provides a data processing device comprising a memory and a processor, wherein the memory is adapted to store one or more computer instructions, and the processor, when running the computer instructions, performs the steps of the method of any of the above embodiments.
An embodiment of the present invention also provides a computer-readable storage medium on which computer instructions are stored, the computer instructions, when run, performing the steps of the method of any of the above embodiments.
With the scheme for identifying abnormal character strings according to the embodiments of the present invention, the obtained original character string is first converted into a corresponding picture and a corresponding phonetic symbol string. The original character string, the picture and the phonetic symbol string are then input into the first, second and third deep learning models respectively to obtain the corresponding first, second and third deep learning feature vectors. Based on these three feature vectors, the standardized character string corresponding to the original character string is determined and matched against the character strings in the preset abnormal database, so that any abnormal character string in the standardized character string can be identified. In this process, the original character string is converted into a picture and a phonetic symbol string, deep learning is performed on each to obtain the corresponding feature vectors, and the standardized character string is restored from feature vectors of multiple dimensions before abnormal character string identification is performed. This can greatly improve the recognition rate for deformed characters, and thus the accuracy and precision of abnormal character string identification. Moreover, the whole process is automatic and requires no manual participation or adjustment, so it improves the efficiency of abnormal character string identification and greatly reduces labor cost.
Further, the first, second and third deep learning feature vectors are fused to obtain a fusion feature vector, and the fusion feature vector is input into the fourth deep learning model to obtain the standardized character string corresponding to the original character string, which is then checked and the result output. By fusing the feature vectors corresponding to the original character string, the picture and the phonetic symbol string and applying a second round of deep learning, this scheme further strengthens the connection between the feature vectors and yields a more accurate standardized character string, improving the breadth and accuracy of abnormal character string identification and strengthening the ability to identify abnormal character strings.
Further, the standardized character strings corresponding to the first, second and third deep learning feature vectors and to the fourth deep learning feature vector can be determined separately, and abnormal character strings can be identified in the first, second, third and fourth standardized character strings at the same time. When an abnormal character string is present in at least one of these standardized character strings, a recognition result indicating that an abnormal character string exists is output. This multi-dimensional identification reduces the miss rate of abnormal character string identification.
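The "at least one" rule above amounts to a logical OR over the four standardized character strings. A minimal Python sketch follows, assuming plain substring matching against the database; the function names, the database entry and the candidate strings are all hypothetical.

```python
def contains_abnormal(standardized, abnormal_db):
    """Return True if any database entry occurs in the standardized string."""
    return any(entry in standardized for entry in abnormal_db)


def has_abnormal_in_any(candidates, abnormal_db):
    """Flag the input if at least one standardized candidate is abnormal."""
    return any(contains_abnormal(c, abnormal_db) for c in candidates)


# Hypothetical database entry and four hypothetical standardized candidates,
# of which only the second was restored correctly.
abnormal_db = {"123@qq.com"}
candidates = ["123 at qq.com", "123@qq.com", "l23@qq.com", "123@qq.c0m"]

print(has_abnormal_in_any(candidates, abnormal_db))
```

Because a single successful restoration is enough to raise the flag, adding candidates can only lower the miss rate, possibly at the cost of more false positives.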
Further, because the input original character string may contain various written languages, digits and even symbols, converting it into a phonetic symbol string based on its dominant language type before identification broadens the range of application of abnormal character string identification.
Brief description of the drawings
To explain the technical solutions of the embodiments of this specification or of the prior art more clearly, the drawings needed for describing them are briefly introduced below. Evidently, the drawings described below are only some embodiments of this specification; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for identifying abnormal character strings in an embodiment of the present invention.
Fig. 2 is a flowchart of a method for determining the standardized character string corresponding to an original character string in an embodiment of the present invention.
Fig. 3 is a flowchart of another method for identifying abnormal character strings in an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of an apparatus for identifying abnormal character strings in an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a standardized character string generation unit in an embodiment of the present invention.
Fig. 6 is a schematic structural diagram of an abnormal character string recognition unit in an embodiment of the present invention.
Fig. 7 is a schematic structural diagram of another standardized character string generation unit in an embodiment of the present invention.
Fig. 8 is a schematic diagram of converting an original character string into a picture in an embodiment of the present invention.
Detailed description of embodiments
As noted above, the volume of internet business data is now huge, and relying on manual identification alone is both costly and slow. Matching abnormal characters with regular expressions recognizes deformed characters poorly and cannot accurately identify all abnormal characters. For example, on an application service platform, some users register as new users with other mobile phone numbers in order to enjoy discounts, and then tell the service provider on the platform their real phone number in the remarks, written as a mixture of wrong characters, letters and random symbols. As another example, users advertise their own shops in product reviews, leaving personal contact details written as a mixture of wrong characters, letters and random symbols. Thus neither manual identification nor regular-expression matching can meet the data-processing needs of today's massive internet services.
To address these problems, an embodiment of the present invention provides a method for identifying abnormal character strings. The obtained original character string is first converted into a corresponding picture and a corresponding phonetic symbol string. The original character string, the picture and the phonetic symbol string are then input into the first, second and third deep learning models respectively to obtain the corresponding first, second and third deep learning feature vectors. Based on these feature vectors, the standardized character string corresponding to the original character string is determined and matched against the character strings in a preset abnormal database, so that any abnormal character string in the standardized character string can be identified.
To help those skilled in the art better understand the design, implementation and advantages of the embodiments of the present invention, specific application scenarios are described in detail below with reference to the drawings.
Referring to the flowchart in Fig. 1 of a method for identifying abnormal character strings in an embodiment of the present invention, abnormal character strings can be identified using the following steps:
S11: obtain the original character string.
In a specific implementation, the original character string may come from any internet platform that needs to identify abnormal character strings. The data format of the original character string is determined by the platform's system encoding, which may be any existing character set encoding, such as ASCII, GB2312, BIG5 or GB18030, or a custom character set encoding. Taking an e-commerce platform as an example, a user can enter text in the remarks field of an order or in the review interface, and the e-commerce platform can obtain the text entered by the user as the original character string.
S12: convert the original character string into a corresponding picture and a corresponding phonetic symbol string.
In a specific implementation, the original character string can be converted into the corresponding picture and phonetic symbol string in various ways.
In an embodiment of the present invention, the original character string can be converted into a corresponding black-and-white or color picture by means of encoding and decoding. For example, Base64 encoding and decoding is applied to the original character string to convert it into the corresponding picture.
For the phonetic symbol string, in an embodiment of the present invention, a phonetic lookup table can be preset in the database, and the original character string is converted into a phonetic symbol string by looking it up in that table. The phonetic lookup table may contain the mapping between any dominant language type and that language's phonetic symbols, for example the mapping between English letters and English phonetic symbols, between digits and English phonetic symbols, between Chinese characters and pinyin, or between symbols and phonetic symbols. The specific contents can be set according to the actual situation.
In addition, to simplify the conversion process and shorten conversion time, dedicated modules or tools for picture and phonetic symbol string conversion can be provided, or existing conversion tools can be used.
S13: input the original character string into the preset first deep learning model to obtain the first deep learning feature vector, input the picture into the preset second deep learning model to obtain the second deep learning feature vector, and input the phonetic symbol string into the preset third deep learning model to obtain the third deep learning feature vector.
In a specific implementation, the preset first, second and third deep learning models may each include one or more trained neural network models, and the type of model used can be selected and configured according to the characteristics of the converted data.
For example, the first deep learning model may include various models of the recurrent neural network (Recurrent Neural Network, RNN) family, used to process the lexical and semantic information of the original character string over a time sequence, thereby obtaining a first deep learning feature vector containing information such as vocabulary and semantics.
As another example, the second deep learning model may include various models of the convolutional neural network (Convolutional Neural Networks, CNN) family, used to process the feature information of each part of the picture, thereby obtaining a second deep learning feature vector containing character-related feature information. The feature information may include the shape information of text, symbols, letters, digits and the like.
As another example, the third deep learning model may include various models of the recurrent neural network family, used to process the pronunciation and semantic information of the phonetic symbol string over a time sequence, thereby obtaining a third deep learning feature vector containing information such as vocabulary and semantics.
S14: based on the first, second and third deep learning feature vectors, determine the standardized character string corresponding to the original character string.
In a specific implementation, reverse parsing of character shape can be performed on the first and second deep learning feature vectors, and reverse parsing of character pronunciation on the third deep learning feature vector, so as to determine the standardized character string corresponding to the original character string. For example, a standard character shape lookup table and a standard character pronunciation lookup table can be preset in the database; the first and second deep learning feature vectors are matched against the standard character shape lookup table, and the third deep learning feature vector is matched against the standard character pronunciation lookup table. Both tables can be set according to the actual situation.
S15: match the standardized character string against the character strings in the preset abnormal database, and identify any abnormal character string in the standardized character string.
In a specific implementation, the preset abnormal database can be set according to the actual situation. Because the standardized character string corresponding to the original character string has been obtained, matching against the character strings in the preset abnormal database is more convenient.
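The patent does not specify how the matching in S15 is carried out. As one plausible reading, the sketch below uses exact substring search and reports where each database entry occurs. The function name `find_abnormal_strings` and the sample data are assumptions made for illustration.

```python
def find_abnormal_strings(standardized, abnormal_db):
    """Return (start_index, entry) for every database entry found in the text."""
    hits = []
    for entry in abnormal_db:
        start = standardized.find(entry)
        while start != -1:
            hits.append((start, entry))
            # Continue searching after the current hit to catch repeats.
            start = standardized.find(entry, start + 1)
    return sorted(hits)


# Hypothetical database and standardized character string.
abnormal_db = {"123@qq.com"}
hits = find_abnormal_strings("contact 123@qq.com or 123@qq.com", abnormal_db)
print(hits)
```

Exact matching is sufficient here precisely because the deformed characters have already been normalized; matching the raw input would need one pattern per deformation.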
S16: output the recognition result.
In a specific implementation, if the recognition result is that an abnormal character string exists, then according to the preset configuration the user can be reminded, so as to avoid producing abnormal character strings; the recognition result can also be output to back-end monitoring personnel as an abnormality alert, so that they can discover the abnormality in time and carry out the corresponding handling.
With the method for identifying abnormal character strings of the above embodiment, the original character string is converted into a picture and a phonetic symbol string, deep learning is performed on each to obtain the corresponding feature vectors, and the standardized character string corresponding to the original character string is restored from the feature vectors of multiple dimensions (original character string, picture and phonetic symbol string) before abnormal character string identification is performed. This can greatly improve the recognition rate for deformed characters, and thus the accuracy and precision of abnormal character string identification. Moreover, the whole process is automatic and requires no manual participation or adjustment, so it improves the efficiency of abnormal character string identification and greatly reduces labor cost.
To help those skilled in the art better understand and implement the embodiments of the present invention, how abnormal character strings are identified is described in detail below through a specific application scenario.
In an embodiment of the present invention, an application service platform needs to identify abnormal content that users leave in original character strings, where the abnormal content is email-address data. Suppose a user enters the following character-deformed content in a comment or remark: "①贰三艾特qq。c0m" (read aloud, roughly "one two three at qq dot c0m"). The system encoding of the application service platform is ASCII, so the data for "①贰三艾特qq。c0m" in the corresponding ASCII hexadecimal code format is: "2460 8d30 4e09 827e 7279 0071 0071 3002 0063 0030 006d", with spaces as separators. This ASCII hexadecimal code data serves as the original character string.
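The hexadecimal sequence quoted above can be checked with a short Python sketch. One caveat: several of the values (for example 2460 and 8d30) lie outside the 7-bit ASCII range, so the sketch treats each four-digit group as a Unicode code point. That interpretation is an assumption on the editor's part, and the function names are hypothetical.

```python
def to_hex_codes(text):
    """Encode each character as its 4-digit hexadecimal code point."""
    return " ".join(f"{ord(ch):04x}" for ch in text)


def from_hex_codes(codes):
    """Decode a space-separated list of hexadecimal code points."""
    return "".join(chr(int(code, 16)) for code in codes.split())


codes = "2460 8d30 4e09 827e 7279 0071 0071 3002 0063 0030 006d"
text = from_hex_codes(codes)
print(text)                         # the deformed input string
print(to_hex_codes(text) == codes)  # the round trip is lossless
```

Decoding the sequence yields the deformed string ①贰三艾特qq。c0m, in which a circled digit, a Chinese numeral, a transliterated "at" and a full-width period stand in for the characters of an email address.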
Then, the original character string can be converted by decoding into a picture containing the content "①贰三艾特qq。c0m". In this embodiment, Base64 encoding and decoding is used to convert the original character string into the corresponding picture, shown as picture 80 in Fig. 8.
In addition, the original character string can be converted according to the preset phonetic lookup table into the corresponding phonetic symbol string, namely "yi er san ai te kju: kju: ju hao si: ling em".
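The table lookup that produces this phonetic symbol string can be sketched as follows. The table entries are inferred from the example itself (the characters from the hexadecimal data, the readings from the phonetic string); the name `PHONETIC_TABLE`, the `to_phonetic` function and the placeholder for unknown characters are assumptions, and a real table would cover far more characters.

```python
# Hypothetical phonetic lookup table covering only the characters
# that appear in the worked example.
PHONETIC_TABLE = {
    "①": "yi", "贰": "er", "三": "san", "艾": "ai", "特": "te",
    "q": "kju:", "。": "ju hao", "c": "si:", "0": "ling", "m": "em",
}


def to_phonetic(text, table, unknown="?"):
    """Map each character to its phonetic reading, separated by spaces."""
    return " ".join(table.get(ch, unknown) for ch in text)


phonetic = to_phonetic("①贰三艾特qq。c0m", PHONETIC_TABLE)
print(phonetic)
```

Note that the deformed digits map to the same readings as their standard forms (for example both ① and 1 would read "yi"), which is what lets the phonetic channel see through visual deformation.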
As noted above, the deep learning models used in step S13 can be neural network models chosen to suit the characteristics of the input data. In this embodiment, the first deep learning model may include a first recurrent neural network model, the second deep learning model may include a convolutional neural network model, and the third deep learning model may include a second recurrent neural network model.
After the above data processing, the original character string is input into the first deep learning model, which after recurrent neural network processing outputs an N1-dimensional first deep learning feature vector [Xi], where i = 1, 2, 3, ..., N1 and N1 is a natural number not less than 1. Xi denotes the maximum probability of the i-th output predicted from the original character string, and takes a value in [0, 1].
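To make the shape of this output concrete, the sketch below runs a tiny Elman-style recurrent network over the characters of a string and emits N1 sigmoid values, each in [0, 1]. It is purely illustrative: the weights are random rather than trained, and the architecture, the sizes and the name `rnn_feature_vector` are assumptions, not details taken from the patent.

```python
import math
import random


def rnn_feature_vector(text, hidden_size=8, n1=4, seed=0):
    """Run an untrained Elman-style RNN over `text`; return n1 values in [0, 1]."""
    rng = random.Random(seed)
    # Random weights stand in for a trained model (assumption).
    w_in = [rng.uniform(-1, 1) for _ in range(hidden_size)]
    w_hh = [[rng.uniform(-1, 1) for _ in range(hidden_size)]
            for _ in range(hidden_size)]
    w_out = [[rng.uniform(-1, 1) for _ in range(hidden_size)]
             for _ in range(n1)]

    h = [0.0] * hidden_size
    for ch in text:
        x = ord(ch) / 0x10FFFF  # scale the code point into [0, 1]
        # Recurrent step: the new state depends on the input and the old state.
        h = [
            math.tanh(w_in[j] * x
                      + sum(w_hh[j][k] * h[k] for k in range(hidden_size)))
            for j in range(hidden_size)
        ]
    # Sigmoid output layer keeps every component between 0 and 1.
    return [
        1.0 / (1.0 + math.exp(-sum(w * hk for w, hk in zip(row, h))))
        for row in w_out
    ]


vector = rnn_feature_vector("①贰三艾特qq。c0m")
print(len(vector), all(0.0 <= v <= 1.0 for v in vector))
```

The point of the sketch is only the interface: a variable-length string goes in, and a fixed-length vector of probabilities-like values comes out.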
It will be understood that, depending on the actual usage scenario, training with different training data yields first deep learning models with different functions. For example, if the first deep learning model is used to screen out interference data in the original character string that does not conform to grammar rules, training data with standard grammar can be obtained to train it. Once trained, the first deep learning model can apply grammar screening to the input data and then output an array of maximum probabilities of grammatical conformity predicted from the original character string, which serves as the first deep learning feature vector.
As noted above, the picture is input into the second deep learning model, which after convolutional neural network processing outputs an N2-dimensional second deep learning feature vector [Yi], where i = 1, 2, 3, ..., N2 and N2 is a natural number not less than 1. Yi denotes the maximum probability of the i-th output predicted from the picture, and takes a value in [0, 1].
It will be understood that, depending on the actual usage scenario, training with different training data yields second deep learning models with different functions. For example, if the second deep learning model is used to extract the character string in the picture, training data labeled with reference character strings can be obtained for training. Once trained, the second deep learning model can extract character strings from the input picture and then output an array of maximum probabilities for the character string predicted from the picture, which serves as the second deep learning feature vector.
As previously mentioned, the phonetic symbol string is inputted the third deep learning model, after convolutional neural networks are handled Export N3 dimension third deep learning feature vector [Zi], wherein i=1,2,3 ... N3, N3 are the natural number not less than 1;Zi table Show the maximum probability that i-th of output is predicted according to the phonetic symbol string, the numerical value of Zi is between [0,1].
It is understood that being trained according to actual use scene using different training datas, available difference The third deep learning model of function, for example, the third deep learning model is not for meeting in phonetic symbol string described in screening The interference data of phonetic symbol rule, then the training data of available mark phonetic symbol label is trained.After completing training, described the Three deep learning models can carry out phonetic symbol rule screening processing to the phonetic symbol string of input, then export pre- according to the phonetic symbol string Thus the phonetic symbol string maximum probability array of survey is used as third deep learning feature vector.
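As a rough illustration of how a vector of per-output maximum probabilities might be formed, the sketch below assumes that a model emits one row of raw scores per output position and that each row is normalized with a softmax. Both assumptions, and the names `softmax` and `max_probability_vector`, are hypothetical and are not taken from the text above.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the peak for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def max_probability_vector(score_rows: Sequence[Sequence[float]]) -> List[float]:
    """For each output position, keep the largest probability (a value in [0, 1])."""
    return [max(softmax(row)) for row in score_rows]


# Two output positions: one with 2 candidates, one with 4 candidates.
features = max_probability_vector([[0.0, 0.0], [1.0, 1.0, 1.0, 1.0]])
print(features)  # [0.5, 0.25]
```

Every entry of the resulting vector lies in [0, 1], which matches the range stated for Xi, Yi, and Zi.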
Then, according to the standard character shape comparison table and the standard character pronunciation comparison table preset in the database of the system hosting the application service platform, reverse parsing is performed on the first deep learning feature vector [Xi], the second deep learning feature vector [Yi], and the third deep learning feature vector [Zi] respectively, thereby obtaining the first standardized character string, the second standardized character string, and the third standardized character string corresponding to the original character string.
The standard character shape comparison table may include a mapping between the standard shape of at least one kind of character (text, symbols, letters, digits, and so on) and non-negative numbers not greater than 1. The standard character pronunciation comparison table may include a mapping between the standard pronunciation of at least one of text, symbols, letters, digits, and so on and non-negative numbers not greater than 1. In addition, the standard character shape comparison table may also include a mapping between standard radical shapes and non-negative numbers not greater than 1, and the standard character pronunciation comparison table may also include a mapping between fuzzy pronunciations of text, symbols, letters, and digits and non-negative numbers not greater than 1.
The reverse parsing proceeds as follows:
1) The first deep learning feature vector [Xi] is matched against the standard character shape comparison table. This identifies the deformed digit "1." as similar in shape to the digit "1", and the punctuation mark "." as similar in shape to the punctuation mark " ". The first standardized character string obtained is therefore "1 two three Ai Te qq.c0m".
2) The second deep learning feature vector [Yi] is matched against the standard character shape comparison table. This identifies the deformed digit "1." as similar in shape to the digit "1", the digit "0" as similar in shape to the letter "o", and the punctuation mark "." as similar in shape to the punctuation mark " "; it can even identify the deformed character "three" as similar in shape to the digit "3". The second standardized character string obtained is "1 two 3 Ai Te qq.com".
3) The third deep learning feature vector [Zi] is matched against the standard character pronunciation comparison table. This identifies "." (ju hao) as having the same pronunciation as the punctuation mark " " (ju hao), the character "two" (er) as having the same pronunciation as the digit "2" (er), the character "three" (san) as having the same pronunciation as the digit "3" (san), and the text "Ai Te" (ai te) as having the same pronunciation as the symbol "@" (ai te). The third standardized character string obtained is "123@qq.c0m".
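One possible reading of the table-matching step above can be sketched as a nearest-code lookup. The sketch assumes that each standard character is assigned a code in [0, 1] and that each feature value is decoded to the character with the closest code. The table contents, the tolerance, and the name `reverse_parse` are hypothetical and are not given in the text.

```python
from typing import Dict, Sequence

# Hypothetical comparison table: standard character -> code in [0, 1].
SHAPE_TABLE: Dict[str, float] = {"1": 0.10, "2": 0.20, "3": 0.30, "@": 0.40, ".": 0.50}


def reverse_parse(features: Sequence[float], table: Dict[str, float],
                  tolerance: float = 0.02) -> str:
    """Decode each feature value to the standard character with the nearest code."""
    decoded = []
    for value in features:
        char, code = min(table.items(), key=lambda item: abs(item[1] - value))
        # Emit a placeholder when no table entry is close enough.
        decoded.append(char if abs(code - value) <= tolerance else "?")
    return "".join(decoded)


print(reverse_parse([0.10, 0.21, 0.29, 0.40], SHAPE_TABLE))  # 123@
```

A pronunciation table could be handled by the same function, since only the table contents differ.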
The first standardized character string, the second standardized character string, and the third standardized character string are then each matched against the character strings in the preset exception database to identify any unusual character string they contain. When an unusual character string is identified in at least one of the first, second, and third standardized character strings, a recognition result indicating that an unusual character string exists is output.
For example, after the first standardized character string and the third standardized character string are each matched against the character strings in the preset exception database, the mailbox-related unusual character string "qq.com" is not identified. However, when the second standardized character string "1 two 3 Ai Te qq.com" is matched against the character strings in the preset exception database, the mailbox-related unusual character string "qq.com" is identified.
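The matching rule above can be sketched as a substring search with an "at least one candidate matches" decision. The sketch assumes the exception database is a plain set of strings and that matching means substring containment. Neither detail is specified in the text, and the function names are hypothetical.

```python
from typing import Iterable, List


def find_unusual(standardized: str, database: Iterable[str]) -> List[str]:
    """Return the database entries that occur inside the standardized string."""
    return sorted(entry for entry in database if entry in standardized)


def has_unusual(candidates: Iterable[str], database: Iterable[str]) -> bool:
    """True when at least one standardized string contains a database entry."""
    entries = list(database)
    return any(find_unusual(candidate, entries) for candidate in candidates)


first = "1 two three Ai Te qq.c0m"
second = "1 two 3 Ai Te qq.com"
third = "123@qq.c0m"

print(find_unusual(first, {"qq.com"}))                  # []
print(find_unusual(second, {"qq.com"}))                 # ['qq.com']
print(has_unusual([first, second, third], {"qq.com"}))  # True
```

With the database entry changed to "@qq.com", none of the three candidates matches, because no single candidate restores both the "@" and the "o".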
With the above scheme, the standardized character strings corresponding to the original character string are restored from multiple feature vectors before unusual character string identification is performed, so that deformed unusual character strings are identified from the three aspects of character, picture, and phonetic symbol respectively.
In specific implementations, identifying deformed unusual character strings separately from the three aspects of character, picture, and phonetic symbol may still fail to identify an unusual character string or may identify a wrong one. For example, if the mailbox-related unusual character string is set to "@qq.com", none of the first to third standardized character strings allows the mailbox-related unusual character string to be identified. For this reason, step S14 can be further extended and optimized to determine the standardized character string. A detailed description is given below through specific embodiments.
In an embodiment of the present invention, referring to the flowchart shown in Fig. 2 of a method for determining the standardized character string corresponding to the original character string, the method may specifically include the following steps:
S21, fuse the first deep learning feature vector, the second deep learning feature vector, and the third deep learning feature vector to obtain a fusion feature vector.
In specific implementations, the first deep learning feature vector, the second deep learning feature vector, and the third deep learning feature vector may be fused in at least one of the following ways:
1) Concatenate the first deep learning feature vector, the second deep learning feature vector, and the third deep learning feature vector end to end to obtain an (N1+N2+N3)-dimensional fusion feature vector [Xi, Yi, Zi].
2) Randomly combine the first deep learning feature vector, the second deep learning feature vector, and the third deep learning feature vector to obtain an (N1+N2+N3)-dimensional fusion feature vector [Ri], where Ri belongs to the set {Xi, Yi, Zi}.
3) Transpose the first deep learning feature vector, the second deep learning feature vector, and the third deep learning feature vector respectively and combine them to obtain an (N1+N2+N3)-dimensional fusion feature vector [Xi^T, Yi^T, Zi^T] or [Hi], where Hi belongs to the set {Xi^T, Yi^T, Zi^T}.
It can be understood that the actual fusion method is not limited to those listed above; the first deep learning feature vector, the second deep learning feature vector, and the third deep learning feature vector may also be fused along other dimensions.
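The first two fusion options above can be sketched with plain Python lists. The sketch assumes that "random combination" means a random permutation of the concatenated elements, which is one interpretation only. The function names and the `seed` parameter are hypothetical, and the transposition variant is not shown.

```python
import random
from typing import List, Optional, Sequence


def fuse_concat(x: Sequence[float], y: Sequence[float], z: Sequence[float]) -> List[float]:
    """End-to-end concatenation: the result has N1 + N2 + N3 elements."""
    return list(x) + list(y) + list(z)


def fuse_random(x: Sequence[float], y: Sequence[float], z: Sequence[float],
                seed: Optional[int] = None) -> List[float]:
    """Random combination: the same elements as the concatenation, in shuffled order."""
    fused = fuse_concat(x, y, z)
    random.Random(seed).shuffle(fused)  # a local generator keeps the result reproducible
    return fused


x, y, z = [0.9, 0.8], [0.7], [0.6, 0.5, 0.4]
print(fuse_concat(x, y, z))          # [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
print(len(fuse_random(x, y, z, 0)))  # 6
```

Passing a fixed `seed` makes the shuffled order repeatable, which matters if the same ordering must be used at training time and at inference time.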
S22, input the fusion feature vector into the preset fourth deep learning model to obtain the fourth standardized character string corresponding to the original character string.
The preset fourth deep learning model may use one or more trained neural network models, for example the various models of the RNN family and the multi-layer perceptron (Multi Layer Perceptron, MLP). An RNN model can speed up the acquisition of feature vectors, and an MLP can improve the output accuracy of feature vectors.
In specific implementations, the training set of the fourth deep learning model may include training data pairing various deformed character shapes with the corresponding standard character shapes, and training data pairing various deformed character pronunciations with the corresponding standard character pronunciations. After the fourth deep learning model has been trained on the training set, the fusion feature vector is input into the trained fourth deep learning model, and the corresponding fourth standardized character string "123@qq.com" is obtained through shape matching and pronunciation matching. The fourth standardized character string can then be matched against the character strings in the preset exception database, so that the unusual character string "@qq.com" is identified and the recognition result is output.
In combination with the above embodiments, Fig. 3 shows a flowchart of another method for identifying an unusual character string in an embodiment of the present invention. The method steps are as follows:
S31, obtain the original character string.
S32-1, convert the original character string into a picture.
S32-2, convert the original character string into a phonetic symbol string.
S33-1, input the original character string into the first deep learning model.
S33-2, input the picture into the second deep learning model.
S33-3, input the phonetic symbol string into the third deep learning model.
S34-1, obtain the first deep learning feature vector.
S34-2, obtain the second deep learning feature vector.
S34-3, obtain the third deep learning feature vector.
S35, fuse the first to third deep learning feature vectors.
S36, input the fused first to third deep learning feature vectors into the fourth deep learning model.
S37, obtain the fourth standardized character string after processing by the fourth deep learning model.
S38, identify the unusual character string in the fourth standardized character string.
S39, output the recognition result.
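The flow S31 to S39 can be sketched as a single function that wires the stages together. In this sketch every model and converter is passed in as a callable, fusion is done by end-to-end concatenation, and matching is substring containment. All names are hypothetical, and the stub components in the usage example stand in for trained models that the sketch does not provide.

```python
from typing import Callable, Iterable, List, Sequence


def identify_unusual(
    original: str,                                   # S31: the original character string
    to_picture: Callable[[str], object],
    to_phonetic: Callable[[str], str],
    model1: Callable[[str], Sequence[float]],
    model2: Callable[[object], Sequence[float]],
    model3: Callable[[str], Sequence[float]],
    model4: Callable[[Sequence[float]], str],
    exception_db: Iterable[str],
) -> List[str]:
    picture = to_picture(original)        # S32-1
    phonetic = to_phonetic(original)      # S32-2
    x = model1(original)                  # S33-1, S34-1
    y = model2(picture)                   # S33-2, S34-2
    z = model3(phonetic)                  # S33-3, S34-3
    fused = list(x) + list(y) + list(z)   # S35: end-to-end concatenation
    standardized = model4(fused)          # S36, S37
    # S38: substring match against the exception database; the caller outputs it (S39).
    return [entry for entry in exception_db if entry in standardized]


# Stub components, for illustration only.
hits = identify_unusual(
    "raw input",
    to_picture=lambda s: s.encode(),
    to_phonetic=lambda s: s.lower(),
    model1=lambda s: [0.9],
    model2=lambda p: [0.8, 0.7],
    model3=lambda s: [0.6],
    model4=lambda v: "123@qq.com",
    exception_db=["@qq.com"],
)
print(hits)  # ['@qq.com']
```

Passing the stages as callables keeps the control flow testable without committing to any particular deep learning framework.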
With the above scheme, fusing the feature vectors corresponding to the original character string, the picture, and the phonetic symbol string and applying a second round of deep learning further strengthens the connection between the feature vectors, yields a more accurate standardized character string, improves the recognition range and accuracy for unusual character strings, and enhances the ability to identify unusual character strings.
In specific implementations, step S15 can also be further extended and optimized to determine the standardized character string. A detailed description is given below through specific embodiments.
In an embodiment of the present invention, the first standardized character string, the second standardized character string, the third standardized character string, and the fourth standardized character string may each be matched against the character strings in the preset exception database. As long as an unusual character string is identified in at least one of the first, second, third, and fourth standardized character strings, a recognition result indicating that an unusual character string exists is output. This achieves multi-dimensional identification and can reduce the miss rate of unusual character string identification.
In specific implementations, the input original character string may contain various written languages, digits, and even symbols. Therefore, when the original character string is converted into a phonetic symbol string, the conversion is based on the principal language type of the original character string: the original character string is converted into the corresponding phonetic symbol string before identification, which broadens the range of application of unusual character string identification.
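The text does not say how the principal language type is determined. As a minimal sketch under that caveat, the snippet below guesses the dominant script by counting the first word of each alphabetic character's Unicode name (for example "CJK" or "LATIN"). The function name `principal_script` and this heuristic are assumptions, and the choice of a phonetic converter for the detected script is left out.

```python
import unicodedata
from collections import Counter
from typing import Optional


def principal_script(text: str) -> Optional[str]:
    """Return the most frequent script label among alphabetic characters, or None."""
    counts: Counter = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # digits, punctuation, and spaces do not vote
        name = unicodedata.name(ch, "")
        if name:
            counts[name.split(" ")[0]] += 1  # e.g. "CJK", "LATIN", "CYRILLIC"
    return counts.most_common(1)[0][0] if counts else None


print(principal_script("hello \u4f60\u597d"))  # LATIN
print(principal_script("\u52a0\u6211\u5fae\u4fe1 abc"))  # CJK
```

A string made up only of digits or symbols yields `None`, so a caller would need a fallback choice for that case.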
An embodiment of the present invention further provides a device for identifying an unusual character string corresponding to the above method for identifying an unusual character string. To enable those skilled in the art to better understand and implement the embodiments of the present invention, a detailed description is given below through specific embodiments with reference to the drawings.
Referring to the structural schematic diagram shown in Fig. 4 of a device for identifying an unusual character string in an embodiment of the present invention, the device 400 for identifying an unusual character string may include:
an original character string acquiring unit 401, adapted to obtain the original character string;
a first original character string converting unit 402, adapted to convert the original character string into a corresponding picture;
a second original character string converting unit 403, adapted to convert the original character string into a corresponding phonetic symbol string;
a first deep learning unit 404, adapted to input the original character string into the preset first deep learning model to obtain the first deep learning feature vector;
a second deep learning unit 405, adapted to input the picture into the preset second deep learning model to obtain the second deep learning feature vector;
a third deep learning unit 406, adapted to input the phonetic symbol string into the preset third deep learning model to obtain the third deep learning feature vector;
a standardized character string generation unit 407, adapted to determine the standardized character string corresponding to the original character string according to the first deep learning feature vector, the second deep learning feature vector, and the third deep learning feature vector;
an unusual character string recognition unit 408, adapted to match the standardized character string against the character strings in the preset exception database and identify the unusual character string in the standardized character string;
a result output unit 409, adapted to output the recognition result.
With the above scheme, the original character string is converted into a picture and a phonetic symbol string, deep learning is then performed on each to obtain the corresponding feature vectors, the standardized character string corresponding to the original character string is restored from the feature vectors of multiple dimensions, and unusual character string identification is then performed. This greatly improves the recognition rate of deformed characters and thus the accuracy and precision of unusual character string identification. Moreover, the entire identification process requires no manual participation or adjustment and is performed automatically, which improves the efficiency of unusual character string identification and greatly reduces labor cost.
In an embodiment of the present invention, as shown in Fig. 5, the standardized character string generation unit 407 may include:
a first standardized character string generation subunit 501, adapted to obtain the first standardized character string corresponding to the original character string according to the first deep learning feature vector;
a second standardized character string generation subunit 502, adapted to obtain the second standardized character string corresponding to the original character string according to the second deep learning feature vector;
a third standardized character string generation subunit 503, adapted to obtain the third standardized character string corresponding to the original character string according to the third deep learning feature vector.
As shown in Fig. 6, the unusual character string recognition unit 408 may include:
a first unusual character string identification subunit 601, adapted to match the first standardized character string against the character strings in the preset exception database and identify the unusual character string in the first standardized character string;
a second unusual character string identification subunit 602, adapted to match the second standardized character string against the character strings in the preset exception database and identify the unusual character string in the second standardized character string;
a third unusual character string identification subunit 603, adapted to match the third standardized character string against the character strings in the preset exception database and identify the unusual character string in the third standardized character string.
In specific implementations, the device 400 can also be further extended and optimized to determine the standardized character string. A detailed description is given below through specific embodiments.
In an embodiment of the present invention, the feature vectors corresponding to the original character string, the picture, and the phonetic symbol string may be fused and subjected to a second round of deep learning, further strengthening the connection between the feature vectors. This is described further with reference to Fig. 4 and Fig. 7. As shown in Fig. 7, the standardized character string generation unit 407 may include:
a feature vector fusion subunit 701, adapted to fuse the first deep learning feature vector, the second deep learning feature vector, and the third deep learning feature vector to obtain a fusion feature vector;
a deep learning subunit 702, adapted to input the fusion feature vector into the preset fourth deep learning model and determine the fourth standardized character string corresponding to the original character string.
The unusual character string recognition unit 408 can then match the fourth standardized character string against the character strings in the preset exception database and identify the unusual character string in the standardized character string. Finally, the result output unit 409 outputs the recognition result.
With the above scheme, fusing the feature vectors corresponding to the original character string, the picture, and the phonetic symbol string and applying a second round of deep learning further strengthens the connection between the feature vectors, yields a more accurate standardized character string, improves the recognition range and accuracy for unusual character strings, and enhances the ability to identify unusual character strings.
In yet another embodiment of the present invention, the first standardized character string, the second standardized character string, the third standardized character string, and the fourth standardized character string may each be identified separately. This is described further with reference to Fig. 4, Fig. 5, and Fig. 6.
As shown in Fig. 5, in addition to the first standardized character string generation subunit 501, the second standardized character string generation subunit 502, and the third standardized character string generation subunit 503, the standardized character string generation unit 407 may further include:
a feature vector fusion subunit 701, adapted to fuse the first deep learning feature vector, the second deep learning feature vector, and the third deep learning feature vector to obtain a fusion feature vector;
a deep learning subunit 702, adapted to input the fusion feature vector into the preset fourth deep learning model and determine the fourth standardized character string corresponding to the original character string.
As shown in Fig. 6, in addition to the first unusual character string identification subunit 601, the second unusual character string identification subunit 602, and the third unusual character string identification subunit 603, the unusual character string recognition unit 408 may further include:
a fourth unusual character string identification subunit 604, adapted to match the fourth standardized character string against the character strings in the preset exception database and identify the unusual character string in the fourth standardized character string.
In specific implementations, the first standardized character string, the second standardized character string, the third standardized character string, and the fourth standardized character string are each matched against the character strings in the preset exception database. As long as an unusual character string is identified in at least one of the first, second, third, and fourth standardized character strings, a recognition result indicating that an unusual character string exists is output. This achieves multi-dimensional identification and can reduce the miss rate of unusual character string identification.
In specific implementations, the method of fusing the first deep learning feature vector, the second deep learning feature vector, and the third deep learning feature vector may include at least one of the following:
1) Concatenate the first deep learning feature vector, the second deep learning feature vector, and the third deep learning feature vector end to end.
2) Randomly combine the first deep learning feature vector, the second deep learning feature vector, and the third deep learning feature vector.
3) Transpose the first deep learning feature vector, the second deep learning feature vector, and the third deep learning feature vector respectively and combine them.
It can be understood that the actual fusion method is not limited to those listed above; the first deep learning feature vector, the second deep learning feature vector, and the third deep learning feature vector may also be processed along other dimensions.
In specific implementations, the preset first to fourth deep learning models may each be obtained by training one or more neural network models. The first deep learning model may include a first recurrent neural network model, the second deep learning model may include a convolutional neural network model, the third deep learning model may include a second recurrent neural network model, and the preset fourth deep learning model may include a recurrent neural network model and a convolutional neural network model.
In specific implementations, the input original character string may contain various written languages, digits, and even symbols. Therefore, when the original character string is converted into a phonetic symbol string, the second original character string converting unit converts the original character string, based on its principal language type, into the phonetic symbol string corresponding to that principal language type before identification, which broadens the range of application of unusual character string identification.
An embodiment of the present invention further provides a data processing device including a memory and a processor. The memory stores computer instructions executable on the processor, and the processor, when running the computer instructions, can perform the steps of the method for identifying an unusual character string described in any of the above embodiments of the present invention. For the specific implementation of the method performed when the computer instructions run, reference may be made to the steps of the method for identifying an unusual character string in the above embodiments, which are not repeated here.
The data processing device may be a handheld terminal such as a mobile phone, a tablet computer, a personal desktop computer, or the like.
An embodiment of the present invention further provides a computer-readable storage medium storing computer instructions which, when run, can perform the steps of the method of any of the above embodiments of the present invention.
The computer-readable storage medium may be any appropriate readable storage medium such as an optical disc, a mechanical hard disk, or a solid-state drive. For the method for identifying an unusual character string performed by the instructions stored on the computer-readable storage medium, reference may be made to the above embodiments of the method for identifying an unusual character string, which are not repeated here.
In summary, an embodiment of the present invention discloses embodiment A1, a method for identifying an unusual character string, comprising:
obtaining an original character string;
converting the original character string into a corresponding picture and a corresponding phonetic symbol string respectively;
inputting the original character string into a preset first deep learning model to obtain a first deep learning feature vector, inputting the picture into a preset second deep learning model to obtain a second deep learning feature vector, and inputting the phonetic symbol string into a preset third deep learning model to obtain a third deep learning feature vector;
determining, based on the first deep learning feature vector, the second deep learning feature vector, and the third deep learning feature vector, the standardized character string corresponding to the original character string;
matching the standardized character string against the character strings in a preset exception database and identifying the unusual character string in the standardized character string;
outputting the recognition result.
An embodiment of the present invention discloses embodiment A2, the method for identifying an unusual character string according to embodiment A1, wherein determining, based on the first deep learning feature vector, the second deep learning feature vector, and the third deep learning feature vector, the standardized character string corresponding to the original character string comprises:
fusing the first deep learning feature vector, the second deep learning feature vector, and the third deep learning feature vector to obtain a fusion feature vector;
inputting the fusion feature vector into a preset fourth deep learning model to obtain the standardized character string corresponding to the original character string.
An embodiment of the present invention discloses embodiment A3, the method for identifying an unusual character string according to embodiment A1, wherein determining, based on the first deep learning feature vector, the second deep learning feature vector, and the third deep learning feature vector, the standardized character string corresponding to the original character string comprises:
obtaining, based on the first deep learning feature vector, the second deep learning feature vector, and the third deep learning feature vector respectively, the first standardized character string, the second standardized character string, and the third standardized character string corresponding to the original character string;
and wherein matching the standardized character string against the character strings in the preset exception database and identifying the unusual character string in the standardized character string comprises:
matching the first standardized character string, the second standardized character string, and the third standardized character string respectively against the character strings in the preset exception database, and identifying the unusual character string in the first standardized character string, the second standardized character string, and the third standardized character string.
An embodiment of the present invention discloses embodiment A4, the method for identifying an unusual character string according to embodiment A3, wherein determining, based on the first deep learning feature vector, the second deep learning feature vector, and the third deep learning feature vector, the standardized character string corresponding to the original character string further comprises:
fusing the first deep learning feature vector, the second deep learning feature vector, and the third deep learning feature vector to obtain a fusion feature vector;
inputting the fusion feature vector into a preset fourth deep learning model to obtain the fourth standardized character string corresponding to the original character string;
and wherein matching the standardized character string against the character strings in the preset exception database and identifying the unusual character string in the standardized character string further comprises:
matching the fourth standardized character string against the character strings in the preset exception database and identifying the unusual character string in the fourth standardized character string.
An embodiment of the present invention discloses embodiment A5, the method for identifying an unusual character string according to embodiment A2 or embodiment A4, wherein fusing the first deep learning feature vector, the second deep learning feature vector, and the third deep learning feature vector comprises:
concatenating the first deep learning feature vector, the second deep learning feature vector, and the third deep learning feature vector end to end.
An embodiment of the present invention discloses embodiment A6, the method for identifying an unusual character string according to embodiment A1, wherein the first deep learning model includes a first recurrent neural network model, the second deep learning model includes a convolutional neural network model, and the third deep learning model includes a second recurrent neural network model.
An embodiment of the present invention discloses embodiment A7, the method for identifying an unusual character string according to any one of embodiments A1 to A4 or embodiment A6, wherein converting the original character string into a phonetic symbol string comprises:
converting, based on the principal language type of the original character string, the original character string into the phonetic symbol string corresponding to the principal language type.
The embodiment of the invention discloses B1 embodiment, a kind of device identifying unusual character string, comprising:
Original character string acquiring unit is suitable for obtaining original character string;
First original character string converting unit, suitable for the original character string is converted to corresponding picture;
Second original character string converting unit, suitable for the original character string is converted to corresponding phonetic symbol string;
First deep learning unit is suitable for inputting the original character string in preset first deep learning model, obtain Obtain the first deep learning feature vector;
Second deep learning unit is suitable for inputting the picture in preset second deep learning model, obtains second Deep learning feature vector;
Third deep learning unit is suitable for inputting the phonetic symbol string in preset third deep learning model, obtains the Three deep learning feature vectors;
Standardized character string generation unit is suitable for special according to the first deep learning feature vector, the second deep learning Vector sum third deep learning feature vector is levied, determines the corresponding standardized character string of the original character string;
Unusual character string recognition unit, suitable for by the character string in the standardized character string and preset exception database It is matched, identifies the unusual character string in the standardized character string;
As a result output unit is suitable for output recognition result.
An embodiment of the invention discloses embodiment B2: the device for identifying an abnormal character string according to embodiment B1, wherein the standardized character string generation unit comprises:
a feature vector fusion subunit, adapted to fuse the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector to obtain a fused feature vector;
a deep learning subunit, adapted to input the fused feature vector into a preset fourth deep learning model to determine the standardized character string corresponding to the original character string.
An embodiment of the invention discloses embodiment B3: the device for identifying an abnormal character string according to embodiment B1, wherein the standardized character string generation unit comprises:
a first standardized character string generation subunit, adapted to obtain a first standardized character string corresponding to the original character string according to the first deep learning feature vector;
a second standardized character string generation subunit, adapted to obtain a second standardized character string corresponding to the original character string according to the second deep learning feature vector;
a third standardized character string generation subunit, adapted to obtain a third standardized character string corresponding to the original character string according to the third deep learning feature vector;
and the abnormal character string recognition unit comprises:
a first abnormal character string recognition subunit, adapted to match the first standardized character string against the character strings in the preset exception database and to identify the abnormal character string in the first standardized character string;
a second abnormal character string recognition subunit, adapted to match the second standardized character string against the character strings in the preset exception database and to identify the abnormal character string in the second standardized character string;
a third abnormal character string recognition subunit, adapted to match the third standardized character string against the character strings in the preset exception database and to identify the abnormal character string in the third standardized character string.
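A rough sketch of this per-branch matching is given below. The function names, the dictionary keyed by branch name, and the use of substring search as the matching rule are assumptions for illustration; the sample strings are invented.

```python
from typing import Dict, Iterable, List


def match_abnormal(standardized: str, exception_db: Iterable[str]) -> List[str]:
    """Return the database entries found in one standardized string.

    Assumption: matching is plain substring search.
    """
    return [entry for entry in exception_db if entry and entry in standardized]


def identify_per_branch(
    candidates: Dict[str, str], exception_db: Iterable[str]
) -> Dict[str, List[str]]:
    """Match each branch's standardized string against the same database."""
    db = list(exception_db)
    return {name: match_abnormal(text, db) for name, text in candidates.items()}


# Invented example: one standardized string per feature-vector branch.
candidates = {
    "text": "add me",
    "picture": "add weixin",
    "phonetic": "jia weixin",
}
result = identify_per_branch(candidates, ["weixin", "qq"])
print(result)  # {'text': [], 'picture': ['weixin'], 'phonetic': ['weixin']}
```

Keeping the results separate per branch shows which representation exposed the abnormal string; how the branch results are combined into the final recognition result is left open here.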
An embodiment of the invention discloses embodiment B4: the device for identifying an abnormal character string according to embodiment B3, wherein the standardized character string generation unit further comprises:
a feature vector fusion subunit, adapted to fuse the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector to obtain a fused feature vector;
a deep learning subunit, adapted to input the fused feature vector into a preset fourth deep learning model to determine a fourth standardized character string corresponding to the original character string;
and the abnormal character string recognition unit further comprises:
a fourth abnormal character string recognition subunit, adapted to match the fourth standardized character string against the character strings in the preset exception database and to identify the abnormal character string in the fourth standardized character string.
An embodiment of the invention discloses embodiment B5: the device for identifying an abnormal character string according to embodiment B2 or B4, wherein the feature vector fusion subunit is adapted to concatenate the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector end to end.
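End-to-end concatenation can be illustrated in a few lines. The function name and the sample vectors are hypothetical; the only property taken from the text is that the three vectors are joined one after another in order.

```python
from typing import List, Sequence


def fuse_end_to_end(*vectors: Sequence[float]) -> List[float]:
    """Concatenate feature vectors in the given order into one fused vector."""
    fused: List[float] = []
    for vector in vectors:
        fused.extend(vector)
    return fused


v1 = [0.1, 0.2]        # stand-in for the first deep learning feature vector
v2 = [0.3]             # stand-in for the second
v3 = [0.4, 0.5, 0.6]   # stand-in for the third
fused = fuse_end_to_end(v1, v2, v3)
print(fused)           # [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
```

The fused vector's length is the sum of the three input lengths, so the input layer of the fourth model would have to be sized accordingly.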
An embodiment of the invention discloses embodiment B6: the device for identifying an abnormal character string according to embodiment B1, wherein the first deep learning model comprises a first recurrent neural network model, the second deep learning model comprises a convolutional neural network model, and the third deep learning model comprises a second recurrent neural network model.
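To make the two model families concrete, the following toy sketch shows how a recurrent model can reduce a sequence to a fixed-length feature vector and how a convolutional model can do the same for a bitmap. The weights, sizes, kernels and function names are arbitrary assumptions; nothing here is trained, and it does not reproduce the models of the patent.

```python
import math
from typing import List, Sequence


def rnn_features(inputs: Sequence[float], hidden_size: int = 4) -> List[float]:
    """Elman-style recurrence h_t = tanh(W_x * x_t + W_h * h_{t-1}).

    Toy fixed weights: W_x[i] = 0.5 * (i + 1) / hidden_size, W_h all 0.1.
    Returns the final hidden state as the feature vector.
    """
    h = [0.0] * hidden_size
    for x in inputs:
        prev_sum = sum(h)
        h = [
            math.tanh(0.5 * x * (i + 1) / hidden_size + 0.1 * prev_sum)
            for i in range(hidden_size)
        ]
    return h


def cnn_features(
    bitmap: Sequence[Sequence[float]],
    kernels: Sequence[Sequence[Sequence[float]]],
) -> List[float]:
    """Valid 2-D cross-correlation, ReLU, then global max pooling per kernel."""
    rows, cols = len(bitmap), len(bitmap[0])
    features: List[float] = []
    for kernel in kernels:
        k_rows, k_cols = len(kernel), len(kernel[0])
        best = 0.0  # starting at 0.0 has the same effect as ReLU before max
        for r in range(rows - k_rows + 1):
            for c in range(cols - k_cols + 1):
                response = sum(
                    kernel[i][j] * bitmap[r + i][c + j]
                    for i in range(k_rows)
                    for j in range(k_cols)
                )
                best = max(best, response)
        features.append(best)
    return features


# A 3x3 bitmap containing a vertical stroke, probed by two tiny kernels.
stroke = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
vertical = [[1.0], [1.0]]
horizontal = [[1.0, 1.0]]
print(cnn_features(stroke, [vertical, horizontal]))  # [2.0, 1.0]
print(len(rnn_features([1.0, 2.0, 3.0])))            # 4
```

Both functions return a vector whose length is independent of the input length, which is what allows the three feature vectors to be fused afterwards.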
An embodiment of the invention discloses embodiment B7: the device for identifying an abnormal character string according to any one of embodiments B1 to B4 or embodiment B6, wherein the second original character string conversion unit is adapted to convert the original character string, according to the primary language type of the original character string, into the phonetic symbol string corresponding to that primary language type.
An embodiment of the invention discloses embodiment C1, a data processing device comprising a memory and a processor, wherein the memory is adapted to store one or more computer instructions, and the processor, when running the computer instructions, performs the steps of the method of any one of embodiments A1 to A7.
An embodiment of the invention discloses embodiment D1, a computer-readable storage medium having computer instructions stored thereon, wherein the steps of the method of any one of embodiments A1 to A7 are performed when the computer instructions are run.
Although the present invention is disclosed as above, it is not limited thereto. Any person skilled in the art may make various changes and modifications without departing from the spirit and scope of the invention; the scope of protection of the invention shall therefore be as defined by the claims.

Claims (10)

1. A method for identifying an abnormal character string, characterized by comprising:
obtaining an original character string;
converting the original character string into a corresponding picture and a corresponding phonetic symbol string, respectively;
inputting the original character string into a preset first deep learning model to obtain a first deep learning feature vector, inputting the picture into a preset second deep learning model to obtain a second deep learning feature vector, and inputting the phonetic symbol string into a preset third deep learning model to obtain a third deep learning feature vector;
determining the standardized character string corresponding to the original character string based on the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector;
matching the standardized character string against the character strings in a preset exception database, and identifying the abnormal character string in the standardized character string;
outputting the recognition result.
2. the method for identification unusual character string according to claim 1, which is characterized in that described to be based on first depth Learning characteristic vector, the second deep learning feature vector and third deep learning feature vector, determine the original character string pair The standardized character string answered, comprising:
The first deep learning feature vector, the second deep learning feature vector and third deep learning feature vector are merged, Obtain fusion feature vector;
The fusion feature vector is inputted in preset 4th deep learning model, the corresponding mark of the original character string is obtained Standardization character string.
3. the method for identification unusual character string according to claim 1, which is characterized in that described to be based on first depth Learning characteristic vector, the second deep learning feature vector and third deep learning feature vector, determine the original character string pair The standardized character string answered, comprising:
Be based respectively on the first deep learning feature vector, the second deep learning feature vector and third deep learning feature to Amount, obtains the corresponding first standardized character string of the original character string, the second standardized character string and third standardized character String;
It is described to match the standardized character string with the character string in preset exception database, identify the standard Change the unusual character string in character string, comprising:
By the first standardized character string, the second standardized character string and third standardized character string respectively with preset exception Character string in database is matched, and identifies the first standardized character string, the second standardized character string and third mark Unusual character string in standardization character string.
4. the method for identification unusual character string according to claim 3, which is characterized in that described to be based on first depth Learning characteristic vector, the second deep learning feature vector and third deep learning feature vector, determine the original character string pair The standardized character string answered, further includes:
It is special to merge the first deep learning feature vector, the second deep learning feature vector and the third deep learning Vector is levied, fusion feature vector is obtained;
The fusion feature vector is inputted in preset 4th deep learning model, the original character string corresponding the is obtained Four standardized character strings;
It is described to match the standardized character string with the character string in preset exception database, identify the standard Change the unusual character string in character string, further includes:
The 4th standardized character string is matched with the character string in preset exception database, identifies the described 4th Unusual character string in standardized character string.
5. the method for identification unusual character string according to claim 2 or 4, which is characterized in that the fusion described first Deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector, comprising:
By the first deep learning feature vector, the second deep learning feature vector and third deep learning feature vector head and the tail Connection.
6. the method for identification unusual character string according to claim 1, which is characterized in that the first deep learning model Including first circulation neural network model, the second deep learning model includes convolutional neural networks model, and the third is deep Spending learning model includes second circulation neural network model.
7. The method for identifying an abnormal character string according to any one of claims 1 to 4 or claim 6, characterized in that converting the original character string into a phonetic symbol string comprises:
converting the original character string, based on the primary language type of the original character string, into the phonetic symbol string corresponding to that primary language type.
8. A device for identifying an abnormal character string, characterized by comprising:
an original character string acquisition unit, adapted to obtain an original character string;
a first original character string conversion unit, adapted to convert the original character string into a corresponding picture;
a second original character string conversion unit, adapted to convert the original character string into a corresponding phonetic symbol string;
a first deep learning unit, adapted to input the original character string into a preset first deep learning model to obtain a first deep learning feature vector;
a second deep learning unit, adapted to input the picture into a preset second deep learning model to obtain a second deep learning feature vector;
a third deep learning unit, adapted to input the phonetic symbol string into a preset third deep learning model to obtain a third deep learning feature vector;
a standardized character string generation unit, adapted to determine the standardized character string corresponding to the original character string according to the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector;
an abnormal character string recognition unit, adapted to match the standardized character string against the character strings in a preset exception database and to identify the abnormal character string in the standardized character string;
a result output unit, adapted to output the recognition result.
9. A data processing device comprising a memory and a processor, wherein the memory is adapted to store one or more computer instructions, characterized in that the processor, when running the computer instructions, performs the steps of the method of any one of claims 1 to 7.
10. A computer-readable storage medium having computer instructions stored thereon, characterized in that the steps of the method of any one of claims 1 to 7 are performed when the computer instructions are run.
CN201910802851.0A 2019-08-28 2019-08-28 Method, device and equipment for identifying abnormal character string and readable storage medium Active CN110516125B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910802851.0A CN110516125B (en) 2019-08-28 2019-08-28 Method, device and equipment for identifying abnormal character string and readable storage medium

Publications (2)

Publication Number Publication Date
CN110516125A true CN110516125A (en) 2019-11-29
CN110516125B CN110516125B (en) 2020-05-08

Family

ID=68628417

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910802851.0A Active CN110516125B (en) 2019-08-28 2019-08-28 Method, device and equipment for identifying abnormal character string and readable storage medium

Country Status (1)

Country Link
CN (1) CN110516125B (en)

Also Published As

Publication number Publication date
CN110516125B (en) 2020-05-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant