CN110516125A - Method, apparatus, device and readable storage medium for identifying abnormal character strings - Google Patents
- Publication number
- CN110516125A (application CN201910802851.0A)
- Authority
- CN
- China
- Prior art keywords
- character string
- deep learning
- feature vector
- standardized
- learning feature
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
- G06F16/90344—Query processing by using string matching techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
Abstract
A method, apparatus, device and readable storage medium for identifying abnormal character strings. The method comprises: obtaining an original character string and converting it into a corresponding picture and a corresponding phonetic symbol string; inputting the original character string, the picture and the phonetic symbol string into a first, a second and a third deep learning model respectively, to obtain a corresponding first, second and third deep learning feature vector; determining, based on the first, second and third deep learning feature vectors, the standardized character string corresponding to the original character string; and matching the standardized character string against the character strings in a preset exception database, identifying any abnormal character string in the standardized character string, and outputting the recognition result. The scheme identifies abnormal character strings automatically, improving the efficiency of identification as well as its accuracy and precision.
Description
Technical field
Embodiments of the present invention relate to the technical field of data processing, and in particular to a method, apparatus, device and readable storage medium for identifying abnormal character strings.
Background art
Nowadays people can hardly do without the internet. Users generate text content in scenarios such as shopping, chatting, study and work, and while writing they often enter abnormal content, either deliberately or unintentionally. To limit the spread of such content, the content entered by users has to be screened. Two methods are commonly used at present: 1. manual identification; 2. regular-expression matching.
However, with the rapid development of technology, the frequency with which users use the internet has risen sharply, and more manpower and time are needed to identify abnormal content. Relying on manual identification alone is costly and slow, and cannot meet the demands of processing the internet's massive business data. Regular-expression matching compares the obtained text content for similarity against characters that have been marked as abnormal, and identifies the abnormal text or symbols in it. This method, however, discriminates deformed characters poorly, and it struggles to recognize character strings that users deliberately enter using deformed characters.
Summary of the invention
In view of this, embodiments of the present invention provide a method, apparatus, device and readable storage medium for identifying abnormal character strings, which can identify abnormal character strings automatically, improve the efficiency of identification, and improve its accuracy and precision.
An embodiment of the present invention provides a method for identifying abnormal character strings, the method comprising: obtaining an original character string; converting the original character string into a corresponding picture and a corresponding phonetic symbol string; inputting the original character string into a preset first deep learning model to obtain a first deep learning feature vector, inputting the picture into a preset second deep learning model to obtain a second deep learning feature vector, and inputting the phonetic symbol string into a preset third deep learning model to obtain a third deep learning feature vector; determining, based on the first, second and third deep learning feature vectors, the standardized character string corresponding to the original character string; matching the standardized character string against the character strings in a preset exception database to identify any abnormal character string in the standardized character string; and outputting the recognition result.
Further, determining the standardized character string corresponding to the original character string based on the first, second and third deep learning feature vectors comprises: fusing the first, second and third deep learning feature vectors to obtain a fusion feature vector; and inputting the fusion feature vector into a preset fourth deep learning model to obtain the standardized character string corresponding to the original character string.
Further, determining the standardized character string corresponding to the original character string based on the first, second and third deep learning feature vectors comprises: obtaining, based on the first, second and third deep learning feature vectors respectively, a first standardized character string, a second standardized character string and a third standardized character string corresponding to the original character string;
and matching the standardized character string against the character strings in the preset exception database to identify any abnormal character string in the standardized character string comprises: matching the first, second and third standardized character strings against the character strings in the preset exception database respectively, and identifying any abnormal character string in the first, second and third standardized character strings.
Further, determining the standardized character string corresponding to the original character string based on the first, second and third deep learning feature vectors also comprises: fusing the first, second and third deep learning feature vectors to obtain a fusion feature vector; and inputting the fusion feature vector into a preset fourth deep learning model to obtain a fourth standardized character string corresponding to the original character string;
and matching the standardized character string against the character strings in the preset exception database to identify any abnormal character string in the standardized character string also comprises: matching the fourth standardized character string against the character strings in the preset exception database, and identifying any abnormal character string in the fourth standardized character string.
Further, fusing the first, second and third deep learning feature vectors comprises: connecting the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector end to end.
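End-to-end connection of the three feature vectors amounts to plain concatenation. The following is a minimal sketch using Python lists; the function name `fuse_head_to_tail` and the sample values are illustrative assumptions, not taken from the patent.

```python
from typing import List, Sequence


def fuse_head_to_tail(*vectors: Sequence[float]) -> List[float]:
    """Concatenate feature vectors in the order given (first, second, third)."""
    fused: List[float] = []
    for vector in vectors:
        fused.extend(vector)
    return fused


# Illustrative values only; real vectors would come from the three models.
x = [0.9, 0.1]       # first deep learning feature vector
y = [0.8, 0.3, 0.5]  # second deep learning feature vector
z = [0.7]            # third deep learning feature vector

fused = fuse_head_to_tail(x, y, z)
print(fused)       # [0.9, 0.1, 0.8, 0.3, 0.5, 0.7]
print(len(fused))  # 6
```

The fused vector has N1 + N2 + N3 components, so the three vectors do not need to share a dimension.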
Further, the first deep learning model comprises a first recurrent neural network model, the second deep learning model comprises a convolutional neural network model, and the third deep learning model comprises a second recurrent neural network model.
Further, converting the original character string into a phonetic symbol string comprises: based on the principal language type of the original character string, converting the original character string into the phonetic symbol string corresponding to that principal language type.
An embodiment of the present invention also provides an apparatus for identifying abnormal character strings, the apparatus comprising: an original character string acquiring unit, adapted to obtain an original character string; a first original character string converting unit, adapted to convert the original character string into a corresponding picture; a second original character string converting unit, adapted to convert the original character string into a corresponding phonetic symbol string; a first deep learning unit, adapted to input the original character string into a preset first deep learning model to obtain a first deep learning feature vector; a second deep learning unit, adapted to input the picture into a preset second deep learning model to obtain a second deep learning feature vector; a third deep learning unit, adapted to input the phonetic symbol string into a preset third deep learning model to obtain a third deep learning feature vector; a standardized character string generation unit, adapted to determine, according to the first, second and third deep learning feature vectors, the standardized character string corresponding to the original character string; an abnormal character string recognition unit, adapted to match the standardized character string against the character strings in a preset exception database and identify any abnormal character string in the standardized character string; and a result output unit, adapted to output the recognition result.
An embodiment of the present invention also provides a data processing device comprising a memory and a processor, wherein the memory is adapted to store one or more computer instructions, and the processor, when running the computer instructions, performs the steps of the method of any of the above embodiments.
An embodiment of the present invention also provides a computer-readable storage medium storing computer instructions which, when run, perform the steps of the method of any of the above embodiments.
With the scheme for identifying abnormal character strings of the embodiments of the present invention, the obtained original character string is first converted into a corresponding picture and a corresponding phonetic symbol string. The original character string, the picture and the phonetic symbol string are then input into the first, second and third deep learning models respectively, to obtain the corresponding first, second and third deep learning feature vectors. Based on these three feature vectors, the standardized character string corresponding to the original character string is determined and matched against the character strings in the preset exception database, so that any abnormal character string in the standardized character string can be identified. In this process, the original character string is converted into a picture and a phonetic symbol string, deep learning is performed on each to obtain the corresponding feature vectors, and the standardized character string corresponding to the original character string is restored from feature vectors of multiple dimensions before abnormal character strings are identified. This greatly improves the recognition rate for deformed characters and therefore the accuracy and precision of abnormal character string identification. Moreover, the whole process is automatic and needs no manual participation or adjustment, so it improves the efficiency of identification and greatly reduces labor costs.
Further, the first, second and third deep learning feature vectors are fused to obtain a fusion feature vector, and the fusion feature vector is input into the fourth deep learning model to obtain the standardized character string corresponding to the original character string, which is then identified and output. By fusing the feature vectors corresponding to the original character string, the picture and the phonetic symbol string and applying a second round of deep learning, this scheme further strengthens the connections between the feature vectors and yields a more accurate standardized character string, which improves the breadth and accuracy of abnormal character string identification and enhances the ability to identify abnormal character strings.
Further, the standardized character strings corresponding to the first, second, third and fourth deep learning feature vectors can be determined separately, and the abnormal character strings in the first, second, third and fourth standardized character strings can be identified at the same time. When an abnormal character string exists in at least one of these standardized character strings, a recognition result indicating that an abnormal character string exists is output. This achieves multi-dimensional identification and reduces the miss rate of abnormal character string identification.
Further, because the input original character string may contain various written languages, digits and even symbols, the conversion into a phonetic symbol string is based on the principal language type of the original character string. Converting the original character string into the corresponding phonetic symbol string before identification widens the range of application of abnormal character string identification.
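The patent does not say how the principal language type is determined. One simple possibility, shown here purely as an assumption, is to count characters per script and pick the most frequent one; the two-way split into "chinese" and "english" and both function names are hypothetical.

```python
from collections import Counter


def char_language(ch: str) -> str:
    """Classify one character by script (a coarse, hypothetical rule)."""
    if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs block
        return "chinese"
    if ch.isascii() and ch.isalpha():
        return "english"
    return "other"  # digits, symbols, whitespace and everything else


def principal_language(text: str) -> str:
    """Return the most frequent language type, ignoring digits and symbols."""
    counts = Counter(char_language(ch) for ch in text)
    counts.pop("other", None)
    if not counts:
        return "other"
    return counts.most_common(1)[0][0]


print(principal_language("加我微信 wx888"))  # chinese
print(principal_language("call me 一下"))    # english
print(principal_language("12345!"))          # other
```

A tie between scripts is resolved by whichever type is counted first, so a production rule would need an explicit tie-breaking policy.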
Brief description of the drawings
To explain the technical solutions of the embodiments of this specification or of the prior art more clearly, the drawings needed in their description are briefly introduced below. The drawings described below are obviously only some embodiments of this specification; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of a method for identifying abnormal character strings in an embodiment of the present invention.
Fig. 2 is a flow chart of a method for determining the standardized character string corresponding to an original character string in an embodiment of the present invention.
Fig. 3 is a flow chart of another method for identifying abnormal character strings in an embodiment of the present invention.
Fig. 4 is a structural schematic diagram of an apparatus for identifying abnormal character strings in an embodiment of the present invention.
Fig. 5 is a structural schematic diagram of a standardized character string generation unit in an embodiment of the present invention.
Fig. 6 is a structural schematic diagram of an abnormal character string recognition unit in an embodiment of the present invention.
Fig. 7 is a structural schematic diagram of another standardized character string generation unit in an embodiment of the present invention.
Fig. 8 is a schematic diagram of converting an original character string into a picture in an embodiment of the present invention.
Detailed description of the embodiments
As noted above, the volume of internet business data is now enormous. Relying on manual identification alone is both costly and slow, while matching abnormal characters with regular expressions discriminates deformed characters poorly and cannot accurately identify all abnormal characters. For example, a user registers a new account on an application service platform with someone else's mobile number in order to enjoy a discount, and then tells the service provider on the platform the real mobile number in the remarks, written as a mixture of wrong characters, letters and random symbols. As another example, a user advertises their own shop in product reviews and leaves personal contact details written as a mixture of wrong characters, letters and random symbols. Neither manual identification nor regular-expression matching can therefore meet the data processing demands of today's massive internet business.
To address the above problems, an embodiment of the present invention provides a method for identifying abnormal character strings. The obtained original character string is first converted into a corresponding picture and a corresponding phonetic symbol string. The original character string, the picture and the phonetic symbol string are then input into the first, second and third deep learning models respectively, to obtain the corresponding first, second and third deep learning feature vectors. Based on these feature vectors, the standardized character string corresponding to the original character string is determined and matched against the character strings in the preset exception database, so that any abnormal character string in the standardized character string is identified.
To help those skilled in the art better understand the concept, implementation and advantages of the embodiments of the present invention, specific application scenarios are described in detail below with reference to the drawings.
Referring to the flow chart of a method for identifying abnormal character strings shown in Fig. 1, in an embodiment of the present invention abnormal character strings can be identified with the following steps:
S11: obtain the original character string.
In a specific implementation, the original character string can come from any internet platform that needs to identify abnormal character strings. Its data format is determined by the platform's system encoding, which can be any existing character set encoding, such as ASCII, GB2312, BIG5 or GB18030, or a custom character set encoding. Taking an e-commerce platform as an example, a user can enter text in the order remarks field or on the review page, and the platform can take the text entered by the user as the original character string.
S12: convert the original character string into a corresponding picture and a corresponding phonetic symbol string.
In a specific implementation, the original character string can be converted into the corresponding picture and phonetic symbol string in various ways.
In an embodiment of the present invention, the original character string can be converted into a corresponding black-and-white or color picture by an encoding/decoding conversion. For example, the original character string is encoded and decoded in Base64 format and thereby converted into the corresponding picture.
For the phonetic symbol string, in an embodiment of the present invention a phonetic symbol comparison table can be preset in a database, and the original character string is converted into a phonetic symbol string by looking it up in that table. The table may contain the correspondence between any principal language type and the phonetic symbols of that language, for example between English letters and English phonetic symbols, between digits and English phonetic symbols, between Chinese characters and pinyin, or between symbols and phonetic symbols; the details can be set according to the actual situation.
In addition, to simplify the conversion and shorten the conversion time, a dedicated module or tool for converting to pictures and phonetic symbol strings can be provided, or an existing conversion tool can be used.
S13: input the original character string into the preset first deep learning model to obtain the first deep learning feature vector, input the picture into the preset second deep learning model to obtain the second deep learning feature vector, and input the phonetic symbol string into the preset third deep learning model to obtain the third deep learning feature vector.
In a specific implementation, the preset first, second and third deep learning models may each comprise one or more trained neural network models, and the type of model used can be selected and configured according to the characteristics of the converted data.
For example, the first deep learning model may comprise any of the models in the recurrent neural network (Recurrent Neural Network, RNN) family, used to process the lexical and semantic information of the original character string over a time sequence, so that a first deep learning feature vector containing information such as vocabulary and semantics can be obtained.
As another example, the second deep learning model may comprise any of the models in the convolutional neural network (Convolutional Neural Networks, CNN) family, used to process the feature information of each part of the picture, so that a second deep learning feature vector containing character-related feature information can be obtained. The feature information may include shape information of text, symbols, letters, digits and so on.
As another example, the third deep learning model may comprise any of the models in the recurrent neural network family, used to process the pronunciation and semantic information of the phonetic symbol string over a time sequence, so that a third deep learning feature vector containing information such as vocabulary and semantics can be obtained.
S14: based on the first, second and third deep learning feature vectors, determine the standardized character string corresponding to the original character string.
In a specific implementation, reverse parsing of character shape can be performed on the first and second deep learning feature vectors, and reverse parsing of character pronunciation on the third deep learning feature vector, so that the standardized character string corresponding to the original character string can be determined. For example, a standard character shape comparison table and a standard character pronunciation comparison table can be preset in a database; the first and second deep learning feature vectors are matched against the standard character shape comparison table, and the third deep learning feature vector is matched against the standard character pronunciation comparison table. Both tables can be set according to the actual situation.
S15: match the standardized character string against the character strings in the preset exception database, and identify any abnormal character string in the standardized character string.
In a specific implementation, the preset exception database can be set according to the actual situation. Because the standardized character string corresponding to the original character string has already been obtained, matching against the character strings in the preset exception database is more convenient.
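Matching against the exception database can be sketched as a substring search. The patent does not fix the matching rule, so substring containment, the function name `find_abnormal_strings` and the database content are all assumptions.

```python
from typing import Iterable, List


def find_abnormal_strings(standardized: str, exception_db: Iterable[str]) -> List[str]:
    """Return the database entries that occur in the standardized string, sorted."""
    return sorted(entry for entry in set(exception_db) if entry and entry in standardized)


exception_db = ["@qq.com", "@163.com"]  # hypothetical database content

print(find_abnormal_strings("123@qq.com", exception_db))  # ['@qq.com']
print(find_abnormal_strings("hello", exception_db))       # []
```

Because the input has already been standardized, the database only needs canonical entries and does not have to enumerate every deformed spelling.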
S16: output the recognition result.
In a specific implementation, if the recognition result is that an abnormal character string exists, the user can be reminded according to a preset rule so as to avoid producing abnormal character strings; the recognition result can also be output to back-end monitoring staff as an abnormality alert, so that they can spot the abnormality promptly and carry out the corresponding handling.
With the method for identifying abnormal character strings of the above embodiment, the original character string is converted into a picture and a phonetic symbol string, deep learning is performed on each to obtain the corresponding feature vectors, and the standardized character string corresponding to the original character string is restored from the feature vectors of multiple dimensions (original character string, picture and phonetic symbol string) before abnormal character strings are identified. This greatly improves the recognition rate for deformed characters and therefore the accuracy and precision of abnormal character string identification. Moreover, the whole process is automatic and needs no manual participation or adjustment, so it improves the efficiency of identification and greatly reduces labor costs.
To help those skilled in the art better understand and implement the embodiments of the present invention, how abnormal character strings are identified is described in detail below through a specific application scenario.
In an embodiment of the present invention, an application service platform needs to identify the exception that user leaves in original character string
Content is the data of email address.Assuming that the user inputs the content by Character deformation in comment or remarks are as follows: " 1. two
Three Ai Te qq.c0m".The system coding of the application service platform is encoded using ASCII, therefore can obtain " 1. 23 Ai Te
qq.The data of the corresponding ASCII hexadecimal code format of c0m " are as follows: " 2460 8d30 4e09 827e 7,279 0071
0071 3,002 0,063 0030 006d ", using space as separator, above-mentioned ASCII hexadecimal code data are as original
Character string.
Then, can be converted to the original character string by decoded conversion regime includes " 1. 23 Ai Te qq.
The picture of c0m " content uses Base64 encoding and decoding in the present embodiment, and the original character string is converted to corresponding picture, such as schemes
In 8 shown in picture 80.
Also, original character string can be converted to according to the preset phonetic symbol table of comparisons by corresponding phonetic symbol string, i.e. " yi er
san ai te kju:kju:ju hao si:ling em”。
As noted above, the deep learning models used in step S13 can be neural network models suited to the characteristics of the input data. In this embodiment, the first deep learning model may comprise a first recurrent neural network model, the second deep learning model may comprise a convolutional neural network model, and the third deep learning model may comprise a second recurrent neural network model.
After the above data processing, the original character string is input into the first deep learning model, which after recurrent neural network processing outputs an N1-dimensional first deep learning feature vector [Xi], where i = 1, 2, 3, ..., N1 and N1 is a natural number not less than 1; Xi denotes the maximum probability of the i-th output predicted from the original character string, and its value lies in [0, 1].
It can be understood that, depending on the actual usage scenario, training with different training data yields first deep learning models with different functions. For example, if the first deep learning model is used to screen out interference data in the original character string that does not conform to grammar rules, training data with standard grammar can be obtained and used to train the first deep learning model. After training, the first deep learning model can perform grammar screening on the input data and then output the array of maximum probabilities of grammatical conformity predicted from the original character string, which serves as the first deep learning feature vector.
As noted above, the picture is input into the second deep learning model, which after convolutional neural network processing outputs an N2-dimensional second deep learning feature vector [Yi], where i = 1, 2, 3, ..., N2 and N2 is a natural number not less than 1; Yi denotes the maximum probability of the i-th output predicted from the picture, and its value lies in [0, 1].
It can be understood that, depending on the actual usage scenario, training with different training data yields second deep learning models with different functions. For example, if the second deep learning model is used to extract the character string in the picture, training data labeled with reference character strings can be obtained for training. After training, the second deep learning model can perform character string extraction on the input picture and then output the array of maximum character string probabilities predicted from the picture, which serves as the second deep learning feature vector.
As previously mentioned, the phonetic symbol string is inputted the third deep learning model, after convolutional neural networks are handled
Export N3 dimension third deep learning feature vector [Zi], wherein i=1,2,3 ... N3, N3 are the natural number not less than 1;Zi table
Show the maximum probability that i-th of output is predicted according to the phonetic symbol string, the numerical value of Zi is between [0,1].
It can be understood that, depending on the actual usage scenario, training with different training data yields third deep learning models with different functions. For example, if the third deep learning model is used to screen out interference data in the phonetic symbol string that does not conform to phonetic symbol rules, it can be trained on training data labeled with phonetic symbol labels. After training is complete, the third deep learning model can perform phonetic symbol rule screening on the input phonetic symbol string and then output the array of maximum phonetic symbol string probabilities predicted from the phonetic symbol string, which serves as the third deep learning feature vector.
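The following is a minimal sketch of how a feature vector whose elements are maximum probabilities in [0, 1] could be formed. It assumes, purely for illustration, that a model emits one row of raw scores per output position and that a softmax is applied to each row; the text does not specify the output layer, and the names softmax and max_probability_vector are hypothetical.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def max_probability_vector(score_rows: List[List[float]]) -> List[float]:
    """Keep, for each output position, the largest predicted probability.

    Assumption: each row holds the raw scores for one output position.
    The result plays the role of a feature vector such as [Xi], [Yi] or [Zi].
    """
    return [max(softmax(row)) for row in score_rows]


# Three hypothetical output positions with made-up scores.
features = max_probability_vector([[2.0, 0.5, 0.1], [0.0, 0.0], [1.0, 3.0]])
```

Every element of features is a probability, so it lies in [0, 1] as the description requires.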
Afterwards, the first deep learning feature vector [Xi], the second deep learning feature vector [Yi] and the third deep learning feature vector [Zi] can each be reverse parsed according to the standard character shape comparison table and the standard character pronunciation comparison table preset in the database of the system where the application service platform resides, so as to obtain the first standardized character string, the second standardized character string and the third standardized character string corresponding to the original character string.
The standard character shape comparison table may include a mapping between the standard shape of at least one kind of character (text, symbols, letters, digits, and so on) and a non-negative number not greater than 1. The standard character pronunciation comparison table may include a mapping between the standard pronunciation of at least one kind of character (text, symbols, letters, digits, and so on) and a non-negative number not greater than 1. In addition, the standard character shape comparison table may also include a mapping between the standard shapes of radicals and non-negative numbers not greater than 1, and the standard character pronunciation comparison table may also include a mapping between fuzzy pronunciations of text, symbols, letters and digits and non-negative numbers not greater than 1.
The reverse parsing proceeds as follows:
1) The first deep learning feature vector [Xi] is matched against the standard character shape comparison table. This identifies the deformed digit "1.", which is similar in shape to the digit "1", and the punctuation mark ".", which is similar in shape to the punctuation mark " ". The first standardized character string obtained is therefore: "1 two three Ai Te qq.c0m".
2) The second deep learning feature vector [Yi] is matched against the standard character shape comparison table. This identifies the deformed digit "1.", which is similar in shape to the digit "1", the digit "0", which is similar in shape to the letter "o", and the punctuation mark ".", which is similar in shape to the punctuation mark " ". It can even identify the deformed character "three", which is similar in shape to the digit "3". The second standardized character string obtained is: "1 two 3 Ai Te qq.com".
3) The third deep learning feature vector [Zi] is matched against the standard character pronunciation comparison table. This identifies "." (ju hao), which is pronounced the same as the punctuation mark " " (ju hao), the text "two" (er), which is pronounced the same as the digit "2" (er), the text "three" (san), which is pronounced the same as the digit "3" (san), and the text "Ai Te" (ai te), which is pronounced the same as the symbol "@" (ai te). The third standardized character string obtained is: "123@qq.c0m".
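One possible reading of the table matching above can be sketched as a nearest-code lookup. This is only an illustration under stated assumptions: the codes in SHAPE_TABLE, the tolerance value and the function name reverse_parse are hypothetical, and the text does not say how feature values are compared with table entries.

```python
from typing import Dict, List

# Hypothetical comparison table: standard character -> code in [0, 1].
SHAPE_TABLE: Dict[str, float] = {
    "1": 0.10, "2": 0.20, "3": 0.30, "@": 0.40, ".": 0.50,
    "q": 0.60, "c": 0.70, "o": 0.80, "m": 0.90,
}


def reverse_parse(features: List[float], table: Dict[str, float],
                  tolerance: float = 0.05) -> str:
    """Map each feature value to the standard character with the nearest code.

    Values farther than `tolerance` from every code are emitted as "?",
    an assumed placeholder for "no standard character matched".
    """
    chars = []
    for value in features:
        char, code = min(table.items(), key=lambda item: abs(item[1] - value))
        chars.append(char if abs(code - value) <= tolerance else "?")
    return "".join(chars)


# Made-up feature values that sit close to the codes of "123@qq.com".
vector = [0.11, 0.19, 0.31, 0.41, 0.59, 0.61, 0.49, 0.71, 0.79, 0.92]
standardized = reverse_parse(vector, SHAPE_TABLE)
```

A pronunciation comparison table could be handled the same way by passing a different dictionary.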
The first standardized character string, the second standardized character string and the third standardized character string are then each matched against the character strings in the preset exception database to identify any unusual character strings they contain. When an unusual character string is identified in at least one of the first, second and third standardized character strings, a recognition result indicating that an unusual character string exists is output.
For example, after the first standardized character string and the third standardized character string are each matched against the character strings in the preset exception database, the mailbox-related unusual character string "qq.com" is not identified. However, when the second standardized character string "1 two 3 Ai Te qq.com" is matched against the character strings in the preset exception database, the mailbox-related unusual character string "qq.com" is identified.
With the above scheme, the standardized character strings corresponding to the original character string are restored from multiple feature vectors before unusual character string identification is performed, so that deformed unusual character strings are identified from three aspects: characters, pictures and phonetic symbols.
In specific implementations, identifying deformed unusual character strings separately from the three aspects of characters, pictures and phonetic symbols may still fail to identify an unusual character string or may identify a wrong one. For example, if the mailbox-related unusual character string is set to "@qq.com", the mailbox-related unusual character string cannot be identified from any of the first to third standardized character strings. For this reason, step S14 can be further extended and optimized to determine the standardized character string. This is described in detail below through specific embodiments.
In an embodiment of the present invention, referring to the flow chart shown in Fig. 2 of a method for determining the standardized character string corresponding to the original character string, the method may specifically include the following steps:
S21: fuse the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector to obtain a fusion feature vector.
In specific implementations, the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector may be fused in at least one of the following ways:
1. Connect the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector head to tail to obtain an (N1+N2+N3)-dimensional fusion feature vector [Xi, Yi, Zi].
2. Randomly combine the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector to obtain an (N1+N2+N3)-dimensional fusion feature vector [Ri], where Ri ∈ {Xi, Yi, Zi}.
3. Transpose the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector separately and then combine them to obtain an (N1+N2+N3)-dimensional fusion feature vector [Xi^T, Yi^T, Zi^T] or [Hi], where Hi ∈ {Xi^T, Yi^T, Zi^T}.
It can be understood that the actual fusion method is not limited to those listed above; the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector may also be fused along other dimensions.
S22: input the fusion feature vector into the preset fourth deep learning model to obtain the fourth standardized character string corresponding to the original character string.
The preset fourth deep learning model may use one or more trained neural network models, for example, the various models of the RNN family and a multilayer perceptron (Multi Layer Perceptron, MLP). The RNN model can speed up the acquisition of feature vectors, and the MLP can improve the output accuracy of feature vectors.
In specific implementations, the training set of the fourth deep learning model may include training data pairing various deformed character shapes with the corresponding standard character shapes, and training data pairing various deformed character pronunciations with the corresponding standard character pronunciations. After the fourth deep learning model has been trained on the training set, the fusion feature vector is input into the trained fourth deep learning model, and the corresponding fourth standardized character string "123@qq.com" is obtained through shape matching and pronunciation matching. The fourth standardized character string can then be matched against the character strings in the preset exception database, the unusual character string "@qq.com" it contains is identified, and the recognition result is output.
In combination with the above embodiments, Fig. 3 shows a flow chart of another method for identifying unusual character strings in an embodiment of the present invention. The method steps are as follows:
S31: obtain the original character string.
S32-1: convert the original character string into a picture.
S32-2: convert the original character string into a phonetic symbol string.
S33-1: input the original character string into the first deep learning model.
S33-2: input the picture into the second deep learning model.
S33-3: input the phonetic symbol string into the third deep learning model.
S34-1: obtain the first deep learning feature vector.
S34-2: obtain the second deep learning feature vector.
S34-3: obtain the third deep learning feature vector.
S35: fuse the first to third deep learning feature vectors.
S36: input the fused first to third deep learning feature vectors into the fourth deep learning model.
S37: obtain the fourth standardized character string after processing by the fourth deep learning model.
S38: identify the unusual character string in the fourth standardized character string.
S39: output the recognition result.
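The flow of steps S31 to S39 can be sketched as a single function that wires the stages together. All converters and models are passed in as callables because the text does not define their interfaces; the parameter names, the head-to-tail fusion choice and the substring matching are assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def identify_unusual_string(
    original: str,                        # S31: the original character string
    to_picture: Callable[[str], object],
    to_phonetic: Callable[[str], str],
    model1: Callable[[str], Vector],
    model2: Callable[[object], Vector],
    model3: Callable[[str], Vector],
    model4: Callable[[Vector], str],
    exception_db: Sequence[str],
) -> List[str]:
    """Run the S31 to S39 flow and return the unusual strings found."""
    picture = to_picture(original)        # S32-1
    phonetic = to_phonetic(original)      # S32-2
    x = model1(original)                  # S33-1, S34-1
    y = model2(picture)                   # S33-2, S34-2
    z = model3(phonetic)                  # S33-3, S34-3
    fused = x + y + z                     # S35: head-to-tail fusion (assumed)
    standardized = model4(fused)          # S36, S37
    # S38: substring matching against the exception database (assumed).
    # S39: the list of matches is the recognition result.
    return [entry for entry in exception_db if entry in standardized]
```

Calling the function with stub callables is enough to exercise the control flow; trained models would replace the stubs in a real system.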
With the above scheme, fusing the feature vectors corresponding to the original character string, the picture and the phonetic symbol string and applying a second round of deep learning further strengthens the connection between the feature vectors and yields a more accurate standardized character string, which improves the recognition range and accuracy for unusual character strings and enhances the ability to identify them.
In specific implementations, step S15 can also be further extended and optimized, so as to determine the standardized character string. This is described in detail below through specific embodiments.
In an embodiment of the present invention, the first standardized character string, the second standardized character string, the third standardized character string and the fourth standardized character string can each be matched against the character strings in the preset exception database. As long as an unusual character string is identified in at least one of the first, second, third and fourth standardized character strings, a recognition result indicating that an unusual character string exists is output. This realizes multi-dimensional identification and can reduce the miss rate of unusual character string identification.
In specific implementations, the input original character string may contain text in various languages, digits and even symbols. Therefore, when the original character string is converted into a phonetic symbol string, it is converted into the corresponding phonetic symbol string based on the principal language type of the original character string before identification, which can expand the application range of unusual character string identification.
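A minimal sketch of choosing a converter by principal language type follows. It approximates the principal language type by the dominant script of the alphabetic characters, using the first word of each Unicode character name; this proxy, the function names and the converter dictionary are all assumptions, since the text does not say how the principal language type is determined.

```python
import unicodedata
from collections import Counter
from typing import Callable, Dict


def principal_script(text: str) -> str:
    """Return the most frequent script label among alphabetic characters.

    The label is the first word of the Unicode name, for example "LATIN"
    or "CJK". Digits and symbols are ignored.
    """
    counts: Counter = Counter()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split(" ")[0]] += 1
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"


def to_phonetic(text: str, converters: Dict[str, Callable[[str], str]]) -> str:
    """Apply the converter registered for the principal script, if any."""
    convert = converters.get(principal_script(text))
    return convert(text) if convert else text


# Hypothetical converter table; a real system would plug in a
# pinyin or IPA transcriber for each supported language.
converters = {"LATIN": str.lower}
```

Strings whose principal script has no registered converter are returned unchanged in this sketch.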
An embodiment of the present invention further provides a device for identifying unusual character strings that corresponds to the above method for identifying unusual character strings. To enable those skilled in the art to better understand and implement the embodiments of the present invention, the device is described in detail below through specific embodiments with reference to the drawings.
Referring to the structural schematic diagram shown in Fig. 4 of a device for identifying unusual character strings in an embodiment of the present invention, the device 400 for identifying unusual character strings may include:
an original character string acquiring unit 401, adapted to obtain an original character string;
a first original character string converting unit 402, adapted to convert the original character string into a corresponding picture;
a second original character string converting unit 403, adapted to convert the original character string into a corresponding phonetic symbol string;
a first deep learning unit 404, adapted to input the original character string into a preset first deep learning model to obtain a first deep learning feature vector;
a second deep learning unit 405, adapted to input the picture into a preset second deep learning model to obtain a second deep learning feature vector;
a third deep learning unit 406, adapted to input the phonetic symbol string into a preset third deep learning model to obtain a third deep learning feature vector;
a standardized character string generation unit 407, adapted to determine the standardized character string corresponding to the original character string according to the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector;
an unusual character string recognition unit 408, adapted to match the standardized character string against the character strings in a preset exception database and identify the unusual character string in the standardized character string;
a result output unit 409, adapted to output the recognition result.
With the above scheme, the original character string is converted into a picture and a phonetic symbol string, deep learning is then performed on each to obtain the corresponding feature vectors, the standardized character string corresponding to the original character string is restored from feature vectors of multiple dimensions, and unusual character string identification is then performed. This can greatly improve the recognition rate for deformed characters and therefore the accuracy and precision of unusual character string identification. Moreover, the entire identification process requires no manual participation or adjustment and is performed automatically, which improves the efficiency of unusual character string identification and greatly reduces labor costs.
In an embodiment of the present invention, as shown in Fig. 5, the standardized character string generation unit 407 may include:
a first standardized character string generation subunit 501, adapted to obtain the first standardized character string corresponding to the original character string according to the first deep learning feature vector;
a second standardized character string generation subunit 502, adapted to obtain the second standardized character string corresponding to the original character string according to the second deep learning feature vector;
a third standardized character string generation subunit 503, adapted to obtain the third standardized character string corresponding to the original character string according to the third deep learning feature vector.
As shown in Fig. 6, the unusual character string recognition unit 408 may include:
a first unusual character string identification subunit 601, adapted to match the first standardized character string against the character strings in the preset exception database and identify the unusual character string in the first standardized character string;
a second unusual character string identification subunit 602, adapted to match the second standardized character string against the character strings in the preset exception database and identify the unusual character string in the second standardized character string;
a third unusual character string identification subunit 603, adapted to match the third standardized character string against the character strings in the preset exception database and identify the unusual character string in the third standardized character string.
In specific implementations, the device 400 can also be further extended and optimized, so as to determine the standardized character string. This is described in detail below through specific embodiments.
In an embodiment of the present invention, the feature vectors corresponding to the original character string, the picture and the phonetic symbol string can be fused and subjected to a second round of deep learning to further strengthen the connection between the feature vectors. This is further described with reference to Fig. 4 and Fig. 7. As shown in Fig. 7, the standardized character string generation unit 407 may include:
a feature vector fusion subunit 701, adapted to fuse the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector to obtain a fusion feature vector;
a deep learning subunit 702, adapted to input the fusion feature vector into the preset fourth deep learning model and determine the fourth standardized character string corresponding to the original character string.
Then, the unusual character string recognition unit 408 can match the fourth standardized character string against the character strings in the preset exception database and identify the unusual character string in the standardized character string. Finally, the result output unit 409 outputs the recognition result.
With the above scheme, fusing the feature vectors corresponding to the original character string, the picture and the phonetic symbol string and applying a second round of deep learning further strengthens the connection between the feature vectors and yields a more accurate standardized character string, which improves the recognition range and accuracy for unusual character strings and enhances the ability to identify them.
In yet another embodiment of the present invention, the first standardized character string, the second standardized character string, the third standardized character string and the fourth standardized character string can each be identified separately. This is further described with reference to Fig. 4, Fig. 5 and Fig. 6.
As shown in Fig. 5, in addition to the first standardized character string generation subunit 501, the second standardized character string generation subunit 502 and the third standardized character string generation subunit 503, the standardized character string generation unit 407 may further include:
a feature vector fusion subunit 701, adapted to fuse the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector to obtain a fusion feature vector;
a deep learning subunit 702, adapted to input the fusion feature vector into the preset fourth deep learning model and determine the fourth standardized character string corresponding to the original character string.
As shown in Fig. 6, in addition to the first unusual character string identification subunit 601, the second unusual character string identification subunit 602 and the third unusual character string identification subunit 603, the unusual character string recognition unit 408 may further include:
a fourth unusual character string identification subunit 604, adapted to match the fourth standardized character string against the character strings in the preset exception database and identify the unusual character string in the fourth standardized character string.
In specific implementations, the first standardized character string, the second standardized character string, the third standardized character string and the fourth standardized character string are each matched against the character strings in the preset exception database. As long as an unusual character string is identified in at least one of the first, second, third and fourth standardized character strings, a recognition result indicating that an unusual character string exists is output. This realizes multi-dimensional identification and can reduce the miss rate of unusual character string identification.
In specific implementations, the method of fusing the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector may include at least one of the following:
1. Connect the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector head to tail.
2. Randomly combine the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector.
3. Transpose the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector separately and then combine them.
It can be understood that the actual fusion method is not limited to those listed above; the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector may also be processed along other dimensions.
In specific implementations, each of the first to fourth preset deep learning models may be obtained by training one or more neural network models. The first deep learning model may include a first recurrent neural network model, the second deep learning model may include a convolutional neural network model, the third deep learning model may include a second recurrent neural network model, and the preset fourth deep learning model may include a recurrent neural network model and a convolutional neural network model.
In specific implementations, the input original character string may contain text in various languages, digits and even symbols. Therefore, when the original character string is converted into a phonetic symbol string, the second original character string converting unit converts the original character string, based on its principal language type, into the phonetic symbol string corresponding to that principal language type before identification, which can expand the application range of unusual character string identification.
An embodiment of the present invention further provides a data processing equipment, including a memory and a processor. The memory stores computer instructions that can run on the processor, and when the processor runs the computer instructions it can execute the steps of the method for identifying unusual character strings described in any of the above embodiments of the present invention. For the specific implementation of the method executed when the computer instructions run, reference may be made to the steps of the method for identifying unusual character strings in the above embodiments, which are not repeated here.
The data processing equipment may be a handheld terminal such as a mobile phone, a tablet computer, a personal desktop computer, or the like.
An embodiment of the present invention further provides a computer readable storage medium on which computer instructions are stored. When the computer instructions run, they can execute the steps of the method of any of the above embodiments of the present invention.
The computer readable storage medium may be any appropriate readable storage medium, such as an optical disc, a mechanical hard disk or a solid state drive. For the method for identifying unusual character strings executed by the instructions stored on the computer readable storage medium, reference may be made to the above embodiments of the method for identifying unusual character strings, which are not repeated here.
In summary, the embodiments of the present invention disclose embodiment A1, a method for identifying unusual character strings, comprising:
obtaining an original character string;
converting the original character string into a corresponding picture and a corresponding phonetic symbol string respectively;
inputting the original character string into a preset first deep learning model to obtain a first deep learning feature vector, inputting the picture into a preset second deep learning model to obtain a second deep learning feature vector, and inputting the phonetic symbol string into a preset third deep learning model to obtain a third deep learning feature vector;
determining the standardized character string corresponding to the original character string based on the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector;
matching the standardized character string against the character strings in a preset exception database, and identifying the unusual character string in the standardized character string;
outputting the recognition result.
The embodiments of the present invention disclose embodiment A2, the method for identifying unusual character strings according to embodiment A1, wherein determining the standardized character string corresponding to the original character string based on the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector comprises:
fusing the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector to obtain a fusion feature vector;
inputting the fusion feature vector into a preset fourth deep learning model to obtain the standardized character string corresponding to the original character string.
The embodiments of the present invention disclose embodiment A3, the method for identifying unusual character strings according to embodiment A1, wherein determining the standardized character string corresponding to the original character string based on the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector comprises:
obtaining the first standardized character string, the second standardized character string and the third standardized character string corresponding to the original character string based on the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector respectively;
and wherein matching the standardized character string against the character strings in the preset exception database and identifying the unusual character string in the standardized character string comprises:
matching the first standardized character string, the second standardized character string and the third standardized character string respectively against the character strings in the preset exception database, and identifying the unusual character strings in the first standardized character string, the second standardized character string and the third standardized character string.
The embodiments of the present invention disclose embodiment A4, the method for identifying unusual character strings according to embodiment A3, wherein determining the standardized character string corresponding to the original character string based on the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector further comprises:
fusing the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector to obtain a fusion feature vector;
inputting the fusion feature vector into a preset fourth deep learning model to obtain the fourth standardized character string corresponding to the original character string;
and wherein matching the standardized character string against the character strings in the preset exception database and identifying the unusual character string in the standardized character string further comprises:
matching the fourth standardized character string against the character strings in the preset exception database, and identifying the unusual character string in the fourth standardized character string.
The embodiments of the present invention disclose embodiment A5, the method for identifying unusual character strings according to embodiment A2 or embodiment A4, wherein fusing the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector comprises:
connecting the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector head to tail.
The embodiments of the present invention disclose embodiment A6, the method for identifying unusual character strings according to embodiment A1, wherein the first deep learning model includes a first recurrent neural network model, the second deep learning model includes a convolutional neural network model, and the third deep learning model includes a second recurrent neural network model.
The embodiments of the present invention disclose embodiment A7, the method for identifying unusual character strings according to any one of embodiments A1 to A4 or embodiment A6, wherein converting the original character string into a phonetic symbol string comprises:
converting the original character string, based on the principal language type of the original character string, into the phonetic symbol string corresponding to the principal language type.
The embodiments of the present invention disclose embodiment B1, a device for identifying unusual character strings, comprising:
an original character string acquiring unit, adapted to obtain an original character string;
a first original character string converting unit, adapted to convert the original character string into a corresponding picture;
a second original character string converting unit, adapted to convert the original character string into a corresponding phonetic symbol string;
a first deep learning unit, adapted to input the original character string into a preset first deep learning model to obtain a first deep learning feature vector;
a second deep learning unit, adapted to input the picture into a preset second deep learning model to obtain a second deep learning feature vector;
a third deep learning unit, adapted to input the phonetic symbol string into a preset third deep learning model to obtain a third deep learning feature vector;
a standardized character string generation unit, adapted to determine the standardized character string corresponding to the original character string according to the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector;
an unusual character string recognition unit, adapted to match the standardized character string against the character strings in a preset exception database and identify the unusual character string in the standardized character string;
a result output unit, adapted to output the recognition result.
The embodiments of the present invention disclose embodiment B2, the device for identifying unusual character strings according to embodiment B1, wherein the standardized character string generation unit includes:
a feature vector fusion subunit, adapted to fuse the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector to obtain a fusion feature vector;
a deep learning subunit, adapted to input the fusion feature vector into a preset fourth deep learning model and determine the standardized character string corresponding to the original character string.
An embodiment of the invention discloses embodiment B3: the device for identifying abnormal character strings according to embodiment B1, wherein the standardized character string generation unit includes:
A first standardized character string generation subunit, adapted to obtain a first standardized character string corresponding to the original character string according to the first deep learning feature vector;
A second standardized character string generation subunit, adapted to obtain a second standardized character string corresponding to the original character string according to the second deep learning feature vector;
A third standardized character string generation subunit, adapted to obtain a third standardized character string corresponding to the original character string according to the third deep learning feature vector;
The abnormal character string recognition unit includes:
A first abnormal character string recognition subunit, adapted to match the first standardized character string against the character strings in the preset exception database and identify the abnormal character strings in the first standardized character string;
A second abnormal character string recognition subunit, adapted to match the second standardized character string against the character strings in the preset exception database and identify the abnormal character strings in the second standardized character string;
A third abnormal character string recognition subunit, adapted to match the third standardized character string against the character strings in the preset exception database and identify the abnormal character strings in the third standardized character string.
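The per-branch matching of embodiment B3 can be illustrated with a short sketch. It assumes that each branch has already produced its own standardized character string, that matching means substring containment, and that the per-branch results are merged by a simple ordered union; the function names and the merge rule are hypothetical, since the text does not state how the three results are combined.

```python
from typing import Dict, Iterable, List


def match_candidates(candidates: Dict[str, str],
                     exception_db: Iterable[str]) -> Dict[str, List[str]]:
    """Match each branch's standardized string against the exception database."""
    entries = list(exception_db)
    return {
        branch: [entry for entry in entries if entry in text]
        for branch, text in candidates.items()
    }


def union_hits(hits: Dict[str, List[str]]) -> List[str]:
    """Assumed merge rule: ordered union of the hits found by any branch."""
    merged: List[str] = []
    for found in hits.values():
        for entry in found:
            if entry not in merged:
                merged.append(entry)
    return merged
```

Keeping the per-branch hits separate before merging makes it possible to see which view (text, picture or phonetic) exposed a given abnormal character string.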
An embodiment of the invention discloses embodiment B4: the device for identifying abnormal character strings according to embodiment B3, wherein the standardized character string generation unit further includes:
A feature vector fusion subunit, adapted to fuse the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector to obtain a fused feature vector;
A deep learning subunit, adapted to input the fused feature vector into a preset fourth deep learning model to determine a fourth standardized character string corresponding to the original character string;
The abnormal character string recognition unit further includes:
A fourth abnormal character string recognition subunit, adapted to match the fourth standardized character string against the character strings in the preset exception database and identify the abnormal character strings in the fourth standardized character string.
An embodiment of the invention discloses embodiment B5: the device for identifying abnormal character strings according to embodiment B2 or B4, wherein the feature vector fusion subunit is adapted to concatenate the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector end to end.
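End-to-end concatenation simply places the three vectors one after another, so the fused vector's length is the sum of the three input lengths. A minimal sketch, with the hypothetical helper name `fuse_end_to_end` and plain Python lists standing in for feature vectors:

```python
from typing import List, Sequence


def fuse_end_to_end(first: Sequence[float],
                    second: Sequence[float],
                    third: Sequence[float]) -> List[float]:
    """Concatenate three feature vectors head to tail, preserving their order."""
    return [*first, *second, *third]
```

For example, vectors of lengths 2, 1 and 2 yield a fused vector of length 5 whose first two entries come from the first vector.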
An embodiment of the invention discloses embodiment B6: the device for identifying abnormal character strings according to embodiment B1, wherein the first deep learning model includes a first recurrent neural network model, the second deep learning model includes a convolutional neural network model, and the third deep learning model includes a second recurrent neural network model.
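To illustrate why a recurrent model suits the character string and phonetic symbol string inputs, the sketch below shows a tiny Elman-style recurrent encoder that maps a string of any length to a fixed-length vector. It is not the patent's model: the class name `TinyRecurrentEncoder`, the dimensions, the character-code embedding and the untrained random weights are all assumptions made purely for illustration.

```python
import math
import random
from typing import List


def make_matrix(rows: int, cols: int, rng: random.Random) -> List[List[float]]:
    """Small random weight matrix; a real model would learn these values."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinyRecurrentEncoder:
    """Untrained Elman-style RNN: h_t = tanh(W_in x_t + W_rec h_{t-1})."""

    def __init__(self, vocab_size: int = 256, embed_dim: int = 8,
                 hidden_dim: int = 4, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.vocab_size = vocab_size
        self.hidden_dim = hidden_dim
        self.embed = make_matrix(vocab_size, embed_dim, rng)
        self.w_in = make_matrix(hidden_dim, embed_dim, rng)
        self.w_rec = make_matrix(hidden_dim, hidden_dim, rng)

    def encode(self, text: str) -> List[float]:
        hidden = [0.0] * self.hidden_dim
        for ch in text:
            # Assumption: characters are bucketed by code point modulo vocab size.
            x = self.embed[ord(ch) % self.vocab_size]
            # The comprehension reads the previous hidden state throughout.
            hidden = [
                math.tanh(
                    sum(w * xi for w, xi in zip(self.w_in[i], x))
                    + sum(w * hi for w, hi in zip(self.w_rec[i], hidden))
                )
                for i in range(self.hidden_dim)
            ]
        # The final hidden state serves as the feature vector.
        return hidden
```

Whatever the input length, `encode` returns a vector of `hidden_dim` values, which is the property that lets its output be fused with the vectors from the other two models.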
An embodiment of the invention discloses embodiment B7: the device for identifying abnormal character strings according to any one of embodiments B1 to B4 or embodiment B6, wherein the second original character string conversion unit is adapted to convert the original character string, according to its primary language type, into the phonetic symbol string corresponding to that primary language type.
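One way to picture the conversion of embodiment B7 is to pick the language that accounts for most characters and then apply that language's phonetic conversion. The sketch below is an assumption-laden illustration: the majority-vote rule, the two language labels, the toy table `TOY_PINYIN` and the lowercase pass-through used for English are all hypothetical stand-ins, not the patent's procedure or a real pronunciation dictionary.

```python
from collections import Counter

# Hypothetical toy table; a real system would use a full pronunciation lexicon.
TOY_PINYIN = {"中": "zhong", "国": "guo", "你": "ni", "好": "hao"}


def char_language(ch: str) -> str:
    """Classify one character by script (CJK Unified Ideographs vs ASCII letters)."""
    if "\u4e00" <= ch <= "\u9fff":
        return "zh"
    if ch.isascii() and ch.isalpha():
        return "en"
    return "other"


def primary_language(text: str) -> str:
    """Assumed rule: the primary language is the one covering the most characters."""
    counts = Counter(char_language(ch) for ch in text)
    counts.pop("other", None)  # digits, spaces and symbols do not vote
    if not counts:
        return "other"
    return counts.most_common(1)[0][0]


def to_phonetic(text: str) -> str:
    """Convert text to a phonetic string for its primary language type."""
    if primary_language(text) == "zh":
        # Characters missing from the toy table are passed through unchanged.
        return " ".join(TOY_PINYIN.get(ch, ch) for ch in text)
    # Placeholder for non-Chinese input: lowercase pass-through, not real phonetics.
    return text.lower()
```

With this sketch, a string made up mostly of Chinese characters is rendered through the pinyin table, while a mostly Latin-script string takes the placeholder path.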
An embodiment of the invention discloses embodiment C1: a data processing device, including a memory and a processor, wherein the memory is adapted to store one or more computer instructions, and the processor, when running the computer instructions, performs the steps of the method according to any one of embodiments A1 to A7.
An embodiment of the invention discloses embodiment D1: a computer-readable storage medium on which computer instructions are stored, wherein the steps of the method according to any one of embodiments A1 to A7 are performed when the computer instructions are run.
Although the present invention is disclosed as above, it is not limited thereto. Any person skilled in the art may make various changes and modifications without departing from the spirit and scope of the invention; the scope of protection of the invention shall therefore be that defined by the claims.
Claims (10)
1. A method for identifying abnormal character strings, characterized by comprising:
Obtaining an original character string;
Converting the original character string into a corresponding picture and a corresponding phonetic symbol string, respectively;
Inputting the original character string into a preset first deep learning model to obtain a first deep learning feature vector, inputting the picture into a preset second deep learning model to obtain a second deep learning feature vector, and inputting the phonetic symbol string into a preset third deep learning model to obtain a third deep learning feature vector;
Determining, based on the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector, the standardized character string corresponding to the original character string;
Matching the standardized character string against the character strings in a preset exception database, and identifying the abnormal character strings in the standardized character string;
Outputting the recognition result.
2. The method for identifying abnormal character strings according to claim 1, characterized in that determining, based on the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector, the standardized character string corresponding to the original character string comprises:
Fusing the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector to obtain a fused feature vector;
Inputting the fused feature vector into a preset fourth deep learning model to obtain the standardized character string corresponding to the original character string.
3. The method for identifying abnormal character strings according to claim 1, characterized in that determining, based on the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector, the standardized character string corresponding to the original character string comprises:
Obtaining, based on the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector respectively, a first standardized character string, a second standardized character string and a third standardized character string corresponding to the original character string;
And matching the standardized character string against the character strings in the preset exception database and identifying the abnormal character strings in the standardized character string comprises:
Matching the first standardized character string, the second standardized character string and the third standardized character string respectively against the character strings in the preset exception database, and identifying the abnormal character strings in the first standardized character string, the second standardized character string and the third standardized character string.
4. The method for identifying abnormal character strings according to claim 3, characterized in that determining, based on the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector, the standardized character string corresponding to the original character string further comprises:
Fusing the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector to obtain a fused feature vector;
Inputting the fused feature vector into a preset fourth deep learning model to obtain a fourth standardized character string corresponding to the original character string;
And matching the standardized character string against the character strings in the preset exception database and identifying the abnormal character strings in the standardized character string further comprises:
Matching the fourth standardized character string against the character strings in the preset exception database, and identifying the abnormal character strings in the fourth standardized character string.
5. The method for identifying abnormal character strings according to claim 2 or 4, characterized in that fusing the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector comprises:
Concatenating the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector end to end.
6. The method for identifying abnormal character strings according to claim 1, characterized in that the first deep learning model includes a first recurrent neural network model, the second deep learning model includes a convolutional neural network model, and the third deep learning model includes a second recurrent neural network model.
7. The method for identifying abnormal character strings according to any one of claims 1 to 4 or claim 6, characterized in that converting the original character string into the phonetic symbol string comprises:
Converting the original character string, based on its primary language type, into the phonetic symbol string corresponding to that primary language type.
8. A device for identifying abnormal character strings, characterized by comprising:
An original character string acquisition unit, adapted to obtain an original character string;
A first original character string conversion unit, adapted to convert the original character string into a corresponding picture;
A second original character string conversion unit, adapted to convert the original character string into a corresponding phonetic symbol string;
A first deep learning unit, adapted to input the original character string into a preset first deep learning model to obtain a first deep learning feature vector;
A second deep learning unit, adapted to input the picture into a preset second deep learning model to obtain a second deep learning feature vector;
A third deep learning unit, adapted to input the phonetic symbol string into a preset third deep learning model to obtain a third deep learning feature vector;
A standardized character string generation unit, adapted to determine the standardized character string corresponding to the original character string according to the first deep learning feature vector, the second deep learning feature vector and the third deep learning feature vector;
An abnormal character string recognition unit, adapted to match the standardized character string against the character strings in a preset exception database and identify the abnormal character strings in the standardized character string;
A result output unit, adapted to output the recognition result.
9. A data processing device, including a memory and a processor, wherein the memory is adapted to store one or more computer instructions, characterized in that the processor, when running the computer instructions, performs the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium on which computer instructions are stored, characterized in that the steps of the method according to any one of claims 1 to 7 are performed when the computer instructions are run.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910802851.0A CN110516125B (en) | 2019-08-28 | 2019-08-28 | Method, device and equipment for identifying abnormal character string and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110516125A (en) | 2019-11-29 |
CN110516125B (en) | 2020-05-08 |
Family
ID=68628417
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910802851.0A Active CN110516125B (en) | 2019-08-28 | 2019-08-28 | Method, device and equipment for identifying abnormal character string and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110516125B (en) |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH03141484A (en) * | 1989-10-26 | 1991-06-17 | Nec Corp | Method and device for segmenting character |
CN103000176A (en) * | 2012-12-28 | 2013-03-27 | 安徽科大讯飞信息科技股份有限公司 | Speech recognition method and system |
CN107633343A (en) * | 2017-08-09 | 2018-01-26 | 杭州洋驼网络科技有限公司 | Transaction data changes risk recognition system and method |
CN108108732A (en) * | 2016-11-25 | 2018-06-01 | 财团法人工业技术研究院 | Character recognition system and character recognition method thereof |
CN109117848A (en) * | 2018-09-07 | 2019-01-01 | 泰康保险集团股份有限公司 | A kind of line of text character identifying method, device, medium and electronic equipment |
CN109460461A (en) * | 2018-11-13 | 2019-03-12 | 苏州思必驰信息科技有限公司 | Text matching technique and system based on text similarity model |
CN109522558A (en) * | 2018-11-21 | 2019-03-26 | 金现代信息产业股份有限公司 | A kind of Chinese wrongly written character bearing calibration based on deep learning |
CN109547455A (en) * | 2018-12-06 | 2019-03-29 | 南京邮电大学 | Industrial Internet of Things anomaly detection method, readable storage medium storing program for executing and terminal |
CN109739370A (en) * | 2019-01-10 | 2019-05-10 | 北京帝派智能科技有限公司 | A kind of language model training method, method for inputting pinyin and device |
CN109753987A (en) * | 2018-04-18 | 2019-05-14 | 新华三信息安全技术有限公司 | File identification method and feature extracting method |
CN109816118A (en) * | 2019-01-25 | 2019-05-28 | 上海深杳智能科技有限公司 | A kind of method and terminal of the creation structured document based on deep learning model |
CN110083819A (en) * | 2018-01-26 | 2019-08-02 | 北京京东尚科信息技术有限公司 | Spell error correction method, device, medium and electronic equipment |
CN110110577A (en) * | 2019-01-22 | 2019-08-09 | 口碑(上海)信息技术有限公司 | Identify method and device, the storage medium, electronic device of name of the dish |
CN110135414A (en) * | 2019-05-16 | 2019-08-16 | 京北方信息技术股份有限公司 | Corpus update method, device, storage medium and terminal |
CN110135261A (en) * | 2019-04-15 | 2019-08-16 | 北京易华录信息技术股份有限公司 | A kind of method and system of trained road anomalous identification model, road anomalous identification |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113382000A (en) * | 2021-06-09 | 2021-09-10 | 北京天融信网络安全技术有限公司 | UA character string anomaly detection method, device, equipment and medium |
CN113792820A (en) * | 2021-11-15 | 2021-12-14 | 航天宏康智能科技(北京)有限公司 | Countermeasure training method and device for user behavior log anomaly detection model |
CN113792820B (en) * | 2021-11-15 | 2022-02-08 | 航天宏康智能科技(北京)有限公司 | Countermeasure training method and device for user behavior log anomaly detection model |
Also Published As
Publication number | Publication date |
---|---|
CN110516125B (en) | 2020-05-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||