CN109241523A - Recognition method, device, and equipment for variant cheating fields - Google Patents
- Publication number
- CN109241523A (application CN201810907161.7A)
- Authority
- CN
- China
- Prior art keywords
- variant
- character
- digital number
- text
- paragraph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a recognition method, device, and equipment for variant cheating fields. The method includes: obtaining text to be identified; extracting a digital number paragraph from the text to be identified; performing variant word conversion on the text in the digital number paragraph and performing introducer matching; if an introducer is matched, judging the digital number paragraph to be a variant cheating field; if no introducer is matched, extracting variant features from the digital number paragraph and scoring according to the variant features to generate a score value; and if the score value is greater than a preset threshold, judging the digital number paragraph to be a variant cheating field. This solves the problems that discontinuous digital segments cannot be identified and that variant cheating fields cannot be matched and identified without an introducer, and improves the accuracy of variant cheating field recognition.
Description
Technical field
The present invention relates to the field of Internet technology, and in particular to a recognition method, device, and equipment for variant cheating fields.
Background technique
With the rapid development of Internet technology, the network has become the main way for people to communicate and publish information. However, variant cheating fields containing contact information often appear on the Internet, such as "← → common vetch ← → 199 ← → 2638 ← → 723 ← →" and "jia dimension → I0230 fires Hou 66183".
In the related art, scheme one matches introducers such as "wechat", "phone", and "mail", and identifies variant cheating fields according to the matching result. The accuracy of this scheme is low: for example, "I got to know him on wechat" and "he has high prestige" may be identified as variant cheating fields, and when no introducer is matched, variant cheating fields cannot be identified at all. Scheme two matches text fragments with regular expressions for WeChat IDs, telephone numbers, and URL links; if a corresponding digital segment is matched, the text fragment is identified as a variant cheating field. The accuracy of this scheme is also low, and it cannot match discontinuous digital segments.
Summary of the invention
The embodiment of the present invention is intended to solve at least some of the technical problems in related technologies.
To this end, a first object of the embodiments of the present invention is to propose a recognition method for variant cheating fields, which solves the problems in the related art that discontinuous digital segments cannot be identified and that variant cheating fields cannot be matched and identified without an introducer, and improves the accuracy of variant cheating field recognition.
A second object of the embodiments of the present invention is to propose a recognition device for variant cheating fields.
A third object of the embodiments of the present invention is to propose a computer device.
A fourth object of the embodiments of the present invention is to propose a non-transitory computer-readable storage medium.
To achieve the above objects, an embodiment of the first aspect of the present invention proposes a recognition method for variant cheating fields, including:
Obtaining text to be identified;
Extracting a digital number paragraph from the text to be identified;
Performing variant word conversion on the text in the digital number paragraph and performing introducer matching;
If an introducer is matched, judging the digital number paragraph to be a variant cheating field;
If no introducer is matched, extracting variant features from the digital number paragraph, and scoring according to the variant features to generate a score value;
If the score value is greater than a preset threshold, judging the digital number paragraph to be a variant cheating field.
The recognition method for variant cheating fields of the embodiment of the present invention first obtains text to be identified and extracts a digital number paragraph from it. Then, variant word conversion is performed on the text in the digital number paragraph and introducer matching is performed; when an introducer is matched, the digital number paragraph is judged to be a variant cheating field. When no introducer is matched, variant features are extracted from the digital number paragraph and scored to generate a score value, and when the score value is greater than a preset threshold, the digital number paragraph is judged to be a variant cheating field. In this embodiment, digital number paragraphs that include discontinuous digital segments can be extracted from the text to be identified, which solves the problem in the related art that discontinuous digital segments cannot be identified. Because a score value is generated from the variant features when no introducer is matched, variant cheating fields can still be identified in that case. Moreover, scoring in combination with variant features realizes a rule-based variant cheating field recognition strategy and improves the accuracy of the algorithm and of variant cheating field recognition.
In addition, the recognition method for variant cheating fields according to the above embodiment of the present invention may also have the following additional technical features:
Optionally, before extracting the digital number paragraph from the text to be identified, the method further includes: performing digital variant normalization on the text to be identified.
Optionally, performing digital variant normalization on the text to be identified includes: performing digital variant normalization on the text to be identified according to a variant contact method database.
Optionally, the recognition method for variant cheating fields further includes:
Converting the characters in the text to be identified into character pictures;
Comparing the similarity of the character pictures and digital pictures to generate similarity values;
Converting each character whose character picture has a similarity value greater than a preset similarity threshold into the number corresponding to the matching digital picture.
Optionally, extracting the digital number paragraph from the text to be identified includes:
Obtaining the position of a special character in the text to be identified;
Extracting the character strings before and after the special character that meet a preset rule, and adding the special character and those character strings to the digital number paragraph.
Optionally, the preset rule is:
Judging whether the character at a preset character interval forward or backward from the special character is a number;
If so, adding the characters between the special character and the character at the preset character interval to the digital number paragraph.
Optionally, the recognition method for variant cheating fields further includes:
Removing the interference symbols in the digital number paragraph.
Optionally, extracting variant features from the digital number paragraph and scoring according to the variant features to generate the score value includes:
Performing pinyin normalization on the digital number paragraph;
Extracting variant features from the pinyin-normalized digital number paragraph;
Extracting abstract features from the pinyin-normalized digital number paragraph;
Scoring according to the variant features and the abstract features to generate the score value.
To achieve the above objects, an embodiment of the second aspect of the present invention proposes a recognition device for variant cheating fields, including:
An obtaining module, configured to obtain text to be identified;
An extraction module, configured to extract a digital number paragraph from the text to be identified;
A matching module, configured to perform variant word conversion on the text in the digital number paragraph and perform introducer matching;
A first judgment module, configured to judge the digital number paragraph to be a variant cheating field if an introducer is matched;
A scoring module, configured to, if no introducer is matched, extract variant features from the digital number paragraph and score according to the variant features to generate a score value;
A second judgment module, configured to judge the digital number paragraph to be a variant cheating field if the score value is greater than a preset threshold.
The recognition device for variant cheating fields of the embodiment of the present invention obtains text to be identified and extracts a digital number paragraph from it, then performs variant word conversion on the text in the digital number paragraph and performs introducer matching. When an introducer is matched, the digital number paragraph is judged to be a variant cheating field; when no introducer is matched, variant features are extracted from the digital number paragraph and scored to generate a score value, and when the score value is greater than a preset threshold, the digital number paragraph is judged to be a variant cheating field. This solves the problems in the related art that discontinuous digital segments cannot be identified and that variant cheating fields cannot be matched and identified without an introducer, and improves the accuracy of variant cheating field recognition.
In addition, the recognition device for variant cheating fields according to the above embodiment of the present invention may also have the following additional technical features:
Optionally, the recognition device for variant cheating fields further includes: a conversion module, configured to perform digital variant normalization on the text to be identified.
Optionally, the conversion module is specifically configured to: perform digital variant normalization on the text to be identified according to a variant contact method database.
Optionally, the conversion module is specifically configured to:
Convert the characters in the text to be identified into character pictures;
Compare the similarity of the character pictures and digital pictures to generate similarity values;
Convert each character whose character picture has a similarity value greater than a preset similarity threshold into the number corresponding to the matching digital picture.
Optionally, the extraction module is specifically configured to:
Obtain the position of a special character in the text to be identified;
Extract the character strings before and after the special character that meet a preset rule, and add the special character and those character strings to the digital number paragraph.
Optionally, the preset rule is:
Judging whether the character at a preset character interval forward or backward from the special character is a number;
If so, adding the characters between the special character and the character at the preset character interval to the digital number paragraph.
Optionally, the recognition device for variant cheating fields further includes: a processing module, configured to remove the interference symbols in the digital number paragraph.
Optionally, the scoring module is specifically configured to:
Perform pinyin normalization on the digital number paragraph;
Extract variant features from the pinyin-normalized digital number paragraph;
Extract abstract features from the pinyin-normalized digital number paragraph;
Score according to the variant features and the abstract features to generate the score value.
To achieve the above objects, an embodiment of the third aspect of the present invention proposes a computer device, including a processor and a memory, wherein the processor reads executable program code stored in the memory and runs a program corresponding to that code, so as to implement the recognition method for variant cheating fields described in the embodiment of the first aspect.
To achieve the above objects, an embodiment of the fourth aspect of the present invention proposes a non-transitory computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the recognition method for variant cheating fields described in the embodiment of the first aspect.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from the following description or be learned through practice of the present invention.
Description of the drawings
Fig. 1 is a schematic flowchart of a recognition method for variant cheating fields provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another recognition method for variant cheating fields provided by an embodiment of the present invention;
Fig. 3 is a schematic flowchart of another recognition method for variant cheating fields provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a recognition device for variant cheating fields provided by an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of another recognition device for variant cheating fields provided by an embodiment of the present invention;
Fig. 6 shows a block diagram of an exemplary computer device suitable for implementing the embodiments of the present invention.
Detailed description of the embodiments
The embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and should not be construed as limiting it.
The recognition method, device, and equipment for variant cheating fields of the embodiments of the present invention are described below with reference to the drawings.
Fig. 1 is a schematic flowchart of a recognition method for variant cheating fields provided by an embodiment of the present invention. As shown in Fig. 1, the recognition method for variant cheating fields includes:
Step 101, text to be identified is obtained.
In this embodiment, in order to identify variant cheating fields, the text to be identified must first be obtained.
In one embodiment of the present invention, an article, comment, or push message can be obtained from the Internet as the text to be identified. For example, the circle-of-friends push message "XX Discount Renovation: no need to install an app; scan the WeChat mini program to apply for renovation by installments." can be obtained as the text to be identified. As another example, the comment "An encounter is a great joy, and a meeting of hearts is even more the most beautiful scenery in life. ← → common vetch ← → 199 ← → 2638 ← → 723 ← →" can be obtained as the text to be identified.
Step 102, a digital number paragraph is extracted from the text to be identified.
In practical applications, variant cheating fields generally include contact information, and contact information (such as a WeChat ID or telephone number) is usually composed of numbers. Therefore, to identify variant cheating fields, digital number paragraphs need to be extracted from the text to be identified. Here, a digital number paragraph is a paragraph that contains a digital number; it may include numbers, Chinese characters, letters, special symbols, and so on.
There are many ways to extract digital number paragraphs from the text to be identified, for example:
As one possible implementation, the position of a special character in the text to be identified can be obtained, and it is then judged whether the character at a preset character interval forward or backward from the special character is a number; if so, the characters between the special character and the character at the preset character interval are added to the digital number paragraph. The special character can be a single digit character, a continuous run of digit characters, a character at a position hit by a regular expression (such as the beginning or end of a character string), and so on. The preset character interval can be derived from a large amount of experimental data or set by those skilled in the art.
In this embodiment, digital number paragraphs that include continuous digital segments, such as "add wechat 12345678", can be extracted from the text to be identified, as can digital number paragraphs that include discontinuous digital segments, such as "1.3- The micro- 8-9 letter -0 of 4.- adds this".
Step 103, variant word conversion is performed on the text in the digital number paragraph and introducer matching is performed.
In one embodiment of the present invention, a variant word conversion vocabulary can be preset, with the variant word conversion relations stored in it, and variant word conversion is then performed on the digital number paragraph according to this vocabulary. For example, the digital number paragraph "jia dimension 136919 meets 71634 by chance" becomes "add micro 136919 meets 71634 by chance" after variant word conversion.
The variant word conversion vocabulary can be built from variant word sample data collected online, or set by those skilled in the art as needed. For example, the variant word conversion vocabulary may include: " sign " -> "micro", "+" -> "add", "v" -> "micro", and so on.
In this embodiment, after variant word conversion is performed on the digital number paragraph, introducer matching is needed so that variant cheating fields can be identified according to the matching result. As one possible implementation, an introducer vocabulary can be preset, and introducer matching is then performed on the converted digital number paragraph according to the introducer vocabulary. For example, the introducer can be "wechat", "official account", and so on.
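The conversion and matching in step 103 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the table `VARIANT_TO_NORMAL`, the list `INTRODUCERS`, and both function names are hypothetical, and plain substring replacement is used only because the text does not specify a matching strategy.

```python
# Hypothetical variant word conversion vocabulary: variant spelling -> normal word.
VARIANT_TO_NORMAL = {"jia": "add", "+": "add", "wx": "wechat", "vx": "wechat"}

# Hypothetical introducer vocabulary.
INTRODUCERS = ("wechat", "official account", "phone")


def convert_variant_words(text, table=VARIANT_TO_NORMAL):
    """Replace every variant spelling in `text` with its normal word."""
    # Replace longer variants first so a short key cannot split a longer one.
    for variant in sorted(table, key=len, reverse=True):
        text = text.replace(variant, table[variant])
    return text


def match_introducer(text, introducers=INTRODUCERS):
    """Return the first introducer found in `text`, or None if there is none."""
    lowered = text.lower()
    for word in introducers:
        if word in lowered:
            return word
    return None


converted = convert_variant_words("jia vx 12345678")
introducer = match_introducer(converted)
```

Here `converted` is "add wechat 12345678" and `introducer` is "wechat", so the paragraph would be judged a variant cheating field in step 104. A production table would need to guard against replacing short keys inside ordinary words.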
Step 104, if an introducer is matched, the digital number paragraph is judged to be a variant cheating field.
In this embodiment, if an introducer is matched, the digital number paragraph is judged to be a variant cheating field. For example, the digital number paragraph "add wechat 12345678" matches the introducer "wechat", so it is judged to be a variant cheating field.
Step 105, if no introducer is matched, variant features are extracted from the digital number paragraph, and scoring is performed according to the variant features to generate a score value.
In one embodiment of the present invention, variant features can be extracted directly from the digital number paragraph by a relevant feature extraction algorithm.
In one embodiment of the present invention, the Chinese characters, numbers, and so on in the digital number paragraph can also be normalized to pinyin, and variant features are then extracted from the pinyin-normalized digital number paragraph. For example, after pinyin normalization, "add wechat 12345678" becomes "jiaweixinyiersansiwuliuqiba".
The variant features can be derived from a large amount of experimental data or set by those skilled in the art as needed. For example, the variant features may include the proportion of special abnormal symbols, the number of digit sequences, and so on.
In this embodiment, after variant features are extracted from the digital number paragraph, scoring can be performed according to the variant features in several ways to generate a score value.
As one possible implementation, a scoring formula can be preset; after variant features are extracted from the digital number paragraph, they are substituted into the scoring formula to generate a score value. The scoring formula can be derived from a large amount of experimental data or set by those skilled in the art as needed.
As another possible implementation, digital number paragraphs collected online can be used as sample data to train the parameters of a neural network model and thereby generate a scoring model; after variant features are extracted from the digital number paragraph, they are input into the scoring model to generate a score value.
It should be noted that the above implementations of scoring according to variant features are only exemplary. The score value can be generated by one of them alone or by a combination of several, which is not limited here.
Step 106, if the score value is greater than a preset threshold, the digital number paragraph is judged to be a variant cheating field.
In this embodiment, the higher the score value, the more likely the digital number paragraph is a variant cheating field; conversely, the lower the score value, the less likely it is.
Optionally, the obtained score value can be compared with the preset threshold: a digital number paragraph whose score value is greater than the preset threshold is judged to be a variant cheating field, and one whose score value is less than or equal to the preset threshold is judged not to be.
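The decision logic of steps 104 to 106 can be sketched with a preset scoring formula. The weighted sum, the weights in `WEIGHTS`, and the value of `THRESHOLD` are assumptions chosen for illustration; the text only says that the formula and threshold come from experimental data or manual configuration.

```python
# Hypothetical weights and threshold; real values would come from data.
WEIGHTS = {"special_symbol_ratio": 0.5, "digit_run_count": 0.125}
THRESHOLD = 0.5


def score(features: dict) -> float:
    """Weighted sum of the feature values (a stand-in scoring formula)."""
    return sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())


def is_variant_cheating_field(has_introducer: bool, features: dict) -> bool:
    # Steps 104 to 106: an introducer match decides immediately;
    # otherwise the score must be strictly greater than the threshold.
    if has_introducer:
        return True
    return score(features) > THRESHOLD
```

Note that a score exactly equal to the threshold is not flagged, which matches the "less than or equal to" wording above.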
In this embodiment, refining the digital number paragraph extraction process solves the problem in the related art that discontinuous digital segments cannot be identified. When no introducer is matched, variant features are extracted from the digital number paragraph and scored to generate a score value, so that variant cheating fields can still be identified according to the score value. Moreover, scoring in combination with variant features realizes a rule-based variant cheating field recognition strategy and improves the accuracy of the algorithm and of variant cheating field recognition.
In summary, the recognition method for variant cheating fields of the embodiment of the present invention obtains text to be identified, extracts a digital number paragraph from it, and then performs variant word conversion on the text in the digital number paragraph and performs introducer matching. When an introducer is matched, the digital number paragraph is judged to be a variant cheating field; when no introducer is matched, variant features are extracted from the digital number paragraph and scored to generate a score value, and when the score value is greater than a preset threshold, the digital number paragraph is judged to be a variant cheating field. This solves the problems in the related art that discontinuous digital segments cannot be identified and that variant cheating fields cannot be matched and identified without an introducer, and improves the accuracy of variant cheating field recognition.
To explain the present invention more clearly, the extraction of digital number paragraphs from the text to be identified is described in detail below. Fig. 2 is a schematic flowchart of another recognition method for variant cheating fields provided by an embodiment of the present invention. As shown in Fig. 2, after the text to be identified is obtained, the method includes:
Step 201, digital variant normalization is performed on the text to be identified.
In this embodiment, before the digital number paragraph is extracted, digital variant normalization can also be performed on the text to be identified, converting the digital variants in it into normal numbers. This addresses the problem in the related art that digital variants (such as "1 -> mono-", "2 -> bis-", "8 -> 〥") cannot be identified and coverage is low.
In one embodiment of the present invention, digital variant normalization can be performed on the text to be identified according to a variant contact method database.
For example, variant contact methods found online can be collected periodically as negative samples, and the correspondence between variant digit and letter characters and normal digit and letter characters is stored in the variant contact method database. The text to be identified is then matched against the database, and the variant digit and letter characters in it are converted into normal digit and letter characters according to the matching result and the correspondence.
The variant digit and letter characters include but are not limited to Chinese numerals (one -> 1), RMB numerals (one -> 1), circled numbers, variant letters, and so on. The correspondence between variant and normal digit and letter characters can be stored in a trie structure, a dict structure, or another similar structure, which is not limited here.
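A dict-based version of this normalization might look like the sketch below. The entries in `DIGIT_VARIANTS` are illustrative guesses at the kinds of characters such a database would hold (Chinese numerals, financial numerals, circled numbers, circled letters); they are not taken from the patent.

```python
# Hypothetical excerpt of a variant contact method database, stored as a dict.
DIGIT_VARIANTS = {
    "一": "1", "二": "2", "三": "3",  # Chinese numerals
    "壹": "1", "贰": "2", "叁": "3",  # financial (RMB) numerals
    "①": "1", "②": "2", "③": "3",  # circled numbers
    "ⓐ": "a", "ⓑ": "b",              # circled (variant) letters
}

# str.translate performs the per-character lookup in a single pass.
_TABLE = str.maketrans(DIGIT_VARIANTS)


def normalize_digit_variants(text: str) -> str:
    """Convert variant digit and letter characters to their normal forms."""
    return text.translate(_TABLE)
```

Characters that are not in the table pass through unchanged. A blanket mapping like this also rewrites numerals inside ordinary words, which is one reason a real system might match against longer patterns stored in a trie instead.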
In one embodiment of the present invention, the characters in the text to be identified can also be converted into character pictures, the character pictures and digital pictures are compared to generate similarity values, and each character whose character picture has a similarity value greater than a preset similarity threshold is converted into the number corresponding to the matching digital picture.
For example, the text to be identified can be converted into Unicode code segments, and the characters in those segments are converted into character pictures. Each character picture is then compared with preset digital pictures, with the similarity value of the pictures computed by a perceptual hash algorithm (pHash). The similarity value is compared with the preset similarity threshold, and each character whose character picture has a similarity value greater than the threshold is converted into the number corresponding to the matching digital picture.
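The thresholded comparison can be illustrated with a simplified stand-in. The sketch below does not render characters or compute a perceptual hash; it assumes each picture has already been reduced to a fixed-length binary fingerprint and measures similarity as the fraction of matching bits. The 5x3 bitmaps in `DIGIT_PICTURES`, the function names, and the 0.9 threshold are all hypothetical.

```python
# Hypothetical 5x3 binary fingerprints (row-major), standing in for digit pictures.
DIGIT_PICTURES = {
    "0": "111101101101111",
    "1": "010110010010111",
    "7": "111001001001001",
}


def similarity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length fingerprints agree."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closest_digit(picture: str, threshold: float = 0.9):
    """Return the best-matching digit if its similarity exceeds the threshold."""
    best_digit, best_sim = None, 0.0
    for digit, reference in DIGIT_PICTURES.items():
        sim = similarity(picture, reference)
        if sim > best_sim:
            best_digit, best_sim = digit, sim
    return best_digit if best_sim > threshold else None
```

A fingerprint that differs from the "0" bitmap in a single bit has a similarity of 14/15 and is mapped to "0", while a fingerprint far from every reference is left unconverted (the function returns None).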
Step 202, the position of the special character in the text to be identified is obtained.
The special character can be a single digit character, a continuous run of digit characters, a character at a position hit by a regular expression, and so on.
As one possible implementation, the text to be identified can be matched by regular expressions to find the special characters in it and obtain their positions.
Step 203, the character strings before and after the special character that meet the preset rule are extracted, and the special character and those character strings are added to the digital number paragraph.
In one embodiment of the present invention, the preset rule can be: judge whether the character at the preset character interval forward or backward from the special character is a number; if so, add all characters between the special character and the character at the preset character interval to the digital number paragraph.
For example, suppose the preset character interval is 5, the text to be identified is "add-1-2-3", and the special character is "add" (a single character in the original Chinese). The character at an interval of 5 characters after "add" is "3". Since "3" is a number, "add-1-2-3" is added to the digital number paragraph.
As another example, suppose the preset character interval is 10. If the character at an interval of 10 characters after the special character is not a number, but the character at an interval of 9 characters is, then the special character and the characters up to that 9-character interval are added to the digital number paragraph.
The preset character interval can be derived from a large amount of experimental data or set by those skilled in the art.
Thus, digital number paragraphs including continuous digital segments can be extracted from the text to be identified, as can digital number paragraphs including discontinuous digital segments.
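The preset rule can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `extract_around`, its parameters, and the choice to shrink the interval one step at a time until a digit is found (following the interval-10 and interval-9 example above) are assumptions. An interval of n is read as n characters lying between the special character and the candidate digit.

```python
def extract_around(text: str, pos: int, max_interval: int) -> str:
    """Return the slice of `text` around the special character at index `pos`.

    On each side, the farthest digit with at most `max_interval` characters
    between it and the special character bounds the slice.
    """
    def farthest_digit(step: int) -> int:
        # Try the largest interval first, then shrink it one by one.
        for interval in range(max_interval, -1, -1):
            i = pos + step * (interval + 1)
            if 0 <= i < len(text) and text[i].isdigit():
                return i
        return pos  # no digit found on this side

    start = farthest_digit(-1)
    end = farthest_digit(+1)
    return text[start:end + 1]
```

With "+" standing in for the single-character special symbol, `extract_around("+-1-2-3", 0, 5)` returns the whole string, because the character five positions beyond the symbol's neighbor is the digit "3". The separators between the digits are kept, which is what lets discontinuous digit segments end up in one paragraph.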
In one embodiment of the present invention, after the digital number paragraph is obtained, the interference symbols in it can also be removed. Interference symbols include but are not limited to abnormal interference symbols and special symbols, such as ︻, ☆, and so on.
Optionally, the interference symbols in the digital number paragraph can be identified based on Unicode code segments. That is, if a character falls within the Unicode code segments of certain special symbol sets, the character is regarded as an interference symbol; it is removed and counted in the calculation of the special symbol ratio. The Unicode code segments of the special symbol sets can be (\u2600-\u26FF, \u2700-\u27BF).
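Removal by Unicode range maps directly onto a regular expression character class. In the sketch below, the two ranges are the ones given above (they correspond to the Unicode Miscellaneous Symbols and Dingbats blocks); the function name and the choice to return the ratio alongside the cleaned text are assumptions.

```python
import re

# The two ranges named in the text: U+2600 to U+26FF and U+2700 to U+27BF.
INTERFERENCE = re.compile("[\u2600-\u26FF\u2700-\u27BF]")


def strip_interference(paragraph: str):
    """Remove interference symbols and report their share of the paragraph."""
    cleaned, removed = INTERFERENCE.subn("", paragraph)
    ratio = removed / len(paragraph) if paragraph else 0.0
    return cleaned, ratio
```

For a paragraph such as "1☆3✿8", the two symbols are removed and the reported ratio is 2 out of 5 characters. Symbols outside these two ranges would need additional ranges in the character class.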
The recognition method for variant cheating fields of the embodiment of the present invention performs digital variant normalization on the text to be identified, which enables variant numbers to be recognized. Then, by obtaining the position of the special character in the text to be identified, extracting the character strings before and after the special character that meet the preset rule, and adding the special character and those character strings to the digital number paragraph, digital number paragraphs including continuous digital segments as well as those including discontinuous digital segments can be extracted from the text to be identified. This solves the problem in the related art that discontinuous digital segments cannot be identified and refines the digital number paragraph extraction process. In addition, the interference symbols in the digital number paragraph can be removed to further improve the accuracy of variant cheating field recognition.
Based on the above embodiment, variant features and abstract features can further be extracted from the pinyin-normalized digital number paragraph, a score value can be obtained from the variant features and abstract features, and the variant cheating field can be identified according to the score value.
Fig. 3 is a flow diagram of another recognition method for variant cheating fields provided by an embodiment of the invention. As shown in Fig. 3, after the digital number paragraph is extracted from the text to be identified, the recognition method further includes:
Step 301: perform variant word conversion on the text in the digital number paragraph and perform guide word matching.
In one embodiment of the invention, a guide word list can be preset and a guide word matching feature Mguide established. Guide word matching is then performed, according to the guide word list, on the digital number paragraph after variant word conversion: Mguide=1 when a guide word is matched, and Mguide=0 when no guide word is matched.
It should be noted that the explanation of variant word conversion on the digital number paragraph in the previous embodiment also applies to step 301 and is not repeated here.
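The guide word matching feature of step 301 can be illustrated with a short sketch. The guide word list below is hypothetical, and substring matching is an assumption; the patent does not specify the list or how matching is done.

```python
# Hypothetical guide word list; the actual preset list is not given in the text.
GUIDE_WORDS = ("微信", "加微", "公众号", "加q")


def match_guide(paragraph, guide_words=GUIDE_WORDS):
    """Return Mguide: 1 if any guide word occurs in the paragraph, else 0."""
    lowered = paragraph.lower()
    return 1 if any(word in lowered for word in guide_words) else 0
```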
Step 302: perform pinyin normalization on the digital number paragraph.
In one embodiment of the invention, a database can be preset that stores the correspondence between Chinese characters, digits, and pinyin; the Chinese characters and digits in the digital number paragraph are then converted to pinyin according to this correspondence. For example, after pinyin normalization, "add WeChat 12345678" becomes "jiaweixinyiersansiwuliuqiba".
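As a rough illustration of pinyin normalization, the preset database can be stood in for by a small lookup table. The table `PINYIN` below is a hypothetical fragment covering only the characters of the example; a real system would need a complete character-to-pinyin mapping.

```python
# Hypothetical fragment of the character/digit-to-pinyin correspondence.
PINYIN = {
    "加": "jia", "微": "wei", "信": "xin",
    "0": "ling", "1": "yi", "2": "er", "3": "san", "4": "si",
    "5": "wu", "6": "liu", "7": "qi", "8": "ba", "9": "jiu",
}


def pinyin_normalize(paragraph, table=PINYIN):
    """Replace every mapped character with its pinyin; keep the rest as-is."""
    return "".join(table.get(ch, ch) for ch in paragraph)
```

Under this table, "加微信12345678" is normalized to "jiaweixinyiersansiwuliuqiba", matching the example in the text.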
Step 303: extract variant features from the pinyin-normalized digital number paragraph.
The variant features are described below:
E-distanceguide: variant guide word edit distance. The edit distance (also known as the Levenshtein distance) between two strings is the minimum number of edit operations needed to change one into the other. For example, if the guide word is "WeChat" and the text before pinyin normalization is a variant that differs from it by one character, the edit distance is 1.
Spec_ratio: proportion of special abnormal symbols. Calculation: Spec_ratio = len(Spec)/len(comment), where len(Spec) is the number of special symbol characters and len(comment) is the total number of characters.
Guide_pinyin: number of characters whose pinyin is a homophone of a guide word character, for example (a character used in proper names -> jia, "common vetch" -> wei, "heart" -> xin).
Digit_pinyin: number of digit homophones, for example ("two" -> er, "take a walk its" -> liuliuqi).
Distancegd: distance between the guide word and the number sequence, for example the distance between the pinyin guide word "weixin" and the number sequence. Calculation: Posguide - Posdigit-seq, that is, the position of the guide word minus the position of the number sequence.
Matchg: guide word contiguity after pinyin normalization. For example, if pinyin normalization yields a hit on a multi-character term such as "weixin", "jiawei", "gongzhonghao" or "jiaq", the contiguity is 1; if only a single-character term such as "wei", "dian", "jia" or "q" is hit, the contiguity is 0.
E-distancedigit: number sequence edit distance. For example, for the number sequence "one 37 eight eight 77", in which three digits are written as words, the edit distance is 3.
Seq_number: number of number sequences. A number sequence Seq is defined as a continuous segment of digits and letters (interrupted by fewer than 2 Chinese characters) whose count of alphanumeric characters is between 5 and 11.
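Two of the variant features above lend themselves to a short sketch: the edit distance (used by E-distanceguide and E-distancedigit) and Spec_ratio. The function names `edit_distance` and `spec_ratio` are illustrative. The edit distance is the standard dynamic-programming Levenshtein distance; the source does not say which string pairs are actually compared, so the pairs in the usage note are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # substitute or keep
        prev = cur
    return prev[-1]


def spec_ratio(spec, comment):
    """Spec_ratio = len(Spec) / len(comment); 0.0 for an empty comment."""
    return len(spec) / len(comment) if comment else 0.0
```

For instance, `edit_distance("薇信", "微信")` is 1, and `edit_distance("一37八八77", "1378877")` is 3, in line with the two worked examples in the feature list.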
Step 304: extract abstract features from the pinyin-normalized digital number paragraph.
In this embodiment, the pinyin-normalized digital number paragraph can be further abstracted and normalized into an abstract string, from which the abstract features are extracted.
As an example, the normalization rule is: a character that hits guide word pinyin is mapped to 2, a character that hits digit pinyin letters is mapped to 1, a character that hits a special symbol is mapped to 3, and every other character is mapped to 0. For example, a digital number paragraph made up of three guide word characters, a special symbol, nine digit characters, another special symbol, and eight other characters has the abstract string "2223111111111300000000".
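The abstraction rule can be sketched as a per-character classification. Everything below is an assumption made for illustration: the lookup tables are hypothetical fragments, and classifying each original character by its pinyin (rather than each letter of the normalized string) is one possible reading of the rule.

```python
# Hypothetical lookup tables; the real ones would come from the preset database.
CHAR_PINYIN = {"加": "jia", "薇": "wei", "信": "xin", "零": "ling", "二": "er",
               "八": "ba", "三": "san", "酒": "jiu", "漆": "qi"}
GUIDE_PINYIN = {"jia", "wei", "xin"}
DIGIT_PINYIN = {"ling", "yi", "er", "san", "si", "wu", "liu", "qi", "ba", "jiu"}
SPECIAL_SYMBOLS = set("[]☆︻")


def abstract_string(paragraph):
    """Map each character to 2 (guide), 1 (digit), 3 (special) or 0 (other)."""
    out = []
    for ch in paragraph:
        py = CHAR_PINYIN.get(ch)
        if py in GUIDE_PINYIN:
            out.append("2")
        elif ch.isdigit() or py in DIGIT_PINYIN:
            out.append("1")
        elif ch in SPECIAL_SYMBOLS:
            out.append("3")
        else:
            out.append("0")
    return "".join(out)
```

With these tables, "加薇信[40零二八三酒漆]好" maps to "22231111111130".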
The abstract string is denoted Seq-ab, and the abstract features are as follows:
Var-w: contact method feature variance. Calculation: Var(Seq-ab).
Digit-var: digit dispersion. Calculation: Var(pos1), the variance of the positions at which 1 (representing a digit letter) appears in the abstract string.
Guide-var: guide word dispersion. Calculation: Var(pos2), the variance of the positions at which 2 (representing a guide word) appears in the abstract string.
Spec-var: special symbol dispersion. Calculation: Var(pos3), the variance of the positions at which 3 (representing a special symbol) appears in the abstract string.
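The four abstract features can be computed directly from an abstract string. The sketch below assumes Var denotes the population variance and that a symbol that never appears contributes a variance of 0; the source specifies neither, and the function names are illustrative.

```python
from statistics import pvariance


def position_variance(seq_ab, symbol):
    """Population variance of the indices at which `symbol` occurs."""
    positions = [i for i, c in enumerate(seq_ab) if c == symbol]
    return pvariance(positions) if positions else 0.0


def abstract_features(seq_ab):
    """Compute Var-w, Digit-var, Guide-var and Spec-var for an abstract string."""
    values = [int(c) for c in seq_ab]
    return {
        "Var-w": pvariance(values) if values else 0.0,
        "Digit-var": position_variance(seq_ab, "1"),
        "Guide-var": position_variance(seq_ab, "2"),
        "Spec-var": position_variance(seq_ab, "3"),
    }
```

For the abstract string "2213", Var-w is 0.5 and Guide-var is 0.25, while Digit-var and Spec-var are 0 because each symbol appears only once.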
Step 305: score according to the variant features and abstract features to generate a score value.
Step 306: if the score value is greater than a preset threshold, judge the digital number paragraph to be a variant cheating field.
In this embodiment, the digital number paragraph can be scored jointly by a scoring formula and an xgboost model.
In one embodiment of the invention, a score can be computed by formula from the variant features and the guide word matching feature, to judge how likely the digital number paragraph is to contain a contact method. The scoring formula is exemplified as follows:
If the score is greater than or equal to 2, the digital number paragraph is judged to contain a contact method.
In one embodiment of the invention, variant cheating fields containing normal or variant contact methods can be selected as negative samples, and normal fields without contact methods as positive samples. The xgboost model is trained on this sample data and then scores the paragraph according to the variant features and abstract features to generate the score value.
The variant features and abstract features are those extracted in step 303 and step 304, listed below:
Var-w: contact method feature variance.
Digit-var: digit dispersion.
Guide-var: guide word dispersion.
Spec-var: special symbol dispersion.
E-distanceguide: variant guide word edit distance.
Spec_ratio: proportion of special abnormal symbols.
Guide_pinyin: number of guide word homophones in pinyin.
Digit_pinyin: number of digit homophones.
Distancegd: distance between the guide word and the number sequence.
Matchg: guide word contiguity after pinyin normalization.
E-distancedigit: number sequence edit distance.
Seq_number: number of number sequences.
In this embodiment, the digital number paragraph can be judged to be a variant cheating field when the score values from the scoring formula and from the xgboost model are both greater than the preset threshold.
It should be noted that the above explanation of scoring according to the variant features and abstract features is only exemplary. In another embodiment of the invention, the digital number paragraph can be judged to be a variant cheating field when a guide word is matched; when no guide word is matched, pinyin normalization is performed on the digital number paragraph, variant features and abstract features are extracted from the normalized paragraph, the relevant formula and model score the paragraph according to these features, and the variant cheating field is identified according to the score value.
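The decision flow of this alternative embodiment can be summarized in a few lines. This is a sketch only: the three callables stand in for the guide word matcher, the scoring formula, and the trained model, and the two thresholds are parameters because the source does not give their values.

```python
def classify_paragraph(paragraph, match_guide, formula_score, model_score,
                       formula_threshold, model_threshold):
    """Return True if the paragraph is judged to be a variant cheating field.

    match_guide, formula_score and model_score are caller-supplied callables
    (placeholders for the components described in the text).
    """
    # A matched guide word decides the case immediately.
    if match_guide(paragraph) == 1:
        return True
    # Otherwise both scorers must exceed their thresholds.
    return (formula_score(paragraph) > formula_threshold
            and model_score(paragraph) > model_threshold)
```

Requiring both scores to exceed their thresholds follows the joint judgment described above; a deployment could equally weight or combine the two scores, which the source leaves open.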
In the recognition method for variant cheating fields of this embodiment of the invention, pinyin normalization is performed on the digital number paragraph, and variant features and abstract features are extracted from the pinyin-normalized paragraph. Then, combining the guide word matching feature, the variant features, and the abstract features, the formula and the model together generate the score value, and the variant cheating field is identified according to the score value, which further improves the accuracy of variant cheating field recognition. In this embodiment, the two different scoring methods implement a rule-based variant recognition strategy and improve the accuracy of the algorithm. Moreover, approaching the problem from several angles, such as special symbol ratio calculation, variant normalization, abstraction of the position distribution of numbers and guide words, and automatic model training, greatly improves the generalization ability of the algorithm.
To implement the above embodiments, the invention also proposes an identification device for variant cheating fields. Fig. 4 is a structural schematic diagram of an identification device for variant cheating fields provided by an embodiment of the invention. As shown in Fig. 4, the identification device includes: an obtaining module 100, an extraction module 200, a matching module 300, a first judgment module 400, a scoring module 500, and a second judgment module 600.
The obtaining module 100 is used to obtain text to be identified.
The extraction module 200 is used to extract a digital number paragraph from the text to be identified.
The matching module 300 is used to perform variant word conversion on the text in the digital number paragraph and perform guide word matching.
The first judgment module 400 is used to judge the digital number paragraph to be a variant cheating field if a guide word is matched.
The scoring module 500 is used to extract variant features from the digital number paragraph if no guide word is matched, and to score according to the variant features to generate a score value.
The second judgment module 600 is used to judge the digital number paragraph to be a variant cheating field if the score value is greater than a preset threshold.
On the basis of Fig. 4, the identification device for variant cheating fields shown in Fig. 5 further includes: a conversion module 700 and a processing module 800.
The conversion module 700 is used to perform digit variant normalization on the text to be identified.
Further, the conversion module 700 is specifically used to perform digit variant normalization on the text to be identified according to a variant contact method database.
Further, the conversion module 700 is specifically used to:
convert the characters in the text to be identified into character pictures;
compare each character picture with digit pictures for similarity to generate a similarity value;
convert the character corresponding to a character picture whose similarity value is greater than a preset similarity threshold into the digit corresponding to the matching digit picture.
Further, the extraction module 200 is specifically used to:
obtain the position of each special character in the text to be identified;
extract the character strings before and after the special character that satisfy a preset rule, and add the special character and those character strings to the digital number paragraph.
Further, the preset rule is: judge whether the character at a preset character interval before or after the special character, taking the special character as the center, is a digit; if so, add the characters between the special character and the character at the preset character interval to the digital number paragraph.
The processing module 800 is used to remove the interference symbols in the digital number paragraph.
Further, the scoring module 500 is specifically used to:
perform pinyin normalization on the digital number paragraph;
extract variant features from the pinyin-normalized digital number paragraph;
extract abstract features from the pinyin-normalized digital number paragraph;
score according to the variant features and abstract features to generate the score value.
It should be noted that the explanation of the recognition method for variant cheating fields in the previous embodiments also applies to the identification device of this embodiment and is not repeated here.
In summary, the identification device for variant cheating fields of this embodiment of the invention obtains text to be identified, extracts a digital number paragraph from it, performs variant word conversion on the text in the digital number paragraph, and performs guide word matching. When a guide word is matched, the digital number paragraph is judged to be a variant cheating field. When no guide word is matched, variant features are extracted from the digital number paragraph and scored to generate a score value, and the digital number paragraph is judged to be a variant cheating field when the score value is greater than a preset threshold. This solves the problems in the related art that discontinuous digit segments cannot be recognized and that variant cheating fields without a guide word cannot be matched and recognized, and improves the accuracy of variant cheating field recognition.
To implement the above embodiments, the invention also proposes a computer device including a processor and a memory. The processor reads the executable program code stored in the memory and runs the program corresponding to that code, so as to implement the recognition method for variant cheating fields described in any of the foregoing embodiments.
To implement the above embodiments, the invention also proposes a computer program product. When the instructions in the computer program product are executed by a processor, the recognition method for variant cheating fields described in any of the foregoing embodiments is implemented.
To implement the above embodiments, the invention also proposes a non-transitory computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the recognition method for variant cheating fields described in any of the foregoing embodiments is implemented.
Fig. 6 shows a block diagram of an exemplary computer device suitable for implementing embodiments of the invention. The computer device 12 shown in Fig. 6 is only an example and should not impose any restriction on the function or scope of use of the embodiments of the invention.
As shown in Fig. 6, the computer device 12 takes the form of a general-purpose computing device. The components of the computer device 12 can include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the different system components (including the system memory 28 and the processing unit 16).
The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures. For example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The computer device 12 typically includes a variety of computer-system-readable media. These media can be any available media that the computer device 12 can access, including volatile and non-volatile media, and removable and non-removable media.
The memory 28 can include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. The computer device 12 can further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 34 can be used to read from and write to non-removable, non-volatile magnetic media (not shown in Fig. 6, commonly called a "hard disk drive"). Although not shown in Fig. 6, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") can be provided, as can an optical disc drive for reading from and writing to a removable non-volatile optical disc (for example, a compact disc read-only memory (CD-ROM), a digital video disc read-only memory (DVD-ROM), or other optical media). In these cases, each drive can be connected to the bus 18 through one or more data media interfaces. The memory 28 can include at least one program product having a set of (for example, at least one) program modules configured to carry out the functions of the embodiments of this application.
A program/utility 40, having a set of (at least one) program modules 42, can be stored in, for example, the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42 generally carry out the functions and/or methods of the embodiments described herein.
The computer device 12 can also communicate with one or more external devices 14 (such as a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the computer device 12, and/or with any device (such as a network card, a modem, etc.) that enables the computer device 12 to communicate with one or more other computing devices. Such communication can take place through the input/output (I/O) interface 22. The computer device 12 can also communicate with one or more networks, such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet, through the network adapter 20. As shown, the network adapter 20 communicates with the other modules of the computer device 12 through the bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules can be used in conjunction with the computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
The processing unit 16 runs the programs stored in the system memory 28 to execute various functional applications and data processing, for example to implement the methods mentioned in the foregoing embodiments.
In the description of the invention, it should be understood that the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature defined by "first" or "second" can explicitly or implicitly include at least one such feature. In the description of the invention, "multiple" means at least two, for example two or three, unless specifically defined otherwise.
In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, such schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described can be combined in any suitable manner in any one or more embodiments or examples. In addition, where no conflict arises, those skilled in the art can combine the different embodiments or examples described in this specification, and the features of those embodiments or examples.
Although embodiments of the invention have been shown and described above, it should be understood that the above embodiments are exemplary and are not to be construed as limiting the invention. Those of ordinary skill in the art can change, modify, replace, and vary the above embodiments within the scope of the invention.
Claims (18)
1. A recognition method for a variant cheating field, characterized by comprising:
obtaining text to be identified;
extracting a digital number paragraph from the text to be identified;
performing variant word conversion on the text in the digital number paragraph and performing guide word matching;
if a guide word is matched, judging the digital number paragraph to be a variant cheating field;
if no guide word is matched, extracting variant features from the digital number paragraph, and scoring according to the variant features to generate a score value;
if the score value is greater than a preset threshold, judging the digital number paragraph to be a variant cheating field.
2. The recognition method for a variant cheating field according to claim 1, characterized in that, before the extracting of the digital number paragraph from the text to be identified, the method further comprises:
performing digit variant normalization on the text to be identified.
3. The recognition method for a variant cheating field according to claim 2, characterized in that the performing of digit variant normalization on the text to be identified comprises:
performing digit variant normalization on the text to be identified according to a variant contact method database.
4. The recognition method for a variant cheating field according to claim 3, characterized by further comprising:
converting the characters in the text to be identified into character pictures;
comparing each character picture with digit pictures for similarity to generate a similarity value;
converting the character corresponding to a character picture whose similarity value is greater than a preset similarity threshold into the digit corresponding to the matching digit picture.
5. The recognition method for a variant cheating field according to any one of claims 1-4, characterized in that the extracting of the digital number paragraph from the text to be identified comprises:
obtaining the position of a special character in the text to be identified;
extracting the character strings before and after the special character that satisfy a preset rule, and adding the special character and the character strings before and after it that satisfy the preset rule to the digital number paragraph.
6. The recognition method for a variant cheating field according to claim 5, characterized in that the preset rule is:
judging whether the character at a preset character interval before or after the special character, taking the special character as the center, is a digit;
if so, adding the characters between the special character and the character at the preset character interval to the digital number paragraph.
7. The recognition method for a variant cheating field according to claim 5, characterized by further comprising:
removing the interference symbols in the digital number paragraph.
8. The recognition method for a variant cheating field according to claim 1, characterized in that the extracting of variant features from the digital number paragraph and the scoring according to the variant features to generate a score value comprise:
performing pinyin normalization on the digital number paragraph;
extracting variant features from the pinyin-normalized digital number paragraph;
extracting abstract features from the pinyin-normalized digital number paragraph;
scoring according to the variant features and the abstract features to generate the score value.
9. An identification device for a variant cheating field, characterized by comprising:
an obtaining module, used to obtain text to be identified;
an extraction module, used to extract a digital number paragraph from the text to be identified;
a matching module, used to perform variant word conversion on the text in the digital number paragraph and perform guide word matching;
a first judgment module, used to judge the digital number paragraph to be a variant cheating field if a guide word is matched;
a scoring module, used to extract variant features from the digital number paragraph if no guide word is matched, and to score according to the variant features to generate a score value;
a second judgment module, used to judge the digital number paragraph to be a variant cheating field if the score value is greater than a preset threshold.
10. The identification device for a variant cheating field according to claim 9, characterized by further comprising:
a conversion module, used to perform digit variant normalization on the text to be identified.
11. The identification device for a variant cheating field according to claim 10, characterized in that the conversion module is specifically used to:
perform digit variant normalization on the text to be identified according to a variant contact method database.
12. The identification device for a variant cheating field according to claim 10, characterized in that the conversion module is specifically used to:
convert the characters in the text to be identified into character pictures;
compare each character picture with digit pictures for similarity to generate a similarity value;
convert the character corresponding to a character picture whose similarity value is greater than a preset similarity threshold into the digit corresponding to the matching digit picture.
13. The identification device for a variant cheating field according to any one of claims 9-12, characterized in that the extraction module is specifically used to:
obtain the position of a special character in the text to be identified;
extract the character strings before and after the special character that satisfy a preset rule, and add the special character and the character strings before and after it that satisfy the preset rule to the digital number paragraph.
14. The identification device for a variant cheating field according to claim 13, characterized in that the preset rule is:
judging whether the character at a preset character interval before or after the special character, taking the special character as the center, is a digit;
if so, adding the characters between the special character and the character at the preset character interval to the digital number paragraph.
15. The identification device for a variant cheating field according to claim 13, characterized by further comprising:
a processing module, used to remove the interference symbols in the digital number paragraph.
16. The identification device for a variant cheating field according to claim 9, characterized in that the scoring module is specifically used to:
perform pinyin normalization on the digital number paragraph;
extract variant features from the pinyin-normalized digital number paragraph;
extract abstract features from the pinyin-normalized digital number paragraph;
score according to the variant features and the abstract features to generate the score value.
17. A computer device, characterized by comprising a processor and a memory;
wherein the processor reads the executable program code stored in the memory and runs the program corresponding to the executable program code, so as to implement the recognition method for a variant cheating field according to any one of claims 1-8.
18. A non-transitory computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the recognition method for a variant cheating field according to any one of claims 1-8 is implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810907161.7A CN109241523B (en) | 2018-08-10 | 2018-08-10 | Method, device and equipment for identifying variant cheating fields |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109241523A true CN109241523A (en) | 2019-01-18 |
CN109241523B CN109241523B (en) | 2020-12-11 |
Family
ID=65070547
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810907161.7A Active CN109241523B (en) | 2018-08-10 | 2018-08-10 | Method, device and equipment for identifying variant cheating fields |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109241523B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101729520A (en) * | 2008-10-28 | 2010-06-09 | 北京大学 | Method and device for detecting sensitive information |
CN102184188A (en) * | 2011-04-15 | 2011-09-14 | 百度在线网络技术(北京)有限公司 | Method and equipment for determining sensitivity of target text |
CN102591854A (en) * | 2012-01-10 | 2012-07-18 | 凤凰在线(北京)信息技术有限公司 | Advertisement filtering system and advertisement filtering method specific to text characteristics |
CN103064850A (en) * | 2011-10-20 | 2013-04-24 | 腾讯科技(深圳)有限公司 | Method and system of digging cheating data |
CN103514174A (en) * | 2012-06-18 | 2014-01-15 | 北京百度网讯科技有限公司 | Text categorization method and device |
CN104050556A (en) * | 2014-05-27 | 2014-09-17 | 哈尔滨理工大学 | Feature selection method and detection method of junk mails |
CN106407324A (en) * | 2016-08-31 | 2017-02-15 | 北京城市网邻信息技术有限公司 | Method and device for recognizing contact information |
Non-Patent Citations (1)
Title |
---|
Wang Xia et al.: "A Bayesian mail filtering model based on Chinese variant word matching", Computer Applications and Software * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110085224A (en) * | 2019-04-10 | 2019-08-02 | 深圳康佳电子科技有限公司 | Intelligent terminal whole process speech control processing method, intelligent terminal and storage medium |
CN110085224B (en) * | 2019-04-10 | 2021-06-01 | 深圳康佳电子科技有限公司 | Intelligent terminal whole-course voice control processing method, intelligent terminal and storage medium |
CN110298020A (en) * | 2019-05-30 | 2019-10-01 | 北京百度网讯科技有限公司 | Text anti-cheating variant reduction method and equipment, and text anti-cheating method and equipment |
CN110298020B (en) * | 2019-05-30 | 2023-05-16 | 北京百度网讯科技有限公司 | Text anti-cheating variant reduction method and equipment, and text anti-cheating method and equipment |
CN110717328A (en) * | 2019-07-04 | 2020-01-21 | 北京达佳互联信息技术有限公司 | Text recognition method and device, electronic equipment and storage medium |
CN112784592A (en) * | 2019-11-11 | 2021-05-11 | 四川睿象科技有限公司 | Method for extracting effective alarm data based on natural language features |
CN113282746A (en) * | 2020-08-08 | 2021-08-20 | 西北工业大学 | Novel network media platform variant comment confrontation text generation method |
CN113282746B (en) * | 2020-08-08 | 2023-05-23 | 西北工业大学 | Method for generating variant comment countermeasure text of network media platform |
CN112201225A (en) * | 2020-09-30 | 2021-01-08 | 北京大米科技有限公司 | Corpus obtaining method and device, readable storage medium and electronic equipment |
CN112201225B (en) * | 2020-09-30 | 2024-02-02 | 北京大米科技有限公司 | Corpus acquisition method and device, readable storage medium and electronic equipment |
CN113408270A (en) * | 2021-06-10 | 2021-09-17 | 广州三七极创网络科技有限公司 | Variant text recognition method and device and electronic equipment |
CN113408270B (en) * | 2021-06-10 | 2023-02-10 | 广州三七极创网络科技有限公司 | Variant text recognition method and device and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN109241523B (en) | 2020-12-11 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109241523A (en) | Method, device and equipment for identifying variant cheating fields | |
CN108984530B (en) | Detection method and detection system for network sensitive content | |
CN110909548B (en) | Chinese named entity recognition method, device and computer readable storage medium | |
US8380488B1 (en) | Identifying a property of a document | |
CN104008091B (en) | Network text sentiment analysis method based on emotion value | |
Ionescu et al. | Can characters reveal your native language? A language-independent approach to native language identification | |
CN106815197A (en) | Method and apparatus for determining text similarity | |
CN110516247A (en) | Neural-network-based named entity recognition method and computer storage medium | |
CN104239490B (en) | Multi-account detection method and device for UGC (user generated content) website platform | |
CN112100384B (en) | Data viewpoint extraction method, device, equipment and storage medium | |
Das et al. | An algorithm for Japanese character recognition | |
CN112686026B (en) | Keyword extraction method, device, equipment and medium based on information entropy | |
CN113901170A (en) | Event extraction method and system combining Bert model and template matching and electronic equipment | |
CN110020005A (en) | Method for matching symptoms in the chief complaint and history of present illness of a medical record | |
CN108170806A (en) | Sensitive word detection and filtering method, device and computer equipment | |
Bedrick et al. | Robust kaomoji detection in Twitter | |
CN110222331A (en) | Lie recognition method and device, storage medium, computer equipment | |
Mathew et al. | Asking questions on handwritten document collections | |
CN110119702B (en) | Facial expression recognition method based on deep learning prior | |
Aragón et al. | A straightforward multimodal approach for author profiling | |
CN113850643A (en) | Product recommendation method and device, electronic equipment and readable storage medium | |
CN102929863A (en) | Method for intelligently analyzing Chinese character emotional tendency through computer | |
CN107861941B (en) | User nickname authenticity evaluation method, storage medium, electronic device and system | |
CN113887202A (en) | Text error correction method and device, computer equipment and storage medium | |
Ji | Cross-lingual predicate cluster acquisition to improve bilingual event extraction by inductive learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||