CN109241523A - Recognition method, apparatus, and device for variant cheating fields - Google Patents

Info

Publication number
CN109241523A
Authority
CN
China
Prior art keywords
variant
character
digital number
text
paragraph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810907161.7A
Other languages
Chinese (zh)
Other versions
CN109241523B (en)
Inventor
陈玉焓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810907161.7A
Publication of CN109241523A
Application granted
Publication of CN109241523B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a recognition method, apparatus, and device for variant cheating fields. The method includes: obtaining a text to be identified; extracting a digital number paragraph from the text to be identified; performing variant word conversion on the text in the digital number paragraph and performing guide word matching; if a guide word is matched, determining that the digital number paragraph is a variant cheating field; if no guide word is matched, extracting variant features from the digital number paragraph and scoring according to the variant features to generate a score; and if the score is greater than a preset threshold, determining that the digital number paragraph is a variant cheating field. This solves the problems that discontinuous digit segments cannot be recognized and that variant cheating fields cannot be matched when no guide word is present, and it improves the accuracy of variant cheating field recognition.

Description

Recognition method, apparatus, and device for variant cheating fields
Technical field
The present invention relates to the field of Internet technologies, and in particular to a recognition method, apparatus, and device for variant cheating fields.
Background
With the rapid development of Internet technology, the network has become the main way for people to communicate and publish information. However, variant cheating fields that carry contact information often appear on the Internet, such as "← → common vetch ← → 199 ← → 2638 ← → 723 ← →" and "jia dimension → I0230 fires Hou 66183".
In the related art, scheme one matches guide words (also called introducers) such as "WeChat", "phone", and "mail", and identifies variant cheating fields according to the matching result. This scheme has low accuracy: for example, "I met him on WeChat" and "he has high prestige" may be identified as variant cheating fields, and when no guide word is matched, variant cheating fields cannot be identified at all. Scheme two matches text fragments with regular expressions for WeChat IDs, telephone numbers, and URL links; if a corresponding digit segment is matched, the text fragment is identified as a variant cheating field. This scheme also has low accuracy and cannot match discontinuous digit segments.
Summary of the invention
The embodiments of the present invention aim to solve, at least to some extent, one of the technical problems in the related art.
To this end, a first objective of the embodiments of the present invention is to propose a recognition method for variant cheating fields, so as to solve the problems in the related art that discontinuous digit segments cannot be recognized and that variant cheating fields cannot be matched without a guide word, and to improve the accuracy of variant cheating field recognition.
A second objective of the embodiments of the present invention is to propose a recognition apparatus for variant cheating fields.
A third objective of the embodiments of the present invention is to propose a computer device.
A fourth objective of the embodiments of the present invention is to propose a non-transitory computer-readable storage medium.
To achieve the above objectives, an embodiment of the first aspect of the present invention proposes a recognition method for variant cheating fields, including:
obtaining a text to be identified;
extracting a digital number paragraph from the text to be identified;
performing variant word conversion on the text in the digital number paragraph and performing guide word matching;
if a guide word is matched, determining that the digital number paragraph is a variant cheating field;
if no guide word is matched, extracting variant features from the digital number paragraph, and scoring according to the variant features to generate a score;
if the score is greater than a preset threshold, determining that the digital number paragraph is a variant cheating field.
In the recognition method for variant cheating fields of the embodiments of the present invention, a text to be identified is first obtained, and a digital number paragraph is extracted from it. Variant word conversion is then performed on the text in the digital number paragraph and guide word matching is performed; when a guide word is matched, the digital number paragraph is determined to be a variant cheating field. When no guide word is matched, variant features are extracted from the digital number paragraph and scored to generate a score, and when the score is greater than a preset threshold, the digital number paragraph is determined to be a variant cheating field. In this embodiment, digital number paragraphs that include discontinuous digit segments can be extracted from the text to be identified, which solves the problem in the related art that discontinuous digit segments cannot be recognized. Because variant features are extracted and scored when no guide word is matched, variant cheating fields can still be identified from the score in that case. In addition, scoring on variant features yields a rule-based recognition strategy for variant cheating fields, which improves the accuracy of the algorithm and of variant cheating field recognition.
In addition, the recognition method for variant cheating fields according to the above embodiment of the present invention may also have the following additional technical features:
Optionally, before extracting the digital number paragraph from the text to be identified, the method further includes: performing digit variant normalization on the text to be identified.
Optionally, performing digit variant normalization on the text to be identified includes: performing digit variant normalization on the text to be identified according to a variant contact information database.
Optionally, the recognition method for variant cheating fields further includes:
converting the characters in the text to be identified into character pictures;
comparing the character pictures with digit pictures for similarity to generate similarity values;
converting the character corresponding to a character picture whose similarity value is greater than a preset similarity threshold into the digit corresponding to the matching digit picture.
Optionally, extracting the digital number paragraph from the text to be identified includes:
obtaining the position of a special character in the text to be identified;
extracting the character strings before and after the special character that satisfy a preset rule, and adding the special character and those character strings to the digital number paragraph.
Optionally, the preset rule is:
judging whether the character at a preset character interval before or after the special character is a digit;
if so, adding the characters between the special character and the character at the preset character interval to the digital number paragraph.
Optionally, the recognition method for variant cheating fields further includes:
removing interference symbols from the digital number paragraph.
Optionally, extracting variant features from the digital number paragraph and scoring according to the variant features to generate a score includes:
performing pinyin normalization on the digital number paragraph;
extracting variant features from the pinyin-normalized digital number paragraph;
extracting abstract features from the pinyin-normalized digital number paragraph;
scoring according to the variant features and the abstract features to generate the score.
To achieve the above objectives, an embodiment of the second aspect of the present invention proposes a recognition apparatus for variant cheating fields, including:
an obtaining module, configured to obtain a text to be identified;
an extraction module, configured to extract a digital number paragraph from the text to be identified;
a matching module, configured to perform variant word conversion on the text in the digital number paragraph and perform guide word matching;
a first judgment module, configured to determine that the digital number paragraph is a variant cheating field if a guide word is matched;
a scoring module, configured to, if no guide word is matched, extract variant features from the digital number paragraph and score according to the variant features to generate a score;
a second judgment module, configured to determine that the digital number paragraph is a variant cheating field if the score is greater than a preset threshold.
The recognition apparatus for variant cheating fields of the embodiments of the present invention obtains a text to be identified, extracts a digital number paragraph from it, performs variant word conversion on the text in the digital number paragraph, and performs guide word matching. When a guide word is matched, the digital number paragraph is determined to be a variant cheating field. When no guide word is matched, variant features are extracted from the digital number paragraph and scored to generate a score, and when the score is greater than a preset threshold, the digital number paragraph is determined to be a variant cheating field. This solves the problems in the related art that discontinuous digit segments cannot be recognized and that variant cheating fields cannot be matched without a guide word, and it improves the accuracy of variant cheating field recognition.
In addition, the recognition apparatus for variant cheating fields according to the above embodiment of the present invention may also have the following additional technical features:
Optionally, the recognition apparatus for variant cheating fields further includes: a conversion module, configured to perform digit variant normalization on the text to be identified.
Optionally, the conversion module is specifically configured to: perform digit variant normalization on the text to be identified according to a variant contact information database.
Optionally, the conversion module is specifically configured to:
convert the characters in the text to be identified into character pictures;
compare the character pictures with digit pictures for similarity to generate similarity values;
convert the character corresponding to a character picture whose similarity value is greater than a preset similarity threshold into the digit corresponding to the matching digit picture.
Optionally, the extraction module is specifically configured to:
obtain the position of a special character in the text to be identified;
extract the character strings before and after the special character that satisfy a preset rule, and add the special character and those character strings to the digital number paragraph.
Optionally, the preset rule is:
judging whether the character at a preset character interval before or after the special character is a digit;
if so, adding the characters between the special character and the character at the preset character interval to the digital number paragraph.
Optionally, the recognition apparatus for variant cheating fields further includes: a processing module, configured to remove interference symbols from the digital number paragraph.
Optionally, the scoring module is specifically configured to:
perform pinyin normalization on the digital number paragraph;
extract variant features from the pinyin-normalized digital number paragraph;
extract abstract features from the pinyin-normalized digital number paragraph;
score according to the variant features and the abstract features to generate the score.
To achieve the above objectives, an embodiment of the third aspect of the present invention proposes a computer device, including a processor and a memory, wherein the processor reads executable program code stored in the memory and runs a program corresponding to the executable program code, so as to implement the recognition method for variant cheating fields described in the embodiment of the first aspect.
To achieve the above objectives, an embodiment of the fourth aspect of the present invention proposes a non-transitory computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the recognition method for variant cheating fields described in the embodiment of the first aspect.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from the description or be learned through practice of the invention.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a recognition method for variant cheating fields provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another recognition method for variant cheating fields provided by an embodiment of the present invention;
Fig. 3 is a schematic flowchart of another recognition method for variant cheating fields provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a recognition apparatus for variant cheating fields provided by an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of another recognition apparatus for variant cheating fields provided by an embodiment of the present invention;
Fig. 6 is a block diagram of an exemplary computer device suitable for implementing embodiments of the present invention.
Detailed description
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and should not be construed as limiting it.
The recognition method, apparatus, and device for variant cheating fields of the embodiments of the present invention are described below with reference to the drawings.
Fig. 1 is a schematic flowchart of a recognition method for variant cheating fields provided by an embodiment of the present invention. As shown in Fig. 1, the recognition method for variant cheating fields includes:
Step 101, obtain a text to be identified.
In this embodiment, in order to identify variant cheating fields, a text to be identified must first be obtained.
In one embodiment of the present invention, articles, comments, pushed information, and the like can be obtained from the Internet as the text to be identified. For example, the Moments (circle of friends) push message "XX favour finishing: no APP installation needed, identify the WeChat mini program to apply for finishing by installments." can be obtained as the text to be identified. As another example, the comment "Meeting is a grand joy, a meeting of hearts, and even more the most beautiful scenery of life. ← → common vetch ← → 199 ← → 2638 ← → 723 ← →" can be obtained as the text to be identified.
Step 102, extract a digital number paragraph from the text to be identified.
In practical applications, a variant cheating field generally includes contact information, and contact information (such as a WeChat ID or telephone number) usually consists of digits. Therefore, to identify variant cheating fields, a digital number paragraph needs to be extracted from the text to be identified. A digital number paragraph is a passage that contains a digit number; it may include digits, Chinese characters, letters, special symbols, and so on.
The digital number paragraph can be extracted from the text to be identified in several ways, illustrated below:
As one possible implementation, the position of a special character in the text to be identified can be obtained, and it is then judged whether the character at a preset character interval before or after the special character is a digit. If so, the characters between the special character and the character at the preset character interval are added to the digital number paragraph. The special character may be a single digit character, a run of consecutive digit characters, or a character that hits a regular position of a character string (such as the beginning or end of the string). The preset character interval may be derived from a large amount of experimental data or set by those skilled in the art.
In this embodiment, a digital number paragraph that includes a continuous digit segment can be extracted from the text to be identified, such as "add WeChat 12345678", and a digital number paragraph that includes discontinuous digit segments can also be extracted, such as "1.3- The micro- 8-9 letter -0 of 4.- adds this".
Step 103, perform variant word conversion on the text in the digital number paragraph and perform guide word matching.
In one embodiment of the present invention, a variant word conversion table can be preset to store the variant word conversion relationships, and variant word conversion is then performed on the digital number paragraph by looking up this table. For example, the digital number paragraph "jia dimension 136919 meets 71634 by chance" becomes "adding micro- 136919 to meet by chance 71634" after variant word conversion.
The variant word conversion table can be configured from variant word sample data collected online, or set by those skilled in the art as needed. For example, the variant word conversion table may include mappings such as "sign" -> "micro", "+" -> "add", and "v" -> "micro".
In this embodiment, after variant word conversion is performed on the digital number paragraph, guide word matching is performed so that variant cheating fields can be identified from the matching result. As one possible implementation, a guide word table can be preset, and guide word matching is then performed on the converted digital number paragraph according to the guide word table. For example, the guide words may be "WeChat", "official account", and so on.
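The table lookup and guide word matching of step 103 can be sketched in Python as follows. This is only an illustrative sketch: the table entries, the guide word list, and the function names are hypothetical and are not taken from the patent, and the Chinese characters are an assumed rendering of the kind of variants the text describes.

```python
# Hypothetical variant word table: variant spelling -> normal form.
# A real table would be built from variant samples collected online.
VARIANT_WORDS = {"薇": "微", "徽": "微", "v": "微", "+": "加", "jia": "加"}

# Hypothetical guide word table.
GUIDE_WORDS = ["微信", "加微", "公众号", "电话"]


def convert_variants(segment: str) -> str:
    """Replace each known variant spelling with its normal form."""
    # Longer variants are replaced first so multi-character keys win.
    for variant in sorted(VARIANT_WORDS, key=len, reverse=True):
        segment = segment.replace(variant, VARIANT_WORDS[variant])
    return segment


def match_guide_word(segment: str):
    """Return the first guide word found after conversion, or None."""
    converted = convert_variants(segment)
    for word in GUIDE_WORDS:
        if word in converted:
            return word
    return None
```

A plain substring replacement like this would also rewrite harmless occurrences of "v" or "+", so a practical table would probably need context checks; that refinement is outside what the text specifies.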
Step 104, if a guide word is matched, determine that the digital number paragraph is a variant cheating field.
In this embodiment, if a guide word is matched, the digital number paragraph is determined to be a variant cheating field. For example, the digital number paragraph "add WeChat 12345678" matches the guide word "WeChat", so it is determined to be a variant cheating field.
Step 105, if no guide word is matched, extract variant features from the digital number paragraph, and score according to the variant features to generate a score.
In one embodiment of the present invention, variant features can be extracted directly from the digital number paragraph by a suitable feature extraction algorithm.
In one embodiment of the present invention, the Chinese characters, digits, and other content in the digital number paragraph can also be normalized to pinyin, and variant features are then extracted from the pinyin-normalized digital number paragraph. For example, after pinyin normalization, "add WeChat 12345678" becomes "jiaweixinyiersansiwuliuqiba".
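Pinyin normalization can be illustrated with a very small lookup table. The sketch below assumes the example's original text is "加微信12345678" and uses a toy table that covers only those characters; the table and the function name are hypothetical, and a real system would need a complete character-to-pinyin dictionary.

```python
# Toy pinyin table covering only this example (an assumption, not from
# the patent); digits are read out as their Mandarin names.
PINYIN = {
    "加": "jia", "微": "wei", "信": "xin",
    "0": "ling", "1": "yi", "2": "er", "3": "san", "4": "si",
    "5": "wu", "6": "liu", "7": "qi", "8": "ba", "9": "jiu",
}


def to_pinyin(segment: str) -> str:
    """Map each character to pinyin; unknown characters are kept as-is."""
    return "".join(PINYIN.get(ch, ch) for ch in segment)
```

Normalizing to pinyin makes homophone variants of the same syllable collapse to one string, which is presumably why features are extracted after this step.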
The variant features may be derived from a large amount of experimental data or set by those skilled in the art as needed. For example, the variant features may include the proportion of abnormal special symbols, the number of digit runs, and so on.
In this embodiment, after the variant features are extracted from the digital number paragraph, a score can be generated from them in several ways.
As one possible implementation, a scoring formula can be preset; after the variant features are extracted from the digital number paragraph, they are substituted into the scoring formula to generate the score. The scoring formula may be derived from a large amount of experimental data or set by those skilled in the art as needed.
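A preset scoring formula of this kind could be as simple as a weighted sum. The following sketch computes the two features named above and combines them linearly; the weights, the threshold, and all function names are hypothetical values chosen for illustration, since the text does not give a concrete formula.

```python
import re


def extract_features(segment: str) -> dict:
    """Compute two illustrative variant features for a segment."""
    if not segment:
        return {"symbol_ratio": 0.0, "digit_runs": 0}
    # Characters that are neither alphanumeric nor whitespace count as symbols.
    symbols = sum(1 for ch in segment if not ch.isalnum() and not ch.isspace())
    # A digit run is a maximal sequence of consecutive digits.
    digit_runs = len(re.findall(r"\d+", segment))
    return {"symbol_ratio": symbols / len(segment), "digit_runs": digit_runs}


# Hypothetical weights and threshold; real values would come from data.
WEIGHTS = {"symbol_ratio": 2.0, "digit_runs": 0.25}
THRESHOLD = 1.0


def score(segment: str) -> float:
    """Weighted sum of the variant features."""
    features = extract_features(segment)
    return sum(WEIGHTS[name] * value for name, value in features.items())


def is_variant_cheating(segment: str) -> bool:
    """A segment is flagged only when its score exceeds the threshold."""
    return score(segment) > THRESHOLD
```

For "1.3-4.-8-9-0" half of the characters are symbols and there are six digit runs, so the sketch gives 2.0 × 0.5 + 0.25 × 6 = 2.5, which exceeds the assumed threshold.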
As another possible implementation, digital number paragraphs collected online can be used as sample data, and the parameters of a neural network model are trained on the sample data to produce a scoring model. After the variant features are extracted from the digital number paragraph, they are input into the scoring model to generate the score.
It should be noted that the above ways of generating a score from the variant features are only exemplary. The score may be generated in just one of these ways or by combining several of them, which is not limited here.
Step 106, if the score is greater than a preset threshold, determine that the digital number paragraph is a variant cheating field.
In this embodiment, the higher the score, the more likely the digital number paragraph is a variant cheating field; conversely, the lower the score, the less likely it is.
Optionally, the obtained score is compared with the preset threshold: a digital number paragraph whose score is greater than the preset threshold is determined to be a variant cheating field, and one whose score is less than or equal to the preset threshold is determined not to be.
In this embodiment, refining the digital number paragraph extraction process solves the problem in the related art that discontinuous digit segments cannot be recognized. When no guide word is matched, variant features are extracted from the digital number paragraph and scored to generate a score, so that variant cheating fields can still be identified from the score. In addition, scoring on variant features yields a rule-based recognition strategy for variant cheating fields, which improves the accuracy of the algorithm and of variant cheating field recognition.
In summary, the recognition method for variant cheating fields of the embodiments of the present invention obtains a text to be identified, extracts a digital number paragraph from it, performs variant word conversion on the text in the digital number paragraph, and performs guide word matching. When a guide word is matched, the digital number paragraph is determined to be a variant cheating field. When no guide word is matched, variant features are extracted from the digital number paragraph and scored to generate a score, and when the score is greater than a preset threshold, the digital number paragraph is determined to be a variant cheating field. This solves the problems in the related art that discontinuous digit segments cannot be recognized and that variant cheating fields cannot be matched without a guide word, and it improves the accuracy of variant cheating field recognition.
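The two-stage decision of steps 103 to 106 can be summarized in a few lines of Python. The sketch below takes the guide word matcher and the scorer as parameters so that it stays independent of how they are built; the function name `identify` and the stand-in matcher and scorer in the usage note are hypothetical.

```python
from typing import Callable, Iterable, List


def identify(
    segments: Iterable[str],
    has_guide_word: Callable[[str], bool],
    score: Callable[[str], float],
    threshold: float,
) -> List[str]:
    """Return the segments judged to be variant cheating fields."""
    flagged = []
    for segment in segments:
        if has_guide_word(segment):
            # Stage 1: a matched guide word is decisive on its own.
            flagged.append(segment)
        elif score(segment) > threshold:
            # Stage 2: without a guide word, fall back to the feature score.
            flagged.append(segment)
    return flagged
```

For instance, with a matcher that looks for "wechat" and a scorer that returns the fraction of digit characters, `identify(["add wechat 12345678", "1.3-4.-8-9", "see you at 5"], ..., threshold=0.4)` would flag the first two segments and leave the third.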
To explain the present invention more clearly, the extraction of the digital number paragraph from the text to be identified is described in detail below. Fig. 2 is a schematic flowchart of another recognition method for variant cheating fields provided by an embodiment of the present invention. As shown in Fig. 2, after the text to be identified is obtained, the method includes:
Step 201, perform digit variant normalization on the text to be identified.
In this embodiment, before the digital number paragraph is extracted from the text to be identified, digit variant normalization can also be performed on the text to convert digit variants into normal digits. This addresses the problem in the related art that digit variants (such as "1 -> 一", "2 -> 二", and "8 -> 〥") cannot be recognized, which leads to low coverage.
In one embodiment of the present invention, digit variant normalization can be performed on the text to be identified according to a variant contact information database.
For example, variant contact information found online can be collected periodically as negative samples, and the correspondence between variant digit and letter characters and normal digit and letter characters is stored in the variant contact information database. The text to be identified is then matched against the variant contact information database, and the variant digit and letter characters in the text are converted into normal digit and letter characters according to the matching result and the correspondence.
The variant digit and letter characters include, but are not limited to, Chinese numerals (一 -> 1), RMB (financial) numerals (壹 -> 1), circled numbers, and variant letters. The correspondence between variant and normal digit and letter characters can be stored in a trie structure, a dict structure, or another similar structure, which is not limited here.
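With a dict-style store, digit variant normalization reduces to a per-character translation. The sketch below uses a small hypothetical excerpt of such a correspondence table; the entries and the function name are illustrative assumptions, not the contents of the patent's database.

```python
# Hypothetical excerpt of the variant-to-normal correspondence:
# Chinese numerals, financial numerals, and circled numbers.
DIGIT_VARIANTS = {
    "零": "0", "一": "1", "二": "2", "三": "3", "四": "4",
    "五": "5", "六": "6", "七": "7", "八": "8", "九": "9",
    "壹": "1", "贰": "2", "叁": "3",
    "①": "1", "②": "2", "③": "3",
}

# str.maketrans accepts a dict of single characters to replacement strings.
_TABLE = str.maketrans(DIGIT_VARIANTS)


def normalize_digits(text: str) -> str:
    """Convert known digit variants to ordinary ASCII digits."""
    return text.translate(_TABLE)
```

Translating every "一" to "1" would also alter ordinary words that contain the character, so in practice the conversion would likely be restricted to contexts that look like numbers; the text leaves that choice open.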
In one embodiment of the present invention, the characters in the text to be identified can also be converted into character pictures, and the character pictures are compared with digit pictures to generate similarity values. The character corresponding to a character picture whose similarity value is greater than a preset similarity threshold is then converted into the digit corresponding to the matching digit picture.
For example, the text to be identified can be converted into Unicode code ranges, and the characters in those code ranges are converted into character pictures. The character pictures are compared with preset digit pictures, and the similarity value of each pair of pictures is calculated with a perceptual hash algorithm (pHash). The similarity value is then compared with the preset similarity threshold, and the character corresponding to a character picture whose similarity value is greater than the threshold is converted into the digit corresponding to the matching digit picture.
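The perceptual hash comparison can be sketched in pure Python as follows. This is a simplified illustration under several assumptions: each picture is already available as a square grayscale bitmap (a list of rows of numbers), rendering characters to bitmaps is out of scope, the DCT omits the usual normalization factors, and the function names are hypothetical.

```python
import math
from statistics import median


def dct_low_freq(img, keep):
    """Low-frequency keep x keep block of an unnormalized 2-D DCT-II."""
    n = len(img)
    # cos_table[u][x] is the DCT basis value for frequency u at position x.
    cos_table = [
        [math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
        for u in range(keep)
    ]
    block = []
    for u in range(keep):
        row = []
        for v in range(keep):
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += img[y][x] * cos_table[u][y] * cos_table[v][x]
            row.append(total)
        block.append(row)
    return block


def phash(img, keep=8):
    """Hash bits: 1 where a low-frequency coefficient exceeds the median."""
    coeffs = [c for row in dct_low_freq(img, keep) for c in row][1:]  # drop DC
    mid = median(coeffs)
    return [1 if c > mid else 0 for c in coeffs]


def similarity(img_a, img_b):
    """Fraction of matching hash bits, between 0.0 and 1.0."""
    hash_a, hash_b = phash(img_a), phash(img_b)
    same = sum(1 for a, b in zip(hash_a, hash_b) if a == b)
    return same / len(hash_a)
```

A character would then be mapped to a digit when `similarity(char_bitmap, digit_bitmap)` exceeds the preset similarity threshold. The bitmaps must be at least `keep` pixels on each side for the sketch to work.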
Step 202, obtain the position of a special character in the text to be identified.
The special character may be a single digit character, a run of consecutive digit characters, or a character that hits a regular position of a character string.
As one possible implementation, the text to be identified can be matched with regular expressions to obtain the special characters in the text and their positions.
Step 203, extract the character strings before and after the special character that satisfy a preset rule, and add the special character and those character strings to the digital number paragraph.
In one embodiment of the present invention, the preset rule may be: judge whether the character at a preset character interval before or after the special character is a digit; if so, add all the characters between the special character and the character at the preset character interval to the digital number paragraph.
For example, suppose the preset character interval is 5, the text to be identified is "add-1-2-3", and the special character is "add" (a single character, 加, in the original Chinese). The character 5 characters after the special character "add" is "3"; because "3" is a digit, "add-1-2-3" is added to the digital number paragraph.
As another example, suppose the preset character interval is 10 and the character at an interval of 10 characters after the special character is not a digit. It is then judged whether the character at an interval of 9 characters is a digit; if it is, the characters between the special character and the character at an interval of 9 characters are added to the digital number paragraph.
The preset character interval may be derived from a large amount of experimental data or set by those skilled in the art.
In this way, a digital number paragraph that includes a continuous digit segment can be extracted from the text to be identified, and a digital number paragraph that includes discontinuous digit segments can also be extracted.
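The preset rule can be sketched in Python as a search that starts at the largest allowed interval and shrinks it until a digit is found. The sketch assumes that an interval of k means k characters lie between the special character and the tested character, which is how the "add-1-2-3" example reads; the function names are hypothetical.

```python
def extend_forward(text: str, pos: int, max_gap: int) -> int:
    """Index of the farthest digit within max_gap characters after pos."""
    for gap in range(max_gap, -1, -1):
        end = pos + gap + 1  # `gap` characters lie between pos and end
        if end < len(text) and text[end].isdigit():
            return end
    return pos  # no digit found: the segment does not grow forward


def extend_backward(text: str, pos: int, max_gap: int) -> int:
    """Index of the farthest digit within max_gap characters before pos."""
    for gap in range(max_gap, -1, -1):
        start = pos - gap - 1
        if start >= 0 and text[start].isdigit():
            return start
    return pos  # no digit found: the segment does not grow backward


def extract_segment(text: str, pos: int, max_gap: int = 5) -> str:
    """Digital number paragraph around the special character at `pos`."""
    start = extend_backward(text, pos, max_gap)
    end = extend_forward(text, pos, max_gap)
    return text[start:end + 1]
```

With the special character 加 at position 0 of "加-1-2-3" and an interval of 5, the tested character is "3", so the whole string is returned. Repeating the extension from the newly found digit would let the segment keep growing across longer discontinuous numbers, which the text does not rule out.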
In one embodiment of the invention, after the digital number paragraph is obtained, the interference symbols in it can also be removed. Interference symbols include, but are not limited to, abnormal interference symbols and special symbols, such as ︻, ☆, and so on.
Optionally, the interference symbols in the digital number paragraph can be identified based on Unicode code ranges. That is, if a character falls within the Unicode code range of a given special symbol set, the character is regarded as an interference symbol; it is removed and counted in the calculation of the special symbol ratio. The Unicode code ranges of the special symbol set may be (\u2600-\u26FF, \u2700-\u27BF).
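A minimal Python sketch of this range-based filter is given below. It uses only the two ranges named in the text; the function names and the choice to return the ratio alongside the cleaned text are assumptions for illustration.

```python
# Unicode ranges named in the text (Miscellaneous Symbols and Dingbats).
SPECIAL_RANGES = ((0x2600, 0x26FF), (0x2700, 0x27BF))


def is_interference(ch):
    """True if the character falls inside one of the special-symbol ranges."""
    code = ord(ch)
    return any(low <= code <= high for low, high in SPECIAL_RANGES)


def strip_interference(paragraph):
    """Return (cleaned_text, special_ratio) for a digital number paragraph."""
    if not paragraph:
        return "", 0.0
    removed = sum(1 for ch in paragraph if is_interference(ch))
    cleaned = "".join(ch for ch in paragraph if not is_interference(ch))
    return cleaned, removed / len(paragraph)


cleaned, ratio = strip_interference("1☆2☆3")
```

Note that the symbol ︻ lies outside these two ranges, so a practical symbol set would presumably need further ranges beyond the ones listed.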
In the method for identifying a variant cheating field according to this embodiment of the invention, variant digits are identified by performing digit-variant normalization on the text to be identified. The positions of the special characters in the text are then obtained, the character strings before and after each special character that satisfy the preset rule are extracted, and the special character together with those character strings is added to the digital number paragraph. Digital number paragraphs containing continuous digit segments, as well as those containing discontinuous digit segments, can thus be extracted from the text to be identified, which solves the problem in the related art that discontinuous digit segments cannot be identified and makes the extraction of digital number paragraphs more fine-grained. In addition, the interference symbols in the digital number paragraph can be removed, which further improves the accuracy of variant cheating field identification.
Based on the above embodiment, variant features and abstract features can further be extracted from the pinyin-normalized digital number paragraph; a score is then obtained from the variant features and abstract features, and variant cheating fields are identified according to the score.
Fig. 3 is a schematic flowchart of another method for identifying a variant cheating field provided by an embodiment of the invention. As shown in Fig. 3, after the digital number paragraph is extracted from the text to be identified, the method further includes:
Step 301: perform variant-character conversion on the text in the digital number paragraph and perform guide word matching.
In one embodiment of the invention, a guide vocabulary can be preset and a guide word matching feature M_guide established; guide word matching is then performed, according to the guide vocabulary, on the digital number paragraph after variant-character conversion. When a guide word is matched, M_guide = 1; when no guide word is matched, M_guide = 0.
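The matching feature can be sketched in a few lines of Python. The guide vocabulary below is a hypothetical stand-in, chosen to mirror guide words mentioned elsewhere in the description; the real preset vocabulary is not given in the source.

```python
# Hypothetical guide vocabulary; the actual preset vocabulary is not specified.
GUIDE_WORDS = ("加微信", "微信", "公众号", "加q")


def match_guide(paragraph, guide_words=GUIDE_WORDS):
    """Return M_guide: 1 if any guide word occurs in the paragraph, else 0."""
    lowered = paragraph.lower()
    return int(any(word in lowered for word in guide_words))


m_guide = match_guide("加微信12345678")
```

Matching runs on the paragraph after variant-character conversion, so variant spellings are assumed to have been mapped back to their standard forms before this step.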
It should be noted that the explanation of variant-character conversion of the digital number paragraph in the foregoing embodiment also applies to step 301 and is not repeated here.
Step 302: perform pinyin normalization on the digital number paragraph.
In one embodiment of the invention, a database can also be preset in which the correspondence between Chinese characters, digits, and pinyin is stored; the Chinese characters and digits in the digital number paragraph are then converted to pinyin according to this correspondence. For example, after pinyin normalization, "加微信12345678" ("add WeChat 12345678") becomes "jiaweixinyiersansiwuliuqiba".
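The conversion can be illustrated with a small lookup table. The sketch below is a hedged example in which the dictionary PINYIN_TABLE stands in for the preset database; its contents and the function name are assumptions, and unknown characters are simply passed through.

```python
# Hypothetical, tiny stand-in for the preset Chinese/digit-to-pinyin database.
PINYIN_TABLE = {
    "加": "jia", "微": "wei", "信": "xin",
    "0": "ling", "1": "yi", "2": "er", "3": "san", "4": "si",
    "5": "wu", "6": "liu", "7": "qi", "8": "ba", "9": "jiu",
}


def pinyin_normalize(paragraph, table=PINYIN_TABLE):
    """Replace each known character by its pinyin; keep unknown ones as-is."""
    return "".join(table.get(ch, ch) for ch in paragraph)


normalized = pinyin_normalize("加微信12345678")
```

Because homophone variants share the same pinyin, mapping both characters and digits into pinyin places standard and variant spellings in one comparable form.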
Step 303: extract variant features from the pinyin-normalized digital number paragraph.
The variant features are illustrated below:
E-distance_guide: variant guide word edit distance. The edit distance (also known as the Levenshtein distance) between two strings is the minimum number of edit operations required to change one into the other. For example, if the guide word is "微信" ("WeChat") and, before pinyin normalization, the text contains a variant that differs from it by one character, the edit distance is 1.
Spec_ratio: proportion of special abnormal symbols. Calculation: Spec_ratio = len(Spec) / len(comment), where len(Spec) is the number of special-symbol characters and len(comment) is the total number of characters.
Guide_pinyin: number of characters that are pinyin homophones of guide word characters. For example, a character read as jia, the character 薇 ("common vetch") read as wei, and 心 ("heart") read as xin.
Digit_pinyin: number of digit homophone characters. For example, 二 ("two") -> er, and a three-character homophone phrase read as liuliuqi.
Distance_gd: distance between the guide word and the number sequence, for example the distance between the pinyin guide word "weixin" and the number sequence. Calculation: Distance_gd = Pos_guide - Pos_digit-seq.
Match_g: guide word contiguity after pinyin normalization. For example, if a whole word such as "weixin", "jiawei", "gongzhonghao", or "jiaq" is hit after pinyin normalization, the contiguity is 1; if only a single character such as "wei", "dian", "jia", or "q" is hit, the contiguity is 0.
E-distance_digit: number sequence edit distance. For example, if a number sequence is "一37八八77" ("one 3 7 eight eight 7 7"), its edit distance is 3.
Seq_number: number of number sequences. A number sequence Seq is defined as a continuous segment of digits and letters (interrupted by fewer than 2 Chinese characters) that contains between 5 and 11 digit or letter characters.
Step 304: extract abstract features from the pinyin-normalized digital number paragraph.
In this embodiment, the pinyin-normalized digital number paragraph can be further abstracted and normalized into an abstract string, and the abstract features are extracted from the abstract string.
As an example, the normalization rule is: a character that hits guide word pinyin is 2, a character that hits digit pinyin letters is 1, a character that hits a special symbol is 3, and every other character is 0. For instance, a digital number paragraph made up of three guide word variant characters, a special symbol, nine digit variant characters, a second special symbol, and eight ordinary characters yields the abstract string "2223111111111300000000".
The abstract string is denoted Seq-ab, and the abstract features are described below:
Var-w: contact method feature variance. Calculation: Var(Seq-ab).
Digit-var: digit dispersion. Calculation: Var(pos1), the variance of the positions at which 1 (representing digit letters) occurs in the abstract string.
Guide-var: guide word dispersion. Calculation: Var(pos2), the variance of the positions at which 2 (representing guide words) occurs in the abstract string.
Spec-var: special symbol dispersion. Calculation: Var(pos3), the variance of the positions at which 3 (representing special symbols) occurs in the abstract string.
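The abstraction and the four variance features can be sketched as follows. This is a minimal sketch that assumes each character has already been labeled by category and that Var denotes the population variance; the source specifies neither, and the names to_abstract and abstract_features are hypothetical.

```python
from statistics import pvariance

# Codes from the normalization rule: guide word 2, digit 1, special 3, other 0.
CODES = {"guide": "2", "digit": "1", "special": "3", "other": "0"}


def to_abstract(labels):
    """Turn a list of per-character category labels into an abstract string."""
    return "".join(CODES[label] for label in labels)


def abstract_features(abstract):
    """Compute Var-w, Digit-var, Guide-var, and Spec-var from Seq-ab."""
    def position_variance(symbol):
        positions = [i for i, ch in enumerate(abstract) if ch == symbol]
        return pvariance(positions) if positions else 0.0

    return {
        "Var-w": pvariance([int(ch) for ch in abstract]),
        "Digit-var": position_variance("1"),
        "Guide-var": position_variance("2"),
        "Spec-var": position_variance("3"),
    }


labels = ["guide"] * 3 + ["special"] + ["digit"] * 9 + ["special"] + ["other"] * 8
seq_ab = to_abstract(labels)
features = abstract_features(seq_ab)
```

A small position variance means that the characters of one category sit close together, which is the sense in which these features measure dispersion.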
Step 305: score according to the variant features and abstract features to generate a score.
Step 306: if the score is greater than a preset threshold, judge the digital number paragraph to be a variant cheating field.
In this embodiment, the digital number paragraph can be scored jointly by a scoring formula and an xgboost model.
In one embodiment of the invention, scoring can be performed by a formula based on the variant features and the guide word matching feature, in order to judge the likelihood that the digital number paragraph contains a contact method. An example of the scoring formula is as follows:
If the score is greater than or equal to 2, the digital number paragraph is judged to contain a contact method.
In one embodiment of the invention, variant cheating fields carrying a normal or variant contact method can be chosen as negative samples and normal fields without a contact method as positive samples; the xgboost model is trained on this sample data, and scores are then generated by the xgboost model from the variant features and abstract features.
The variant features and abstract features are those extracted in steps 303 and 304, for example:
Var-w: contact method feature variance.
Digit-var: digit dispersion.
Guide-var: guide word dispersion.
Spec-var: special symbol dispersion.
E-distance_guide: variant guide word edit distance.
Spec_ratio: proportion of special abnormal symbols.
Guide_pinyin: number of characters that are pinyin homophones of guide word characters.
Digit_pinyin: number of digit homophone characters.
Distance_gd: distance between the guide word and the number sequence.
Match_g: guide word contiguity after pinyin normalization.
E-distance_digit: number sequence edit distance.
Seq_number: number of number sequences.
In this embodiment, the digital number paragraph can be judged to be a variant cheating field when the scores of both the scoring formula and the xgboost model are greater than the preset threshold.
It should be noted that the above explanation of scoring according to the variant features and abstract features is only exemplary. In another embodiment of the invention, the digital number paragraph can be judged to be a variant cheating field when a guide word is matched; when no guide word is matched, pinyin normalization is performed on the digital number paragraph, variant features and abstract features are extracted from the normalized paragraph, scoring is performed on these features by the relevant formula and model, and variant cheating fields are identified according to the score.
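The decision flow of this alternative embodiment can be summarized in a short Python sketch. The three callables and both threshold values are hypothetical placeholders: the source gives neither the scoring formula nor the trained model, so they are injected as parameters rather than implemented here.

```python
def identify(paragraph, match_guide, formula_score, model_score,
             formula_threshold=2.0, model_threshold=0.5):
    """Return True if the paragraph is judged to be a variant cheating field.

    match_guide, formula_score, and model_score are caller-supplied callables
    (hypothetical stand-ins for guide word matching, the scoring formula, and
    the xgboost model). The threshold defaults are illustrative assumptions.
    """
    if match_guide(paragraph):
        return True  # a matched guide word decides the case directly
    # Otherwise both scores must exceed their thresholds.
    return (formula_score(paragraph) > formula_threshold
            and model_score(paragraph) > model_threshold)


flagged = identify(
    "jiaweixinyiersansi",
    match_guide=lambda text: "weixin" in text,
    formula_score=lambda text: 0.0,
    model_score=lambda text: 0.0,
)
```

Requiring both scores to pass trades some recall for precision, whereas the guide word shortcut handles the cases in which the intent is explicit.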
In the method for identifying a variant cheating field according to this embodiment of the invention, pinyin normalization is performed on the digital number paragraph, and variant features and abstract features are extracted from the pinyin-normalized paragraph. The guide word matching feature, variant features, and abstract features are then combined, and a score is generated using two scoring approaches, formula and model; variant cheating fields are identified according to the score, which further improves identification accuracy. In this embodiment, the rule-based variant recognition strategy is implemented through two different scoring approaches, which improves the accuracy of the algorithm. Moreover, by working from several angles, including special symbol ratio calculation, variant normalization, abstraction of the position distribution of numbers and guide words, and automatic model training, the generalization ability of the algorithm is greatly improved.
To implement the above embodiments, the invention further proposes a device for identifying a variant cheating field. Fig. 4 is a schematic structural diagram of a device for identifying a variant cheating field provided by an embodiment of the invention. As shown in Fig. 4, the device includes: an obtaining module 100, an extraction module 200, a matching module 300, a first judgment module 400, a scoring module 500, and a second judgment module 600.
The obtaining module 100 is configured to obtain a text to be identified.
The extraction module 200 is configured to extract a digital number paragraph from the text to be identified.
The matching module 300 is configured to perform variant-character conversion on the text in the digital number paragraph and perform guide word matching.
The first judgment module 400 is configured to judge the digital number paragraph to be a variant cheating field if a guide word is matched.
The scoring module 500 is configured to, if no guide word is matched, extract variant features from the digital number paragraph and score according to the variant features to generate a score.
The second judgment module 600 is configured to judge the digital number paragraph to be a variant cheating field if the score is greater than a preset threshold.
On the basis of Fig. 4, the device for identifying a variant cheating field shown in Fig. 5 further includes: a conversion module 700 and a processing module 800.
The conversion module 700 is configured to perform digit-variant normalization on the text to be identified.
Further, the conversion module 700 is specifically configured to perform digit-variant normalization on the text to be identified according to a variant contact method database.
Further, the conversion module 700 is specifically configured to:
convert the characters in the text to be identified into character images;
compare the character images with digit images for similarity to generate similarity values;
convert each character whose character image has a similarity value greater than a preset similarity threshold into the digit corresponding to the matching digit image.
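The image-comparison idea can be illustrated with a toy Python sketch. Everything concrete here is an assumption: the 3x5 bitmaps are hand-drawn stand-ins for rendered glyph images, the similarity measure is simple pixel agreement, and the threshold of 0.9 is arbitrary. A real system would presumably render each character from a font and use a more robust image similarity.

```python
# Hypothetical 3x5 bitmaps standing in for rendered digit images.
DIGIT_IMAGES = {
    "0": ("###", "#.#", "#.#", "#.#", "###"),
    "1": (".#.", "##.", ".#.", ".#.", "###"),
}


def similarity(img_a, img_b):
    """Fraction of pixels on which two equally sized bitmaps agree."""
    pixels_a = "".join(img_a)
    pixels_b = "".join(img_b)
    same = sum(1 for a, b in zip(pixels_a, pixels_b) if a == b)
    return same / len(pixels_a)


def to_digit(char_image, threshold=0.9):
    """Return the most similar digit if it clears the threshold, else None."""
    best_digit, best_score = None, 0.0
    for digit, image in DIGIT_IMAGES.items():
        score = similarity(char_image, image)
        if score > best_score:
            best_digit, best_score = digit, score
    return best_digit if best_score > threshold else None


# A glyph shaped like the letter "O", drawn with the same pixels as "0".
letter_o = ("###", "#.#", "#.#", "#.#", "###")
converted = to_digit(letter_o)
```

Comparing shapes rather than code points lets visually similar characters be mapped to digits even when they have no entry in a variant database.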
Further, the extraction module 200 is specifically configured to:
obtain the positions of the special characters in the text to be identified;
extract the character string before and after the special character that satisfies a preset rule, and add the special character together with that character string to the digital number paragraph.
Further, the preset rule is: taking the special character as the center, judge whether the character at a preset character interval before or after it is a digit; if so, add the characters between the special character and the character at the preset character interval to the digital number paragraph.
The processing module 800 is configured to remove the interference symbols in the digital number paragraph.
Further, the scoring module 500 is specifically configured to:
perform pinyin normalization on the digital number paragraph;
extract variant features from the pinyin-normalized digital number paragraph;
extract abstract features from the pinyin-normalized digital number paragraph;
score according to the variant features and abstract features to generate a score.
It should be noted that the explanation of the method for identifying a variant cheating field in the foregoing embodiments also applies to the device of this embodiment and is not repeated here.
In summary, the device for identifying a variant cheating field according to this embodiment of the invention obtains a text to be identified, extracts a digital number paragraph from it, performs variant-character conversion on the text in the digital number paragraph, and performs guide word matching. When a guide word is matched, the digital number paragraph is judged to be a variant cheating field. When no guide word is matched, variant features are extracted from the digital number paragraph and scored to generate a score, and the digital number paragraph is judged to be a variant cheating field when the score is greater than a preset threshold. This solves the problems in the related art that discontinuous digit segments cannot be identified and that variant cheating fields without a guide word cannot be matched and identified, and improves the accuracy of variant cheating field identification.
To implement the above embodiments, the invention further proposes a computer device including a processor and a memory, wherein the processor reads the executable program code stored in the memory and runs a program corresponding to the executable program code, so as to implement the method for identifying a variant cheating field described in any of the foregoing embodiments.
To implement the above embodiments, the invention further proposes a computer program product; when the instructions in the computer program product are executed by a processor, the method for identifying a variant cheating field described in any of the foregoing embodiments is implemented.
To implement the above embodiments, the invention further proposes a non-transitory computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the method for identifying a variant cheating field described in any of the foregoing embodiments is implemented.
Fig. 6 shows a block diagram of an exemplary computer device suitable for implementing embodiments of the invention. The computer device 12 shown in Fig. 6 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the invention.
As shown in Fig. 6, computer device 12 takes the form of a general-purpose computing device. The components of computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the different system components (including system memory 28 and processing unit 16).
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer-system-readable media. These media can be any available media accessible by computer device 12, including volatile and non-volatile media and removable and non-removable media.
System memory 28 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. Computer device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 34 can be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in Fig. 6, commonly called a "hard disk drive"). Although not shown in Fig. 6, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") can be provided, as can an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a compact disc read-only memory (CD-ROM), a digital video disc read-only memory (DVD-ROM), or other optical media). In these cases, each drive can be connected to bus 18 through one or more data media interfaces. System memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of this application.
A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in system memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. Program modules 42 generally carry out the functions and/or methods of the embodiments described in this application.
Computer device 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with computer device 12, and/or with any device (such as a network card, a modem, etc.) that enables computer device 12 to communicate with one or more other computing devices. Such communication can take place through input/output (I/O) interfaces 22. Computer device 12 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through network adapter 20. As shown, network adapter 20 communicates with the other modules of computer device 12 through bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules can be used in conjunction with computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
Processing unit 16 executes various functional applications and data processing by running programs stored in system memory 28, for example implementing the method described in the foregoing embodiments.
In the description of the invention, it should be understood that the terms "first" and "second" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the invention, "multiple" means at least two, for example two or three, unless specifically defined otherwise.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, such schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict with each other, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
Although embodiments of the invention have been shown and described above, it should be understood that the above embodiments are exemplary and should not be construed as limiting the invention; those of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the invention.

Claims (18)

1. A method for identifying a variant cheating field, characterized by comprising:
obtaining a text to be identified;
extracting a digital number paragraph from the text to be identified;
performing variant-character conversion on the text in the digital number paragraph and performing guide word matching;
if a guide word is matched, judging the digital number paragraph to be a variant cheating field;
if no guide word is matched, extracting variant features from the digital number paragraph and scoring according to the variant features to generate a score;
if the score is greater than a preset threshold, judging the digital number paragraph to be a variant cheating field.
2. The method for identifying a variant cheating field according to claim 1, characterized in that, before extracting the digital number paragraph from the text to be identified, the method further comprises:
performing digit-variant normalization on the text to be identified.
3. The method for identifying a variant cheating field according to claim 2, characterized in that performing digit-variant normalization on the text to be identified comprises:
performing digit-variant normalization on the text to be identified according to a variant contact method database.
4. The method for identifying a variant cheating field according to claim 3, characterized by further comprising:
converting the characters in the text to be identified into character images;
comparing the character images with digit images for similarity to generate similarity values;
converting each character whose character image has a similarity value greater than a preset similarity threshold into the digit corresponding to the matching digit image.
5. The method for identifying a variant cheating field according to any one of claims 1-4, characterized in that extracting the digital number paragraph from the text to be identified comprises:
obtaining the positions of the special characters in the text to be identified;
extracting the character string before and after the special character that satisfies a preset rule, and adding the special character together with that character string to the digital number paragraph.
6. The method for identifying a variant cheating field according to claim 5, characterized in that the preset rule is:
taking the special character as the center, judging whether the character at a preset character interval before or after it is a digit;
if so, adding the characters between the special character and the character at the preset character interval to the digital number paragraph.
7. The method for identifying a variant cheating field according to claim 5, characterized by further comprising:
removing the interference symbols in the digital number paragraph.
8. The method for identifying a variant cheating field according to claim 1, characterized in that extracting variant features from the digital number paragraph and scoring according to the variant features to generate a score comprises:
performing pinyin normalization on the digital number paragraph;
extracting variant features from the pinyin-normalized digital number paragraph;
extracting abstract features from the pinyin-normalized digital number paragraph;
scoring according to the variant features and the abstract features to generate the score.
9. A device for identifying a variant cheating field, characterized by comprising:
an obtaining module, configured to obtain a text to be identified;
an extraction module, configured to extract a digital number paragraph from the text to be identified;
a matching module, configured to perform variant-character conversion on the text in the digital number paragraph and perform guide word matching;
a first judgment module, configured to judge the digital number paragraph to be a variant cheating field if a guide word is matched;
a scoring module, configured to, if no guide word is matched, extract variant features from the digital number paragraph and score according to the variant features to generate a score;
a second judgment module, configured to judge the digital number paragraph to be a variant cheating field if the score is greater than a preset threshold.
10. The device for identifying a variant cheating field according to claim 9, characterized by further comprising:
a conversion module, configured to perform digit-variant normalization on the text to be identified.
11. The device for identifying a variant cheating field according to claim 10, characterized in that the conversion module is specifically configured to:
perform digit-variant normalization on the text to be identified according to a variant contact method database.
12. The device for identifying a variant cheating field according to claim 10, characterized in that the conversion module is specifically configured to:
convert the characters in the text to be identified into character images;
compare the character images with digit images for similarity to generate similarity values;
convert each character whose character image has a similarity value greater than a preset similarity threshold into the digit corresponding to the matching digit image.
13. The device for identifying a variant cheating field according to any one of claims 9-12, characterized in that the extraction module is specifically configured to:
obtain the positions of the special characters in the text to be identified;
extract the character string before and after the special character that satisfies a preset rule, and add the special character together with that character string to the digital number paragraph.
14. The device for identifying a variant cheating field according to claim 13, characterized in that the preset rule is:
taking the special character as the center, judging whether the character at a preset character interval before or after it is a digit;
if so, adding the characters between the special character and the character at the preset character interval to the digital number paragraph.
15. The device for identifying a variant cheating field according to claim 13, characterized by further comprising:
a processing module, configured to remove the interference symbols in the digital number paragraph.
16. The device for identifying a variant cheating field according to claim 9, characterized in that the scoring module is specifically configured to:
perform pinyin normalization on the digital number paragraph;
extract variant features from the pinyin-normalized digital number paragraph;
extract abstract features from the pinyin-normalized digital number paragraph;
score according to the variant features and the abstract features to generate the score.
17. A computer device, characterized by comprising a processor and a memory;
wherein the processor reads the executable program code stored in the memory and runs a program corresponding to the executable program code, so as to implement the method for identifying a variant cheating field according to any one of claims 1-8.
18. A non-transitory computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the method for identifying a variant cheating field according to any one of claims 1-8 is implemented.
CN201810907161.7A 2018-08-10 2018-08-10 Method, device and equipment for identifying variant cheating fields Active CN109241523B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810907161.7A CN109241523B (en) 2018-08-10 2018-08-10 Method, device and equipment for identifying variant cheating fields

Publications (2)

Publication Number Publication Date
CN109241523A true CN109241523A (en) 2019-01-18
CN109241523B CN109241523B (en) 2020-12-11

Family

ID=65070547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810907161.7A Active CN109241523B (en) 2018-08-10 2018-08-10 Method, device and equipment for identifying variant cheating fields

Country Status (1)

Country Link
CN (1) CN109241523B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110085224A (en) * 2019-04-10 2019-08-02 深圳康佳电子科技有限公司 Intelligent terminal whole process speech control processing method, intelligent terminal and storage medium
CN110298020A (en) * 2019-05-30 2019-10-01 北京百度网讯科技有限公司 Text anti-cheating variant reduction method and equipment, and text anti-cheating method and equipment
CN110717328A (en) * 2019-07-04 2020-01-21 北京达佳互联信息技术有限公司 Text recognition method and device, electronic equipment and storage medium
CN112201225A (en) * 2020-09-30 2021-01-08 北京大米科技有限公司 Corpus obtaining method and device, readable storage medium and electronic equipment
CN112784592A (en) * 2019-11-11 2021-05-11 四川睿象科技有限公司 Method for extracting effective alarm data based on natural language features
CN113282746A (en) * 2020-08-08 2021-08-20 西北工业大学 Novel network media platform variant comment confrontation text generation method
CN113408270A (en) * 2021-06-10 2021-09-17 广州三七极创网络科技有限公司 Variant text recognition method and device and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101729520A (en) * 2008-10-28 2010-06-09 北京大学 Method and device for detecting sensitive information
CN102184188A (en) * 2011-04-15 2011-09-14 百度在线网络技术(北京)有限公司 Method and equipment for determining sensitivity of target text
CN102591854A (en) * 2012-01-10 2012-07-18 凤凰在线(北京)信息技术有限公司 Advertisement filtering system and advertisement filtering method specific to text characteristics
CN103064850A (en) * 2011-10-20 2013-04-24 腾讯科技(深圳)有限公司 Method and system of digging cheating data
CN103514174A (en) * 2012-06-18 2014-01-15 北京百度网讯科技有限公司 Text categorization method and device
CN104050556A (en) * 2014-05-27 2014-09-17 哈尔滨理工大学 Feature selection method and detection method of junk mails
CN106407324A (en) * 2016-08-31 2017-02-15 北京城市网邻信息技术有限公司 Method and device for recognizing contact information

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101729520A (en) * 2008-10-28 2010-06-09 北京大学 Method and device for detecting sensitive information
CN102184188A (en) * 2011-04-15 2011-09-14 百度在线网络技术(北京)有限公司 Method and equipment for determining sensitivity of target text
CN103064850A (en) * 2011-10-20 2013-04-24 腾讯科技(深圳)有限公司 Method and system of digging cheating data
CN102591854A (en) * 2012-01-10 2012-07-18 凤凰在线(北京)信息技术有限公司 Advertisement filtering system and advertisement filtering method specific to text characteristics
CN103514174A (en) * 2012-06-18 2014-01-15 北京百度网讯科技有限公司 Text categorization method and device
CN104050556A (en) * 2014-05-27 2014-09-17 哈尔滨理工大学 Feature selection method and detection method of junk mails
CN106407324A (en) * 2016-08-31 2017-02-15 北京城市网邻信息技术有限公司 Method and device for recognizing contact information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wang Xia et al.: "Bayesian mail filtering model based on Chinese variant-word matching", Computer Applications and Software (《计算机应用与软件》) *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110085224A (en) * 2019-04-10 2019-08-02 深圳康佳电子科技有限公司 Intelligent terminal whole process speech control processing method, intelligent terminal and storage medium
CN110085224B (en) * 2019-04-10 2021-06-01 深圳康佳电子科技有限公司 Intelligent terminal whole-course voice control processing method, intelligent terminal and storage medium
CN110298020A (en) * 2019-05-30 2019-10-01 北京百度网讯科技有限公司 Text anti-cheating variant reduction method and equipment, and text anti-cheating method and equipment
CN110298020B (en) * 2019-05-30 2023-05-16 北京百度网讯科技有限公司 Text anti-cheating variant reduction method and equipment, and text anti-cheating method and equipment
CN110717328A (en) * 2019-07-04 2020-01-21 北京达佳互联信息技术有限公司 Text recognition method and device, electronic equipment and storage medium
CN112784592A (en) * 2019-11-11 2021-05-11 四川睿象科技有限公司 Method for extracting effective alarm data based on natural language features
CN113282746A (en) * 2020-08-08 2021-08-20 西北工业大学 Novel network media platform variant comment confrontation text generation method
CN113282746B (en) * 2020-08-08 2023-05-23 西北工业大学 Method for generating variant comment countermeasure text of network media platform
CN112201225A (en) * 2020-09-30 2021-01-08 北京大米科技有限公司 Corpus obtaining method and device, readable storage medium and electronic equipment
CN112201225B (en) * 2020-09-30 2024-02-02 北京大米科技有限公司 Corpus acquisition method and device, readable storage medium and electronic equipment
CN113408270A (en) * 2021-06-10 2021-09-17 广州三七极创网络科技有限公司 Variant text recognition method and device and electronic equipment
CN113408270B (en) * 2021-06-10 2023-02-10 广州三七极创网络科技有限公司 Variant text recognition method and device and electronic equipment

Also Published As

Publication number Publication date
CN109241523B (en) 2020-12-11

Similar Documents

Publication Publication Date Title
CN109241523A (en) Method, device and equipment for identifying variant cheating fields
CN108984530B (en) Detection method and detection system for network sensitive content
CN110909548B (en) Chinese named entity recognition method, device and computer readable storage medium
US8380488B1 (en) Identifying a property of a document
CN104008091B (en) Network text sentiment analysis method based on emotion value
Ionescu et al. Can characters reveal your native language? A language-independent approach to native language identification
CN106815197A (en) Method and apparatus for determining text similarity
CN110516247A (en) Neural-network-based named entity recognition method and computer storage medium
CN104239490B (en) Multi-account detection method and device for UGC (user generated content) website platform
CN112100384B (en) Data viewpoint extraction method, device, equipment and storage medium
Das et al. An algorithm for Japanese character recognition
CN112686026B (en) Keyword extraction method, device, equipment and medium based on information entropy
CN113901170A (en) Event extraction method and system combining Bert model and template matching and electronic equipment
CN110020005A (en) Method for matching symptoms in the chief complaint and history of present illness in a medical record
CN108170806A (en) Sensitive word detection and filtering method, device and computer equipment
Bedrick et al. Robust kaomoji detection in Twitter
CN110222331A (en) Lie recognition method and device, storage medium, computer equipment
Mathew et al. Asking questions on handwritten document collections
CN110119702B (en) Facial expression recognition method based on deep learning prior
Aragón et al. A straightforward multimodal approach for author profiling
CN113850643A (en) Product recommendation method and device, electronic equipment and readable storage medium
CN102929863A (en) Method for intelligently analyzing Chinese character emotional tendency through computer
CN107861941B (en) User nickname authenticity evaluation method, storage medium, electronic device and system
CN113887202A (en) Text error correction method and device, computer equipment and storage medium
Ji Cross-lingual predicate cluster acquisition to improve bilingual event extraction by inductive learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant