CN104391860B - Content type detection method and device
- Publication number
- CN104391860B
- Authority
- CN
- China
- Prior art keywords
- classifier
- content
- category
- detection result
- detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiment of the invention discloses a content type detection method and a content type detection device. The method comprises the following steps: extracting the characteristics of the content to be detected; performing category detection on the content to be detected by adopting at least two classifiers matched with the content to be detected according to a feature extraction result; and determining a final class detection result corresponding to the content to be detected according to the class detection results obtained by the at least two classifiers. The technical scheme provided by the embodiment of the invention can automatically detect the type of the acquired content, shorten the detection time and reduce the detection cost.
Description
Technical field
The embodiments of the present invention relate to the technical field of classification and recognition, and in particular to a content type detection method and device.
Background
With the development of Internet technology, the amount of information on the Internet is growing at an exponential rate, and the ways in which people obtain and use information are becoming more varied and more convenient. While the Internet brings convenience to people's lives, it also brings many negative effects. For example, some websites, in pursuit of profit and higher click-through rates, show unhealthy content to users. This seriously degrades the browsing experience and, for teenagers, can significantly affect their physical and mental development.
At present, identifying website content (such as pornographic content) relies mostly on manual judgment. Although this approach is accurate, it is inefficient, consumes a great deal of manpower and material resources, and cannot cope with the harmful content that is increasingly widespread on websites.
Summary of the invention
The embodiments of the present invention provide a content type detection method and device that can automatically detect the category of acquired content, shortening detection time and reducing detection cost.
In a first aspect, an embodiment of the present invention provides a content type detection method, the method comprising:
performing feature extraction on content to be detected;
performing category detection on the content to be detected, according to the feature extraction result, using at least two classifiers matched to the content to be detected;
determining a final category detection result for the content to be detected according to the category detection results obtained by the at least two classifiers.
In a second aspect, an embodiment of the present invention further provides a content type detection device, the device comprising:
a content feature extraction unit, configured to perform feature extraction on content to be detected;
a content type detection unit, configured to perform category detection on the content to be detected, according to the feature extraction result, using at least two classifiers matched to the content to be detected;
a content detection result determining unit, configured to determine a final category detection result for the content to be detected according to the category detection results obtained by the at least two classifiers.
In the technical solution provided by the embodiments of the present invention, classifiers examine the features of the content to be detected, so the category of the content is identified automatically. Compared with manual detection, this greatly reduces the manpower and material resources spent, shortens detection time, and reduces detection cost. In addition, because the final category detection result is determined from the category detection results of multiple classifiers, the correctness of the result is better safeguarded and detection accuracy improves.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a content type detection method provided by Embodiment one of the present invention;
Fig. 2 is a schematic flowchart of a content type detection method provided by Embodiment two of the present invention;
Fig. 3 is a schematic flowchart of a content type detection method provided by Embodiment three of the present invention;
Fig. 4 is a schematic structural diagram of a content type detection device provided by Embodiment four of the present invention;
Fig. 5 is a schematic structural diagram of a content type detection device provided by Embodiment five of the present invention;
Fig. 6 is a schematic flowchart of a preferred content type detection method provided by Embodiment six of the present invention.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment one:
Fig. 1 is a schematic flowchart of a content type detection method provided by Embodiment one of the present invention. This embodiment is applicable to performing category detection on content to be detected. The method can be executed by a category detection device, which is implemented in software and/or hardware. Referring to Fig. 1, the content type detection method provided by this embodiment includes the following operations:
Operation 110: perform feature extraction on the content to be detected.
In this embodiment, the content to be detected can be content in text and/or picture format that is stored locally in advance or obtained in real time from another device. For example, the content to be detected is web page content, including text and/or pictures, obtained by parsing an HTML (HyperText Markup Language) page retrieved from a server on the Internet.
For content in text format, feature extraction can be performed with a text feature extraction algorithm such as chi-square, document frequency, information gain, mutual information, or cross entropy. For content in picture format, target objects in the picture can first be recognized, and a feature vector for the picture is then built from the recognition result. The feature vector may include elements such as the area, number, and position of the target objects and the proportion of the whole picture region that they occupy.
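The chi-square scoring named above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes documents are already tokenized, labels are binary (1 for the target category), and the function names `chi_square` and `select_terms` are hypothetical.

```python
from collections import Counter


def chi_square(n11: int, n10: int, n01: int, n00: int) -> float:
    """Chi-square statistic of one term against one category.

    n11: target documents containing the term
    n10: non-target documents containing the term
    n01: target documents without the term
    n00: non-target documents without the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_terms(docs, labels, top_k):
    """Rank terms by chi-square score and keep the top_k as features.

    docs: list of token lists; labels: list of 0/1 (1 = target category).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_df, neg_df = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        # Document frequency: count each term once per document.
        (pos_df if label else neg_df).update(set(tokens))
    scores = {}
    for term in set(pos_df) | set(neg_df):
        n11, n10 = pos_df[term], neg_df[term]
        scores[term] = chi_square(n11, n10, n_pos - n11, n_neg - n10)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
```

Terms that occur equally often in both groups score 0 and are dropped first, which is why chi-square is a common choice for picking discriminative text features.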
Operation 120: according to the feature extraction result, perform category detection on the content to be detected using at least two classifiers matched to the content to be detected.
In this embodiment, at least two classifiers matched to the content to be detected are created in advance, and each classifier can independently detect the category to which the content belongs. Specifically, each classifier can detect at least one category: for example, whether the content to be detected belongs to a target category or not, or which of several target categories it belongs to.
Each classifier can be created as follows: train on the large number of samples stored in a sample library, and obtain the classification model of the classifier from the training result. The classification model is part of the classifier, and its input and output are the input and output of the classifier. The samples stored in the sample library must include one group of samples that belong to the target category and another group that do not. Training includes feature extraction on the samples, and the feature extraction algorithm used must be the same as the one applied to the content to be detected.
After feature extraction has been performed on the content to be detected, the feature extraction result can be used as the input to the classification model of each of the at least two classifiers. Each classification model processes the feature extraction result separately, generates a category detection result for the content to be detected, and outputs it.
In the embodiments of the present invention, the at least two classifiers matched to the content to be detected may include at least two of the following: a support vector machine (SVM) classifier, a naive Bayes classifier, a k-nearest-neighbor (KNN) classifier, a decision tree (ID3, Iterative Dichotomiser 3) classifier, and a logistic regression classifier.
Operation 130: determine the final category detection result for the content to be detected according to the category detection results obtained by the at least two classifiers.
After the different classifiers have each detected the category of the content to be detected, the resulting category detection results can be processed according to a set rule to determine the final category detection result. One such procedure is: count how many times each distinct category detection result occurs among all the results obtained, and take the result with the largest count as the final category detection result. For example, suppose five classifiers are used and their detection results are, in order: belongs to the target category, does not belong, belongs, does not belong, belongs. The count for "belongs to the target category" is 3 and the count for "does not belong to the target category" is 2, so the final category detection result is that the content to be detected belongs to the target category.
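The counting rule in the five-classifier example can be sketched in a few lines. The function name `majority_vote` and the string labels are illustrative assumptions; the source does not say how ties are resolved, so the tie behaviour below is simply what `Counter.most_common` does.

```python
from collections import Counter


def majority_vote(results):
    """Return the detection result produced by the largest number of classifiers."""
    if not results:
        raise ValueError("at least one detection result is required")
    counts = Counter(results)
    # On a tie, most_common returns the result that was encountered first;
    # the source text does not define tie handling, so this is an assumption.
    winner, _ = counts.most_common(1)[0]
    return winner
```

Calling it with the five results from the example (three "target", two "non-target") returns "target".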
The processing can of course be done in other ways, and this embodiment places no limitation on it. For example, a different value can be assigned in advance to each category detection result: 1 if the content to be detected belongs to the first target category, 2 if it belongs to the second target category, and 0 if it belongs to neither. The values corresponding to all the category detection results are then combined in a weighted calculation to obtain a new value, and the final category detection result is determined from that new value. The weight applied to the value of a given category detection result can be a weight assigned in advance to the classifier that produced that result.
In the technical solution provided by this embodiment, classifiers examine the features of the content to be detected, so the category of the content is identified automatically. Compared with manual detection, this greatly reduces manpower and material costs, shortens detection time, and reduces detection cost. In addition, because the final category detection result is determined from the results of multiple classifiers, the correctness of the result is better safeguarded and detection accuracy improves.
Embodiment two:
Fig. 2 is a schematic flowchart of a content type detection method provided by Embodiment two of the present invention. On the basis of Embodiment one, this embodiment adds an operation for obtaining the content to be detected and, based on that operation, further refines operation 110. Referring to Fig. 2, the content type detection method provided by this embodiment includes the following operations:
Operation 210: obtain web page content according to a URL, as the content to be detected;
Operation 220: if the web page content includes text content, perform feature extraction on the text content with a text feature extraction algorithm, and add the feature extraction result to the feature set of the web page content;
Operation 230: if the web page content includes picture content, perform target feature recognition on the picture content, build a feature vector for the picture content from the recognition result, and add it to the feature set of the web page content;
Operation 240: according to the feature set of the web page content, perform category detection on the web page content using at least two classifiers matched to the web page content;
Operation 250: determine the final category detection result for the web page content according to the category detection results obtained by the at least two classifiers.
In this embodiment, a resource request can be sent to the corresponding server based on a pre-stored URL. The HTML page that the server returns in response is received and parsed to extract the text content and picture content it contains, which together form the obtained web page content, that is, the content to be detected.
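The parsing step described above can be sketched with Python's standard `html.parser` module. This is only an illustration under stated assumptions: the class and function names are hypothetical, the network request itself is omitted (the HTML is passed in as a string), and "picture content" is represented simply by the `src` attribute of each `img` tag.

```python
from html.parser import HTMLParser


class PageContentParser(HTMLParser):
    """Collects visible text and image URLs from an HTML page."""

    SKIPPED = {"script", "style"}  # tags whose contents are not page text

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.image_urls = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a script or style element.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract_content(html: str):
    """Return (text content, list of image URLs) for one HTML page."""
    parser = PageContentParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.image_urls
```

The text would then go to the text feature extractor and each image to the picture feature extractor, as in operations 220 and 230.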
The text feature extraction algorithm can be chi-square, document frequency, information gain, mutual information, cross entropy, or the like. The target feature depends on the category that the classifiers are meant to detect. For example, when the classifiers are to detect whether web page content belongs to the pornographic content category, the target feature can be skin color.
If the web page content includes both text content and picture content, the feature extraction result of the text content and the feature vector of the picture content can be used together as the feature extraction result of the content to be detected for the subsequent category detection. Alternatively, to save detection cost and time, the main content of the content to be detected can first be determined, that is, whether it is mainly text or mainly pictures, and only the feature extraction result of that main content is then used for the subsequent category detection.
For the specific application scenario of detecting whether web page content belongs to the pornographic content category, in one implementation of this embodiment the text feature extraction algorithm is preferably the chi-square algorithm, and performing target feature recognition on the picture content and building its feature vector from the recognition result includes:
performing skin color detection on the picture content using a statistical histogram model;
building the feature vector of the picture content from the skin color detection result, where the feature vector consists of at least one of the following elements:
the number of connected skin color regions, the ratio of skin color area to the whole picture region, the ratio of skin color area to the skin color bounding rectangle, the ratio of the largest connected skin color region to the whole picture region, the ratio of the largest connected skin color region to the skin color bounding rectangle, and the skin color ratio of the picture's central region.
In this implementation, skin color detection on the picture content identifies the skin color regions contained in the picture, including their number, size, position, and shape, from which any element of the above vector can be determined. The skin color ratio of the picture's central region is the proportion of a set central region of the picture that is occupied by skin color.
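The six-element feature vector can be sketched from a binary skin mask, assuming skin color detection has already produced one (1 = skin pixel). Several choices below are assumptions of this sketch and are not stated in the source: 4-connectivity for regions, a single bounding rectangle around all skin pixels, and a central region covering the middle half of the picture in each dimension. The function name `skin_features` is hypothetical.

```python
from collections import deque


def skin_features(mask):
    """Build the six-element feature vector from a binary skin mask (1 = skin)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    region_sizes, rows, cols = [], [], []
    for y in range(h):
        for x in range(w):
            if mask[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill over 4-connected neighbours.
            seen[y][x] = True
            queue, size = deque([(y, x)]), 0
            while queue:
                cy, cx = queue.popleft()
                size += 1
                rows.append(cy)
                cols.append(cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            region_sizes.append(size)
    skin = sum(region_sizes)
    if skin == 0:
        return [0, 0.0, 0.0, 0.0, 0.0, 0.0]
    # Bounding rectangle of all skin pixels (assumed interpretation).
    rect = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
    largest = max(region_sizes)
    # Central region: middle half of the picture in each dimension (assumption).
    y0, y1, x0, x1 = h // 4, h - h // 4, w // 4, w - w // 4
    centre_area = (y1 - y0) * (x1 - x0)
    centre_skin = sum(mask[y][x] for y in range(y0, y1) for x in range(x0, x1))
    return [
        len(region_sizes),          # number of connected skin regions
        skin / (h * w),             # skin area / whole picture
        skin / rect,                # skin area / bounding rectangle
        largest / (h * w),          # largest region / whole picture
        largest / rect,             # largest region / bounding rectangle
        centre_skin / centre_area,  # skin ratio of the central region
    ]
```

A real pipeline would produce the mask from pixel colors first; the vector returned here is what would be appended to the page's feature set.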
In the technical solution of this embodiment, different classifiers examine the features of web page content, so the category of the web page content is identified automatically. In particular, content in the pornographic category can be detected automatically from a large volume of web page content. Compared with manual detection, this embodiment greatly reduces the manpower and material resources required, shortens detection time, and reduces detection cost.
Embodiment three:
Fig. 3 is a schematic flowchart of a content type detection method provided by Embodiment three of the present invention. On the basis of the above embodiments, this embodiment further refines the operation "determine the final category detection result for the web page content according to the category detection results obtained by the at least two classifiers", and adds operations for optimizing the classifiers and their voting weights. Referring to Fig. 3, the content type detection method provided by this embodiment includes the following operations:
Operation 310: perform feature extraction on the content to be detected;
Operation 320: according to the feature extraction result, perform category detection on the content to be detected using at least two classifiers matched to the content to be detected;
Operation 330: determine the final category detection result for the content to be detected from the result of the following formula:
$$r = \begin{cases} 1, & \sum_{i=1}^{n} w_i m_i \ge \sigma \\ 0, & \text{otherwise} \end{cases}$$
where $i$ is an integer; $n$ is the total number of the at least two classifiers; $m_i$ is the category detection result of the $i$-th classifier, with value 1 or 0, where 0 means the category of the content to be detected is not the target category and 1 means it is the target category; $w_i$ is the voting weight of the $i$-th classifier; $\sigma$ is a set threshold; $r = 1$ means the final category detection result of the content to be detected is the target category, and $r = 0$ means it is not.
In this embodiment, the initial voting weights of the classifiers can be set in advance, and the voting weights of all classifiers sum to 1. For example, the initial weights can all be equal, or they can be set according to the detection accuracy of each classifier, with a more accurate classifier given a larger voting weight.
In the technical solution provided by this embodiment, the category detection results of multiple classifiers are combined by weighted voting to determine the final category detection result. This can improve the accuracy of category detection, bringing the result closer to the actual category of the content to be detected.
Because the sample library of a pre-designed classifier holds only a finite number of samples, the category detection result it produces is not guaranteed to be correct. On the basis of the above technical solution, the classifiers and the way they vote can therefore be further optimized to improve the accuracy of the final category detection result.
Specifically, after the final category detection result for the content to be detected has been determined according to the category detection results obtained by the at least two classifiers, the method further includes:
comparing the final category detection result with the category detection result obtained by each of the at least two classifiers, to judge whether each classifier produced a correct category detection result, and storing the comparison results;
at every set first period, calculating the detection rate of each of the at least two classifiers from the stored comparison results, where the detection rate of the i-th classifier is the ratio of the number of correct category detection results it produced in the current first period to the total number of category detection results it produced in the current first period.
For example, suppose that in the last seven days a classifier performed 50 category detection operations, and the stored comparison results show that it produced a correct category detection result in 30 of them and a wrong one in the other 20. The detection rate of this classifier over the last seven days is then 30/50 = 0.6.
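The per-period rate in this example can be computed directly from the stored comparison results. In this sketch the comparisons are assumed to be stored as booleans (True when the classifier agreed with the final result), the function name is hypothetical, and returning 0.0 for a period with no detections is an assumption.

```python
def detection_rate(comparisons):
    """Ratio of correct results to all results within one period.

    comparisons: list of booleans stored during the current period,
    True when the classifier's result matched the final result.
    """
    if not comparisons:
        return 0.0  # assumption: no detections in the period counts as 0
    return sum(comparisons) / len(comparisons)
```

For the example above, 30 True values and 20 False values give 0.6.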
In the embodiments of the present invention, on the one hand, the voting weight of each classifier can be updated based on its detection rate. Specifically, after the detection rates of the at least two classifiers have been calculated, the method may further include updating the voting weight of each classifier according to the following formula:
$$w_i' = \frac{a_i}{\sum_{j=1}^{n} a_j}$$
where $a_i$ is the detection rate just calculated for the $i$-th classifier, and $w_i'$ is the voting weight of the $i$-th classifier after this update.
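A weight update driven by the detection rates can be sketched as follows. This sketch assumes the update normalizes each classifier's detection rate by the sum of all rates, which keeps the weights summing to 1; that rule, the fallback to equal weights when every rate is 0, and the function name are all assumptions.

```python
def update_weights(rates):
    """Derive new voting weights from per-classifier detection rates.

    Assumed rule: w_i' = a_i / sum(a_j), so the weights sum to 1.
    """
    total = sum(rates)
    if total == 0:
        # Assumption: fall back to equal weights if every rate is 0.
        return [1 / len(rates)] * len(rates)
    return [a / total for a in rates]
```

Under this rule a classifier that was right more often in the last period gets proportionally more say in the next one.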
On the other hand, classifiers among the pre-designed at least two classifiers can also be eliminated based on their detection rates. Specifically, the category detection method provided by the embodiments of the present invention may further include:
removing any classifier whose detection rate has been below an elimination threshold in each of N consecutive first periods, so as to redetermine the classifiers matched to the content to be detected, where N is an integer greater than 1.
Classifiers can of course be eliminated in other ways. For example, calculate the average detection rate of each classifier over N consecutive first periods; if the lowest average detection rate is below the elimination threshold, eliminate the corresponding classifier.
It should be noted that after a classifier has been eliminated, the voting weights of the remaining classifiers must be redetermined so that they again sum to 1. The new voting weights can be equal values generated automatically, or they can be recalculated from the detection rates of the remaining classifiers in the same way as the voting weight update described above, which is not repeated here.
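Elimination followed by re-weighting can be sketched as below. The data layout (a dictionary mapping a classifier name to its list of per-period detection rates, oldest first) and the function names are assumptions; the re-weighting shown is the equal-weight option mentioned above.

```python
def eliminate(history, threshold, n_periods):
    """Return the names of classifiers that survive elimination.

    history: dict mapping classifier name to a list of detection rates,
    one per first period, oldest first.
    A classifier is removed only if its rate was below the threshold
    in each of the last n_periods consecutive periods.
    """
    kept = []
    for name, rates in history.items():
        recent = rates[-n_periods:]
        failing = len(recent) == n_periods and all(r < threshold for r in recent)
        if not failing:
            kept.append(name)
    return kept


def equal_weights(names):
    """Give every remaining classifier the same voting weight (sum is 1)."""
    return {name: 1 / len(names) for name in names}
```

A classifier with one good period among the last N is kept, since the rule requires N consecutive periods below the threshold.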
On the basis of the above technical solution, each of the at least two classifiers includes a sample library that stores initial samples, and a classification model, obtained by training on the sample library, that performs category detection on the content to be detected.
After the final category detection result for the content to be detected has been compared with the category detection results obtained by the at least two classifiers, the method further includes: if a classifier produced a wrong category detection result, adding the content to be detected, as a feedback sample, to the sample library of that classifier;
at every set second period, retraining on the sample library of each classifier that produced a wrong category detection result within the current second period, and correcting that classifier's classification model according to the training result, so as to update the classifier.
Preferably, the second period can be seven days.
Through self-correction of the sample libraries and voting weights, the embodiments of the present invention can overcome the problem that a classifier detects the category of content inaccurately when it has few samples, thereby improving detection accuracy.
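The feedback loop on the sample library can be sketched as a small state holder per classifier. Everything named here is hypothetical: `ClassifierState`, the `train` callable (standing in for whatever training routine a classifier uses), and the choice to label each feedback sample with the final detection result, which the source implies but does not state.

```python
class ClassifierState:
    """Sample library plus classification model for one classifier."""

    def __init__(self, name, samples, train):
        self.name = name
        self.samples = list(samples)  # sample library: (content, label) pairs
        self.train = train            # hypothetical training callable
        self.model = train(self.samples)
        self.dirty = False            # True once a wrong result was recorded

    def record(self, content, own_result, final_result):
        """Add the content as a feedback sample if this classifier was wrong."""
        if own_result != final_result:
            # Assumption: the feedback sample is labeled with the final result.
            self.samples.append((content, final_result))
            self.dirty = True

    def retrain_if_needed(self):
        """Call once per second period; retrains only if errors were recorded."""
        if not self.dirty:
            return False
        self.model = self.train(self.samples)
        self.dirty = False
        return True
```

Only classifiers that actually disagreed with the final result during the period are retrained, which matches the per-period correction described above.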
Embodiment four:
Fig. 4 is a schematic structural diagram of a content type detection device provided by Embodiment four of the present invention. This embodiment is applicable to performing category detection on content to be detected. Referring to Fig. 4, the structure of the content type detection device is as follows:
a content feature extraction unit 410, configured to perform feature extraction on the content to be detected;
a content type detection unit 420, configured to perform category detection on the content to be detected, according to the feature extraction result, using at least two classifiers matched to the content to be detected;
a content detection result determining unit 430, configured to determine the final category detection result for the content to be detected according to the category detection results obtained by the at least two classifiers.
In a preferred implementation of this embodiment, the device further includes:
a content acquisition unit 400, configured to obtain web page content according to a URL, as the content to be detected, before the content feature extraction unit 410 performs feature extraction on the content to be detected;
the content feature extraction unit 410 includes:
a text feature extraction subunit 4101, configured to, if the web page content includes text content, perform feature extraction on the text content with a text feature extraction algorithm and add the feature extraction result to the feature set of the web page content;
a picture feature extraction subunit 4102, configured to, if the web page content includes picture content, perform target feature recognition on the picture content, build a feature vector for the picture content from the recognition result, and add it to the feature set of the web page content.
Further, the text feature extraction algorithm is the chi-square algorithm;
The picture feature extraction subunit 4102 is specifically configured to:
Perform skin color detection on the picture content using a statistical histogram model;
Establish the feature vector of the picture content according to the skin color detection result, wherein the feature vector is a vector formed by at least one of the following elements:
The number of skin color connected regions, the ratio of the skin color area to the whole picture area, the ratio of the skin color area to the skin color bounding rectangle, the ratio of the largest skin color connected region to the whole picture area, the ratio of the largest skin color connected region to the skin color bounding rectangle, and the skin color ratio of the central region of the picture.
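The six elements listed above can be computed from a binary skin mask, as the following sketch shows. It assumes that skin color detection has already produced a mask of 0/1 values, that regions are 4-connected, that the skin color bounding rectangle is the smallest rectangle enclosing all skin pixels, and that the central region is the middle half of the picture in each dimension. None of these choices, nor the name `skin_features`, is specified in the text.

```python
from collections import deque


def skin_features(mask):
    """Build the six-element feature vector from a 0/1 skin mask (list of rows)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    region_sizes, rows, cols = [], [], []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over one 4-connected skin region.
            seen[y][x] = True
            queue, size = deque([(y, x)]), 0
            while queue:
                cy, cx = queue.popleft()
                size += 1
                rows.append(cy)
                cols.append(cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            region_sizes.append(size)
    skin = sum(region_sizes)
    if skin == 0:
        return [0, 0.0, 0.0, 0.0, 0.0, 0.0]
    picture = h * w
    rect = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
    largest = max(region_sizes)
    # Central region: middle half of the picture in each dimension (assumption).
    y0, y1, x0, x1 = h // 4, h - h // 4, w // 4, w - w // 4
    center_skin = sum(mask[y][x] for y in range(y0, y1) for x in range(x0, x1))
    center_area = (y1 - y0) * (x1 - x0)
    return [
        len(region_sizes),          # number of skin color connected regions
        skin / picture,             # skin area / whole picture
        skin / rect,                # skin area / skin bounding rectangle
        largest / picture,          # largest region / whole picture
        largest / rect,             # largest region / skin bounding rectangle
        center_skin / center_area,  # skin ratio of the central region
    ]
```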
In the embodiments of the present invention, the at least two classifiers include at least two of the following classifiers:
A support vector machine classifier, a naive Bayes classifier, a K-nearest-neighbor classifier, a decision tree classifier and a logistic regression classifier.
The above product can perform the methods provided by Embodiment one and Embodiment two of the present invention, and has the functional modules and beneficial effects corresponding to the methods performed. For technical details not described in detail in this embodiment, refer to Embodiment one and Embodiment two.
Embodiment five:
Fig. 5 is a schematic structural diagram of a content category detection device provided by Embodiment five of the present invention. On the basis of Embodiment four above, this embodiment further optimizes the structure of the content detection result determining unit and correspondingly adds units for optimizing the classifiers and their voting weights. Referring to Fig. 5, the content category detection device provided by this embodiment has the following structure:
A content feature extraction unit 510, configured to perform feature extraction on the content to be detected;
A content category detection unit 520, configured to perform category detection on the content to be detected, according to the feature extraction result, using at least two classifiers adapted to the content to be detected;
A content detection result determining unit 530, configured to determine the final category detection result corresponding to the content to be detected according to the category detection results obtained by the at least two classifiers.
Further, the content detection result determining unit 530 is specifically configured to:
Determine the final category detection result corresponding to the content to be detected according to the calculation result of the following formula:
Wherein i is an integer; n is the total number of the at least two classifiers; m_i is the category detection result of the i-th classifier among the at least two classifiers and takes the value 1 or 0, where 0 indicates that the category of the content to be detected is not the target category and 1 indicates that it is the target category; w_i is the voting weight of the i-th classifier; σ is a set threshold; r = 1 indicates that the final category detection result of the content to be detected is the target category, and r = 0 indicates that it is not the target category.
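The weighted vote described by these definitions can be sketched as follows. The formula itself is not reproduced in this text, so the comparison used here (r = 1 when the weighted sum of m_i and w_i reaches σ) is an assumption that follows from m_i = 1 meaning the target category; note that the claims word the threshold comparison the other way around. The function name `weighted_vote` is hypothetical.

```python
def weighted_vote(votes, weights, threshold):
    """Combine per-classifier results m_i (0 or 1) into a final result r.

    votes: list of m_i values, 1 = target category, 0 = not target category
    weights: list of voting weights w_i, one per classifier
    threshold: the set threshold (sigma)
    Returns r = 1 if sum(w_i * m_i) >= threshold, else 0.
    The direction and the inclusive boundary are assumptions.
    """
    if len(votes) != len(weights) or len(votes) < 2:
        raise ValueError("need matching votes and weights for at least two classifiers")
    score = sum(w * m for w, m in zip(weights, votes))
    return 1 if score >= threshold else 0
```

For example, with weights 0.5, 0.3 and 0.2 and a threshold of 0.5, votes of 1, 0, 1 give a weighted sum of 0.7 and therefore r = 1.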
Further, on the basis of the above technical solution, the device further includes:
A classifier detection result judging unit 540, configured to, after the content detection result determining unit 530 determines the final category detection result corresponding to the content to be detected according to the category detection results obtained by the at least two classifiers, compare the obtained final category detection result with the category detection results obtained by the at least two classifiers, so as to judge whether each of the at least two classifiers produced a correct category detection result, and store the comparison results;
A classifier recall rate computing unit 550, configured to calculate, once every set first period, the recall rate of each of the at least two classifiers according to the comparison results stored by the classifier detection result judging unit 540, wherein the recall rate of the i-th classifier among the at least two classifiers is the ratio of the number of correct category detection results produced by the i-th classifier in the current first period to the number of all category detection results produced by the i-th classifier in the current first period.
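The comparison and the per-period recall rate can be sketched as below. A classifier's result is treated as correct when it equals the final category detection result, which is how the text defines correctness; the in-memory dictionary used as the comparison store and the names `record_comparisons` and `recall_rates` are assumptions for illustration.

```python
def record_comparisons(log, classifier_votes, final_result):
    """Store, per classifier, whether its result matched the final result.

    log: dict mapping classifier name -> list of booleans for the current period
    classifier_votes: dict mapping classifier name -> 0 or 1
    final_result: the final category detection result (0 or 1)
    """
    for name, vote in classifier_votes.items():
        log.setdefault(name, []).append(vote == final_result)


def recall_rates(log):
    """Recall rate per classifier: correct results / all results in the period."""
    return {name: sum(flags) / len(flags) for name, flags in log.items() if flags}
```

At the end of each first period the log would be read once by `recall_rates` and then cleared for the next period.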
Further, the device further includes:
A classifier voting weight updating unit 560, configured to, each time the classifier recall rate computing unit 550 has calculated the recall rates of the at least two classifiers, update the voting weights of the at least two classifiers according to the following formula:
$$ w_i' = \frac{a_i}{\sum_{i=1}^{n} a_i} $$
Wherein a_i is the recall rate of the i-th classifier obtained in this calculation, and w_i' is the voting weight of the i-th classifier after this update.
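The weight update normalizes each recall rate by the sum of all recall rates, so the updated weights add up to 1. A minimal sketch, in which the dictionary layout and the name `update_weights` are assumptions:

```python
def update_weights(rates):
    """Return new voting weights w_i' = a_i / sum of all a_i.

    rates: dict mapping classifier name -> recall rate a_i for this period
    """
    total = sum(rates.values())
    if total == 0:
        raise ValueError("all recall rates are zero; weights are undefined")
    return {name: a / total for name, a in rates.items()}
```

With recall rates of 0.5, 0.25 and 0.25, the updated weights are 0.5, 0.25 and 0.25; a classifier that was right more often in the last period carries more weight in the next vote.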
Further, the device further includes:
A classifier eliminating unit 570, configured to remove, from the at least two classifiers, any classifier whose recall rate has been less than an elimination threshold in each of N consecutive first periods, so as to redetermine the classifiers adapted to the content to be detected, wherein N is an integer greater than 1.
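The elimination rule can be sketched as a check over each classifier's recall-rate history. The history layout (one recall rate per first period, oldest first) and the name `classifiers_to_eliminate` are assumptions for the example.

```python
def classifiers_to_eliminate(history, threshold, n_periods):
    """Names of classifiers below the threshold in each of the last N periods.

    history: dict mapping classifier name -> list of recall rates, oldest first
    threshold: the elimination threshold
    n_periods: N, an integer greater than 1
    """
    if n_periods <= 1:
        raise ValueError("N must be an integer greater than 1")
    return [
        name
        for name, rates in history.items()
        if len(rates) >= n_periods
        and all(rate < threshold for rate in rates[-n_periods:])
    ]
```

A classifier with fewer than N recorded periods is never eliminated here, which is one possible reading of "N consecutive first periods".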
On the basis of the above technical solution, each of the at least two classifiers includes a sample library storing initial samples, and a classification model, obtained by training on the sample library, for performing category detection on the content to be detected;
The device further includes:
A feedback sample adding unit 580, configured to, after the classifier detection result judging unit 540 compares the obtained final category detection result corresponding to the content to be detected with the category detection results obtained by the at least two classifiers, if a classifier among the at least two classifiers produced a wrong category detection result, add the content to be detected, as a feedback sample, to the sample library of the classifier that produced the wrong category detection result;
A classifier correcting unit 590, configured to, once every set second period, train on the sample library of each classifier that produced a wrong category detection result within the current second period, and correct the classification model of that classifier according to the training result, so as to update the classifier that produced the wrong category detection result.
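The feedback and correction steps can be sketched with a small wrapper around a sample library and a training function. This is an illustrative sketch only: the class `FeedbackClassifier`, the function `apply_feedback`, the injected `train_fn`, and in particular the choice to label each feedback sample with the final category detection result are assumptions, since the text does not say how feedback samples are labelled.

```python
class FeedbackClassifier:
    """A classifier with its own sample library and a retrainable model."""

    def __init__(self, name, samples, train_fn):
        self.name = name
        self.samples = list(samples)   # sample library: (content, label) pairs
        self.train_fn = train_fn       # hypothetical: samples -> model
        self.model = train_fn(self.samples)
        self.has_feedback = False      # wrong result seen in the current period?

    def add_feedback(self, content, label):
        self.samples.append((content, label))
        self.has_feedback = True

    def retrain_if_needed(self):
        """Called once per second period; retrains only if feedback arrived."""
        if not self.has_feedback:
            return False
        self.model = self.train_fn(self.samples)
        self.has_feedback = False
        return True


def apply_feedback(classifiers, votes, content, final_result):
    """Add the content to the library of every classifier that voted wrongly."""
    for clf in classifiers:
        if votes[clf.name] != final_result:
            # Assumption: the final result is used as the sample's label.
            clf.add_feedback(content, final_result)
```

Only classifiers that disagreed with the final result pay the cost of retraining, so a classifier that has been consistently right keeps its current model.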
The above product can perform the methods provided by Embodiment one, Embodiment two and Embodiment three of the present invention, and has the functional modules and beneficial effects corresponding to the methods performed. For technical details not described in detail in this embodiment, refer to Embodiment one, Embodiment two and Embodiment three.
Embodiment six:
Fig. 6 is a schematic flowchart of a preferred content category detection method provided by Embodiment six of the present invention. This embodiment provides a preferred example based on all of the above embodiments. Referring to Fig. 6, the content category detection method provided by this embodiment specifically includes the following operations:
Operation 610: detect whether the main content included in the web page content is text content or picture content.
Operation 620: perform feature extraction on the main content included in the web page content.
Operation 630: according to the feature extraction result, perform category detection on the web page content using the SVM, Bayes, KNN, ID3 and Logistic classifiers adapted to the web page content, respectively.
Each of these classifiers includes a sample library storing initial samples, and a classification model, obtained by training on the sample library, for performing category detection on the web page content.
The advantages of the SVM classifier are: by converting categorical inputs into numerical inputs, a support vector machine can support both categorical data and numerical data; it is suitable for large-scale data.
The advantages of the Bayes classifier are: it is fast both when trained on and when queried against large amounts of sample data; it supports incremental training; and what the classifier has actually learned is relatively easy to interpret.
The advantages of the KNN classifier are: it can use complex functions for numerical prediction while remaining easy to understand; it allows reasonable scaling of the data; and new sample data can be added at any time without retraining.
The advantages of the ID3 classifier are: a trained model is easy to interpret, and the algorithm places the most important decision factors close to the root of the tree; it can handle categorical data and numerical data at the same time; it handles interactions between variables easily; it is suitable for small-scale data.
The advantages of the Logistic classifier are: it is good at analyzing linear relationships, and it is better than a decision tree at analyzing the overall structure of the data.
Operation 640: perform weighted voting on the category detection results obtained by the classifiers.
Operation 650: obtain the final category detection result corresponding to the web page content according to the weighted voting result.
Operation 660: monitor the recall rate of each classifier.
Specifically, the obtained final category detection result is compared with the category detection result obtained by each classifier, to judge whether each classifier produced a correct category detection result, and the comparison results are stored;
Every seven days, the recall rate of each classifier is calculated once according to the stored comparison results.
Operation 670: update the voting weight of each classifier according to the monitored recall rates.
Operation 680: update the sample libraries of the classifiers.
Specifically, if any of the above five classifiers produces a wrong category detection result, the corresponding web page content is added, as a feedback sample, to the sample library of the classifier that produced the wrong category detection result. Alternatively, only the URL of the web page content may be added to that sample library. In that case, before the sample library is next trained on, the HTML page is obtained from the server in real time based on the added URL and parsed into web page content, which then serves as the feedback sample in the sample library.
Operation 690: perform elimination and training management on the classifiers.
Specifically, eliminating a classifier includes:
Removing any of the above five classifiers whose recall rate has remained below the elimination threshold for one continuous month, so as to redetermine the classifiers adapted to the web page content.
Training management of the classifiers includes:
Every seven days, training once on the sample library of each classifier that produced a wrong category detection result in the preceding seven days, and correcting the classification model of that classifier according to the training result, so as to update the classifier that produced the wrong category detection result. In this way the classification deviation of a classifier can be corrected regularly, which in turn improves the accuracy of its category detection.
The technical solution provided by this embodiment has the following beneficial effects:
First, the category of website content can be detected automatically, which saves manpower and material resources and gives high detection efficiency;
Second, multiple machine learning algorithms are trained into multiple classifiers, each classifier produces a predicted category detection result, and the multiple predictions are combined by weighted voting into one final result, so that category detection of website content is more reliable and well-founded;
Third, through self-correction of the samples and the voting weight of each classifier, the problem that a general machine learning model detects categories inaccurately when the sample size is small can be overcome;
Fourth, the correctness of each category detection result predicted by a classifier is counted, the accuracy of the classifier is assessed periodically on that basis, and classifiers whose accuracy remains low are eliminated, which makes category detection of website content more accurate.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the scope of the appended claims.
Claims (14)
- 1. A content category detection method, characterized by comprising: performing feature extraction on content to be detected; according to the feature extraction result, performing category detection on the content to be detected using at least two classifiers adapted to the content to be detected; quantizing the category detection result obtained by each classifier, and determining the voting weight of each classifier, wherein a classifier with higher detection accuracy has a larger corresponding voting weight; calculating the weighted sum of the quantized values of all classifiers and the voting weights; if the weighted sum is less than or equal to a set threshold, the final category detection result of the content to be detected is the target category, and if the weighted sum is greater than the set threshold, the final category detection result of the content to be detected is not the target category; comparing the obtained final category detection result corresponding to the content to be detected with the category detection results obtained by the at least two classifiers, so as to judge whether each of the at least two classifiers produced a correct category detection result, and storing the comparison results; once every set first period, calculating the recall rate of each of the at least two classifiers according to the stored comparison results, wherein the recall rate of the i-th classifier among the at least two classifiers is the ratio of the number of correct category detection results produced by the i-th classifier in the current first period to the number of all category detection results produced by the i-th classifier in the current first period; and, according to the recall rate of each classifier, updating the voting weight of each classifier or eliminating the corresponding classifier from the at least two classifiers.
- 2. The category detection method according to claim 1, characterized in that, before performing feature extraction on the content to be detected, the method further comprises: obtaining web page content according to a URL, as the content to be detected; and performing feature extraction on the content to be detected comprises: if the web page content includes text content, performing feature extraction on the text content based on a text feature extraction algorithm, and adding the feature extraction result to the feature set of the web page content; if the web page content includes picture content, performing target feature recognition on the picture content, establishing a feature vector of the picture content according to the target feature recognition result, and adding it to the feature set of the web page content.
- 3. The category detection method according to claim 2, characterized in that the text feature extraction algorithm is the chi-square algorithm; and performing target feature recognition on the picture content and establishing the feature vector of the picture content according to the target feature recognition result comprises: performing skin color detection on the picture content using a statistical histogram model; establishing the feature vector of the picture content according to the skin color detection result, wherein the feature vector is a vector formed by at least one of the following elements: the number of skin color connected regions, the ratio of the skin color area to the whole picture area, the ratio of the skin color area to the skin color bounding rectangle, the ratio of the largest skin color connected region to the whole picture area, the ratio of the largest skin color connected region to the skin color bounding rectangle, and the skin color ratio of the central region of the picture.
- 4. The category detection method according to any one of claims 1-3, characterized in that the at least two classifiers include at least two of the following classifiers: a support vector machine classifier, a naive Bayes classifier, a K-nearest-neighbor classifier, a decision tree classifier and a logistic regression classifier.
- 5. The category detection method according to claim 1, characterized in that, each time the recall rates of the at least two classifiers have been calculated, the method further comprises: updating the voting weights of the at least two classifiers according to the following formula: $w_i' = \frac{a_i}{\sum_{i=1}^{n} a_i}$, wherein a_i is the recall rate of the i-th classifier obtained in this calculation, and w_i' is the voting weight of the i-th classifier after this update.
- 6. The category detection method according to claim 1, characterized by further comprising: removing, from the at least two classifiers, any classifier whose recall rate has been less than an elimination threshold in each of N consecutive first periods, so as to redetermine the classifiers adapted to the content to be detected, wherein N is an integer greater than 1.
- 7. The category detection method according to claim 1, characterized in that each of the at least two classifiers includes a sample library storing initial samples, and a classification model, obtained by training on the sample library, for performing category detection on the content to be detected; after comparing the obtained final category detection result corresponding to the content to be detected with the category detection results obtained by the at least two classifiers, the method further comprises: if a classifier among the at least two classifiers produced a wrong category detection result, adding the content to be detected, as a feedback sample, to the sample library of the classifier that produced the wrong category detection result; and, once every set second period, training on the sample library of each classifier that produced a wrong category detection result within the current second period, and correcting the classification model of that classifier according to the training result, so as to update the classifier that produced the wrong category detection result.
- 8. A content category detection device, characterized by comprising: a content feature extraction unit, configured to perform feature extraction on content to be detected; a content category detection unit, configured to perform category detection on the content to be detected, according to the feature extraction result, using at least two classifiers adapted to the content to be detected; a content detection result determining unit, configured to: quantize the category detection result obtained by each classifier, and determine the voting weight of each classifier, wherein a classifier with higher detection accuracy has a larger corresponding voting weight; calculate the weighted sum of the quantized values of all classifiers and the voting weights; if the weighted sum is less than or equal to a set threshold, the final category detection result of the content to be detected is the target category, and if the weighted sum is greater than the set threshold, the final category detection result of the content to be detected is not the target category; a classifier detection result judging unit, configured to, after the content detection result determining unit determines the final category detection result corresponding to the content to be detected according to the category detection results obtained by the at least two classifiers, compare the obtained final category detection result with the category detection results obtained by the at least two classifiers, so as to judge whether each of the at least two classifiers produced a correct category detection result, and store the comparison results; a classifier recall rate computing unit, configured to calculate, once every set first period, the recall rate of each of the at least two classifiers according to the comparison results stored by the classifier detection result judging unit, wherein the recall rate of the i-th classifier among the at least two classifiers is the ratio of the number of correct category detection results produced by the i-th classifier in the current first period to the number of all category detection results produced by the i-th classifier in the current first period; and, according to the recall rate of each classifier, the voting weight of each classifier is updated or the corresponding classifier among the at least two classifiers is eliminated.
- 9. The category detection device according to claim 8, characterized by further comprising: a content acquiring unit, configured to obtain web page content according to a URL, as the content to be detected, before the content feature extraction unit performs feature extraction on the content to be detected; wherein the content feature extraction unit includes: a text feature extraction subunit, configured to, if the web page content includes text content, perform feature extraction on the text content based on a text feature extraction algorithm and add the feature extraction result to the feature set of the web page content; and a picture feature extraction subunit, configured to, if the web page content includes picture content, perform target feature recognition on the picture content, establish a feature vector of the picture content according to the target feature recognition result, and add it to the feature set of the web page content.
- 10. The category detection device according to claim 9, characterized in that the text feature extraction algorithm is the chi-square algorithm; and the picture feature extraction subunit is specifically configured to: perform skin color detection on the picture content using a statistical histogram model; establish the feature vector of the picture content according to the skin color detection result, wherein the feature vector is a vector formed by at least one of the following elements: the number of skin color connected regions, the ratio of the skin color area to the whole picture area, the ratio of the skin color area to the skin color bounding rectangle, the ratio of the largest skin color connected region to the whole picture area, the ratio of the largest skin color connected region to the skin color bounding rectangle, and the skin color ratio of the central region of the picture.
- 11. The category detection device according to any one of claims 8-10, characterized in that the at least two classifiers include at least two of the following classifiers: a support vector machine classifier, a naive Bayes classifier, a K-nearest-neighbor classifier, a decision tree classifier and a logistic regression classifier.
- 12. The category detection device according to claim 8, characterized by further comprising: a classifier voting weight updating unit, configured to, each time the classifier recall rate computing unit has calculated the recall rates of the at least two classifiers, update the voting weights of the at least two classifiers according to the following formula: $w_i' = \frac{a_i}{\sum_{i=1}^{n} a_i}$, wherein a_i is the recall rate of the i-th classifier obtained in this calculation, and w_i' is the voting weight of the i-th classifier after this update.
- 13. The category detection device according to claim 8, characterized by further comprising: a classifier eliminating unit, configured to remove, from the at least two classifiers, any classifier whose recall rate has been less than an elimination threshold in each of N consecutive first periods, so as to redetermine the classifiers adapted to the content to be detected, wherein N is an integer greater than 1.
- 14. The category detection device according to claim 8, characterized in that each of the at least two classifiers includes a sample library storing initial samples, and a classification model, obtained by training on the sample library, for performing category detection on the content to be detected; the device further includes: a feedback sample adding unit, configured to, after the classifier detection result judging unit compares the obtained final category detection result corresponding to the content to be detected with the category detection results obtained by the at least two classifiers, if a classifier among the at least two classifiers produced a wrong category detection result, add the content to be detected, as a feedback sample, to the sample library of the classifier that produced the wrong category detection result; and a classifier correcting unit, configured to, once every set second period, train on the sample library of each classifier that produced a wrong category detection result within the current second period, and correct the classification model of that classifier according to the training result, so as to update the classifier that produced the wrong category detection result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410569492.6A CN104391860B (en) | 2014-10-22 | 2014-10-22 | content type detection method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410569492.6A CN104391860B (en) | 2014-10-22 | 2014-10-22 | content type detection method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104391860A (en) | 2015-03-04 |
CN104391860B (en) | 2018-03-02 |
Family
ID=52609764
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410569492.6A Active CN104391860B (en) | 2014-10-22 | 2014-10-22 | content type detection method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104391860B (en) |
Families Citing this family (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104951802A (en) * | 2015-06-17 | 2015-09-30 | 中国科学院自动化研究所 | Classifier updating method |
CN104965905B (en) | 2015-06-30 | 2018-05-04 | 北京奇虎科技有限公司 | A kind of method and apparatus of Web page classifying |
JP5901828B1 (en) * | 2015-08-20 | 2016-04-13 | 株式会社Cygames | Information processing system, program, and server |
CN105426354B (en) * | 2015-10-29 | 2019-03-22 | 杭州九言科技股份有限公司 | The fusion method and device of a kind of vector |
CN105426356B (en) * | 2015-10-29 | 2019-05-21 | 杭州九言科技股份有限公司 | A kind of target information recognition methods and device |
CN106649384B (en) * | 2015-11-03 | 2019-07-09 | 中国电信股份有限公司 | The method and apparatus classified to URL |
CN106980623B (en) * | 2016-01-18 | 2020-02-21 | 华为技术有限公司 | Data model determination method and device |
CN107193836B (en) * | 2016-03-15 | 2021-08-10 | 腾讯科技(深圳)有限公司 | Identification method and device |
CN107730286A (en) * | 2016-08-10 | 2018-02-23 | 中国移动通信集团黑龙江有限公司 | A kind of target customer's screening technique and device |
CN106383766B (en) | 2016-09-09 | 2018-09-11 | 北京百度网讯科技有限公司 | System monitoring method and apparatus |
CN107995152B (en) * | 2016-10-27 | 2020-07-03 | 腾讯科技(深圳)有限公司 | Malicious access detection method and device and detection server |
CN108804472A (en) * | 2017-05-04 | 2018-11-13 | 腾讯科技(深圳)有限公司 | A kind of webpage content extraction method, device and server |
CN107766234A (en) * | 2017-08-31 | 2018-03-06 | 广州数沃信息科技有限公司 | A kind of assessment method, the apparatus and system of the webpage health degree based on mobile device |
CN107801090A (en) * | 2017-11-03 | 2018-03-13 | 北京奇虎科技有限公司 | Utilize the method, apparatus and computing device of audio-frequency information detection anomalous video file |
CN107895119A (en) * | 2017-12-28 | 2018-04-10 | 北京奇虎科技有限公司 | Program installation packet inspection method, device and electronic equipment |
CN108304483B (en) * | 2017-12-29 | 2021-01-19 | 东软集团股份有限公司 | Webpage classification method, device and equipment |
CN108509794A (en) * | 2018-03-09 | 2018-09-07 | 中山大学 | A kind of malicious web pages defence detection method based on classification learning algorithm |
CN108932502A (en) * | 2018-07-13 | 2018-12-04 | 希蓝科技(北京)有限公司 | A kind of electrocardiogram template classification model modification system and method for self study |
CN110875874B (en) * | 2018-09-03 | 2022-06-07 | Oppo广东移动通信有限公司 | Electronic red packet detection method and device and mobile terminal |
CN109344884B (en) * | 2018-09-14 | 2023-09-12 | 深圳市雅阅科技有限公司 | Media information classification method, method and device for training picture classification model |
CN111612492A (en) * | 2019-02-26 | 2020-09-01 | 北京奇虎科技有限公司 | User online accurate marketing method and device based on multi-feature fusion |
CN111695083B (en) * | 2019-03-15 | 2024-09-20 | 北京京东尚科信息技术有限公司 | Detection method and detection equipment |
CN110363223A (en) * | 2019-06-20 | 2019-10-22 | 华南理工大学 | Industrial flow data processing method, detection method, system, device and medium |
CN112347244B (en) * | 2019-08-08 | 2023-07-25 | 四川大学 | Yellow-based and gambling-based website detection method based on mixed feature analysis |
CN110502552B (en) * | 2019-08-20 | 2022-10-28 | 重庆大学 | Classification data conversion method based on fine-tuning conditional probability |
CN110852285B (en) * | 2019-11-14 | 2023-04-18 | 腾讯科技(深圳)有限公司 | Object detection method and device, computer equipment and storage medium |
CN111310096A (en) * | 2020-02-25 | 2020-06-19 | 维沃移动通信有限公司 | Content saving method, electronic device, and computer-readable storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005279217A (en) * | 2004-03-29 | 2005-10-13 | Kohei Kadowaki | Picture expressing device with body sensitive function |
CN101055621A (en) * | 2006-04-10 | 2007-10-17 | 中国科学院自动化研究所 | Content based sensitive web page identification method |
CN101145171A (en) * | 2007-09-15 | 2008-03-19 | 中国科学院合肥物质科学研究院 | Gene microarray data prediction method based on independent component ensemble learning |
CN101251851A (en) * | 2008-02-29 | 2008-08-27 | 吉林大学 | Multi-classifier integration method based on incremental naive Bayes network |
CN101281521A (en) * | 2007-04-05 | 2008-10-08 | 中国科学院自动化研究所 | Method and system for filtering sensitive web pages based on multi-classifier fusion |
- 2014-10-22: CN application CN201410569492.6A, patent CN104391860B, status: Active
Non-Patent Citations (1)
Title |
---|
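A multi-feature fusion classification method with adaptive weights; Zhang Wenbo et al.; Systems Engineering and Electronics (《系统工程与电子技术》); 2013-06-30; main text pp. 1135 and 1137 * |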
Also Published As
Publication number | Publication date |
---|---|
CN104391860A (en) | 2015-03-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104391860B (en) | content type detection method and device | |
CN106599155B (en) | Webpage classification method and system | |
CN110245229A (en) | Deep learning topic sentiment classification method based on data augmentation | |
CN104239485A (en) | Statistical machine learning-based internet hidden link detection method | |
CN105069141A (en) | Construction method and construction system for stock standard news library | |
CN103577755A (en) | Malicious script static detection method based on SVM (support vector machine) | |
CN110287316A (en) | Alarm classification method, apparatus, electronic equipment and storage medium | |
CN106446931A (en) | Feature extraction and classification method and system based on support vector data description | |
CN110310012B (en) | Data analysis method, device, equipment and computer readable storage medium | |
CN107545038A (en) | Text classification method and equipment | |
CN113590764A (en) | Training sample construction method and device, electronic equipment and storage medium | |
CN111048214A (en) | Early warning method and device for spreading situation of foreign livestock and poultry epidemic diseases | |
CN110647995A (en) | Rule training method, device, equipment and storage medium | |
CN104850617A (en) | Short text processing method and apparatus | |
CN107368526A (en) | Data processing method and device | |
CN108959293A (en) | Text data classification method and server | |
CN104915680B (en) | Multi-label transformation relationship prediction method based on improved RBF neural network | |
CN113723747A (en) | Analysis report generation method, electronic device and readable storage medium | |
CN112888008B (en) | Base station abnormality detection method, device, equipment and storage medium | |
CN110808947B (en) | Automatic vulnerability quantitative evaluation method and system | |
CN112307170A (en) | Relation extraction model training method, relation extraction method, device and medium | |
CN103294828A (en) | Verification method and verification device of data mining model dimension | |
CN116823164A (en) | Business approval method, device, equipment and storage medium | |
CN104462215B (en) | Time-series-based method for predicting the citation count of scientific and technical literature | |
Soni et al. | A novel optimized classifier for the loan repayment capability prediction system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20190809. Address after: 100085 Beijing, Haidian District, No. 10 Shangdi 10th Street, Baidu Building, 2nd floor. Patentee after: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd. Address before: 100091 Beijing, Haidian District, Dongbeiwang West Road, No. 4, Zhongguancun Software Park, Building C, 1-03. Patentee before: Anyi Hengtong (Beijing) Technology Co., Ltd. |