CN104391860A - Content type detection method and device - Google Patents

Info

Publication number
CN104391860A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410569492.6A
Other languages
Chinese (zh)
Other versions
CN104391860B (en)
Inventor
唐呈光
张兵
杨念
耿志峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Anyi Hengtong Beijing Technology Co Ltd
Application filed by Anyi Hengtong Beijing Technology Co Ltd
Priority to CN201410569492.6A
Publication of CN104391860A
Application granted
Publication of CN104391860B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques

Abstract

The embodiment of the invention discloses a content type detection method and device. The method comprises the following steps: extracting the features of content to be detected; adopting at least two kinds of classifiers matched with the content to be detected according to a feature extraction result, and performing type detection on the content to be detected; determining a final type detection result corresponding to the content to be detected according to type detection results obtained by the at least two kinds of classifiers. According to the technical scheme provided by the embodiment of the invention, the type of acquired content can be detected automatically, the detection time is shortened, and the detection cost is reduced.

Description

Content type detection method and device
Technical field
Embodiments of the present invention relate to the technical field of classification and recognition, and in particular to a content type detection method and device.
Background
With the development of Internet technology, the amount of information on the Internet is growing at an exponential rate, and the ways in which people obtain and use information are becoming ever more varied and convenient. However, while the Internet makes people's lives more convenient, it also brings many negative effects. For example, some websites, seeking profit and higher click-through rates, display unhealthy content to users. This seriously degrades the user's browsing experience and, for teenagers in particular, can significantly affect their physical and mental development.
At present, website content (for example, pornographic content) is mostly identified by manual judgment. Although this approach is accurate, it is inefficient and consumes a large amount of manpower and material resources, so it cannot cope with the harmful content that is increasingly rampant on websites.
Summary of the invention
Embodiments of the present invention provide a content type detection method and device that automatically detect the category of acquired content, shorten detection time, and reduce detection cost.
In a first aspect, an embodiment of the present invention provides a content type detection method, the method comprising:
Performing feature extraction on content to be detected;
According to the feature extraction result, performing category detection on the content to be detected using at least two kinds of classifiers suited to the content to be detected;
According to the category detection results obtained by the at least two kinds of classifiers, determining the final category detection result corresponding to the content to be detected.
In a second aspect, an embodiment of the present invention further provides a content type detection device, the device comprising:
A content feature extraction unit, configured to perform feature extraction on content to be detected;
A content type detection unit, configured to perform, according to the feature extraction result, category detection on the content to be detected using at least two kinds of classifiers suited to the content to be detected;
A content detection result determining unit, configured to determine, according to the category detection results obtained by the at least two kinds of classifiers, the final category detection result corresponding to the content to be detected.
The technical solution provided by the embodiments of the present invention uses classifiers to examine the features of the content to be detected and so automatically identifies its category. Compared with manual detection, it greatly reduces the manpower and material resources required, shortens detection time, and reduces detection cost. Furthermore, determining the final category detection result from the category detection results of multiple classifiers effectively safeguards the correctness of the result and improves detection accuracy.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a content type detection method provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic flowchart of a content type detection method provided by Embodiment 2 of the present invention;
Fig. 3 is a schematic flowchart of a content type detection method provided by Embodiment 3 of the present invention;
Fig. 4 is a schematic structural diagram of a content type detection device provided by Embodiment 4 of the present invention;
Fig. 5 is a schematic structural diagram of a content type detection device provided by Embodiment 5 of the present invention;
Fig. 6 is a schematic flowchart of a preferred content type detection method provided by Embodiment 6 of the present invention.
Detailed description
The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment one:
Fig. 1 is a schematic flowchart of a content type detection method provided by Embodiment 1 of the present invention. This embodiment is applicable to performing category detection on content to be detected. The method can be performed by a category detection device, which is implemented in software and/or hardware. Referring to Fig. 1, the content type detection method provided by this embodiment comprises the following operations:
Operation 110: perform feature extraction on the content to be detected.
In this embodiment, the content to be detected may be content in text and/or picture format that is stored locally in advance or acquired from another device in real time. For example, the content to be detected may be web page content containing text and/or pictures, obtained by parsing an HTML (HyperText Markup Language) page retrieved from a server on the Internet.
For content in text format, feature extraction may be based on a text feature extraction algorithm such as chi-square, document frequency, information gain, mutual information, or cross entropy. For content in picture format, objects in the picture may first be recognized, and a feature vector of the picture content then built from the recognition result. The feature vector may include elements such as the area, number, and position of the objects and the proportion of the whole picture region they occupy.
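As an illustration of chi-square text feature selection, the following is a minimal sketch rather than the patent's implementation. It assumes documents are already tokenized and labeled 1 (target category) or 0; the function names `chi_square` and `top_terms` and the 2x2 contingency-table formulation are assumptions introduced here.

```python
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/category contingency table.

    n11: target-category documents containing the term
    n10: other documents containing the term
    n01: target-category documents without the term
    n00: other documents without the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def top_terms(docs, labels, k):
    """Return the k terms most strongly associated with the labels.

    docs is a list of token lists; labels is a parallel list of 0/1 values.
    Ties are broken alphabetically so the result is deterministic.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    pos_df, neg_df = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        for term in set(tokens):  # document frequency: count once per document
            (pos_df if y == 1 else neg_df)[term] += 1
    scores = {}
    for term in set(pos_df) | set(neg_df):
        n11, n10 = pos_df[term], neg_df[term]
        scores[term] = chi_square(n11, n10, n_pos - n11, n_neg - n10)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

The selected terms could then define the dimensions of the text feature vector passed to the classifiers; how the features are encoded is not specified in the source.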
Operation 120: according to the feature extraction result, perform category detection on the content to be detected using at least two kinds of classifiers suited to the content to be detected.
In this embodiment, at least two kinds of classifiers suited to the content to be detected are created in advance, and each kind of classifier can independently detect the category of the content to be detected. Specifically, each classifier can detect at least one category: for example, whether the content to be detected belongs to a target category or not, or which of several target categories it belongs to.
Each classifier may be created as follows: train on the large number of samples stored in a sample library, and obtain the classification model of the classifier from the training result. The classification model is part of the classifier, and its input and output are the input and output of the corresponding classifier. The samples stored in the sample library must include one group of samples whose category is the target category and another group whose category is not the target category. Training on the samples includes performing feature extraction on them, and the feature extraction algorithm used should be the same as the one applied to the content to be detected.
After feature extraction is performed on the content to be detected, the feature extraction result can be used as the input to the classification models of the at least two kinds of classifiers. Each classification model processes the feature extraction result, generates a category detection result for the content to be detected, and outputs it.
In embodiments of the present invention, the at least two kinds of classifiers suited to the content to be detected may include at least two of the following: a support vector machine (SVM) classifier, a naive Bayes classifier, a k-nearest neighbor (KNN) classifier, a decision tree (ID3, Iterative Dichotomiser 3) classifier, and a logistic regression classifier.
Operation 130: according to the category detection results obtained by the at least two kinds of classifiers, determine the final category detection result corresponding to the content to be detected.
After the different kinds of classifiers have each performed category detection on the content to be detected, the category detection results obtained can be processed according to a set rule to determine the final category detection result. One possible procedure is: count, among all the category detection results obtained, the number of occurrences of each distinct result, and take the result with the largest count as the final category detection result. For example, suppose five kinds of classifiers each perform category detection on the content to be detected, and their results are, in order: belongs to the target category, does not belong, belongs, does not belong, belongs. The counts are then 3 results of "belongs to the target category" and 2 results of "does not belong to the target category", so the final category detection result is that the content to be detected belongs to the target category.
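The counting rule just described can be sketched in a few lines. This is an illustrative sketch only: the function name `majority_vote` is an assumption, and the source does not say how ties are resolved (here the result seen first wins).

```python
from collections import Counter


def majority_vote(results):
    """Return the category detection result that occurs most often.

    results holds one result per classifier, e.g. 1 for "belongs to the
    target category" and 0 for "does not belong". Tie handling is an
    assumption: Counter.most_common keeps first-seen order among equal counts.
    """
    if not results:
        raise ValueError("at least one classifier result is required")
    label, _count = Counter(results).most_common(1)[0]
    return label
```

With the five results from the example, `majority_vote([1, 0, 1, 0, 1])` yields the target category.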
Of course, other processing procedures are possible, and this embodiment does not limit them. For example, different values can be assigned in advance to different category detection results: 1 if the content to be detected belongs to a first target category, 2 if it belongs to a second target category, and 0 if it belongs to neither. The values corresponding to all category detection results are then weighted to obtain a new value, and the final category detection result is determined from this new value. The weight applied to the value of any category detection result may be a weight assigned in advance to the classifier that produced that result.
The technical solution provided by this embodiment uses classifiers to examine the features of the content to be detected and so automatically identifies its category. Compared with manual detection, it greatly reduces manpower and material resources, shortens detection time, and reduces detection cost. Furthermore, determining the final category detection result from the category detection results of multiple classifiers effectively safeguards the correctness of the result and improves detection accuracy.
Embodiment two:
Fig. 2 is a schematic flowchart of a content type detection method provided by Embodiment 2 of the present invention. Building on Embodiment 1, this embodiment adds an operation for acquiring the content to be detected and, based on it, further refines operation 110. Referring to Fig. 2, the content type detection method provided by this embodiment comprises the following operations:
Operation 210: acquire web page content according to a uniform resource locator (URL), as the content to be detected;
Operation 220: if the web page content includes text content, perform feature extraction on the text content based on a text feature extraction algorithm, and add the feature extraction result to the feature set of the web page content;
Operation 230: if the web page content includes picture content, perform target feature recognition on the picture content, build a feature vector of the picture content from the target feature recognition result, and add it to the feature set of the web page content;
Operation 240: according to the feature set of the web page content, perform category detection on the web page content using at least two kinds of classifiers suited to the web page content;
Operation 250: according to the category detection results obtained by the at least two kinds of classifiers, determine the final category detection result corresponding to the web page content.
In this embodiment, an acquisition request can be sent to the corresponding server based on a pre-stored URL. The HTML page returned by the server in response is received and parsed to extract the text content and picture content it contains. The extracted content is the acquired web page content, that is, the content to be detected.
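The acquisition step can be sketched with the Python standard library. This is a minimal sketch under assumptions: the names `PageContentParser`, `extract_content`, and `fetch_page` are hypothetical, picture content is represented only by the `src` attribute of `img` tags, and text inside `script` and `style` elements is skipped.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class PageContentParser(HTMLParser):
    """Collects visible text and image URLs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.image_urls = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract_content(html):
    """Return (text content, list of image URLs) for an HTML string."""
    parser = PageContentParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.image_urls


def fetch_page(url, timeout=10):
    """Request the page at a stored URL and return its HTML as a string."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

A caller would pass the output of `fetch_page` to `extract_content` and then download the listed images before picture feature extraction; that download step is omitted here.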
The text feature extraction algorithm may be chi-square, document frequency, information gain, mutual information, cross entropy, or the like. The target feature is associated with the category that the classifiers suited to the content to be detected are to detect. For example, when the classifiers are to detect whether web page content belongs to the pornographic harmful content category, the target feature may be a skin-color feature.
If the web page content includes both text content and picture content, the feature extraction result of the text content and the feature vector of the picture content can together serve as the feature extraction result of the content to be detected for the subsequent category detection. Alternatively, to save the cost and time of category detection, the main content of the content to be detected can first be determined, that is, whether it is mainly text or mainly pictures, and only the feature extraction result of that main content is then used as the feature extraction result of the content to be detected for the subsequent category detection.
For the specific application scenario of detecting whether web page content belongs to the pornographic harmful content category, in one implementation of this embodiment the text feature extraction algorithm is preferably the chi-square algorithm, and performing target feature recognition on the picture content and building the feature vector of the picture content from the target feature recognition result comprises:
Performing skin-color detection on the picture content using a statistical histogram model;
Building the feature vector of the picture content from the skin-color detection result, where the feature vector is formed from at least one of the following elements:
The number of connected skin-color regions, the proportion of the whole picture region occupied by skin-color area, the proportion of the skin-color bounding rectangle occupied by skin-color area, the proportion of the whole picture region occupied by the largest connected skin-color region, the proportion of the skin-color bounding rectangle occupied by the largest connected skin-color region, and the skin-color proportion of the central region of the picture.
In this implementation, skin-color detection on the picture content may identify the skin-color region information contained in the picture. This information may include the number, size, position, and shape of the skin-color regions, from which any element of the above vector can be determined. The skin-color proportion of the central region of the picture is the proportion of a set central region of the picture that is occupied by skin-color area.
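To make the six feature elements concrete, the following sketch computes them from a binary skin mask. It does not implement the statistical histogram model itself and assumes the mask (rows of 0/1 values) has already been produced by skin-color detection. Several choices here are assumptions not stated in the source: 4-connectivity for connected regions, a single bounding rectangle around all skin pixels, and a central window covering half of each dimension.

```python
from collections import deque


def skin_features(mask, center_frac=0.5):
    """Build the six-element skin-color feature vector from a 0/1 mask.

    Returns [region count, skin/picture, skin/bounding rectangle,
    largest region/picture, largest region/bounding rectangle,
    skin proportion of the central window].
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    region_sizes, rows, cols = [], [], []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first search over one 4-connected skin region.
            size = 0
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                size += 1
                rows.append(cy)
                cols.append(cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            region_sizes.append(size)
    skin = sum(region_sizes)
    if skin == 0:
        return [0, 0.0, 0.0, 0.0, 0.0, 0.0]
    total = h * w
    rect = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
    largest = max(region_sizes)
    # Central window: center_frac of each dimension, centered in the picture.
    ch, cw = max(1, int(h * center_frac)), max(1, int(w * center_frac))
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    center_skin = sum(mask[y][x] for y in range(y0, y0 + ch) for x in range(x0, x0 + cw))
    return [len(region_sizes), skin / total, skin / rect,
            largest / total, largest / rect, center_skin / (ch * cw)]
```

The resulting list can be appended to the feature set of the web page content alongside the text features.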
The technical solution of this embodiment uses different kinds of classifiers to examine the features of web page content and so automatically identifies its category. In particular, it can automatically detect content belonging to the pornographic harmful category among a large amount of web page content. Compared with manual detection, this embodiment greatly reduces the manpower and material resources required, shortens detection time, and reduces detection cost.
Embodiment three:
Fig. 3 is a schematic flowchart of a content type detection method provided by Embodiment 3 of the present invention. Building on the above embodiments, this embodiment further refines the operation of "determining the final category detection result corresponding to the web page content according to the category detection results obtained by the at least two kinds of classifiers", and correspondingly adds operations for optimizing the classifiers and their voting weights. Referring to Fig. 3, the content type detection method provided by this embodiment comprises the following operations:
Operation 310: perform feature extraction on the content to be detected;
Operation 320: according to the feature extraction result, perform category detection on the content to be detected using at least two kinds of classifiers suited to the content to be detected;
Operation 330: determine the final category detection result corresponding to the content to be detected according to the result of the following formula:
$$
r = \begin{cases}
0, & \sum_{i=1}^{n} (m_i \times w_i) \le \sigma \\
1, & \sum_{i=1}^{n} (m_i \times w_i) > \sigma
\end{cases}
$$
Here, $i$ is an integer; $n$ is the total number of the at least two kinds of classifiers; $m_i$ is the category detection result of the $i$-th kind of classifier, taking the value 1 or 0, where 0 indicates that the category of the content to be detected is not the target category and 1 indicates that it is the target category; $w_i$ is the voting weight of the $i$-th kind of classifier; $\sigma$ is a set threshold; $r = 1$ indicates that the final category detection result of the content to be detected is the target category, and $r = 0$ indicates that it is not the target category.
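The weighted vote in the formula above can be sketched directly. The function name `weighted_vote` and the input validation are assumptions added for illustration; the comparison against the threshold follows the formula, with a score equal to the threshold giving 0.

```python
def weighted_vote(results, weights, threshold):
    """Return r: 1 if the weighted sum of results exceeds the threshold, else 0.

    results[i] is m_i (1 = target category, 0 = not), weights[i] is w_i,
    and threshold is sigma.
    """
    if len(results) != len(weights):
        raise ValueError("one weight is required per classifier result")
    score = sum(m * w for m, w in zip(results, weights))
    return 1 if score > threshold else 0
```

For example, with weights 0.5, 0.3, and 0.2 and a threshold of 0.5, results of 1, 0, 1 give a score of 0.7 and therefore r = 1, while results of 1, 0, 0 give a score of exactly 0.5 and therefore r = 0.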
In this embodiment, initial values of the voting weights of the classifiers can be set in advance, with the voting weights of all classifiers summing to 1. For example, the initial voting weights can all be set equal, or they can be set according to the detection accuracy of the different classifiers: the higher a classifier's detection accuracy, the larger the voting weight assigned to it.
The technical solution provided by this embodiment determines the final category detection result by weighted voting over the category detection results obtained by the multiple classifiers. This improves the accuracy of detecting the category of the content to be detected and brings the category detection result closer to the content's actual category.
A classifier designed in advance cannot guarantee that its category detection results are always correct, because the samples stored in its sample library are limited. For this reason, on the basis of the above technical solution, the classifiers and their voting scheme can be further optimized to improve the accuracy of the final category detection result.
Specifically, after the final category detection result corresponding to the content to be detected is determined according to the category detection results obtained by the at least two kinds of classifiers, the method further comprises:
Comparing the final category detection result obtained for the content to be detected with the category detection results obtained by the at least two kinds of classifiers, to judge whether each classifier produced a correct category detection result, and storing the comparison results. Specifically, the final category detection result can be compared with the category detection result obtained by each classifier in turn, to judge whether that classifier produced a correct category detection result;
At every set first period, calculating the recall rate of each of the at least two kinds of classifiers from the stored comparison results, where the recall rate of the i-th kind of classifier is the ratio of the number of correct category detection results it produced in the current first period to the number of all category detection results it produced in the current first period.
For example, suppose that in the most recent seven days a classifier performed 50 category detection operations in total and, according to the stored comparison results, produced a correct category detection result in 30 of them and a wrong one in the other 20. The recall rate of this classifier over the most recent seven days is then 30/50 = 0.6.
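The comparison and recall-rate calculation can be sketched as follows. The function names and the dictionary-based storage are assumptions for illustration; "correct" here means agreeing with the final category detection result, as in the text above.

```python
def compare_with_final(final_result, classifier_results):
    """Map each classifier name to True if it agreed with the final result.

    classifier_results maps a classifier name to its category detection result.
    """
    return {name: result == final_result
            for name, result in classifier_results.items()}


def recall_rate(comparisons):
    """Ratio of correct results to all results produced in one period.

    comparisons is the stored list of True/False outcomes for one classifier.
    Returns None when the classifier produced no results in the period.
    """
    if not comparisons:
        return None
    return sum(1 for ok in comparisons if ok) / len(comparisons)
```

For the example above, 30 correct outcomes out of 50 give `recall_rate([True] * 30 + [False] * 20)`, which is 0.6.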
In embodiments of the present invention, on the one hand, the voting weight of each classifier can be updated based on its recall rate. Specifically, after the recall rate of each of the at least two kinds of classifiers is calculated, the method may further comprise updating the voting weight of each classifier according to the following formula:
$$
w_i' = \frac{a_i}{\sum_{i=1}^{n} a_i}
$$
Here, $a_i$ is the recall rate of the $i$-th kind of classifier from this calculation, and $w_i'$ is the voting weight of the $i$-th kind of classifier after this update.
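The weight update normalizes the recall rates so that the new weights sum to 1. A minimal sketch follows; the function name `update_weights` is an assumption, and so is the fallback to equal weights when every recall rate is zero, a case the source does not address.

```python
def update_weights(recall_rates):
    """Return new voting weights proportional to the recall rates.

    recall_rates[i] is a_i; the result at index i is w_i'.
    If all recall rates are zero, equal weights are returned (an assumption).
    """
    if not recall_rates:
        raise ValueError("at least one recall rate is required")
    total = sum(recall_rates)
    if total == 0:
        return [1.0 / len(recall_rates)] * len(recall_rates)
    return [a / total for a in recall_rates]
```

The same function can be applied to the recall rates of the remaining classifiers after one is removed, which keeps the weights summing to 1.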
On the other hand, in embodiments of the present invention, classifiers among the at least two kinds of pre-designed classifiers can also be eliminated based on their recall rates. Specifically, the category detection method provided by the embodiments of the present invention may further comprise:
Removing from the at least two kinds of classifiers any classifier whose recall rate is below an elimination threshold in each of N consecutive first periods, so as to redetermine the classifiers suited to the content to be detected, where N is an integer greater than 1.
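The elimination rule can be sketched as a check over each classifier's recent recall rates. The function name and the `history` structure (a dictionary mapping a classifier name to its per-period recall rates, oldest first) are assumptions introduced for illustration.

```python
def classifiers_to_eliminate(history, threshold, n_periods):
    """Return names of classifiers below the threshold in each of the last N periods.

    history maps a classifier name to a list of recall rates, one per first
    period, oldest first. A classifier with fewer than n_periods entries is
    never eliminated.
    """
    if n_periods <= 1:
        raise ValueError("N must be an integer greater than 1")
    eliminated = []
    for name, rates in history.items():
        recent = rates[-n_periods:]
        if len(recent) == n_periods and all(r < threshold for r in recent):
            eliminated.append(name)
    return eliminated
```

After elimination, the voting weights of the remaining classifiers would need to be redetermined so that they again sum to 1.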
Of course, classifiers can also be eliminated in other ways. For example, the average recall rate of each classifier over N consecutive first periods is calculated; if the lowest average recall rate is below the elimination threshold, the corresponding classifier is eliminated.
It should be noted that after a classifier is eliminated, the voting weights of the remaining classifiers must be redetermined so that they again sum to 1. Specifically, the new voting weights can be automatically generated equal values, or they can be redetermined from the recall rates of the remaining classifiers, following the voting weight update process described above, which is not repeated here.
On the basis of the above technical solution, each of the at least two kinds of classifiers comprises a sample library storing initial samples and a classification model, obtained by training on the sample library, for performing category detection on the content to be detected;
After the final category detection result obtained for the content to be detected is compared with the category detection results obtained by the at least two kinds of classifiers, the method further comprises: if a classifier among the at least two kinds produced a wrong category detection result, adding the content to be detected, as a feedback sample, to the sample library of that classifier;
At every set second period, retraining on the sample library of each classifier that produced a wrong category detection result within the current second period, and correcting that classifier's classification model according to the training result, so as to update the classifier.
Preferably, the second period is seven days.
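The feedback loop can be sketched as a small wrapper around a training function. Everything named here is hypothetical: `FeedbackClassifier`, its methods, and the toy trainer `train_threshold_model`, which stands in for whichever real classifier is used. The sketch also assumes that a feedback sample is labeled with the final category detection result, which the source implies but does not state.

```python
def train_threshold_model(samples):
    """Toy trainer (hypothetical): split a 1-D feature at the midpoint of the class means."""
    pos = [x for x, y in samples if y == 1]
    neg = [x for x, y in samples if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x >= cut else 0


class FeedbackClassifier:
    """A classifier with a sample library that is retrained from feedback samples."""

    def __init__(self, train_fn, initial_samples):
        self.train_fn = train_fn
        self.samples = list(initial_samples)  # the sample library
        self.model = train_fn(self.samples)   # the classification model
        self.has_feedback = False

    def predict(self, features):
        return self.model(features)

    def report_final(self, features, own_result, final_result):
        """Store the content as a feedback sample if this classifier was wrong."""
        if own_result != final_result:
            self.samples.append((features, final_result))
            self.has_feedback = True

    def end_of_period(self):
        """Retrain once per second period, only if a wrong result occurred in it."""
        if not self.has_feedback:
            return False
        self.model = self.train_fn(self.samples)
        self.has_feedback = False
        return True
```

A scheduler would call `end_of_period` on each classifier once per second period, for example every seven days.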
Through self-correction of the sample libraries and voting weights, embodiments of the present invention overcome the problem of inaccurate category detection when a classifier has few samples, thereby improving detection accuracy.
Embodiment four:
Fig. 4 is a schematic structural diagram of a content type detection device provided by Embodiment 4 of the present invention. This embodiment is applicable to performing category detection on content to be detected. Referring to Fig. 4, the content type detection device is structured as follows:
A content feature extraction unit 410, configured to perform feature extraction on content to be detected;
A content type detection unit 420, configured to perform, according to the feature extraction result, category detection on the content to be detected using at least two kinds of classifiers suited to the content to be detected;
A content detection result determining unit 430, configured to determine, according to the category detection results obtained by the at least two kinds of classifiers, the final category detection result corresponding to the content to be detected.
In a preferred implementation of this embodiment, the device further comprises:
A content acquisition unit 400, configured to acquire web page content according to a uniform resource locator (URL), as the content to be detected, before the content feature extraction unit 410 performs feature extraction on the content to be detected;
The content feature extraction unit 410 comprises:
A text feature extraction subunit 4101, configured to, if the web page content includes text content, perform feature extraction on the text content based on a text feature extraction algorithm, and add the feature extraction result to the feature set of the web page content;
A picture feature extraction subunit 4102, configured to, if the web page content includes picture content, perform target feature recognition on the picture content, build a feature vector of the picture content from the target feature recognition result, and add it to the feature set of the web page content.
Further, the text feature extraction algorithm is the chi-square algorithm;
The picture feature extraction subunit 4102 is specifically configured to:
Perform skin-color detection on the picture content using a statistical histogram model;
Build the feature vector of the picture content from the skin-color detection result, where the feature vector is formed from at least one of the following elements:
The number of connected skin-color regions, the proportion of the whole picture region occupied by skin-color area, the proportion of the skin-color bounding rectangle occupied by skin-color area, the proportion of the whole picture region occupied by the largest connected skin-color region, the proportion of the skin-color bounding rectangle occupied by the largest connected skin-color region, and the skin-color proportion of the central region of the picture.
In embodiments of the present invention, the at least two kinds of classifiers include at least two of the following:
A support vector machine classifier, a naive Bayes classifier, a k-nearest neighbor classifier, a decision tree classifier, and a logistic regression classifier.
The above device can perform the methods provided by Embodiment 1 and Embodiment 2 of the present invention, and has the functional modules and beneficial effects corresponding to those methods. For technical details not described in this embodiment, refer to Embodiment 1 and Embodiment 2.
Embodiment five:
Fig. 5 is a schematic structural diagram of a content type detection device provided in Embodiment 5 of the present invention. On the basis of Embodiment 4 above, this embodiment further optimizes the structure of the content detection result determining unit, and correspondingly adds units for optimizing the classifiers and their voting weights. Referring to Fig. 5, the specific structure of the content type detection device provided in this embodiment is as follows:
a content feature extraction unit 510, configured to perform feature extraction on the content to be detected;
a content category detection unit 520, configured to perform category detection on the content to be detected according to the feature extraction result, using at least two classifiers suited to the content to be detected;
a content detection result determining unit 530, configured to determine the final category detection result corresponding to the content to be detected according to the category detection results obtained by the at least two classifiers.
Further, the content detection result determining unit 530 is specifically configured to:
determine the final category detection result corresponding to the content to be detected according to the calculation result of the following formula:
$$ r = \begin{cases} 0, & \sum_{i=1}^{n} (m_i \times w_i) \le \sigma \\ 1, & \sum_{i=1}^{n} (m_i \times w_i) > \sigma \end{cases} $$
where $i$ is an integer; $n$ is the total number of the at least two classifiers; $m_i$ is the category detection result of the i-th classifier among the at least two classifiers and takes the value 1 or 0, where 0 indicates that the category of the content to be detected is not the target category and 1 indicates that it is the target category; $w_i$ is the voting weight of the i-th classifier; $\sigma$ is a set threshold; $r = 1$ indicates that the final category detection result of the content to be detected is the target category, and $r = 0$ indicates that it is not the target category.
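The weighted-voting rule can be sketched in a few lines. The function name `weighted_vote` and the example weights and threshold are illustrative assumptions; only the comparison of the weighted sum against σ comes from the formula.

```python
def weighted_vote(votes, weights, sigma):
    """Return 1 (target category) if sum(m_i * w_i) > sigma, otherwise 0.

    votes:   list of 0/1 category detection results m_i, one per classifier
    weights: list of voting weights w_i, in the same order
    sigma:   the set threshold
    """
    if len(votes) != len(weights):
        raise ValueError("votes and weights must have the same length")
    score = sum(m * w for m, w in zip(votes, weights))
    return 1 if score > sigma else 0
```

For example, with five equally weighted classifiers (weight 0.2 each) and σ = 0.5, three votes for the target category give a weighted sum of about 0.6 and hence r = 1. A weighted sum exactly equal to σ yields r = 0, because the comparison for r = 1 is strict.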
Further, on the basis of the above technical solution, the device further comprises:
a classifier detection result judging unit 540, configured to: after the content detection result determining unit 530 determines the final category detection result corresponding to the content to be detected according to the category detection results obtained by the at least two classifiers, compare the final category detection result with the category detection results obtained by the at least two classifiers, so as to judge whether each of the at least two classifiers produced a correct category detection result, and store the comparison results;
a classifier recall rate calculating unit 550, configured to calculate, once every set first period and according to the comparison results stored by the classifier detection result judging unit 540, the recall rate of each of the at least two classifiers, where the recall rate of the i-th classifier is the ratio of the number of correct category detection results produced by the i-th classifier within the current first period to the number of all category detection results produced by the i-th classifier within the current first period.
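A minimal sketch of the bookkeeping described for units 540 and 550 follows. It assumes comparison results are kept in an in-memory dictionary keyed by classifier name; the storage layout and the names `record_comparison` and `recall_rate` are illustrative, not taken from the source. Note that "recall rate" here follows the document's own definition (correct results divided by all results in the period).

```python
def record_comparison(log, classifier_votes, final_result):
    """Store, per classifier, whether its result matched the final result.

    log:              dict mapping classifier name -> list of booleans
    classifier_votes: dict mapping classifier name -> 0/1 detection result
    final_result:     the final 0/1 category detection result
    """
    for name, vote in classifier_votes.items():
        log.setdefault(name, []).append(vote == final_result)


def recall_rate(comparisons):
    """Correct results divided by all results within the current first period."""
    if not comparisons:
        return 0.0  # assumption: a classifier with no results gets rate 0
    return sum(comparisons) / len(comparisons)
```

At the end of each first period the caller would compute `recall_rate` for every classifier and then start a fresh log for the next period.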
Further, the device further comprises:
a classifier voting weight updating unit 560, configured to: after the classifier recall rate calculating unit 550 calculates the recall rate of each of the at least two classifiers, update the voting weight of each of the at least two classifiers according to the following formula:
$$ w_i' = \frac{a_i}{\sum_{i=1}^{n} a_i} $$
where $a_i$ is the recall rate of the i-th classifier obtained in this calculation, and $w_i'$ is the voting weight of the i-th classifier after this update.
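The update normalizes the recall rates so that the new voting weights sum to 1. A short sketch follows; the fallback to equal weights when every recall rate is zero is an assumption added to avoid division by zero and is not described in the source.

```python
def update_weights(recall_rates):
    """Return new voting weights w_i' = a_i / sum(a), in the same order."""
    if not recall_rates:
        return []
    total = sum(recall_rates)
    if total == 0:
        # Assumption: with no usable recall information, weight all classifiers equally.
        return [1 / len(recall_rates)] * len(recall_rates)
    return [a / total for a in recall_rates]
```

For instance, recall rates of 0.5, 0.5 and 1.0 produce weights of 0.25, 0.25 and 0.5, so the classifier that agreed with the final result most often gains the most influence in the next period.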
Further, the device further comprises:
a classifier elimination unit 570, configured to remove, from the at least two classifiers, any classifier whose recall rate has been below an elimination threshold in each of N consecutive first periods, so as to redetermine the classifiers suited to the content to be detected, where N is an integer greater than 1.
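The elimination rule can be sketched as a filter over each classifier's recall-rate history. The history layout (one recall rate per first period, oldest first) and the name `surviving_classifiers` are assumptions made for illustration.

```python
def surviving_classifiers(history, threshold, n_periods):
    """Return the names of classifiers that are NOT eliminated.

    history:   dict mapping classifier name -> list of recall rates,
               one per first period, oldest first
    threshold: the elimination threshold
    n_periods: N, the number of consecutive first periods (integer > 1)
    """
    keep = []
    for name, rates in history.items():
        recent = rates[-n_periods:]
        # Eliminate only if N full periods exist and every one is below the threshold.
        always_below = len(recent) == n_periods and all(r < threshold for r in recent)
        if not always_below:
            keep.append(name)
    return keep
```

A classifier with fewer than N recorded periods is kept in this sketch, since it cannot yet have been below the threshold for N consecutive periods.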
On the basis of the above technical solution, each of the at least two classifiers includes a sample library storing initial samples, and a classification model, obtained by training on the sample library, for performing category detection on the content to be detected;
the device further comprises:
a feedback sample adding unit 580, configured to: after the classifier detection result judging unit 540 compares the final category detection result corresponding to the content to be detected with the category detection results obtained by the at least two classifiers, if a classifier among the at least two classifiers produced an erroneous category detection result, add the content to be detected, as a feedback sample, to the sample library of the classifier that produced the erroneous category detection result;
a classifier correcting unit 590, configured to train, once every set second period, the sample library of each classifier that produced an erroneous category detection result within the current second period, and to correct the classification model of that classifier according to the training result, so as to update the classifier that produced the erroneous category detection result.
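The feedback and correction loop of units 580 and 590 might look roughly like the sketch below. The class `ManagedClassifier`, the injected `train` callable (which takes a list of `(features, label)` samples and returns a prediction function) and the choice to label feedback samples with the final category detection result are all assumptions for illustration; the source does not specify how feedback samples are labeled or how training is invoked.

```python
class ManagedClassifier:
    """A classifier bundled with its sample library and retraining state."""

    def __init__(self, name, initial_samples, train):
        self.name = name
        self.samples = list(initial_samples)  # sample library of (features, label)
        self.train = train                    # hypothetical training callable
        self.model = train(self.samples)      # classification model
        self.has_feedback = False             # erred during the current second period?

    def predict(self, features):
        return self.model(features)

    def add_feedback(self, features, final_result):
        # Assumption: the feedback sample is labeled with the final result.
        self.samples.append((features, final_result))
        self.has_feedback = True

    def retrain_if_needed(self):
        """Called once per second period; retrains only classifiers that erred."""
        if not self.has_feedback:
            return False
        self.model = self.train(self.samples)
        self.has_feedback = False
        return True


def collect_feedback(classifiers, features, final_result):
    """Add the content as a feedback sample to every classifier that disagreed."""
    for clf in classifiers:
        if clf.predict(features) != final_result:
            clf.add_feedback(features, final_result)
```

Keeping the retraining flag per classifier means that classifiers that made no errors in the period are left untouched, which matches the description of unit 590.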
The above product can perform the methods provided in Embodiment 1, Embodiment 2 and Embodiment 3 of the present invention, and has the functional modules and beneficial effects corresponding to the performed methods. For technical details not described in detail in this embodiment, refer to Embodiment 1, Embodiment 2 and Embodiment 3.
Embodiment six:
Fig. 6 is a schematic flowchart of a preferred content type detection method provided in Embodiment 6 of the present invention. This embodiment provides a preferred example based on all of the above embodiments. Referring to Fig. 6, the content type detection method provided in this embodiment specifically includes the following operations:
Operation 610: detect whether the main content of the web page is text content or picture content.
Operation 620: perform feature extraction on the main content of the web page.
Operation 630: according to the feature extraction result, perform category detection on the web page content using the SVM, Bayes, KNN, ID3 and Logistic classifiers suited to the web page content, respectively.
Each classifier includes a sample library storing initial samples, and a classification model, obtained by training on the sample library, for performing category detection on web page content.
The SVM classifier has the following advantages: by converting categorical inputs into numerical inputs, it can handle both categorical and numerical data, and it is suitable for large-scale data.
The Bayes classifier has the following advantages: training on and querying large amounts of sample data is fast; incremental training is supported; and what the classifier has actually learned is relatively simple to interpret.
The KNN classifier has the following advantages: it can use complex functions for numerical prediction while remaining easy to understand; it allows reasonable scaling of the data; and new sample data can be added at any time without retraining.
The ID3 classifier has the following advantages: the trained model is easy to interpret, and the algorithm places the most important decision factors near the root of the tree; it can handle categorical and numerical data at the same time; it readily handles interactions between variables; and it is suitable for small-scale data.
The Logistic classifier has the following advantages: it is good at analyzing linear relationships, and it is better than a decision tree at analyzing the overall structure of the data.
Operation 640: perform weighted voting on the category detection results obtained by the classifiers.
Operation 650: obtain the final category detection result corresponding to the web page content according to the weighted voting result.
Operation 660: monitor the recall rate of each classifier.
Specifically, the obtained final category detection result is compared with the category detection result obtained by each classifier, to judge whether each classifier produced a correct category detection result, and the comparison results are stored;
every seven days, the recall rate of each classifier is calculated once according to the stored comparison results.
Operation 670: update the voting weight of each classifier according to the monitored recall rates.
Operation 680: update the sample libraries of the classifiers.
Specifically, if any of the above five classifiers produces an erroneous category detection result, the corresponding web page content is added, as a feedback sample, to the sample library of the classifier that produced the erroneous result. Alternatively, only the URL of the web page content may be added to that sample library. In that case, before the sample library is subsequently trained, the HTML page is fetched from the server in real time based on the added URL and parsed to generate the web page content, which then serves as the feedback sample in the sample library.
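The URL-only variant of Operation 680 can be sketched as follows. Here a sample library is a plain list of dictionaries, and `fetch_html` and `parse` are hypothetical callables supplied by the caller (for example an HTTP client and an HTML-to-content parser); none of these names appear in the source.

```python
def add_url_feedback(library, url, label):
    """Store only the URL of a misclassified page; content is resolved later."""
    library.append({"url": url, "label": label, "content": None})


def materialize(library, fetch_html, parse):
    """Before training, fetch and parse every sample that is stored as a URL only.

    fetch_html: hypothetical callable, url -> HTML string
    parse:      hypothetical callable, HTML string -> web page content
    """
    for sample in library:
        if sample["content"] is None and sample.get("url"):
            sample["content"] = parse(fetch_html(sample["url"]))
    return library
```

Storing only the URL keeps the sample library small, at the cost of a fetch at training time and the possibility that the page has changed since it was misclassified.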
Operation 690: perform elimination and training management on the classifiers.
Specifically, eliminating classifiers includes:
removing, from the above five classifiers, any classifier whose recall rate stayed below the elimination threshold throughout one consecutive month, so as to redetermine the classifiers suited to the web page content.
Performing training management on the classifiers includes:
every seven days, training once the sample library of each classifier that produced an erroneous category detection result in the preceding seven days, and correcting the classification model of that classifier according to the training result, so as to update the classifier that produced the erroneous category detection result. In this way the classification deviation of the classifiers is corrected regularly, which improves the accuracy of their category detection.
The technical solution provided in this embodiment has the following beneficial effects:
First, the category of web site content can be detected automatically, which saves manpower and material resources and makes detection efficient;
Second, multiple machine learning algorithms are trained into multiple classifiers, each classifier produces one predicted category detection result, and the predicted results are combined by weighted voting into one final result, so that the category detection of web site content is more reliable and better founded;
Third, through self-correction of the samples and the voting weight of each classifier, the problem that a general machine learning model gives inaccurate category detection when the sample size is small can be overcome;
Fourth, the correctness of each category detection result predicted by a classifier is recorded, the accuracy of the classifier is reviewed periodically on that basis, and classifiers whose accuracy is persistently low are eliminated, which makes the category detection of web site content more accurate.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments and substitutions can be made without departing from the scope of protection of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the appended claims.

Claims (18)

1. A content type detection method, characterized by comprising:
performing feature extraction on content to be detected;
according to the feature extraction result, performing category detection on the content to be detected using at least two classifiers suited to the content to be detected;
determining the final category detection result corresponding to the content to be detected according to the category detection results obtained by the at least two classifiers.
2. The category detection method according to claim 1, characterized in that, before performing feature extraction on the content to be detected, the method further comprises: obtaining web page content according to a uniform resource locator (URL) as the content to be detected;
performing feature extraction on the content to be detected comprises:
if the web page content includes text content, performing feature extraction on the text content based on a text feature extraction algorithm, and adding the feature extraction result to the feature set of the web page content;
if the web page content includes picture content, performing target feature recognition on the picture content, building a feature vector of the picture content according to the target feature recognition result, and adding it to the feature set of the web page content.
3. The category detection method according to claim 2, characterized in that the text feature extraction algorithm is the chi-square algorithm;
performing target feature recognition on the picture content and building the feature vector of the picture content according to the target feature recognition result comprises:
performing skin-color detection on the picture content using a statistical histogram model;
building the feature vector of the picture content according to the skin-color detection result, wherein the feature vector is a vector formed by at least one of the following elements:
the number of connected skin-color regions, the ratio of the skin-color area to the whole picture area, the ratio of the skin-color area to the bounding rectangle of the skin-color area, the ratio of the largest connected skin-color region to the whole picture area, the ratio of the largest connected skin-color region to the bounding rectangle of the skin-color area, and the skin-color ratio in the central region of the picture.
4. The category detection method according to any one of claims 1-3, characterized in that the at least two classifiers include at least two of the following classifiers:
a support vector machine classifier, a naive Bayes classifier, a k-nearest-neighbor classifier, a decision tree classifier and a logistic regression classifier.
5. The category detection method according to any one of claims 1-3, characterized in that determining the final category detection result corresponding to the content to be detected according to the category detection results obtained by the at least two classifiers comprises:
determining the final category detection result corresponding to the content to be detected according to the calculation result of the following formula:
$$ r = \begin{cases} 0, & \sum_{i=1}^{n} (m_i \times w_i) \le \sigma \\ 1, & \sum_{i=1}^{n} (m_i \times w_i) > \sigma \end{cases} $$
wherein $i$ is an integer; $n$ is the total number of the at least two classifiers; $m_i$ is the category detection result of the i-th classifier among the at least two classifiers and takes the value 1 or 0, where 0 indicates that the category of the content to be detected is not the target category and 1 indicates that it is the target category; $w_i$ is the voting weight of the i-th classifier; $\sigma$ is a set threshold; $r = 1$ indicates that the final category detection result of the content to be detected is the target category, and $r = 0$ indicates that it is not the target category.
6. The category detection method according to claim 5, characterized in that, after determining the final category detection result corresponding to the content to be detected according to the category detection results obtained by the at least two classifiers, the method further comprises:
comparing the final category detection result corresponding to the content to be detected with the category detection results obtained by the at least two classifiers, so as to judge whether each of the at least two classifiers produced a correct category detection result, and storing the comparison results;
once every set first period, calculating the recall rate of each of the at least two classifiers according to the stored comparison results, wherein the recall rate of the i-th classifier is the ratio of the number of correct category detection results produced by the i-th classifier within the current first period to the number of all category detection results produced by the i-th classifier within the current first period.
7. The category detection method according to claim 6, characterized in that, after calculating the recall rate of each of the at least two classifiers, the method further comprises: updating the voting weight of each of the at least two classifiers according to the following formula:
$$ w_i' = \frac{a_i}{\sum_{i=1}^{n} a_i} $$
wherein $a_i$ is the recall rate of the i-th classifier obtained in this calculation, and $w_i'$ is the voting weight of the i-th classifier after this update.
8. The category detection method according to claim 6, characterized by further comprising:
removing, from the at least two classifiers, any classifier whose recall rate has been below an elimination threshold in each of N consecutive first periods, so as to redetermine the classifiers suited to the content to be detected, wherein N is an integer greater than 1.
9. The category detection method according to claim 6, characterized in that each of the at least two classifiers includes a sample library storing initial samples, and a classification model, obtained by training on the sample library, for performing category detection on the content to be detected;
after comparing the final category detection result corresponding to the content to be detected with the category detection results obtained by the at least two classifiers, the method further comprises: if a classifier among the at least two classifiers produced an erroneous category detection result, adding the content to be detected, as a feedback sample, to the sample library of the classifier that produced the erroneous category detection result;
once every set second period, training the sample library of each classifier that produced an erroneous category detection result within the current second period, and correcting the classification model of that classifier according to the training result, so as to update the classifier that produced the erroneous category detection result.
10. A content type detection device, characterized by comprising:
a content feature extraction unit, configured to perform feature extraction on content to be detected;
a content category detection unit, configured to perform category detection on the content to be detected according to the feature extraction result, using at least two classifiers suited to the content to be detected;
a content detection result determining unit, configured to determine the final category detection result corresponding to the content to be detected according to the category detection results obtained by the at least two classifiers.
11. The category detection device according to claim 10, characterized by further comprising:
a content obtaining unit, configured to obtain web page content according to a uniform resource locator (URL) as the content to be detected, before the content feature extraction unit performs feature extraction on the content to be detected;
the content feature extraction unit comprises:
a text feature extraction subunit, configured to: if the web page content includes text content, perform feature extraction on the text content based on a text feature extraction algorithm, and add the feature extraction result to the feature set of the web page content;
a picture feature extraction subunit, configured to: if the web page content includes picture content, perform target feature recognition on the picture content, build a feature vector of the picture content according to the target feature recognition result, and add it to the feature set of the web page content.
12. The category detection device according to claim 11, characterized in that the text feature extraction algorithm is the chi-square algorithm;
the picture feature extraction subunit is specifically configured to:
perform skin-color detection on the picture content using a statistical histogram model;
build the feature vector of the picture content according to the skin-color detection result, wherein the feature vector is a vector formed by at least one of the following elements:
the number of connected skin-color regions, the ratio of the skin-color area to the whole picture area, the ratio of the skin-color area to the bounding rectangle of the skin-color area, the ratio of the largest connected skin-color region to the whole picture area, the ratio of the largest connected skin-color region to the bounding rectangle of the skin-color area, and the skin-color ratio in the central region of the picture.
13. The category detection device according to any one of claims 10-12, characterized in that the at least two classifiers include at least two of the following classifiers:
a support vector machine classifier, a naive Bayes classifier, a k-nearest-neighbor classifier, a decision tree classifier and a logistic regression classifier.
14. The category detection device according to any one of claims 10-12, characterized in that the content detection result determining unit is specifically configured to:
determine the final category detection result corresponding to the content to be detected according to the calculation result of the following formula:
$$ r = \begin{cases} 0, & \sum_{i=1}^{n} (m_i \times w_i) \le \sigma \\ 1, & \sum_{i=1}^{n} (m_i \times w_i) > \sigma \end{cases} $$
wherein $i$ is an integer; $n$ is the total number of the at least two classifiers; $m_i$ is the category detection result of the i-th classifier among the at least two classifiers and takes the value 1 or 0, where 0 indicates that the category of the content to be detected is not the target category and 1 indicates that it is the target category; $w_i$ is the voting weight of the i-th classifier; $\sigma$ is a set threshold; $r = 1$ indicates that the final category detection result of the content to be detected is the target category, and $r = 0$ indicates that it is not the target category.
15. The category detection device according to claim 14, characterized by further comprising:
a classifier detection result judging unit, configured to: after the content detection result determining unit determines the final category detection result corresponding to the content to be detected according to the category detection results obtained by the at least two classifiers, compare the final category detection result with the category detection results obtained by the at least two classifiers, so as to judge whether each of the at least two classifiers produced a correct category detection result, and store the comparison results;
a classifier recall rate calculating unit, configured to calculate, once every set first period and according to the comparison results stored by the classifier detection result judging unit, the recall rate of each of the at least two classifiers, wherein the recall rate of the i-th classifier is the ratio of the number of correct category detection results produced by the i-th classifier within the current first period to the number of all category detection results produced by the i-th classifier within the current first period.
16. The category detection device according to claim 15, characterized by further comprising:
a classifier voting weight updating unit, configured to: after the classifier recall rate calculating unit calculates the recall rate of each of the at least two classifiers, update the voting weight of each of the at least two classifiers according to the following formula:
$$ w_i' = \frac{a_i}{\sum_{i=1}^{n} a_i} $$
wherein $a_i$ is the recall rate of the i-th classifier obtained in this calculation, and $w_i'$ is the voting weight of the i-th classifier after this update.
17. The category detection device according to claim 15, characterized by further comprising:
a classifier elimination unit, configured to remove, from the at least two classifiers, any classifier whose recall rate has been below an elimination threshold in each of N consecutive first periods, so as to redetermine the classifiers suited to the content to be detected, wherein N is an integer greater than 1.
18. The category detection device according to claim 15, characterized in that each of the at least two classifiers includes a sample library storing initial samples, and a classification model, obtained by training on the sample library, for performing category detection on the content to be detected;
the device further comprises:
a feedback sample adding unit, configured to: after the classifier detection result judging unit compares the final category detection result corresponding to the content to be detected with the category detection results obtained by the at least two classifiers, if a classifier among the at least two classifiers produced an erroneous category detection result, add the content to be detected, as a feedback sample, to the sample library of the classifier that produced the erroneous category detection result;
a classifier correcting unit, configured to train, once every set second period, the sample library of each classifier that produced an erroneous category detection result within the current second period, and to correct the classification model of that classifier according to the training result, so as to update the classifier that produced the erroneous category detection result.
CN201410569492.6A 2014-10-22 2014-10-22 content type detection method and device Active CN104391860B (en)

Publications (2)

Publication Number Publication Date
CN104391860A true CN104391860A (en) 2015-03-04
CN104391860B CN104391860B (en) 2018-03-02

Family

ID=52609764

Country Status (1)

Country Link
CN (1) CN104391860B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005279217A (en) * 2004-03-29 2005-10-13 Kohei Kadowaki Picture expressing device with body sensitive function
CN101055621A (en) * 2006-04-10 2007-10-17 中国科学院自动化研究所 Content-based sensitive web page identification method
CN101281521A (en) * 2007-04-05 2008-10-08 中国科学院自动化研究所 Method and system for filtering sensitive web pages based on multi-classifier fusion
CN101145171A (en) * 2007-09-15 2008-03-19 中国科学院合肥物质科学研究院 Gene microarray data prediction method based on independent component ensemble learning
CN101251851A (en) * 2008-02-29 2008-08-27 吉林大学 Multi-classifier integration method based on incremental naive Bayes network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
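Zhang Wenbo (张文博) et al.: "A multi-feature fusion classification method with adaptive weights" (一种自适应权值的多特征融合分类方法), Systems Engineering and Electronics (《系统工程与电子技术》) *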

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104951802A (en) * 2015-06-17 2015-09-30 中国科学院自动化研究所 Classifier updating method
CN104965905B (en) * 2015-06-30 2018-05-04 北京奇虎科技有限公司 Web page classifying method and apparatus
CN104965905A (en) * 2015-06-30 2015-10-07 北京奇虎科技有限公司 Web page classifying method and apparatus
WO2017000610A1 (en) * 2015-06-30 2017-01-05 北京奇虎科技有限公司 Webpage classification method and apparatus
US10909427B2 (en) 2015-06-30 2021-02-02 Beijing Qihoo Techology Company Limited Method and device for classifying webpages
CN108136263A (en) * 2015-08-20 2018-06-08 Cy游戏公司 Information processing system, program and server
CN105426356A (en) * 2015-10-29 2016-03-23 杭州九言科技股份有限公司 Target information identification method and apparatus
CN105426354A (en) * 2015-10-29 2016-03-23 杭州九言科技股份有限公司 Sentence vector fusion method and apparatus
CN105426354B (en) * 2015-10-29 2019-03-22 杭州九言科技股份有限公司 Sentence vector fusion method and apparatus
CN105426356B (en) * 2015-10-29 2019-05-21 杭州九言科技股份有限公司 Target information identification method and apparatus
CN106649384A (en) * 2015-11-03 2017-05-10 中国电信股份有限公司 Method and device for classifying URL (Uniform Resource Locator)
WO2017124713A1 (en) * 2016-01-18 2017-07-27 华为技术有限公司 Data model determination method and apparatus
CN107193836A (en) * 2016-03-15 2017-09-22 腾讯科技(深圳)有限公司 Recognition method and device
CN107730286A (en) * 2016-08-10 2018-02-23 中国移动通信集团黑龙江有限公司 Target customer screening method and device
US11276004B2 (en) 2016-09-09 2022-03-15 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for monitoring system
CN106383766A (en) * 2016-09-09 2017-02-08 北京百度网讯科技有限公司 System monitoring method and device
CN107995152A (en) * 2016-10-27 2018-05-04 腾讯科技(深圳)有限公司 Malicious access detection method and device and detection server
CN107995152B (en) * 2016-10-27 2020-07-03 腾讯科技(深圳)有限公司 Malicious access detection method and device and detection server
CN108804472A (en) * 2017-05-04 2018-11-13 腾讯科技(深圳)有限公司 Webpage content extraction method, device and server
CN107766234A (en) * 2017-08-31 2018-03-06 广州数沃信息科技有限公司 Method, apparatus and system for assessing webpage health based on mobile device
CN107801090A (en) * 2017-11-03 2018-03-13 北京奇虎科技有限公司 Method, apparatus and computing device for detecting abnormal video files using audio information
CN107895119A (en) * 2017-12-28 2018-04-10 北京奇虎科技有限公司 Program installation packet inspection method, device and electronic equipment
CN108304483A (en) * 2017-12-29 2018-07-20 东软集团股份有限公司 Webpage classification method, device and equipment
CN108304483B (en) * 2017-12-29 2021-01-19 东软集团股份有限公司 Webpage classification method, device and equipment
CN108509794A (en) * 2018-03-09 2018-09-07 中山大学 Malicious web page defense detection method based on classification learning algorithm
CN108932502A (en) * 2018-07-13 2018-12-04 希蓝科技(北京)有限公司 Self-learning system and method for updating an electrocardiogram template classification model
CN110875874A (en) * 2018-09-03 2020-03-10 Oppo广东移动通信有限公司 Electronic red packet detection method and device and mobile terminal
CN109344884A (en) * 2018-09-14 2019-02-15 腾讯科技(深圳)有限公司 Media information classification method, method and device for training picture classification model
CN109344884B (en) * 2018-09-14 2023-09-12 深圳市雅阅科技有限公司 Media information classification method, method and device for training picture classification model
CN110363223A (en) * 2019-06-20 2019-10-22 华南理工大学 Industrial flow data processing method, detection method, system, device and medium
CN112347244A (en) * 2019-08-08 2021-02-09 四川大学 Method for detecting pornography- and gambling-related websites based on mixed feature analysis
CN110502552B (en) * 2019-08-20 2022-10-28 重庆大学 Classification data conversion method based on fine-tuning conditional probability
CN110502552A (en) * 2019-08-20 2019-11-26 重庆大学 Classification data conversion method based on fine-tuning conditional probability
CN110852285A (en) * 2019-11-14 2020-02-28 腾讯科技(深圳)有限公司 Object detection method and device, computer equipment and storage medium
CN110852285B (en) * 2019-11-14 2023-04-18 腾讯科技(深圳)有限公司 Object detection method and device, computer equipment and storage medium
CN111310096A (en) * 2020-02-25 2020-06-19 维沃移动通信有限公司 Content saving method, electronic device, and computer-readable storage medium

Also Published As

Publication number Publication date
CN104391860B (en) 2018-03-02

Similar Documents

Publication Publication Date Title
CN104391860A (en) Content type detection method and device
CN103605794B (en) Website classifying method
CN105426356B (en) Target information identification method and apparatus
CN102651088B (en) Classification method for malicious code based on A_Kohonen neural network
CN106600423A (en) Machine learning-based car insurance data processing method and device and car insurance fraud identification method and device
CN107122375A (en) Image subject recognition method based on image features
CN104239485A (en) Statistical machine learning-based internet hidden link detection method
CN103577755A (en) Malicious script static detection method based on SVM (support vector machine)
CN105069141A (en) Construction method and construction system for stock standard news library
CN103345528A (en) Text classification method based on correlation analysis and KNN
CN107545038B (en) Text classification method and equipment
CN105354198A (en) Data processing method and apparatus
CN110647995A (en) Rule training method, device, equipment and storage medium
CN108446616A (en) Road extraction method based on fully convolutional neural network ensemble learning
CN102436512B (en) Preference-based web page text content control method
CN109191210A (en) Broadband target user identification method based on AdaBoost algorithm
CN111126820A (en) Electricity stealing prevention method and system
CN107368526A (en) Data processing method and device
CN113590764B (en) Training sample construction method and device, electronic equipment and storage medium
CN103279944A (en) Image division method based on biogeography optimization
CN106484913A (en) Method and server for determining a target picture
CN103902706A (en) Method for classifying and predicting big data on basis of SVM (support vector machine)
CN104657391B (en) Page processing method and device
CN112888008A (en) Base station abnormity detection method, device, equipment and storage medium
CN106485188A (en) Industrial switch user anomaly detection method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20190809

Address after: 100085 2nd Floor, Baidu Building, No. 10 Shangdi 10th Street, Haidian District, Beijing

Patentee after: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.

Address before: 100091 Beijing, Haidian District, Dongbeiwang West Road, No. 4, Zhongguancun Software Park, Block C, 1-03

Patentee before: Anyi Hengtong (Beijing) Technology Co., Ltd.
