CN104391860B - content type detection method and device - Google Patents

Publication number
CN104391860B
Authority
CN
China
Prior art keywords
classifier
content
classification
detection result
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410569492.6A
Other languages
Chinese (zh)
Other versions
CN104391860A (en)
Inventor
唐呈光
张兵
杨念
耿志峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Iyuntian Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Iyuntian Co ltd
Priority to CN201410569492.6A
Publication of CN104391860A
Application granted
Publication of CN104391860B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a content type detection method and a content type detection device. The method comprises the following steps: extracting the characteristics of the content to be detected; performing category detection on the content to be detected by adopting at least two classifiers matched with the content to be detected according to a feature extraction result; and determining a final class detection result corresponding to the content to be detected according to the class detection results obtained by the at least two classifiers. The technical scheme provided by the embodiment of the invention can automatically detect the type of the acquired content, shorten the detection time and reduce the detection cost.

Description

Content type detection method and device
Technical field
Embodiments of the present invention relate to the technical field of classification and recognition, and in particular to a content type detection method and device.
Background
With the development of Internet technology, the amount of information on the Internet is growing exponentially, and the ways in which people obtain and use information are becoming more varied and more convenient. While the Internet brings convenience to people's lives, it also brings many negative effects. For example, some websites, seeking profit and higher click-through rates, display unhealthy content to users. This seriously degrades the browsing experience and, for teenagers, can significantly affect physical and mental development.
At present, web content (for example, pornographic content) is mostly identified by manual review. Although manual review is accurate, it is inefficient and consumes substantial manpower and material resources, so it cannot cope with the harmful content that is increasingly rampant on websites today.
Summary of the invention
Embodiments of the present invention provide a content type detection method and device that can automatically detect the category of acquired content, shortening detection time and reducing detection cost.
In a first aspect, an embodiment of the present invention provides a content type detection method, comprising:
performing feature extraction on content to be detected;
performing, according to the feature extraction result, category detection on the content to be detected using at least two classifiers adapted to the content to be detected;
determining, according to the category detection results obtained by the at least two classifiers, a final category detection result corresponding to the content to be detected.
In a second aspect, an embodiment of the present invention further provides a content type detection device, comprising:
a content feature extraction unit, configured to perform feature extraction on content to be detected;
a content category detection unit, configured to perform, according to the feature extraction result, category detection on the content to be detected using at least two classifiers adapted to the content to be detected;
a content detection result determining unit, configured to determine, according to the category detection results obtained by the at least two classifiers, a final category detection result corresponding to the content to be detected.
In the technical solution provided by the embodiments of the present invention, classifiers examine the features of the content to be detected, so the category of the content is identified automatically. Compared with manual detection, this greatly reduces the manpower and material resources required, shortens detection time, and reduces detection cost. In addition, because the final category detection result is determined from the category detection results of multiple classifiers, the correctness of the result is better ensured and detection accuracy is improved.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a content type detection method provided by Embodiment one of the present invention;
Fig. 2 is a schematic flowchart of a content type detection method provided by Embodiment two of the present invention;
Fig. 3 is a schematic flowchart of a content type detection method provided by Embodiment three of the present invention;
Fig. 4 is a schematic structural diagram of a content type detection device provided by Embodiment four of the present invention;
Fig. 5 is a schematic structural diagram of a content type detection device provided by Embodiment five of the present invention;
Fig. 6 is a schematic flowchart of a preferred content type detection method provided by Embodiment six of the present invention.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment one:
Fig. 1 is a schematic flowchart of a content type detection method provided by Embodiment one of the present invention. This embodiment is applicable to performing category detection on content to be detected. The method can be performed by a category detection device, which is implemented in software and/or hardware. Referring to Fig. 1, the content type detection method provided by this embodiment includes the following operations:
Operation 110: perform feature extraction on the content to be detected.
In this embodiment, the content to be detected may be content in text and/or picture format that is stored locally in advance or obtained from another device in real time. For example, the content to be detected is web page content in text and/or picture format obtained by parsing a HyperText Markup Language (HTML) page fetched from a server on the Internet.
For content in text format, feature extraction can be performed with a text feature extraction algorithm such as the chi-square statistic, document frequency, information gain, mutual information, or cross entropy. For content in picture format, target objects in the picture are first identified, and a feature vector of the picture content is then built from the recognition result. The feature vector may include elements such as the area, number, and position of the target objects and the proportion of the whole picture region that they occupy.
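As an illustration of the chi-square option named above, the following minimal Python sketch scores a term against a category from a 2x2 document-count table and keeps the highest-scoring terms as features. The function names, the table layout, and the top-k selection are assumptions made for illustration; the patent text does not specify them.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a category from a 2x2 count table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A degenerate table carries no evidence of association.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def select_features(term_tables, k):
    """Return the k terms with the highest chi-square score (hypothetical helper).

    term_tables maps each term to its (a, b, c, d) tuple.
    """
    ranked = sorted(term_tables, key=lambda t: chi_square(*term_tables[t]), reverse=True)
    return ranked[:k]
```

A term that appears in every document of the category and in none outside it receives the highest score, while a term spread evenly across both groups scores zero.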
Operation 120: according to the feature extraction result, perform category detection on the content to be detected using at least two classifiers adapted to the content to be detected.
In this embodiment, at least two classifiers adapted to the content to be detected are created in advance, and each classifier can independently detect the category to which the content belongs. Specifically, each classifier can detect at least one category: for example, whether the content to be detected belongs to a target category or not, or which of several target categories it belongs to.
Each classifier can be created as follows: train on the large number of samples stored in a sample library, and obtain the classification model of that classifier from the training result. The classification model is part of the classifier, and its input and output are the input and output of the classifier. The samples stored in the sample library must include one group of samples whose category is the target category and another group whose category is not the target category. Training includes performing feature extraction on the samples, and the feature extraction algorithm must be the same as the one applied to the content to be detected.
After feature extraction is performed on the content to be detected, the feature extraction result can be used as the input of the classification model of each of the at least two classifiers. Each classification model processes the feature extraction result separately, generates a category detection result corresponding to the content to be detected, and outputs it.
In the embodiments of the present invention, the at least two classifiers adapted to the content to be detected may include at least two of the following: a support vector machine (SVM) classifier, a naive Bayes classifier, a k-nearest neighbor (KNN) classifier, a decision tree (ID3, Iterative Dichotomiser 3) classifier, and a logistic regression classifier.
Operation 130: according to the category detection results obtained by the at least two classifiers, determine the final category detection result corresponding to the content to be detected.
After the different classifiers have each detected the category of the content to be detected, the resulting category detection results can be processed according to a set rule to determine the final category detection result. One such process is: count, among all the category detection results obtained, the number of occurrences of each distinct result, and take the result with the largest count as the final category detection result. For example, suppose five classifiers are used and their results are, in order: belongs to the target category, does not belong, belongs, does not belong, belongs. Then three results say that the content belongs to the target category and two say that it does not, so the final category detection result is that the content to be detected belongs to the target category.
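The counting rule described above amounts to a simple majority vote. The sketch below is one way to express it in Python; the function name and the string labels are illustrative, and the handling of ties (the result seen first wins) is an assumption, since the text does not say how ties are resolved.

```python
from collections import Counter


def majority_vote(results):
    """Return the category detection result that occurs most often.

    results: one category detection result per classifier.
    Ties go to the result encountered first (an assumption, not from the source).
    """
    counts = Counter(results)
    label, _count = counts.most_common(1)[0]
    return label


# The five-classifier example from the text: three votes for, two against.
votes = ["target", "non-target", "target", "non-target", "target"]
final_result = majority_vote(votes)
```

With the five results from the example, `final_result` is `"target"`.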
Of course, other processing is also possible, and this embodiment does not limit it. For example, different values can be assigned in advance to different category detection results: 1 for "belongs to the 1st target category", 2 for "belongs to the 2nd target category", and 0 for "belongs to neither". The values corresponding to all category detection results are then combined in a weighted calculation to obtain a new value, and the final category detection result is determined from the new value. The weight applied to the value of any category detection result can be a weight assigned in advance to the classifier that produced that result.
In the technical solution provided by this embodiment, classifiers examine the features of the content to be detected, so the category of the content is identified automatically. Compared with manual detection, this greatly reduces manpower and material resources, shortens detection time, and reduces detection cost. In addition, because the final category detection result is determined from the category detection results of multiple classifiers, the correctness of the result is better ensured and detection accuracy is improved.
Embodiment two:
Fig. 2 is a schematic flowchart of a content type detection method provided by Embodiment two of the present invention. On the basis of Embodiment one, this embodiment adds an operation of obtaining the content to be detected and, based on it, further refines operation 110. Referring to Fig. 2, the content type detection method provided by this embodiment includes the following operations:
Operation 210: obtain web page content according to a URL, as the content to be detected;
Operation 220: if the web page content includes text content, perform feature extraction on the text content with a text feature extraction algorithm, and add the feature extraction result to the feature set of the web page content;
Operation 230: if the web page content includes picture content, perform target feature recognition on the picture content, build a feature vector of the picture content from the target feature recognition result, and add it to the feature set of the web page content;
Operation 240: according to the feature set of the web page content, perform category detection on the web page content using at least two classifiers adapted to the web page content;
Operation 250: according to the category detection results obtained by the at least two classifiers, determine the final category detection result corresponding to the web page content.
In this embodiment, a resource acquisition request can be sent to the corresponding server based on a pre-stored URL. The HTML page that the server returns in response is received and parsed to extract the text content and picture content it contains, which constitute the acquired web page content, that is, the content to be detected.
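The parsing step can be sketched with Python's standard `html.parser` module. The example below takes an HTML string that has already been fetched (for instance with `urllib.request.urlopen`) and separates visible text from picture URLs. The class and function names are hypothetical, and skipping `<script>` and `<style>` bodies is a design choice of this sketch rather than something the patent text requires.

```python
from html.parser import HTMLParser


class PageContentParser(HTMLParser):
    """Collects visible text fragments and image URLs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.image_urls = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not script or style source.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract_page_content(html):
    """Return (text content, list of picture URLs) for an HTML string."""
    parser = PageContentParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.image_urls
```

The text half of the result would feed operation 220 and the picture URLs would be downloaded and passed to operation 230.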
The text feature extraction algorithm can be the chi-square statistic, document frequency, information gain, mutual information, cross entropy, or the like. The target feature is tied to the category that the classifiers adapted to the content are meant to detect. For example, if the classifiers are to detect whether web page content belongs to the pornographic content category, the target feature can be skin color.
If the web page content includes both text content and picture content, the feature extraction result of the text content and the feature vector of the picture content can be used together as the feature extraction result of the content to be detected for the subsequent category detection. Alternatively, to save detection cost and time, the main content of the content to be detected can be determined first, that is, whether it is mainly text or mainly pictures, and only the feature extraction result of that main content is then used for the subsequent category detection.
For the specific application scenario of detecting whether web page content belongs to the pornographic content category, in one implementation of this embodiment the text feature extraction algorithm is preferably the chi-square algorithm, and performing target feature recognition on the picture content and building the feature vector of the picture content from the recognition result includes:
performing skin color detection on the picture content using a statistical histogram model;
building the feature vector of the picture content from the skin color detection result, where the feature vector is formed from at least one of the following elements:
the number of connected skin-color regions, the ratio of the skin-color area to the whole picture area, the ratio of the skin-color area to the skin-color bounding rectangle, the ratio of the largest connected skin-color region to the whole picture area, the ratio of the largest connected skin-color region to the skin-color bounding rectangle, and the skin-color ratio of the picture's central region.
In this implementation, skin color detection on the picture content identifies the skin-color regions in the picture, including their number, size, position, and shape, from which any element of the above vector can be determined. The skin-color ratio of the picture's central region is the proportion of a set central region of the picture that is occupied by skin-color regions.
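To make the six elements concrete, the sketch below computes them from a binary skin mask (rows of 0/1 values) that is assumed to come from an earlier skin color detection step. Several details are assumptions of this sketch and are not stated in the text: regions are 4-connected, the bounding rectangle encloses all skin pixels, and the central region is the middle half of the picture in each dimension.

```python
from collections import deque


def skin_feature_vector(mask):
    """Build the six-element feature vector from a binary skin mask."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    region_sizes, skin_pixels = [], []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                # Breadth-first search over one 4-connected skin region.
                size, queue = 0, deque([(y, x)])
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    size += 1
                    skin_pixels.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                region_sizes.append(size)
    if not region_sizes:
        return [0, 0.0, 0.0, 0.0, 0.0, 0.0]
    skin_area = len(skin_pixels)
    ys = [p[0] for p in skin_pixels]
    xs = [p[1] for p in skin_pixels]
    # Bounding rectangle of all skin pixels (assumed definition).
    rect_area = (max(ys) - min(ys) + 1) * (max(xs) - min(xs) + 1)
    largest = max(region_sizes)
    # Central region: middle half of the picture in each dimension (assumed).
    y0, y1, x0, x1 = h // 4, h - h // 4, w // 4, w - w // 4
    center_area = (y1 - y0) * (x1 - x0)
    center_skin = sum(mask[y][x] for y in range(y0, y1) for x in range(x0, x1))
    return [
        len(region_sizes),          # number of connected skin regions
        skin_area / (h * w),        # skin area / whole picture
        skin_area / rect_area,      # skin area / bounding rectangle
        largest / (h * w),          # largest region / whole picture
        largest / rect_area,        # largest region / bounding rectangle
        center_skin / center_area,  # skin ratio of the central region
    ]
```

A real system would derive the mask from pixel colors first; this sketch covers only the step from detected skin regions to the feature vector.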
The technical solution of this embodiment examines the features of web page content with different classifiers and thereby identifies the category of the web page content automatically. In particular, it can automatically pick out content in the pornographic category from a large volume of web page content. Compared with manual detection, this embodiment greatly reduces the manpower and material resources required, shortens detection time, and reduces detection cost.
Embodiment three:
Fig. 3 is a schematic flowchart of a content type detection method provided by Embodiment three of the present invention. On the basis of the above embodiments, this embodiment further refines the operation "determine, according to the category detection results obtained by the at least two classifiers, the final category detection result corresponding to the web page content", and adds operations for optimizing the classifiers and their voting weights. Referring to Fig. 3, the content type detection method provided by this embodiment includes the following operations:
Operation 310: perform feature extraction on the content to be detected;
Operation 320: according to the feature extraction result, perform category detection on the content to be detected using at least two classifiers adapted to the content to be detected;
Operation 330: determine the final category detection result corresponding to the content to be detected according to the result of the following formula:
[Formula not reproduced in this text. Judging from the definitions that follow, r is obtained by comparing the weighted vote, the sum of w_i · m_i for i = 1 to n, with the threshold σ.]
Where i is an integer; n is the total number of the at least two classifiers; m_i is the category detection result of the i-th classifier, with value 1 or 0, where 0 means that the category of the content to be detected is not the target category and 1 means that it is the target category; w_i is the voting weight of the i-th classifier; σ is a set threshold; r = 1 means that the final category detection result of the content to be detected is the target category, and r = 0 means that it is not the target category.
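A minimal Python sketch of a weighted vote built from these definitions is given below. Because the formula itself is not reproduced in this text, the comparison used here (the weighted sum must be greater than or equal to σ) is an assumption, as are the function and parameter names.

```python
def weighted_vote(results, weights, threshold):
    """Combine per-classifier results m_i (each 0 or 1) into a final result r.

    results:   m_i, the 0/1 category detection result of each classifier
    weights:   w_i, the voting weight of each classifier
    threshold: sigma, the set threshold
    Returns 1 (target category) when the weighted sum reaches the threshold,
    otherwise 0. Using ">=" rather than ">" is an assumption of this sketch.
    """
    if len(results) != len(weights):
        raise ValueError("results and weights must have the same length")
    score = sum(w * m for w, m in zip(weights, results))
    return 1 if score >= threshold else 0
```

For example, with weights 0.5, 0.3, 0.2 and a threshold of 0.6, the first and third classifiers voting 1 give a score of 0.7 and a final result of 1, whereas a lone vote from the second classifier gives 0.3 and a final result of 0.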
In this embodiment, the initial values of the voting weights can be set in advance, and the voting weights of all classifiers sum to 1. For example, the initial voting weights can all be equal, or they can be set according to the detection accuracy of the different classifiers: the higher a classifier's detection accuracy, the larger the voting weight assigned to it.
The technical solution provided by this embodiment determines the final category detection result by weighting the votes of the category detection results obtained by multiple classifiers. This improves the accuracy with which the category of the content to be detected is identified, so that the category detection result is closer to the actual category of the content.
A pre-designed classifier cannot guarantee that its category detection results are always correct, because the samples stored in its sample library are finite. Therefore, on the basis of the above technical solution, the classifiers and their voting can be further optimized to improve the accuracy of the final category detection result.
Specifically, after determining the final category detection result corresponding to the content to be detected according to the category detection results obtained by the at least two classifiers, the method further includes:
comparing the obtained final category detection result with the category detection results obtained by the at least two classifiers to judge whether each classifier produced a correct category detection result, and storing the comparison results; specifically, the final category detection result can be compared with the result obtained by each classifier in turn;
every set first period, calculating the detection rate of each of the at least two classifiers from the stored comparison results, where the detection rate of the i-th classifier is the ratio of the number of correct category detection results it produced in the current first period to the number of all category detection results it produced in the current first period.
For example, in the last seven days a classifier performed 50 category detection operations, and according to the stored comparison results it produced a correct category detection result in 30 of them and a wrong one in the other 20. Its detection rate for the last seven days is therefore 30/50 = 0.6.
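The calculation in this example can be sketched as follows. The representation of the stored comparison results as a list of booleans (True when the classifier's result matched the final result) and the return value of 0.0 for a period with no detections are assumptions of this sketch.

```python
def detection_rate(comparisons):
    """Share of correct category detection results within one period.

    comparisons: booleans, True when the classifier's result matched the
    final category detection result (assumed storage format).
    """
    if not comparisons:
        return 0.0  # no detections in the period; the value is an assumption
    return sum(comparisons) / len(comparisons)


# The example from the text: 30 correct and 20 wrong results in seven days.
last_seven_days = [True] * 30 + [False] * 20
rate = detection_rate(last_seven_days)
```

Here `rate` is 0.6, matching the 30/50 worked out in the text.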
In the embodiments of the present invention, on the one hand, the voting weight of each classifier can be updated based on its detection rate. Specifically, after the detection rates of the at least two classifiers are calculated, the method may further include updating the voting weight of each classifier according to the following formula:
[Formula not reproduced in this text.]
Where a_i is the detection rate of the i-th classifier from this calculation, and w_i' is the voting weight of the i-th classifier after this update.
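Since the update formula is not reproduced here, the following Python sketch shows only one plausible scheme that is consistent with the surrounding text: each new weight is the classifier's rate divided by the sum of all rates, which keeps the weights summing to 1. This normalization is an assumption for illustration and should not be read as the patent's formula; the function name and the equal-weight fallback are likewise hypothetical.

```python
def update_weights(rates):
    """Derive new voting weights w_i' from per-classifier rates a_i.

    rates: mapping from classifier name to its rate for the period.
    Assumed scheme: w_i' = a_i / sum of all a_j, so the weights sum to 1.
    """
    total = sum(rates.values())
    if total == 0:
        # Every rate is zero: fall back to equal weights (an assumption).
        return {name: 1 / len(rates) for name in rates}
    return {name: a / total for name, a in rates.items()}


new_weights = update_weights({"svm": 0.6, "bayes": 0.9, "knn": 0.5})
```

Under this scheme a classifier that was right more often in the last period receives a proportionally larger say in the next weighted vote.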
In the embodiments of the present invention, on the other hand, pre-designed classifiers among the at least two classifiers can also be eliminated based on their detection rates. Specifically, the category detection method provided by the embodiments of the present invention may further include:
removing any classifier whose detection rate has been below an elimination threshold in each of N consecutive first periods, so as to redetermine the classifiers adapted to the content to be detected, where N is an integer greater than 1.
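The elimination rule can be sketched as a filter over each classifier's per-period rate history. In the sketch below, the names are illustrative, the history is assumed to be stored oldest first, and "N consecutive periods" is interpreted as the N most recent periods; a classifier with fewer than N recorded periods is kept.

```python
def surviving_classifiers(history, threshold, n):
    """Return the names of classifiers that are not eliminated.

    history:   mapping from classifier name to its per-period rates, oldest first
    threshold: the elimination threshold
    n:         number of consecutive periods (N > 1 in the text)
    A classifier is removed only if each of its last n rates is below the
    threshold (interpreting "consecutive" as "most recent" is an assumption).
    """
    survivors = []
    for name, rates in history.items():
        recent = rates[-n:]
        eliminated = len(recent) == n and all(r < threshold for r in recent)
        if not eliminated:
            survivors.append(name)
    return survivors


history = {
    "svm": [0.8, 0.7, 0.9],
    "knn": [0.6, 0.4, 0.3],
    "id3": [0.3, 0.6, 0.4],
}
kept = surviving_classifiers(history, threshold=0.5, n=2)
```

With these figures `kept` is `["svm", "id3"]`: only `knn` stayed below 0.5 in both of the last two periods.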
Of course, classifiers among the at least two pre-designed classifiers can also be eliminated in other ways. For example, calculate the average detection rate of each classifier over N consecutive first periods; if the lowest average detection rate is below the elimination threshold, eliminate the corresponding classifier.
It should be noted that after a classifier is eliminated, the voting weights of the remaining classifiers must be redetermined so that they again sum to 1. The new voting weights can be automatically generated equal values, or they can be redetermined from the detection rates of the remaining classifiers, following the weight update process described above, which is not repeated here.
On the basis of the above technical solution, each of the at least two classifiers includes a sample library storing initial samples and a classification model, obtained by training on the sample library, for performing category detection on the content to be detected;
after comparing the obtained final category detection result corresponding to the content to be detected with the category detection results obtained by the at least two classifiers, the method further includes: if a classifier among the at least two classifiers produced a wrong category detection result, adding the content to be detected, as a feedback sample, to the sample library of that classifier;
every set second period, retraining once the sample library of each classifier that produced a wrong category detection result within the current second period, and correcting that classifier's classification model according to the training result, so as to update the classifier.
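The feedback loop described above can be sketched as a small wrapper around a classifier. Everything named here is hypothetical: `train` stands for whatever training routine the classifier uses (it takes the sample library and returns a callable model), and labeling each feedback sample with the final category detection result is an assumption, since the text only says that the content is added as a feedback sample.

```python
class FeedbackClassifier:
    """A classifier with a sample library that is retrained after mistakes."""

    def __init__(self, train, initial_samples):
        self._train = train                   # hypothetical: samples -> model callable
        self.samples = list(initial_samples)  # sample library of (features, label)
        self.model = train(self.samples)
        self._erred_this_period = False

    def predict(self, features):
        return self.model(features)

    def report(self, features, own_result, final_result):
        """Record the comparison between this classifier's result and the final one."""
        if own_result != final_result:
            # Wrong result: store the content as a feedback sample, labeled
            # with the final result (the labeling is an assumption).
            self.samples.append((features, final_result))
            self._erred_this_period = True

    def end_of_period(self):
        """Retrain once if any wrong result occurred in the period just ended."""
        if not self._erred_this_period:
            return False
        self.model = self._train(self.samples)
        self._erred_this_period = False
        return True
```

Calling `end_of_period()` on a schedule, for example every seven days, corresponds to the second period in the text; classifiers that made no mistakes in the period keep their existing model.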
Preferably, the second period can be seven days.
Through self-correction of the sample libraries and voting weights, the embodiments of the present invention overcome the problem that classifiers detect the category of content inaccurately when few samples are available, thereby improving detection accuracy.
Example IV:
Fig. 4 show a kind of structural representation of content type detection means of the offer of the embodiment of the present invention four, this implementation Example is applicable to treat the situation that detection content carries out classification detection.Referring to Fig. 4, the concrete structure of the content type detection means It is as follows:
Content Feature Extraction unit 410, feature extraction is carried out for treating detection content;
Content type detection unit 420, for according to feature extraction result, using what is be adapted with the content to be detected At least two graders, classification detection is carried out to the content to be detected;
Content detection result determining unit 430, for the classification testing result obtained according at least two grader, It is determined that the final classification testing result corresponding to the content to be detected.
In a kind of preferred embodiment of the present embodiment, described device also includes:
Contents acquiring unit 400, the Content Feature Extraction unit 410 treat detection content carry out feature extraction it Before, for obtaining web page contents according to URL, as content to be detected;
The Content Feature Extraction unit 410, including:
Text character extraction subelement 4101, if for including content of text in the web page contents, based on text Feature extraction algorithm carries out feature extraction to the content of text, and feature extraction result is added to the feature set of web page contents Close;
Picture feature extracts subelement 4102, if for including image content in the web page contents, to the figure Piece content carries out target signature identification, and the characteristic vector of the image content is established according to target signature recognition result, is added to The characteristic set of the web page contents.
Further, the Text character extraction algorithm is card side's algorithm;
The picture feature extracts subelement 4102, is specifically used for:
Face Detection is carried out to the image content using statistic histogram model;
The characteristic vector of the image content is established according to Face Detection result, wherein the characteristic vector is by following member At least one formed vector in element:
Colour of skin connected region number, area of skin color account for the ratio of whole picture region, area of skin color accounts for colour of skin boundary rectangle Ratio, maximum colour of skin connected region accounts for the ratio of whole picture region, maximum colour of skin connected region accounts for colour of skin boundary rectangle Ratio and center picture region colour of skin ratio.
In the embodiments of the present invention, the at least two classifiers include at least two of the following classifiers:
Support vector machine classifier, naive Bayes classifier, K-nearest-neighbor distance classifier, decision tree classifier and logistic regression classifier.
The above product can perform the methods provided by embodiment one and embodiment two of the present invention, and has the functional modules and beneficial effects corresponding to the performed methods. For technical details not described in detail in this embodiment, refer to embodiment one and embodiment two.
Embodiment five:
Fig. 5 is a schematic structural diagram of a content category detection device provided by embodiment five of the present invention. On the basis of embodiment four above, this embodiment further optimizes the structure of the content detection result determining unit, and correspondingly adds units for optimizing the classifiers and their voting weights. Referring to Fig. 5, the specific structure of the content category detection device provided by this embodiment is as follows:
Content feature extraction unit 510, configured to perform feature extraction on the content to be detected;
Content category detection unit 520, configured to perform, according to the feature extraction result, category detection on the content to be detected using at least two classifiers adapted to the content to be detected;
Content detection result determining unit 530, configured to determine, according to the category detection results obtained by the at least two classifiers, the final category detection result corresponding to the content to be detected.
Further, the content detection result determining unit 530 is specifically configured to:
Determine the final category detection result corresponding to the content to be detected according to the result of a formula that compares the weighted sum $\sum_{i=1}^{n} w_i m_i$ with the given threshold $\sigma$ to obtain $r \in \{0, 1\}$ (the formula itself is not reproduced in this text; claim 1 recites the comparison in words):
Wherein $i$ is an integer; $n$ is the total number of the at least two classifiers; $m_i$ is the category detection result of the $i$-th classifier among the at least two classifiers, taking the value 1 or 0, where 0 indicates that the category of the content to be detected is a non-target category and 1 indicates that the category of the content to be detected is the target category; $w_i$ is the voting weight of the $i$-th classifier; $\sigma$ is the given threshold; $r = 1$ indicates that the final category detection result of the content to be detected is the target category, and $r = 0$ indicates that the final category detection result of the content to be detected is not the target category.
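As a minimal sketch of this weighted vote, the Python function below computes the weighted sum of the 0/1 classifier results and compares it with the threshold. Because the formula is not reproduced in this text, the direction of the comparison is an assumption: the default follows the wording of claim 1 (a weighted sum less than or equal to the threshold gives r = 1), and the hypothetical flag `target_if_at_most` switches to the opposite reading. The function name and the flag are illustrative only.

```python
def weighted_vote(votes, weights, sigma, target_if_at_most=True):
    """Combine 0/1 votes m_i with voting weights w_i and compare with sigma.

    target_if_at_most=True follows the wording of claim 1 (sum <= sigma
    gives r = 1); pass False for the opposite reading (sum >= sigma).
    Returns the pair (weighted_sum, r).
    """
    if len(votes) != len(weights):
        raise ValueError("one weight is needed per classifier vote")
    if any(m not in (0, 1) for m in votes):
        raise ValueError("votes must be 0 (non-target) or 1 (target)")

    weighted_sum = sum(w * m for w, m in zip(weights, votes))
    if target_if_at_most:
        r = 1 if weighted_sum <= sigma else 0
    else:
        r = 1 if weighted_sum >= sigma else 0
    return weighted_sum, r


# Three classifiers, two of which report the target category.
total, result = weighted_vote([1, 0, 1], [0.5, 0.3, 0.2], sigma=0.6)
```

Keeping the comparison direction behind a flag makes the ambiguity explicit instead of hiding it inside the arithmetic.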
Further, on the basis of the above technical solution, the device further includes:
Classifier detection result judging unit 540, configured to, after the content detection result determining unit 530 determines the final category detection result corresponding to the content to be detected according to the category detection results obtained by the at least two classifiers, compare the obtained final category detection result with the category detection results obtained by the at least two classifiers, so as to judge whether each of the at least two classifiers produced a correct category detection result, and store the comparison results;
Classifier recall rate computing unit 550, configured to compute, once every set first period, the recall rate of each of the at least two classifiers according to the comparison results stored by the classifier detection result judging unit 540, wherein the recall rate of the i-th classifier among the at least two classifiers is: the ratio of the number of correct category detection results produced by the i-th classifier within the current first period to the number of all category detection results produced by the i-th classifier within the current first period.
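The per-classifier rate defined above can be sketched in Python as follows. A classifier's result counts as correct when it equals the final result for the same content, as in the comparison performed by unit 540. The record layout (a list of pairs of final result and per-classifier results) and the name `recall_rates` are assumptions for illustration; returning 0.0 for a classifier with no results in the period is also an assumption.

```python
def recall_rates(records, n_classifiers):
    """Compute correct/total for each classifier over one first period.

    records: list of (final_result, per_classifier_results) collected
    during the current period; per_classifier_results[i] is the 0/1
    result of the i-th classifier for that content.
    """
    correct = [0] * n_classifiers
    total = [0] * n_classifiers
    for final_result, per_classifier in records:
        for i, m in enumerate(per_classifier):
            total[i] += 1
            if m == final_result:
                correct[i] += 1
    # Assumed fallback: a classifier with no results gets rate 0.0.
    return [c / t if t else 0.0 for c, t in zip(correct, total)]


period_records = [
    (1, [1, 0, 1]),
    (0, [0, 0, 1]),
    (1, [1, 1, 1]),
    (0, [0, 1, 1]),
]
rates = recall_rates(period_records, 3)
```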
Further, the device further includes:
Classifier voting weight updating unit 560, configured to, after the classifier recall rate computing unit 550 has computed the recall rates of the at least two classifiers, update the voting weights of the at least two classifiers according to the following formula:
$$w_i' = \frac{a_i}{\sum_{i=1}^{n} a_i}$$
Wherein $a_i$ is the recall rate of the $i$-th classifier computed this time, and $w_i'$ is the voting weight of the $i$-th classifier after this update.
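The update recited in claims 5 and 12 sets each new voting weight to the classifier's recall rate divided by the sum of all recall rates, so the weights always add up to 1. A minimal Python sketch follows; the function name and the fallback to equal weights when every rate is zero are assumptions, since the text does not address that case.

```python
def update_weights(rates):
    """Return new voting weights w_i' = a_i / sum(a) for recall rates a_i."""
    total = sum(rates)
    if total == 0:
        # Assumed fallback (not in the source): equal weights.
        return [1.0 / len(rates)] * len(rates)
    return [a / total for a in rates]


new_weights = update_weights([1.0, 0.5, 0.5])
```

A classifier whose recall rate rises relative to the others therefore gains influence in the next round of voting without any weight needing manual tuning.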
Further, the device further includes:
Classifier elimination unit 570, configured to remove any of the at least two classifiers whose recall rate has been below the elimination threshold in each of N consecutive first periods, so as to redetermine the classifiers adapted to the content to be detected, wherein N is an integer greater than 1.
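The elimination rule can be sketched in Python as a filter over each classifier's history of recall rates. The dictionary layout, the classifier names and the function name are hypothetical; keeping a classifier that has fewer than N recorded periods is an assumption, since the text only covers classifiers observed for N consecutive periods.

```python
def eliminate_classifiers(history, threshold, n_periods):
    """Drop classifiers whose rate was below threshold in the last N periods.

    history maps a classifier name to its recall rates, oldest first,
    one value per first period. Returns the surviving entries.
    """
    if n_periods <= 1:
        raise ValueError("N must be an integer greater than 1")
    kept = {}
    for name, rates in history.items():
        recent = rates[-n_periods:]
        # Assumed: fewer than N observed periods means "keep for now".
        below = len(recent) == n_periods and all(a < threshold for a in recent)
        if not below:
            kept[name] = rates
    return kept


history = {
    "svm": [0.90, 0.80, 0.85],
    "knn": [0.40, 0.30, 0.20],
    "id3": [0.30, 0.90, 0.40],
}
survivors = eliminate_classifiers(history, threshold=0.5, n_periods=3)
```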
On the basis of the above technical solution, each of the at least two classifiers includes a sample library storing initial samples, and a classification model, obtained by training on the sample library, for performing category detection on the content to be detected;
The device further includes:
Feedback sample adding unit 580, configured to, after the classifier detection result judging unit 540 has compared the obtained final category detection result corresponding to the content to be detected with the category detection results obtained by the at least two classifiers, if a classifier among the at least two classifiers produced a wrong category detection result, add the content to be detected, as a feedback sample, to the sample library of the classifier that produced the wrong category detection result;
Classifier correcting unit 590, configured to train, once every set second period, the sample library of each classifier that produced a wrong category detection result within the current second period, and to correct the classification model of that classifier according to this training result, so as to update the classifier that produced the wrong category detection result.
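The interplay of the feedback sample adding unit and the classifier correcting unit can be sketched in Python as below. The class `FeedbackClassifier`, the helper `record_feedback` and the `train_fn` callback are hypothetical names; labelling a feedback sample with the final category detection result is an assumption, as the text only says that the content is added as a feedback sample.

```python
class FeedbackClassifier:
    """A classifier wrapper holding a sample library and a trained model."""

    def __init__(self, name, initial_samples, train_fn):
        self.name = name
        self.samples = list(initial_samples)   # the sample library
        self.train_fn = train_fn               # hypothetical trainer
        self.model = train_fn(self.samples)
        self.had_error = False                 # wrong result this period?

    def add_feedback(self, content, final_label):
        # Assumed: the feedback sample is labelled with the final result.
        self.samples.append((content, final_label))
        self.had_error = True

    def retrain_if_needed(self):
        """Called once per second period; retrains only after errors."""
        if not self.had_error:
            return False
        self.model = self.train_fn(self.samples)
        self.had_error = False
        return True


def record_feedback(classifiers, content, final_label, results):
    """Add content to the library of every classifier that disagreed."""
    for clf, m in zip(classifiers, results):
        if m != final_label:
            clf.add_feedback(content, final_label)
```

In use, `record_feedback` would run after every final decision, and `retrain_if_needed` would be triggered by a scheduler at the end of each second period, so that only classifiers that actually erred pay the cost of retraining.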
The above product can perform the methods provided by embodiment one, embodiment two and embodiment three of the present invention, and has the functional modules and beneficial effects corresponding to the performed methods. For technical details not described in detail in this embodiment, refer to embodiment one, embodiment two and embodiment three.
Embodiment six:
Fig. 6 is a schematic flowchart of a preferred content category detection method provided by embodiment six of the present invention. This embodiment provides a preferred example based on all of the above embodiments. Referring to Fig. 6, the content category detection method provided by this embodiment specifically includes the following operations:
Operation 610: detect whether the main content of the web page content is text content or picture content.
Operation 620: perform feature extraction on the main content included in the web page content.
Operation 630: according to the feature extraction result, perform category detection on the web page content using the SVM, Bayes, KNN, ID3 and Logistic classifiers adapted to the web page content, respectively.
Each of these classifiers includes a sample library storing initial samples, and a classification model, obtained by training on the sample library, for performing category detection on the web page content.
For the SVM classifier, the advantages are: by converting categorical inputs into numerical inputs, the support vector machine can support categorical data and numerical data at the same time; it is suitable for large-scale data.
For the Bayes classifier, the advantages are: it is fast when training on and querying large amounts of sample data; it supports incremental training; and what the classifier has actually learned is relatively easy to interpret.
For the KNN classifier, the advantages are: it can use complex functions for numerical prediction while remaining easy to understand; its data scaling is reasonable; and new sample data can be added at any time without retraining.
For the ID3 classifier, the advantages are: a trained model is easy to interpret, and the algorithm places the most important decision factors close to the root of the tree; it can handle categorical data and numerical data at the same time; it easily handles interactions between variables; and it is suitable for small-scale data.
For the Logistic classifier, the advantages are: it is good at analyzing linear relationships, and is better than a decision tree at analyzing the overall structure of the data.
Operation 640: perform weighted voting on the category detection results obtained by each classifier.
Operation 650: obtain the final category detection result corresponding to the web page content according to the weighted voting result.
Operation 660: monitor the recall rate of each classifier.
Specifically, the obtained final category detection result is compared with the category detection result obtained by each classifier, to judge whether each classifier produced a correct category detection result, and the comparison results are stored;
Every seven days, the recall rate of each classifier is computed once according to the stored comparison results.
Operation 670: update the voting weight of each classifier according to the monitored recall rates.
Operation 680: update the sample libraries of the classifiers.
Specifically, if any of the above five classifiers produces a wrong category detection result, the corresponding web page content is added, as a feedback sample, to the sample library of the classifier that produced the wrong category detection result. Alternatively, only the URL of the web page content may be added to the sample library of that classifier. In that case, before the sample library is subsequently trained, the HTML page is fetched from the server in real time based on the added URL and parsed to generate the web page content, which then serves as the feedback sample in the sample library.
Operation 690: perform elimination and training management on the classifiers.
Specifically, eliminating classifiers includes:
Removing any of the above five classifiers whose recall rate has remained below the elimination threshold throughout one continuous month, so as to redetermine the classifiers adapted to the web page content.
Training management of the classifiers includes:
Every seven days, training once the sample library of each classifier that produced a wrong category detection result within the preceding seven days, and correcting the classification model of that classifier according to this training result, so as to update the classifier that produced the wrong category detection result. In this way the classification deviation of the classifiers can be corrected periodically, which improves the accuracy of their category detection.
The technical solution provided by this embodiment has the following beneficial effects:
First, the category of website content can be detected automatically, which saves manpower and material resources and gives high detection efficiency;
Second, multiple machine learning algorithms are trained into multiple classifiers, each classifier produces one predicted category detection result, and the multiple predictions are combined by weighted voting into one final result, so that the category detection of website content is more reliable and well-founded;
Third, through self-correction of the samples and the voting weight of each classifier, the problem that a general machine learning model detects categories inaccurately when the sample size is small can be overcome;
Fourth, the correctness of the category detection result predicted by each classifier is recorded each time, the accuracy of the classifiers is assessed periodically on that basis, and classifiers whose accuracy remains low are eliminated, so that the accuracy of website content category detection is higher.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments and substitutions can be made without departing from the protection scope of the invention. Therefore, although the invention has been described in detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the inventive concept; the scope of the invention is determined by the scope of the appended claims.

Claims (14)

  1. A content category detection method, characterized by comprising:
    performing feature extraction on content to be detected;
    according to the feature extraction result, performing category detection on the content to be detected using at least two classifiers adapted to the content to be detected;
    quantizing the category detection result obtained by each classifier, and determining the voting weight of each classifier, wherein a classifier with higher detection accuracy has a larger corresponding voting weight;
    calculating the weighted sum of the quantized values and the voting weights of all the classifiers;
    if the weighted sum is less than or equal to a given threshold, the final category detection result of the content to be detected is the target category; if the weighted sum is greater than the given threshold, the final category detection result of the content to be detected is not the target category;
    comparing the obtained final category detection result corresponding to the content to be detected with the category detection results obtained by the at least two classifiers, so as to judge whether each of the at least two classifiers produced a correct category detection result, and storing the comparison results;
    once every set first period, calculating the recall rate of each of the at least two classifiers according to the stored comparison results, wherein the recall rate of the i-th classifier among the at least two classifiers is: the ratio of the number of correct category detection results produced by the i-th classifier within the current first period to the number of all category detection results produced by the i-th classifier within the current first period;
    according to the recall rate of each classifier, updating the voting weight of each classifier or eliminating the corresponding classifier among the at least two classifiers.
  2. The category detection method according to claim 1, characterized in that, before performing feature extraction on the content to be detected, the method further comprises: acquiring web page content according to a URL as the content to be detected;
    performing feature extraction on the content to be detected comprises:
    if the web page content includes text content, performing feature extraction on the text content based on a text feature extraction algorithm, and adding the feature extraction result to the feature set of the web page content;
    if the web page content includes picture content, performing target feature recognition on the picture content, establishing a feature vector of the picture content according to the target feature recognition result, and adding it to the feature set of the web page content.
  3. The category detection method according to claim 2, characterized in that the text feature extraction algorithm is the chi-square algorithm;
    performing target feature recognition on the picture content and establishing the feature vector of the picture content according to the target feature recognition result comprises:
    performing skin-color detection on the picture content using a statistical histogram model;
    establishing the feature vector of the picture content according to the skin-color detection result, wherein the feature vector is a vector formed by at least one of the following elements:
    the number of skin-color connected regions, the ratio of the skin-color area to the whole picture area, the ratio of the skin-color area to the skin-color bounding rectangle, the ratio of the largest skin-color connected region to the whole picture area, the ratio of the largest skin-color connected region to the skin-color bounding rectangle, and the skin-color ratio of the central region of the picture.
  4. The category detection method according to any one of claims 1-3, characterized in that the at least two classifiers include at least two of the following classifiers:
    support vector machine classifier, naive Bayes classifier, K-nearest-neighbor distance classifier, decision tree classifier and logistic regression classifier.
  5. The category detection method according to claim 1, characterized in that, after the recall rates of the at least two classifiers have been calculated, the method further comprises: updating the voting weights of the at least two classifiers according to the following formula:
    $$w_i' = \frac{a_i}{\sum_{i=1}^{n} a_i}$$
    wherein $a_i$ is the recall rate of the $i$-th classifier calculated this time, and $w_i'$ is the voting weight of the $i$-th classifier after this update.
  6. The category detection method according to claim 1, characterized by further comprising:
    removing any of the at least two classifiers whose recall rate has been below the elimination threshold in each of N consecutive first periods, so as to redetermine the classifiers adapted to the content to be detected, wherein N is an integer greater than 1.
  7. The category detection method according to claim 1, characterized in that each of the at least two classifiers includes a sample library storing initial samples, and a classification model, obtained by training on the sample library, for performing category detection on the content to be detected;
    after comparing the obtained final category detection result corresponding to the content to be detected with the category detection results obtained by the at least two classifiers, the method further comprises: if a classifier among the at least two classifiers produced a wrong category detection result, adding the content to be detected, as a feedback sample, to the sample library of the classifier that produced the wrong category detection result;
    once every set second period, training the sample library of each classifier that produced a wrong category detection result within the current second period, and correcting the classification model of that classifier according to this training result, so as to update the classifier that produced the wrong category detection result.
  8. A content category detection device, characterized by comprising:
    a content feature extraction unit, configured to perform feature extraction on content to be detected;
    a content category detection unit, configured to perform, according to the feature extraction result, category detection on the content to be detected using at least two classifiers adapted to the content to be detected;
    a content detection result determining unit, configured to:
    quantize the category detection result obtained by each classifier, and determine the voting weight of each classifier, wherein a classifier with higher detection accuracy has a larger corresponding voting weight;
    calculate the weighted sum of the quantized values and the voting weights of all the classifiers;
    if the weighted sum is less than or equal to a given threshold, the final category detection result of the content to be detected is the target category; if the weighted sum is greater than the given threshold, the final category detection result of the content to be detected is not the target category;
    a classifier detection result judging unit, configured to, after the content detection result determining unit determines the final category detection result corresponding to the content to be detected according to the category detection results obtained by the at least two classifiers, compare the obtained final category detection result with the category detection results obtained by the at least two classifiers, so as to judge whether each of the at least two classifiers produced a correct category detection result, and store the comparison results;
    a classifier recall rate computing unit, configured to calculate, once every set first period, the recall rate of each of the at least two classifiers according to the comparison results stored by the classifier detection result judging unit, wherein the recall rate of the i-th classifier among the at least two classifiers is: the ratio of the number of correct category detection results produced by the i-th classifier within the current first period to the number of all category detection results produced by the i-th classifier within the current first period; and, according to the recall rate of each classifier, to update the voting weight of each classifier or eliminate the corresponding classifier among the at least two classifiers.
  9. The category detection device according to claim 8, characterized by further comprising:
    a content acquiring unit, configured to acquire web page content according to a URL as the content to be detected, before the content feature extraction unit performs feature extraction on the content to be detected;
    the content feature extraction unit comprises:
    a text feature extraction subunit, configured to, if the web page content includes text content, perform feature extraction on the text content based on a text feature extraction algorithm, and add the feature extraction result to the feature set of the web page content;
    a picture feature extraction subunit, configured to, if the web page content includes picture content, perform target feature recognition on the picture content, establish a feature vector of the picture content according to the target feature recognition result, and add it to the feature set of the web page content.
  10. The category detection device according to claim 9, characterized in that the text feature extraction algorithm is the chi-square algorithm;
    the picture feature extraction subunit is specifically configured to:
    perform skin-color detection on the picture content using a statistical histogram model;
    establish the feature vector of the picture content according to the skin-color detection result, wherein the feature vector is a vector formed by at least one of the following elements:
    the number of skin-color connected regions, the ratio of the skin-color area to the whole picture area, the ratio of the skin-color area to the skin-color bounding rectangle, the ratio of the largest skin-color connected region to the whole picture area, the ratio of the largest skin-color connected region to the skin-color bounding rectangle, and the skin-color ratio of the central region of the picture.
  11. The category detection device according to any one of claims 8-10, characterized in that the at least two classifiers include at least two of the following classifiers:
    support vector machine classifier, naive Bayes classifier, K-nearest-neighbor distance classifier, decision tree classifier and logistic regression classifier.
  12. The category detection device according to claim 8, characterized by further comprising:
    a classifier voting weight updating unit, configured to, after the classifier recall rate computing unit has calculated the recall rates of the at least two classifiers, update the voting weights of the at least two classifiers according to the following formula:
    $$w_i' = \frac{a_i}{\sum_{i=1}^{n} a_i}$$
    wherein $a_i$ is the recall rate of the $i$-th classifier calculated this time, and $w_i'$ is the voting weight of the $i$-th classifier after this update.
  13. The category detection device according to claim 8, characterized by further comprising:
    a classifier elimination unit, configured to remove any of the at least two classifiers whose recall rate has been below the elimination threshold in each of N consecutive first periods, so as to redetermine the classifiers adapted to the content to be detected, wherein N is an integer greater than 1.
  14. The category detection device according to claim 8, characterized in that each of the at least two classifiers includes a sample library storing initial samples, and a classification model, obtained by training on the sample library, for performing category detection on the content to be detected;
    the device further comprises:
    a feedback sample adding unit, configured to, after the classifier detection result judging unit has compared the obtained final category detection result corresponding to the content to be detected with the category detection results obtained by the at least two classifiers, if a classifier among the at least two classifiers produced a wrong category detection result, add the content to be detected, as a feedback sample, to the sample library of the classifier that produced the wrong category detection result;
    a classifier correcting unit, configured to train, once every set second period, the sample library of each classifier that produced a wrong category detection result within the current second period, and to correct the classification model of that classifier according to this training result, so as to update the classifier that produced the wrong category detection result.
CN201410569492.6A 2014-10-22 2014-10-22 content type detection method and device Active CN104391860B (en)

Publications (2)

Publication Number Publication Date
CN104391860A CN104391860A (en) 2015-03-04
CN104391860B true CN104391860B (en) 2018-03-02


Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104951802A (en) * 2015-06-17 2015-09-30 中国科学院自动化研究所 Classifier updating method
CN104965905B (en) 2015-06-30 2018-05-04 北京奇虎科技有限公司 A kind of method and apparatus of Web page classifying
JP5901828B1 (en) * 2015-08-20 2016-04-13 株式会社Cygames Information processing system, program, and server
CN105426354B (en) * 2015-10-29 2019-03-22 杭州九言科技股份有限公司 The fusion method and device of a kind of vector
CN105426356B (en) * 2015-10-29 2019-05-21 杭州九言科技股份有限公司 A kind of target information recognition methods and device
CN106649384B (en) * 2015-11-03 2019-07-09 中国电信股份有限公司 The method and apparatus classified to URL
CN106980623B (en) * 2016-01-18 2020-02-21 华为技术有限公司 Data model determination method and device
CN107193836B (en) * 2016-03-15 2021-08-10 腾讯科技(深圳)有限公司 Identification method and device
CN107730286A (en) * 2016-08-10 2018-02-23 中国移动通信集团黑龙江有限公司 A kind of target customer's screening technique and device
CN106383766B (en) 2016-09-09 2018-09-11 北京百度网讯科技有限公司 System monitoring method and apparatus
CN107995152B (en) * 2016-10-27 2020-07-03 腾讯科技(深圳)有限公司 Malicious access detection method and device and detection server
CN108804472A (en) * 2017-05-04 2018-11-13 腾讯科技(深圳)有限公司 A kind of webpage content extraction method, device and server
CN107766234A (en) * 2017-08-31 2018-03-06 广州数沃信息科技有限公司 A kind of assessment method, the apparatus and system of the webpage health degree based on mobile device
CN107801090A (en) * 2017-11-03 2018-03-13 北京奇虎科技有限公司 Utilize the method, apparatus and computing device of audio-frequency information detection anomalous video file
CN107895119A (en) * 2017-12-28 2018-04-10 北京奇虎科技有限公司 Program installation packet inspection method, device and electronic equipment
CN108304483B (en) * 2017-12-29 2021-01-19 东软集团股份有限公司 Webpage classification method, device and equipment
CN108509794A (en) * 2018-03-09 2018-09-07 中山大学 A kind of malicious web pages defence detection method based on classification learning algorithm
CN108932502A (en) * 2018-07-13 2018-12-04 希蓝科技(北京)有限公司 A kind of electrocardiogram template classification model modification system and method for self study
CN110875874B (en) * 2018-09-03 2022-06-07 Oppo广东移动通信有限公司 Electronic red packet detection method and device and mobile terminal
CN109344884B (en) * 2018-09-14 2023-09-12 深圳市雅阅科技有限公司 Media information classification method, method and device for training picture classification model
CN111612492A (en) * 2019-02-26 2020-09-01 北京奇虎科技有限公司 User online accurate marketing method and device based on multi-feature fusion
CN111695083B (en) * 2019-03-15 2024-09-20 北京京东尚科信息技术有限公司 Detection method and detection equipment
CN110363223A (en) * 2019-06-20 2019-10-22 华南理工大学 Industrial flow data processing method, detection method, system, device and medium
CN112347244B (en) * 2019-08-08 2023-07-25 四川大学 Yellow-based and gambling-based website detection method based on mixed feature analysis
CN110502552B (en) * 2019-08-20 2022-10-28 重庆大学 Classification data conversion method based on fine-tuning conditional probability
CN110852285B (en) * 2019-11-14 2023-04-18 腾讯科技(深圳)有限公司 Object detection method and device, computer equipment and storage medium
CN111310096A (en) * 2020-02-25 2020-06-19 维沃移动通信有限公司 Content saving method, electronic device, and computer-readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005279217A (en) * 2004-03-29 2005-10-13 Kohei Kadowaki Picture expressing device with body sensitive function
CN101055621A (en) * 2006-04-10 2007-10-17 Institute of Automation, Chinese Academy of Sciences Content-based sensitive web page identification method
CN101145171A (en) * 2007-09-15 2008-03-19 Hefei Institutes of Physical Science, Chinese Academy of Sciences Gene microarray data prediction method based on independent component ensemble learning
CN101251851A (en) * 2008-02-29 2008-08-27 Jilin University Multi-classifier integration method based on incremental naive Bayes network
CN101281521A (en) * 2007-04-05 2008-10-08 Institute of Automation, Chinese Academy of Sciences Method and system for filtering sensitive web pages based on multi-classifier fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
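A multi-feature fusion classification method with adaptive weights; Zhang Wenbo et al.; Systems Engineering and Electronics; 20130630; main text pp. 1135, 1137 *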

Also Published As

Publication number Publication date
CN104391860A (en) 2015-03-04

Similar Documents

Publication Publication Date Title
CN104391860B (en) content type detection method and device
CN106599155B (en) Webpage classification method and system
CN110245229A (en) Deep learning topic sentiment classification method based on data augmentation
CN104239485A (en) Statistical machine learning-based internet hidden link detection method
CN105069141A (en) Construction method and construction system for stock standard news library
CN103577755A (en) Malicious script static detection method based on SVM (support vector machine)
CN110287316A (en) Alarm classification method, apparatus, electronic equipment and storage medium
CN106446931A (en) Feature extraction and classification method and system based on support vector data description
CN110310012B (en) Data analysis method, device, equipment and computer readable storage medium
CN107545038A (en) File classification method and equipment
CN113590764A (en) Training sample construction method and device, electronic equipment and storage medium
CN111048214A (en) Early warning method and device for spreading situation of foreign livestock and poultry epidemic diseases
CN110647995A (en) Rule training method, device, equipment and storage medium
CN104850617A (en) Short text processing method and apparatus
CN107368526A (en) Data processing method and device
CN108959293A (en) Text data classification method and server
CN104915680B (en) Multi-label transformation relationship prediction method based on improved RBF neural networks
CN113723747A (en) Analysis report generation method, electronic device and readable storage medium
CN112888008B (en) Base station abnormality detection method, device, equipment and storage medium
CN110808947B (en) Automatic vulnerability quantitative evaluation method and system
CN112307170A (en) Relation extraction model training method, relation extraction method, device and medium
CN103294828A (en) Verification method and verification device of data mining model dimension
CN116823164A (en) Business approval method, device, equipment and storage medium
CN104462215B (en) Time-series-based method for forecasting the citation count of scientific and technical literature
Soni et al. A novel optimized classifier for the loan repayment capability prediction system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20190809

Address after: 2/F, Baidu Building, No. 10 Shangdi 10th Street, Haidian District, Beijing 100085

Patentee after: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.

Address before: 100091 Beijing, Haidian District, Dongbeiwang West Road, No. 4, Zhongguancun Software Park, Building C, 1-03

Patentee before: Iyuntian Co., Ltd.