CN106682677A - Advertising identification rule induction method, device and equipment - Google Patents

Info

Publication number
CN106682677A
Authority
CN
China
Prior art keywords
advertisement
elements
recognition rule
test set
list
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510768446.3A
Other languages
Chinese (zh)
Inventor
周志明
丁俊玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou UCWeb Computer Technology Co Ltd
Guangzhou Dongjing Computer Technology Co Ltd
Original Assignee
Guangzhou Dongjing Computer Technology Co Ltd
Application filed by Guangzhou Dongjing Computer Technology Co Ltd
Priority to CN201510768446.3A
Publication of CN106682677A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Abstract

The invention discloses an advertisement recognition rule induction method, device and equipment. In the method, a training set is generated based on a first URL list; each element in the training set is labeled as an ad element or a non-ad element according to the result of manual identification and/or identification by advertisement identification software; an advertisement identification model is obtained by a machine learning algorithm, based on the advertisement identification features of each element in the training set and the label indicating whether it is an ad element; a test set is generated based on a second URL list; the ad elements in the test set are identified using the advertisement identification model, based on the advertisement identification features of each element in the test set; and an advertisement recognition rule is obtained by generalizing the uniform resource locators (URLs) of the ad elements in the test set. The new advertisement recognition rule can then be used to identify ad elements in a page, either on its own or in combination with manual labeling rules and/or the advertisement recognition rules of advertisement identification software.

Description

Advertisement recognition rule induction method, device and equipment
Technical field
The present invention relates to the field of Internet technology and, in particular, to an advertisement recognition rule induction method, device and equipment.
Background
With the spread and development of the Internet, more and more users have become accustomed to browsing web pages and obtaining information on terminal devices such as mobile phones and tablet computers. However, while users enjoy this convenience, web advertisements are also multiplying, for example banner advertisements, button advertisements, pop-up window advertisements, floating page advertisements and interstitial advertisements. For users who browse web pages on mobile terminals such as mobile phones, where display space is limited, these web advertisements not only interfere with obtaining information but also consume network traffic. How to filter advertisements in web pages effectively is therefore a problem the industry is working to solve.
The advertisement filtering methods in wide use today mainly rely on advertisement filtering software, such as AdBlock and Net Master. Such software can filter banner, pop-up, video and other forms of advertisement in web pages, and meets users' filtering needs to a certain extent.
However, the filtering rules of advertisement filtering software must be updated frequently to keep meeting users' needs. Considerable manpower is therefore required to maintain and update the software regularly so that it filters advertisements accurately.
Summary of the invention
One technical problem to be solved by the present invention is to provide an advertisement recognition rule induction method, device and equipment that can compile advertisement recognition rules automatically.
According to one aspect of the present invention, an advertisement recognition rule induction method is disclosed, comprising: generating a training set based on a first URL list, the training set including at least some of the elements in the web page corresponding to each URL in the first URL list, together with their advertisement identification features; labeling each element in the training set as an ad element or a non-ad element according to the result of manual identification and/or identification by advertisement identification software; obtaining an advertisement identification model by a machine learning algorithm, based on the advertisement identification features of each element in the training set and the label indicating whether it is an ad element; generating a test set based on a second URL list, the test set including at least some of the elements in the web page corresponding to each URL in the second URL list, together with their advertisement identification features; identifying the ad elements in the test set using the advertisement identification model, based on the advertisement identification features of each element in the test set; and generalizing the uniform resource locators (URLs) of the ad elements in the test set to obtain an advertisement recognition rule.
Thus, each element in the training set can first be labeled as an ad element or a non-ad element by manual labeling, by advertisement identification software, or by a combination of the two. An advertisement identification model can then be built by machine learning from these ad and non-ad elements and their corresponding advertisement identification features. The model is applied to the test set to identify the ad elements in it, and the URLs of the identified ad elements are generalized to obtain a new advertisement recognition rule. At this point, the new advertisement recognition rule can be used on its own to identify ad elements in a page, or it can be combined with the manual labeling rules and/or the advertisement recognition rules of the advertisement identification software, so that ad elements in web pages are identified accurately.
Preferably, the method may further include: presenting the elements in the test set that match the advertisement recognition rule; and screening the advertisement recognition rule according to manual judgment of the presented elements.
Thus, after the ad elements in the test set have been identified using the advertisement identification model, a manual screening step can be added to filter out wrongly labeled ad elements, so that the induced advertisement recognition rule is more accurate.
Preferably, the method may further include iteratively performing the following steps: re-identifying the elements in the training set according to the advertisement recognition rule, so as to relabel each element in the training set as an ad element or a non-ad element; obtaining an advertisement identification model by a machine learning algorithm, based on the advertisement identification features of each element in the training set and its new label indicating whether it is an ad element; generating a test set based on the second URL list, the test set including at least some of the elements in the web page corresponding to each URL in the second URL list, together with their advertisement identification features; identifying the ad elements in the test set using the advertisement identification model, based on the advertisement identification features of each element in the test set; generalizing the URLs of the ad elements in the test set to obtain an advertisement recognition rule; presenting the elements in the test set that match the advertisement recognition rule; and screening the advertisement recognition rule according to manual judgment of the presented elements.
Thus, once an advertisement recognition rule has been obtained, the elements in the training set can be relabeled according to that rule and the advertisement identification model rebuilt from the new labels. The rebuilt model is used to relabel the elements in the test set, and a new advertisement recognition rule is obtained from the new labels. The newly obtained rule is screened manually to reject clearly inappropriate entries, after which the above steps can be repeated. The union of the advertisement recognition rules obtained over multiple iterations can be taken as the final advertisement recognition rule. A rule obtained in this way can filter out most ad elements in a page with few misjudgments, so the filtering is effective.
Preferably, in the above method, the elements in the training set may include all ad elements identified by the advertisement identification software in the web page corresponding to each URL in the first URL list, and at least some non-ad elements; the elements in the test set may include all ad elements identified by the advertisement identification software in the web page corresponding to each URL in the second URL list, and at least some non-ad elements.
Thus, the elements in the training set and the test set include all ad elements identified by the advertisement identification software in the web pages corresponding to the URLs in the first and second URL lists respectively. As a result, the accuracy of the advertisement identification model built from the labels in the training set can be improved. Moreover, when the model is used to identify ad elements in the test set and the URLs of the identified ad elements are generalized, the test set contains more ad elements, so the induced advertisement recognition rules are more comprehensive and accurate. In addition, the advertisement identification model can be trained on both positive and negative samples; that is, the training set and the test set may each contain both ad elements and non-ad elements. This yields a more accurate advertisement identification model and thus makes the model more practical.
Preferably, in the above advertisement recognition rule induction method, the advertisement identification features may include: whether the source code contains a specific string combination, the number of times the element appears on websites in other domains, whether the element is strip-shaped, the positioning attribute in the cascading style sheet (CSS), the picture format, and the number of frames of an animated picture.
According to another aspect of the present invention, an advertisement recognition rule induction device is also disclosed, comprising: a training set generation module, configured to generate a training set based on a first URL list, the training set including at least some of the elements in the web page corresponding to each URL in the first URL list, together with their advertisement identification features; an element labeling module, configured to label each element in the training set as an ad element or a non-ad element according to the result of manual identification and/or identification by advertisement identification software; an advertisement identification model generation module, configured to obtain an advertisement identification model by a machine learning algorithm, based on the advertisement identification features of each element in the training set and the label indicating whether it is an ad element; a test set generation module, configured to generate a test set based on a second URL list, the test set including at least some of the elements in the web page corresponding to each URL in the second URL list, together with their advertisement identification features; an element identification module, configured to identify the ad elements in the test set using the advertisement identification model, based on the advertisement identification features of each element in the test set; and an induction module, configured to generalize the URLs of the ad elements in the test set to obtain an advertisement recognition rule.
Preferably, the device may further include: an element presentation module, configured to present the elements in the test set that match the advertisement recognition rule; and an advertisement recognition rule screening module, configured to screen the advertisement recognition rule according to manual judgment of the presented elements.
Preferably, in the above advertisement recognition rule induction device, the elements in the training set generated by the training set generation module include all ad elements identified by the advertisement identification software in the web page corresponding to each URL in the first URL list, and at least some non-ad elements; the elements in the test set generated by the test set generation module include all ad elements identified by the advertisement identification software in the web page corresponding to each URL in the second URL list, and at least some non-ad elements.
According to another aspect of the present invention, advertisement recognition rule induction equipment is also disclosed, comprising an input device, a network module, a memory, a display and a processor, wherein: the input device receives a first URL list and a second URL list entered by a user; the network module is used to access the web page corresponding to each URL in the first URL list and the second URL list; the processor generates a training set based on the web page data that the network module obtains for each URL in the first URL list, and stores the training set in the memory, the training set including at least some of the elements in the web page corresponding to each URL in the first URL list, together with their advertisement identification features; the processor labels each element in the training set as an ad element or a non-ad element according to the result of manual identification and/or identification by advertisement identification software, and stores the labels in the memory accordingly; the processor obtains an advertisement identification model by a machine learning algorithm, based on the advertisement identification features of each element in the training set and the label indicating whether it is an ad element; the processor generates a test set based on the web page data that the network module obtains for each URL in the second URL list, and stores the test set in the memory, the test set including at least some of the elements in the web page corresponding to each URL in the second URL list, together with their advertisement identification features; the processor identifies the ad elements in the test set using the advertisement identification model, based on the advertisement identification features of each element in the test set; and the processor generalizes the URLs of the ad elements in the test set to obtain an advertisement recognition rule, and stores the advertisement recognition rule in the memory.
Preferably, in the above advertisement recognition rule induction equipment, the elements in the test set that match the advertisement recognition rule are presented on the display, and the processor screens the advertisement recognition rule according to the judgment entered by the user through the input device.
In summary, the advertisement recognition rule induction method, device and equipment disclosed by the present invention can label the elements in the training set as ad elements or non-ad elements according to the advertisement recognition rules of advertisement identification software and/or manual rules, generate an advertisement identification model over the advertisement identification features from the labels of the elements in the training set, use that model to label each element in the test set as an ad element or a non-ad element, and finally generalize the URLs of the ad elements in the test set to obtain a new advertisement recognition rule. The new rule can supplement existing manual or software filtering rules to better identify ad elements in web pages. Because the final advertisement recognition rule combines advertisement identification features with existing advertisement recognition rules, it filters advertisements in a page more effectively, reduces the traffic consumed when loading the page, and improves the user's viewing experience.
Brief description of the drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following more detailed description of exemplary embodiments of the disclosure, taken in conjunction with the accompanying drawings, in which the same reference numerals generally denote the same parts.
Fig. 1 shows a schematic flowchart of the advertisement recognition rule induction method of the present invention.
Fig. 2 shows a schematic flowchart of an advertisement recognition rule induction method according to another embodiment of the present invention.
Fig. 3 shows a schematic flowchart of an advertisement recognition rule induction method according to another embodiment of the present invention.
Fig. 4 shows a schematic flowchart of a specific embodiment of the advertisement recognition rule induction method of the present invention.
Fig. 5 shows a schematic structural diagram of one embodiment of the advertisement recognition rule induction device of the present invention.
Fig. 6 shows a schematic structural block diagram of an advertisement recognition rule induction device according to another embodiment of the present invention.
Fig. 7 shows a schematic block diagram of the advertisement recognition rule induction equipment of the present invention.
Detailed description
Preferred embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show preferred embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Fig. 1 shows a schematic flowchart of the advertisement recognition rule induction method of the present invention.
The execution order shown in Fig. 1 is only intended to describe the present invention more clearly. It should be understood that step S140 may be interchanged in order with steps S110, S120 and S130: step S140 may be performed first, followed by steps S110, S120 and S130, or they may be performed at the same time. The order of execution does not affect the present invention.
In step S110, a training set is generated based on the first URL list. The training set includes at least some of the elements in the web page corresponding to each URL in the first URL list, together with their advertisement identification features.
The first URL list may consist of a number of randomly selected URLs, or of a set of fairly typical URLs, for example the URLs of the pages ranked highest by page views (PV). The web page corresponding to each URL in the first URL list contains multiple elements. For each web page in the first URL list, some of its elements and their corresponding advertisement identification features can be selected for inclusion in the training set.
An advertisement identification feature may be any feature that ad elements commonly have. For example, an element in a page may be considered an ad element when its source code contains certain specific strings or string combinations; if the source code of an element contains "ad" or "ads", the element may be considered an ad element, and "ad" or "ads" is then an advertisement identification feature. As another example, an element that appears many times on websites in other domains may be considered an ad element, so the number of times an element appears on other-domain websites can also serve as an advertisement identification feature. As a further example, advertisements in web pages are often fixed at a certain position in the page, or move as the user scrolls the page, so the positioning attribute of an element in the cascading style sheet (CSS) can also serve as an advertisement identification feature; specifically, an element whose positioning attribute is absolute or fixed may be considered an ad element. In addition, whether the element is strip-shaped, the picture format, the number of frames of an animated picture and so on can all serve as advertisement identification features, and are not described further here.
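The features listed above could be turned into a numeric vector roughly as follows. This is only a minimal sketch: the `Element` record, its field names, the whole-token match for "ad"/"ads", the 4:1 ratio used for "strip-shaped" and the choice of GIF as the picture-format signal are all assumptions made for illustration and do not come from the patent text.

```python
import re
from dataclasses import dataclass


# Hypothetical element record; every field name here is illustrative.
@dataclass
class Element:
    url: str
    source: str              # source code of the element
    other_domain_count: int  # times the element was seen on other-domain sites
    width: int
    height: int
    css_position: str        # value of the CSS `position` property
    image_format: str        # e.g. "gif", "png", or "" if not a picture
    frame_count: int         # frames in an animated picture (1 if static)


# Assumption: match "ad"/"ads" only as a whole token, so "header" does not count.
AD_TOKEN = re.compile(r"(?<![a-z])ads?(?![a-z])")
STRIP_RATIO = 4  # assumption: "strip-shaped" means width >= 4 x height


def extract_features(el: Element) -> list:
    """Map one element to a numeric advertisement identification feature vector."""
    return [
        1.0 if AD_TOKEN.search(el.source.lower()) else 0.0,
        float(el.other_domain_count),
        1.0 if el.height > 0 and el.width >= STRIP_RATIO * el.height else 0.0,
        1.0 if el.css_position in ("absolute", "fixed") else 0.0,
        1.0 if el.image_format.lower() == "gif" else 0.0,
        float(el.frame_count),
    ]
```

Matching "ad" as a bare substring would also fire on words such as "header" or "load", which is why the sketch uses a token boundary; a real system would tune this choice against labeled data.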
In step S120, each element in the training set is labeled as an ad element or a non-ad element according to the result of manual identification and/or identification by advertisement identification software.
In other words, the elements in the training set can be labeled using existing advertisement recognition rules, such as manual rules and/or those of advertisement identification software, so that each element in the training set is labeled as an ad element or a non-ad element. Where the advertisement recognition rules of the advertisement identification software are sufficiently accurate, manual labeling of the training set may be omitted.
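As an illustration of step S120, the sketch below labels element URLs with a set of existing wildcard rules and lets optional manual labels take precedence. The function name, the wildcard rule format and the "manual overrides software" policy are assumptions; the text only says that manual and/or software results are used.

```python
from fnmatch import fnmatchcase


def label_elements(urls, software_rules, manual_labels=None):
    """Return {url: True/False}, where True means "ad element".

    software_rules: wildcard patterns standing in for the rules of some
                    advertisement identification software (hypothetical format).
    manual_labels:  optional {url: bool} from human annotators; when present,
                    it overrides the software result (an assumed policy).
    """
    manual_labels = manual_labels or {}
    labels = {}
    for url in urls:
        if url in manual_labels:
            labels[url] = manual_labels[url]
        else:
            labels[url] = any(fnmatchcase(url, rule) for rule in software_rules)
    return labels
```

Note that `fnmatchcase` also treats `?` and `[...]` as wildcards, so rules containing those characters literally would need escaping in a fuller implementation.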
In step S130, an advertisement identification model is obtained by a machine learning algorithm, based on the advertisement identification features of each element in the training set and the label indicating whether it is an ad element.
The training set contains, for each of a number of elements, its URL, its advertisement identification features and a label indicating whether it is an ad. Based on the training set, an advertisement identification model can therefore be obtained by a machine learning algorithm.
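The text does not name a particular machine learning algorithm. Purely as an example, the sketch below fits a logistic regression by stochastic gradient descent in plain Python; the function names, learning rate, epoch count and 0.5 threshold are all illustrative assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features, labels, epochs=500, lr=0.1):
    """Fit weights and bias on (feature vector, 0/1 label) pairs.

    Logistic regression is a stand-in here; any classifier could play
    the role of the advertisement identification model.
    """
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            err = p - y  # gradient of the log loss with respect to z
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict_is_ad(model, x, threshold=0.5):
    """Return True if the model scores the feature vector as an ad element."""
    w, b = model
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) >= threshold
```

In practice the count-valued features would usually be scaled before training, and a library implementation would replace this hand-written loop.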
The advertisement identification model represents the correspondence between advertisement identification features and ad elements; based on the model, it can be determined whether an element is an ad element.
In step S140, a test set is generated based on the second URL list. The test set includes at least some of the elements in the web page corresponding to each URL in the second URL list, together with their advertisement identification features.
Like the first, the second URL list may consist of a number of randomly selected URLs, or of a set of fairly typical URLs, for example the URLs of the pages ranked highest by page views (PV). The web page corresponding to each URL in the second URL list contains multiple elements. For each web page in the second URL list, some of its elements and their corresponding advertisement identification features can be selected for inclusion in the test set.
In step S150, the ad elements in the test set are identified using the advertisement identification model, based on the advertisement identification features of each element in the test set.
The advertisement identification model obtained in step S130 is used to label each element in the test set as an ad element or a non-ad element.
In step S160, the uniform resource locators (URLs) of the ad elements in the test set are generalized to obtain an advertisement recognition rule.
After the elements in the test set have been labeled using the advertisement identification model, the URLs of the elements labeled as ad elements can be generalized to obtain an advertisement recognition rule. There are various ways to generalize these URLs. For example, when the URLs of multiple elements are http://abc.com/ad/1.gif, http://abc.com/ad/2.gif and so on, they can be generalized into the advertisement recognition rule http://abc.com/ad/*.gif. As another example, when the URL of one of the elements labeled as an ad element is http://example.com/ads/banner123.gif, it can be generalized into the advertisement recognition rule http://example.com/ads/banner*.gif. Other ways of generalizing are of course possible, depending on the concrete form of the URLs labeled as ad elements, and are not described further here.
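One simple way to reproduce the two examples above is to replace every run of digits in the last path segment with a wildcard and then deduplicate the results. This is a sketch of just one possible induction heuristic, not the method the patent prescribes; the function names are hypothetical, and dropping the query string and fragment is an added assumption.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def generalize_url(url):
    """Replace digit runs in the last path segment with '*'.

    Query string and fragment are discarded (an assumption of this sketch).
    """
    parts = urlsplit(url)
    head, sep, last = parts.path.rpartition("/")
    last = re.sub(r"\d+", "*", last)
    return urlunsplit((parts.scheme, parts.netloc, head + sep + last, "", ""))


def induce_rules(ad_urls):
    """Collapse the URLs of identified ad elements into a sorted rule list."""
    return sorted({generalize_url(u) for u in ad_urls})
```

With this heuristic, http://abc.com/ad/1.gif and http://abc.com/ad/2.gif collapse into the single rule http://abc.com/ad/*.gif, and http://example.com/ads/banner123.gif becomes http://example.com/ads/banner*.gif.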
After the URLs of the elements labeled as ad elements in the test set have been generalized, the induced advertisement recognition rule can be merged by union with the advertisement recognition rules initially used for manual identification and/or identification by advertisement identification software. That is, the induced rule together with those initial rules is taken as the new advertisement recognition rule.
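The union described here, and the use of the merged rules to test a URL, might look like the following sketch. It assumes rules are wildcard patterns of the kind shown in the examples; the function names are hypothetical.

```python
from fnmatch import fnmatchcase


def merge_rules(existing_rules, induced_rules):
    """Order-preserving union of the initial rules and the induced rules."""
    return list(dict.fromkeys([*existing_rules, *induced_rules]))


def is_ad_url(url, rules):
    """Return True if the URL matches any wildcard rule in the merged set."""
    return any(fnmatchcase(url, rule) for rule in rules)
```

Keeping the existing rules first in the merged list is only a presentation choice; matching does not depend on rule order.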
In summary, the advertisement recognition rule induction method of the present invention can label the elements in the training set as ad elements or non-ad elements according to the advertisement recognition rules of advertisement identification software and/or manual rules, generate an advertisement identification model over the advertisement identification features from the labels of the elements in the training set, use that model to label each element in the test set as an ad element or a non-ad element, and finally generalize the URLs of the ad elements in the test set to obtain a new advertisement recognition rule. The new rule can supplement existing manual or software filtering rules to better identify ad elements in web pages.
Because the final advertisement recognition rule combines advertisement identification features with existing advertisement recognition rules, it filters advertisements in a page more effectively, reduces the traffic consumed when loading the page, and improves the user's viewing experience.
Fig. 2 shows a schematic flowchart of an advertisement recognition rule induction method according to another embodiment of the present invention. As shown in Fig. 2, in addition to the steps shown in Fig. 1, the method of this embodiment further includes steps S170 and S180.
In step S170, the elements in the test set that match the advertisement recognition rule are presented.
In step S180, the advertisement recognition rule is screened according to manual judgment of the presented elements.
Thus, after the ad elements in the test set have been identified using the advertisement identification model, a manual screening step can be added to filter out wrongly labeled ad elements, so that the induced advertisement recognition rule is more accurate. In addition, the induced advertisement recognition rules themselves can be screened manually to exclude filtering rules that are clearly unreasonable generalizations.
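Steps S170 and S180 can be pictured as grouping the test-set elements under each rule and then keeping only the rules a reviewer approves. In the sketch below the human judgment is represented by a `reviewer` callback; that callback, the function names and the wildcard rule format are assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def elements_by_rule(rules, test_urls):
    """S170: for each rule, collect the test-set URLs that match it."""
    return {rule: [u for u in test_urls if fnmatchcase(u, rule)] for rule in rules}


def screen_rules(rules, test_urls, reviewer):
    """S180: keep the rules whose matched elements the reviewer accepts.

    reviewer(rule, matched_urls) -> bool stands in for the manual judgment.
    """
    matched = elements_by_rule(rules, test_urls)
    return [rule for rule in rules if reviewer(rule, matched[rule])]
```

A rule such as http://abc.com/* that also matches a site logo would be shown to the reviewer alongside the logo URL, which makes an over-broad generalization easy to spot and reject.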
Fig. 3 shows a schematic flowchart of an advertisement recognition rule induction method according to another embodiment of the present invention.
As shown in Fig. 3, the advertisement recognition rule induction method of this embodiment includes all of steps S110 to S180 shown in Fig. 2. The difference is that, after steps S110 to S180 have been performed in sequence, the method further includes iteratively performing step S190 and steps S130 to S180.
In step S190, the elements in the training set are re-identified according to the advertisement recognition rule, so as to relabel each element in the training set as an ad element or a non-ad element.
The training set may be the one obtained in step S110, or a training set may be obtained anew; for the process of obtaining a training set, see the description of step S110 with reference to Fig. 1.
The advertisement recognition rule used for this re-identification may be the rule obtained by performing step S160, or the combination of the rule obtained in step S160 with the manual and/or advertisement identification software filtering rules used in step S120.
After step S190 has been performed, steps S130 to S180 are performed in sequence; for a detailed description of steps S130 to S180, see the descriptions of Fig. 1 and Fig. 2, which are not repeated here. It should be understood that step S140 may be performed before step S130, or steps S130 and S140 may be performed at the same time.
Step S190 and steps S130 to S180 are then performed iteratively. The number of repetitions can be set as the case requires.
In summary, once an advertisement recognition rule has been obtained, the elements in the training set can be relabeled according to that rule and the advertisement identification model rebuilt from the new labels. The rebuilt model is used to relabel the elements in the test set, and a new advertisement recognition rule is obtained from the new labels. The newly obtained rule is screened manually to reject clearly inappropriate entries, after which the above steps can be repeated. The union of the advertisement recognition rules obtained over multiple iterations can be taken as the final advertisement recognition rule, so that the final rule can filter out most ad elements in a page and the filtering is effective.
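The iteration over step S190 and steps S130 to S180 can be outlined as a loop that accumulates the union of the screened rules from every round. The sketch below only shows the control flow: each stage is passed in as a callable, and all parameter names and the fixed round count are hypothetical.

```python
def iterate_rule_induction(initial_rules, relabel, train, identify, induce, screen, rounds=3):
    """Repeat S190 and S130-S180, returning the union of all screened rules.

    relabel(rules)   -> training labels        (S190)
    train(labels)    -> identification model   (S130)
    identify(model)  -> ad URLs in test set    (S140-S150)
    induce(ad_urls)  -> candidate rules        (S160)
    screen(rules)    -> rules kept by review   (S170-S180)
    """
    all_rules = set()
    rules = list(initial_rules)
    for _ in range(rounds):
        labels = relabel(rules)
        model = train(labels)
        ad_urls = identify(model)
        rules = screen(induce(ad_urls))
        all_rules.update(rules)
    return sorted(all_rules)
```

Here each round relabels the training set with only the rules from the previous round; relabeling with those rules combined with the original software or manual rules is equally consistent with the description of step S190.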
After multiple iterations, the accuracy rate and recall rate of the advertisement identification model can be calculated from the results identified by the model and the results identified by the advertisement identification software, and whether to continue iterating can be decided from the calculated accuracy rate and recall rate.
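As a hedged sketch of this check, the function below compares the model's labels with the labels produced by the advertisement identification software, which is treated as the reference. Reading "accuracy rate" as precision is an assumption, and `precision_recall` is a hypothetical helper name.

```python
from typing import Sequence, Tuple


def precision_recall(
    predicted: Sequence[bool], reference: Sequence[bool]
) -> Tuple[float, float]:
    """Precision and recall of model labels against reference labels.

    True means "ad element". The reference is assumed to be the output of
    the advertisement identification software for the same elements.
    """
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)        # ads found by both
    fp = sum(1 for p, r in pairs if p and not r)    # model-only ads
    fn = sum(1 for p, r in pairs if not p and r)    # ads the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Iteration could then stop once both values exceed thresholds chosen for the application; the description does not specify such thresholds.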
In addition, preferably, the elements in the training set may include all ad elements identified by the advertisement identification software in the webpages corresponding to the URLs in the first list of websites, together with at least some non-ad elements. Correspondingly, the elements in the test set may include all ad elements identified by the advertisement identification software in the webpages corresponding to the URLs in the second list of websites, together with at least some non-ad elements.
Because the training set and the test set include all ad elements that the advertisement identification software identified in the webpages corresponding to the URLs in the first and second lists of websites, the accuracy of the advertisement identification model can be improved when the model is built from the labels in the training set. Moreover, when the model is used to identify ad elements in the test set and the uniform resource locators (URLs) of the elements identified as ads are induced into rules, the test set contains more ad elements, so the induced advertisement recognition rule can be more comprehensive and accurate.
Further, the training set and the test set may contain both ad elements and non-ad elements. Training on the positive and negative samples in the training set yields a more accurate advertisement identification model, which makes the model more practical.
Fig. 4 is a schematic flowchart of a specific embodiment of the advertisement recognition rule induction method of the present invention. Some details of this embodiment are essentially the same as in Fig. 1 and Fig. 2; for the common parts, refer to Fig. 1, Fig. 2 and the corresponding text, which are not described again in detail here.
As shown in Fig. 4, the user first enters a list of websites containing multiple URLs at the client. The URLs in the list may be those of webpages the user frequently browses, or those of the webpages ranked highest by page views (PV).
After the user has entered the list of websites, the server crawls the list (i.e. the URL list). For each webpage in the list, a portion of its elements can be randomly selected, together with their advertisement identification features, as the sample set. The advertisement identification features can be preset; specifically, the advertisements present in webpages can be analyzed manually and their characteristics summarized as advertisement identification features. For example, the advertisement identification features can be summarized in the following form:
Element types (table columns): JavaScript, Iframe, Picture, Flash
Features (table rows):
Container id/class contains "ad"
Occurs many times on foreign-domain (third-party) websites
Bar-shaped
Absolute/fixed positioning
Picture format
Number of GIF animation frames
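One way to hold these features for a single element is sketched below. The class name `ElementFeatures`, its field names and the numeric encoding are all assumptions made for illustration; the description only lists the features themselves.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ElementFeatures:
    """Advertisement identification features of one page element (hypothetical layout)."""

    url: str                  # URL the element is loaded from
    container_has_ad: bool    # container id/class contains "ad"
    foreign_site_count: int   # occurrences on foreign-domain websites
    is_bar_shaped: bool       # element has a bar (banner-like) shape
    is_abs_or_fixed: bool     # CSS position is absolute or fixed
    image_format: str         # e.g. "gif", "png"; empty for non-images
    gif_frames: int           # number of GIF animation frames, 0 if none

    def to_vector(self) -> List[float]:
        """Encode the features as numbers for a classifier."""
        return [
            float(self.container_has_ad),
            float(self.foreign_site_count),
            float(self.is_bar_shaped),
            float(self.is_abs_or_fixed),
            float(self.image_format.lower() == "gif"),
            float(self.gif_frames),
        ]
```

Reducing the picture format to a single "is GIF" flag is a simplification chosen here; a fuller encoding could use one flag per format.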
The resulting sample set can be divided into two parts, one used as the training set and the other as the test set. For the training set, existing advertisement identification software (such as Adblock Plus) can be used to label each element as an ad element or a non-ad element. Some of the elements in the labeled training set can then be relabeled manually so that the label information in the training set is more accurate.
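The division of the sample set can be sketched as a seeded random split. The function name `split_samples`, the default fraction and the use of a fixed seed are assumptions; the description does not state the proportion of the two parts.

```python
import random
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def split_samples(
    samples: Iterable[T], train_fraction: float = 0.5, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle the samples reproducibly and split them into (training, test)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Every sample ends up in exactly one of the two parts, so the test set stays unseen during training.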
The labeled training set contains, for each of a number of elements, its URL, the corresponding advertisement feature values, and a label indicating whether it is an advertisement. It can therefore be used to train the advertisement identification model.
Because the Python-based scikit-learn performs well on classification algorithms, this embodiment can select the logistic regression and decision tree algorithm models of scikit-learn as the basis of the advertisement identification model, and train the model on the training set to obtain the advertisement identification model.
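To show what training on labeled feature vectors involves, here is a dependency-free logistic regression fitted by batch gradient descent. It is a stand-in written for illustration, not the scikit-learn model the embodiment names (in practice scikit-learn's `LogisticRegression` or `DecisionTreeClassifier` would be the natural choice); the function names, learning rate and epoch count are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def train_logistic(
    X: Sequence[Sequence[float]], y: Sequence[int],
    lr: float = 0.5, epochs: int = 500,
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / m for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def is_ad(w: Sequence[float], b: float, x: Sequence[float]) -> bool:
    """Label an element as an ad when the predicted probability is at least 0.5."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z)) >= 0.5


# Toy data: [container id/class contains "ad", is bar-shaped]; label 1 = ad.
X_train = [[1, 1], [1, 0], [0, 1], [0, 0]]
y_train = [1, 1, 0, 0]
weights, bias = train_logistic(X_train, y_train)
```

The toy data are invented for the sketch; real training would use the feature values and labels stored in the labeled training set.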
The test set is labeled using the advertisement identification model, marking each element in the test set as an ad element or a non-ad element.
The URL list of the elements labeled as ad elements in the test set is obtained, and this URL list is induced into new advertisement recognition rules. The new rules can be verified manually, deleting those that would match non-advertisement content. The verified new filtering rules are then added to the advertisement recognition rule set, the training set and test set samples are relabeled according to the filtering rules in the rule set, and the above process is repeated iteratively to obtain more complete and accurate advertisement recognition rules.
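The induction step is described elsewhere in connection with step S160; purely as an illustration of what inducing rules from a URL list might look like, the sketch below groups ad URLs by host and keeps the hosts seen often enough. The grouping by host, the `min_support` threshold and the name `induce_rules` are assumptions, not the method claimed.

```python
from collections import Counter
from typing import Iterable, List
from urllib.parse import urlsplit


def induce_rules(ad_urls: Iterable[str], min_support: int = 2) -> List[str]:
    """Return hosts that serve at least `min_support` ad elements, sorted."""
    hosts = Counter()
    for url in ad_urls:
        host = urlsplit(url).netloc
        if host:  # skip URLs without a host part, e.g. relative paths
            hosts[host] += 1
    return sorted(host for host, count in hosts.items() if count >= min_support)
```

Candidate rules produced this way would still go through the manual verification described above before joining the rule set.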
In addition, after new filtering rules have been obtained, the list of websites can be reloaded with the new filtering rules applied, and the advertisement filtering effectiveness and false-positive rate checked using the identification model; alternatively, websites can be spot-checked manually for advertisement filtering effectiveness and false-positive rate.
The in-page advertisement recognition rule induction method of the present invention has been described above with reference to Figs. 1 to 4. The in-page advertisement recognition rule induction device and equipment of the present invention are described below with reference to Figs. 5 to 7. Many of the units of the device and equipment described below have the same functions as the corresponding steps described above with reference to Figs. 1 to 4. To avoid repetition, the description here focuses on the unit structures that the device and equipment may have; for details that are not repeated, refer to the corresponding description above.
Fig. 5 is a schematic structural diagram of one embodiment of the in-page advertisement recognition rule induction device of the present invention. As shown in Fig. 5, the device includes a training set generation module 110, an element labeling module 120, an advertisement identification model generation module 130, a test set generation module 140, an element identification module 150, and an induction module 160.
The training set generation module 110 is used to generate a training set based on the first list of websites; the training set includes at least some of the elements in the webpages corresponding to the URLs in the first list of websites, together with their advertisement identification features.
For how the first list of websites is obtained and for the concept of advertisement identification features, see the description of step S110 in Fig. 1.
The element labeling module 120 is used to label each element in the training set as an ad element or a non-ad element according to the results of manual identification and/or identification by advertisement identification software.
That is, the element labeling module 120 can label the elements in the training set using existing advertisement recognition rules, such as manual rules and/or those of advertisement identification software, marking each element as an ad element or a non-ad element. Where the advertisement recognition rules of the advertisement identification software are sufficiently accurate, the elements in the training set can be labeled without manual labeling.
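A hedged sketch of this labeling step follows. Matching a rule by plain substring is a deliberate simplification of how real advertisement identification software evaluates its filter rules, and `label_elements` with its optional `manual_labels` override is a hypothetical interface.

```python
from typing import Dict, Iterable, Optional


def label_elements(
    urls: Iterable[str],
    filter_rules: Iterable[str],
    manual_labels: Optional[Dict[str, bool]] = None,
) -> Dict[str, bool]:
    """Label each element URL as ad (True) or non-ad (False).

    A URL is labeled automatically as an ad if any rule string occurs in it;
    a manual label, when present, overrides the automatic one.
    """
    rules = list(filter_rules)
    manual = manual_labels or {}
    labels = {}
    for url in urls:
        automatic = any(rule in url for rule in rules)
        labels[url] = manual.get(url, automatic)
    return labels
```

Passing no manual labels corresponds to the case where the software rules are accurate enough on their own.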
The advertisement identification model generation module 130 is used to obtain an advertisement identification model by a machine learning algorithm, based on the advertisement identification features of each element in the training set and the label indicating whether it is an ad element.
The advertisement identification model generated by the advertisement identification model generation module 130 expresses the correspondence between advertisement identification features and ad elements, so whether an element is an ad element can be judged using the model.
The test set generation module 140 is used to generate a test set based on the second list of websites; the test set includes at least some of the elements in the webpages corresponding to the URLs in the second list of websites, together with their advertisement identification features.
For how the second list of websites is obtained and for the concept of advertisement identification features, see the description of step S140 in Fig. 1.
The element identification module 150 is used to identify the ad elements in the test set using the advertisement identification model, based on the advertisement identification features of each element in the test set.
That is, the advertisement identification model generated by the advertisement identification model generation module 130 is used to label each element in the test set as an ad element or a non-ad element.
The induction module 160 is used to induce the uniform resource locators (URLs) of the ad elements in the test set into an advertisement recognition rule.
For the specific manner of induction used by the induction module 160, see the description of step S160 in Fig. 1.
In summary, the device of the present invention can label the elements in the training set as ad elements or non-ad elements according to the advertisement recognition rules of advertisement identification software and/or manual rules, generate an advertisement identification model over the advertisement identification features from the labels of the elements in the training set, and then use that model to label each element in the test set as an ad element or a non-ad element. Finally, the URLs of the ad elements in the test set are induced into a new advertisement recognition rule, which can supplement existing manual filtering rules or software filtering rules so that ad elements in webpages are identified better.
In addition, preferably, the elements in the training set generated by the training set generation module 110 may include all ad elements identified by the advertisement identification software in the webpages corresponding to the URLs in the first list of websites, together with at least some non-ad elements; the elements in the test set generated by the test set generation module 140 may include all ad elements identified by the advertisement identification software in the webpages corresponding to the URLs in the second list of websites, together with at least some non-ad elements.
Because the training set and the test set include all ad elements that the advertisement identification software identified in the webpages corresponding to the URLs in the first and second lists of websites, the accuracy of the advertisement identification model can be improved when the model is built from the labels in the training set. Moreover, when the model is used to identify ad elements in the test set and the URLs of the elements identified as ads are induced into rules, the test set contains more ad elements, so the induced advertisement recognition rule can be more comprehensive and accurate.
Further, because the training set and the test set also contain some elements labeled as non-ad elements, the advertisement identification model can be trained on the positive and negative samples in the training set, so that the trained model is more accurate and more practical.
Fig. 6 is a schematic structural block diagram of an advertisement recognition rule induction device according to another embodiment of the present invention.
As shown in Fig. 6, in a preferred embodiment the device, in addition to all of the structures shown in Fig. 5, optionally includes an element presentation module 170 and an advertisement recognition rule screening module 180. Only the structures absent from Fig. 5 are introduced here; for the structures shared with Fig. 5, see the description of Fig. 5 above, which is not repeated.
The element presentation module 170 is used to present the elements in the test set that match the advertisement recognition rule. The advertisement recognition rule screening module 180 is used to screen the advertisement recognition rule according to manual judgment of the presented elements.
Thus, after the ad elements in the test set have been identified using the advertisement identification model, the advertisement recognition rule screening module 180 can also be used to filter out incorrectly labeled ad elements, so that the induced advertisement recognition rule is more accurate. In addition, the advertisement recognition rule screening module 180 can screen the induced advertisement recognition rules to exclude filtering rules that are clearly unreasonable.
In addition, after the advertisement recognition rule screening module 180 has screened the obtained advertisement recognition rules, the element labeling module 120, the advertisement identification model generation module 130, the element identification module 150 and the induction module 160 can repeat the relevant steps according to the screened advertisement recognition rules.
Specifically, after the advertisement recognition rule screening module 180 has screened the obtained advertisement recognition rules, the element labeling module 120 can relabel the training set generated by the training set generation module 110 according to the obtained rules; the advertisement identification model generation module 130 can regenerate the advertisement identification model from the labels that the element labeling module 120 assigned to the training set; the element identification module 150 can re-identify and relabel the elements in the test set using the regenerated model; and the induction module 160 can induce the advertisement recognition rule again from the new labels of the elements in the test set. The elements that match the new advertisement recognition rule are then presented by the element presentation module 170 to the advertisement recognition rule screening module 180 so that the new rule can be screened manually. This process can then be repeated; the number of repetitions can be set as circumstances require.
The advertisement recognition rules obtained over multiple iterations can be merged by taking their union to form the final advertisement recognition rule, so that the final rule can filter out most of the ad elements in a page and the filtering effect is significant.
Fig. 7 is a schematic block diagram of the advertisement recognition rule induction equipment of the present invention. As shown in Fig. 7, the equipment includes an input unit 3, a network module 4, a memory 2, a display 5, and a processor 1.
The input unit 3 receives the first list of websites and the second list of websites entered by the user, and the processor 1 obtains them through the input unit 3. The network module 4 is used to access the webpages corresponding to the URLs in the first and second lists of websites. The processor 1 generates a training set based on the webpage data that the network module 4 obtains from the URLs in the first list of websites, and stores the training set in the memory 2; the training set includes at least some of the elements in the webpages corresponding to the URLs in the first list of websites, together with their advertisement identification features.
The processor 1 labels each element in the training set as an ad element or a non-ad element according to the results of manual identification and/or identification by advertisement identification software, and stores the labeling results in the memory 2. By a machine learning algorithm, the processor 1 obtains an advertisement identification model based on the advertisement identification features of each element in the training set and the label indicating whether it is an ad element. The processor 1 generates a test set based on the webpage data that the network module 4 obtains from the URLs in the second list of websites, and stores the test set in the memory 2; the test set includes at least some of the elements in the webpages corresponding to the URLs in the second list of websites, together with their advertisement identification features. Based on the advertisement identification features of each element in the test set, the processor 1 identifies the ad elements in the test set using the advertisement identification model. The processor 1 induces the URLs of the ad elements in the test set into an advertisement recognition rule and stores the rule in the memory 2. The display 5 presents the elements in the test set that match the advertisement recognition rule, and the processor 1 screens the advertisement recognition rule according to the judgment that the user enters through the input unit 3.
In summary, with the advertisement recognition rule induction equipment of the present invention, once the user has entered a list of websites at the client, the server can process the list to obtain advertisement recognition rules. These rules can be imported into existing advertisement filtering software as a supplement to its own advertisement recognition rules, or stored at the client in the form of an executable program that identifies ad elements in webpages. After the server sends back the obtained advertisement recognition rules, the user can also screen them manually to exclude rules that are obviously wrong, further improving the accuracy of the filtering rules.
The advertisement recognition rule induction method, device and equipment according to the present invention have been described in detail above with reference to the accompanying drawings.
In addition, the method according to the present invention can also be implemented as a computer program comprising computer program code instructions for performing the steps defined in the above method of the present invention. Alternatively, the method can be implemented as a computer program product comprising a computer-readable medium on which is stored a computer program for performing the functions defined in the above method. Those skilled in the art will also appreciate that the various illustrative logical blocks, modules, circuits and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or a combination of both.
The flowcharts and block diagrams in the accompanying drawings show the possible architecture, functions and operation of systems and methods according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
Various embodiments of the present invention have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen to best explain the principles of the embodiments, their practical application, or their improvement over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. An advertisement recognition rule induction method, comprising:
Generating a training set based on a first list of websites, the training set including at least some of the elements in the webpages corresponding to the URLs in the first list of websites, together with their advertisement identification features;
Labeling each element in the training set as an ad element or a non-ad element according to the results of manual identification and/or identification by advertisement identification software;
Obtaining an advertisement identification model by a machine learning algorithm, based on the advertisement identification features of each element in the training set and the labeling result indicating whether it is an ad element;
Generating a test set based on a second list of websites, the test set including at least some of the elements in the webpages corresponding to the URLs in the second list of websites, together with their advertisement identification features;
Identifying the ad elements in the test set using the advertisement identification model, based on the advertisement identification features of each element in the test set;
Inducing the uniform resource locators (URLs) of the ad elements in the test set to obtain an advertisement recognition rule.
2. The advertisement recognition rule induction method according to claim 1, further comprising:
Presenting the elements in the test set that match the advertisement recognition rule;
Screening the advertisement recognition rule according to manual judgment of the presented elements.
3. The advertisement recognition rule induction method according to claim 2, further comprising iteratively performing the following steps:
Re-identifying the elements in the training set according to the advertisement recognition rule, so as to relabel the elements in the training set as ad elements or non-ad elements;
Obtaining an advertisement identification model by a machine learning algorithm, based on the advertisement identification features of each element in the training set and the relabeling result indicating whether it is an ad element;
Generating a test set based on the second list of websites, the test set including at least some of the elements in the webpages corresponding to the URLs in the second list of websites, together with their advertisement identification features;
Identifying the ad elements in the test set using the advertisement identification model, based on the advertisement identification features of each element in the test set;
Inducing the uniform resource locators (URLs) of the ad elements in the test set to obtain an advertisement recognition rule;
Presenting the elements in the test set that match the advertisement recognition rule;
Screening the advertisement recognition rule according to manual judgment of the presented elements.
4. The advertisement recognition rule induction method according to claim 3, wherein
The elements in the training set include all ad elements identified by advertisement identification software in the webpages corresponding to the URLs in the first list of websites, together with at least some non-ad elements;
The elements in the test set include all ad elements identified by advertisement identification software in the webpages corresponding to the URLs in the second list of websites, together with at least some non-ad elements.
5. The advertisement recognition rule induction method according to any one of claims 1 to 4, wherein
The advertisement identification features include one or a combination of: whether the source code contains a specific character string, the number of occurrences on foreign-domain websites, whether the element is bar-shaped, the positioning attribute in CSS, the picture format, and the number of frames of an animated picture.
6. An advertisement recognition rule induction device, comprising:
A training set generation module for generating a training set based on a first list of websites, the training set including at least some of the elements in the webpages corresponding to the URLs in the first list of websites, together with their advertisement identification features;
An element labeling module for labeling each element in the training set as an ad element or a non-ad element according to the results of manual identification and/or identification by advertisement identification software;
An advertisement identification model generation module for obtaining an advertisement identification model by a machine learning algorithm, based on the advertisement identification features of each element in the training set and the labeling result indicating whether it is an ad element;
A test set generation module for generating a test set based on a second list of websites, the test set including at least some of the elements in the webpages corresponding to the URLs in the second list of websites, together with their advertisement identification features;
An element identification module for identifying the ad elements in the test set using the advertisement identification model, based on the advertisement identification features of each element in the test set;
An induction module for inducing the uniform resource locators (URLs) of the ad elements in the test set to obtain an advertisement recognition rule.
7. The advertisement recognition rule induction device according to claim 6, further comprising:
An element presentation module for presenting the elements in the test set that match the advertisement recognition rule;
An advertisement recognition rule screening module for screening the advertisement recognition rule according to manual judgment of the presented elements.
8. The advertisement recognition rule induction device according to claim 6 or 7, wherein
The elements in the training set generated by the training set generation module include all ad elements identified by advertisement identification software in the webpages corresponding to the URLs in the first list of websites, together with at least some non-ad elements;
The elements in the test set generated by the test set generation module include all ad elements identified by advertisement identification software in the webpages corresponding to the URLs in the second list of websites, together with at least some non-ad elements.
9. Advertisement recognition rule induction equipment, comprising an input unit, a network module, a memory, a display and a processor, wherein
The input unit receives a first list of websites and a second list of websites entered by a user;
The network module is used to access the webpages corresponding to the URLs in the first list of websites and the second list of websites;
The processor generates a training set based on the webpage data that the network module obtains from the URLs in the first list of websites, and stores the training set in the memory, the training set including at least some of the elements in the webpages corresponding to the URLs in the first list of websites, together with their advertisement identification features;
The processor labels each element in the training set as an ad element or a non-ad element according to the results of manual identification and/or identification by advertisement identification software, and stores the labeling results in the memory;
The processor obtains an advertisement identification model by a machine learning algorithm, based on the advertisement identification features of each element in the training set and the labeling result indicating whether it is an ad element;
The processor generates a test set based on the webpage data that the network module obtains from the URLs in the second list of websites, and stores the test set in the memory, the test set including at least some of the elements in the webpages corresponding to the URLs in the second list of websites, together with their advertisement identification features;
The processor identifies the ad elements in the test set using the advertisement identification model, based on the advertisement identification features of each element in the test set;
The processor induces the uniform resource locators (URLs) of the ad elements in the test set to obtain an advertisement recognition rule, and stores the advertisement recognition rule in the memory.
10. The advertisement recognition rule induction equipment according to claim 9, wherein
The display presents the elements in the test set that match the advertisement recognition rule,
The processor screens the advertisement recognition rule according to the judgment result that the user enters through the input unit.
CN201510768446.3A 2015-11-11 2015-11-11 Advertising identification rule induction method, device and equipment Pending CN106682677A (en)


Publications (1)

Publication Number Publication Date
CN106682677A true CN106682677A (en) 2017-05-17

Family

ID=58865347


Country Status (1)

Country Link
CN (1) CN106682677A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101276417A (en) * 2008-04-17 2008-10-01 上海交通大学 Method for filtering internet cartoon medium rubbish information based on content
CN101526946A (en) * 2008-03-07 2009-09-09 鸿富锦精密工业(深圳)有限公司 Search system, web page browser, web page filter system and web page filter method thereof
CN101593200A (en) * 2009-06-19 2009-12-02 淮海工学院 Chinese Web page classification method based on the keyword frequency analysis
CN104239422A (en) * 2014-08-21 2014-12-24 小米科技有限责任公司 Advertisement identification method, advertisement identification device and electronic equipment
US20160306893A1 (en) * 2013-12-02 2016-10-20 Beijing Qihoo Technology Company Limited Url purification method and url purification apparatus

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108733764A (en) * 2018-04-16 2018-11-02 优视科技有限公司 Advertisement filter rule generating method based on machine learning and advertisement filtering system
CN108733764B (en) * 2018-04-16 2021-09-10 阿里巴巴(中国)有限公司 Advertisement filtering rule generation method based on machine learning and advertisement filtering system
CN112075068A (en) * 2018-05-03 2020-12-11 三星电子株式会社 Electronic device and operation method thereof
US11893063B2 (en) 2018-05-03 2024-02-06 Samsung Electronics Co., Ltd. Electronic device and operation method thereof
CN110110982A (en) * 2019-04-26 2019-08-09 特赞(上海)信息科技有限公司 Review method and device for creative materials
CN111914199A (en) * 2019-05-10 2020-11-10 腾讯科技(深圳)有限公司 Page element filtering method, device, equipment and storage medium
CN111914199B (en) * 2019-05-10 2024-04-12 腾讯科技(深圳)有限公司 Page element filtering method, device, equipment and storage medium
CN110704615A (en) * 2019-09-04 2020-01-17 北京航空航天大学 Internet financial non-explicit advertisement identification method and device
CN112988811A (en) * 2021-03-09 2021-06-18 重庆可兰达科技有限公司 Method, system, terminal and medium for detecting APP advertisement content compliance
CN112988811B (en) * 2021-03-09 2023-06-06 重庆可兰达科技有限公司 Method, system, terminal and medium for detecting APP advertisement content compliance

Similar Documents

Publication Publication Date Title
CN106682677A (en) Advertising identification rule induction method, device and equipment
CN108733764B (en) Advertisement filtering rule generation method based on machine learning and advertisement filtering system
CN106649610A (en) Image labeling method and apparatus
CN105306495B (en) User identification method and device
CN106503172A (en) Method and apparatus for recommending learning paths based on a knowledge graph
CN107608874A (en) Testing method and device
CN102419777B (en) System and method for filtering internet image advertisements
CN107341805A (en) Image foreground and background segmentation and network model training, image processing method and device
CN102650999B (en) Method and system for extracting object attribute value information from web pages
CN109299258A (en) Public opinion event detection method, device and equipment
CN108229523A (en) Image detection, neural network training method, device and electronic equipment
CN107731229A (en) Method and apparatus for identifying voice
CN107908959A (en) Site information detection method, device, electronic equipment and storage medium
CN103514279B (en) Sentence-level sentiment classification method and device
CN104765746A (en) Data processing method and device for mobile communication terminal browser
CN107643929A (en) Display method and device for an information display interface
CN107153716A (en) Webpage content extracting method and device
CN103491116A (en) Method and device for processing text-related structural data
CN108763313A (en) Online model training method, server and storage medium
CN107590236A (en) Big data acquisition method and system for construction enterprises
CN109977762A (en) Text positioning method and device, text recognition method and device
CN105956002A (en) Webpage classification method and device based on URL analysis
CN107291774A (en) Error sample identification method and device
CN108197337A (en) File classification method and device
CN107729931A (en) Picture scoring method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170517