CN105760545A - Configuration rule based website data search method - Google Patents

Configuration rule based website data search method

Publication number
CN105760545A
Authority
CN
China
Prior art keywords
rule
node
css
link
website
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610152001.7A
Other languages
Chinese (zh)
Inventor
赵海兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Zingrow Information Technology Co Ltd
Original Assignee
Hunan Zingrow Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Zingrow Information Technology Co Ltd
Priority to CN201610152001.7A
Publication of CN105760545A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of vertical search, in particular to a configuration rule based website data search method. The method is implemented by the following steps: configuring a portal rule, a link rule and a detail rule for a to-be-searched website; parsing the portal rule to obtain the website portal url, the link rule associated with the portal rule and the parameters used during website access; parsing the link rule associated with the portal rule to obtain a link rule grammar and the detail rule associated with the link rule; and parsing the detail rule associated with the link rule to obtain a detail rule grammar so as to acquire the content in a page. With this method, developers are freed from writing a crawler system and can collect data from a website simply by writing configuration rules for it. Writing website rules is simpler than writing a crawler system directly, and maintenance is very convenient, so development and maintenance costs can be greatly reduced for enterprises.

Description

Website data search method based on configurable rules
Technical field
The present invention relates to the field of vertical search, and specifically to a website data search method based on configurable rules.
Background
A vertical search engine is a specialized search engine for a particular industry. It is a subdivision and extension of the general search engine: it integrates one particular class of information in the web page library, extracts the required data from a targeted field, processes it, and returns it to the user in some form. Vertical search was proposed as a new search service model in response to the drawbacks of general search engines, such as excessive information volume, inaccurate queries and insufficient depth. It provides valuable information and related services for a specific field, a specific group of users or a specific need. Its characteristics are "specialized, precise and deep", with an industry flavor. Compared with the massive, unordered information of a general search engine, a vertical search engine is more focused, concrete and in-depth. At present, in the vertical search field, websites vary widely, so capturing data from different websites requires writing one (or several) crawler programs for each website, because no single program can cover all websites.
Summary of the invention
To address the above technical problem, the present invention provides a website data search method based on configurable rules. The method searches by parsing rules: once a set of collection rules has been configured for a website, the content of that website can be collected without modifying the search engine system. The search engine is thereby reused, which can greatly reduce development and maintenance costs for enterprises working in this field.
The technical solution adopted by the present invention is a website data search method based on configurable rules, carried out in the following steps:
(1) configure a portal rule, a link rule and a detail rule for the website to be searched;
(2) parse the portal rule to obtain the portal url of the website, the link rule associated with the portal rule, and the parameters used when accessing the website;
(3) parse the link rule associated with the portal rule to obtain a link rule grammar for parsing the pages of the website and the detail rule associated with the link rule;
(4) parse the detail rule associated with the link rule to obtain several detail rule grammars for collecting content on the pages of the website, and thereby collect the content on the page.
Preferably, the link rule has a structure of one or more layers. After the link rule is executed, the urls of the target links are obtained. If a target url is a detail url, the link rule uses a single-layer structure; if it is not, the link rule uses a multi-layer structure that is executed recursively.
Preferably, the link rule grammar includes a CSS match rule, a target node rule and a node attribute rule. The CSS match rule uses CSS selector syntax to select the required HTML nodes and yields a node list; the target node rule further matches within that node list and yields a set of nodes containing links; the node attribute rule matches the attributes of those link-containing nodes and yields a url list.
Preferably, the link rule grammar also includes a CSS pre-filter rule, which uses CSS selector syntax to select the unwanted nodes in the HTML.
Preferably, the link rule grammar also includes a CSS post-filter rule, which uses CSS selector syntax to select unwanted nodes and compares them with the nodes selected by the CSS match rule. If an unwanted node is identical to a selected node, the selected node is removed; if it is a child node of a selected node, the corresponding child node is deleted from the selected node.
Preferably, the link rule grammar also includes a regular expression rule, which further modifies the above link-containing nodes.
Preferably, each detail rule grammar consists of several collection sub-rules, and each collection sub-rule corresponds to collecting one block of text on the website.
Preferably, each collection sub-rule includes a CSS match rule; the CSS match rule is used to match nodes in the html, and the text contained in the matched nodes is taken as the final result.
Preferably, each collection sub-rule also includes a CSS post-filter rule, which uses CSS selector syntax to select unwanted nodes and compares them with the nodes matched by the CSS match rule; if the nodes are identical, the node selected by the CSS match rule is removed.
Preferably, each collection sub-rule also includes a regular expression rule, which is used to further modify the text contained in the above nodes.
As can be seen from the above, the method frees developers from writing crawler systems: a developer only needs to write configuration rules for a website in order to collect its data. Writing website rules is much simpler than writing a crawler system directly, and maintenance is also much easier, so development and maintenance costs for enterprises can be greatly reduced.
Detailed description of the embodiments
The present invention is described in detail below. The illustrative embodiments and their descriptions are used to explain the invention and do not limit it.
The website data search method based on configurable rules is carried out as follows. First, a portal rule, a link rule and a detail rule are configured for the website to be searched. The portal rule is then parsed to obtain the portal url of the website, the link rule associated with the portal rule, and the parameters used when accessing the website. The portal url tells the search engine where to enter the website. It is usually the home page but may also be a specific page of the site, and it is the first URL of the site that the search engine visits. The link rule associated with the portal rule tells the search engine which link rule to use to collect the links on that page. Portal rules and link rules are in a one-to-many relationship, so several link rules can be configured; when several are configured, the search engine applies them in the configured order. The parameters for accessing the website mainly include: whether to use a proxy server when accessing the site, the request frequency, the number of times to run, the request header parameters sent to the site, and business-related identifiers used to identify the website.
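The relationships described above (one portal or entrance rule, several associated link rules, a set of access parameters) can be pictured as a small data model. The following Python sketch is only an illustration: every class name, field name and default value is an assumption introduced here, and `http://example.com/` is a placeholder url; the patent does not specify a storage format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AccessParams:
    """Site-access parameters named in the text; field names are assumptions."""
    use_proxy: bool = False
    min_seconds_between_requests: float = 1.0
    run_count: int = 1
    request_headers: Dict[str, str] = field(default_factory=dict)
    site_tag: Optional[str] = None  # business-related site identifier


@dataclass
class DetailRule:
    rule_id: int
    sub_rules: Dict[str, str]  # field name -> CSS match rule


@dataclass
class LinkRule:
    rule_id: int
    grammar: str         # link rule grammar, kept opaque in this sketch
    detail_rule_id: int  # detail rule associated with this link rule


@dataclass
class PortalRule:
    portal_url: str
    link_rule_ids: List[int]  # one-to-many, applied in configured order
    params: AccessParams = field(default_factory=AccessParams)


portal = PortalRule(
    portal_url="http://example.com/",
    link_rule_ids=[1, 2],
    params=AccessParams(request_headers={"User-Agent": "example-bot"}),
)
```

Keeping the link rules as an ordered list of IDs mirrors the statement that they are applied in the order in which they were configured.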
Next, the link rule associated with the portal rule is parsed to obtain a link rule grammar for parsing the pages of the website and the detail rule associated with the link rule. The link rule grammar is the core of the link rule: by parsing it, the search engine learns how to obtain the links on the page. After parsing, the search engine holds a list of urls from which further information is to be collected; collecting the content behind these urls requires parsing the next-level detail rule. Because portal rules and link rules are in a one-to-many relationship, several link rules can be configured; the search engine then tries them in the configured order and stops as soon as one rule parses successfully.
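The "try each configured link rule in order and stop at the first success" behaviour can be sketched as follows. This is a minimal illustration under stated assumptions: a link rule is modelled as a pair of an ID and an extractor function, and "success" is taken to mean that the extractor returns at least one url. Neither assumption comes from the patent text.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical model: (rule_id, extractor). The extractor takes page HTML and
# returns a list of urls; an empty list is treated as "the rule did not match".
LinkRule = Tuple[int, Callable[[str], List[str]]]


def apply_first_matching_rule(
    rules: Iterable[LinkRule], html: str
) -> Optional[Tuple[int, List[str]]]:
    """Try rules in configured order and stop at the first one that yields urls."""
    for rule_id, extract in rules:
        urls = extract(html)
        if urls:
            return rule_id, urls
    return None


rules = [
    (1, lambda html: []),                              # does not match this page
    (2, lambda html: ["http://example.com/a.shtml"]),  # matches
    (3, lambda html: ["http://example.com/never"]),    # never reached
]
result = apply_first_matching_rule(rules, "<html></html>")
```

Here `result` is the ID of the second rule together with its url list; the third rule is never evaluated.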
The purpose of a link rule is to tell the capture program how to collect the links on a website; the capture program collects them by parsing this rule. The present application designs the rule with a layered structure: a link rule may have one layer or several. Each time a layer has been executed, the search engine obtains a list of target urls. If the target urls are detail urls, a single layer suffices and the urls are handed to the detail rule for collection. If the target urls are not detail urls, a multi-layer structure is executed recursively: the search engine treats each url in the previous layer's target url list as a new portal, applies the next layer of the link rule to it, and obtains a new list of target urls, continuing until the last layer has been executed.
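The recursive, layer-by-layer execution can be sketched in a few lines of Python. The function names and signatures below are hypothetical, each layer is reduced to a function from page content to a url list, and the `fetch` callable stands in for whatever HTTP client an implementation would use.

```python
from typing import Callable, List

# Hypothetical signature: a layer maps page content to a list of target urls.
Layer = Callable[[str], List[str]]


def run_layers(entry_url: str, layers: List[Layer], fetch: Callable[[str], str]) -> List[str]:
    """Apply the first layer to the entry page, then each later layer to every
    url produced by the layer before it. The urls produced by the last layer
    are returned as detail urls."""
    if not layers:
        return [entry_url]
    detail_urls: List[str] = []
    for url in layers[0](fetch(entry_url)):
        detail_urls.extend(run_layers(url, layers[1:], fetch))
    return detail_urls


# Toy "site": each page's content is a space-separated list of the urls it links to.
pages = {"home": "cat1 cat2", "cat1": "n1 n2", "cat2": "n3"}
split_links: Layer = lambda content: content.split()

detail_urls = run_layers("home", [split_links, split_links], pages.__getitem__)
```

With two layers, the home page yields the category pages and each category page yields detail urls, so `detail_urls` holds `n1`, `n2` and `n3` in that order. A single-layer rule is the special case where `layers` has one element.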
First, a link rule grammar consists of the following parts: a CSS pre-filter rule, a CSS match rule, a CSS post-filter rule, a regular expression rule, a target node rule and a node attribute rule. Their roles are as follows:
CSS pre-filter rule: uses CSS selector syntax to select the unwanted nodes in the HTML and removes them from the DOM of the HTML;
CSS match rule: uses CSS selector syntax to select the required HTML nodes;
CSS post-filter rule: uses CSS selector syntax to select unwanted nodes and compares them with the nodes selected in the previous step; if an unwanted node is identical to a node selected in the previous step, that selected node is removed; if it is a child node of a selected node, the corresponding child node is deleted from the selected node;
Target node rule: applies a further CSS selector rule to the nodes remaining after filtering to select the corresponding nodes, such as <A> tag nodes;
Node attribute rule: applies this attribute rule to the nodes selected in the previous step and selects the URL held in the corresponding attribute of each node, such as the HREF attribute of an A node;
Regular expression rule: uses a regular expression to further process and modify the result, for example when part of a link needs to be replaced.
After the acquisition engine reads the link rule of this layer, it collects the links on the page in the following order:
1. Use the CSS pre-filter rule to filter out the unwanted html content; if no CSS pre-filter rule is configured, this step is skipped;
2. Use the CSS match rule to match nodes in the html; the result should be a node list;
3. Use the CSS post-filter rule to filter the unwanted nodes out of the previous step's result; if no CSS post-filter rule is configured, this step is skipped;
4. Use the target node rule to further match the previous step's result; the result is a set of nodes containing links;
5. Use the node attribute rule to match the attributes of all link-containing nodes from the previous step; the result is a url list;
6. Use the regular expression rule to further modify the previous step's result; if none is configured, this step is skipped.
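The six steps can be sketched with the Python standard library. This is a simplified illustration, not the patented implementation: it parses a well-formed page with `xml.etree.ElementTree` and uses bare tag names as a stand-in for CSS selectors (a real engine would need an HTML parser and a full CSS selector library). The function and parameter names are assumptions.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def remove_matching(root: ET.Element, tag: str) -> None:
    """Delete every descendant of root whose tag name equals `tag`."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag:
                parent.remove(child)


def collect_links(
    page: str,
    css_match: str,
    target_node: str,
    node_attr: str,
    pre_filter: Optional[str] = None,
    post_filter: Optional[str] = None,
    regex: Optional[Tuple[str, str]] = None,
) -> List[str]:
    root = ET.fromstring(page)
    if pre_filter:                                    # step 1 (optional)
        remove_matching(root, pre_filter)
    matched = list(root.iter(css_match))              # step 2: node list
    if post_filter:                                   # step 3 (optional)
        matched = [n for n in matched if n.tag != post_filter]
        for node in matched:
            remove_matching(node, post_filter)
    targets, seen = [], set()                         # step 4: link nodes
    for node in matched:
        for t in node.iter(target_node):
            if id(t) not in seen:                     # skip nodes reached twice
                seen.add(id(t))
                targets.append(t)
    urls = [t.get(node_attr) for t in targets if t.get(node_attr)]  # step 5
    if regex:                                         # step 6 (optional)
        pattern, repl = regex
        urls = [re.sub(pattern, repl, u) for u in urls]
    return urls


page = (
    "<html><body>"
    '<div><a href="/news/1.shtml">n1</a></div>'
    '<nav><a href="/sports/">s</a></nav>'
    "<script>x</script>"
    '<a href="/news/2.shtml">n2</a>'
    "</body></html>"
)
links = collect_links(
    page, css_match="body", target_node="a", node_attr="href",
    pre_filter="script", post_filter="nav",
    regex=(r"^/", "http://example.com/"),
)
```

In this toy page the `script` node is dropped in step 1, the `nav` subtree in step 3, and the regular expression in step 6 turns the two remaining relative links into absolute urls.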
For websites that require a multi-layer link rule, a multi-layer grammar is used:
{{layer 1 rule}}{{layer 2 rule}}...{{layer n rule}}
In this way, rules with an unlimited number of layers can be expressed.
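Splitting such a multi-layer grammar string into its layers can be done with a single regular expression. The sketch below assumes that each layer is wrapped in double braces and that a layer body never contains `}}` itself; the content placed inside each layer in the example is a made-up placeholder, since the text does not define the inner syntax.

```python
import re
from typing import List

# Non-greedy match of everything between "{{" and the next "}}".
LAYER_PATTERN = re.compile(r"\{\{(.*?)\}\}", re.DOTALL)


def split_layers(grammar: str) -> List[str]:
    """Return the body of each {{...}} layer, in the order written."""
    return [body.strip() for body in LAYER_PATTERN.findall(grammar)]


layers = split_layers("{{ body | a | href }}{{ div | a | href }}")
```

The resulting list can be fed, one layer at a time, to whatever routine executes a single layer, with each layer's url list becoming the input of the next.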
Then, the detail rule associated with the link rule is parsed to obtain several detail rule grammars for collecting content on the pages of the website, and the content on the page is collected. The purpose of a detail rule is to tell the capture program how to collect the text information on a website; through this rule the capture program collects the site's information. Specifically, each detail rule grammar consists of several collection sub-rules, and each collection sub-rule corresponds to one block of text on the website. How many collection sub-rules to configure depends on the business. To collect news, for example, the following sub-rules are usually configured: a title sub-rule, a body text sub-rule, an author sub-rule, a publication time sub-rule, and so on. Configuring a detail rule therefore means configuring one or more collection sub-rules and combining them. A collection sub-rule consists of three parts: a CSS match rule, a CSS post-filter rule and a regular expression rule. Their roles are as follows:
CSS match rule: uses CSS selector syntax to select the required HTML nodes;
CSS post-filter rule: uses CSS selector syntax to select unwanted nodes and compares them with the nodes selected in the previous step; any selected node that is identical to an unwanted node is removed from the selection;
Regular expression rule: uses a regular expression to further process and modify the result, for example when part of a link needs to be replaced.
After the acquisition engine reads the rule of this layer, it collects the information on the page in the following order:
1. Use the CSS match rule to match nodes in the html; the result should be a node list;
2. Use the CSS post-filter rule to filter the unwanted nodes out of the previous step's result; if no CSS post-filter rule is configured, this step is skipped;
3. Check whether a regular expression rule is configured. If not, the text contained in the matched nodes is the final result; if so, the text contained in the nodes is further modified, and the modified text is the final result.
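The three-step collection sub-rule can be sketched in the same simplified style. The `select` helper below is a deliberately tiny stand-in for a CSS engine that understands only a bare tag name or `#id`, and the page, ids and field names are invented for the illustration.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def select(root: ET.Element, selector: str) -> List[ET.Element]:
    """Tiny stand-in for a CSS engine: supports only 'tag' and '#id'."""
    if selector.startswith("#"):
        return [n for n in root.iter() if n.get("id") == selector[1:]]
    return list(root.iter(selector))


def run_sub_rule(
    root: ET.Element,
    css_match: str,
    post_filter: Optional[str] = None,
    regex: Optional[Tuple[str, str]] = None,
) -> str:
    nodes = select(root, css_match)                        # step 1
    if post_filter:                                        # step 2 (optional)
        unwanted = {id(n) for n in select(root, post_filter)}
        nodes = [n for n in nodes if id(n) not in unwanted]
    text = " ".join("".join(n.itertext()).strip() for n in nodes)
    if regex:                                              # step 3 (optional)
        pattern, repl = regex
        text = re.sub(pattern, repl, text)
    return text


page = ET.fromstring(
    "<html><body><h1>Headline</h1>"
    '<div id="content"><p>Body text.</p></div>'
    '<span id="pubtime">2015-11-30 10:00 Source</span>'
    "</body></html>"
)
record = {
    "title": run_sub_rule(page, "h1"),
    "body": run_sub_rule(page, "#content"),
    "published": run_sub_rule(page, "#pubtime", regex=(r"\s+Source$", "")),
}
```

Each entry of `record` corresponds to one collection sub-rule; only the publication time needs the optional regular expression step, here to strip a trailing label.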
The following example, which collects news from the Sina website, illustrates how the rules are configured:
1. Configure the portal rule.
The portal URL is set to the Sina home page: http://www.sina.com.cn
Configure the ID of the associated link rule: 1 (the ID actually assigned by the system prevails; 1 is used here as an example)
2. Configure the link rule.
For the link rule grammar, all news links on the home page need to be collected. By observation, the URLs of news links essentially all end with digits followed by .shtml, so the rule can be configured as follows:
The CSS match rule can be set to: body
According to CSS syntax, this takes the BODY node of the whole document as the target, because all links are contained in the BODY node.
The CSS post-filter rule can be set to: a:not (a [href ~=d{4}.shtml])
According to this rule, links within the BODY tag whose URL does not contain digits followed by .shtml are filtered out. This ensures that the results are all news links rather than non-news links such as links to second-level directories.
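Standard CSS attribute selectors such as `[href$="..."]` or `[href*="..."]` compare against literal strings and do not accept regular expressions, so the filter written above presumably relies on an extension in the rule engine. The same "keep only links ending in digits plus .shtml" test can be sketched directly in Python; the pattern below is an assumption derived from the observation in the text, not the patent's own expression.

```python
import re
from typing import List

# Assumed pattern: one or more digits immediately before a trailing ".shtml".
NEWS_URL = re.compile(r"\d+\.shtml$")


def keep_news_links(urls: List[str]) -> List[str]:
    """Drop every url that does not end with digits followed by .shtml."""
    return [u for u in urls if NEWS_URL.search(u)]


candidates = [
    "http://news.sina.com.cn/c/2015-11-30/doc-ifxmainy1476425.shtml",
    "http://sports.sina.com.cn/",
    "http://news.sina.com.cn/about.shtml",
]
news = keep_news_links(candidates)
```

Only the first candidate survives: the second has no `.shtml` suffix and the third has no digits before it.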
The target node rule can be set to: a
According to CSS syntax, this takes all a tags within the BODY node as target nodes.
The node attribute rule can be set to: href
This selects the href attribute of each a tag as the resulting url.
Configure the ID of the associated detail rule: 2 (the ID actually assigned by the system prevails; 2 is used here as an example)
3. Configure the detail rule.
Suppose the content to be collected consists of the title, body text and publication time of each news item, and suppose the links collected in the previous step include this URL: http://news.sina.com.cn/c/2015-11-30/doc-ifxmainy1476425.shtml. To collect this URL, the collection rules can be configured as follows:
Title CSS match rule: h1
According to CSS syntax, this rule selects the node whose tag is h1. Since this CSS match rule already collects the news title, no filter rule or regular expression rule needs to be configured for the title.
Body text CSS match rule: #artibody
According to CSS syntax, this rule selects the node whose ID is artibody. Likewise, since this CSS rule already collects the body text of the news item, no additional filter rule or regular expression rule needs to be configured for the body text.
Publication time CSS match rule: #navtimeSource
According to CSS syntax, this rule selects the node whose ID is navtimeSource. Since this node contains the time information, the publication time is collected, and no additional filter rule or regular expression rule needs to be configured for the publication time.
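For reference, the whole example can be gathered into one configuration object. The Python dictionary below is only one possible representation: every key name is hypothetical, while the values are copied from the example, including the post-filter string, which is reproduced verbatim rather than converted to standard CSS.

```python
# All dictionary keys are hypothetical; the values are copied from the example.
SINA_CONFIG = {
    "portal_rule": {
        "portal_url": "http://www.sina.com.cn",
        "link_rule_id": 1,
    },
    "link_rules": {
        1: {
            "css_match": "body",
            # Verbatim from the example; not standard CSS selector syntax.
            "css_post_filter": "a:not (a [href ~=d{4}.shtml])",
            "target_node": "a",
            "node_attr": "href",
            "detail_rule_id": 2,
        },
    },
    "detail_rules": {
        2: {
            "title": {"css_match": "h1"},
            "body": {"css_match": "#artibody"},
            "published": {"css_match": "#navtimeSource"},
        },
    },
}


def detail_rule_for_portal(config: dict) -> dict:
    """Follow the ID chain portal rule -> link rule -> detail rule."""
    link_rule = config["link_rules"][config["portal_rule"]["link_rule_id"]]
    return config["detail_rules"][link_rule["detail_rule_id"]]


sub_rules = detail_rule_for_portal(SINA_CONFIG)
```

Looking rules up by ID reflects how the example chains the portal rule to link rule 1 and link rule 1 to detail rule 2.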
The technical solution provided by the embodiments of the present invention has been described in detail above. Specific examples are used herein to set out the principles and implementations of the embodiments, and the description of the above embodiments is intended only to help in understanding those principles. For a person of ordinary skill in the art, the specific implementations and the scope of application may vary according to the embodiments of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A website data search method based on configurable rules, carried out in the following steps:
(1) configure a portal rule, a link rule and a detail rule for the website to be searched;
(2) parse the portal rule to obtain the portal url of the website, the link rule associated with the portal rule, and the parameters used when accessing the website;
(3) parse the link rule associated with the portal rule to obtain a link rule grammar for parsing the pages of the website and the detail rule associated with the link rule;
(4) parse the detail rule associated with the link rule to obtain several detail rule grammars for collecting content on the pages of the website, and thereby collect the content on the page.
2. The website data search method based on configurable rules according to claim 1, characterized in that: the link rule has a structure of one or more layers; after the link rule is executed, the urls of the target links are obtained; if a target url is a detail url, the link rule uses a single-layer structure; if a target url is not a detail url, the link rule uses a multi-layer structure that is executed recursively.
3. The website data search method based on configurable rules according to claim 1, characterized in that: the link rule grammar includes a CSS match rule, a target node rule and a node attribute rule, wherein the CSS match rule uses CSS selector syntax to select the required HTML nodes and yields a node list; the target node rule further matches within that node list and yields a set of nodes containing links; and the node attribute rule matches the attributes of those link-containing nodes and yields a url list.
4. The website data search method based on configurable rules according to claim 3, characterized in that: the link rule grammar also includes a CSS pre-filter rule, which uses CSS selector syntax to select the unwanted nodes in the HTML.
5. The website data search method based on configurable rules according to claim 3, characterized in that: the link rule grammar also includes a CSS post-filter rule, which uses CSS selector syntax to select unwanted nodes and compares them with the nodes selected by the CSS match rule; if a node is identical, the node selected by the CSS match rule is removed; if a node is a child node of a node selected by the CSS match rule, the corresponding child node is deleted from the selected node.
6. The website data search method based on configurable rules according to claim 1, characterized in that: the link rule grammar also includes a regular expression rule, which further modifies the above link-containing nodes.
7. The website data search method based on configurable rules according to claim 1, characterized in that: each detail rule grammar consists of several collection sub-rules, and each collection sub-rule corresponds to collecting one block of text on the website.
8. The website data search method based on configurable rules according to claim 7, characterized in that: each collection sub-rule includes a CSS match rule; the CSS match rule is used to match nodes in the html, and the text contained in the matched nodes is taken as the final result.
9. The website data search method based on configurable rules according to claim 7, characterized in that: each collection sub-rule also includes a CSS post-filter rule, which uses CSS selector syntax to select unwanted nodes and compares them with the nodes matched by the CSS match rule; if the nodes are identical, the node selected by the CSS match rule is removed.
10. The website data search method based on configurable rules according to claim 7, characterized in that: each collection sub-rule also includes a regular expression rule, which is used to further modify the text contained in the above nodes.
CN201610152001.7A 2016-03-17 2016-03-17 Configuration rule based website data search method Pending CN105760545A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610152001.7A CN105760545A (en) 2016-03-17 2016-03-17 Configuration rule based website data search method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610152001.7A CN105760545A (en) 2016-03-17 2016-03-17 Configuration rule based website data search method

Publications (1)

Publication Number Publication Date
CN105760545A true CN105760545A (en) 2016-07-13

Family

ID=56333372

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610152001.7A Pending CN105760545A (en) 2016-03-17 2016-03-17 Configuration rule based website data search method

Country Status (1)

Country Link
CN (1) CN105760545A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107092632A (en) * 2017-02-09 2017-08-25 北京小度信息科技有限公司 Data processing method and device
CN107784113A (en) * 2017-11-08 2018-03-09 深圳市科盾科技有限公司 Html web page collecting method, device and computer-readable recording medium
CN109359232A (en) * 2018-08-21 2019-02-19 中国平安人寿保险股份有限公司 Obtain method, apparatus, computer equipment and the storage medium of room rate
CN109472630A (en) * 2018-09-17 2019-03-15 平安科技(深圳)有限公司 Check method, apparatus, computer equipment and the storage medium of room rate in declaration form
CN111881404A (en) * 2020-08-05 2020-11-03 广州裕睿信息科技有限公司 Configuration data acquisition method, device and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020088A (en) * 2011-09-27 2013-04-03 腾讯科技(深圳)有限公司 Data processing device and method
CN103279567A (en) * 2013-06-18 2013-09-04 重庆邮电大学 Web data collection method and system both based on AJAX (asynchronous javascript and extensible markup language)
CN105302876A (en) * 2015-09-28 2016-02-03 孙燕群 Regular expression based URL filtering method
CN105335516A (en) * 2015-11-04 2016-02-17 浪潮软件集团有限公司 Construction method of universal acquisition system

Similar Documents

Publication Publication Date Title
CN105760545A (en) Configuration rule based website data search method
CN108052632B (en) Network information acquisition method and system and enterprise information search system
CN103475687B (en) Distributed method and system for download site data
CN102339320B (en) Malicious web recognition method and device
CN102591992A (en) Webpage classification identifying system and method based on vertical search and focused crawler technology
CN101520798A (en) Webpage classification technology based on vertical search and focused crawler
CN102098229B (en) Method and device for optimizing and auditing uniform resource locator (URL) as well as network device
EP1713010A2 (en) Using attribute inheritance to identify crawl paths
Weltevrede et al. Where do bloggers blog? Platform transitions within the historical Dutch blogosphere
CN1705944A (en) System and method for conducting adaptive search using a peer-to-peer network
CN101630330A (en) Method for webpage classification
US20170053031A1 (en) Information forecast and acquisition method based on webpage link parameter analysis
KR101236990B1 (en) Cooperative Spatial Query Processing Method between a Server and a Sensor Network and Server thereof
CN104268216A (en) Data cleaning system based on internet information
CN103258017B (en) A kind of parallel square crossing network data acquisition method and system
CN101404666A (en) Infinite layer collection method based on Web page
CN102855418A (en) Method for discovering Web intranet agent bugs
CN106202467A (en) A kind of definable towards peer-to-peer network searches for the web crawlers method of emphasis
CN103823907A (en) Method, device and engine for integrating on-line video resource addresses
US7975218B2 (en) Apparatus and method for forming document group structure data and storage medium
CN105677921A (en) Method and system for acquiring Internet public opinion data
Aspert et al. A graph-structured dataset for Wikipedia research
CN104156458B (en) The extracting method and device of a kind of information
CN101727485B (en) WSDL collection method based on focused search
CN106066875A (en) A kind of high efficient data capture method and system based on deep net reptile

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160713