CN106649322A - Method and device for crawling keyword category information from electronic business websites - Google Patents
- Publication number
- CN106649322A (application number CN201510719610.1A)
- Authority
- CN
- China
- Prior art keywords
- e-commerce
- url
- information
- business website
- keyword
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/9566—URL specific, e.g. using aliases, detecting broken or misspelled links
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Abstract
The invention discloses a method and device for crawling keyword category information from e-commerce websites. It relates to the field of Internet technology and mainly aims to improve the efficiency of crawling keyword category information from e-commerce websites. The main technical scheme is as follows: a search URL for the e-commerce website is constructed from the e-commerce website information and the keyword whose category information is to be crawled; the constructed search URL is accessed to obtain the page information of the web page corresponding to the URL; and the page information is parsed to extract the information in the page that describes the e-commerce website keyword category, yielding the e-commerce website keyword category information. The method and device are mainly used for crawling keyword category information from e-commerce websites.
Description
Technical field
The present invention relates to the field of Internet technology, and in particular to a method and device for crawling keyword category information from e-commerce websites.
Background
Keyword category information is very important information. For an e-commerce website in particular, correctly identifying the category to which a user's search keyword belongs is of great significance both for the website itself and for search engine marketing. Here, "category" is specific to e-commerce: commodities are divided into classes according to their attributes, and the categories can be organized into multiple levels along different dimensions.
Web crawlers are a common and widespread Internet technology. Many companies and individuals use web crawlers to collect information from the World Wide Web in batches and on a large scale. A general-purpose web crawler typically works as follows: it maintains a list of URLs (Uniform Resource Locators), first adds an initial URL to the list, then traverses each URL in the list, fetches the page corresponding to that URL, extracts the URLs contained in the page, and adds them to the URL list.
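The general-purpose crawling loop described above can be sketched roughly as follows. This is an illustrative sketch only, not code from the patent: the function name `generic_crawl`, the injected `fetch` callable, the `max_pages` limit, and the crude `href` regular expression are all assumptions made for the example.

```python
import re
from collections import deque

# Hypothetical, deliberately crude link extractor used only for illustration.
HREF_RE = re.compile(r'href="([^"]+)"')


def generic_crawl(seed_url, fetch, max_pages=100):
    """Crawl from seed_url; `fetch` is any callable mapping a URL to page text."""
    url_list = deque([seed_url])  # the URL list the crawler maintains
    seen = {seed_url}             # avoids adding the same URL twice
    pages = {}
    while url_list and len(pages) < max_pages:
        url = url_list.popleft()
        html = fetch(url)         # fetch the page corresponding to the URL
        pages[url] = html
        # Extract the URLs in the page and add the new ones to the URL list.
        for link in HREF_RE.findall(html):
            if link not in seen:
                seen.add(link)
                url_list.append(link)
    return pages
```

Every page must be downloaded and scanned before the next URLs are known, which is the repeated extract-and-maintain work that the following paragraphs identify as the source of inefficiency.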
At present, keyword category information is usually crawled from e-commerce websites with a general-purpose web crawler. Because an e-commerce website carries a wide variety of commodity information and different commodities correspond to different pages, obtaining the category information of the commodities that match different keywords requires repeatedly extracting URLs from newly crawled web pages, maintaining them in the URL list, and only then fetching the corresponding pages. As a result, crawling keyword category information from e-commerce websites is inefficient.
Summary of the invention
In view of this, the present invention provides a method and device for crawling keyword category information from e-commerce websites, whose main purpose is to improve the efficiency of crawling that information.
To achieve the above purpose, the present invention provides the following technical scheme:
In one aspect, the present invention provides a method for crawling keyword category information from e-commerce websites, including:
constructing a search Uniform Resource Locator (URL) for the e-commerce website according to the e-commerce website information and the keyword whose category information is to be crawled;
accessing the constructed search URL of the e-commerce website and obtaining the page information of the web page corresponding to the URL;
parsing the page information of the web page, extracting the information in the page that describes the e-commerce website keyword category, and obtaining the e-commerce website keyword category information.
In another aspect, the present invention provides a device for crawling keyword category information from e-commerce websites, including:
a construction unit, configured to construct a search Uniform Resource Locator (URL) for the e-commerce website according to the e-commerce website information and the keyword whose category information is to be crawled;
an access unit, configured to access the constructed search URL of the e-commerce website and obtain the page information of the web page corresponding to the URL;
a parsing unit, configured to parse the page information of the web page, extract the information in the page that describes the e-commerce website keyword category, and obtain the e-commerce website keyword category information.
In the method and device provided by the present invention, the URL of the web page from which keyword category information is crawled is not extracted from known web pages but is constructed from the e-commerce website information and the keyword whose category information is to be crawled. Compared with the prior art, this eliminates the steps of extracting URLs from known web pages, storing them in a URL list, and only afterwards crawling the corresponding web pages. It therefore improves the efficiency of crawling web pages to a certain extent, and in turn the efficiency of crawling keyword category information from e-commerce websites.
The above is only an overview of the technical scheme of the present invention. So that the technical means of the present invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the present invention become more apparent, specific embodiments of the present invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered as limiting the present invention. Throughout the drawings, the same reference symbols denote the same parts. In the drawings:
Fig. 1 shows a flowchart of a method for crawling keyword category information from e-commerce websites provided by an embodiment of the present invention;
Fig. 2 shows a block diagram of a device for crawling keyword category information from e-commerce websites provided by an embodiment of the present invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
An embodiment of the present invention provides a method for crawling keyword category information from e-commerce websites. As shown in Fig. 1, the method includes:
101. Construct a search URL for the e-commerce website according to the e-commerce website information and the keyword whose category information is to be crawled.
It should be noted that accessing the URL corresponding to a keyword whose category information is to be crawled returns the same page as entering that keyword in the search function of the e-commerce website. In general, the search URL of an e-commerce website has a format like http://search.XXX.com/SearchKeyword=YYY, where XXX is the domain name of the e-commerce website and YYY is the specific keyword whose category information is to be crawled.
Based on this principle, the e-commerce website information in the embodiment of the present invention may be, but is not limited to, the domain name of the e-commerce website. Constructing the search URL then consists of building, from the domain name of the e-commerce website and the keyword whose category information is to be crawled, a search URL in the format shown above. For each input keyword, the YYY part of the URL is replaced to construct the corresponding search URL.
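A minimal sketch of this construction step is given below, assuming the URL format quoted above. The function names and the `example` domain are hypothetical, and percent-encoding the keyword with `urllib.parse.quote` is an assumption added for the example; the text itself only says that the YYY part is replaced.

```python
from urllib.parse import quote

# Template following the format quoted in the text; XXX -> domain, YYY -> keyword.
SEARCH_URL_TEMPLATE = "http://search.{domain}.com/SearchKeyword={keyword}"


def build_search_url(domain, keyword):
    """Construct the search URL for one keyword (percent-encoding is an assumption)."""
    return SEARCH_URL_TEMPLATE.format(domain=domain, keyword=quote(keyword, safe=""))


def build_search_urls(domain, keywords):
    """Construct one search URL per input keyword."""
    return [build_search_url(domain, kw) for kw in keywords]
```

For instance, `build_search_url("example", "phone case")` yields `http://search.example.com/SearchKeyword=phone%20case`. No page has to be downloaded to learn this URL, which is the point of the construction step.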
102. Access the constructed search URL of the e-commerce website and obtain the page information of the web page corresponding to the URL.
Further, to speed up access to the constructed search URLs, the access may be performed in batches. For example, the constructed search URLs may be accessed in batches through a network library provided by a programming language (such as the requests library in Python). Specifically, a multithreading approach may be used: multiple threads concurrently access the constructed search URLs in batches and obtain the page information of the corresponding web pages. Other batch access methods may of course be used; the embodiment of the present invention does not limit this in a specific implementation.
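One way to sketch the multithreaded batch access is with a thread pool from Python's standard library. The name `fetch_all`, the `max_workers` default, and the injected `fetch` callable are assumptions for illustration; the text does not prescribe a particular threading API.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=8):
    """Fetch every URL concurrently and return a {url: page_info} mapping.

    `fetch` is any callable that takes a URL and returns its page information.
    """
    urls = list(urls)  # materialize once, since the URLs are iterated twice below
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map runs fetch in worker threads and yields results in input
        # order, so zip pairs each URL with its own page.
        return dict(zip(urls, pool.map(fetch, urls)))
```

With the requests library mentioned in the text, `fetch` could be something like `lambda url: requests.get(url, timeout=10).text`. In this sketch an exception raised by any single fetch propagates to the caller, so a real crawler would likely add per-URL error handling and rate limiting.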
It should be noted that the page information obtained for the web page corresponding to the URL may be in the form of HTML (HyperText Markup Language) code, although the embodiment of the present invention does not specifically limit this. To make the subsequent parsing of the page information easier, the embodiment of the present invention prefers page information in HTML code form.
103. Parse the page information of the web page, extract the information in the page that describes the e-commerce website keyword category, and obtain the e-commerce website keyword category information.
It should be noted that the way the page information is parsed to extract the information describing the e-commerce website keyword category differs according to the format of the page information obtained.
For example, when the page information is in HTML code form, the HTML code can be parsed directly to extract the information in the page that describes the e-commerce website keyword category and thereby obtain the e-commerce website keyword category information. Specifically, this may be done with the lxml package in Python, using CSS information to locate and extract the information in the page that describes the e-commerce website keyword category. CSS (Cascading Style Sheets) is a computer language for expressing the presentation of documents such as HTML or XML, XML being a subset of the Standard Generalized Markup Language.
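The sketch below illustrates the idea of locating category information by its CSS class. Note that it uses the standard library's `html.parser` in place of the lxml package named in the text, so that it runs without third-party dependencies. The class name `category` and the sample markup are hypothetical; a real e-commerce page would need its own selector.

```python
from html.parser import HTMLParser

# Void elements have no closing tag; skipping them keeps the depth count right.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class CategoryExtractor(HTMLParser):
    """Collect the text of every element whose class list contains css_class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.categories = []
        self._depth = 0     # greater than 0 while inside a matching element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            text = "".join(self._buffer).strip()
            if text:
                self.categories.append(text)

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_categories(html, css_class="category"):
    """Return the text of all elements carrying the (hypothetical) category class."""
    parser = CategoryExtractor(css_class)
    parser.feed(html)
    parser.close()
    return parser.categories
```

This simplified parser assumes reasonably well-formed markup. With lxml, the same lookup would typically be a single CSS or XPath query against the parsed tree.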
In the embodiment of the present invention, the URL of the web page from which keyword category information is crawled is not extracted from known web pages but is constructed from the e-commerce website information and the keyword whose category information is to be crawled. Compared with the prior art, this eliminates the steps of extracting URLs from known web pages, storing them in a URL list, and only afterwards crawling the corresponding web pages. It therefore improves the efficiency of crawling web pages to a certain extent, and in turn the efficiency of crawling keyword category information from e-commerce websites.
Moreover, the embodiment of the present invention can access the constructed search URLs in batches, which further improves the efficiency of crawling web pages and in turn the efficiency of crawling keyword category information from e-commerce websites.
Based on the above method embodiment, an embodiment of the present invention also provides a device for crawling keyword category information from e-commerce websites. As shown in Fig. 2, the device includes:
Construction unit 21, configured to construct a search URL for the e-commerce website according to the e-commerce website information and the keyword whose category information is to be crawled. Accessing the URL corresponding to such a keyword returns the same page as entering that keyword in the search function of the e-commerce website. In general, the search URL of an e-commerce website has a format like:
http://search.XXX.com/SearchKeyword=YYY, where XXX is the domain name of the e-commerce website and YYY is the specific keyword whose category information is to be crawled.
Based on this principle, the e-commerce website information in the embodiment of the present invention may be, but is not limited to, the domain name of the e-commerce website. Constructing the search URL then consists of building, from the domain name of the e-commerce website and the keyword whose category information is to be crawled, a search URL in the format shown above. For each input keyword, the YYY part of the URL is replaced to construct the corresponding search URL.
Access unit 22, configured to access the constructed search URL of the e-commerce website and obtain the page information of the web page corresponding to the URL. Further, to speed up access to the constructed search URLs, the access may be performed in batches. For example, the constructed search URLs may be accessed in batches through a network library provided by a programming language (such as the requests library in Python). Specifically, a multithreading approach may be used: multiple threads concurrently access the constructed search URLs in batches and obtain the page information of the corresponding web pages. Other batch access methods may of course be used; the embodiment of the present invention does not limit this in a specific implementation.
It should be noted that the page information obtained for the web page corresponding to the URL may be in HTML code form, although the embodiment of the present invention does not specifically limit this. To make the subsequent parsing of the page information easier, the embodiment of the present invention prefers page information in HTML code form.
Parsing unit 23, configured to parse the page information of the web page, extract the information in the page that describes the e-commerce website keyword category, and obtain the e-commerce website keyword category information. It should be noted that the way the page information is parsed to extract this information differs according to the format of the page information obtained.
For example, when the page information is in HTML code form, the HTML code can be parsed directly to extract the information in the page that describes the e-commerce website keyword category and thereby obtain the e-commerce website keyword category information. Specifically, this may be done with the lxml package in Python, using CSS information to locate and extract the information in the page that describes the e-commerce website keyword category.
In the embodiment of the present invention, the URL of the web page from which keyword category information is crawled is not extracted from known web pages but is constructed from the e-commerce website information and the keyword whose category information is to be crawled. Compared with the prior art, this eliminates the steps of extracting URLs from known web pages, storing them in a URL list, and only afterwards crawling the corresponding web pages. It therefore improves the efficiency of crawling web pages to a certain extent, and in turn the efficiency of crawling keyword category information from e-commerce websites.
Moreover, the embodiment of the present invention can access the constructed search URLs in batches, which further improves the efficiency of crawling web pages and in turn the efficiency of crawling keyword category information from e-commerce websites.
The device for crawling keyword category information from e-commerce websites includes a processor and a memory. The construction unit, access unit, parsing unit, and so on described above are stored in the memory as program units, and the processor executes these program units stored in the memory to realize the corresponding functions.
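As a rough illustration of how the three program units might be composed in software, the sketch below wires them together as injected callables. The class name `KeywordCategoryCrawler`, its method `crawl`, and the callable signatures are hypothetical and are not taken from the patent.

```python
class KeywordCategoryCrawler:
    """Hypothetical composition of the construction, access, and parsing units."""

    def __init__(self, build_url, fetch_pages, parse_page):
        self.build_url = build_url      # construction unit: (site_info, keyword) -> URL
        self.fetch_pages = fetch_pages  # access unit: list of URLs -> {url: page_info}
        self.parse_page = parse_page    # parsing unit: page_info -> category information

    def crawl(self, site_info, keywords):
        """Return {keyword: category information} for every input keyword."""
        # Construct one search URL per keyword.
        urls = {kw: self.build_url(site_info, kw) for kw in keywords}
        # Access all constructed URLs in one batch.
        pages = self.fetch_pages(list(urls.values()))
        # Parse each page to obtain the keyword's category information.
        return {kw: self.parse_page(pages[url]) for kw, url in urls.items()}
```

Passing the units in as callables keeps each one replaceable, for example swapping a multithreaded access unit for a sequential one during testing.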
The processor contains a kernel, and the kernel retrieves the corresponding program unit from the memory. One or more kernels may be provided, and the efficiency of crawling keyword category information from e-commerce websites is improved by adjusting kernel parameters.
The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM), and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one memory chip.
The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute program code that initializes the following method steps: constructing a search Uniform Resource Locator (URL) for the e-commerce website according to the e-commerce website information and the keyword whose category information is to be crawled; accessing the constructed search URL of the e-commerce website and obtaining the page information of the web page corresponding to the URL; and parsing the page information of the web page, extracting the information in the page that describes the e-commerce website keyword category, and obtaining the e-commerce website keyword category information.
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction device. The instruction device realizes the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for realizing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM), and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
The above are only embodiments of the present application and are not intended to limit the present application. For those skilled in the art, the present application may have various modifications and variations. Any modification, equivalent substitution, improvement, and the like made within the spirit and principle of the present application shall fall within the scope of the claims of the present application.
Claims (10)
1. A method for crawling keyword category information from e-commerce websites, characterized by including:
constructing a search Uniform Resource Locator (URL) for the e-commerce website according to the e-commerce website information and the keyword whose category information is to be crawled;
accessing the constructed search URL of the e-commerce website and obtaining the page information of the web page corresponding to the URL;
parsing the page information of the web page, extracting the information in the page that describes the e-commerce website keyword category, and obtaining the e-commerce website keyword category information.
2. The method according to claim 1, characterized in that the e-commerce website information includes the domain name of the e-commerce website, and constructing the search URL of the e-commerce website according to the e-commerce website information and the keyword whose category information is to be crawled includes:
constructing, according to the domain name of the e-commerce website and the keyword whose category information is to be crawled, a search URL of the e-commerce website in the following format:
http://search.XXX.com/SearchKeyword=YYY
where XXX is the domain name of the e-commerce website and YYY is the specific keyword whose category information is to be crawled.
3. method according to claim 1 and 2, it is characterised in that access the electric business net of construction
The search URL for standing, obtaining the page info of the corresponding webpages of the URL includes:
Batch accesses the search URL of the electric business website of construction, obtains the page of the corresponding webpages of the URL
Surface information.
4. The method according to claim 3, characterized in that accessing the constructed search URLs of the e-commerce website in batches and obtaining the page information of the web pages corresponding to the URLs includes:
concurrently accessing the constructed search URLs of the e-commerce website in batches by multithreading and obtaining the page information of the web pages corresponding to the URLs.
5. The method according to claim 4, characterized in that the page information is page information in the form of HyperText Markup Language (HTML) code.
6. The method according to claim 5, characterized in that parsing the page information of the web page, extracting the information in the page that describes the e-commerce website keyword category, and obtaining the e-commerce website keyword category information includes:
parsing the HTML code directly, extracting the information in the page that describes the e-commerce website keyword category, and obtaining the e-commerce website keyword category information.
7. A device for crawling keyword category information from e-commerce websites, characterized by including:
a construction unit, configured to construct a search Uniform Resource Locator (URL) for the e-commerce website according to the e-commerce website information and the keyword whose category information is to be crawled;
an access unit, configured to access the constructed search URL of the e-commerce website and obtain the page information of the web page corresponding to the URL;
a parsing unit, configured to parse the page information of the web page, extract the information in the page that describes the e-commerce website keyword category, and obtain the e-commerce website keyword category information.
8. The device according to claim 7, characterized in that the e-commerce website information includes the domain name of the e-commerce website, and the construction unit is specifically configured to:
construct, according to the domain name of the e-commerce website and the keyword whose category information is to be crawled, a search URL of the e-commerce website in the following format:
http://search.XXX.com/SearchKeyword=YYY
where XXX is the domain name of the e-commerce website and YYY is the specific keyword whose category information is to be crawled.
9. The device according to claim 7 or 8, characterized in that the access unit is configured to access the constructed search URLs of the e-commerce website in batches and obtain the page information of the web pages corresponding to the URLs.
10. The device according to claim 9, characterized in that accessing the constructed search URLs of the e-commerce website in batches and obtaining the page information of the web pages corresponding to the URLs includes:
concurrently accessing the constructed search URLs of the e-commerce website in batches by multithreading and obtaining the page information of the web pages corresponding to the URLs.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510719610.1A CN106649322A (en) | 2015-10-29 | 2015-10-29 | Method and device for crawling keyword category information from electronic business websites |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510719610.1A CN106649322A (en) | 2015-10-29 | 2015-10-29 | Method and device for crawling keyword category information from electronic business websites |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106649322A | 2017-05-10 |
Family
ID=58830257
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510719610.1A (Pending, published as CN106649322A) | 2015-10-29 | 2015-10-29 | Method and device for crawling keyword category information from electronic business websites |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106649322A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110555176A (en) * | 2018-03-30 | 2019-12-10 | 佛山市优特美邦电子商务有限公司 | E-commerce platform constructed by adopting internet commodity data analysis and collection method |
CN111368174A (en) * | 2020-03-09 | 2020-07-03 | 北京九州云动科技有限公司 | Searching method and device supporting multi-provider platform commodity URL or commodity password |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2012113658A (en) * | 2010-11-26 | 2012-06-14 | Ntt Docomo Inc | Data prefetch system, and device, method, and program therefor |
CN102930059A (en) * | 2012-11-26 | 2013-02-13 | 电子科技大学 | Method for designing focused crawler |
CN102982174A (en) * | 2012-12-17 | 2013-03-20 | 北京奇虎科技有限公司 | Method and device for performing web search in browser |
CN103927400A (en) * | 2014-05-07 | 2014-07-16 | 重庆邮电大学 | Web site product detailed information classification crawling and product information base establishing method |
CN104881501A (en) * | 2015-06-19 | 2015-09-02 | 四川大学 | Automatic Internet information obtaining and pushing method |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US11675969B2 (en) | Dynamic native content insertion | |
CN107808000B (en) | System and method for collecting and extracting data of dark net | |
US8239387B2 (en) | Structural clustering and template identification for electronic documents | |
US7725466B2 (en) | High accuracy document information-element vector encoding server | |
CN102831252B (en) | Method and device for updating an index database, and search method and system | |
Gowda et al. | Clustering web pages based on structure and style similarity (application paper) | |
JP6203374B2 (en) | Web page style address integration | |
US11580177B2 (en) | Identifying information using referenced text | |
US8205153B2 (en) | Information extraction combining spatial and textual layout cues | |
Szeredi et al. | The semantic web explained: The technology and mathematics behind web 3.0 | |
CN1909522A (en) | Method for acquiring front-page keyword and its application system | |
CN104331438B (en) | Method and device for selectively extracting novel web page content | |
CN102314494B (en) | Method and equipment for processing webpage contents | |
CN106547749B (en) | Webpage data acquisition method and device | |
CN103744845A (en) | Method and system for WEB platform data caching | |
US20220292160A1 (en) | Automated system and method for creating structured data objects for a media-based electronic document | |
CN110020068B (en) | Method and device for configuring page crawling rules | |
CN106649322A (en) | Method and device for crawling keyword category information from electronic business websites | |
CN108121712A (en) | Keyword storage method and device | |
US9195940B2 (en) | Jabba-type override for correcting or improving output of a model | |
Li et al. | Practical study of subclasses of regular expressions in DTD and XML schema | |
CN110110182A (en) | Collection method and system suitable for batch crawling | |
Fugazza et al. | Describing geospatial assets in the Web of Data: A metadata management scenario | |
CN104021143A (en) | Method and device for recording webpage access behavior | |
US9530094B2 (en) | Jabba-type contextual tagger |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing; Applicant after: Beijing Guoshuang Technology Co.,Ltd.; Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing; Applicant before: Beijing Guoshuang Technology Co.,Ltd. |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170510 |