CN106886547A - Script generation method and device

Publication number
CN106886547A
Authority
CN
China
Prior art keywords
web page
script
page contents
code
webpage
Application number
CN201610551151.5A
Other languages
Chinese (zh)
Inventor
孙宇
Original Assignee
阿里巴巴集团控股有限公司
Application filed by 阿里巴巴集团控股有限公司
Priority to CN201610551151.5A
Publication of CN106886547A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code

Abstract

This application discloses a script generation method and device that address the inefficiency of manually writing crawling scripts when a web crawler is used to capture web page content in the prior art. The method includes: determining the web page content that a user selects in a displayed web page; determining, according to the determined web page content, the web page code corresponding to that content; and generating a crawling script according to the web page code.

Description

Script generation method and device

Technical field

The present application relates to the field of computer technology, and in particular to a script generation method and device.

Background

In the prior art, because web crawlers can capture the text content of web pages, they are widely used in fields such as search and data mining. A web crawler can capture all of the content of a web page or only part of it.

At present, to capture target content in a target web page with a web crawler, a worker must first write a script for capturing the target content; the web crawler then captures the target content according to that script.

For example, suppose one wants to use a web crawler to capture the price information of the commodity in the web page shown in Fig. 1, i.e. "price: $149.99". The worker must access the web page through a browser and then search the page's web page code for the code corresponding to "price: $149.99", i.e. the minimum Document Object Model (DOM) tree corresponding to "price: $149.99".

The minimum DOM tree corresponding to "price: $149.99" is as follows:

<div id="kfs_family_16" class="kfs-inner-container kfs-selected" style="width:20%;left:40%;background-image:url(https://images-na.ssl-images-amazon.com/images/G/01/kindle/stripe/kfs-selector-2._CB386844303_.gif);" onClick="javascript:(function(){})()">
<a class="kfs-current kfs-link">
<img class="kfs-img" style="margin-top:9px;" src="https://images-na.ssl-images-amazon.com/images/G/01/kindle/dp/2015/848470/famnav/fs-m._CB292709393_.png"/>
<br/>
Fire HD 8
<br/>
<span class="kfs-price">
$149.99
</span>
<br/>
</a>
<div id="kfs_popover_content_16" class="kfs-popover-container" style="display:none;">Incredibly thin and light,designed for entertainment</div>

After finding the minimum DOM tree corresponding to "price: $149.99", the worker obtains the HyperText Markup Language (HTML) attribute value information corresponding to "price: $149.99", such as id="kfs_family_16" and class="kfs-price". Based on this attribute value information, the worker writes a crawling script containing those HTML attribute values. The written script and the web page code of the commodity page are then sent together to a parsing engine, so that the parsing engine can locate the minimum DOM tree corresponding to "price: $149.99" according to the id and class in the crawling script and extract the price information "price: $149.99" from that tree.

Although a web crawler can capture web page content with the above method, the crawling script must be written manually, which is inefficient.

Summary of the invention

The embodiments of the present application provide a script generation method and device to address the inefficiency of manually writing crawling scripts when web page content is captured with a web crawler in the prior art.

The embodiments of the present application adopt the following technical solutions:

A script generation method, including:

determining the web page content that a user selects in a displayed web page;

determining, according to the determined web page content, the web page code corresponding to the web page content;

generating a crawling script according to the web page code.

A script generation device, including:

a content determination module, which determines the web page content that a user selects in a displayed web page;

a code determination module, which determines, according to the determined web page content, the web page code corresponding to the web page content;

a script generation module, which generates a crawling script according to the web page code.

At least one of the above technical solutions adopted by the embodiments of the present application can achieve the following beneficial effects:

In the prior art, crawling scripts must be written manually when web page content is captured with a web crawler. By contrast, the script generation method provided by the embodiments of the present application determines the web page content that the user selects in a web page, determines the web page code corresponding to that content, and generates a crawling script according to the web page code, thereby solving the problem that manually writing crawling scripts is inefficient.

Brief description of the drawings

The accompanying drawings described here are provided for further understanding of the present application and constitute a part of it. The illustrative embodiments of the present application and their descriptions are used to explain the application and do not constitute an improper limitation of it. In the drawings:

Fig. 1 shows content in a target web page in the prior art;

Fig. 2a is a specific flowchart of a script generation method provided by an embodiment of the present application;

Fig. 2b shows a page for determining HTML attribute values provided by an embodiment of the present application;

Fig. 2c shows a page, provided by an embodiment of the present application, asking the user which web page content to capture;

Fig. 2d shows the page displayed after the user box-selects web page content, provided by an embodiment of the present application;

Fig. 2e shows the page displayed after the user box-selects web page content twice, provided by an embodiment of the present application;

Fig. 3 is a specific structural diagram of a script generation device provided by an embodiment of the present application.

Detailed description

To make the purpose, technical solutions and advantages of the present application clearer, the technical solutions of the application are described clearly and completely below with reference to specific embodiments of the application and the corresponding drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application, without creative effort, fall within the scope of protection of the present application.

The technical solutions provided by the embodiments of the present application are described in detail below with reference to the drawings.

To solve the problem in the prior art that manually writing crawling scripts is inefficient when web page content is captured with a web crawler, an embodiment of the present application provides a script generation method.

The method may be executed by, but is not limited to, a user terminal such as a mobile phone, a tablet computer or a personal computer (PC), by an application (APP) running on such a user terminal, or by a device such as a server.

For ease of description, the implementation of the method is introduced below taking a PC as the executing entity. It should be understood that the PC is only an exemplary executing entity and should not be construed as a limitation of the method.

A specific flowchart of the method is shown in Fig. 2a and includes the following steps:

Step 11: display a web page.

In the embodiments of the present application, when a user wants to capture web page content with a web crawler, the user can access the page's web address through a browser installed on the PC, or through another application with browser functionality, so that the PC displays the web page for subsequent operations. The following description takes a browser as an example.

Specifically, the user can enter the web address in the address input box of the browser and access it; the PC then displays the web page corresponding to that address.

Step 12: determine the web page content that the user selects in the displayed web page.

After the PC displays the web page, the user can select, according to actual needs, the content in the page that he or she wants to capture, so that the browser can determine the web page content selected by the user and carry out the subsequent operations that ultimately generate the crawling script.

The user can select web page content in the page because a first script is present in the page's web page code. The first script provides the function of selecting web page content in the page and includes a Cascading Style Sheets (CSS) script. The first script is usually located at the top or bottom of the page's web page code. If the first script were embedded in the middle of the web page code, the browser might, in subsequent operations, mistake it for part of the page's own code, which would affect the generation of the final crawling script. The first script is therefore typically embedded at the top or bottom of the web page code.

In practice, the first script may be present in the page's web page code for either of two reasons: the web page code returned by the server after the user accesses the address through the browser may already contain it, or the browser may have embedded a preset first script into the web page code returned by the server before determining the content the user selects in the page.

If the web page code returned by the server already contains the first script, this may be because the server, after receiving the browser's request for the page's web page code, embedded the preset first script into the code and then sent the code to the browser. It may also be because the browser's developer agreed with the web page's developer in advance, and the web page's developer embedded the first script when writing the page's code. In either case, the web page code returned by the server contains the first script. The first script does not affect how the browser renders the page.
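
The placement rule for the first script can be illustrated with a minimal Python sketch. It assumes that embedding simply means prepending or appending a snippet to the page's code as text. The names `FIRST_SCRIPT`, `embed_script` and `strip_embedded`, and the CSS rule itself, are hypothetical and are not taken from the application.

```python
# Hypothetical first script: a CSS rule that could style a selection box.
FIRST_SCRIPT = '<style id="selection-helper">.selection-box { outline: 2px solid red; }</style>'


def embed_script(page_code: str, snippet: str, position: str = "bottom") -> str:
    """Embed a preset snippet at the top or bottom of the page code, never in the middle."""
    if position == "top":
        return snippet + "\n" + page_code
    if position == "bottom":
        return page_code + "\n" + snippet
    raise ValueError("position must be 'top' or 'bottom'")


def strip_embedded(page_code: str, snippet: str) -> str:
    """Remove an embedded snippet so it is not mistaken for the page's own code."""
    if page_code.startswith(snippet + "\n"):
        return page_code[len(snippet) + 1:]
    if page_code.endswith("\n" + snippet):
        return page_code[: -(len(snippet) + 1)]
    return page_code


original = "<html><body><p>Fire HD 8</p></body></html>"
embedded = embed_script(original, FIRST_SCRIPT, position="top")
restored = strip_embedded(embedded, FIRST_SCRIPT)
```

Because the snippet sits at a known edge of the code, a simple prefix or suffix check is enough to separate it from the page's own code again, which would be harder if it were placed in the middle.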

Step 13: according to the determined web page content, determine the web page code corresponding to that content.

After step 12 is completed, the browser can determine the web page code corresponding to the web page content determined in step 12.

This is possible because, before determining the web page code corresponding to the content determined in step 12, the browser embeds a preset second script into the page's web page code, so that the browser can use the second script to determine the web page code corresponding to the target content. The second script includes a JavaScript (JS) script.

Alternatively, the web page code returned by the server after the user accesses the page's address in the browser may already contain the second script, so that the browser can use it to determine the web page code corresponding to the web page content. The second script may be present in the returned code because the server, after receiving the browser's request for the page's web page code, embedded the preset second script into the code before sending it to the browser, or because the browser's developer agreed with the web page's developer in advance and the latter embedded the second script when writing the page's code. The second script does not affect how the browser renders the page.

The browser can use the second script to determine the web page code corresponding to the content determined in step 12 as follows:

According to the web page content determined in step 12, the browser uses the second script to determine, in the page's web page code, the minimum DOM tree corresponding to the content, and then determines, within that minimum DOM tree, the HTML attribute values corresponding to the content.

When rendering a web page according to its web page code, the browser establishes a mapping between web page content and web page code, or between the coordinates at which content is located in the page and the web page code. Using this mapping and the web page content determined in step 12, the browser can determine the web page code corresponding to the content, i.e. the minimum DOM tree corresponding to the content. If the content determined in step 12 corresponds to a single minimum DOM tree, the HTML attribute values corresponding to the content are determined within that tree. The HTML attribute values may be the class alone, or the id and the class. Specifically, before determining the HTML attribute values, the browser can display an inquiry page on the screen of the PC, asking the user whether to determine the id and class in the minimum DOM tree or only the class. For example, the page shown in Fig. 2b includes a control for determining the id and class and a control for determining the class. If the user clicks the former, the browser determines the id and class; if the user clicks the latter, the browser determines the class.

If the content determined in step 12 corresponds to at least two minimum DOM trees, the HTML attribute values corresponding to the content are determined in each of those trees. Before determining the HTML attribute values, the browser can display the inquiry page shown in Fig. 2b on the screen of the PC, asking the user whether to determine the id and class in each minimum DOM tree or only the class in each. The user decides which control to click according to actual needs, and the browser determines the corresponding HTML attribute values according to the user's choice.
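
A rough Python sketch of this lookup follows. It makes several assumptions that are not in the application: the selected content is matched by its text rather than through the browser's rendering mapping, the markup is a simplified, well-formed version of the example so that the standard `xml.etree.ElementTree` parser can read it, the class is taken from the innermost matching element, and the id is taken from the nearest enclosing element that has one. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed version of the example markup (assumption: real pages
# would need a tolerant HTML parser rather than an XML parser).
PAGE = """
<div id="kfs_family_16" class="kfs-inner-container kfs-selected">
  <a class="kfs-current kfs-link">
    Fire HD 8
    <br/>
    <span class="kfs-price">$149.99</span>
  </a>
</div>
"""


def contains(element, text):
    """True if the text inside the element includes the selected text."""
    return text in "".join(element.itertext())


def minimal_elements(root, selected_text):
    """Return every element that contains the text while none of its children does."""
    return [
        el for el in root.iter()
        if contains(el, selected_text)
        and not any(contains(child, selected_text) for child in el)
    ]


def attribute_values(root, element, include_id=True):
    """Collect the class of the element and, optionally, the nearest enclosing id."""
    values = {"class": element.get("class")}
    if include_id:
        parents = {child: parent for parent in root.iter() for child in parent}
        node = element
        while node is not None and node.get("id") is None:
            node = parents.get(node)
        values["id"] = node.get("id") if node is not None else None
    return values


root = ET.fromstring(PAGE.strip())
matches = minimal_elements(root, "$149.99")
id_and_class = [attribute_values(root, el) for el in matches]
class_only = [attribute_values(root, el, include_id=False) for el in matches]
```

The `include_id` flag stands in for the user's choice on the inquiry page of Fig. 2b, and returning a list covers the case in which the selected content corresponds to more than one minimum DOM tree.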

After the HTML attribute values are determined, step 14 can be performed to generate the crawling script.

Step 14: generate a crawling script according to the web page code.

After step 13 is completed, the browser can add the determined HTML attribute values, taken from the web page code corresponding to the content the user wants to capture, to a preset script generation template to generate the crawling script. The crawling script is used to capture web page content that matches the HTML attribute values.

If the HTML attribute values determined by the browser are the id and class, the browser adds the id and class determined in each DOM tree to the preset script generation template in the combined form {id=XXX, class=XXX} and generates the crawling script. If the HTML attribute value determined is the class, the browser adds the class determined in each DOM tree to the preset script generation template in the form {class=XXX} and generates the crawling script.
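
The template step can be sketched in Python as follows. The application does not disclose the template's actual format, so `SCRIPT_TEMPLATE`, its `crawl:` header and the function names are assumptions made purely for illustration; only the combination forms {id=XXX, class=XXX} and {class=XXX} come from the text.

```python
from string import Template

# Hypothetical preset script generation template; "$rules" is the placeholder
# that receives one attribute combination per line.
SCRIPT_TEMPLATE = Template("crawl:\n$rules\n")


def format_combination(values):
    """Render one combination as {id=..., class=...} or {class=...}."""
    parts = []
    if values.get("id") is not None:
        parts.append("id=" + values["id"])
    parts.append("class=" + values["class"])
    return "{" + ", ".join(parts) + "}"


def generate_crawl_script(combinations):
    """Add each combination to the template and return the generated script text."""
    rules = "\n".join(format_combination(v) for v in combinations)
    return SCRIPT_TEMPLATE.substitute(rules=rules)


with_id = generate_crawl_script([{"id": "kfs_family_16", "class": "kfs-price"}])
class_only = generate_crawl_script([{"class": "kfs-price"}])
```

Keeping the id and class of one DOM tree in the same combination is what later lets a parsing engine pair each class with the right id.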

After generating the crawling script, the browser can store it locally. The browser can also store locally the minimum DOM tree corresponding to the content determined in step 12 and the entire web page code, so that it can use the crawling script, the minimum DOM tree and the web page code to capture web page content in subsequent operations.

After the crawling script is generated, the browser can pop up a page on the PC to inform the user that the crawling script has been generated and to ask whether to capture web page content.

For example, the page may be as shown in Fig. 2c. It includes a first capture control and a second capture control. If the user clicks the first capture control, the browser sends the crawling script and the minimum DOM tree corresponding to the content determined in step 12 to the parsing engine. If the crawling script includes an id and a class, the parsing engine finds the minimum DOM tree containing that id and then, according to the class that is in the same combination as the id, extracts from the minimum DOM tree the web page content the user wants to capture. For example, suppose the minimum DOM tree corresponding to "price: $149.99", which the user wants to capture from the web page shown in Fig. 1, is determined to be:

<div id="kfs_family_16" class="kfs-inner-container kfs-selected" style="width:20%;left:40%;background-image:url(https://images-na.ssl-images-amazon.com/images/G/01/kindle/stripe/kfs-selector-2._CB386844303_.gif);" onClick="javascript:(function(){})()">
<a class="kfs-current kfs-link">
<img class="kfs-img" style="margin-top:9px;" src="https://images-na.ssl-images-amazon.com/images/G/01/kindle/dp/2015/848470/famnav/fs-m._CB292709393_.png"/>
<br/>
Fire HD 8
<br/>
<span class="kfs-price">
$149.99
</span>
<br/>
</a>
<div id="kfs_popover_content_16" class="kfs-popover-container" style="display:none;">Incredibly thin and light,designed for entertainment</div>

The crawling script for capturing "price: $149.99" then includes id="kfs_family_16" and the class="kfs-price" corresponding to "price: $149.99".

When the user clicks the first capture control, the browser sends the above minimum DOM tree and the crawling script together to the parsing engine. The parsing engine finds the minimum DOM tree containing id="kfs_family_16" and then extracts the price information "price: $149.99" from that tree according to class="kfs-price".

If the crawling script includes only a class and no id, the browser can extract, according to the class, the web page content matching the class from all of the minimum DOM trees sent to the parsing engine.

If the user clicks the second capture control, the browser sends the crawling script and the page's web page code to the parsing engine. If the crawling script includes an id and a class, the parsing engine finds the minimum DOM tree containing that id and then, according to the class in the same combination as the id, extracts from the minimum DOM tree the content the user wants to capture.

If the crawling script includes only a class and no id, the browser can extract, according to the class, the content matching the class from the page's web page code.
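
The two matching modes can be sketched in Python under stated assumptions: the page code is well-formed enough for `xml.etree.ElementTree`, a rule is represented as a plain dictionary, and the second list entry with its price is made up for the example. The function `parse` is a hypothetical stand-in for the parsing engine, whose real interface the application does not describe.

```python
import xml.etree.ElementTree as ET

# Illustrative page code; the second entry and its price are made up for the example.
PAGE_CODE = """
<body>
  <div id="kfs_family_16"><span class="kfs-price">$149.99</span></div>
  <div id="kfs_family_17"><span class="kfs-price">$49.99</span></div>
</body>
"""


def parse(page_code, rule):
    """Return the text of every element matching the rule's class, scoped by id if given."""
    root = ET.fromstring(page_code.strip())
    if "id" in rule:
        # Locate the subtree(s) carrying the id first, then search only inside them.
        scopes = [el for el in root.iter() if el.get("id") == rule["id"]]
    else:
        # No id in the rule: every element in the page code is a candidate.
        scopes = [root]
    results = []
    for scope in scopes:
        for el in scope.iter():
            if rule["class"] in (el.get("class") or "").split():
                results.append("".join(el.itertext()).strip())
    return results


by_id_and_class = parse(PAGE_CODE, {"id": "kfs_family_16", "class": "kfs-price"})
by_class_only = parse(PAGE_CODE, {"class": "kfs-price"})
```

With an id in the rule, only the content under that id is returned; with a class alone, every element carrying the class matches, which is the broader behaviour described for class-only scripts.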

It should be noted that the steps of the method provided by the embodiments may be executed by the same entity or by different entities. For example, after the browser completes step 13, it can send the determined web page code and HTML attribute values to a server, so that the server generates the crawling script according to the web page code. In addition, the browser installed on a PC is only an exemplary illustration; the executing entity may also be another application with browser functionality installed on the PC, or an APP with browser functionality on a mobile terminal. The present application places no limitation on this.

When the browser performs step 12, in one embodiment, the browser starts to determine the box-selected web page content as soon as the user starts box-selecting in the page. Alternatively, after the user finishes box-selecting, the selected target content is framed by a rectangular box, and controls for continuing to box-select, submitting and cancelling are displayed in the page. The target content framed by the rectangular box may be highlighted, or may be displayed as it was when the page was first displayed; this can be configured according to user needs, and the embodiments place no limitation on it. For example, the page shown in Fig. 2d is displayed after the user box-selects the price information of a commodity on a shopping website. The price information is framed by a rectangular box, and the controls for continuing to box-select, submitting and cancelling are displayed to the right of the price information. The framed price information is not highlighted but is shown in its initial display state.

After the controls for continuing to box-select, submitting and cancelling appear in the page, the user can click the continue control to box-select further content. If the user does not want to select anything else, the user can click the submit control, and the browser determines the box-selected web page content as the target content. If the user wants to cancel the current selection and box-select other content, the user clicks the cancel control and can then box-select again.
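
The behaviour of the three controls can be modelled with a small Python sketch. The class `SelectionSession` and its method names are hypothetical; the sketch assumes that "continue" keeps the current selection, "cancel" discards it, and "submit" turns everything kept so far into the target content, which is one plausible reading of the paragraph above.

```python
class SelectionSession:
    """Tracks box selections and the continue / submit / cancel controls."""

    def __init__(self):
        self.kept = []        # selections confirmed with "continue"
        self.current = None   # the selection currently framed by the rectangle
        self.target = None    # set once the user submits

    def box_select(self, content):
        """Record the content the user has just framed."""
        self.current = content

    def continue_selecting(self):
        """Keep the current selection and let the user select more content."""
        if self.current is not None:
            self.kept.append(self.current)
            self.current = None

    def cancel(self):
        """Discard the current selection so the user can select again."""
        self.current = None

    def submit(self):
        """Finish selecting; everything kept so far becomes the target content."""
        self.continue_selecting()
        self.target = list(self.kept)
        return self.target


session = SelectionSession()
session.box_select("price: $149.99")
session.continue_selecting()
session.box_select("an unwanted banner")
session.cancel()
session.box_select("Fire HD 8")
target = session.submit()
```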

In addition, in the script generation method provided by the embodiments, when selecting the content to capture, the user can first make a rough selection, and the browser determines the minimum DOM tree corresponding to the content of this first selection. The user then makes a second selection within the content of the first selection, and the browser determines, in the minimum DOM tree already determined, the HTML attribute values in the web page code corresponding to the content of the second selection. For example, as shown in Fig. 2e, if the user wants to capture the price information "$175" in a web page, the user can, in the first selection, roughly select the web page content that contains "$175", and the browser determines the minimum DOM tree corresponding to that content. In the second selection, the user selects only "$175", and the browser determines the HTML attribute values corresponding to "$175" within that minimum DOM tree. Fig. 2e shows two rectangular boxes: the content in the larger box is the user's first selection, and the content in the smaller box is the user's second selection.
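
A minimal Python sketch of the two-pass selection follows. The listing markup, the item names and all function names are invented for illustration, and text matching again stands in for the browser's mapping between rendered content and code. The sketch assumes well-formed markup so that `xml.etree.ElementTree` can parse it.

```python
import xml.etree.ElementTree as ET

# Made-up listing in which the same price appears twice.
PAGE = """
<ul>
  <li id="item_1"><b class="name">Sample item</b> <span class="price">$175</span></li>
  <li id="item_2"><b class="name">Other item</b> <span class="price">$175</span></li>
</ul>
"""


def text_of(element):
    """Text inside the element with runs of whitespace collapsed to single spaces."""
    return " ".join("".join(element.itertext()).split())


def smallest_containing(scope, text):
    """Last match in document order, which has no descendant that also matches."""
    found = None
    for el in scope.iter():
        if text in text_of(el):
            found = el
    return found


def two_pass_select(root, rough_text, precise_text):
    """First pass picks the minimum subtree; second pass searches only inside it."""
    subtree = smallest_containing(root, rough_text)
    if subtree is None:
        return None
    element = smallest_containing(subtree, precise_text)
    if element is None:
        return None
    return {"id": subtree.get("id"), "class": element.get("class")}


root = ET.fromstring(PAGE.strip())
result = two_pass_select(root, "Sample item $175", "$175")
```

Searching for "$175" over the whole page would match both list items; restricting the second pass to the subtree found by the rough selection ties the price to a single item.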

In the embodiments of the present application, the script generation method can also be implemented by a script generation device.

Fig. 3 is a structural diagram of the script generation device provided by an embodiment of the present application. The device mainly includes the following modules:

Content determination module 31, which determines the web page content that the user selects in the displayed web page.

Code determination module 32, which determines, according to the determined web page content, the web page code corresponding to that content.

Script generation module 33, which generates a crawling script according to the web page code.

In one embodiment, the device further includes:

A first embedding module, which embeds a preset first script into the page's web page code before content determination module 31 determines the web page content that the user selects in the displayed web page. The first script provides the function of selecting web page content in the page and includes a CSS script.

In one embodiment, the device further includes:

A second embedding module, which embeds a preset second script into the page's web page code before code determination module 32 determines the web page code corresponding to the determined web page content. The second script includes a JS script.

Code determination module 32 then determines, according to the determined web page content and by means of the second script, the web page code corresponding to that content.

In one embodiment, code determination module 32 determines, in the page's web page code, the minimum Document Object Model (DOM) tree corresponding to the web page content,

and determines, in the minimum DOM tree, the HTML attribute values corresponding to the web page content.

In one embodiment, script generation module 33 adds the determined HTML attribute values to a preset script generation template to generate the crawling script, which is used to capture web page content matching the HTML attribute values.

In one embodiment, the device further includes:

A content parsing module, which sends the crawling script and the web page code to the parsing engine and captures the corresponding web page content through the parsing engine.
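
How modules 31, 32 and 33 hand data to one another can be sketched in Python as below. The class `ScriptGenerationDevice`, its field names and the stub lambdas are all hypothetical; the stubs only stand in for the real modules so that the chaining can be shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ScriptGenerationDevice:
    """Chains stand-ins for modules 31, 32 and 33."""
    determine_content: Callable[[str], str]               # content determination module 31
    determine_code: Callable[[str, str], Dict[str, str]]  # code determination module 32
    generate_script: Callable[[Dict[str, str]], str]      # script generation module 33

    def run(self, page_code: str, user_selection: str) -> str:
        content = self.determine_content(user_selection)
        attributes = self.determine_code(page_code, content)
        return self.generate_script(attributes)


# Stub modules, for illustration only.
device = ScriptGenerationDevice(
    determine_content=lambda selection: selection.strip(),
    determine_code=lambda page_code, content: (
        {"class": "kfs-price"} if content in page_code else {}
    ),
    generate_script=lambda attrs: "{" + ", ".join(f"{k}={v}" for k, v in attrs.items()) + "}",
)
script = device.run('<span class="kfs-price">$149.99</span>', "  $149.99 ")
```

Passing the modules in as callables mirrors the note that the steps may be executed by different entities: any one of them could be replaced by a call to a server without changing the others.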

During with capturing web page contents using web crawlers in the prior art, needing manual compiling to capture script and comparing, using this The scenario generation method that application embodiment is provided, by determining the web page contents that user selectes in webpage, determines the webpage The corresponding web page code of content, and according to web page code generation crawl script, network is utilized in the prior art so as to solve During crawler capturing web page contents, the less efficient problem of manual compiling crawl script.

It should be understood by those skilled in the art that, embodiments of the invention can be provided as method, system or computer program Product.Therefore, the present invention can be using the reality in terms of complete hardware embodiment, complete software embodiment or combination software and hardware Apply the form of example.And, the present invention can be used and wherein include the computer of computer usable program code at one or more The computer program implemented in usable storage medium (including but not limited to magnetic disk storage, CD-ROM, optical memory etc.) is produced The form of product.

The present invention is the flow with reference to method according to embodiments of the present invention, equipment (system) and computer program product Figure and/or block diagram are described.It should be understood that every first-class during flow chart and/or block diagram can be realized by computer program instructions The combination of flow and/or square frame in journey and/or square frame and flow chart and/or block diagram.These computer programs can be provided The processor of all-purpose computer, special-purpose computer, Embedded Processor or other programmable data processing devices is instructed to produce A raw machine so that produced for reality by the instruction of computer or the computing device of other programmable data processing devices The device of the function of being specified in present one flow of flow chart or multiple one square frame of flow and/or block diagram or multiple square frames.

These computer program instructions may be alternatively stored in can guide computer or other programmable data processing devices with spy In determining the computer-readable memory that mode works so that instruction of the storage in the computer-readable memory is produced and include finger Make the manufacture of device, the command device realize in one flow of flow chart or multiple one square frame of flow and/or block diagram or The function of being specified in multiple square frames.

These computer program instructions can be also loaded into computer or other programmable data processing devices so that in meter Series of operation steps is performed on calculation machine or other programmable devices to produce computer implemented treatment, so as in computer or The instruction performed on other programmable devices is provided for realizing in one flow of flow chart or multiple flows and/or block diagram one The step of function of being specified in individual square frame or multiple square frames.

In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.

Memory may include forms of computer-readable media such as volatile memory, random access memory (RAM), and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, commodity, or device that includes that element.

Those skilled in the art will understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) that contain computer-usable program code.

The foregoing describes merely embodiments of the present application and is not intended to limit the present application. For those skilled in the art, the present application may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principle of the present application shall fall within the scope of the claims of the present application.

Claims (12)

1. A script generation method, characterized in that the method comprises:
determining web page content selected by a user in a displayed web page;
determining, according to the determined web page content, the web page code corresponding to the web page content; and
generating a crawl script according to the web page code.
2. The method of claim 1, characterized in that, before determining the web page content selected by the user in the displayed web page, the method further comprises:
embedding a preset first script into the web page code of the web page, wherein the first script is used to provide a function of selecting web page content in the web page, and the first script comprises a Cascading Style Sheets (CSS) script.
3. The method of claim 1, characterized in that, before determining, according to the determined web page content, the web page code corresponding to the web page content, the method further comprises:
embedding a preset second script into the web page code of the web page, wherein the second script comprises a JavaScript (JS) script;
and determining, according to the determined web page content, the web page code corresponding to the web page content specifically comprises:
determining, according to the determined web page content and by means of the second script, the web page code corresponding to the web page content.
4. The method of claim 1, characterized in that determining the web page code corresponding to the web page content specifically comprises:
determining, in the web page code of the web page, the minimal Document Object Model (DOM) tree corresponding to the web page content; and
determining, in the minimal DOM tree, the HyperText Markup Language (HTML) attribute value corresponding to the web page content.
5. The method of claim 4, characterized in that generating the crawl script according to the web page code specifically comprises:
adding the determined HTML attribute value to a preset script generation template to generate the crawl script, wherein the crawl script is used to capture web page content that matches the HTML attribute value.
6. The method of claim 5, characterized in that the method further comprises:
sending the crawl script and the web page code to an analytics engine, and capturing the corresponding web page content by means of the analytics engine.
7. A script generation device, characterized in that the device comprises:
a content determination module, configured to determine web page content selected by a user in a displayed web page;
a code determining module, configured to determine, according to the determined web page content, the web page code corresponding to the web page content; and
a script generation module, configured to generate a crawl script according to the web page code.
8. The device of claim 7, characterized in that the device further comprises:
a first embedding module, configured to embed a preset first script into the web page code of the web page before the content determination module determines the web page content selected by the user in the displayed web page, wherein the first script is used to provide a function of selecting web page content in the web page, and the first script comprises a Cascading Style Sheets (CSS) script.
9. The device of claim 7, characterized in that the device further comprises:
a second embedding module, configured to embed a preset second script into the web page code of the web page before the code determining module determines, according to the determined web page content, the web page code corresponding to the web page content, wherein the second script comprises a JavaScript (JS) script;
the code determining module then determines, according to the determined web page content and by means of the second script, the web page code corresponding to the web page content.
10. The device of claim 7, characterized in that the code determining module determines, in the web page code of the web page, the minimal Document Object Model (DOM) tree corresponding to the web page content; and
determines, in the minimal DOM tree, the HyperText Markup Language (HTML) attribute value corresponding to the web page content.
11. The device of claim 10, characterized in that the script generation module adds the determined HTML attribute value to a preset script generation template to generate the crawl script, wherein the crawl script is used to capture web page content that matches the HTML attribute value.
12. The device of claim 11, characterized in that the device further comprises:
a content parsing module, configured to send the crawl script and the web page code to an analytics engine, and to capture the corresponding web page content by means of the analytics engine.
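
For illustration only, the flow recited in claims 1 and 4 to 6 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the application: the selected content is identified by its text rather than through the embedded CSS and JS scripts, the crawl script is a small JSON rule, and all names (Node, TreeBuilder, minimal_subtree, SCRIPT_TEMPLATE, generate_crawl_script, run_crawl_script) are hypothetical. The function run_crawl_script merely stands in for the analytics engine.

```python
import json
from html.parser import HTMLParser
from string import Template

# Tags that never have a closing tag (partial list, enough for the sketch).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}

# Hypothetical "script generation template": a JSON rule with two slots.
SCRIPT_TEMPLATE = Template('{"tag": $tag, "attrs": $attrs}')


class Node:
    """One element of a very small DOM-like tree."""

    def __init__(self, tag, attrs, parent):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []  # Node instances or plain text strings

    def text(self):
        return "".join(c if isinstance(c, str) else c.text() for c in self.children)


class TreeBuilder(HTMLParser):
    """Builds the tree from web page code using only the standard library."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document", [], None)
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def parse(page_html):
    builder = TreeBuilder()
    builder.feed(page_html)
    builder.close()
    return builder.root


def minimal_subtree(node, selected_text):
    """Return the deepest element whose text still contains the selection."""
    for child in node.children:
        if isinstance(child, Node) and selected_text in child.text():
            return minimal_subtree(child, selected_text)
    return node


def generate_crawl_script(page_html, selected_text):
    """Fill the template with the tag and HTML attribute values of the selection."""
    root = parse(page_html)
    if selected_text not in root.text():
        raise ValueError("selected content not found in the web page code")
    node = minimal_subtree(root, selected_text)
    return SCRIPT_TEMPLATE.substitute(
        tag=json.dumps(node.tag), attrs=json.dumps(node.attrs)
    )


def run_crawl_script(script, page_html):
    """Stand-in for the analytics engine: capture all content matching the rule."""
    rule = json.loads(script)
    matches = []

    def walk(node):
        if node.tag == rule["tag"] and all(
            node.attrs.get(k) == v for k, v in rule["attrs"].items()
        ):
            matches.append(node.text().strip())
        for child in node.children:
            if isinstance(child, Node):
                walk(child)

    walk(parse(page_html))
    return matches


PAGE = """
<html><body>
  <div id="list">
    <p class="price">19.90</p>
    <p class="title">Blue kettle</p>
    <p class="price">24.50</p>
  </div>
</body></html>
"""

script = generate_crawl_script(PAGE, "19.90")
print(script)                          # {"tag": "p", "attrs": {"class": "price"}}
print(run_crawl_script(script, PAGE))  # ['19.90', '24.50']
```

In this sketch, selecting a single price yields a rule that also captures the other element carrying the same attribute value, which is the sense in which the generated script captures web page content that matches the HTML attribute value. Matching on the tag name plus all attributes is a choice made for the example; the claims do not prescribe a matching rule.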
CN201610551151.5A 2016-07-13 2016-07-13 A kind of scenario generation method and device CN106886547A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610551151.5A CN106886547A (en) 2016-07-13 2016-07-13 A kind of scenario generation method and device

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201610551151.5A CN106886547A (en) 2016-07-13 2016-07-13 A kind of scenario generation method and device
TW106119133A TWI683225B (en) 2016-07-13 2017-06-08 Script generation method and device
PCT/CN2017/091674 WO2018010573A1 (en) 2016-07-13 2017-07-04 Method and device for generating script

Publications (1)

Publication Number Publication Date
CN106886547A true CN106886547A (en) 2017-06-23

Family

ID=59176754

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610551151.5A CN106886547A (en) 2016-07-13 2016-07-13 A kind of scenario generation method and device

Country Status (3)

Country Link
CN (1) CN106886547A (en)
TW (1) TWI683225B (en)
WO (1) WO2018010573A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018010573A1 (en) * 2016-07-13 2018-01-18 阿里巴巴集团控股有限公司 Method and device for generating script
CN107609150A (en) * 2017-08-28 2018-01-19 湖北省楚天云有限公司 A kind of interactive network reptile creation method chosen based on page elements and system
WO2019019344A1 (en) * 2017-07-26 2019-01-31 上海壹账通金融科技有限公司 Webpage data crawling method and device, user terminal, and readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101894138A (en) * 2010-06-25 2010-11-24 优视科技有限公司 Visual page content subscription processing method and system thereof
CN105243159A (en) * 2015-10-28 2016-01-13 福建亿榕信息技术有限公司 Visual script editor-based distributed web crawler system
CN105468730A (en) * 2015-11-20 2016-04-06 广州华多网络科技有限公司 Webpage information extraction method and equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EA201301239A1 (en) * 2013-10-28 2015-04-30 Общество С Ограниченной Ответственностью "Параллелз" Method for placing a network site using virtual hosting
CN106886547A (en) * 2016-07-13 2017-06-23 阿里巴巴集团控股有限公司 A kind of scenario generation method and device

Also Published As

Publication number Publication date
WO2018010573A1 (en) 2018-01-18
TW201804340A (en) 2018-02-01
TWI683225B (en) 2020-01-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201014

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co., Ltd

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

Effective date of registration: 20201014

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co., Ltd