CN103377260B - Method and device for analyzing URLs in web logs - Google Patents

Info

Publication number
CN103377260B
Authority
CN
China
Prior art keywords
url
duplicate removal
regular expression
numbering
log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210133170.8A
Other languages
Chinese (zh)
Other versions
CN103377260A (en)
Inventor
张清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taobao China Software Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201210133170.8A priority Critical patent/CN103377260B/en
Publication of CN103377260A publication Critical patent/CN103377260A/en
Application granted granted Critical
Publication of CN103377260B publication Critical patent/CN103377260B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

This application provides a method and device for analyzing URLs in web logs. The method includes: extracting the URLs from a web log; deduplicating the URLs; applying a plurality of preset regular expressions in turn to the deduplicated URLs and extracting the number of each regular expression that matches a deduplicated URL; for each URL before deduplication, copying the regular-expression numbers of the identical deduplicated URL as its corresponding regular-expression numbers; and counting the different regular-expression numbers corresponding to the URLs before deduplication. The application reduces the amount of regular-expression matching and therefore the computation cost.

Description

Method and device for analyzing URLs in web logs
Technical field
The present application relates to the technical field of data processing, and in particular to a method and device for analyzing URLs in web logs.
Background
Business analysis often involves various kinds of analysis and mining over massive web logs. The URLs in a web log carry important information about visitor access, so the URLs usually need to be matched against regular expressions, and business analysis is then carried out on the category to which each matched regular expression belongs.
In the prior art, the processing of web-log URLs is divided into three steps:
1. Collect the massive web logs and store the raw data;
2. Match the URLs against the regular expressions; each URL may match more than one expression (generally in the range of 1 to 10);
3. According to the business category corresponding to each matched expression, output the subsequent data-index analysis for that business category.
Assuming that the original web log has n records and that there are m regular expressions to match, the matching process actually performs n × m matches.
The problem with the prior art is that regular-expression matching of URLs is relatively complex and the number of web-log records of a large Internet site is massive. Applying many regular-expression rules one by one to a massive number of URLs involves a very large amount of computation and a high computation cost.
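The prior-art baseline described above can be sketched as follows. This is only an illustration: the two patterns and the function name match_all are assumptions introduced here, since the source does not list its actual regular expressions, and the URLs are the ones used in the example later in the description.

```python
import re

# Hypothetical numbered patterns; the source does not give its real expressions.
PATTERNS = {"men": re.compile(r"men"), "women": re.compile(r"women")}


def match_all(urls, patterns=PATTERNS):
    """Match every log URL against every pattern (the n x m baseline)."""
    rows, attempts = [], 0
    for url in urls:
        for number, pattern in patterns.items():
            attempts += 1  # one match attempt per (URL, pattern) pair
            if pattern.search(url):
                rows.append((url, number))
    return rows, attempts


log_urls = (["http://men.taobao.com/123456"] * 3
            + ["http://women.taobao.com/123456"] * 3)
rows, attempts = match_all(log_urls)
print(attempts)  # n * m = 6 * 2 match attempts
```

Every repeated URL is matched again from scratch, which is the cost the application sets out to avoid.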
Therefore, the technical problem to be solved by the present application is to provide a mechanism for analyzing URLs in web logs that reduces the amount of regular-expression matching and the computation cost.
Summary of the invention
The technical problem to be solved by the present application is to provide a method for analyzing URLs in web logs that reduces the amount of regular-expression matching and the computation cost.
The present application also provides a device for analyzing URLs in web logs, to ensure the application and implementation of the above method in practice.
To solve the above problem, the present application discloses a method for analyzing URLs in web logs, including:
extracting the URLs from a web log;
deduplicating the URLs;
applying a plurality of preset regular expressions in turn to the deduplicated URLs, and extracting the number of each regular expression that matches a deduplicated URL;
for each URL before deduplication, copying the regular-expression numbers of the identical deduplicated URL as its corresponding regular-expression numbers;
counting the different regular-expression numbers corresponding to the URLs before deduplication.
Preferably, the URLs before and after deduplication are stored as columns in a first table and a second table respectively, and the regular-expression numbers corresponding to the deduplicated URLs are stored alongside them in the second table.
Preferably, the step of finding, for every URL before deduplication, the regular expressions corresponding to the identical URL among the deduplicated URLs, and taking them as its corresponding regular expressions, includes:
converting the rows of the second table into columns;
performing an equi-join on the URL columns of the first table and the second table, so that every URL before deduplication finds its corresponding regular-expression numbers.
Preferably, the regular-expression numbers corresponding to the URLs before deduplication are added to the first table alongside those URLs.
Preferably, the regular-expression numbers corresponding to the URLs before deduplication replace the corresponding URLs in the first table.
Preferably, the step of counting the different regular-expression numbers corresponding to the URLs before deduplication is: calculating, for each different regular-expression number, the number of times it occurs across all URLs before deduplication.
Preferably, the number of a regular expression is the number of the business category to which it belongs.
The present application also provides a device for analyzing URLs in web logs, including:
a URL extraction module, for extracting the URLs from a web log;
a URL deduplication module, for deduplicating the URLs;
a regular-expression matching module, for applying a plurality of preset regular expressions in turn to the deduplicated URLs and extracting the number of each regular expression that matches a deduplicated URL;
a matching-result copying module, for copying, for each URL before deduplication, the regular-expression numbers of the identical deduplicated URL as its corresponding regular-expression numbers;
a statistics module, for counting the different regular-expression numbers corresponding to the URLs before deduplication.
Preferably, the URLs before and after deduplication are stored as columns in a first table and a second table respectively, and the regular-expression numbers corresponding to the deduplicated URLs are stored alongside them in the second table.
Preferably, the matching-result copying module includes:
a row-to-column submodule, for converting the rows of the second table into columns;
an equi-join submodule, for performing an equi-join on the URL columns of the first table and the second table, so that every URL before deduplication finds its corresponding regular-expression numbers.
Compared with the prior art, the present application has the following advantages:
According to the present application, the repeated URLs in the massive web logs are removed first, and regular-expression matching is performed only on the deduplicated URLs. Because URLs are accessed repeatedly a very large number of times in massive logs, after deduplication each identical URL incurs the matching cost only once, and the matching result of a deduplicated URL yields the regular expressions corresponding to all identical URLs. The computation cost of regular-expression matching of URLs can therefore be kept very effectively to a minimum.
The present application can store the URLs before and after deduplication in tables and perform an equi-join on the URL columns, which finds the correspondence between every URL before deduplication and its regular expressions. Compared with the non-equi-join that regular-expression matching amounts to, this reduces the computation cost. Moreover, when the equi-join is performed, the regular-expression numbers can be chosen to replace the corresponding URLs in the table, so that the result shows only the numbers of the matched expressions. Compared with keeping the URLs, this greatly reduces the column width of the table and occupies fewer resources.
Brief description of the drawings
Fig. 1 is a flow chart of an embodiment of the method of the present application for analyzing URLs in web logs;
Fig. 2 is a structural block diagram of an embodiment of the device of the present application for analyzing URLs in web logs.
Detailed description
To make the above objects, features and advantages of the present application more apparent and easier to understand, the application is described in further detail below with reference to the drawings and specific embodiments.
Referring to Fig. 1, a flow chart of an embodiment of the method of the present application for analyzing URLs in web logs is shown, which may specifically include the following steps:
Step 101: extract the URLs from the web log.
A web log is a file ending in .log that records raw information such as the requests received and processed by a web server and its run-time errors; strictly speaking, it is a server log. The web log contains the URLs of the web pages that visitors requested.
A URL consists of three parts: protocol, domain name and request address. A complete URL uniquely identifies one requested resource, which may be a page, a content module, a file, a multimedia resource and so on. For a website, a URL serves to locate a resource uniquely, and this can be done in many ways: with a unique description of the resource (its name or abbreviation), a unique identifier of the resource (an ID, a numeric marker and so on), or dynamic parameters. The information extracted from a URL therefore shows which web content a visitor accessed, and analyzing the URLs in massive logs shows how the various web resources are accessed, for example how many times and how frequently.
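As a rough illustration of step 101 and of the three URL parts, the sketch below pulls a URL out of one log line and splits it with Python's standard urllib.parse.urlsplit. The log line layout, the pattern URL_RE and the function extract_url are assumptions made for this example; the source does not specify a log format, and real server logs vary.

```python
import re
from urllib.parse import urlsplit

# Assumed layout: the requested URL appears as an absolute http(s) address
# somewhere in the line. This is an illustration, not the source's format.
URL_RE = re.compile(r'https?://[^\s"]+')


def extract_url(log_line):
    """Return the first absolute URL found in a log line, or None."""
    found = URL_RE.search(log_line)
    return found.group(0) if found else None


line = '203.0.113.7 [28/Apr/2012:10:00:00 +0800] "GET http://men.taobao.com/123456" 200'
url = extract_url(line)
parts = urlsplit(url)
# scheme = protocol, netloc = domain name, path = request address
print(parts.scheme, parts.netloc, parts.path)
```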
Step 102: deduplicate the URLs.
A single URL may be accessed many times in one day, so massive web logs contain a large number of repeated URLs. Deduplication removes the repeated addresses in the web log so that the retained URLs all differ from one another. Deduplication can be performed by extracting the distinct URLs from all URLs, or by placing the URLs into a table one by one: before a URL is stored, the table is checked for an identical address; if none exists, the URL is added, otherwise it is not.
In a preferred embodiment of the present application, the URLs before and after deduplication may be stored as columns in the first table and the second table respectively, as shown in the following example.
The first table is:
A
http://men.taobao.com/123456
http://men.taobao.com/123456
http://men.taobao.com/123456
http://women.taobao.com/123456
http://women.taobao.com/123456
http://women.taobao.com/123456
Here the URL http://men.taobao.com/123456 occurs 3 times and the URL http://women.taobao.com/123456 also occurs 3 times, so the second table obtained after deduplication is:
D
http://men.taobao.com/123456
http://women.taobao.com/123456
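The sequential variant of deduplication described in step 102 (check the table before storing each URL) might look like the following sketch. The function name dedupe_keep_order and the use of an in-memory set for the existence check are choices made here for illustration; on a data-warehouse platform the same effect would normally come from a distinct-style query.

```python
def dedupe_keep_order(urls):
    """Add each URL to the table only if an identical one is not already there."""
    seen = set()   # fast membership check for URLs already stored
    table = []     # deduplicated URLs, in first-seen order
    for url in urls:
        if url not in seen:
            seen.add(url)
            table.append(url)
    return table


# Column A of the first table from the example above.
column_a = (["http://men.taobao.com/123456"] * 3
            + ["http://women.taobao.com/123456"] * 3)
column_d = dedupe_keep_order(column_a)
print(column_d)
```

Extracting the distinct URLs in one pass, for instance with set(column_a), gives the same set of URLs but does not preserve their order.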
Step 103: apply a plurality of preset regular expressions in turn to the deduplicated URLs, and extract the number of each regular expression that matches a deduplicated URL.
As is well known, a regular expression is a tool for text matching, usually composed of ordinary characters and metacharacters. Ordinary characters include upper- and lower-case letters and digits, while metacharacters have special meanings. Matching a regular expression means finding, in a given string, the parts that conform to the expression. A string may contain more than one such part, and each is called a match. Here, a URL is matched against preset regular expressions that contain keywords: a match indicates that the URL contains the keyword in the expression, and no match indicates that it does not. Matching a URL against multiple regular expressions reveals the information, or the category of information, contained in the URL.
In a preferred embodiment of the present application, the regular-expression numbers corresponding to the deduplicated URLs may be stored alongside them in the second table. Specifically, the number of a regular expression may be the number of the business category to which it belongs.
Continuing the example above, the result after matching is shown in the following table:
D                                 E
http://men.taobao.com/123456      men
http://women.taobao.com/123456    men
http://women.taobao.com/123456    women
Matching http://men.taobao.com/123456 against the preset regular expressions shows that it matches the regular expression numbered men. http://women.taobao.com/123456 contains both the keyword men and the keyword women, so it matches the regular expressions numbered men and women.
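A minimal sketch of step 103 on the two deduplicated URLs is given below. The patterns are assumed to be plain keyword expressions, and NUMBERED_PATTERNS and match_distinct are names introduced for this example; the source only states that the preset expressions contain keywords.

```python
import re

# Hypothetical numbered expressions; each number doubles as a business category.
NUMBERED_PATTERNS = [
    ("men", re.compile(r"men")),
    ("women", re.compile(r"women")),
]


def match_distinct(distinct_urls, numbered_patterns=NUMBERED_PATTERNS):
    """Return one (url, number) row per matching expression (columns D and E)."""
    return [
        (url, number)
        for url in distinct_urls
        for number, pattern in numbered_patterns
        if pattern.search(url)
    ]


column_d = ["http://men.taobao.com/123456", "http://women.taobao.com/123456"]
match_rows = match_distinct(column_d)
for row in match_rows:
    print(row)
```

The women URL yields two rows because the keyword men is a substring of women, which is the behaviour the example table shows.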
Step 104: for each URL before deduplication, copy the regular-expression numbers of the identical deduplicated URL as its corresponding regular-expression numbers.
Every URL before deduplication has an identical URL among the deduplicated URLs, so it can take the regular-expression numbers of that identical URL as its own. Because the present application performs regular-expression matching only on the deduplicated URLs, the workload is greatly reduced compared with matching every URL one by one as in the prior art. In the example above, the prior art has to match all 6 URLs one by one, whereas after deduplication only 2 URLs need to be matched, and the matching results are then mapped back to the 6 URLs.
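The saving in this example can be written as a rough count of match attempts. The symbol u (the number of distinct URLs) is introduced here for illustration and does not appear in the source; n and m are the record and expression counts from the background section. The count deliberately ignores the cost of deduplication and of the later equi-join.

```latex
C_{\text{prior}} = n \times m, \qquad
C_{\text{dedup}} = u \times m \quad (u \le n), \qquad
\frac{C_{\text{prior}}}{C_{\text{dedup}}} = \frac{n}{u}
```

With n = 6 and u = 2 as in the example, the prior art performs 6m match attempts and the deduplicated approach 2m, a factor of 3 regardless of m.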
In a specific implementation, once the URLs before and after deduplication and the matching results have been placed in tables, step 104 may include:
Sub-step S11: convert the rows of the second table into columns.
Sub-step S12: perform an equi-join on the URL columns of the first table and the second table, so that every URL before deduplication finds its corresponding regular-expression numbers.
After matching, the regular-expression numbers corresponding to each URL are stored in a column. The numbers corresponding to each URL can be stored in a single row, ordered by size, as follows:
D                                 E      F
http://men.taobao.com/123456      men
http://women.taobao.com/123456    men    women
An equi-join is then performed on column A and column D, which associates the regular-expression numbers in columns E and F (and in any further number columns, such as G) with the URLs in column A.
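Sub-steps S11 and S12 can be sketched in plain Python as follows. This is an in-memory illustration only: pivot_rows and equi_join are names chosen here, and a dictionary lookup stands in for the equi-join that a data-warehouse platform would perform on the URL columns.

```python
from collections import defaultdict


def pivot_rows(match_rows):
    """S11: gather the (url, number) rows into one row per URL, numbers sorted."""
    pivoted = defaultdict(list)
    for url, number in match_rows:
        pivoted[url].append(number)
    return {url: sorted(numbers) for url, numbers in pivoted.items()}


def equi_join(column_a, pivoted):
    """S12: attach numbers to every pre-dedup URL by exact URL equality."""
    return [(url, pivoted.get(url, [])) for url in column_a]


men = "http://men.taobao.com/123456"
women = "http://women.taobao.com/123456"
match_rows = [(men, "men"), (women, "men"), (women, "women")]  # columns D, E
column_a = [men] * 3 + [women] * 3                              # first table

joined = equi_join(column_a, pivot_rows(match_rows))
for row in joined:
    print(row)
```

The join compares URLs for equality only, so no regular expression is evaluated against the six original rows.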
In a preferred embodiment of the present application, the regular-expression numbers corresponding to the URLs before deduplication may be added to the first table; for example, the numbers are added in the columns to the right of the URL in the first table and kept in correspondence with that URL.
In another preferred embodiment of the present application, the regular-expression numbers corresponding to the URLs before deduplication may replace the corresponding URLs in the first table; that is, for each URL in the first table, the regular-expression numbers of the identical URL in the second table are added to the first table in place of the original URL.
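The two output layouts just described differ only in whether the URL is kept. A small sketch, with hypothetical function names and an already pivoted lookup assumed as input:

```python
def append_numbers(first_table, numbers_by_url):
    """Variant 1: keep each URL and add its numbers in the columns to its right."""
    return [(url, *numbers_by_url.get(url, ())) for url in first_table]


def replace_with_numbers(first_table, numbers_by_url):
    """Variant 2: drop the URL and keep only its numbers, giving narrower rows."""
    return [tuple(numbers_by_url.get(url, ())) for url in first_table]


first_table = ["http://men.taobao.com/123456", "http://women.taobao.com/123456"]
numbers_by_url = {
    "http://men.taobao.com/123456": ("men",),
    "http://women.taobao.com/123456": ("men", "women"),
}
appended = append_numbers(first_table, numbers_by_url)
replaced = replace_with_numbers(first_table, numbers_by_url)
print(appended)
print(replaced)
```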
For a massive number of URLs, the present application performs regular-expression matching only on the distinct URLs among them, and can therefore very effectively keep the computation cost of URL matching to a minimum.
Step 105: count the different regular-expression numbers corresponding to the URLs before deduplication.
In a specific implementation, step 105 may calculate, for each different regular-expression number, the number of times it occurs across all URLs before deduplication. According to the keywords or categories corresponding to the different regular expressions, various kinds of information about visitors' access to the website can then be compiled.
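Step 105 amounts to a frequency count over the numbers attached to the pre-deduplication URLs. A minimal sketch using collections.Counter from the standard library, with the joined rows of the running example written out as input (count_numbers is a name chosen here):

```python
from collections import Counter


def count_numbers(joined_rows):
    """Count how often each regular-expression number occurs across all URLs."""
    counts = Counter()
    for _url, numbers in joined_rows:
        counts.update(numbers)  # each number of this URL counts once
    return counts


men = "http://men.taobao.com/123456"
women = "http://women.taobao.com/123456"
joined_rows = [(men, ["men"])] * 3 + [(women, ["men", "women"])] * 3

counts = count_numbers(joined_rows)
print(counts["men"], counts["women"])
```

For the example data the number men is counted for all six log URLs and women for three of them.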
In a specific implementation, the present application can be implemented on data-warehouse platforms such as Hadoop or Hive.
In summary, according to the present application, the repeated URLs in the massive web logs are removed first, and regular-expression matching is performed only on the deduplicated URLs. Because URLs are accessed repeatedly a very large number of times in massive web logs, after deduplication each identical URL incurs the matching cost only once, and the matching result of a deduplicated URL yields the regular expressions corresponding to all identical URLs. The computation cost of regular-expression matching of URLs can therefore be kept very effectively to a minimum.
The present application can store the URLs before and after deduplication in tables and perform an equi-join on the URL columns, which finds the correspondence between every URL before deduplication and its regular expressions. Compared with the non-equi-join that regular-expression matching amounts to, this reduces the computation cost. Moreover, when the equi-join is performed, the regular-expression numbers can be chosen to replace the corresponding URLs in the table, so that the result shows only the numbers of the matched expressions. Compared with keeping the URLs, this greatly reduces the column width of the table and occupies fewer resources.
For brevity, the method embodiment is described as a series of combined actions, but those skilled in the art should know that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in other orders or simultaneously. Those skilled in the art should also know that the embodiments described in this specification are preferred embodiments, and that the actions and modules involved are not necessarily required by the present application.
Referring to Fig. 2, a structural block diagram of an embodiment of the device of the present application for analyzing URLs in web logs is shown, which may specifically include the following modules:
a URL extraction module 201, for extracting the URLs from a web log;
a URL deduplication module 202, for deduplicating the URLs;
a regular-expression matching module 203, for applying a plurality of preset regular expressions in turn to the deduplicated URLs and extracting the number of each regular expression that matches a deduplicated URL;
a matching-result copying module 204, for copying, for each URL before deduplication, the regular-expression numbers of the identical deduplicated URL as its corresponding regular-expression numbers;
a statistics module 205, for counting the different regular-expression numbers corresponding to the URLs before deduplication.
In a preferred embodiment of the present application, the URLs before and after deduplication may be stored as columns in the first table and the second table respectively, and the regular-expression numbers corresponding to the deduplicated URLs may be stored alongside them in the second table.
In a preferred embodiment of the present application, the matching-result copying module may include:
a row-to-column submodule, for converting the rows of the second table into columns;
an equi-join submodule, for performing an equi-join on the URL columns of the first table and the second table, so that every URL before deduplication finds its corresponding regular-expression numbers.
In a preferred embodiment of the present application, the regular-expression numbers corresponding to the URLs before deduplication may be added to the first table alongside those URLs.
In a preferred embodiment of the present application, the regular-expression numbers corresponding to the URLs before deduplication may replace the corresponding URLs in the first table.
In a preferred embodiment of the present application, the statistics module may be
a calculation module, for calculating, for each different regular-expression number, the number of times it occurs across all URLs before deduplication.
In a preferred embodiment of the present application, the number of a regular expression may be the number of the business category to which it belongs.
Because the device embodiment essentially corresponds to the method embodiment shown in Fig. 1 and Fig. 2 above, details not covered in the description of this embodiment can be found in the related description of the preceding embodiment and are not repeated here.
The present application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
The present application can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. The present application can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
In this document, the terms "including", "comprising" and any other variants are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.
The method and the device for analyzing URLs in web logs provided by the present application have been described in detail above. Specific examples are used herein to explain the principle and implementation of the application, and the description of the above embodiments is intended only to help in understanding the method of the application and its core idea. For those of ordinary skill in the art, the specific implementation and the scope of application may change according to the idea of the application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. A method for analyzing URLs in web logs, characterized by including:
extracting the URLs from a web log;
deduplicating the URLs by removing the repeated addresses in the web log, so that the retained URLs all differ from one another;
applying a plurality of preset regular expressions in turn to the deduplicated URLs, and extracting the number of each regular expression that matches a deduplicated URL;
for each URL before deduplication, copying the regular-expression numbers of the identical deduplicated URL as its corresponding regular-expression numbers;
counting the different regular-expression numbers corresponding to the URLs before deduplication.
2. The method of claim 1, characterized in that the URLs before and after deduplication are stored as columns in a first table and a second table respectively, and the regular-expression numbers corresponding to the deduplicated URLs are stored alongside them in the second table.
3. The method of claim 2, characterized in that the step of finding, for every URL before deduplication, the regular expressions corresponding to the identical URL among the deduplicated URLs, and taking them as its corresponding regular expressions, includes:
converting the rows of the second table into columns;
performing an equi-join on the URL columns of the first table and the second table, so that every URL before deduplication finds its corresponding regular-expression numbers.
4. The method of claim 2, characterized in that the regular-expression numbers corresponding to the URLs before deduplication are added to the first table alongside those URLs.
5. The method of claim 2, characterized in that the regular-expression numbers corresponding to the URLs before deduplication replace the corresponding URLs in the first table.
6. The method of claim 1, characterized in that the step of counting the different regular-expression numbers corresponding to the URLs before deduplication is: calculating, for each different regular-expression number, the number of times it occurs across all URLs before deduplication.
7. The method of any one of claims 1 to 6, characterized in that the number of a regular expression is the number of the business category to which it belongs.
8. A device for analyzing URLs in web logs, characterized by including:
a URL extraction module, for extracting the URLs from a web log;
a URL deduplication module, for deduplicating the URLs by removing the repeated addresses in the web log, so that the retained URLs all differ from one another;
a regular-expression matching module, for applying a plurality of preset regular expressions in turn to the deduplicated URLs and extracting the number of each regular expression that matches a deduplicated URL;
a matching-result copying module, for copying, for each URL before deduplication, the regular-expression numbers of the identical deduplicated URL as its corresponding regular-expression numbers;
a statistics module, for counting the different regular-expression numbers corresponding to the URLs before deduplication.
9. The device of claim 8, characterized in that the URLs before and after deduplication are stored as columns in a first table and a second table respectively, and the regular-expression numbers corresponding to the deduplicated URLs are stored alongside them in the second table.
10. The device of claim 9, characterized in that the matching-result copying module includes:
a row-to-column submodule, for converting the rows of the second table into columns;
an equi-join submodule, for performing an equi-join on the URL columns of the first table and the second table, so that every URL before deduplication finds its corresponding regular-expression numbers.
Application number: CN201210133170.8A; priority date: 2012-04-28; filing date: 2012-04-28; title: Method and device for analyzing URLs in web logs; status: Active; publication: CN103377260B (en)

Publications (2)

Publication Number Publication Date
CN103377260A CN103377260A (en) 2013-10-30
CN103377260B true CN103377260B (en) 2017-05-31


Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103617198B (en) * 2013-11-14 2017-10-27 北京国双科技有限公司 Page merging method and device
CN104933056B (en) * 2014-03-18 2019-08-13 腾讯科技(深圳)有限公司 Uniform resource locator De-weight method and device
CN103986606B (en) * 2014-05-27 2017-03-29 重庆邮电大学 It is a kind of based on the parallelism recognition of MapReduce algorithms, the method for statistical web page URL
CN104252532A (en) * 2014-09-11 2014-12-31 北京优特捷信息技术有限公司 Website information statistic method and device
CN105790967B (en) * 2014-12-18 2020-04-14 华为技术有限公司 Network log processing method and device
CN105005600B (en) * 2015-07-02 2017-05-24 焦点科技股份有限公司 Preprocessing method of URL (Uniform Resource Locator) in access log
CN105591836B (en) * 2015-09-09 2019-03-15 新华三技术有限公司 Data-flow detection method and apparatus
CN105516114B (en) * 2015-12-01 2018-12-14 珠海市君天电子科技有限公司 Method and device for scanning vulnerability based on webpage hash value and electronic equipment
US11250004B2 (en) * 2016-09-27 2022-02-15 Nippon Telegraph And Telephone Corporation Secure equijoin system, secure equijoin device, secure equijoin method, and program
CN107145542A * 2017-04-25 2017-09-08 上海斐讯数据通信技术有限公司 Method and system for efficiently extracting user client IDs from URLs
CN110012010B (en) * 2019-04-03 2021-09-17 杭州汉领信息科技有限公司 Target site self-learning modeling-based WAF defense method
CN109995784B (en) * 2019-04-03 2022-02-11 杭州汉领信息科技有限公司 UDP-based data extraction acceleration method

Citations (1)

Publication number Priority date Publication date Assignee Title
CN101727447A (en) * 2008-10-10 2010-06-09 浙江搜富网络技术有限公司 Generation method and device of regular expression based on URL

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US9654495B2 (en) * 2006-12-01 2017-05-16 Websense, Llc System and method of analyzing web addresses
US7979458B2 (en) * 2007-01-16 2011-07-12 Microsoft Corporation Associating security trimmers with documents in an enterprise search system
CN101937469B (en) * 2010-09-15 2012-09-05 任子行网络技术股份有限公司 Information capture method of video website

Also Published As

Publication number Publication date
CN103377260A (en) 2013-10-30

Similar Documents

Publication Publication Date Title
CN103377260B (en) The analysis method and device of a kind of network log URL
US8832102B2 (en) Methods and apparatuses for clustering electronic documents based on structural features and static content features
CN101694668B (en) Method and device for confirming web structure similarity
RU2012144649A (en) PRODUCT SYNTHESIS FROM MULTIPLE SOURCES
CN103617213B (en) Method and system for identifying newspage attributive characters
CN112749284A (en) Knowledge graph construction method, device, equipment and storage medium
CN104778164A (en) Method and device for detecting repeated URL (Uniform Resource Locator)
CN113886708A (en) Product recommendation method, device, equipment and storage medium based on user information
CN105426379A (en) Keyword weight calculation method based on position of word
CA2833355A1 (en) System and method for automatic wrapper induction by applying filters
CN113204953A (en) Text matching method and device based on semantic recognition and device readable storage medium
CN103559202B Webpage content extraction apparatus and method
CN113220875B (en) Internet information classification method and system based on industry labels and electronic equipment
CN102929948B (en) list page identification system and method
CN110347934B (en) Text data filtering method, device and medium
CN107085603A Data processing method and device
CN104298786B Image search method and device
US20120284224A1 (en) Build of website knowledge tables
Lacasta et al. Population of a spatio-temporal knowledge base for jurisdictional domains
CN103678432B (en) A kind of web page body extracting method based on web page body feature and intermediary's true value
CA3046474A1 (en) Portfolio-based text analytics tool
CN103324640B Method, device and equipment for determining search result documents
CN112395856B (en) Text matching method, text matching device, computer system and readable storage medium
CN108073588B (en) Column information extraction method and device
García-Cumbreras et al. Sinai at weps-3: Online reputation management

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code Ref country code: HK; Ref legal event code: DE; Ref document number: 1186804; Country of ref document: HK

GR01 Patent grant
REG Reference to a national code Ref country code: HK; Ref legal event code: GR; Ref document number: 1186804; Country of ref document: HK

TR01 Transfer of patent right

Effective date of registration: 2021-11-03

Address after: Room 554, Floor 5, Building 3, No. 969 Wenyi West Road, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province

Patentee after: TAOBAO (CHINA) SOFTWARE CO.,LTD.

Address before: P.O. Box 847, fourth floor, Capital Building, Grand Cayman, Cayman Islands

Patentee before: ALIBABA GROUP HOLDING Ltd.