CN103324761B - Method and system for forming a product database based on Internet data - Google Patents
- Publication number
- CN103324761B (application number CN201310292303.0A)
- Authority
- CN
- China
- Prior art keywords
- product
- data
- attribute
- degree
- product attribute
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Abstract
The invention discloses a method and system for forming a product database based on Internet data. The method is as follows: using focused (topic) crawler technology, crawl web page data whose topic relevance is higher than a preset threshold; store the crawled web page data in structured form; automatically classify the structured web page data by product category; count the number of occurrences and the time of occurrence of each product attribute in the classified web page data, weight the occurrence count and the time of occurrence according to preset weights to obtain a product attribute decision value, and determine the display order of the product attributes from that value. The system comprises a data capture module, a structured storage module, a data classification module and an attribute decision module. With this method and system, a user can obtain comprehensive, integrated product information without having to collect and collate it from the Internet; weighting by time of occurrence keeps the data current and meets users' need for timely information.
Description
Technical field
The present invention relates to the field of Internet data processing technology, and in particular to a method and
system for forming a product database based on Internet data.
Background
At present, the product catalogues of some mainstream websites are built by giving each industry a fixed
product-publishing template, from which the description of a product is formed. Moreover, each website adopts
a different standard for describing the same product. Because product publishing formats are not unified,
and because buyers' requirement standards also vary widely, it is difficult to consolidate product
information across the major websites, and a buyer cannot obtain comprehensive information about the
products that meet a given requirement standard. Selecting products by requirement standard, particularly
when large batches of many product types are involved, usually means reading a huge number of web pages,
which is inefficient.
In summary, the related art lacks a unified product description standard, which makes product information
difficult to collate.
Summary of the invention
The object of the present invention is to provide a method and system for forming a product database based on
Internet data, so as to solve the above problem.
An embodiment of the present invention provides a method for forming a product database based on Internet
data, comprising the steps of:
Step A: using focused crawler technology, crawl web page data whose topic relevance is higher than a preset
threshold, wherein the topic relevance is calculated by a content relevance evaluator and a link relevance
evaluator;
Step B: store the crawled web page data in structured form;
Step C: automatically classify the structured web page data by product category;
Step D: count the number of occurrences and the time of occurrence of each product attribute in the
classified web page data, weight the occurrence count and the time of occurrence according to preset
weights to obtain a product attribute decision value, and determine the display order of the product
attributes according to that value;
wherein the number of occurrences of a product attribute is denoted F, its time of occurrence is denoted T,
and the weight of the data source is denoted W, and the product attribute decision value is obtained by the
formula (F+T)*W.
Step A comprises the steps of:
analysing the web page data after content feature extraction and determining whether the relevance of the
page content to the specified topic reaches the preset threshold; if so, keeping the page, and if not,
filtering it out; and/or evaluating the hyperlink information extracted from web pages to obtain the
relevance of the page each URL points to with respect to the specified topic, and keeping the pages whose
relevance reaches the preset threshold;
adding the URLs of the kept pages to the crawl queue and sorting them by topic relevance;
connecting to the network according to the URLs in the crawl queue and downloading the pages they point to.
Step B comprises the step of:
analysing the web page tags of the crawled web page data; for each product page, obtaining the product
entity information through entity tags and forming a record, and obtaining the corresponding product
attribute information and attribute values through attribute tags for structured storage.
In one embodiment, step C comprises the steps of:
extracting the text information in the web page data, determining the feature item set used for automatic
classification, re-describing the training text vectors according to the feature item set, and determining
the training text set;
when a new text arrives, analysing it according to the feature words in the feature item set and
determining its vector representation;
selecting from the training text set the K texts most similar to the current text, using the formula:
$$Sim(d_i, d_j) = \frac{\sum_{k=1}^{M} W_{ik} W_{jk}}{\sqrt{\left(\sum_{k=1}^{M} W_{ik}^{2}\right)\left(\sum_{k=1}^{M} W_{jk}^{2}\right)}}$$
where $W_i$ is the feature vector of the i-th document, $W_j$ is the feature vector of the j-th document, M
is the dimension of the feature vectors, $Sim(d_i, d_j)$ is the similarity of the i-th and j-th documents,
and k indexes the k-th dimension of the text vector;
calculating the weights in turn over the K texts most similar to the current text, using the formula:
$$p(\vec{x}, C_j) = \sum_{\vec{d_i} \in KNN} Sim(\vec{x}, \vec{d_i})\, y(\vec{d_i}, C_j)$$
where x is the point to be classified, $C_j$ is a known class, the $d_i$ are the K nearest neighbours of x,
$Sim(\vec{x}, \vec{d_i})$ is the similarity between vector $\vec{x}$ and vector $\vec{d_i}$, and
$y(\vec{d_i}, C_j)$ is the category attribute function;
based on the weights obtained, comparing the similarity between the current text and the K texts and
determining the category of the current text accordingly.
In another embodiment, step C comprises the steps of:
establishing a category vector space in advance from the training samples and the classification scheme;
when a sample is to be classified, calculating the similarity between the sample and each category vector,
and choosing the category with the greatest similarity as the category of that sample.
In a further embodiment, step C comprises the step of:
automatically classifying the web page data according to the SVM algorithm and/or the Bayes algorithm.
After step D, the method further comprises the step of:
retrieving the matching product information according to the product attribute keyword entered by the user,
and displaying it as a list ordered by product attribute decision value, from high to low.
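The retrieval step above can be sketched as follows. This is a minimal illustration only: the record layout (`product`, `attribute`, `value`, `decision_value`), the function name `search_by_attribute` and the substring match are assumptions made for the example, not details given in the patent.

```python
def search_by_attribute(records, keyword):
    """Return records whose attribute name contains the keyword,
    ordered by decision value from high to low."""
    needle = keyword.strip().lower()
    hits = [r for r in records if needle in r["attribute"].lower()]
    return sorted(hits, key=lambda r: r["decision_value"], reverse=True)


# Hypothetical rows, as they might look after steps A to D.
records = [
    {"product": "Ball valve", "attribute": "Material", "value": "Brass", "decision_value": 4.5},
    {"product": "Gate valve", "attribute": "Material", "value": "Steel", "decision_value": 7.0},
    {"product": "Ball valve", "attribute": "Size", "value": "DN25", "decision_value": 9.0},
]

for row in search_by_attribute(records, "material"):
    print(row["product"], row["attribute"], row["value"], row["decision_value"])
```

Sorting on the precomputed decision value keeps the query path cheap; the weighting itself happens once, in step D.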
An embodiment of the present invention also provides a system for forming a product database based on
Internet data, comprising a data capture module, a structured storage module, a data classification module
and an attribute decision module.
The data capture module is configured to use focused crawler technology to crawl web page data whose topic
relevance is higher than a preset threshold, wherein the topic relevance is calculated by a content
relevance evaluator and a link relevance evaluator.
The structured storage module is configured to store the crawled web page data in structured form.
The data classification module is configured to automatically classify the structured web page data by
product category.
The attribute decision module is configured to count the number of occurrences and the time of occurrence
of each product attribute in the classified web page data, weight the occurrence count and the time of
occurrence according to preset weights to obtain a product attribute decision value, and determine the
display order of the product attributes according to that value;
wherein the number of occurrences of a product attribute is denoted F, its time of occurrence is denoted T,
and the weight of the data source is denoted W, and the product attribute decision value is obtained by the
formula (F+T)*W.
The data capture module is configured to:
analyse the web page data after content feature extraction and determine whether the relevance of the page
content to the specified topic reaches the preset threshold; if so, keep the page, and if not, filter it
out; and/or evaluate the hyperlink information extracted from web pages to obtain the relevance of the page
each URL points to with respect to the specified topic, and keep the pages whose relevance reaches the
preset threshold;
add the URLs of the kept pages to the crawl queue and sort them by topic relevance;
connect to the network according to the URLs in the crawl queue and download the pages they point to.
The structured storage module is configured to:
analyse the web page tags of the crawled web page data; for each product page, obtain the product entity
information through entity tags and form a record, and obtain the corresponding product attribute
information and attribute values through attribute tags for structured storage.
The method and system of the above embodiments proceed through several steps: data capture, structured
storage, automatic classification and attribute decision value calculation. Product information in massive
web page data is stored in structured form and then classified, and each attribute of a product is then
evaluated to obtain the order in which the product's attributes are displayed. In this way the originally
inconsistent product descriptions are consolidated. A user who wants the specific information of a product
can retrieve the relevant data by product attribute without reading a huge number of web pages or
collecting and collating product information from the Internet, and still obtains comprehensive, integrated
information. In addition, because the product attribute decision value weights both the occurrence count
and the time of occurrence of an attribute, the data stay current, which meets most users' need for timely
information.
Brief description of the drawings
Fig. 1 is a flowchart of one embodiment of the method for forming a product database based on Internet data
according to the present invention;
Fig. 2 is a schematic diagram of the principle of the SVM algorithm used in one embodiment of the method for
forming a product database based on Internet data according to the present invention;
Fig. 3 is a structural diagram of one embodiment of the system for forming a product database based on
Internet data according to the present invention.
Detailed description of the invention
The present invention is described in further detail below through specific embodiments and with reference
to the accompanying drawings.
An embodiment of the present invention provides a method for forming a product database based on Internet
data which, as shown in Fig. 1, comprises the following steps:
Step S110: using focused crawler technology, crawl web page data whose topic relevance is higher than a
preset threshold.
The embodiment uses focused crawler technology, in which a search engine performs topic-based information
collection. Such a crawler typically consists of functional modules such as a crawl queue, a network
connector, a topic model, a content relevance evaluator and a link relevance evaluator.
The crawl queue consists of a series of URLs (Uniform Resource Locators, i.e. web page addresses) with high
topic relevance. Unless otherwise stated, URL in this document always refers to a web page address.
When the topic search engine begins a topic search, the crawl queue consists of seed sites. These seed
sites can be supplied by experts in the relevant industry or generated automatically from authoritative
websites.
After the search starts, the system discovers new URLs, sorts them by topic relevance and adds them to the
crawl queue. The network connector then connects to the network according to the URLs in the crawl queue
and downloads the pages they point to.
The topic model is built with a topic modelling method, of which the keyword method is a common one. The
keyword method represents topic content, including both the user's requested topic and document content,
with a set of feature keywords. A topic keyword can be a single word or a phrase and carries attributes
such as weight and language; the commonly used relevance algorithm is word frequency.
The topic relevance can be calculated by the content relevance evaluator and the link relevance evaluator.
The content relevance evaluator analyses the web page data after content feature extraction, determines
how relevant the page content is to the specified topic, filters out irrelevant pages and keeps the pages
whose relevance reaches the threshold.
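A word-frequency relevance score over weighted topic keywords might look like the sketch below. The patent names word frequency as the usual algorithm but gives no formula, so the normalisation to a 0-100 scale, the tokenisation, and the names `content_relevance` and `topic_keywords` are all assumptions made for illustration.

```python
import re
from collections import Counter


def content_relevance(text, topic_keywords):
    """Weighted keyword frequency, scaled to 0-100.

    topic_keywords maps a keyword to its weight. The score is the
    weighted keyword count divided by (word count * largest weight),
    so it can never exceed 100.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words or not topic_keywords:
        return 0.0
    counts = Counter(words)
    weighted_hits = sum(w * counts[kw] for kw, w in topic_keywords.items())
    max_weight = max(topic_keywords.values())
    return 100.0 * weighted_hits / (len(words) * max_weight)


score = content_relevance("steel pipe steel valve", {"steel": 2.0, "pipe": 1.0})
print(score)  # 62.5
```

A page would then be kept when its score reaches the preset threshold and filtered out otherwise.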
The link relevance evaluator evaluates the hyperlink information extracted from a web page to obtain the
relevance of the page each URL points to with respect to the specified topic, adds the URLs that meet the
topic relevance requirement to the crawl queue, and orders their crawl priority so that highly relevant
pages are retrieved first.
The preset threshold is a quantitative cut-off on the relevance between the data on a web page and the
topic, used to decide whether the page data are kept. Its value can be set by those skilled in the art
according to the actual situation and is not enumerated here. If relevance is expressed on a hundred-point
scale, the preset threshold may lie in the range 60-100.
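The crawl queue described above, which admits only URLs at or above the threshold and serves the most relevant first, can be sketched with a heap. The class name `CrawlQueue`, the default threshold of 60 and the duplicate check are illustrative assumptions; the patent does not prescribe a data structure.

```python
import heapq


class CrawlQueue:
    """Priority queue of URLs ordered by topic relevance, highest first."""

    def __init__(self, threshold=60.0):
        self.threshold = threshold
        self._heap = []      # entries are (-relevance, url)
        self._seen = set()   # avoids queuing the same URL twice

    def add(self, url, relevance):
        """Queue the URL if it reaches the threshold; return True if queued."""
        if relevance < self.threshold or url in self._seen:
            return False
        self._seen.add(url)
        heapq.heappush(self._heap, (-relevance, url))
        return True

    def pop(self):
        """Remove and return the (url, relevance) pair with the highest relevance."""
        neg_relevance, url = heapq.heappop(self._heap)
        return url, -neg_relevance

    def __len__(self):
        return len(self._heap)


queue = CrawlQueue(threshold=60.0)
queue.add("http://example.com/valves", 75.0)
queue.add("http://example.com/about", 40.0)   # below threshold, dropped
queue.add("http://example.com/pumps", 92.0)
print(queue.pop())  # ('http://example.com/pumps', 92.0)
```

Negating the relevance turns Python's min-heap into a max-heap, so the network connector always downloads the most relevant pending page next.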
Step S111: store the crawled web page data in structured form.
The embodiment analyses the web page tags of the crawled data to build a tag library, and uses it to store
the crawled web page data in structured form.
For each product page, the product entity is obtained through entity tags and a record is formed; the
corresponding product attributes and their values are obtained through attribute tags and stored in
structured form.
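As a rough illustration of tag-driven extraction, the sketch below uses Python's standard `html.parser`. The tag library contents (the CSS classes `product-name`, `attr-name`, `attr-value`) and the record layout are hypothetical; the patent does not specify what the entity and attribute tags look like, and real pages would need per-site entries.

```python
from html.parser import HTMLParser

# Hypothetical tag library: CSS class -> role of the element's text.
TAG_LIBRARY = {
    "product-name": "entity",
    "attr-name": "attr_name",
    "attr-value": "attr_value",
}


class ProductPageParser(HTMLParser):
    """Builds one record {entity, attributes} from a product page."""

    def __init__(self):
        super().__init__()
        self.record = {"entity": None, "attributes": {}}
        self._role = None          # role of the element being read
        self._pending_name = None  # attribute name awaiting its value

    def handle_starttag(self, tag, attrs):
        self._role = TAG_LIBRARY.get(dict(attrs).get("class"))

    def handle_endtag(self, tag):
        self._role = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._role is None:
            return
        if self._role == "entity":
            self.record["entity"] = text
        elif self._role == "attr_name":
            self._pending_name = text
        elif self._role == "attr_value" and self._pending_name is not None:
            self.record["attributes"][self._pending_name] = text
            self._pending_name = None


def extract_record(html):
    parser = ProductPageParser()
    parser.feed(html)
    return parser.record
```

The resulting dictionary maps directly onto a row plus attribute/value pairs in whatever store is used for structured storage.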
Step S112: automatically classify the structured web page data by product category.
Automatic classification can be done in several ways; some embodiments are listed below.
One method is based on the following classification rule: in the classification decision, the category of
the sample to be classified is determined only from the categories of the one or several nearest samples.
The specific algorithm steps are as follows:
Re-describe the training text vectors according to the feature item set.
When a new text arrives, segment it according to the feature words and determine its vector
representation.
Select from the training text set the K texts most similar to the current text, using the formula:
$$Sim(d_i, d_j) = \frac{\sum_{k=1}^{M} W_{ik} W_{jk}}{\sqrt{\left(\sum_{k=1}^{M} W_{ik}^{2}\right)\left(\sum_{k=1}^{M} W_{jk}^{2}\right)}}$$
where $W_i$ is the feature vector of the i-th document, $W_j$ is the feature vector of the j-th document, M
is the dimension of the feature vectors, $Sim(d_i, d_j)$ is the similarity of the i-th and j-th documents,
and k indexes the k-th dimension of the vector.
Among the K neighbours of the current text, calculate the weight of each class in turn, using the formula:
$$p(\vec{x}, C_j) = \sum_{\vec{d_i} \in KNN} Sim(\vec{x}, \vec{d_i})\, y(\vec{d_i}, C_j)$$
where x is the point to be classified, $C_j$ is a known class, the $d_i$ are the K nearest neighbours of x,
$Sim(\vec{x}, \vec{d_i})$ is the similarity between vector $\vec{x}$ and vector $\vec{d_i}$, and
$y(\vec{d_i}, C_j)$ is the category attribute function, whose value is 1 if $d_i$ belongs to class $C_j$
and 0 otherwise.
Then, based on the weights obtained, compare the similarity between the current text and the K texts and
determine the category of the current text accordingly.
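The nearest-neighbour procedure above can be sketched in a few lines. The sketch assumes the similarity is cosine similarity over already-built feature vectors and that the class with the largest summed neighbour similarity wins; the function names and the toy vectors are illustrative only.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_classify(x, training, k=3):
    """training is a list of (vector, label) pairs.

    Picks the k training vectors most similar to x, sums the similarity
    per class (the class weight), and returns the heaviest class.
    """
    scored = [(cosine(x, vec), label) for vec, label in training]
    neighbours = sorted(scored, key=lambda item: item[0], reverse=True)[:k]
    class_weight = defaultdict(float)
    for similarity, label in neighbours:
        class_weight[label] += similarity
    return max(class_weight, key=class_weight.get)


training = [
    ([1.0, 0.0, 0.0], "valve"),
    ([0.9, 0.1, 0.0], "valve"),
    ([0.0, 1.0, 0.0], "pump"),
    ([0.0, 0.9, 0.2], "pump"),
]
print(knn_classify([1.0, 0.2, 0.0], training, k=3))  # valve
```

Summing similarities rather than counting votes means a single very close neighbour can outweigh two distant ones, which matches the weighted form of the class score.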
Another approach represents a document as a weighted feature vector, D = D(T1, W1; T2, W2; …; Tn, Wn), and
then determines the category of the sample to be classified by computing text similarity. When texts are
represented in the vector space model, the similarity of two texts can be expressed as the inner product
of their feature vectors.
This approach generally builds the category vector space in advance from the training samples in a corpus
and the classification scheme. When a sample needs to be classified, it is only necessary to compute the
similarity, i.e. the inner product, between the sample and each category vector, and then choose the
category with the greatest similarity as the category of that sample.
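A minimal sketch of this category-vector approach follows. The patent does not say how a category vector is derived from the training samples; taking the mean of each category's sample vectors is an assumption made here, as are the function names.

```python
def build_class_vectors(samples):
    """samples is a list of (vector, label); returns label -> mean vector.

    Using the mean as the category vector is an illustrative choice.
    """
    sums, counts = {}, {}
    for vec, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def inner_product(u, v):
    return sum(a * b for a, b in zip(u, v))


def classify(vec, class_vectors):
    """Return the label whose category vector has the largest inner product with vec."""
    return max(class_vectors, key=lambda label: inner_product(vec, class_vectors[label]))


class_vectors = build_class_vectors([
    ([1.0, 0.0], "valve"),
    ([0.8, 0.2], "valve"),
    ([0.0, 1.0], "pump"),
])
print(classify([0.7, 0.3], class_vectors))  # valve
```

Compared with the nearest-neighbour method, classification cost here depends on the number of categories rather than the number of training samples.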
In addition, the SVM algorithm and/or the Bayes algorithm can be used to classify the web page data
automatically.
The SVM algorithm, shown in Fig. 2, developed from the optimal separating surface for the linearly
separable case. The basic idea can be seen in the figure: dividing line 1 and dividing line 2 both separate
the two classes of samples correctly, and there are infinitely many such dividing lines, but dividing line
1 maximises the gap between the two classes and is called the optimal separating line (in higher
dimensions, the optimal separating surface or optimal hyperplane).
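For reference, the "maximum gap" idea is usually written as the following optimisation problem. This is the standard textbook hard-margin formulation, given here for orientation rather than taken from the patent; the symbols $w$, $b$, $x_i$, $y_i$ and $n$ are introduced only for this illustration, with $y_i \in \{-1, +1\}$ the class label of training sample $x_i$.

```latex
\min_{w,\,b} \; \frac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,(w \cdot x_i + b) \ge 1, \qquad i = 1, \dots, n
```

The separating hyperplane is $w \cdot x + b = 0$ and the gap between the two classes is $2 / \lVert w \rVert$, so minimising $\lVert w \rVert$ maximises the gap.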
The Bayes algorithm is a pattern classification method for the case where the prior probabilities and the
class-conditional probabilities are known; the classification result for a sample depends on all the
samples in each class.
Suppose the training sample set is divided into M classes, denoted $C = \{c_1, \dots, c_i, \dots, c_M\}$,
and the prior probability of each class is $P(c_i)$, $i = 1, 2, \dots, M$. When the sample set is very
large, $P(c_i)$ can be taken as the number of samples in class $c_i$ divided by the total number of
samples. For a sample x to be classified, its class-conditional probability given class $c_i$ is
$P(x \mid c_i)$; by Bayes' theorem, the posterior probability of class $c_i$ is then:
$$P(c_i \mid x) = \frac{P(x \mid c_i)\, P(c_i)}{P(x)} \qquad \text{(Formula 1-1)}$$
$$\text{If } P(c_i \mid x) = \max_{j} P(c_j \mid x),\ j = 1, 2, \dots, M, \text{ then } x \in c_i \qquad \text{(Formula 1-2)}$$
Formula (1-2) is the maximum a posteriori decision rule. Substituting Formula (1-1) into Formula (1-2), and
noting that $P(x)$ is the same for every class, gives:
$$\text{If } P(x \mid c_i)\, P(c_i) = \max_{j} \left[ P(x \mid c_j)\, P(c_j) \right],\ j = 1, 2, \dots, M, \text{ then } x \in c_i.$$
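The decision rule above can be sketched as a small text classifier. The prior is estimated as class sample count over total sample count, as the text describes. Two things are assumptions added for the example and not stated in the patent: the class-conditional probability is factorised over tokens (the naive Bayes independence assumption), and add-one smoothing is used so unseen tokens do not zero the product.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """samples is a list of (tokens, label). Returns the counts needed for classification."""
    class_counts = Counter(label for _, label in samples)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in samples:
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def classify(tokens, model):
    """Return the class c_i maximising P(x|c_i) * P(c_i), computed in log space."""
    class_counts, word_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log P(c_i)
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for token in tokens:
            # log P(token | c_i) with add-one smoothing (an assumption)
            score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train([
    (["brass", "valve"], "valve"),
    (["ball", "valve"], "valve"),
    (["water", "pump"], "pump"),
])
print(classify(["brass", "ball"], model))  # valve
```

Working in log space avoids underflow when a page contributes many tokens; dropping $P(x)$ is safe because it is common to every class.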
Step S113: count the number of occurrences and the time of occurrence of each product attribute in the
classified web page data, weight the occurrence count and the time of occurrence according to preset
weights to obtain a product attribute decision value, and determine the display order of the product
attributes according to that value.
The attribute decision uses two parameters, the number of occurrences of the attribute (F) and its time of
occurrence (T), together with the weight of the data source (W). The attribute decision value is obtained
by the formula (F+T)*W, and attributes are selected and ordered according to this value.
The weight of an attribute's time of occurrence and the weight of its occurrence count can each be set
according to the actual situation. In general, the older the data source, the smaller the weight given to
the time of occurrence of its data.
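The decision value and the resulting ordering can be sketched as follows. The formula (F+T)*W is from the text; how a time of occurrence becomes the number T is not specified, so the exponential decay in `recency_score` (and its one-year half-life) is purely an assumption that satisfies "older data, smaller weight". Any preset weights on F and T are assumed to have been applied before the values are passed in.

```python
def recency_score(age_days, half_life_days=365.0):
    """Hypothetical mapping from data age to T: 1.0 for new data, halving every half-life."""
    return 0.5 ** (age_days / half_life_days)


def decision_value(f, t, w):
    """Product attribute decision value, (F + T) * W."""
    return (f + t) * w


def rank_attributes(stats):
    """stats maps attribute name -> (F, age_days, W); returns names, highest value first."""
    scored = {
        name: decision_value(f, recency_score(age_days), w)
        for name, (f, age_days, w) in stats.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


stats = {
    "material": (10, 0, 1.0),    # seen 10 times, fresh, fully trusted source
    "colour": (10, 730, 1.0),    # same count, but two years old
    "size": (2, 0, 0.5),         # rare, fresh, less trusted source
}
print(rank_attributes(stats))  # ['material', 'colour', 'size']
```

With equal counts and source weights, the fresher attribute ranks higher, which is the behaviour the time weighting is meant to produce.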
An embodiment of the present invention also provides a system for forming a product database based on
Internet data which, as shown in Fig. 3, comprises a data capture module 1, a structured storage module 2,
a data classification module 3 and an attribute decision module 4.
The data capture module 1 is configured to use focused crawler technology to crawl web page data whose
topic relevance is higher than a preset threshold.
The structured storage module 2 is configured to store the crawled web page data in structured form.
The data classification module 3 is configured to automatically classify the structured web page data by
product category.
The attribute decision module 4 is configured to count the number of occurrences and the time of
occurrence of each product attribute in the classified web page data, weight the occurrence count and the
time of occurrence according to preset weights to obtain a product attribute decision value, and determine
the display order of the product attributes according to that value.
The database system should also be provided with a searcher and a management platform.
The searcher provides users with a query interface, searches the index database according to the query
submitted by the user, ranks the results by relevance and returns the page links and related information
to the user.
The management platform monitors and manages the whole system; its main functions include determining the
topic, initialising the crawler, controlling the crawl process, coordinating and optimising the functions
of the modules, and user interaction. For a complete search engine, the management platform should also
provide a cross-platform web service application interface.
In one embodiment, the data capture module 1 is configured to: analyse the web page data after content
feature extraction and determine whether the relevance of the page content to the specified topic reaches
the preset threshold; if so, keep the page, and if not, filter it out; and/or evaluate the hyperlink
information extracted from web pages to obtain the relevance of the page each URL points to with respect
to the specified topic, and keep the pages whose relevance reaches the preset threshold; add the URLs of
the kept pages to the crawl queue and sort them by topic relevance; and connect to the network according
to the URLs in the crawl queue and download the pages they point to.
Preferably, in one embodiment, the structured storage module 2 is configured to: analyse the web page tags
of the crawled web page data; for each product page, obtain the product entity information through entity
tags and form a record; and obtain the corresponding product attribute information and attribute values
through attribute tags for structured storage.
In summary, the method and system provided by the embodiments of the present invention mainly use web
crawler technology to crawl massive numbers of web pages, chiefly comprehensive e-commerce websites,
vertical e-commerce websites, manufacturer websites and buyer websites, and extract the latest, valid and
relevant product data. Data structuring and storage technology is then used to store the crawled data in
structured form and establish an e-commerce data source. Data classification technology is next used to
classify the crawled data: learning sample data are set up for each category, and automatic classification
is achieved through intelligent techniques such as corpus processing, named entity recognition, semantic
understanding and sample optimisation, supplemented by manual correction. Finally, a comprehensive
attribute evaluation system analyses the frequency and time with which attributes occur, together with
users' data-entry habits, to form the attribute ordering rules for each category and generate a
description standard for each category.
Combining these technologies yields a unified standard for describing products in each industry. By also
collecting buyers' standards, a product description standard specific to a particular buyer can be formed,
and product descriptions can be converted between multiple standards so that different buyers can view
them. The system can also connect to procurement systems and initialise order contents automatically
through an interface, which greatly improves processing efficiency.
Obviously, those skilled in the art should understand that the modules or steps of the present invention
described above can be implemented with general-purpose computing devices. They can be concentrated on a
single computing device or distributed over a network of multiple computing devices. Optionally, they can
be implemented as program code executable by a computing device, so that they can be stored in a storage
device and executed by the computing device; alternatively, they can each be made into a separate
integrated circuit module, or several of the modules or steps can be made into a single integrated circuit
module. The present invention is therefore not limited to any specific combination of hardware and
software.
The above are only preferred embodiments of the present invention and are not intended to limit it. For
those skilled in the art, the present invention may have various modifications and variations. Any
modification, equivalent replacement or improvement made within the spirit and principles of the present
invention shall fall within its scope of protection.
Claims (10)
1. A method for forming a product database based on Internet data, characterised by comprising the steps of:
Step A: using focused crawler technology, crawl web page data whose topic relevance is higher than a preset
threshold, wherein the topic relevance is calculated by a content relevance evaluator and a link relevance
evaluator;
Step B: store the crawled web page data in structured form;
Step C: automatically classify the structured web page data by product category;
Step D: count the number of occurrences and the time of occurrence of each product attribute in the
classified web page data, weight the occurrence count and the time of occurrence according to preset
weights to obtain a product attribute decision value, and determine the display order of the product
attributes according to that value;
wherein the number of occurrences of a product attribute is denoted F, its time of occurrence is denoted T,
and the weight of the data source is denoted W, and the product attribute decision value is obtained by the
formula (F+T)*W.
2. The method for forming a product database based on Internet data according to claim 1, characterised in
that step A comprises the steps of:
analysing the web page data after content feature extraction and determining whether the relevance of the
page content to the specified topic reaches the preset threshold; if so, keeping the page, and if not,
filtering it out; and/or evaluating the hyperlink information extracted from web pages to obtain the
relevance of the page each URL points to with respect to the specified topic, and keeping the pages whose
relevance reaches the preset threshold;
adding the URLs of the kept pages to the crawl queue and sorting them by topic relevance;
connecting to the network according to the URLs in the crawl queue and downloading the pages they point to.
3. The method for forming a product database based on Internet data according to claim 1, characterised in
that step B comprises the step of:
analysing the web page tags of the crawled web page data; for each product page, obtaining the product
entity information through entity tags and forming a record, and obtaining the corresponding product
attribute information and attribute values through attribute tags for structured storage.
4. The method for forming a product database based on Internet data according to claim 1, characterised in
that step C comprises the steps of:
extracting the text information in the web page data, determining the feature item set used for automatic
classification, re-describing the training text vectors according to the feature item set, and determining
the training text set;
when a new text arrives, analysing it according to the feature words in the feature item set and
determining its vector representation;
selecting from the training text set the K texts most similar to the current text, using the formula:
$$Sim(d_i, d_j) = \frac{\sum_{k=1}^{M} W_{ik} W_{jk}}{\sqrt{\left(\sum_{k=1}^{M} W_{ik}^{2}\right)\left(\sum_{k=1}^{M} W_{jk}^{2}\right)}}$$
where $W_i$ is the feature vector of the i-th document, $W_j$ is the feature vector of the j-th document, M
is the dimension of the feature vectors, $Sim(d_i, d_j)$ is the similarity of the i-th and j-th documents,
k indexes the k-th dimension of the text vector, $\vec{d_i}$ is the i-th document vector and $\vec{d_j}$ is
the j-th document vector;
calculating the weights in turn over the K texts most similar to the current text, using the formula:
$$p(\vec{x}, C_j) = \sum_{\vec{d_i} \in KNN} Sim(\vec{x}, \vec{d_i})\, y(\vec{d_i}, C_j)$$
where x is the point to be classified, $C_j$ is a known class, the $d_i$ are the K nearest neighbours of x,
$Sim(\vec{x}, \vec{d_i})$ is the similarity between vector $\vec{x}$ and vector $\vec{d_i}$,
$y(\vec{d_i}, C_j)$ is the category attribute function, $\vec{x}$ is the vector of the point to be
classified, KNN denotes the K-nearest-neighbour algorithm, and $p(\vec{x}, C_j)$ is the weight computed
over the K neighbours of the current text;
based on the weights obtained, comparing the similarity between the current text and the K texts and
determining the category of the current text accordingly.
5. The method for forming a product database based on Internet data according to claim 1, characterised in
that step C comprises the steps of:
establishing a category vector space in advance from the training samples and the classification scheme;
when a sample is to be classified, calculating the similarity between the sample and each category vector,
and choosing the category with the greatest similarity as the category of that sample.
6. The method for forming a product database based on Internet data according to claim 1, characterised in
that step C comprises the step of:
automatically classifying the web page data according to the SVM algorithm and/or the Bayes algorithm.
7. The method for forming a product database based on Internet data according to claim 1, characterised in
that, after step D, the method further comprises the step of:
retrieving the matching product information according to the product attribute keyword entered by the
user, and displaying it as a list ordered by product attribute decision value, from high to low.
8. A system for forming a product database based on Internet data, characterized by comprising a data capture module, a structured storage module, a data classification module and an attribute decision module;
The data capture module is configured to use focused (topic) crawler technology to capture web page data whose topic relevance is higher than a predetermined threshold, wherein the topic relevance is calculated by content evaluation and link evaluation;
The structured storage module is configured to store the captured web page data in a structured manner;
The data classification module is configured to automatically classify the structurally stored web page data according to the category to which the product belongs;
The attribute decision module is configured to count the number of occurrences and the time of occurrence of each product attribute in the automatically classified web page data, to weight the number of occurrences and the time of occurrence according to preset weights so as to obtain a product attribute decision value, and to determine the ordering of the product attributes according to the product attribute decision value;
Wherein the number of occurrences of a product attribute is denoted F, the time of occurrence of the product attribute is denoted T, and the weight of the data source is denoted W, and the product attribute decision value is obtained by the formula (F+T)*W.
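The decision-value formula (F+T)*W can be worked through with a short sketch. The claim does not state how the time of occurrence T is turned into a number, so the sketch assumes T is already a numeric score (for example, a recency score); the attribute names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AttributeStats:
    name: str
    occurrences: int      # F: number of occurrences of the attribute
    time_score: float     # T: time of occurrence, assumed to be a numeric score
    source_weight: float  # W: weight of the data source


def decision_value(stats):
    """Product attribute decision value, (F + T) * W."""
    return (stats.occurrences + stats.time_score) * stats.source_weight


def rank_attributes(all_stats):
    """Order attributes from the highest decision value to the lowest."""
    return sorted(all_stats, key=decision_value, reverse=True)


stats = [
    AttributeStats("screen size", 12, 3.0, 1.0),   # (12 + 3) * 1.0 = 15.0
    AttributeStats("battery life", 8, 5.0, 1.5),   # (8 + 5) * 1.5 = 19.5
    AttributeStats("color", 20, 1.0, 0.5),         # (20 + 1) * 0.5 = 10.5
]
ranked = [s.name for s in rank_attributes(stats)]
```

In this example "color" occurs most often but comes from a low-weight source, so it ranks last; the source weight W scales the combined count and time score.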
9. The system for forming a product database based on Internet data according to claim 8, characterized in that the data capture module is configured to:
Analyze the web page data after content feature extraction and determine whether the relevance of the web page content to the designated topic reaches the predetermined threshold; if so, retain the web page, and if not, filter it out; and/or perform a calculation on the hyperlink information extracted from the web page to obtain the relevance of the page each URL points to with respect to the designated topic, and retain the web pages whose relevance reaches the predetermined threshold;
Add the URLs of the retained web pages to a crawl queue and sort them by their topic relevance;
According to the URLs in the crawl queue, establish a network connection and download the content of the pages they point to.
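The filtering and queue-ordering steps of this claim can be sketched as follows. The sketch assumes the relevance score of each URL has already been computed elsewhere and only shows threshold filtering and ordering; the `CrawlQueue` class, the example URLs and the scores are hypothetical, and no network download is performed.

```python
import heapq


class CrawlQueue:
    """Crawl queue that keeps only relevant URLs, most relevant first."""

    def __init__(self, threshold):
        self.threshold = threshold
        self._heap = []      # entries are (-relevance, url) so the max pops first
        self._seen = set()   # avoids queueing the same URL twice (an added assumption)

    def offer(self, url, relevance):
        """Queue the URL if its relevance reaches the threshold; report success."""
        if relevance < self.threshold or url in self._seen:
            return False
        self._seen.add(url)
        heapq.heappush(self._heap, (-relevance, url))
        return True

    def pop(self):
        """Remove and return the (url, relevance) pair with the highest relevance."""
        neg_relevance, url = heapq.heappop(self._heap)
        return url, -neg_relevance

    def __len__(self):
        return len(self._heap)


queue = CrawlQueue(threshold=0.5)
kept_phones = queue.offer("https://example.com/phones", 0.9)
kept_about = queue.offer("https://example.com/about", 0.2)    # below threshold
kept_laptops = queue.offer("https://example.com/laptops", 0.7)
crawl_order = [queue.pop()[0] for _ in range(len(queue))]
```

A heap keeps insertion and removal cheap as the queue grows, which matters more than a full re-sort when URLs are discovered continuously.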
10. The system for forming a product database based on Internet data according to claim 8, characterized in that the structured storage module is configured to:
Analyze the web page tags of the captured web page data; for the different product pages, obtain product entity information from the entity tags and form records; and obtain the corresponding product attribute information and the corresponding attribute values from the attribute tags, and store them in a structured manner.
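As an illustration of tag-based extraction, the sketch below parses one product page into a record. The patent does not specify which tags act as entity tags or attribute tags, so the markup here is purely an assumption: an `h1` element with class `product` is treated as the entity tag, and `span` elements with class `attr` and a `data-name` attribute are treated as attribute tags.

```python
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collects one product record from a page with the assumed markup."""

    def __init__(self):
        super().__init__()
        self.record = {"entity": None, "attributes": {}}
        self._target = None  # (kind, attribute name) the next text node belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h1" and attrs.get("class") == "product":
            self._target = ("entity", None)
        elif tag == "span" and attrs.get("class") == "attr":
            self._target = ("attribute", attrs.get("data-name"))

    def handle_data(self, data):
        text = data.strip()
        if not text or self._target is None:
            return
        kind, name = self._target
        if kind == "entity":
            self.record["entity"] = text
        elif name:
            self.record["attributes"][name] = text
        self._target = None

    def handle_endtag(self, tag):
        # Text after a closing tag no longer belongs to the tracked element.
        self._target = None


page = (
    '<html><body><h1 class="product">Phone X</h1>'
    '<span class="attr" data-name="screen">6.1 inch</span>'
    '<span class="attr" data-name="battery">4000 mAh</span>'
    "<p>Free shipping</p></body></html>"
)
parser = ProductParser()
parser.feed(page)
record = parser.record
```

The resulting record pairs the product entity with its attribute names and values, which is the shape a structured store (for example, one table row per product) would need.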
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310292303.0A CN103324761B (en) | 2013-07-11 | A kind of based on internet data formation product database method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310292303.0A CN103324761B (en) | 2013-07-11 | A kind of based on internet data formation product database method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103324761A (en) | 2013-09-25 |
CN103324761B (en) | 2016-11-30 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN201210293Y (en) * | 2008-03-07 | 2009-03-18 | 施侃晟 | Computer assistant reporting and knowledge generating system |
CN101477556A (en) * | 2009-01-22 | 2009-07-08 | 苏州智讯科技有限公司 | Method for discovering hot sport in internet mass information |
CN102591992A (en) * | 2012-02-15 | 2012-07-18 | 苏州亚新丰信息技术有限公司 | Webpage classification identifying system and method based on vertical search and focused crawler technology |
CN103186675A (en) * | 2013-04-03 | 2013-07-03 | 南京安讯科技有限责任公司 | Automatic webpage classification method based on network hot word identification |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103914478B (en) | Webpage training method and system, webpage Forecasting Methodology and system | |
CN110674407B (en) | Hybrid recommendation method based on graph convolution neural network | |
US7680858B2 (en) | Techniques for clustering structurally similar web pages | |
US7676465B2 (en) | Techniques for clustering structurally similar web pages based on page features | |
US7672943B2 (en) | Calculating a downloading priority for the uniform resource locator in response to the domain density score, the anchor text score, the URL string score, the category need score, and the link proximity score for targeted web crawling | |
CN103226578B (en) | Towards the website identification of medical domain and the method for webpage disaggregated classification | |
EP1736901B1 (en) | Method for classifying sub-trees in semi-structured documents | |
CN107862022B (en) | Culture resource recommendation system | |
CN102473190B (en) | Keyword assignment to a web page | |
US20110264651A1 (en) | Large scale entity-specific resource classification | |
CN106250513A (en) | A kind of event personalization sorting technique based on event modeling and system | |
CN107885793A (en) | A kind of hot microblog topic analyzing and predicting method and system | |
CN104199822A (en) | Method and system for identifying demand classification corresponding to searching | |
CN103823824A (en) | Method and system for automatically constructing text classification corpus by aid of internet | |
CN102317937A (en) | System and method for aggregating and ranking data from a plurality of web sites | |
CN103324666A (en) | Topic tracing method and device based on micro-blog data | |
CN109840532A (en) | A kind of law court's class case recommended method based on k-means | |
CN102855282B (en) | A kind of document recommendation method and device | |
CN101968819A (en) | Audio/video intelligent catalog information acquisition method facing to wide area network | |
CN106649849A (en) | Text information base building method and device and searching method, device and system | |
CN103412903B (en) | The Internet of Things real-time searching method and system predicted based on object of interest | |
CN109871433B (en) | Method, device, equipment and medium for calculating relevance between document and topic | |
CN103778206A (en) | Method for providing network service resources | |
Rao et al. | A machine learning approach to classify news articles based on location | |
CN101178721A (en) | Method for classifying and managing useful poser information in forum |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20161130 |