CN108334630A - A kind of URL classification method and system - Google Patents

A kind of URL classification method and system

Info

Publication number
CN108334630A
Authority
CN
China
Prior art keywords
url
classification
sorted
feature
library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810156915.XA
Other languages
Chinese (zh)
Inventor
黄世纬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Kangfei Information Technology Co Ltd
Original Assignee
Shanghai Kangfei Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Kangfei Information Technology Co Ltd
Priority to CN201810156915.XA
Publication of CN108334630A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present invention discloses a URL classification method and system and relates to the technical field of information classification. The URL classification method comprises: judging whether classification information for a URL to be classified exists in a preset URL classification library; when the URL classification library contains no classification information for the URL to be classified, obtaining, from the web page corresponding to that URL, a feature phrase expressing the web page content; performing lexical analysis on the feature phrase to generate a classification marker expressing user behavior; and generating corresponding classification information according to the URL to be classified and its classification marker, and recording it in the URL classification library. The present invention can classify all URLs with high accuracy.

Description

A URL classification method and system
Technical field
The present invention relates to the technical field of information classification, and in particular to a URL classification method and system.
Background art
A uniform resource locator (URL) is a concise representation of the location of a resource available on the Internet and of the method for accessing it. It is the address of a standard resource on the Internet, also called a web page address.
Analyzing the URLs that a user visits is one technique currently used for labeling users. In practice, this technique usually either analyzes the composition of the URL or derives the URL's category through clustering. However, URL composition varies endlessly, so classifying a URL from its composition alone cannot achieve high accuracy. Clustering has its own problem: the number of URL training samples currently available is limited, so the trained result deviates considerably. Therefore, classifying URLs accurately requires analyzing the page content that the URL points to.
For example, the patent with publication number CN106960040A discloses a method and device for determining the category of a URL. The method comprises: obtaining, from the web page content corresponding to the URL to be classified, the feature field corresponding to each preset feature; for each feature field, dividing the feature field into at least one first phrase and determining a first feature value of the feature field according to the pre-saved target-category probability and non-target-category probability of each phrase in each feature; and determining the category of the URL to be classified according to the determined first feature values of that URL and a URL classification model trained in advance. In the technical solution disclosed in that patent document, the URL is first accessed to obtain the corresponding keywords, and these keywords are then used for cluster training to obtain training samples. Because the number of training samples is limited, results obtained in practice often contain a certain error.
The above analysis shows that the existing technology has shortcomings in the accuracy of URL classification.
Summary of the invention
The technical problem to be solved by the present invention is that the existing technology does not classify URLs accurately enough.
To solve the above technical problem, the present invention provides a URL classification method and system.
The URL classification method comprises:
judging whether classification information for a URL to be classified exists in a preset URL classification library;
when the URL classification library contains no classification information for the URL to be classified, obtaining, from the web page corresponding to the URL to be classified, a feature phrase expressing the web page content;
performing lexical analysis on the feature phrase to generate a classification marker expressing user behavior;
generating corresponding classification information according to the URL to be classified and its classification marker, and recording the classification information in the URL classification library.
Optionally, judging whether classification information for the URL to be classified exists in the preset URL classification library comprises:
intercepting a feature string of the URL to be classified;
querying the URL classification library according to the feature string to judge whether the URL classification library contains classification information for the URL to be classified.
Optionally, generating corresponding classification information according to the URL to be classified and its classification marker comprises:
generating the corresponding classification information according to the feature string of the URL to be classified and the classification marker.
Optionally, obtaining, from the web page corresponding to the URL to be classified, the feature phrase expressing the web page content comprises:
obtaining the web page content corresponding to the URL to be classified by accessing the URL to be classified;
determining the feature phrase that expresses the web page content.
Optionally, the feature phrase includes at least the title information of the web page corresponding to the URL to be classified.
In another aspect, the present invention also provides a URL classification system, comprising:
a judgment module, configured to judge whether classification information for a URL to be classified exists in a preset URL classification library;
a feature phrase acquisition module, configured to obtain, from the web page corresponding to the URL to be classified, a feature phrase expressing the web page content when the URL classification library contains no classification information for the URL to be classified;
a classification marker generation module, configured to perform lexical analysis on the feature phrase to generate a classification marker expressing user behavior;
a classification module, configured to generate corresponding classification information according to the URL to be classified and its classification marker, and to record the classification information in the URL classification library.
Optionally, the judgment module comprises:
a character string interception submodule, configured to intercept the feature string of the URL to be classified;
a judging submodule, configured to query the URL classification library according to the feature string to judge whether the URL classification library contains classification information for the URL to be classified.
Optionally, the classification module comprises:
a classification information generation submodule, configured to generate the corresponding classification information according to the feature string of the URL to be classified and the classification marker.
Optionally, the feature phrase acquisition module comprises:
a URL access submodule, configured to obtain the web page content corresponding to the URL to be classified by accessing the URL to be classified;
a feature phrase determination submodule, configured to determine the feature phrase that expresses the web page content.
Optionally, the feature phrase includes at least the title information of the web page corresponding to the URL to be classified.
When no corresponding category can be found in the URL classification library for the URL to be classified, the web page corresponding to the URL is analyzed: a feature phrase expressing the web page content is extracted and lexically analyzed to obtain a classification marker expressing user behavior, and the URL is classified according to the URL and the classification marker, which updates the URL classification library. The present invention can thus classify URLs with high accuracy.
Brief description of the drawings
Fig. 1 is a flow chart of a URL classification method provided by Embodiment One of the present invention;
Fig. 2 is a flow chart of a URL classification method provided by Embodiment Two of the present invention;
Fig. 3 is a structural block diagram of a URL classification system provided by Embodiment Three of the present invention.
Detailed description of the embodiments
The technical solution of the present invention is further described below through specific embodiments and with reference to the accompanying drawings; however, the present invention is not limited to these embodiments.
It should also be understood that the specific embodiments described herein are intended only to aid understanding of the present invention and not to limit it.
The present invention provides a URL classification library that stores classification information for URLs. In the URL classification library, URLs are identified by classification markers, and each classification marker can correspond to multiple types of URLs.
A classification marker can express a user behavior, for example "purchase washing machine", "query washing machine price", or "purchase cosmetics of a certain brand".
When a URL needs to be classified, the URL classification library is queried first, since it may already contain classification information for that URL. When the URL classification library contains no classification information for the URL, the content corresponding to the URL must be mined to generate a new category.
Specifically, a feature phrase expressing the web page content is extracted and lexically analyzed to obtain a classification marker expressing user behavior, and the URL is classified according to the URL and the classification marker, which updates the URL classification library.
Because the present invention generates a new category whenever the URL to be classified has no corresponding category in the URL classification library, URL classification is both complete and accurate: no URL is left that cannot be assigned to a category. In addition, the more precise matching approach used in the method of the present invention yields higher classification accuracy.
Embodiment one
Fig. 1 shows a flow chart of a URL classification method provided by Embodiment One of the present invention, described in detail below with reference to the drawing:
In this embodiment, the URL to be classified is first looked up in the URL classification library. When no corresponding category can be found there, the web page corresponding to the URL is analyzed: a feature phrase expressing the web page content is extracted and lexically analyzed to obtain a classification marker expressing user behavior, and the URL is classified according to the URL and the classification marker, which updates the URL classification library.
Step S101: judge whether classification information for the URL to be classified exists in the preset URL classification library.
The URL classification library stores classification information for URLs. Classification markers can be used to identify the category of a URL, and each classification marker can correspond to multiple types of URLs.
Whether the URL to be classified belongs to a URL category in the URL classification library can be judged by matching the URL against the library. The matching can be exact or fuzzy. When matching, a character string made up of part of the characters of the URL to be classified can be used. In general, this character string contains the main characteristic information of the URL; its concrete form is not limited here.
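The lookup in step S101 can be sketched as follows. This is only a minimal illustration under stated assumptions: the library is modeled as an in-memory dictionary from stored character strings to classification markers, fuzzy matching is reduced to a substring test, and the names `URL_LIBRARY` and `lookup` as well as the sample entries are hypothetical rather than taken from the patent.

```python
from typing import Dict, Optional

# Hypothetical library: stored character string -> classification marker.
URL_LIBRARY: Dict[str, str] = {
    "phicomm.com/article": "read product articles",
    "shop.example.com/washer": "purchase washing machine",
}

def lookup(url: str, library: Dict[str, str]) -> Optional[str]:
    """Return the marker recorded for url, or None if the library has no entry."""
    # Drop the scheme so "http://a/b" and "a/b" are treated alike.
    key = url.split("://", 1)[-1]
    # Exact matching on the whole string.
    if key in library:
        return library[key]
    # Fuzzy matching (assumed here to mean: a stored string occurs inside the URL).
    for stored, marker in library.items():
        if stored in key:
            return marker
    return None

print(lookup("http://bbs.phicomm.com/article/titleS=123", URL_LIBRARY))
```

A production library would more likely be a database keyed by the feature string, and the fuzzy rule would need to be chosen to avoid accidental substring hits.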
Step S102: when the URL classification library contains no classification information for the URL to be classified, obtain, from the web page corresponding to the URL to be classified, a feature phrase expressing the web page content.
When the URL classification library contains no classification information for the URL to be classified, that is, when no corresponding category can be found for the URL in the library, the web page content corresponding to the URL must be analyzed so that a new category can be generated in the URL classification library.
A feature phrase expressing the web page content is obtained from the web page corresponding to the URL. The feature phrase comprises multiple words that express the web page content. These words can be obtained by extracting text information such as the title of the web page.
Further, the pictures in the web page corresponding to the URL can also be recognized to obtain text information expressing the picture content; words are extracted from that text information and included in the feature phrase. The recognition may identify text information in the web page, or it may convert a shape in a picture into text information describing that shape: for example, if a picture shows the shape of a refrigerator, it can be recognized as the text "refrigerator".
Video content in the web page can also be recognized in a way similar to picture recognition, which is not repeated here.
It should be noted that the picture recognition and video recognition mentioned above are both common prior art and are not described in detail here.
The feature words express the content of the web page. Because web pages present information in many ways, not only as text, web page information can be extracted in a variety of ways.
Optionally, the feature phrase includes at least the title information of the web page corresponding to the URL to be classified.
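As a rough illustration of obtaining a feature phrase from the web page title, the sketch below parses an HTML string with Python's standard `html.parser` module. Fetching the page over the network is deliberately left out, the class and function names are hypothetical, and splitting on whitespace is an assumption: Chinese page titles would need a word segmenter instead.

```python
from html.parser import HTMLParser
from typing import List

class TitleParser(HTMLParser):
    """Collects the text found inside the <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def feature_phrase_from_html(html: str) -> List[str]:
    """Return the title words of an HTML document as a list of feature words."""
    parser = TitleParser()
    parser.feed(html)
    # Whitespace splitting is a simplification (see the note above).
    return parser.title.split()

page = "<html><head><title>xx store washing machine</title></head><body>hi</body></html>"
print(feature_phrase_from_html(page))
```

Words recognized from pictures or video, as described above, could be appended to the same list.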
Step S103: perform lexical analysis on the feature phrase to generate a classification marker expressing user behavior.
Lexical analysis is performed on the feature words to generate a classification marker expressing user behavior.
Further, the lexical analysis can be implemented by a lexical analysis module: calling the lexical analysis module through a lexical analysis API generates the classification marker expressing user behavior. When the module is called, the feature words can be represented in vectorized form as a vocabulary vector, for example ["xx store", "washing machine", "xx model", "drum"].
The lexical analysis API is the access address of the lexical analysis module, through which the vocabulary vector to be analyzed is passed to the module. The lexical analysis module can be implemented in many ways. For example, it can accumulate a weight for each category of word in the vocabulary vector and finally take the attribute with the higher weight as the feature of that category. Suppose the vocabulary vector contains "Haier washing machine", "TCL washing machine", and "Samsung SC1000". The lexical analysis module then adds weight to "washing machine": the weight of washing machine in its category is 2, while the weight of every other category is 1, so the module outputs "washing machine" as one of the features of the vocabulary vector in that category. The classification marker expressing user behavior is then obtained from the features of each category.
In the present invention, a classification marker can express a user behavior, for example "purchase washing machine", "query washing machine price", or "purchase cosmetics of a certain brand".
It should be noted that lexical analysis is a technical means commonly used in the prior art and can be implemented in many ways; the process above is only one of them.
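The weight accumulation in the washing machine example can be sketched as below. The patent does not say how words are mapped to categories, so the keyword tuple `CATEGORY_KEYWORDS` and both function names are assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical category keywords; the source does not specify this mapping.
CATEGORY_KEYWORDS = ("washing machine", "refrigerator", "cosmetics")

def accumulate_weights(vocabulary):
    """Count how often each category keyword occurs across the vocabulary vector."""
    weights = Counter()
    for word in vocabulary:
        lowered = word.lower()
        hits = [k for k in CATEGORY_KEYWORDS if k in lowered]
        # A word that matches no keyword counts once under its own text.
        for key in hits or [lowered]:
            weights[key] += 1
    return weights

def dominant_feature(vocabulary):
    """Return the highest-weighted feature, or None for an empty vector."""
    weights = accumulate_weights(vocabulary)
    return weights.most_common(1)[0][0] if weights else None

vocabulary = ["Haier washing machine", "TCL washing machine", "Samsung SC1000"]
print(dominant_feature(vocabulary))  # washing machine
```

Turning the dominant feature into a behavior marker such as "purchase washing machine" requires an additional rule for the action word, which the source leaves open.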
Step S104: generate corresponding classification information according to the URL to be classified and its classification marker, and record it in the URL classification library.
When the classification information of the URL is generated, the URL can be processed to obtain a representative character string that can stand for a class of URLs; alternatively, the complete URL can be written directly into the URL classification library.
In the URL classification library, URLs are identified by classification markers, and each classification marker can correspond to multiple types of URLs.
The classification information comprises the character string related to the URL and the corresponding classification marker.
When the URL to be classified is not in the preset URL classification library, the web page corresponding to the URL is analyzed: a feature phrase expressing the web page content is extracted and lexically analyzed to obtain a classification marker expressing user behavior, and the URL is classified according to the URL and the classification marker, which updates the URL classification library. Compared with existing classification techniques, the present invention can classify all URLs completely and with high accuracy.
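Steps S101 to S104 taken together can be sketched as one function. This is a simplified outline, not the patented implementation: the library is a plain dictionary keyed by the complete URL, and page access and lexical analysis are passed in as callables so the sketch runs without a network. All names, including the stand-in functions, are hypothetical.

```python
from typing import Callable, Dict, List

def classify_url(
    url: str,
    library: Dict[str, str],
    extract_phrase: Callable[[str], List[str]],
    analyze: Callable[[List[str]], str],
) -> str:
    """Return the classification marker for url, creating an entry if needed."""
    # S101: check whether the library already holds classification information.
    if url in library:
        return library[url]
    # S102: obtain the feature phrase from the web page behind the URL.
    phrase = extract_phrase(url)
    # S103: lexical analysis turns the phrase into a behavior marker.
    marker = analyze(phrase)
    # S104: record the new classification information in the library.
    library[url] = marker
    return marker

# Stand-in callables used only to make the sketch runnable.
def fake_extract(url: str) -> List[str]:
    return ["xx store", "washing machine", "xx model", "drum"]

def fake_analyze(phrase: List[str]) -> str:
    return "purchase washing machine" if "washing machine" in phrase else "unknown"

library: Dict[str, str] = {}
print(classify_url("shop.example.com/item/1", library, fake_extract, fake_analyze))
```

On a second call with the same URL the function returns the recorded marker without accessing the page again, which is the purpose of recording the result in step S104.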
Embodiment two
Fig. 2 shows a flow chart of a URL classification method provided by Embodiment Two of the present invention, described in detail below with reference to the drawing:
Step S201: intercept the feature string of the URL to be classified.
The feature string is a representative character string within the URL and can stand for a class of URLs. For example, for the URL "bbs.phicomm.com/article/titleS=123", the corresponding feature string is "phicomm.com/article". The present invention does not specifically limit how the feature string is intercepted. In general, the feature string includes at least the main part of the domain name and a field of the upper-level directory.
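One possible interception rule that reproduces the example above is sketched here with `urllib.parse.urlsplit`. Since the patent leaves the interception method open, the rule itself is an assumption: the main part of the domain name is taken to be the last two host labels, and the upper-level directory to be the first path segment. The function name `feature_string` is hypothetical.

```python
from urllib.parse import urlsplit

def feature_string(url: str) -> str:
    """Return '<main domain>/<first path segment>' for url."""
    # urlsplit only recognizes the host when a scheme or '//' prefix is present.
    if "://" not in url:
        url = "//" + url
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Naive rule: keep the last two labels of the host name.
    domain = ".".join(host.split(".")[-2:])
    segments = [s for s in parts.path.split("/") if s]
    return f"{domain}/{segments[0]}" if segments else domain

print(feature_string("bbs.phicomm.com/article/titleS=123"))  # phicomm.com/article
```

The last-two-labels rule is too coarse for multi-part suffixes such as "example.co.uk"; handling those correctly would require a public suffix list.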
Step S202: query the URL classification library according to the feature string to judge whether the URL classification library contains classification information for the URL to be classified.
Step S203: when the URL classification library contains no classification information for the URL to be classified, obtain, from the web page corresponding to the URL to be classified, a feature phrase expressing the web page content.
Step S204: perform lexical analysis on the feature phrase to generate a classification marker expressing user behavior.
Step S205: generate corresponding classification information according to the feature string of the URL to be classified and the classification marker, and record it in the URL classification library.
In this embodiment, the feature string of the URL to be classified and the corresponding classification marker are written into the URL classification library.
In this embodiment, a new category is generated by writing the feature string of the URL and the classification marker into the classification library.
In this embodiment, the feature string of the URL can stand for a class of URLs, so the category of a whole class of URLs is determined.
Embodiment three
Fig. 3 shows a structural block diagram of a URL classification system provided by Embodiment Three of the present invention, described in detail below with reference to the drawing:
The URL classification system comprises:
a judgment module 31, configured to judge whether classification information for a URL to be classified exists in a preset URL classification library;
a feature phrase acquisition module 32, configured to obtain, from the web page corresponding to the URL to be classified, a feature phrase expressing the web page content when the URL classification library contains no classification information for the URL to be classified;
a classification marker generation module 33, configured to perform lexical analysis on the feature phrase to generate a classification marker expressing user behavior;
a classification module 34, configured to generate corresponding classification information according to the URL to be classified and its classification marker, and to record the classification information in the URL classification library.
Optionally, the judgment module 31 comprises:
a character string interception submodule, configured to intercept the feature string of the URL to be classified;
a judging submodule, configured to query the URL classification library according to the feature string to judge whether the URL classification library contains classification information for the URL to be classified.
Optionally, the classification module 34 comprises:
a classification information generation submodule, configured to generate the corresponding classification information according to the feature string of the URL to be classified and the classification marker.
Optionally, the feature phrase acquisition module 32 comprises:
a URL access submodule, configured to obtain the web page content corresponding to the URL to be classified by accessing the URL to be classified;
a feature phrase determination submodule, configured to determine the feature phrase that expresses the web page content.
Optionally, the feature phrase includes at least the title information of the web page corresponding to the URL to be classified.
Because the URL classification system in this embodiment is used to implement the foregoing method embodiments, refer to method Embodiment One and method Embodiment Two above for details, which are not repeated here.
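The module structure of this embodiment might be laid out as in the sketch below. It is only one possible arrangement: modules 32 and 33 are modeled as injected callables, the library is a dictionary keyed by feature string, and every class and attribute name is hypothetical. The patent itself notes that module names do not limit the invention.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class JudgmentModule:
    """Module 31: interception submodule plus judging submodule."""
    intercept: Callable[[str], str]   # character string interception submodule
    library: Dict[str, str]           # feature string -> classification marker

    def find(self, url: str) -> Optional[str]:
        # Judging submodule: query the library by feature string.
        return self.library.get(self.intercept(url))

@dataclass
class UrlClassificationSystem:
    judgment: JudgmentModule
    acquire_phrase: Callable[[str], List[str]]    # module 32
    generate_marker: Callable[[List[str]], str]   # module 33

    def classify(self, url: str) -> str:
        marker = self.judgment.find(url)
        if marker is None:
            marker = self.generate_marker(self.acquire_phrase(url))
            # Module 34: record feature string and marker in the library.
            self.judgment.library[self.judgment.intercept(url)] = marker
        return marker

system = UrlClassificationSystem(
    judgment=JudgmentModule(intercept=lambda url: url.split("/")[0], library={}),
    acquire_phrase=lambda url: ["washing machine"],
    generate_marker=lambda phrase: "purchase " + phrase[0],
)
print(system.classify("shop.example.com/item/1"))  # purchase washing machine
```

Keying the library by feature string means that one recorded entry answers later queries for every URL sharing that feature string, which matches the behavior described in Embodiment Two.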
It should be understood that the steps of the present invention have no strict order of execution; any variation that can be conceived and that does not affect the realization of the functions falls within the protection scope of the present invention.
It should be understood that the methods and systems described in the embodiments provided herein are illustrative and may be adjusted in actual implementation.
In addition, the specific names of the functional units or modules are only for ease of distinguishing them from one another and are not intended to limit the protection scope of the present invention.
The specific embodiments described herein merely illustrate the spirit of the present invention. Those skilled in the art to which the invention belongs may modify or supplement the described embodiments in various ways or replace them with similar methods without departing from the spirit of the invention or exceeding the scope defined by the appended claims.

Claims (10)

1. A URL classification method, characterized by comprising the steps of:
judging whether classification information for a URL to be classified exists in a preset URL classification library;
when the URL classification library contains no classification information for the URL to be classified, obtaining, from the web page corresponding to the URL to be classified, a feature phrase expressing the web page content;
performing lexical analysis on the feature phrase to generate a classification marker expressing user behavior;
generating corresponding classification information according to the URL to be classified and its classification marker, and recording the classification information in the URL classification library.
2. The URL classification method according to claim 1, characterized in that judging whether classification information for the URL to be classified exists in the preset URL classification library comprises:
intercepting a feature string of the URL to be classified;
querying the URL classification library according to the feature string to judge whether the URL classification library contains classification information for the URL to be classified.
3. The URL classification method according to claim 2, characterized in that generating corresponding classification information according to the URL to be classified and its classification marker comprises:
generating the corresponding classification information according to the feature string of the URL to be classified and the classification marker.
4. The URL classification method according to claim 1, characterized in that obtaining, from the web page corresponding to the URL to be classified, the feature phrase expressing the web page content comprises:
obtaining the web page content corresponding to the URL to be classified by accessing the URL to be classified;
determining the feature phrase that expresses the web page content.
5. The URL classification method according to claim 1, characterized in that the feature phrase includes at least the title information of the web page corresponding to the URL to be classified.
6. A URL classification system, characterized by comprising:
a judgment module, configured to judge whether classification information for a URL to be classified exists in a preset URL classification library;
a feature phrase acquisition module, configured to obtain, from the web page corresponding to the URL to be classified, a feature phrase expressing the web page content when the URL classification library contains no classification information for the URL to be classified;
a classification marker generation module, configured to perform lexical analysis on the feature phrase to generate a classification marker expressing user behavior;
a classification module, configured to generate corresponding classification information according to the URL to be classified and its classification marker, and to record the classification information in the URL classification library.
7. The URL classification system according to claim 6, characterized in that the judgment module comprises:
a character string interception submodule, configured to intercept the feature string of the URL to be classified;
a judging submodule, configured to query the URL classification library according to the feature string to judge whether the URL classification library contains classification information for the URL to be classified.
8. The URL classification system according to claim 7, characterized in that the classification module comprises:
a classification information generation submodule, configured to generate the corresponding classification information according to the feature string of the URL to be classified and the classification marker.
9. The URL classification system according to claim 6, characterized in that the feature phrase acquisition module comprises:
a URL access submodule, configured to obtain the web page content corresponding to the URL to be classified by accessing the URL to be classified;
a feature phrase determination submodule, configured to determine the feature phrase that expresses the web page content.
10. The URL classification system according to claim 6, characterized in that the feature phrase includes at least the title information of the web page corresponding to the URL to be classified.
CN201810156915.XA 2018-02-24 2018-02-24 A kind of URL classification method and system Pending CN108334630A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810156915.XA CN108334630A (en) 2018-02-24 2018-02-24 A kind of URL classification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810156915.XA CN108334630A (en) 2018-02-24 2018-02-24 A kind of URL classification method and system

Publications (1)

Publication Number Publication Date
CN108334630A true CN108334630A (en) 2018-07-27

Family

ID=62929737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810156915.XA Pending CN108334630A (en) 2018-02-24 2018-02-24 A kind of URL classification method and system

Country Status (1)

Country Link
CN (1) CN108334630A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110516136A (en) * 2019-08-29 2019-11-29 南京烽火天地通信科技有限公司 A kind of internet crawler content page recognition methods based on sample

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102819591A (en) * 2012-08-07 2012-12-12 北京网康科技有限公司 Content-based web page classification method and system
CN102819597A (en) * 2012-08-13 2012-12-12 北京星网锐捷网络技术有限公司 Web page classification method and equipment
US20160217144A1 (en) * 2013-09-04 2016-07-28 Zte Corporation Method and device for obtaining web page category standards, and method and device for categorizing web page categories
WO2017167067A1 (en) * 2016-03-30 2017-10-05 阿里巴巴集团控股有限公司 Method and device for webpage text classification, method and device for webpage text recognition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
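宗校军: "Research on Topic-Focused Collection and Classification of Chinese Web Pages", China Doctoral Dissertations Full-text Database (Information Science and Technology) *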

Similar Documents

Publication Publication Date Title
US11809393B2 (en) Image and text data hierarchical classifiers
CN101542486B (en) Rank graph
US8856129B2 (en) Flexible and scalable structured web data extraction
CN102053983B (en) Method, system and device for querying vertical search
US7676745B2 (en) Document segmentation based on visual gaps
JP4637969B1 (en) Properly understand the intent of web pages and user preferences, and recommend the best information in real time
EP2570974A1 (en) Automatic crowd sourcing for machine learning in information extraction
CN103886020B (en) A kind of real estate information method for fast searching
US20110246462A1 (en) Method and System for Prompting Changes of Electronic Document Content
CN107038173A (en) Application query method and apparatus, similar application detection method and device
CN105243058A (en) Webpage content translation method and electronic apparatus
CN103617192B (en) The clustering method and device of a kind of data object
US20230351789A1 (en) Systems and methods for deep learning based approach for content extraction
CN112035675A (en) Medical text labeling method, device, equipment and storage medium
CN104036190A (en) Method and device for detecting page tampering
ur Rehman et al. Learning a semantic space for modeling images, tags and feelings in cross-media search
CN114222000B (en) Information pushing method, device, computer equipment and storage medium
US11386263B2 (en) Automatic generation of form application
CN109885583A (en) Data query method, apparatus, equipment and storage medium based on block chain
CN108334630A (en) A kind of URL classification method and system
CN113836434B (en) Web page data processing method based on database
CN109948015B (en) Meta search list result extraction method and system
CN115186240A (en) Social network user alignment method, device and medium based on relevance information
JP2007323238A (en) Highlighting device and program
CN114239689A (en) Multi-mode-based website type judgment method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180727
