CN1536510A - Method and system for filtering column tube information by utilizing picture and characters identification technique - Google Patents
- Publication number
- CN1536510A (application number CNA031101046A)
- Authority
- CN
- China
- Prior art keywords
- information
- network address
- technology
- column tube
- picture
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Transfer Between Computers (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method and system for filtering controlled ("column tube") information, that is, listed webpages and websites, by means of picture and text recognition. In a network transmission environment, through real-time interaction between the network and the user, a remote server and its database perform a real-time comparison on the webpage or website to which the client is connecting, so that controlled information is filtered before it reaches the user.
Description
Technical field
The invention relates to a method and system for filtering controlled information using picture and text recognition technology. In particular, it relates to a method and system applied in a network transmission environment in which, through real-time interaction between the network and the user, a remote server and its database perform a real-time comparison on the webpage or website the user has requested, so that the user obtains the required filtering of controlled information.
Background technology
To prepare for the arrival of the information society and give the new generation basic information literacy, information education programs are being actively promoted domestically. Beyond promoting information education, however, a more important problem is how to let users obtain suitable information when they use computers and take in positive, healthy network information at the appropriate age.
Ordinary network users mostly use the network to browse information and to send and receive e-mail; the most common activities are using search engines, browsing lifestyle and leisure information, downloading software, and chatting with friends. Although the wide variety of websites includes good, encyclopedia-like information in every field, the network is also flooded with direct and cheap sensory stimulation: besides gossip news and pornography, there are negative messages such as online gambling, drug trafficking, and firearms. Because the Internet is easily accessible, immediate, and interactive, such harmful messages reach ordinary users easily, while its anonymity makes them hard to identify, trace, and seize. At present users cannot restrain themselves, providers cannot act as gatekeepers, the government has no means of regulation, and the law has no means of punishment, so the chance that network users encounter harmful information grows daily.
Existing filtering techniques fall broadly into three classes. The first class is filtering software. The user can choose to avoid articles containing "specific wording"; for instance, if "violence" or "pornography" is set as a keyword, every article whose content contains the word "violence" or "pornography" is blocked by the software. The second class is rating systems, such as the Platform for Internet Content Selection (PICS). In the rating systems currently on the Internet, the website administrator classifies the online content, which is then filtered automatically.
The third class is access control technology, as described in United States Patent No. 6,233,618, which is implemented within a network device, for example a gateway such as a proxy server, switch, or firewall. It analyzes the request from the client and the client's information, such as URLs, IP addresses, or other resource identification data, and compares them with the data in a database, which avoids the first technique's limitation of filtering webpage content by keyword.
Summary of the invention
The prior art has the following limitations. The first method filters purely by keyword, so when it filters out improper information it also filters out other information that has a positive meaning; for example, filtering information that contains the keyword "sex" may also filter out articles that discuss legitimate issues on other topics. The second method is hampered by the fact that the Internet crosses national borders: most harmful information comes from foreign websites, and because the network is an interconnected international space, a small national territory does not mean a small volume of information. Monitoring is therefore difficult in practice, and it consumes more manpower and cost than the first method. The third method overcomes the first method's deficiency of filtering webpage content by keyword alone, but its technical feature is that it is installed at the client, so as the volume of controlled information grows, the load on the client host increases and memory is consumed excessively.
A technique for filtering controlled information is therefore needed that compares and filters webpages or websites in real time in a way that is easier, faster, and does not consume client resources.
One object of the present invention is to provide a method of filtering controlled information using picture and text recognition technology.
Another object of the present invention is to provide a system for filtering controlled information using picture and text recognition technology.
Another object of the present invention is to provide a method and system for filtering controlled information using picture and text recognition technology, in which the URL or webpage retrieved by the user is sent to a remote server, and an information recognition technique that uses text and images jointly as the basis for classifying websites is applied to the data found, so that controlled information updated in real time is established and the controlled information the user obtains is both timely and correct.
Another object of the present invention is to provide an online remote-comparison system for filtering controlled information, in which the client's work of filtering controlled network information is carried out online by a remote server, and the database of the server updates the content of the controlled network information in real time.
Another object of the present invention is to provide a remote-comparison system for filtering controlled information that improves comparison efficiency, in which the online remote server effectively filters controlled network information for the client.
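The over-blocking of the first method can be illustrated with a minimal Python sketch. The keyword list and sample sentences are hypothetical and only serve to show why plain substring matching cannot tell a harmful page from a legitimate discussion of the same topic.

```python
# Hypothetical keyword list; the patent text names no concrete list.
BLOCKED_KEYWORDS = {"violence", "pornography"}


def keyword_filter(text: str) -> bool:
    """Return True when plain keyword matching would block the text."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKED_KEYWORDS)


harmful = "Free pornography here"
legitimate = "A study on preventing violence in schools"

# Both texts are blocked: the filter cannot see that the second one
# discusses the topic in a legitimate way.
print(keyword_filter(harmful))
print(keyword_filter(legitimate))
```

Both calls report a block, which is the limitation the invention aims to avoid by classifying the page content rather than matching single words.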
The invention provides a method and system for filtering controlled information using picture and text recognition technology, mainly comprising: a database builder connected to the Internet; a portal website through which the user inputs the keywords of the webpage or website to be searched; and an automatic search engine that retrieves relevant webpage or website content from the Internet according to its relevance to those keywords, separates the content into pictures and text for recognition, produces classification information for the pictures and for the text respectively, and stores the information in the database.
The information stored in the database is the classification information of the webpage or website content, and this information is produced by the classification work of the database builder.
The text classification technique can be implemented with prior art, such as the automatic classification method for Chinese documents (Taiwan Patent Publication No. 439042).
The image recognition technique can be implemented with prior art, such as the interactive image retrieval method based on local objects (Taiwan Patent Publication No. 501035) or Intelligent Flesh-Tone Detection (IFTD).
Description of drawings
Fig. 1 is an architecture diagram of the system of the present invention for filtering controlled information using picture and text recognition;
Fig. 2 is a schematic flow diagram of the method of the present invention for filtering controlled information using picture and text recognition;
Fig. 3 is an implementation block diagram of the database builder and database of the present invention for filtering controlled information using picture and text recognition;
Fig. 4 is a flowchart of the method of the present invention for filtering controlled information using picture and text recognition.
Embodiment
Please refer to Fig. 1, which shows the architecture of the system of the present invention for filtering controlled information using picture and text recognition. As shown in Fig. 1, the system comprises a client and a remote server. The client comprises: one or more browsers 1, which can link to a website or webpage via the Internet 6; a retrieval unit 2, which retrieves the URL of the website or webpage to which the client computer intends to link; a transmission unit 3, which transmits the retrieved URL; a receiving unit 5, which receives the comparison result; and an execution unit 4, which acts on the block or allow result returned by the remote server: when the comparison result is block, it blocks the link between the client and the URL; when the comparison result is allow, it allows the client to link to the URL.
Still referring to Fig. 1, the server comprises: a database 9, which stores the classification information of website content, including the URLs of controlled network information; a receiving unit 7, which receives the retrieved URL; an arithmetic unit 8, which checks whether the retrieved URL is in the database and produces a block or allow comparison result; a transmission unit 11, which transmits the comparison result to the client; and a database builder 10, which accesses webpages and/or websites on the Internet, classifies them according to their content, and updates the database 9 with the classification result for each URL.
The database builder 10 further comprises an information search unit, which automatically searches for controlled URLs and their content, and an information recognition unit, which combines text and images and uses weights to classify the website content.
Please refer to Fig. 2, together with Fig. 1. Fig. 2 is a schematic flow diagram of the method of the present invention for filtering controlled information using picture and text recognition. The method comprises: retrieving the URL to be determined and sending it to the server; comparing that URL with the URLs in the database of the server, which yields a block or allow comparison result; and having the remote server return the comparison result to the client device from which the request came, so that the client's work of filtering controlled network information is carried out by a remote server.
In a preferred embodiment of the present invention, the client device that carries out the client-side work can be a proxy server, router, switch, firewall, bridge, or other network gateway device. In addition, the work of the remote server also includes updating the content of the database in real time; the content of this database comprises pornographic, gambling, violent, or other specified controlled network information.
The URLs in the database 9 are collected using the information search and information recognition techniques, with the automatic search engine collecting controlled network information online in real time.
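The lookup flow of Fig. 2 can be sketched in Python as follows. This is a minimal in-process illustration, not the patented implementation: the class names, the host-name normalization policy, and the in-memory set standing in for the database are all assumptions, and the network transport between client and server is omitted.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to its lowercase host name (an assumed comparison policy)."""
    parts = urlsplit(url if "://" in url else "//" + url)
    return (parts.hostname or "").lower()


@dataclass
class RemoteServer:
    """Stands in for the remote server; `controlled` plays the role of the database."""
    controlled: set = field(default_factory=set)

    def compare(self, url: str) -> str:
        # Arithmetic-unit role: is the retrieved URL in the database?
        return "block" if normalize(url) in self.controlled else "allow"


@dataclass
class Client:
    """Stands in for the client device (retrieval, transmission, execution units)."""
    server: RemoteServer

    def request(self, url: str) -> bool:
        # Send the URL to the server and act on the returned result.
        result = self.server.compare(url)
        return result == "allow"  # True: link is allowed; False: link is blocked


server = RemoteServer(controlled={"bad.example"})
client = Client(server)
print(client.request("http://bad.example/page"))   # blocked
print(client.request("http://good.example/"))      # allowed
```

Because the comparison runs on the server, the client only holds the URL and the one-word result, which matches the stated aim of not consuming client resources.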
Please refer to Fig. 3, which shows the implementation block diagram of the database builder and database of the system of the present invention for filtering controlled information using picture and text recognition. As shown in Fig. 3, the database builder and database system comprises: a portal search engine 12, through which the user inputs the keyword or URL of the website to be retrieved; a keyword database 14, which stores the keywords associated with website information; a search engine URL database 15, which stores the URLs associated with website information; an integrated search engine 13, which links to the Internet 6 through the portal search engine 12 and retrieves website information; an undetermined URL database 16, which stores the URLs of information not listed as controlled; an automatic search engine 17, which automatically retrieves website data according to the URLs passed on by the undetermined URL database; a pre-processing unit 18, which receives the website data retrieved by the automatic search engine and separates it into text and images according to the website content; a webpage content clearing unit 19, which removes the titles and annotations of the webpage content; an image search unit 20, which searches the webpage content for images; a text classification unit 21, which receives and classifies the text data passed on by the webpage content clearing unit; an image classification unit 22, which receives and classifies the image data passed on by the image search unit; an arithmetic unit 8, which performs a weighted calculation on the classified text data and image data; an operation record database 24, which stores the records of the weighted calculation; a HyperText Markup Language (HTML) record database 23, which stores the original webpage files; and a database 9, which stores the controlled webpages.
Please refer to Fig. 4, together with Figs. 1, 2, and 3. Fig. 4 shows the flowchart of the method of the present invention for filtering controlled information using picture and text recognition. This specific embodiment discloses the flow of the client-side work and the flow of the remote server work, and comprises the following steps:
Read keyword and search URL (101): when the user inputs a keyword or URL through the portal search engine 12, the portal search engine 12 sends the input keyword or URL to the integrated search engine 13.
Retrieve URL (102): when the user has input a keyword or URL, the integrated search engine 13 looks up the URLs of the corresponding webpages in the keyword database 14 and the search engine URL database 15; the URLs found are stored in the undetermined URL database 16 and sent to the automatic search engine 17.
Automatic search (103): when the automatic search engine 17 receives the URLs passed on by the undetermined URL database 16, it connects to the Internet 6 automatically and retrieves, via the Internet 6, the webpage information corresponding to the keyword or URL.
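Steps 101 to 103 can be sketched in Python as a lookup followed by a queue. The dictionary and set standing in for the keyword database 14 and the search engine URL database 15, the deque standing in for the undetermined URL database 16, and the injected `fetch` callable are all assumptions made for illustration; no real network access is performed.

```python
from collections import deque

# Hypothetical stand-ins for databases 14, 15 and 16.
keyword_db = {"casino": ["http://casino.example/"]}   # keyword -> URLs
url_db = {"http://casino.example/", "http://news.example/"}
pending = deque()                                     # undetermined URLs


def integrated_search(query: str) -> list:
    """Step 102: resolve a keyword or URL to URLs and queue them."""
    hits = [query] if query in url_db else keyword_db.get(query, [])
    for url in hits:
        if url not in pending:
            pending.append(url)
    return hits


def automatic_search(fetch) -> dict:
    """Step 103: fetch the page for every queued URL.

    `fetch` is an injected callable (URL -> page text) so the sketch
    stays runnable without a network connection.
    """
    pages = {}
    while pending:
        url = pending.popleft()
        pages[url] = fetch(url)
    return pages


integrated_search("casino")
pages = automatic_search(lambda url: "<html>stub page for " + url + "</html>")
print(sorted(pages))
```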
Retrieve webpage content (104): the webpage information retrieved in step 103 is sent to the pre-processing unit 18 for further processing of the webpage content.
Pre-process webpage content (105): the retrieved webpage information is passed to steps 106 and 107.
Retrieve images (106): the webpage information from step 105 is sent to the image search unit 20, which removes the text, titles, and annotations of the webpage information according to a preset clearing principle, retrieves only the image information of the webpage, and passes it to step 108.
Clear webpage content (107): the webpage information from step 105 is sent to the webpage content clearing unit 19, which processes the webpage information according to a preset retrieval principle, retrieves only the text content of the webpage, and passes it to step 109.
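The separation of text and images in steps 105 to 107 can be sketched with Python's standard `html.parser` module. The class name and the choice to skip `title`, `script`, and `style` content are assumptions; the patent only states that titles and annotations are removed according to a preset principle.

```python
from html.parser import HTMLParser


class TextImageSplitter(HTMLParser):
    """Collect visible text and image sources separately from one page."""

    SKIPPED = {"title", "script", "style"}  # assumed clearing principle

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.image_urls = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # HTML comments never reach this method, so annotations are dropped.
        if not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())


def split_page(html: str):
    """Return (text, image_urls) for one page."""
    parser = TextImageSplitter()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.image_urls


sample = ('<html><head><title>T</title></head><body><!-- note -->'
          '<p>Hello world</p><img src="a.jpg"></body></html>')
text, images = split_page(sample)
print(text, images)
```

The text would then go to the text classification unit and the image list to the image classification unit, each scored separately.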
Calculate image score (108): the image information retrieved in step 106 is sent to the image classification unit 22 for image classification, for example as pornography, violence, or drugs. The arithmetic unit 8 then performs a weighted calculation to obtain the score of the image information. A preset threshold can be set within a range; a calculated score greater than the threshold indicates controlled information, and otherwise the information is not controlled. The image score can further be combined with the text score to produce a combined image-and-text score, which serves as the weight for classification according to the information characteristics of the website or webpage.
Calculate text score (109): the text content retrieved in step 107 is sent to the text classification unit 21 for text classification, for example as pornography, violence, or drugs. The arithmetic unit 8 then performs a weighted calculation to obtain the score of the text content. A preset threshold can be set; a score greater than the threshold indicates controlled information, and otherwise the information is not controlled. The text score can further be combined with the image score to produce a combined image-and-text score, which serves as the weight for classification according to the information characteristics of the website or webpage.
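The combination of the two scores can be sketched as a weighted sum compared against a threshold. The patent does not give a formula, so the linear combination, the equal default weights, and the threshold of 0.6 below are assumptions chosen only for illustration.

```python
def combined_score(text_score: float, image_score: float,
                   text_weight: float = 0.5, image_weight: float = 0.5) -> float:
    """Weighted sum of the text and image scores (assumed formula)."""
    return text_weight * text_score + image_weight * image_score


def is_controlled(score: float, threshold: float = 0.6) -> bool:
    """A score strictly greater than the preset threshold marks controlled information."""
    return score > threshold


# A page with mild text but strongly flagged images.
score = combined_score(text_score=0.25, image_score=0.75)
print(score, is_controlled(score))
```

Weighting the two signals lets a page be listed on image evidence even when its text is innocuous, which plain keyword filtering cannot do.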
Record weighted image and text scores (110): the results of steps 108 and 109 are saved in the operation record database 24, the HTML record database 23 stores the original files of the webpage information, and the feature values of the controlled-information pages are stored in the database. The feature values stored in the database are continually sent back to the automatic search engine to confirm whether each controlled-information page still meets the control conditions and is still operating normally.
Having described the preferred embodiment of the present invention in detail, those skilled in the art will clearly understand that various changes and modifications can be made without departing from the scope and spirit of the following claims, and that the present invention is not limited to the implementation of the embodiment illustrated in the specification.
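The continuing confirmation described in step 110 can be sketched as a periodic re-check of the stored entries. The patent does not say what happens to a page that no longer qualifies, so removing it from the list is an assumption, as are the function name and the injected `rescore` callable.

```python
def recheck(database: dict, rescore, threshold: float = 0.6) -> list:
    """Re-score every stored URL and drop entries that no longer qualify.

    `database` maps URL -> last score. `rescore` is an injected callable
    that returns a fresh score, or None when the page is unreachable.
    Removal of stale entries is an assumed policy, not stated in the source.
    """
    removed = []
    for url in list(database):
        score = rescore(url)
        if score is None or score <= threshold:
            del database[url]
            removed.append(url)
        else:
            database[url] = score
    return removed


listed = {"http://a.example/": 0.9, "http://b.example/": 0.8, "http://c.example/": 0.7}
fresh = {"http://a.example/": 0.95, "http://b.example/": 0.1}  # c is unreachable
removed = recheck(listed, fresh.get)
print(sorted(removed), listed)
```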
Claims (19)
1. A method of filtering controlled information using picture and text recognition, comprising the following steps:
Retrieving a URL to be determined and sending it to a server;
Comparing the URL with the URLs in the database of the server, the comparison yielding a block or allow comparison result;
Returning the comparison result;
Acting on the returned comparison result: when the comparison result is block, blocking the link to the URL; when the comparison result is allow, linking to the URL;
It is characterized in that the server is located at a remote end; and
The database content is built using an information search technique in which an automatic search engine automatically fetches website data, and an information recognition technique that applies text and images jointly as the basis for website classification to the information found, thereby establishing controlled information that is updated in real time.
2. The method of filtering controlled information using picture and text recognition as claimed in claim 1, characterized in that the retrieval action retrieves a plurality of URLs.
3. The method of filtering controlled information using picture and text recognition as claimed in claim 1, characterized in that the controlled information is the webpages and/or websites to be managed.
4. The method of filtering controlled information using picture and text recognition as claimed in claim 1, characterized in that the controlled information is webpages and websites relating to pornography, gambling, violence, drugs, and/or firearms.
5. The method of filtering controlled information using picture and text recognition as claimed in claim 1, characterized in that the information search technique is a technique of automatically fetching webpage content together with an integrated search engine technique that searches using common keywords.
6. The method of filtering controlled information using picture and text recognition as claimed in claim 1, characterized in that the information recognition technique combines text and images jointly as the basis for website classification, using a Chinese document classification technique or other similar text classification technique and an image recognition technique or other similar image classification technique.
7. A system for filtering controlled information using picture and text recognition technology, comprising:
A client device, comprising:
One or more browsers, which can link to a website or webpage via the Internet;
A retrieval unit, for retrieving the URL of the website or webpage to which the client computer intends to link;
A transmission unit, which can transmit the retrieved URL;
A receiving unit, which can receive the comparison result;
An execution unit, which acts on the block or allow result returned by the server: when the comparison result is block, it blocks the link to the URL; when the comparison result is allow, it links to the URL; and
A remote server, comprising:
A database, containing the URLs of controlled network information;
An information search unit, for automatically searching for controlled URLs and their content;
An information recognition unit, which combines text and images and uses weights to classify website content;
A receiving unit, for receiving the retrieved URL;
An arithmetic unit, for checking whether the retrieved URL is in the database and obtaining a block or allow comparison result;
A transmission unit, for transmitting the comparison result to the client;
It is characterized in that the server is located at a remote end; and
The content of the database is built using an information search technique in which an automatic search engine fetches website data, and an information recognition technique that applies text and images jointly as the basis for website classification to the data found, thereby establishing controlled information that is updated in real time.
8. The system for filtering controlled information using picture and text recognition as claimed in claim 7, characterized in that the retrieval action retrieves a plurality of URLs.
9. The system for filtering controlled information using picture and text recognition as claimed in claim 7, characterized in that the controlled information is the webpages and/or websites to be managed.
10. The system for filtering controlled information using picture and text recognition as claimed in claim 9, characterized in that the webpages and websites to be managed are webpages and websites relating to pornography, gambling, violence, drugs, and/or firearms.
11. The system for filtering controlled information using picture and text recognition as claimed in claim 7, characterized in that the information search technique is a technique of automatically fetching webpage content together with an integrated search engine technique that searches using common keywords.
12. The system for filtering controlled information using picture and text recognition as claimed in claim 7, characterized in that the information recognition technique combines text and images jointly as the basis for website classification, using a Chinese document classification technique or other similar text classification technique and an image recognition technique or other similar image classification technique.
13. A system for filtering controlled information using picture and text recognition technology, comprising:
A client device, comprising:
One or more browsers, which can link to a website or webpage via the Internet;
A retrieval unit, for retrieving the URL of the website or webpage to which the client computer intends to link;
A transmission unit, which can transmit the retrieved URL;
A receiving unit, which can receive the comparison result;
An execution unit, which acts on the block or allow result returned by the server: when the comparison result is block, it blocks display of the website or webpage content; when the comparison result is allow, it allows the website or webpage content to be displayed; and
A remote server, comprising:
A database, which stores the classification information of URLs;
A database builder, which automatically searches for URLs and the content they link to, classifies the URLs by weights based on the text and images of the content, and updates the database content;
A receiving unit, for receiving the retrieved URL;
An arithmetic unit, for checking whether the retrieved URL is in the database and obtaining a block or allow comparison result;
A transmission unit, for transmitting the comparison result to the client;
It is characterized in that the server is located at a remote end; and
The database builder comprises an information search technique in which an automatic search engine fetches website data, and an information recognition technique that applies text and images jointly as the basis for website classification to the data found.
14. The system for filtering controlled information using picture and text recognition as claimed in claim 13, characterized in that the database builder comprises:
An information search unit, for automatically searching for the URLs of controlled information and the content those URLs link to; and
An information recognition unit, which classifies URLs by weights based on the text, the images, or a combination of both in the linked content.
15. The system for filtering controlled information using picture and text recognition as claimed in claim 13, characterized in that the retrieval action retrieves a plurality of URLs.
16. The system for filtering controlled information using picture and text recognition as claimed in claim 13, characterized in that the controlled information is the webpages and/or websites to be managed.
17. The system for filtering controlled information using picture and text recognition as claimed in claim 16, characterized in that the webpages and websites to be managed are webpages and websites relating to pornography, gambling, violence, drugs, and/or firearms.
18. The system for filtering controlled information using picture and text recognition as claimed in claim 13, characterized in that the information search technique is a technique of automatically fetching webpage content together with an integrated search engine technique that searches using common keywords.
19. The system for filtering controlled information using picture and text recognition as claimed in claim 13, characterized in that the information recognition technique combines text and images jointly as the basis for URL classification, using a Chinese document classification technique or other similar text classification technique and an image recognition technique or other similar image classification technique.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNA031101046A CN1536510A (en) | 2003-04-10 | 2003-04-10 | Method and system for filtering column tube information by utilizing picture and characters identification technique |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNA031101046A CN1536510A (en) | 2003-04-10 | 2003-04-10 | Method and system for filtering column tube information by utilizing picture and characters identification technique |
Publications (1)
Publication Number | Publication Date |
---|---|
CN1536510A | 2004-10-13 |
Family
ID=34319627
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNA031101046A Pending CN1536510A (en) | 2003-04-10 | 2003-04-10 | Method and system for filtering column tube information by utilizing picture and characters identification technique |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1536510A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1319331C (en) * | 2004-11-25 | 2007-05-30 | 刘文印 | Method and system for detecting and identifying counterfeit web page |
CN101504650B (en) * | 2009-01-15 | 2010-12-22 | 北京傲游天下科技有限公司 | Intelligent network rendering engine switching method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |