WO2004109588A1 - Methods and systems for training content filters and resolving uncertainty in content filtering operations - Google Patents
- Publication number
- WO2004109588A1 (PCT/US2004/017575)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- filters
- relationships
- filter
- uncertainty
- results
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/21—Monitoring or handling of messages
- H04L51/212—Monitoring or handling of messages using filtering or selective blocking
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/02—Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
- H04L63/0227—Filtering policies
- H04L63/0263—Rule management
Definitions
- the present invention relates to computer filters and, more particularly, to methods and systems for resolving non-classifiable information in filtering operations.
- a filter assists the user in efficiently processing and organizing large amounts of information.
- a filter is program code that examines information for certain qualifying criteria and classifies the information accordingly.
- a picture filter is a program used to detect and categorize faces (e.g., categories include happy facial expressions, sad facial expressions, etc.) in photographs.
- a problem with filters is that they sometimes cannot categorize certain information because they are not programmed to consider that particular information.
- the picture filter described above is trained to recognize and categorize happy facial expressions and sad facial expressions only. If a photograph of a frustrated facial expression is provided to the picture filter, the picture filter cannot classify the frustrated facial expression because the picture filter is trained to recognize happy and sad facial expressions only.
- the present invention fills these needs by providing methods and systems for resolving uncertainty resulting from content filtering operations. It should be appreciated that the present invention can be implemented in numerous ways, including as a process, an apparatus, a system, computer readable media, or a device. Several inventive embodiments of the present invention are described below.
- a method for resolving uncertainty resulting from content filtering operations is provided.
- data is first received and processed through a plurality of filters.
- Each of the plurality of filters is capable of producing results, the results including classification of the filtered data and identification of uncertainty in the classification.
- the results from each of the plurality of filters are processed and the processing of the results is configured to produce relationships between the plurality of filters.
- the produced relationships are applied back to any one of the plurality of filters that produced the results that included identification of uncertainty in the classification. The application of the produced relationships is used to resolve the identification of uncertainty.
- a computer readable medium having program instructions for resolving uncertainty resulting from content filtering operations is also provided.
- This computer readable medium provides program instructions for receiving results produced by a plurality of filters.
- the results include classification of filtered data and identification of uncertainty in the classification.
- the computer readable medium provides program instructions for establishing relationships between the plurality of filters and program instructions for applying the relationships. The application of the relationships enables the identification of uncertainty to be resolved.
- a system for resolving uncertainty resulting from content filtering operations includes a memory for storing a relationship processing engine and a central processing unit for executing the relationship processing engine stored in the memory.
- the relationship processing engine includes logic for receiving results produced by a plurality of filters, the results including classification of filtered data and identification of uncertainty in the classification; logic for establishing relationships between the plurality of filters; and logic for applying the relationships, the application of the relationships enabling the identification of uncertainty to be resolved.
- a system for resolving uncertainty resulting from content filtering operations includes a plurality of filtering means for processing data whereby each of the plurality of filtering means is capable of producing results.
- the results include classification of the filtered data and identification of uncertainty in the classification.
- the system additionally includes relationship processing means for processing the results from each of the plurality of filtering means. Additionally, the relationship processing means applies the produced relationships back to any one of the plurality of filtering means that produced the results that included identification of uncertainty in the classification.
- the processing of the results is configured to produce relationships between the plurality of filtering means and the application of the produced relationships is used to resolve the identification of uncertainty.
- Figure 1 is a simplified block diagram of a filter, in accordance with one embodiment of the present invention.
- Figure 2 is a simplified block diagram of a system for resolving the uncertainty resulting from content filtering operations, in accordance with one embodiment of the present invention.
- Figure 3 is a flowchart diagram of a high level overview of the method operations for resolving uncertainty resulting from content filtering operations, in accordance with one embodiment of the present invention.
- Figure 4 is a flowchart diagram of the detailed method operations for resolving uncertainty resulting from content filtering operations, in accordance with one embodiment of the present invention.
- FIG. 5 is a simplified diagram of an exemplary graphic user interface (GUI) that allows a user to manually establish relationships, in accordance with one embodiment of the present invention.
- Figure 6A is a simplified block diagram of an exemplary processing of results and production of relationships, in accordance with one embodiment of the present invention.
- Figure 6B is a flowchart diagram of an exemplary processing of results and application of the relationships produced in Figure 6A, in accordance with one embodiment of the present invention.
- Filters cannot classify certain data and the embodiments described herein provide methods and systems for resolving the uncertainty in the classification of data.
- the uncertainty in the classification is resolved by using relationships between the filters.
- a computer automatically produces the relationships between the filters.
- a user manually specifies to the computer the relationships between the filters.
- Figure 1 is a simplified block diagram of a filter, in accordance with one embodiment of the present invention.
- filter 102 is program code that examines data 104 for certain qualifying criteria and classifies the data accordingly.
- a spam email filter is a program used to detect unsolicited emails and to prevent the unsolicited emails from getting to a user's email inbox.
- the spam email filter looks for certain qualifying criteria on which the spam email filter bases its judgments. For instance, a simple version of the spam email filter is programmed to watch for particular words in a subject line of email messages and to exclude email with the particular words from the user's email inbox. More sophisticated spam email filters, such as Bayesian filters and other heuristic filters, attempt to identify spam email through suspicious word patterns or word frequency.
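The simple subject-line keyword filter described above can be sketched as follows. This is an illustrative sketch only; the keyword set and message representation are assumptions for demonstration and are not part of the disclosure:

```python
# Illustrative sketch of a simple subject-line keyword spam filter.
# The keyword set below is an assumption for demonstration only.
SPAM_KEYWORDS = {"purchase", "winner", "free"}

def classify_subject(subject):
    """Return 'spam' if the subject line contains a watched keyword,
    otherwise 'non-spam'."""
    words = {w.strip(".,!?").lower() for w in subject.split()}
    return "spam" if words & SPAM_KEYWORDS else "non-spam"
```

More sophisticated filters, such as the Bayesian filters mentioned above, would replace the fixed keyword test with probabilities learned from word frequencies.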
- filters include email filters that identify spam, personal mail, or classify mail by subject; filters that find and identify faces or specific objects (e.g., cars, houses, etc.) in pictures; filters that listen to music and identify the title of the song, group, etc.; filters that identify a type of web page such as a blog, a news page, a weather page, a financial page, a magazine page, etc.; filters that identify the person speaking in an audio recording; filters that identify spelling errors in text documents; and filters that identify the subjects/topics of a text document.
- filter 102 processes both data 104 and filter rules 106 to produce results 112. In other words, filter 102 examines data 104 for certain qualifying criteria and classifies the data accordingly.
- Data 104 are numerical or any other information represented in a form suitable for processing by a computer.
- Exemplary data 104 include email messages, program files, picture files, sounds files, movie files, web pages, word processing texts, etc. Additionally, data 104 may be received from any suitable source.
- Exemplary sources include networks (e.g., the Internet, local-area networks (LAN), wide-area networks (WAN), etc.), programs (e.g., video games, word processors, drawing programs, etc.), databases, etc.
- Filter rules 106 are instructions that specify procedures to process data 104 and specify what data are allowed or rejected.
- a filter rule for the spam email filter discussed above specifies the examination of particular words in the subject lines of email messages and the exclusion of emails with the particular words in their subject lines.
- Results 112 include classifiable data 108 and data with uncertain classification 110.
- Classifiable data 108 are data particularly considered by filter rules 106.
- an exemplary filter rule for the spam email filter discussed above specifies the inclusion of emails with the particular word "dear" in the subject lines. Such emails are classified as non-spam. However, emails with the particular word "purchase" in the subject lines are classified as spam and excluded. Since emails with the particular words "dear" and "purchase" in the subject lines are particularly considered by filter rules 106, all emails with these words in the subject lines are classifiable data 108.
- data with uncertain classification 110 are data not particularly considered by filter rules 106.
- data with uncertain classification 110 are non-classifiable data.
- the above-discussed exemplary filter rule considers the particular words "dear" and "purchase" in the subject lines. Email messages without the particular words "dear" and "purchase" in the subject lines cannot be classified by filter 102 as spam or non-spam. Therefore, email messages without these words in the subject lines are data with uncertain classification 110.
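The exemplary filter rule above can be sketched as a three-way classifier that distinguishes classifiable data 108 from data with uncertain classification 110 (the function name and message representation are illustrative assumptions):

```python
# Sketch of the exemplary filter rule: "dear" marks non-spam, "purchase"
# marks spam, and any other subject line is data with uncertain
# classification (110). Names and rule order are illustrative.
def filter_email(subject):
    words = {w.strip(".,!?").lower() for w in subject.split()}
    if "dear" in words:
        return "non-spam"    # classifiable data 108
    if "purchase" in words:
        return "spam"        # classifiable data 108
    return "uncertain"       # data with uncertain classification 110
```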
- Figure 2 is a simplified block diagram of a system for resolving the uncertainty resulting from content filtering operations, in accordance with one embodiment of the present invention.
- the system includes spam email filter 202, picture filter 270, music filter 272, personal email filter 274, and relationship processing engine 260.
- Filters 202, 270, 272, and 274 process both data 104 and filter rules 210, 280, 282, and 284 to produce results 250, 252, 254, and 256.
- results 250, 252, 254, and 256 are provided 205 to relationship processing engine 260.
- results 250, 252, 254, and 256 are stored in a database such that the results may be searchable.
- relationship processor 220 included in relationship processing engine 260 processes results 250, 252, 254, and 256 from filters 202, 270, 272, and 274 to produce relationships between the filters.
- although Figure 2 shows four filters 202, 270, 272, and 274, relationship processor 220 can process any number of filters.
- the produced relationships are relationship rules 222 between results 250, 252, 254, and 256.
- relationship rules 222 are manually established by a user.
- relationship rules 222 are automatically determined by relationship processing engine 260.
- relationship processing engine 260 records a sequence of user actions made when interfacing with filters 202, 270, 272, and 274.
- Exemplary user actions include deleting certain emails, consistently rejecting certain pictures, moving certain messages to one category, consistently classifying certain emails, etc.
- Such user actions may form relationship patterns and relationship processor 220 automatically recognizes these relationship patterns between filters 202, 270, 272, and 274 to enable relationships between the filters to be established automatically.
- relationship processor 220 formulates and stores the relationships as relationship rules 222. Relationship processor 220 then automatically resolves the identity of data with uncertain classification by applying the relationships. Thereafter, relationship processing engine 260 applies the resolved identity in the classification back 206 to any one of filters 202, 270, 272, and 274 that produced results 250, 252, 254, and 256 that included the data with uncertain classification.
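The feedback loop performed by the relationship processing engine can be sketched as follows. The result dictionary, rule format, and function name are hypothetical conventions chosen for illustration:

```python
# Hypothetical sketch of the relationship processing engine: it gathers
# results from several filters, applies relationship rules of the form
# "if filter A classified the item as X, then filter B's class is Y",
# and feeds resolved classifications back to the uncertain filter.
def resolve_uncertainty(results, relationship_rules):
    """results: {filter_name: class or 'uncertain'}; returns an updated
    copy with uncertain entries resolved where a rule applies."""
    resolved = dict(results)
    for (src_filter, src_class), (dst_filter, dst_class) in relationship_rules:
        if (resolved.get(src_filter) == src_class
                and resolved.get(dst_filter) == "uncertain"):
            resolved[dst_filter] = dst_class  # apply relationship back
    return resolved

# Example rule: personal email is not spam email.
rules = [(("personal", "personal"), ("spam", "non-spam"))]
```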
- Figure 3 is a flowchart diagram of a high level overview of the method operations for resolving uncertainty resulting from content filtering operations, in accordance with one embodiment of the present invention. Starting in operation 310, filters, which may be designed to classify data in different ways, receive data and, in operation 312, process the data to produce results. The results include classification of the filtered data and identification of filtered data with uncertain classification.
- a relationship processing engine processes the results produced by each of the filters to produce relationships between the filters in operation 316.
- the produced relationships are then applied back to any one of the filters that produced the results that included the identification of uncertainty in the classification.
- the application of the produced relationships is used to resolve the identification of uncertainty.
- FIG. 4 is a flowchart diagram of the detailed method operations for resolving uncertainty resulting from content filtering operations, in accordance with one embodiment of the present invention.
- filters process both data and filter rules to produce results.
- Results include classifiable data and data with uncertain classification.
- the filtered data with uncertain classification are then read from the results. Any existing relationships between the filters are first checked in operation 414. If there are relevant, existing relationships between the filters, the relationship rules are read in operation 416 and applied in operation 418 to resolve the identification of the uncertainty.
- the relationships are automatically established in operation 424.
- the relationships may be automatically produced by analyzing user actions. Thereafter, in operation 426, a user is asked to confirm the automatically produced relationships. If the user confirms that the automatically produced relationships are correct, then the relationship rules are applied in operation 418 to resolve the identification of the uncertainty. However, if the user specifies that the automatically produced relationships are incorrect, then the user is given an option to manually establish the relationships in operation 428. After the user manually establishes the relationships, the relationships are formulated into relationship rules. The relationship rules are then applied in operation 418 to resolve the identification of uncertainty.
- the resolved identity in the classification is applied back to the filters in operation 422.
- a check is then conducted in operation 420 to determine whether any data with uncertain classification remain. If there are additional data with uncertain classification, then the operations described above are again repeated starting in operation 412. Else, the method operation ends.
- FIG. 5 is a simplified diagram of an exemplary graphic user interface (GUI) that allows a user to manually establish relationships, in accordance with one embodiment of the present invention.
- a user may be asked to confirm the automatically produced relationships.
- the user browses web page 802 at web address "www.wired.com.”
- Web page 802 is processed through a variety of filters and a relationship processing engine processes the results, produces relationships between the filters, and applies the produced relationships to resolve the identification of the web page's category.
- the relationship processing engine automatically determines that web page 802 belongs to news, computers, and technology categories and consequently, displays a popup menu region 804 listing the categories of the web page.
- pop-up menu region 804 also allows the user to manually establish the relationships between the filters.
- the user may manually establish the relationships by checking or unchecking each box 806 corresponding to each category. The user simply checks box 806 next to the corresponding category to indicate that web page 802 belongs to the referenced category. Alternatively, the user may uncheck the category to indicate that web page 802 does not belong to the referenced category.
- pop-up menu region 804 allows the user to confirm that the automatically established relationships are correct and, if not correct, then manually establish the relationships.
- any number of suitable layouts can be designed for the regions illustrated above, as Figure 5 does not represent all possible layout options.
- the displayable appearance of the regions can be defined by any suitable geometric shape (e.g., rectangle, square, circle, triangle, etc.), alphanumeric character (e.g., A, v, t, Q, 1, 9, 10, etc.), symbol (e.g., $, *, @, etc.), shading, pattern (e.g., solid, hatch, stripes, dots, etc.), and color.
- pop-up menu region 804 in Figure 5 may be omitted or dynamically assigned.
- the regions can be fixed or customizable.
- the computing devices may have a fixed set of layouts, may utilize a defined protocol or language to define a layout, or an external structure that defines a layout can be provided to a computing device.
- FIG. 6A is a simplified block diagram of an exemplary processing of results and production of relationships, in accordance with one embodiment of the present invention.
- the exemplary system includes spam email filter 202, personal email filter 274, relationship processing engine 260, and monitor 502.
- Spam email filter 202 and personal email filter 274 process Email A 506 and filter rules 210 and 284 to produce results 250 and 256.
- Email A 506 is a personal email and, as a result, personal email filter 274 correctly classifies Email A 506 as personal email.
- spam email filter 202 is uncertain in the classification of Email A 506 because personal email is not considered by filter rule 210 of the spam email filter.
- spam email filter 202 cannot classify Email A 506, and results 250 produced by the spam email filter identify Email A as having uncertain classification.
- Relationship processing engine 260 then processes results 250 and 256 to establish one or more relationships between spam email filter 202 and personal email filter 274.
- a user manually establishes the relationships.
- relationship processing engine 260 asks the user whether personal email is equal to spam email. The user manually specifies that personal email is not equal to spam email.
- relationship processor 220 processes the user's input and results 250 and 256 to produce relationship rule 504 that personal email is not equal to spam email.
- FIG. 6B is a flowchart diagram of an exemplary processing of results and application of the relationships produced in Figure 6A, in accordance with one embodiment of the present invention.
- both spam email filter and personal email filter discussed above in Figure 6A receive an Email B, in operation 604, and process the Email B to produce results.
- spam email filter is uncertain as to the classification of Email B and, as such, a relationship processing engine further processes the results from spam email filter and personal email filter to resolve the classification of Email B.
- the relationship processing engine determines that an existing relationship between spam email filter and personal email filter exists, which was previously established in the discussion of Figure 6A, and retrieves the existing relationship in operation 606. According to the previously established relationship rule, personal email is not spam email.
- a check is conducted in operation 608 to determine whether Email B is classified as personal email.
- the particular relationship rule does not consider non-personal emails.
- the relationship processing engine in operation 614 prompts the user to manually establish any additional relationships between spam email filter and personal email filter to resolve the classification of Email B, in accordance with one embodiment of the present invention.
- the relationship processing engine may produce the relationships automatically. If no additional relationships are established, then the classification of Email B with respect to the spam email filter remains unresolved.
- Email B is classified as personal email
- the relationship rule is applied to Email B in operation 610.
- Email B is classified as non-spam email because, as discussed above, the previously established relationship rule specifies that personal email is not spam email.
- the resolved classification of Email B is then applied back to the spam filter in operation 616.
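The decision flow of Figure 6B can be expressed as a short sketch. The filter result values below are hypothetical stand-ins for the outputs of spam email filter 202 and personal email filter 274:

```python
# Hypothetical sketch of the Figure 6B flow: when the spam filter is
# uncertain, the previously established rule "personal email is not
# spam email" (504) resolves the classification if the personal filter
# classified the message as personal; otherwise it remains unresolved
# and the user may be prompted for additional relationships.
def resolve_email(spam_result, personal_result):
    if spam_result != "uncertain":
        return spam_result        # nothing to resolve
    if personal_result == "personal":
        return "non-spam"         # rule applied (operations 608-612)
    return "unresolved"           # prompt the user (operation 614)
```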
- the above described invention provides methods and systems for training filters and resolving non-classifiable information in filtering operations.
- the uncertainties in classification are resolved by looking at additional relationships between filters.
- utilizing relationships between the filters allows the filters to interact with one another.
- a system includes email filters to identify mail from family members and face recognition filters to recognize family members' faces in pictures.
- the relationships between filters allow the grouping of pictures of family members with the family members' email. For instance, pictures of family members taken at various gatherings are scanned into a computer. Some of these pictures are naturally group photos containing most of, or the whole, family, and the computer would recognize that certain pictures always contain the same set of faces.
- the computer may then show a user these pictures and ask if the user wants to put these pictures in a new category.
- the computer looks at other content (e.g., email, videos, audio, etc.) with the assistance of filters and automatically adds any of these contents that contain the family members to the new "whole family" category.
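The proposal of a new "whole family" category described above could be approximated by counting recurring face sets across scanned pictures. The face labels, data representation, and threshold below are illustrative assumptions:

```python
# Illustrative sketch: find sets of faces that recur across scanned
# pictures, as candidates for a proposed new category such as
# "whole family". Labels and the min_count threshold are assumptions.
from collections import Counter

def recurring_face_sets(pictures, min_count=2):
    """pictures: list of sets of face labels; returns the face sets
    seen at least min_count times."""
    counts = Counter(frozenset(p) for p in pictures)
    return [set(faces) for faces, n in counts.items() if n >= min_count]
```

Once such a category is confirmed by the user, other filtered content (email, videos, audio) containing the same family members could be added to it automatically.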
- the classified categories may be sent to an Internet search engine to find related content.
- the invention also relates to a device or an apparatus for performing these operations.
- the apparatus may be specially constructed for the required purposes, or it may be a general purpose computer selectively activated or configured by a computer program stored in the computer.
- various general purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
- the invention can also be embodied as computer readable code on a computer readable medium.
- the computer readable medium is any data storage device that can store data which can be thereafter read by a computer system.
- the computer readable medium also includes an electromagnetic carrier wave in which the computer code is embodied. Examples of the computer readable medium include hard drives, network attached storage (NAS), read-only memory, random-access memory, CD-ROMs, CD-Rs, CD-RWs, magnetic tapes, and other optical and non-optical data storage devices.
- the computer readable medium can also be distributed over a network coupled computer system so that the computer readable code is stored and executed in a distributed fashion.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP04754228A EP1649407A1 (en) | 2003-06-04 | 2004-06-02 | Methods and systems for training content filters and resolving uncertainty in content filtering operations |
JP2006515150A JP2007537497A (en) | 2003-06-04 | 2004-06-02 | Method and system for training content filters and resolving uncertainty in content filtering operations |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US47608403P | 2003-06-04 | 2003-06-04 | |
US60/476,084 | 2003-06-04 | ||
US10/856,216 | 2004-05-27 | ||
US10/856,216 US20050015452A1 (en) | 2003-06-04 | 2004-05-27 | Methods and systems for training content filters and resolving uncertainty in content filtering operations |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2004109588A1 (en) | 2004-12-16 |
Family
ID=33514067
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2004/017575 WO2004109588A1 (en) | 2003-06-04 | 2004-06-02 | Methods and systems for training content filters and resolving uncertainty in content filtering operations |
Country Status (6)
Country | Link |
---|---|
US (1) | US20050015452A1 (en) |
EP (1) | EP1649407A1 (en) |
JP (1) | JP2007537497A (en) |
KR (1) | KR20060017534A (en) |
TW (1) | TW200513873A (en) |
WO (1) | WO2004109588A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009005964A2 (en) * | 2007-06-29 | 2009-01-08 | Microsoft Corporation | Content-based tagging of rss feeds and e-mail |
CN100456755C (en) * | 2006-08-31 | 2009-01-28 | 华为技术有限公司 | Method and device for filtering message |
Families Citing this family (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8166406B1 (en) | 2001-12-04 | 2012-04-24 | Microsoft Corporation | Internet privacy user interface |
US7058652B2 (en) * | 2002-08-15 | 2006-06-06 | General Electric Capital Corporation | Method and system for event phrase identification |
US8214437B1 (en) * | 2003-07-21 | 2012-07-03 | Aol Inc. | Online adaptive filtering of messages |
US7693945B1 (en) * | 2004-06-30 | 2010-04-06 | Google Inc. | System for reclassification of electronic messages in a spam filtering system |
US8495144B1 (en) * | 2004-10-06 | 2013-07-23 | Trend Micro Incorporated | Techniques for identifying spam e-mail |
US20070011665A1 (en) * | 2005-06-21 | 2007-01-11 | Microsoft Corporation | Content syndication platform |
US8074272B2 (en) | 2005-07-07 | 2011-12-06 | Microsoft Corporation | Browser security notification |
US7865830B2 (en) * | 2005-07-12 | 2011-01-04 | Microsoft Corporation | Feed and email content |
US7831547B2 (en) | 2005-07-12 | 2010-11-09 | Microsoft Corporation | Searching and browsing URLs and URL history |
US7813482B2 (en) * | 2005-12-12 | 2010-10-12 | International Business Machines Corporation | Internet telephone voice mail management |
US7979803B2 (en) | 2006-03-06 | 2011-07-12 | Microsoft Corporation | RSS hostable control |
US8706820B2 (en) * | 2008-02-08 | 2014-04-22 | Microsoft Corporation | Rules extensibility engine |
US8700913B1 (en) | 2011-09-23 | 2014-04-15 | Trend Micro Incorporated | Detection of fake antivirus in computers |
US9179341B2 (en) | 2013-03-15 | 2015-11-03 | Sony Computer Entertainment Inc. | Method and system for simplifying WiFi setup for best performance |
US20150012597A1 (en) * | 2013-07-03 | 2015-01-08 | International Business Machines Corporation | Retroactive management of messages |
US10824666B2 (en) * | 2013-10-10 | 2020-11-03 | Aura Home, Inc. | Automated routing and display of community photographs in digital picture frames |
US11669562B2 (en) | 2013-10-10 | 2023-06-06 | Aura Home, Inc. | Method of clustering photos for digital picture frames with split screen display |
US10778618B2 (en) * | 2014-01-09 | 2020-09-15 | Oath Inc. | Method and system for classifying man vs. machine generated e-mail |
US20160314184A1 (en) * | 2015-04-27 | 2016-10-27 | Google Inc. | Classifying documents by cluster |
US10778633B2 (en) | 2016-09-23 | 2020-09-15 | Apple Inc. | Differential privacy for message text content mining |
US10565229B2 (en) | 2018-05-24 | 2020-02-18 | People.ai, Inc. | Systems and methods for matching electronic activities directly to record objects of systems of record |
US11924297B2 (en) | 2018-05-24 | 2024-03-05 | People.ai, Inc. | Systems and methods for generating a filtered data set |
US11463441B2 (en) | 2018-05-24 | 2022-10-04 | People.ai, Inc. | Systems and methods for managing the generation or deletion of record objects based on electronic activities and communication policies |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6161130A (en) * | 1998-06-23 | 2000-12-12 | Microsoft Corporation | Technique which utilizes a probabilistic classifier to detect "junk" e-mail by automatically updating a training and re-training the classifier based on the updated training set |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6199102B1 (en) * | 1997-08-26 | 2001-03-06 | Christopher Alan Cobb | Method and system for filtering electronic messages |
US6393465B2 (en) * | 1997-11-25 | 2002-05-21 | Nixmail Corporation | Junk electronic mail detector and eliminator |
US6023723A (en) * | 1997-12-22 | 2000-02-08 | Accepted Marketing, Inc. | Method and system for filtering unwanted junk e-mail utilizing a plurality of filtering mechanisms |
US20040019651A1 (en) * | 2002-07-29 | 2004-01-29 | Andaker Kristian L. M. | Categorizing electronic messages based on collaborative feedback |
US20040083270A1 (en) * | 2002-10-23 | 2004-04-29 | David Heckerman | Method and system for identifying junk e-mail |
US7320020B2 (en) * | 2003-04-17 | 2008-01-15 | The Go Daddy Group, Inc. | Mail server probability spam filter |
- 2004
- 2004-05-27 US US10/856,216 patent/US20050015452A1/en not_active Abandoned
- 2004-06-02 WO PCT/US2004/017575 patent/WO2004109588A1/en active Application Filing
- 2004-06-02 EP EP04754228A patent/EP1649407A1/en not_active Withdrawn
- 2004-06-02 JP JP2006515150A patent/JP2007537497A/en active Pending
- 2004-06-02 TW TW093115823A patent/TW200513873A/en unknown
- 2004-06-02 KR KR1020057023296A patent/KR20060017534A/en not_active Application Discontinuation
Non-Patent Citations (2)
Title |
---|
COHEN W W: "Learning Rules that classify E-mail", MACHINE LEARNING IN INFORMATION ACCESS, PAPERS FROM THE AAAI SYMPOSIUM, TECHNICAL REPORT, XX, XX, 1 May 1996 (1996-05-01), pages 1 - 8,INTERNET, XP002081149 * |
SVETLANA KIRITCHENKO, STAN MATWIN: "Email Classification with Co-Training", IBM CENTRE FOR ADVANCED STUDIES CONFERENCE, PROCEEDINGS OF THE 2001 CONFERENCE OF THE CENTRE FOR ADVANCED STUDIES ON COLLABORATIVE RESEARCH, 5 November 2001 (2001-11-05), IBM PRESS, TORONTO, ONTARIO, CANADA, pages 1 - 10, XP002299408, Retrieved from the Internet <URL:http://portal.acm.org/citation.cfm?id=782104> [retrieved on 20041004] * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100456755C (en) * | 2006-08-31 | 2009-01-28 | 华为技术有限公司 | Method and device for filtering message |
WO2009005964A2 (en) * | 2007-06-29 | 2009-01-08 | Microsoft Corporation | Content-based tagging of rss feeds and e-mail |
WO2009005964A3 (en) * | 2007-06-29 | 2009-03-12 | Microsoft Corp | Content-based tagging of rss feeds and e-mail |
US8239460B2 (en) | 2007-06-29 | 2012-08-07 | Microsoft Corporation | Content-based tagging of RSS feeds and E-mail |
Also Published As
Publication number | Publication date |
---|---|
EP1649407A1 (en) | 2006-04-26 |
JP2007537497A (en) | 2007-12-20 |
US20050015452A1 (en) | 2005-01-20 |
TW200513873A (en) | 2005-04-16 |
KR20060017534A (en) | 2006-02-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20050015452A1 (en) | 2005-01-20 | Methods and systems for training content filters and resolving uncertainty in content filtering operations |
US10445351B2 (en) | Customer support solution recommendation system | |
US7765212B2 (en) | Automatic organization of documents through email clustering | |
US7899769B2 (en) | Method for identifying emerging issues from textual customer feedback | |
CN102710537B (en) | Visual style for the trust classification of message | |
Kestemont et al. | Cross-genre authorship verification using unmasking | |
US8825672B1 (en) | System and method for determining originality of data content | |
US20180211260A1 (en) | Model-based routing and prioritization of customer support tickets | |
CN105095288B (en) | Data analysis method and data analysis device | |
US20060282442A1 (en) | Method of learning associations between documents and data sets | |
Kim et al. | Deep semantic frame-based deceptive opinion spam analysis | |
US9697246B1 (en) | Themes surfacing for communication data analysis | |
US11416907B2 (en) | Unbiased search and user feedback analytics | |
US20230038793A1 (en) | Automatic document classification | |
JP5692074B2 (en) | Information classification apparatus, information classification method, and program | |
US20100169318A1 (en) | Contextual representations from data streams | |
EP3928221A1 (en) | System and method for text categorization and sentiment analysis | |
CN110990587A (en) | Enterprise relation discovery method and system based on topic model | |
CN111191046A (en) | Method, device, computer storage medium and terminal for realizing information search | |
US20160034509A1 (en) | 3d analytics | |
Wang et al. | Opinion Analysis and Organization of Mobile Application User Reviews. | |
Sonbhadra et al. | Email classification via intention-based segmentation | |
CN111221978A (en) | Method and device for constructing knowledge graph, computer storage medium and terminal | |
Preeti | Review on Text Mining: Techniques, Applications and Issues | |
Ojo et al. | SMS Spam Detection and Classification to Combat Abuse in Telephone Networks Using Natural Language Processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | WWE | Wipo information: entry into national phase | Ref document number: 200480022156.4; Country of ref document: CN |
| | AK | Designated states | Kind code of ref document: A1; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
| | AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| | DPEN | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed from 20040101) | |
| | WWE | Wipo information: entry into national phase | Ref document number: 2006515150; Country of ref document: JP |
| | WWE | Wipo information: entry into national phase | Ref document number: 1020057023296; Country of ref document: KR |
| | WWE | Wipo information: entry into national phase | Ref document number: 2004754228; Country of ref document: EP |
| | WWP | Wipo information: published in national office | Ref document number: 1020057023296; Country of ref document: KR |
| | WWP | Wipo information: published in national office | Ref document number: 2004754228; Country of ref document: EP |