EP2084620A1 - Document processor and associated method - Google Patents
Document processor and associated method
- Publication number
- EP2084620A1 (application EP07718688A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- text
- author
- analysis
- trait
- document
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q10/107—Computer-aided management of electronic mailing [e-mailing]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/131—Fragmentation of text files, e.g. creating reusable text-blocks; Linking to fragments, e.g. using XInclude; Namespaces
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
Definitions
- the present invention relates to a method and apparatus for processing documents.
- Embodiments of the present invention find application, though not exclusively, in the field of computational text processing, which is also known in some contexts as natural language processing, human language technology or computational linguistics.
- the outputs of some preferred embodiments of the invention may be used in a wide range of computing tasks such as automatic email categorization techniques, sentiment analysis, author attribution, and the like.
- a computer implemented method of processing a digitally encoded document having text composed by an author including the steps of: using a processor to analyse segmentation of the text and storing results of said segmentation analysis in a digitally accessible format; using a processor to analyse punctuation of the text and storing results of said punctuation analysis in a digitally accessible format; using a processor to linguistically analyse the text and storing results of said linguistic analysis in a digitally accessible format; and predicting an author trait using a machine learning system that is adapted to receive the results of said linguistic analysis, said segmentation analysis and said punctuation analysis as input, said machine learning system having been trained to process said input so as to output at least one predicted author trait.
- the linguistic analysis includes identification of predefined words and phrases in the text, and the words and phrases may include any one or more of the following types: people's names, locations, dates, times, organizations, currency, uniform resource locators (URLs), email addresses, addresses, organizational descriptors, phone numbers, typical greetings and/or typical farewells.
- a preferred embodiment makes use of a database of words and phrases of these types.
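As a rough illustration of identifying predefined words and phrases, the sketch below tags URLs, email addresses, greetings and farewells. The lexicon contents, the regular expressions and the function name `annotate` are assumptions made for this example only; the source does not describe the contents of its database or its matching rules.

```python
import re

# Hypothetical, tiny stand-ins for the database of predefined words and phrases.
GREETINGS = ("hi", "hello", "dear")
FAREWELLS = ("regards", "cheers", "best wishes")

# Deliberately simple patterns; real URLs and email addresses are more varied.
URL_RE = re.compile(r"https?://\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def annotate(text):
    """Return (type, matched_text) pairs for the predefined types found in text."""
    annotations = [("url", m.group()) for m in URL_RE.finditer(text)]
    annotations += [("email", m.group()) for m in EMAIL_RE.finditer(text)]
    lowered = text.lower()
    for label, phrases in (("greeting", GREETINGS), ("farewell", FAREWELLS)):
        for phrase in phrases:
            # Whole-word match so that "hi" is not found inside "nothing".
            if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
                annotations.append((label, phrase))
    return annotations
```

A fuller system would presumably also cover the other listed types (names, locations, dates, phone numbers and so on), which generally need larger lexicons than a few patterns can provide.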
- the segmentation analysis includes an analysis of the paragraph and sentence segmentation used in the text.
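A minimal sketch of paragraph and sentence segmentation follows. It assumes that paragraphs are separated by blank lines and that a sentence ends at a full stop, exclamation mark or question mark followed by whitespace; neither rule comes from the source, and the function name `segment` is hypothetical.

```python
import re


def segment(text):
    """Split text into paragraphs, and each paragraph into sentences."""
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Assumption: a sentence ends at ., ! or ? followed by whitespace.
    return [
        [s for s in re.split(r"(?<=[.!?])\s+", p) if s]
        for p in paragraphs
    ]
```

The nested list that is returned gives paragraph and sentence counts directly, which is the kind of result that could then be stored alongside the document.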
- results of said linguistic analysis, said segmentation analysis and said punctuation analysis are represented by one or more data structures associated with the document.
- data structures are feature vectors.
- the document is, or includes, any one of: an email; text sourced from an email; data sourced from a digital source; text sourced from an online newsgroup discussion; text sourced from a multiuser online chat session; a digitized facsimile; an SMS message; text sourced from an instant messaging communication session; a scanned document; text sourced by means of optical character recognition; text sourced from a file attached to an email; text sourced from a digital file; a word processor created file; a text file; or text sourced from a web site.
- the analysed email document 2 is saved into the memory of the computer 51 in a digitally accessible format in an annotation repository 8, which resides on the database server 54.
- many other means for recording the results of the segmentation, punctuation and linguistic analysis of the text in digitally accessible formats may be devised by those skilled in the art.
- text that has been analysed and which falls into a specific category is copied into a memory location or bulk storage location that is exclusively reserved for the relevant category of text.
- a feature is a descriptive statistic calculated from either or both of the raw text and the annotations.
- Some features express the ratio of frequencies of two different annotation types (e.g. the ratio of sentence annotations to paragraph annotations), or the presence or absence of an annotation type (e.g. signature). More particularly, the features can be generally divided into three groupings:
- Structural level features typically refer to the annotations made regarding structural features of the text, such as the presence of a signature block, reply status, attachments, headers, etc. Examples include information regarding:
  - indentation of paragraphs;
  - presence of farewells;
  - document length in characters, words, lines, sentences and/or paragraphs; and
  - mean paragraph length in lines, sentences and/or words.
- Function words from a predefined functionWord lexicon, such as "up" and "to"; Word ratio functionWord
- posVBU: words whose part-of-speech equals VBU; Word_ratio_posVBU_all
- posIN: words whose part-of-speech equals IN; Word_ratio_posIN_all
- posJJ: words whose part-of-speech equals JJ; Word_ratio_posJJ_all
- posRB: words whose part-of-speech equals RB; Word_ratio_posRB_all
- posPR: words whose part-of-speech equals PR; Word_ratio_posPR_all
- posNNP: words whose part-of-speech equals NNP; Word_ratio_posNNP_all
- posPOS: words whose part-of-speech equals POS; Word_ratio_posPOS_all
- posMD: words whose part-of-speech equals MD; Word_ratio_posMD_all
- caseUpper: words of character case type upper; Word_ratio_caseUpper_all
- caseLower: words of character case type lower; Word_ratio_caseLower_all
- caseCamel: words of character case type
- MultiwordPrepositions: all multiword prepositions (mwp); MultiwordPreposition_count_all
- htmlFontAttributeSize-1: HTML font tag with attribute size -1; HTML_ratio_htmlFontAttributeSize-1_htmlTag
- htmlFontAttributeColorNavy: HTML font tag with attribute color navy; HTML_ratio_htmlFontAttributeColorNavy_htmlTag
- htmlFontAttributeColorTeal: HTML font tag with attribute color teal; HTML_ratio_htmlFontAttributeColorTeal_htmlTag
- htmlFontAttributeColorSilver: HTML font tag with attribute color silver; HTML_ratio_htmlFontAttributeColorSilver_htmlTag
- htmlFontAttributeColorFuchsia: HTML font tag with attribute color fuchsia; HTML_ratio_htmlFontAttributeColorFuchsia_htmlTag
- htmlFontAttributeColorWhite: HTML font tag with attribute color white; HTML_ratio_htmlFontAttributeColorWhite_htmlTag
- htmlFontAttributeColorYellow: HTML font tag with attribute color yellow; HTML_ratio_htmlFontAttributeColorYellow_htmlTag
- htmlFontAttributeColorBlack: HTML font tag with attribute color black; HTML_ratio_htmlFontAttributeColorBlack_htmlTag
- htmlFontAttributeColorPurple: HTML font tag with attribute color purple; HTML_ratio_htmlFontAttributeColorPurple_htmlTag
- htmlFontAttributeColorOlive: HTML font tag with attribute color olive; HTML_ratio_htmlFontAttributeColorOlive_htmlTag
- htmlFontAttributeColorRed: HTML font tag with attribute color red; HTML_ratio_htmlFontAttributeColorRed_htmlTag
- htmlFontAttributeColorMaroon: HTML font tag with attribute color maroon; HTML_ratio_htmlFontAttributeColorMaroon_htmlTag
- htmlFontAttributeColorAqua: HTML font tag with attribute color aqua; HTML_ratio_htmlFontAttributeColorAqua_htmlTag
- htmlFontAttributeColorGray: HTML font tag with attribute color gray; HTML_ratio_htmlFontAttributeColorGray_htmlTag
- htmlFontAttributeColorBlue: HTML font tag with attribute color blue; HTML_ratio_htmlFontAttributeColorBlue_htmlTag
- htmlFontAttributeColorOther: HTML font tag with attribute color other; HTML_ratio_htmlFontAttributeColorOther_htmlTag
- htmlFontAttributeFaceArial: HTML font tag with attribute face arial; HTML_ratio_htmlFontAttributeFaceArial_htmlTag
- htmlFontAttributeFaceVerdana: HTML font tag with attribute face verdana; HTML_ratio_htmlFontAttributeFaceVerdana_htmlTag
- htmlFontAttributeFacePapyrus: HTML font tag with attribute face papyrus; HTML_ratio_htmlFontAttributeFacePapyrus_htmlTag
- the feature Char_count_punc33 is a numeric value equal to the number of times ASCII code 33 (i.e. !) is used in the document being analysed.
- Some of the other features mentioned in the above list are counts and/or ratios associated with user-defined lexicons of commonly used emoticons, farewells, function words, greetings and multiword prepositions.
- Each of the feature names is a variable that is set to a numeric value that is calculated for the respective feature. For example, for an email comprised of 488 characters, the variable char_count_all is set to a value of 488.
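The sketch below shows how a few such named features might be computed and then laid out in a fixed order. `char_count_all` and `Char_count_punc33` are taken from the text; the ratio feature name `sentence_paragraph_ratio`, the ordering tuple `FEATURE_ORDER` and both function names are hypothetical and used only for illustration.

```python
def extract_features(text, sentence_count, paragraph_count):
    """Compute a few illustrative features as name -> numeric value."""
    features = {
        # Total number of characters in the document.
        "char_count_all": len(text),
        # Occurrences of ASCII code 33, the exclamation mark.
        "Char_count_punc33": text.count(chr(33)),
    }
    # Hypothetical feature name: ratio of sentence to paragraph annotations.
    features["sentence_paragraph_ratio"] = (
        sentence_count / paragraph_count if paragraph_count else 0.0
    )
    return features


# Assumed fixed ordering so that every document yields a comparable vector.
FEATURE_ORDER = ("char_count_all", "Char_count_punc33", "sentence_paragraph_ratio")


def to_vector(features):
    """Arrange the named features as a list of floats in the assumed order."""
    return [float(features[name]) for name in FEATURE_ORDER]
```

Keeping the names in a dictionary and fixing the order only at the last step means new features can be added without disturbing the positions of existing ones, as long as the ordering tuple is extended at the end.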
- a feature vector is essentially a list of features that is structured in a predefined manner to function as input for the Support Vector Machines processing that occurs at step 12. With reference to the running example, the feature vector is as follows:
- the author traits that are predicted by the preferred embodiment include the following six demographic traits: age; gender; educational level; native language; country of origin and geographic region. Additionally, the preferred embodiment predicts the following psychometric traits: extraversion; agreeableness; conscientiousness; neuroticism; and openness. It will be appreciated that other preferred embodiments provide a greater or lesser number of predicted author traits as their output. In particular, some embodiments output at least three of the six demographic traits and at least three of the following six psychometric traits: extraversion; agreeableness; conscientiousness; neuroticism; psychoticism and openness.
- the output is initially in a coded format, which for the running example looks as follows:
- the first trait, which is represented by code "0", is the predicted identity, which has a value of "u23-938484".
- the second predicted trait, which is represented by code "1", relates to the author's predicted openness and has a value of "3.0" on a scale of 1 to 5.
- Other predicted traits and their associated codes are as follows:
- the coded output is processed by the computer 51 and displayed in a user-friendly display format on the screen 58 of the laptop computer 56.
- a random example of such a display format is shown in the screen grab illustrated in figure 4.
- Each of the predicted author traits is associated with a confidence level representing an estimate of the likelihood that the predicted trait is correct. For example, it can be seen from figure 4 that the predicted age of the author is 35 - 44, and this prediction is associated with a confidence level of 77%.
- the confidence levels for any given author trait are calculated by the machine learning system based upon the strength of correlation between the selected input features and the relevant predicted author trait.
- classifiers are created by the selection of sets of features for each author trait. For each experiment, ten-fold cross-validation is preferably used. Ten-fold cross-validation refers to the practice of using a 90-10 split of the data for experiments and repeating this process for each 90-10 split of the data. To guarantee a reasonably random split of the data, the splits are randomized but must be reproducible. To evaluate and test the classifiers, new documents are given as input and existing classifiers are selected to predict author traits. Another option is to keep 10% of the data for testing purposes while 90% is used for training and tuning. The training and tuning data is split into 90% for training and 10% for tuning. This process is repeated for each 90-10 split of the training/tuning data, in a 10-fold cross-validation. As previously mentioned, to guarantee a reasonably random split of the data in the 10-fold cross-validation process, the training/tuning splits are randomized, but the splits are reproducible.
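One way to obtain splits that are randomized yet reproducible is to shuffle the document indices with a fixed seed and then take every tenth index as the held-out fold. The sketch below does this; the function name `ten_fold_splits` and its `seed` parameter are assumptions for illustration, not details given in the source.

```python
import random


def ten_fold_splits(n_items, seed=0, folds=10):
    """Yield (train_indices, test_indices) for each fold of a seeded shuffle."""
    indices = list(range(n_items))
    # A fixed seed keeps the randomized split reproducible between runs.
    random.Random(seed).shuffle(indices)
    for k in range(folds):
        # Every folds-th index, starting at k, forms the held-out 10%.
        test = indices[k::folds]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test
```

Calling the generator twice with the same seed gives identical folds, and each item is held out exactly once across the ten folds. The same function could be applied again to the 90% portion to produce the training/tuning splits described above.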
- each classifier 11 or 17 is not only specific to a particular author trait, but is also specific to a particular document type, such as emails, extracts from chat room communications, etc.
- the present invention may be embodied in computer software in the form of executable code for instructing a computer to perform the inventive method.
- the software and its associated data are capable of being stored upon a computer-readable medium in the form of one or more compact disks (CDs).
- Alternative embodiments make use of other forms of digital storage media, such as Digital Versatile Discs (DVDs), hard drives, flash memory, Erasable Programmable Read-Only Memory (EPROM), and the like.
- the software and its associated data may be stored as one or more downloadable or remotely executable files that are accessible via a computer communications network such as the internet.
- the processing of documents undertaken by the preferred embodiment advantageously predicts a number of author traits. If properly configured and trained, preferred embodiments of the invention perform the predictions with a comparatively high degree of accuracy. Additionally, the preferred embodiment is not confined to analysis of the text of a small number of different authors, which compares favourably with at least some of the known prior art.
- the predictive processing is achieved with the use of a rich set of linguistic features, such as a database storing a plurality of named entities, common greetings and farewell phrases.
- the predictive processing also makes use of a comprehensive set of punctuation features. Additionally, the use of segmentation analysis provides further useful input to the predictive processing.
- the preferred embodiment is advantageously configurable to function with input documents from a variety of sources.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Entrepreneurship & Innovation (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Strategic Management (AREA)
- Computer Hardware Design (AREA)
- Data Mining & Analysis (AREA)
- Economics (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Machine Translation (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention relates to a computer-implemented method of processing a digitally encoded document having text composed by an author, in which a processor is used to analyse the segmentation, punctuation and linguistics of the text and to store the results in a digitally accessible format. Author traits are then predicted by a machine learning system based on the results of the segmentation, punctuation and linguistic analysis of the text.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2006906095A AU2006906095A0 (en) | 2006-11-03 | Email document parsing method and apparatus | |
AU2006906623A AU2006906623A0 (en) | 2006-11-28 | Document processor and associated method | |
PCT/AU2007/000441 WO2008052240A1 (fr) | 2006-11-03 | 2007-04-05 | Document processor and associated method |
Publications (2)
Publication Number | Publication Date |
---|---|
EP2084620A1 (fr) | 2009-08-05 |
EP2084620A4 (fr) | 2011-05-11 |
Family
ID=39343669
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP07718687A Withdrawn EP2092447A4 (fr) | Email document parsing method and apparatus | 2006-11-03 | 2007-04-05 |
EP07718688A Withdrawn EP2084620A4 (fr) | Document processor and associated method | 2006-11-03 | 2007-04-05 |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP07718687A Withdrawn EP2092447A4 (fr) | Email document parsing method and apparatus | 2006-11-03 | 2007-04-05 |
Country Status (4)
Country | Link |
---|---|
US (2) | US20100114562A1 (fr) |
EP (2) | EP2092447A4 (fr) |
AU (2) | AU2007314123B2 (fr) |
WO (2) | WO2008052240A1 (fr) |
Families Citing this family (58)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10862994B1 (en) * | 2006-11-15 | 2020-12-08 | Conviva Inc. | Facilitating client decisions |
US8751605B1 (en) | 2006-11-15 | 2014-06-10 | Conviva Inc. | Accounting for network traffic |
US8874725B1 (en) | 2006-11-15 | 2014-10-28 | Conviva Inc. | Monitoring the performance of a content player |
US9264780B1 (en) | 2006-11-15 | 2016-02-16 | Conviva Inc. | Managing synchronized data requests in a content delivery network |
US8489923B1 (en) * | 2006-11-15 | 2013-07-16 | Conviva Inc. | Detecting problems in content distribution |
US8312379B2 (en) * | 2007-08-22 | 2012-11-13 | International Business Machines Corporation | Methods, systems, and computer program products for editing using an interface |
US9177313B1 (en) | 2007-10-18 | 2015-11-03 | Jpmorgan Chase Bank, N.A. | System and method for issuing, circulating and trading financial instruments with smart features |
US8788523B2 (en) * | 2008-01-15 | 2014-07-22 | Thomson Reuters Global Resources | Systems, methods and software for processing phrases and clauses in legal documents |
GB2463735A (en) * | 2008-09-30 | 2010-03-31 | Paul Howard James Roscoe | Fully biodegradable adhesives |
US20100125523A1 (en) * | 2008-11-18 | 2010-05-20 | Peer 39 Inc. | Method and a system for certifying a document for advertisement appropriateness |
CN101742442A (zh) * | 2008-11-20 | 2010-06-16 | 银河联动信息技术(北京)有限公司 | System and method for transmitting electronic vouchers via short message |
US8402494B1 (en) | 2009-03-23 | 2013-03-19 | Conviva Inc. | Switching content |
WO2011154023A1 (fr) * | 2010-06-11 | 2011-12-15 | Siemens Enterprise Communications Gmbh & Co. Kg | Method for creating a document using an information processing system |
US8612293B2 (en) | 2010-10-19 | 2013-12-17 | Citizennet Inc. | Generation of advertising targeting information based upon affinity information obtained from an online social network |
US9098836B2 (en) | 2010-11-16 | 2015-08-04 | Microsoft Technology Licensing, Llc | Rich email attachment presentation |
US9349130B2 (en) | 2010-11-17 | 2016-05-24 | Eloqua, Inc. | Generating relative and absolute positioned resources using a single editor having a single syntax |
US8819156B2 (en) | 2011-03-11 | 2014-08-26 | James Robert Miner | Systems and methods for message collection |
US9419928B2 (en) | 2011-03-11 | 2016-08-16 | James Robert Miner | Systems and methods for message collection |
US20120254166A1 (en) * | 2011-03-30 | 2012-10-04 | Google Inc. | Signature Detection in E-Mails |
US9063927B2 (en) * | 2011-04-06 | 2015-06-23 | Citizennet Inc. | Short message age classification |
US20130097166A1 (en) * | 2011-10-12 | 2013-04-18 | International Business Machines Corporation | Determining Demographic Information for a Document Author |
US10148716B1 (en) | 2012-04-09 | 2018-12-04 | Conviva Inc. | Dynamic generation of video manifest files |
US10489433B2 (en) | 2012-08-02 | 2019-11-26 | Artificial Solutions Iberia SL | Natural language data analytics platform |
US9418151B2 (en) * | 2012-06-12 | 2016-08-16 | Raytheon Company | Lexical enrichment of structured and semi-structured data |
US9269273B1 (en) | 2012-07-30 | 2016-02-23 | Weongozi Inc. | Systems, methods and computer program products for building a database associating n-grams with cognitive motivation orientations |
US9246965B1 (en) | 2012-09-05 | 2016-01-26 | Conviva Inc. | Source assignment based on network partitioning |
US10182096B1 (en) | 2012-09-05 | 2019-01-15 | Conviva Inc. | Virtual resource locator |
US10439969B2 (en) * | 2013-01-16 | 2019-10-08 | Google Llc | Double filtering of annotations in emails |
US9208142B2 (en) | 2013-05-20 | 2015-12-08 | International Business Machines Corporation | Analyzing documents corresponding to demographics |
US9483519B2 (en) | 2013-08-28 | 2016-11-01 | International Business Machines Corporation | Authorship enhanced corpus ingestion for natural language processing |
US20150074202A1 (en) * | 2013-09-10 | 2015-03-12 | Lenovo (Singapore) Pte. Ltd. | Processing action items from messages |
RU2013144681A (ru) | 2013-10-03 | 2015-04-10 | Общество С Ограниченной Ответственностью "Яндекс" | Electronic message processing system for determining its classification |
US9275242B1 (en) * | 2013-10-14 | 2016-03-01 | Trend Micro Incorporated | Security system for cloud-based emails |
US9607319B2 (en) | 2013-12-30 | 2017-03-28 | Adtile Technologies, Inc. | Motion and gesture-based mobile advertising activation |
US9606977B2 (en) | 2014-01-22 | 2017-03-28 | Google Inc. | Identifying tasks in messages |
US10691872B2 (en) * | 2014-03-19 | 2020-06-23 | Microsoft Technology Licensing, Llc | Normalizing message style while preserving intent |
US9563689B1 (en) | 2014-08-27 | 2017-02-07 | Google Inc. | Generating and applying data extraction templates |
US9652530B1 (en) | 2014-08-27 | 2017-05-16 | Google Inc. | Generating and applying event data extraction templates |
US9785705B1 (en) | 2014-10-16 | 2017-10-10 | Google Inc. | Generating and applying data extraction templates |
US10305955B1 (en) | 2014-12-08 | 2019-05-28 | Conviva Inc. | Streaming decision in the cloud |
US10178043B1 (en) | 2014-12-08 | 2019-01-08 | Conviva Inc. | Dynamic bitrate range selection in the cloud for optimized video streaming |
US10216837B1 (en) | 2014-12-29 | 2019-02-26 | Google Llc | Selecting pattern matching segments for electronic communication clustering |
US10097489B2 (en) | 2015-01-29 | 2018-10-09 | Sap Se | Secure e-mail attachment routing and delivery |
US9578493B1 (en) | 2015-08-06 | 2017-02-21 | Adtile Technologies Inc. | Sensor control switch |
US10003561B2 (en) | 2015-08-24 | 2018-06-19 | Microsoft Technology Licensing, Llc | Conversation modification for enhanced user interaction |
US10275446B2 (en) | 2015-08-26 | 2019-04-30 | International Business Machines Corporation | Linguistic based determination of text location origin |
US9639524B2 (en) | 2015-08-26 | 2017-05-02 | International Business Machines Corporation | Linguistic based determination of text creation date |
US9659007B2 (en) | 2015-08-26 | 2017-05-23 | International Business Machines Corporation | Linguistic based determination of text location origin |
US10437463B2 (en) | 2015-10-16 | 2019-10-08 | Lumini Corporation | Motion-based graphical input system |
US9940318B2 (en) * | 2016-01-01 | 2018-04-10 | Google Llc | Generating and applying outgoing communication templates |
US10140291B2 (en) | 2016-06-30 | 2018-11-27 | International Business Machines Corporation | Task-oriented messaging system |
US10511563B2 (en) * | 2016-10-28 | 2019-12-17 | Micro Focus Llc | Hashes of email text |
US10387559B1 (en) * | 2016-11-22 | 2019-08-20 | Google Llc | Template-based identification of user interest |
US9983687B1 (en) | 2017-01-06 | 2018-05-29 | Adtile Technologies Inc. | Gesture-controlled augmented reality experience using a mobile communications device |
US10762895B2 (en) | 2017-06-30 | 2020-09-01 | International Business Machines Corporation | Linguistic profiling for digital customization and personalization |
US11620566B1 (en) | 2017-08-04 | 2023-04-04 | Grammarly, Inc. | Artificial intelligence communication assistance for improving the effectiveness of communications using reaction data |
US10929617B2 (en) * | 2018-07-20 | 2021-02-23 | International Business Machines Corporation | Text analysis in unsupported languages using backtranslation |
US11068530B1 (en) * | 2018-11-02 | 2021-07-20 | Shutterstock, Inc. | Context-based image selection for electronic media |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040158454A1 (en) * | 2003-02-11 | 2004-08-12 | Livia Polanyi | System and method for dynamically determining the attitude of an author of a natural language document |
Family Cites Families (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5111398A (en) * | 1988-11-21 | 1992-05-05 | Xerox Corporation | Processing natural language text using autonomous punctuational structure |
US5636325A (en) * | 1992-11-13 | 1997-06-03 | International Business Machines Corporation | Speech synthesis and analysis of dialects |
US6173406B1 (en) * | 1997-07-15 | 2001-01-09 | Microsoft Corporation | Authentication systems, methods, and computer program products |
US6285978B1 (en) * | 1998-09-24 | 2001-09-04 | International Business Machines Corporation | System and method for estimating accuracy of an automatic natural language translation |
US6732087B1 (en) * | 1999-10-01 | 2004-05-04 | Trialsmith, Inc. | Information storage, retrieval and delivery system and method operable with a computer network |
US6836768B1 (en) * | 1999-04-27 | 2004-12-28 | Surfnotes | Method and apparatus for improved information representation |
US6507829B1 (en) * | 1999-06-18 | 2003-01-14 | Ppd Development, Lp | Textual data classification method and apparatus |
AU1072101A (en) * | 1999-10-01 | 2001-05-10 | Talisma Corporation | Web mail management method and system |
WO2001033409A2 (fr) * | 1999-11-01 | 2001-05-10 | Kurzweil Cyberart Technologies, Inc. | Computerized poetry generator system |
US7275029B1 (en) * | 1999-11-05 | 2007-09-25 | Microsoft Corporation | System and method for joint optimization of language model performance and size |
US6567805B1 (en) * | 2000-05-15 | 2003-05-20 | International Business Machines Corporation | Interactive automated response system |
US7346492B2 (en) * | 2001-01-24 | 2008-03-18 | Shaw Stroz Llc | System and method for computerized psychological content analysis of computer and media generated communications to produce communications management support, indications, and warnings of dangerous behavior, assessment of media images, and personnel selection support |
US20030043188A1 (en) * | 2001-08-30 | 2003-03-06 | Daron John Bernard | Code read communication software |
US6993534B2 (en) * | 2002-05-08 | 2006-01-31 | International Business Machines Corporation | Data store for knowledge-based data mining system |
TWI306202B (en) * | 2002-08-01 | 2009-02-11 | Via Tech Inc | Method and system for parsing e-mail |
US7813917B2 (en) * | 2004-06-22 | 2010-10-12 | Gary Stephen Shuster | Candidate matching using algorithmic analysis of candidate-authored narrative information |
US20060129602A1 (en) * | 2004-12-15 | 2006-06-15 | Microsoft Corporation | Enable web sites to receive and process e-mail |
US8055715B2 (en) * | 2005-02-01 | 2011-11-08 | i365 MetaLINCS | Thread identification and classification |
WO2006088915A1 (fr) * | 2005-02-14 | 2006-08-24 | Inboxer, Inc. | System for applying a variety of actions and policies to electronic messages before they leave the control of the message originator |
US20080084972A1 (en) * | 2006-09-27 | 2008-04-10 | Michael Robert Burke | Verifying that a message was authored by a user by utilizing a user profile generated for the user |
-
2007
- 2007-04-05 EP EP07718687A patent/EP2092447A4/fr not_active Withdrawn
- 2007-04-05 US US12/513,099 patent/US20100114562A1/en not_active Abandoned
- 2007-04-05 EP EP07718688A patent/EP2084620A4/fr not_active Withdrawn
- 2007-04-05 WO PCT/AU2007/000441 patent/WO2008052240A1/fr active Application Filing
- 2007-04-05 AU AU2007314123A patent/AU2007314123B2/en not_active Ceased
- 2007-04-05 WO PCT/AU2007/000440 patent/WO2008052239A1/fr active Application Filing
- 2007-04-05 AU AU2007314124A patent/AU2007314124B2/en not_active Ceased
- 2007-04-05 US US12/447,898 patent/US20100100815A1/en not_active Abandoned
Non-Patent Citations (4)
Title |
---|
Cunningham et al: "GATE: an Architecture for Development of Robust HLT Applications", Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 6 July 2002 (2002-07-06), 12 July 2002 (2002-07-12), pages 168-175, XP002630481, Philadelphia, PA, US Retrieved from the Internet: URL:http://gate.ac.uk/sale/acl02/acl-main.pdf [retrieved on 2011-03-28] * |
DE VEL O: "Mining E-mail Authorship", PROC. WORKSHOP ON TEXT MINING, ACM INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD'2000), 1 January 2000 (2000-01-01), XP008133413, * |
Nowson et al.: "Whose thumb is it anyway? Classifying author personality from weblog text", Coling/ACL 2006, 17 July 2006 (2006-07-17), 21 July 2006 (2006-07-21), XP002630474, Retrieved from the Internet: URL:http://nowson.com/papers/OberNowACL06.pdf [retrieved on 2011-03-28] * |
See also references of WO2008052240A1 * |
Also Published As
Publication number | Publication date |
---|---|
EP2092447A1 (fr) | 2009-08-26 |
EP2084620A4 (fr) | 2011-05-11 |
US20100100815A1 (en) | 2010-04-22 |
US20100114562A1 (en) | 2010-05-06 |
WO2008052239A1 (fr) | 2008-05-08 |
WO2008052240A1 (fr) | 2008-05-08 |
AU2007314123B2 (en) | 2009-09-03 |
AU2007314123A1 (en) | 2008-05-08 |
AU2007314124B2 (en) | 2009-08-20 |
EP2092447A4 (fr) | 2011-03-02 |
AU2007314124A1 (en) | 2008-05-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
AU2007314124B2 (en) | Document processor and associated method | |
US6632251B1 (en) | Document producing support system | |
Zaidan et al. | Arabic dialect identification | |
Shaalan et al. | NERA: Named entity recognition for Arabic | |
US6820237B1 (en) | Apparatus and method for context-based highlighting of an electronic document | |
US9256679B2 (en) | Information search method and system, information provision method and system based on user's intention | |
US20020156817A1 (en) | System and method for extracting information | |
US20130006986A1 (en) | Automatic Classification of Electronic Content Into Projects | |
US20030210249A1 (en) | System and method of automatic data checking and correction | |
US11263714B1 (en) | Automated document analysis for varying natural languages | |
CN101887414A (zh) | Server for automatically scoring opinions conveyed by text messages containing pictorial symbols | |
Almuqren et al. | AraCust: a Saudi Telecom Tweets corpus for sentiment analysis | |
Al Qundus et al. | Exploring the impact of short-text complexity and structure on its quality in social media | |
Forsyth et al. | Found in translation: To what extent is authorial discriminability preserved by translators? | |
US20030074345A1 (en) | Apparatus for interpreting electronic legal documents | |
Kovriguina et al. | Metadata extraction from conference proceedings using template-based approach | |
Baron et al. | Children Online: A survey of child language and CMC corpora | |
US20220270008A1 (en) | Systems and methods for enhanced risk identification based on textual analysis | |
Afolabi et al. | Semantic text mining using domain ontology | |
Estival et al. | Author profiling for English and Arabic emails | |
Gobin-Rahimbux et al. | KreolStem: A hybrid language-dependent stemmer for Kreol Morisien | |
CN112199948A (zh) | Method, apparatus and electronic device for text content recognition and non-compliant advertisement recognition | |
Abera et al. | Information extraction model for afan oromo news text | |
LEMU | Named Entity Detection and Classification for Afaan Oromoo Text based on Bidirectional Encoder Representations from Transformers | |
Branting et al. | Decision support for detecting sensitive text in government records: Anonymous submission |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
| 17P | Request for examination filed | Effective date: 20090521 |
| AK | Designated contracting states | Kind code of ref document: A1 Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC MT NL PL PT RO SE SI SK TR |
| DAX | Request for extension of the European patent (deleted) | |
| A4 | Supplementary search report drawn up and despatched | Effective date: 20110408 |
| 17Q | First examination report despatched | Effective date: 20140325 |
| STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
| 18D | Application deemed to be withdrawn | Effective date: 20140805 |