EP2084620A1 - Document processor and associated method - Google Patents

Document processor and associated method

Info

Publication number
EP2084620A1
EP2084620A1 EP07718688A EP07718688A EP2084620A1 EP 2084620 A1 EP2084620 A1 EP 2084620A1 EP 07718688 A EP07718688 A EP 07718688A EP 07718688 A EP07718688 A EP 07718688A EP 2084620 A1 EP2084620 A1 EP 2084620A1
Authority
EP
European Patent Office
Prior art keywords
text
author
analysis
trait
document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP07718688A
Other languages
German (de)
English (en)
Other versions
EP2084620A4 (fr)
Inventor
Ben Hutchinson
Dominique Estival
Wil Radford
Son Bao Pham
Tanja Gaustad
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Appen Ltd
Original Assignee
Appen Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AU2006906095A
Application filed by Appen Ltd
Publication of EP2084620A1
Publication of EP2084620A4
Withdrawn

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation; Time management
    • G06Q10/107Computer-aided management of electronic mailing [e-mailing]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/131Fragmentation of text files, e.g. creating reusable text-blocks; Linking to fragments, e.g. using XInclude; Namespaces
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis

Definitions

  • the present invention relates to a method and apparatus for processing documents.
  • Embodiments of the present invention find application, though not exclusively, in the field of computational text processing, which is also known in some contexts as natural language processing, human language technology or computational linguistics.
  • the outputs of some preferred embodiments of the invention may be used in a wide range of computing tasks such as automatic email categorization techniques, sentiment analysis, author attribution, and the like.
  • a computer implemented method of processing a digitally encoded document having text composed by an author including the steps of: using a processor to analyse segmentation of the text and storing results of said segmentation analysis in a digitally accessible format; using a processor to analyse punctuation of the text and storing results of said punctuation analysis in a digitally accessible format; using a processor to linguistically analyse the text and storing results of said linguistic analysis in a digitally accessible format; and predicting an author trait using a machine learning system that is adapted to receive the results of said linguistic analysis, said segmentation analysis and said punctuation analysis as input, said machine learning system having been trained to process said input so as to output at least one predicted author trait.
  • the linguistic analysis includes identification of predefined words and phrases in the text, and the words and phrases may include any one or more of the following types: people's names, locations, dates, times, organizations, currency, uniform resource locators (URLs), email addresses, addresses, organizational descriptors, phone numbers, typical greetings and/or typical farewells.
  • a preferred embodiment makes use of a database of words and phrases of these types.
  • the segmentation analysis includes an analysis of the paragraph and sentence segmentation used in the text.
  • results of said linguistic analysis, said segmentation analysis and said punctuation analysis are represented by one or more data structures associated with the document.
  • data structures are feature vectors.
  • the document is, or includes, any one of: an email; text sourced from an email; data sourced from a digital source; text sourced from an online newsgroup discussion; text sourced from a multiuser online chat session; a digitized facsimile; an SMS message; text sourced from an instant messaging communication session; a scanned document; text sourced by means of optical character recognition; text sourced from a file attached to an email; text sourced from a digital file; a word processor created file; a text file; or text sourced from a web site.
  • the analysed email document 2 is saved into the memory of the computer 51 in a digitally accessible format in an annotation repository 8, which resides on the database server 54.
  • many other means for recording the results of the segmentation, punctuation and linguistic analysis of the text in digitally accessible formats may be devised by those skilled in the art.
  • text that has been analysed and which falls into a specific category is copied into a memory location or bulk storage location that is exclusively reserved for the relevant category of text.
  • a feature is a descriptive statistic calculated from either or both of the raw text and the annotations.
  • Some features express the ratio of frequencies of two different annotation types (e.g. the ratio of sentence annotations to paragraph annotations), or the presence or absence of an annotation type (e.g. signature). More particularly, the features can be generally divided into three groupings:
  • Structural level features typically refer to the annotations made regarding structural features of the text such as the presence of a signature block, reply status, attachments, headers, etc. Examples include information regarding: o indentation of paragraphs; o presence of farewells; o document length in characters, words, lines, sentences and/or paragraphs; and o mean paragraph length in lines, sentences and/or words.
  • Function words from a predefined functionWord lexicon, such as "up" and "to": feature Word_ratio_functionWord
  • Words whose part-of-speech is VBU: annotation posVBU; feature Word_ratio_posVBU_all
  • Words whose part-of-speech is IN: annotation posIN; feature Word_ratio_posIN_all
  • Words whose part-of-speech is JJ: annotation posJJ; feature Word_ratio_posJJ_all
  • Words whose part-of-speech is RB: annotation posRB; feature Word_ratio_posRB_all
  • Words whose part-of-speech is PR: annotation posPR; feature Word_ratio_posPR_all
  • Words whose part-of-speech is NNP: annotation posNNP; feature Word_ratio_posNNP_all
  • Words whose part-of-speech is POS: annotation posPOS; feature Word_ratio_posPOS_all
  • Words whose part-of-speech is MD: annotation posMD; feature Word_ratio_posMD_all
  • Words of character case type upper: annotation caseUpper; feature Word_ratio_caseUpper_all
  • Words of character case type lower: annotation caseLower; feature Word_ratio_caseLower_all
  • Words of character case type: annotation caseCamel
  • All multiword prepositions (mwp): annotation MultiwordPrepositions; feature MultiwordPreposition_count_all
  • HTML font tag with attribute size -1: annotation htmlFontAttributeSize-1; feature HTML_ratio_htmlFontAttributeSize-1_htmlTag
  • HTML font tag with attribute color navy: annotation htmlFontAttributeColorNavy; feature HTML_ratio_htmlFontAttributeColorNavy_htmlTag
  • HTML font tag with attribute color teal: annotation htmlFontAttributeColorTeal; feature HTML_ratio_htmlFontAttributeColorTeal_htmlTag
  • HTML font tag with attribute color silver: annotation htmlFontAttributeColorSilver; feature HTML_ratio_htmlFontAttributeColorSilver_htmlTag
  • HTML font tag with attribute color fuchsia: annotation htmlFontAttributeColorFuchsia; feature HTML_ratio_htmlFontAttributeColorFuchsia_htmlTag
  • HTML font tag with attribute color white: annotation htmlFontAttributeColorWhite; feature HTML_ratio_htmlFontAttributeColorWhite_htmlTag
  • HTML font tag with attribute color yellow: annotation htmlFontAttributeColorYellow; feature HTML_ratio_htmlFontAttributeColorYellow_htmlTag
  • HTML font tag with attribute color black: annotation htmlFontAttributeColorBlack; feature HTML_ratio_htmlFontAttributeColorBlack_htmlTag
  • HTML font tag with attribute color purple: annotation htmlFontAttributeColorPurple; feature HTML_ratio_htmlFontAttributeColorPurple_htmlTag
  • HTML font tag with attribute color olive: annotation htmlFontAttributeColorOlive; feature HTML_ratio_htmlFontAttributeColorOlive_htmlTag
  • HTML font tag with attribute color red: annotation htmlFontAttributeColorRed; feature HTML_ratio_htmlFontAttributeColorRed_htmlTag
  • HTML font tag with attribute color maroon: annotation htmlFontAttributeColorMaroon; feature HTML_ratio_htmlFontAttributeColorMaroon_htmlTag
  • HTML font tag with attribute color aqua: annotation htmlFontAttributeColorAqua; feature HTML_ratio_htmlFontAttributeColorAqua_htmlTag
  • HTML font tag with attribute color gray: annotation htmlFontAttributeColorGray; feature HTML_ratio_htmlFontAttributeColorGray_htmlTag
  • HTML font tag with attribute color blue: annotation htmlFontAttributeColorBlue; feature HTML_ratio_htmlFontAttributeColorBlue_htmlTag
  • HTML font tag with attribute color other: annotation htmlFontAttributeColorOther; feature HTML_ratio_htmlFontAttributeColorOther_htmlTag
  • HTML font tag with attribute face arial: annotation htmlFontAttributeFaceArial; feature HTML_ratio_htmlFontAttributeFaceArial_htmlTag
  • HTML font tag with attribute face verdana: annotation htmlFontAttributeFaceVerdana; feature HTML_ratio_htmlFontAttributeFaceVerdana_htmlTag
  • HTML font tag with attribute face papyrus: annotation htmlFontAttributeFacePapyrus; feature HTML_ratio_htmlFontAttributeFacePapyrus_htmlTag
  • the feature Char_count_punc33 is a numeric value equal to the number of times ASCII code 33 (i.e. !) is used in the document being analysed.
  • Some of the other features mentioned in the above list are counts and/or ratios associated with user-defined lexicons of commonly used emoticons, farewells, function words, greetings and multiword prepositions.
  • Each of the feature names is a variable that is set to a numeric value that is calculated for the respective feature. For example, for an email comprised of 488 characters, the variable char_count_all is set to a value of 488.
  • a feature vector is essentially a list of features that is structured in a predefined manner to function as input for the Support Vector Machines processing that occurs at step 12. With reference to the running example, the feature vector is as follows:
  • the author traits that are predicted by the preferred embodiment include the following six demographic traits: age; gender; educational level; native language; country of origin and geographic region. Additionally, the preferred embodiment predicts the following psychometric traits: extraversion; agreeableness; conscientiousness; neuroticism; and openness. It will be appreciated that other preferred embodiments provide a greater or lesser number of predicted author traits as their output. In particular, some embodiments output at least three of the six demographic traits and at least three of the following six psychometric traits: extraversion; agreeableness; conscientiousness; neuroticism; psychoticism and openness.
  • the output is initially in a coded format, which for the running example looks as follows:
  • the first trait which is represented by code “0” is the predicted identity, which has a value of "u23-938484".
  • the second predicted trait, which is represented by code "1", relates to the author's predicted openness and it has a value of "3.0" on a scale of 1 to 5.
  • Other predicted traits and their associated codes are as follows:
  • the coded output is processed by the computer 51 and displayed in a user-friendly display format on the screen 58 of the laptop computer 56.
  • a random example of such a display format is shown in the screen grab illustrated in figure 4.
  • Each of the predicted author traits is associated with a confidence level representing an estimate of the likelihood that the predicted trait is correct. For example, it can be seen from figure 4 that the predicted age of the author is 35 - 44, and this prediction is associated with a confidence level of 77%.
  • the confidence levels for any given author trait are calculated by the machine learning system based upon the strength of correlation between the selected input features and the relevant predicted author trait.
  • classifiers are created by the selection of sets of features for each author trait. For each experiment, ten-fold cross-validation is preferably used. Ten-fold cross-validation refers to the practice of using a 90-10 split of the data for experiments and repeating this process for each 90-10 split of the data. To guarantee a reasonably random split of the data, the splits are randomized but must be reproducible. To evaluate and test the classifiers, new documents are given as input and existing classifiers are selected to predict author traits. Another option is to keep 10% of the data for testing purposes while 90% is used for training and tuning. The training and tuning data is split into 90% for training and 10% for tuning. This process is repeated for each 90-10 split of the training/tuning data, in a 10-fold cross-validation. As previously mentioned, to guarantee a reasonably random split of the data in the 10-fold cross-validation process, the training/tuning splits are randomized, but the splits are reproducible.
  • each classifier 11 or 17 is not only specific to a particular author trait, but is also specific to a particular document type, such as emails, extracts from chat room communications, etc.
  • the present invention may be embodied in computer software in the form of executable code for instructing a computer to perform the inventive method.
  • the software and its associated data are capable of being stored upon a computer-readable medium in the form of one or more compact disks (CDs).
  • Alternative embodiments make use of other forms of digital storage media, such as Digital Versatile Discs (DVDs), hard drives, flash memory, Erasable Programmable Read-Only Memory (EPROM), and the like.
  • the software and its associated data may be stored as one or more downloadable or remotely executable files that are accessible via a computer communications network such as the internet.
  • the processing of documents undertaken by the preferred embodiment advantageously predicts a number of author traits. If properly configured and trained, preferred embodiments of the invention perform the predictions with a comparatively high degree of accuracy. Additionally, the preferred embodiment is not confined to analysis of the text of a small number of different authors, which compares favourably with at least some of the known prior art.
  • the predictive processing is achieved with the use of a rich set of linguistic features, such as a database storing a plurality of named entities, common greetings and farewell phrases.
  • the predictive processing also makes use of a comprehensive set of punctuation features. Additionally, the use of segmentation analysis provides further useful input to the predictive processing.
  • the preferred embodiment is advantageously configurable to function with input documents from a variety of sources.
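
The feature naming described in the definitions above (for example Char_count_punc33 for the number of "!" characters, char_count_all for the total character count, and ratios between annotation types) can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the implementation described in the patent: the function names `extract_features` and `to_feature_vector`, the naive paragraph and sentence splitting rules, the feature name `Sentence_ratio_paragraph`, and the sorted-by-name ordering are all hypothetical. Only `char_count_all` and `Char_count_punc33` come from the text.

```python
import re
import string


def extract_features(text: str) -> dict:
    """Compute a few descriptive statistics of the kind the text calls features."""
    features = {}

    # Total number of characters in the document.
    features["char_count_all"] = float(len(text))

    # One count per ASCII punctuation character, named by its code point,
    # so "!" (ASCII 33) becomes Char_count_punc33.
    for ch in string.punctuation:
        features[f"Char_count_punc{ord(ch)}"] = float(text.count(ch))

    # Naive segmentation (assumption): paragraphs are separated by blank
    # lines, and sentences end with ".", "!" or "?".
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    # Ratio of sentence annotations to paragraph annotations.
    # The feature name is hypothetical.
    features["Sentence_ratio_paragraph"] = (
        len(sentences) / len(paragraphs) if paragraphs else 0.0
    )
    return features


def to_feature_vector(features: dict) -> list:
    """Arrange features in a predefined order (assumption: sorted by name)."""
    return [features[name] for name in sorted(features)]


sample = "Hi Bob!\n\nThanks for the report. See you Monday!\n\nCheers"
sample_features = extract_features(sample)
sample_vector = to_feature_vector(sample_features)
```

A fixed ordering matters because a trained classifier expects each position of the vector to carry the same feature for every document.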
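
The ten-fold cross-validation described in the definitions above requires splits that are randomized yet reproducible. The text does not say how this is achieved; one common approach, assumed here, is to shuffle with a fixed seed. The function name `ten_fold_splits` and its `seed` parameter are hypothetical.

```python
import random


def ten_fold_splits(n_items, seed=0, folds=10):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    The shuffle uses a private generator with a fixed seed, so the splits
    are randomized but identical on every run with the same seed.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    for k in range(folds):
        # Every folds-th shuffled item, starting at offset k, is held out.
        test = indices[k::folds]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


splits = list(ten_fold_splits(100, seed=42))
```

With 100 documents each fold trains on 90 and tests on 10, and every document is held out exactly once across the ten folds.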

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • Computer Hardware Design (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Machine Translation (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention relates to a computer-implemented method of processing a digitally encoded document having text composed by an author, in which a processor is used to analyse the segmentation, punctuation and linguistics of the text and to store the results in a digitally accessible format. Author traits are then predicted by means of a machine learning system based on the results of the segmentation, punctuation and linguistic analysis of the text.
EP07718688A 2006-11-03 2007-04-05 Document processor and associated method Withdrawn EP2084620A4 (fr)
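
The flow summarized in the abstract (three analyses whose stored results feed a trained machine learning system) can be sketched end to end. This is a toy illustration under stated assumptions, not the patented implementation: the function names, the handful of features, the greeting list, the training sentences and the trait labels "A" and "B" are all invented for the example, and a simple nearest-centroid learner is swapped in for the Support Vector Machines mentioned in the description so that the sketch needs only the standard library.

```python
import math
import re


def segmentation_analysis(text):
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {"paragraph_count": len(paragraphs), "sentence_count": len(sentences)}


def punctuation_analysis(text):
    return {"exclamation_count": text.count("!"), "question_count": text.count("?")}


def linguistic_analysis(text, greetings=("hi", "hello", "dear")):
    # Hypothetical greeting lexicon standing in for a database of phrases.
    words = re.findall(r"[a-z']+", text.lower())
    return {"word_count": len(words),
            "greeting_count": sum(w in greetings for w in words)}


def analyse(text):
    """Run the three analyses and return one feature vector (sorted by name)."""
    results = {}
    for step in (segmentation_analysis, punctuation_analysis, linguistic_analysis):
        results.update(step(text))
    return [float(results[name]) for name in sorted(results)]


class NearestCentroid:
    """Stand-in learner (assumption); not the SVM named in the description."""

    def fit(self, vectors, labels):
        grouped = {}
        for vec, label in zip(vectors, labels):
            grouped.setdefault(label, []).append(vec)
        # Average each feature column per label.
        self.centroids = {
            label: [sum(col) / len(col) for col in zip(*vecs)]
            for label, vecs in grouped.items()
        }
        return self

    def predict(self, vector):
        return min(self.centroids,
                   key=lambda label: math.dist(vector, self.centroids[label]))


# Toy training data with invented trait labels "A" and "B".
training_docs = [
    ("Hi! Great news! We won!", "A"),
    ("Hello! Wow! Amazing!", "A"),
    ("The report is attached. Please review it.", "B"),
    ("Meeting moved to Tuesday. Agenda unchanged.", "B"),
]
model = NearestCentroid().fit(
    [analyse(text) for text, _ in training_docs],
    [label for _, label in training_docs],
)
prediction = model.predict(analyse("Hi! So cool! Thanks!"))
```

Keeping each analysis as a separate function mirrors the separate segmentation, punctuation and linguistic steps, while the classifier only ever sees the merged feature vector.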

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
AU2006906095A AU2006906095A0 (en) 2006-11-03 Email document parsing method and apparatus
AU2006906623A AU2006906623A0 (en) 2006-11-28 Document processor and associated method
PCT/AU2007/000441 WO2008052240A1 (fr) 2006-11-03 2007-04-05 Document processor and associated method

Publications (2)

Publication Number Publication Date
EP2084620A1 (fr) 2009-08-05
EP2084620A4 (fr) 2011-05-11

Family

ID=39343669

Family Applications (2)

Application Number Title Priority Date Filing Date
EP07718687A Withdrawn EP2092447A4 (fr) 2006-11-03 2007-04-05 Email document parsing method and apparatus
EP07718688A Withdrawn EP2084620A4 (fr) 2006-11-03 2007-04-05 Document processor and associated method

Family Applications Before (1)

Application Number Title Priority Date Filing Date
EP07718687A Withdrawn EP2092447A4 (fr) 2006-11-03 2007-04-05 Email document parsing method and apparatus

Country Status (4)

Country Link
US (2) US20100114562A1 (fr)
EP (2) EP2092447A4 (fr)
AU (2) AU2007314123B2 (fr)
WO (2) WO2008052240A1 (fr)

Families Citing this family (58)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10862994B1 (en) * 2006-11-15 2020-12-08 Conviva Inc. Facilitating client decisions
US8751605B1 (en) 2006-11-15 2014-06-10 Conviva Inc. Accounting for network traffic
US8874725B1 (en) 2006-11-15 2014-10-28 Conviva Inc. Monitoring the performance of a content player
US9264780B1 (en) 2006-11-15 2016-02-16 Conviva Inc. Managing synchronized data requests in a content delivery network
US8489923B1 (en) * 2006-11-15 2013-07-16 Conviva Inc. Detecting problems in content distribution
US8312379B2 (en) * 2007-08-22 2012-11-13 International Business Machines Corporation Methods, systems, and computer program products for editing using an interface
US9177313B1 (en) 2007-10-18 2015-11-03 Jpmorgan Chase Bank, N.A. System and method for issuing, circulating and trading financial instruments with smart features
US8788523B2 (en) * 2008-01-15 2014-07-22 Thomson Reuters Global Resources Systems, methods and software for processing phrases and clauses in legal documents
GB2463735A (en) * 2008-09-30 2010-03-31 Paul Howard James Roscoe Fully biodegradable adhesives
US20100125523A1 (en) * 2008-11-18 2010-05-20 Peer 39 Inc. Method and a system for certifying a document for advertisement appropriateness
CN101742442A (zh) * 2008-11-20 2010-06-16 银河联动信息技术(北京)有限公司 System and method for transmitting electronic vouchers via short messages
US8402494B1 (en) 2009-03-23 2013-03-19 Conviva Inc. Switching content
WO2011154023A1 (fr) * 2010-06-11 2011-12-15 Siemens Enterprise Communications Gmbh & Co. Kg Method for creating a document using an information processing system
US8612293B2 (en) 2010-10-19 2013-12-17 Citizennet Inc. Generation of advertising targeting information based upon affinity information obtained from an online social network
US9098836B2 (en) 2010-11-16 2015-08-04 Microsoft Technology Licensing, Llc Rich email attachment presentation
US9349130B2 (en) 2010-11-17 2016-05-24 Eloqua, Inc. Generating relative and absolute positioned resources using a single editor having a single syntax
US8819156B2 (en) 2011-03-11 2014-08-26 James Robert Miner Systems and methods for message collection
US9419928B2 (en) 2011-03-11 2016-08-16 James Robert Miner Systems and methods for message collection
US20120254166A1 (en) * 2011-03-30 2012-10-04 Google Inc. Signature Detection in E-Mails
US9063927B2 (en) * 2011-04-06 2015-06-23 Citizennet Inc. Short message age classification
US20130097166A1 (en) * 2011-10-12 2013-04-18 International Business Machines Corporation Determining Demographic Information for a Document Author
US10148716B1 (en) 2012-04-09 2018-12-04 Conviva Inc. Dynamic generation of video manifest files
US10489433B2 (en) 2012-08-02 2019-11-26 Artificial Solutions Iberia SL Natural language data analytics platform
US9418151B2 (en) * 2012-06-12 2016-08-16 Raytheon Company Lexical enrichment of structured and semi-structured data
US9269273B1 (en) 2012-07-30 2016-02-23 Weongozi Inc. Systems, methods and computer program products for building a database associating n-grams with cognitive motivation orientations
US9246965B1 (en) 2012-09-05 2016-01-26 Conviva Inc. Source assignment based on network partitioning
US10182096B1 (en) 2012-09-05 2019-01-15 Conviva Inc. Virtual resource locator
US10439969B2 (en) * 2013-01-16 2019-10-08 Google Llc Double filtering of annotations in emails
US9208142B2 (en) 2013-05-20 2015-12-08 International Business Machines Corporation Analyzing documents corresponding to demographics
US9483519B2 (en) 2013-08-28 2016-11-01 International Business Machines Corporation Authorship enhanced corpus ingestion for natural language processing
US20150074202A1 (en) * 2013-09-10 2015-03-12 Lenovo (Singapore) Pte. Ltd. Processing action items from messages
RU2013144681A (ru) 2013-10-03 2015-04-10 Общество С Ограниченной Ответственностью "Яндекс" System for processing an electronic message to determine its classification
US9275242B1 (en) * 2013-10-14 2016-03-01 Trend Micro Incorporated Security system for cloud-based emails
US9607319B2 (en) 2013-12-30 2017-03-28 Adtile Technologies, Inc. Motion and gesture-based mobile advertising activation
US9606977B2 (en) 2014-01-22 2017-03-28 Google Inc. Identifying tasks in messages
US10691872B2 (en) * 2014-03-19 2020-06-23 Microsoft Technology Licensing, Llc Normalizing message style while preserving intent
US9563689B1 (en) 2014-08-27 2017-02-07 Google Inc. Generating and applying data extraction templates
US9652530B1 (en) 2014-08-27 2017-05-16 Google Inc. Generating and applying event data extraction templates
US9785705B1 (en) 2014-10-16 2017-10-10 Google Inc. Generating and applying data extraction templates
US10305955B1 (en) 2014-12-08 2019-05-28 Conviva Inc. Streaming decision in the cloud
US10178043B1 (en) 2014-12-08 2019-01-08 Conviva Inc. Dynamic bitrate range selection in the cloud for optimized video streaming
US10216837B1 (en) 2014-12-29 2019-02-26 Google Llc Selecting pattern matching segments for electronic communication clustering
US10097489B2 (en) 2015-01-29 2018-10-09 Sap Se Secure e-mail attachment routing and delivery
US9578493B1 (en) 2015-08-06 2017-02-21 Adtile Technologies Inc. Sensor control switch
US10003561B2 (en) 2015-08-24 2018-06-19 Microsoft Technology Licensing, Llc Conversation modification for enhanced user interaction
US10275446B2 (en) 2015-08-26 2019-04-30 International Business Machines Corporation Linguistic based determination of text location origin
US9639524B2 (en) 2015-08-26 2017-05-02 International Business Machines Corporation Linguistic based determination of text creation date
US9659007B2 (en) 2015-08-26 2017-05-23 International Business Machines Corporation Linguistic based determination of text location origin
US10437463B2 (en) 2015-10-16 2019-10-08 Lumini Corporation Motion-based graphical input system
US9940318B2 (en) * 2016-01-01 2018-04-10 Google Llc Generating and applying outgoing communication templates
US10140291B2 (en) 2016-06-30 2018-11-27 International Business Machines Corporation Task-oriented messaging system
US10511563B2 (en) * 2016-10-28 2019-12-17 Micro Focus Llc Hashes of email text
US10387559B1 (en) * 2016-11-22 2019-08-20 Google Llc Template-based identification of user interest
US9983687B1 (en) 2017-01-06 2018-05-29 Adtile Technologies Inc. Gesture-controlled augmented reality experience using a mobile communications device
US10762895B2 (en) 2017-06-30 2020-09-01 International Business Machines Corporation Linguistic profiling for digital customization and personalization
US11620566B1 (en) 2017-08-04 2023-04-04 Grammarly, Inc. Artificial intelligence communication assistance for improving the effectiveness of communications using reaction data
US10929617B2 (en) * 2018-07-20 2021-02-23 International Business Machines Corporation Text analysis in unsupported languages using backtranslation
US11068530B1 (en) * 2018-11-02 2021-07-20 Shutterstock, Inc. Context-based image selection for electronic media

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040158454A1 (en) * 2003-02-11 2004-08-12 Livia Polanyi System and method for dynamically determining the attitude of an author of a natural language document

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5111398A (en) * 1988-11-21 1992-05-05 Xerox Corporation Processing natural language text using autonomous punctuational structure
US5636325A (en) * 1992-11-13 1997-06-03 International Business Machines Corporation Speech synthesis and analysis of dialects
US6173406B1 (en) * 1997-07-15 2001-01-09 Microsoft Corporation Authentication systems, methods, and computer program products
US6285978B1 (en) * 1998-09-24 2001-09-04 International Business Machines Corporation System and method for estimating accuracy of an automatic natural language translation
US6732087B1 (en) * 1999-10-01 2004-05-04 Trialsmith, Inc. Information storage, retrieval and delivery system and method operable with a computer network
US6836768B1 (en) * 1999-04-27 2004-12-28 Surfnotes Method and apparatus for improved information representation
US6507829B1 (en) * 1999-06-18 2003-01-14 Ppd Development, Lp Textual data classification method and apparatus
AU1072101A (en) * 1999-10-01 2001-05-10 Talisma Corporation Web mail management method and system
WO2001033409A2 (fr) * 1999-11-01 2001-05-10 Kurzweil Cyberart Technologies, Inc. Computerized poetry generating system
US7275029B1 (en) * 1999-11-05 2007-09-25 Microsoft Corporation System and method for joint optimization of language model performance and size
US6567805B1 (en) * 2000-05-15 2003-05-20 International Business Machines Corporation Interactive automated response system
US7346492B2 (en) * 2001-01-24 2008-03-18 Shaw Stroz Llc System and method for computerized psychological content analysis of computer and media generated communications to produce communications management support, indications, and warnings of dangerous behavior, assessment of media images, and personnel selection support
US20030043188A1 (en) * 2001-08-30 2003-03-06 Daron John Bernard Code read communication software
US6993534B2 (en) * 2002-05-08 2006-01-31 International Business Machines Corporation Data store for knowledge-based data mining system
TWI306202B (en) * 2002-08-01 2009-02-11 Via Tech Inc Method and system for parsing e-mail
US7813917B2 (en) * 2004-06-22 2010-10-12 Gary Stephen Shuster Candidate matching using algorithmic analysis of candidate-authored narrative information
US20060129602A1 (en) * 2004-12-15 2006-06-15 Microsoft Corporation Enable web sites to receive and process e-mail
US8055715B2 (en) * 2005-02-01 2011-11-08 i365 MetaLINCS Thread identification and classification
WO2006088915A1 (fr) * 2005-02-14 2006-08-24 Inboxer, Inc. System for applying a variety of actions and policies to electronic messages before they leave the control of the message sender
US20080084972A1 (en) * 2006-09-27 2008-04-10 Michael Robert Burke Verifying that a message was authored by a user by utilizing a user profile generated for the user

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Cunningham et al: "GATE: an Architecture for Development of Robust HLT Applications", Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 6 July 2002 (2002-07-06), 12 July 2002 (2002-07-12), pages 168-175, XP002630481, Philadelphia, PA, US Retrieved from the Internet: URL:http://gate.ac.uk/sale/acl02/acl-main.pdf [retrieved on 2011-03-28] *
DE VEL O: "Mining E-mail Authorship", PROC. WORKSHOP ON TEXT MINING, ACM INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD'2000), 1 January 2000 (2000-01-01), XP008133413, *
Nowson et al.: "Whose thumb is it anyway? Classifying author personality from weblog text", Coling/ACL 2006, 17 July 2006 (2006-07-17), 21 July 2006 (2006-07-21), XP002630474, Retrieved from the Internet: URL:http://nowson.com/papers/OberNowACL06.pdf [retrieved on 2011-03-28] *
See also references of WO2008052240A1 *

Also Published As

Publication number Publication date
EP2092447A1 (fr) 2009-08-26
EP2084620A4 (fr) 2011-05-11
US20100100815A1 (en) 2010-04-22
US20100114562A1 (en) 2010-05-06
WO2008052239A1 (fr) 2008-05-08
WO2008052240A1 (fr) 2008-05-08
AU2007314123B2 (en) 2009-09-03
AU2007314123A1 (en) 2008-05-08
AU2007314124B2 (en) 2009-08-20
EP2092447A4 (fr) 2011-03-02
AU2007314124A1 (en) 2008-05-08

Similar Documents

Publication Publication Date Title
AU2007314124B2 (en) Document processor and associated method
US6632251B1 (en) Document producing support system
Zaidan et al. Arabic dialect identification
Shaalan et al. NERA: Named entity recognition for Arabic
US6820237B1 (en) Apparatus and method for context-based highlighting of an electronic document
US9256679B2 (en) Information search method and system, information provision method and system based on user's intention
US20020156817A1 (en) System and method for extracting information
US20130006986A1 (en) Automatic Classification of Electronic Content Into Projects
US20030210249A1 (en) System and method of automatic data checking and correction
US11263714B1 (en) Automated document analysis for varying natural languages
CN101887414A (zh) Server for automatically scoring evaluations of text messaging that contains image symbols
Almuqren et al. AraCust: a Saudi Telecom Tweets corpus for sentiment analysis
Al Qundus et al. Exploring the impact of short-text complexity and structure on its quality in social media
Forsyth et al. Found in translation: To what extent is authorial discriminability preserved by translators?
US20030074345A1 (en) Apparatus for interpreting electronic legal documents
Kovriguina et al. Metadata extraction from conference proceedings using template-based approach
Baron et al. Children Online: A survey of child language and CMC corpora
US20220270008A1 (en) Systems and methods for enhanced risk identification based on textual analysis
Afolabi et al. Semantic text mining using domain ontology
Estival et al. Author profiling for English and Arabic emails
Gobin-Rahimbux et al. KreolStem: A hybrid language-dependent stemmer for Kreol Morisien
CN112199948A (zh) Text content recognition and non-compliant advertisement recognition method, apparatus and electronic device
Abera et al. Information extraction model for afan oromo news text
LEMU Named Entity Detection and Classification for Afaan Oromoo Text based on Bidirectional Encoder Representations from Transformers
Branting et al. Decision support for detecting sensitive text in government records: Anonymous submission

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20090521

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC MT NL PL PT RO SE SI SK TR

DAX Request for extension of the european patent (deleted)
A4 Supplementary search report drawn up and despatched

Effective date: 20110408

17Q First examination report despatched

Effective date: 20140325

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20140805