WO2016083908A1 - System and method for computer processing of an e-mail message and visual representation of a message abstract - Google Patents
- Publication number
- WO2016083908A1 (application PCT/IB2015/054486)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- lexical
- lexical unit
- meaningless
- text
- database
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q10/107—Computer-aided management of electronic mailing [e-mailing]
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
Definitions
- the present technology relates to a system and method for computer processing of a message and visual representation of a message abstract.
- a sender fills in a subject field where he or she can provide a brief summary of the topic of the e-mail message.
- the "subject" field allows the user to scan a batch of received e-mail messages and to judge the priority of each message as soon as it arrives. For example, the user can immediately tell, without reading its content, that a message with the "subject" field "Biggest-ever discount on suitcases!" is of minor importance and, conversely, that a message with the "subject" field "Important notice: Your flight details have changed" is important.
- the "subject" field can be insufficient for determining the priority of an e-mail message. This happens when the author has given an indistinct topic or when the user receives many e-mail messages with similar topics. In such cases, a preview of the first lines of the e-mail message can be useful. For example, the Microsoft Outlook e-mail client displays the first three lines of a message in the main window.
- a method for computer processing of a text message sent to a user, which message comprises both meaningful lexical units and meaningless lexical units; the method comprises: (i) performing a syntax analysis of the text message for determining at least one lexical unit as a potential meaningless lexical unit; (ii) performing a first check of the at least one potential meaningless lexical unit by comparison with meaningless lexical units from a first lexical unit database, the first database having been generated as a result of the syntax analysis of previous text messages sent to the user; (iii) performing a second check of the at least one potential meaningless lexical unit by comparison with meaningless lexical units from a second meaningless lexical unit database, the second database having been generated as a result of the syntax analysis of previous text messages sent to a group of users from a plurality of users; (iv) in response to a positive result of any of the first check and the second check, determining the potential meaningless lexical unit as a meaningless lexical unit.
- the method further comprises generating the text message abstract; the abstract being generated in such a way that there are no meaningless lexical units in the text message abstract.
- the text message abstract comprises at least one meaningful phrase.
- the text message abstract is an abstract of the most significant part of a text message.
- the text message is an e-mail message in which the most significant part of the given e-mail message is defined as the most significant logical block of HTML code from the plurality of the logical blocks of HTML code which comprise text.
- the most significant logical block of the HTML code comprises a block of the HTML code which comprises text, and the size of which is larger than a size of any other logical block of the HTML code of the e-mail message.
- the most significant logical block of the HTML code is a block of the HTML code which comprises text, and the text of the most significant logical block of the HTML code comprises the majority of meaningful lexical units in comparison with the text of any other logical block of the HTML code of the given e-mail message.
- the text message abstract is an abstract of the predefined number of paragraphs from the beginning of a text message.
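The abstract generation described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `make_abstract`, the blank-line paragraph splitting, and the `max_chars` cut-off are assumptions introduced here, and the set of meaningless units is taken as already known.

```python
def make_abstract(message, meaningless, max_paragraphs=3, max_chars=120):
    """Build an abstract from the first paragraphs of a message.

    Hypothetical helper: paragraphs are assumed to be separated by blank
    lines, and `meaningless` is a set of paragraphs already classified as
    meaningless lexical units.
    """
    paragraphs = [p.strip() for p in message.split("\n\n") if p.strip()]
    # Consider only the predefined number of leading paragraphs.
    leading = paragraphs[:max_paragraphs]
    # Drop every unit known to be meaningless, keep the rest in order.
    kept = [p for p in leading if p not in meaningless]
    return " ".join(kept)[:max_chars]


message = (
    "Hello John,\n\n"
    "Your flight details have changed.\n\n"
    "Best regards,\nAirline"
)
known_meaningless = {"Hello John,", "Best regards,\nAirline"}
abstract = make_abstract(message, known_meaningless)
```

With the sample message, only the paragraph about the flight survives, so the abstract contains no greeting or signature.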
- the group of users is an entire plurality of users.
- the method further comprises receiving an incoming text message.
- the lexical unit is any of: (i) a word, (ii) a phrase, (iii) a sentence, (iv) a paragraph.
- determining at least one lexical unit as a potential meaningless lexical unit comprises determining at least one meaningful lexical unit.
- determining at least one lexical unit as a potential meaningless unit is performed on the basis of the syntax analysis of one of: (i) an entire text of the text message, and (ii) a part of the text from the text message; the part of the text from the text message comprises predefined number of paragraphs.
- performing a syntax analysis of a text message comprises the markup language analysis of the text message.
- analyzing the markup language of the text message comprises analyzing at least one of: a structure of a text message, a font type, a font size, a font face, punctuation marks, and special marks.
- the method further comprises determining a lexical unit control sum.
- the lexical unit control sum is any of: a control element and a combination of control elements.
- the control element is any element selected from: a number of characters in a lexical unit, a number of letters in the lexical unit, a number of capital letters in the lexical unit, a number of lower-case letters in the lexical unit, a number of spaces in the lexical unit, a number of numbers in the lexical unit, a number of special marks, a number of words in the lexical unit, a size of the lexical unit expressed in information handling and storage units.
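A control sum built from the control elements listed above might look like the sketch below. The `ControlSum` class and `control_sum` function are names invented for illustration; reading "a number of numbers" as a digit count, "special marks" as non-alphanumeric, non-whitespace characters, and the size as a UTF-8 byte count are assumptions, not statements from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlSum:
    chars: int       # number of characters
    letters: int     # number of letters
    upper: int       # number of capital letters
    lower: int       # number of lower-case letters
    spaces: int      # number of spaces
    digits: int      # "number of numbers", assumed here to mean digits
    special: int     # special marks, assumed: not alphanumeric, not whitespace
    words: int       # number of whitespace-separated words
    size_bytes: int  # size in storage units, assumed: UTF-8 bytes


def control_sum(unit: str) -> ControlSum:
    """Compute all control elements of one lexical unit."""
    return ControlSum(
        chars=len(unit),
        letters=sum(c.isalpha() for c in unit),
        upper=sum(c.isupper() for c in unit),
        lower=sum(c.islower() for c in unit),
        spaces=sum(c == " " for c in unit),
        digits=sum(c.isdigit() for c in unit),
        special=sum(not (c.isalnum() or c.isspace()) for c in unit),
        words=len(unit.split()),
        size_bytes=len(unit.encode("utf-8")),
    )


signature = control_sum("Best regards, John")
```

Because the dataclass is frozen, instances are hashable and can serve directly as dictionary keys or set members when a database is indexed by control sum.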
- the comparison of the potential meaningless lexical unit with meaningless lexical units of any of: the first lexical units database and the second lexical units database is carried out, using at least one of predefined parameters, by matching the potential meaningless lexical unit with meaningless lexical units of any of: the first lexical units database and the second lexical units database.
- matching is carried out using a predefined parameter which can be one of: a control sum and a combination of particular control elements, being a part of the lexical unit control sum.
- a result of any of: the first check and the second check is positive when the comparison using at least one of the predefined parameters yields one of: a partial match using said at least one of the predefined parameters, in which the level of match is higher than a predefined match threshold, and a full match using the at least one of the predefined parameters.
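The two checks and the positive-result rule can be illustrated with a short sketch. It assumes each database is a plain set of strings and uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for the "level of match"; the source instead matches on control sums or their elements, so the similarity measure, the function names, and the `0.9` threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def check(unit, database, threshold=0.9):
    """Positive on a full match, or on a partial match above the threshold."""
    return any(
        unit == known
        or SequenceMatcher(None, unit, known).ratio() > threshold
        for known in database
    )


def is_meaningless(unit, user_db, group_db, threshold=0.9):
    """A positive result of either check marks the unit as meaningless."""
    first = check(unit, user_db, threshold)    # built from this user's mail
    second = check(unit, group_db, threshold)  # built from a group's mail
    return first or second


user_db = {"Best regards,"}
group_db = {"Click here to unsubscribe"}
```

Keeping the per-user and group databases separate lets a phrase that is noise for one recipient be caught even when it is rare across the whole user base, and vice versa.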
- the method further comprises, before the syntax analysis of a text message, generating at least one of: the first database and the second database.
- the computer includes a processor.
- the processor is configured to render the computer operable to execute: (i) performing a syntax analysis of a text message for determining at least one lexical unit as a potential meaningless lexical unit; (ii) performing a first check of the at least one potential meaningless lexical unit by comparison with meaningless lexical units from a first lexical unit database, the first database having been generated as a result of the syntax analysis of previous text messages sent to the user; (iii) performing a second check of the at least one potential meaningless lexical unit by comparison with meaningless lexical units from a second meaningless lexical unit database, the second database having been generated as a result of the syntax analysis of previous text messages sent to a group of users from a plurality of users; (iv) in response to a positive result of any of the first check and the second check, determining the potential meaningless lexical unit as a meaningless lexical unit.
- the processor is configured to render the computer operable to execute generating the text message abstract; the abstract being generated in such a way that there are no meaningless lexical units in the text message abstract.
- the text message abstract comprises at least one meaningful phrase.
- the text message abstract is an abstract of the most significant part of a text message.
- the text message is an e-mail message in which the most significant part of the given e-mail message is defined as the most significant logical block of HTML code from the plurality of the logical blocks of HTML code which comprise text.
- the most significant logical block of the HTML code comprises a block of the HTML code which comprises text, and the size of which is larger than a size of any other logical block of the HTML code of the e-mail message.
- the most significant logical block of HTML code is a block of HTML code comprising text; the text of the most significant logical block of HTML code comprises the majority of meaningful lexical units in comparison with the text of any other logical block of HTML code of the given e-mail message.
- the text message abstract is an abstract of the predefined number of paragraphs from the beginning of the text message.
- the group of users is an entire plurality of users.
- the processor is configured to render the computer operable to execute receiving the text message.
- the lexical unit is any of: (i) a word, (ii) a phrase, (iii) a sentence, (iv) a paragraph.
- determining at least one lexical unit as a potential meaningless lexical unit comprises determining at least one meaningful lexical unit.
- determining at least one lexical unit as a potential meaningless lexical unit is performed on the basis of the syntax analysis of one of: (i) an entire text of the text message and (ii) a part of the text of the text message.
- carrying out the syntax analysis of an e-mail message includes the markup language analysis of the e-mail message.
- analyzing the markup language of the text message comprises analyzing at least one of: a structure of a text message, a font type, a font size, a font face, punctuation marks, and special marks.
- the processor is configured to render the computer operable to execute determining the lexical unit control sum.
- the lexical unit control sum is any of: a control element and a combination of control elements.
- the control element is any element selected from: a number of characters in a lexical unit, a number of letters in the lexical unit, a number of capital letters in the lexical unit, a number of lower-case letters in the lexical unit, a number of spaces in the lexical unit, a number of numbers in the lexical unit, a number of special marks, a number of words in the lexical unit, a size of the lexical unit expressed in information handling and storage units.
- the comparison of the potential meaningless lexical unit with meaningless lexical units of any of: the first lexical units database and the second lexical units database is carried out, using at least one of predefined parameters, by matching the potential meaningless lexical unit with meaningless lexical units of any of: the first lexical units database and the second lexical units database.
- the matching is carried out using a predefined parameter which can be one of: a control sum and a combination of particular control elements, being a part of the lexical unit control sum.
- a result of any of: the first check and the second check is positive when the comparison using at least one of the predefined parameters yields one of: a partial match using said at least one of the predefined parameters, in which the level of match is higher than a predefined match threshold, and a full match using the at least one of the predefined parameters.
- the processor is further configured to render the computer operable to generate, before carrying out the syntax analysis of the text message, at least one of: the first database and the second database.
- Another object of the present technology is a method of determining meaningless lexical units in a text message, the method being executable on a computer.
- the method includes: (i) carrying out the syntax analysis of a text message for determining at least one lexical unit as a first potential meaningless lexical unit; (ii) determining a control sum of the first potential meaningless lexical unit; (iii) matching (using a first parameter) the first potential meaningless lexical unit with meaningless lexical units of a plurality of meaningless lexical units from a lexical units database; matching (using a first parameter) is matching the control sum of the first potential meaningless lexical unit with control sums of the meaningless lexical units from the lexical units database; (iv) determining the first potential meaningless lexical unit as a meaningless lexical unit if the lexical units database includes at least one meaningless lexical unit with a control sum corresponding to the control sum of the first potential meaningless lexical unit.
- the method further includes: (i) subdividing the first potential meaningless lexical unit to obtain at least two smaller lexical units and determining at least one smaller lexical unit as a second potential meaningless lexical unit; (ii) determining a control sum of the second potential meaningless lexical unit; (iii) matching using a second parameter the first potential meaningless lexical unit with lexical units from the lexical units database, wherein matching using the second parameter comprises matching the control sum of the second potential meaningless lexical unit with control sums of meaningless lexical units from the lexical units database; (iv) determining the second potential meaningless lexical unit as a meaningless lexical unit if the lexical units database includes at least one meaningless lexical unit with a control sum corresponding to the control sum of the second potential meaningless lexical unit.
- the first potential meaningless lexical unit is a paragraph and the second potential meaningless lexical unit is a sentence from the paragraph.
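The paragraph-then-sentence matching described above can be sketched as below. The reduced three-element control sum, the regular-expression sentence splitter, and the choice to subdivide only when the whole paragraph does not match are simplifying assumptions made for this example; the names `control_sum` and `find_meaningless` are hypothetical.

```python
import re


def control_sum(unit):
    """Reduced control sum (assumed): characters, words, capital letters."""
    return (len(unit), len(unit.split()), sum(c.isupper() for c in unit))


def find_meaningless(paragraph, db_sums):
    """Return the meaningless units found in one paragraph.

    The paragraph is the first potential meaningless unit; if it has no
    matching control sum in the database, it is subdivided into sentences,
    each of which is checked as a second potential meaningless unit.
    """
    if control_sum(paragraph) in db_sums:
        return [paragraph]
    # Naive sentence split: break after '.', '!' or '?' followed by spaces.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    return [s for s in sentences if control_sum(s) in db_sums]


db_sums = {control_sum("Sent from my phone.")}
found = find_meaningless(
    "Your flight was moved to 9 am. Sent from my phone.", db_sums
)
```

Here the paragraph as a whole is unknown to the database, but its second sentence matches a stored control sum and is reported on its own.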
- a control sum includes all the control elements.
- a control element is any of: a number of characters in a lexical unit, a number of letters in the lexical unit, a number of capital letters in the lexical unit, a number of lower-case letters in the lexical unit, a number of spaces in the lexical unit, a number of numbers in the lexical unit, a number of special marks, a number of words in the lexical unit, a size of the lexical unit expressed in information handling and storage units.
- the matching using the first parameter is carried out using a first set of control elements and matching using the second parameter is carried out using a second set of control elements.
- the first set of control elements and the second set of control elements are the same.
- control sums are considered matching if the control sums are the same.
- the method further comprises: checking a measure of the difference between the control sums and determining the control sums as matching if the measure of the difference is within a predefined permissible amplitude of the difference.
- the measure of the difference is determined for each control element from a control sum and the amplitude of the difference is defined for each control element from a control sum.
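Matching within a permissible amplitude, defined per control element, could be sketched as follows. Control sums are represented here as dictionaries with identical keys, the difference is taken as an absolute value, and the bound is treated as inclusive; all three choices, along with the name `sums_match`, are assumptions for illustration.

```python
def sums_match(a, b, amplitude):
    """Compare two control sums element by element.

    `a` and `b` map control-element names to counts and are assumed to have
    the same keys. `amplitude` gives the permissible difference for each
    element; an element missing from it must match exactly.
    """
    return all(abs(a[key] - b[key]) <= amplitude.get(key, 0) for key in a)


stored = {"chars": 120, "words": 20, "digits": 6}
incoming = {"chars": 123, "words": 20, "digits": 9}

loose = sums_match(stored, incoming, {"chars": 5, "digits": 4})
strict = sums_match(stored, incoming, {"chars": 2})
```

A tolerance of this kind lets templated text, such as a footer that differs only by an order number, still match a previously stored unit.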
- a lexical database includes at least one meaningless lexical unit with a control sum corresponding to the control sum of the potential meaningless lexical unit
- the method further comprises performing a character-by-character match of the potential meaningless lexical unit with the at least one meaningless lexical unit and wherein in response to a match of a character sequence of the potential meaningless lexical unit with a character sequence of the at least one meaningless lexical unit, the method further comprises determining the potential meaningless lexical unit as a meaningless lexical unit.
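Since different texts can share a control sum, the character-by-character match acts as a confirmation step. The sketch below assumes a database indexed by a reduced control sum (characters, words, digits); the function names and the index layout are hypothetical.

```python
def control_sum(unit):
    """Reduced control sum (assumed): characters, words, digits."""
    return (len(unit), len(unit.split()), sum(c.isdigit() for c in unit))


def build_index(units):
    """Group known meaningless units by their control sum."""
    index = {}
    for unit in units:
        index.setdefault(control_sum(unit), []).append(unit)
    return index


def confirm_meaningless(unit, index):
    """Prefilter by control sum, then confirm character by character."""
    candidates = index.get(control_sum(unit), [])
    # String equality compares the two character sequences in full.
    return any(unit == candidate for candidate in candidates)


index = build_index(["Order #12 shipped"])
```

"Order #34 shipped" has the same control sum as the stored unit but a different character sequence, so the cheap prefilter passes while the confirmation rejects it.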
- a lexical unit from a plurality of lexical units from the lexical unit database is meaningless if its weight exceeds the predefined threshold value.
- a lexical unit database is generated on the basis of a plurality of lexical units found in a plurality of text messages, and the weight of each lexical unit is directly proportional to the frequency of that lexical unit among the plurality of lexical units found in the plurality of text messages.
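Frequency-based weighting with a threshold can be sketched as below. Treating each non-empty line as a lexical unit, counting a unit once per message, and defining the weight as the share of messages containing the unit are assumptions made for this example; the source only requires the weight to be proportional to frequency.

```python
from collections import Counter


def build_database(messages, threshold=0.5):
    """Collect units whose weight exceeds the threshold.

    Hypothetical helper: weight = (messages containing the unit) / (all
    messages), which is directly proportional to the unit's frequency.
    """
    counts = Counter()
    for message in messages:
        # A set ensures a unit is counted at most once per message.
        units = {line.strip() for line in message.splitlines() if line.strip()}
        counts.update(units)
    total = len(messages)
    return {unit for unit, n in counts.items() if n / total > threshold}


messages = [
    "Hi\nYour order shipped\nSent from my phone",
    "Hello\nMeeting at 5\nSent from my phone",
    "Hi\nInvoice attached\nSent from my phone",
]
database = build_database(messages)
```

Units that recur across most messages, such as greetings and signatures, exceed the threshold, while message-specific content does not.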
- performing the syntax analysis of the text message comprises analyzing the markup language of the text message.
- analyzing the markup language of the text message comprises analyzing at least one of: a structure of a text message, a font type, a font size, a font face, punctuation marks, and special marks.
- carrying out the syntax analysis of the text message comprises executing a syntax analysis of a predefined number of paragraphs from the beginning of the text message.
- the text message is an e-mail message.
- the text message is an e-mail message and carrying out the syntax analysis of the text message comprises executing a syntax analysis of the most significant part of the e-mail message.
- the most significant part of the e-mail message is determined based on an analysis of a most significant logical block of HTML code from a plurality of the logical blocks of an HTML code of the e-mail message.
- the most significant logical block of the HTML code comprises a block of the HTML code which comprises text, and the size of which is larger than a size of any other logical block of the HTML code of the e-mail message.
- the most significant logical block of the HTML code is a block of the HTML code which comprises text, and the text of the most significant logical block of the HTML code comprises the majority of meaningful lexical units in comparison with the text of any other logical block of the HTML code of the given e-mail message.
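Selecting the most significant logical block by text size can be sketched with the standard-library `html.parser` module. The source does not define what counts as a "logical block"; this example assumes the outermost elements from a small set of block-level tags and measures size as the length of the whitespace-normalized text. It also assumes well-formed markup with explicit closing tags.

```python
from html.parser import HTMLParser

# Assumption: these tags delimit logical blocks.
BLOCK_TAGS = {"div", "td", "p", "section", "article"}


class BlockCollector(HTMLParser):
    """Collect the text of each outermost block-level element."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            if self.depth == 0:
                self.blocks.append("")  # a new outermost block begins
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.blocks[-1] += data + " "


def most_significant_block(html):
    """Return the text of the block with the most text, or '' if none."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    texts = [" ".join(block.split()) for block in collector.blocks]
    return max((t for t in texts if t), key=len, default="")


html = (
    "<div>View in browser</div>"
    "<div><p>Dear customer,</p><p>Your flight details have changed.</p></div>"
    "<div>Unsubscribe</div>"
)
body = most_significant_block(html)
```

The alternative criterion, choosing the block with the most meaningful lexical units, would replace `key=len` with a function that counts units not found in the meaningless-unit database.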
- the lexical unit is any of: (i) a word, (ii) a phrase, (iii) a sentence, (iv) a paragraph.
- determining at least one lexical unit as a potential meaningless lexical unit comprises determining at least one meaningful lexical unit.
- determining at least one lexical unit as a potential meaningless lexical unit is performed on the basis of the syntax analysis of one of: (i) an entire text of the text message and (ii) a part of the text of the text message.
- the method further comprises receiving the text message.
- a unique control sum is an ID of a unique lexical unit.
- the computer includes a processor.
- the processor is configured to render the computer operable to execute: (i) carrying out a syntax analysis of a text message (ii) determining at least one lexical unit as a first potential meaningless lexical unit; (iii) determining a control sum of the first potential meaningless lexical unit; (iv) matching (using a first parameter) the first potential meaningless lexical unit with meaningless lexical units of a plurality of meaningless lexical units from a lexical units database; matching (using a first parameter) is matching the control sum of the first potential meaningless lexical unit with control sums of the meaningless lexical units from the lexical units database; (v) determining the first potential meaningless lexical unit as a meaningless lexical unit if the lexical units database includes at least one meaningless lexical unit with a control sum corresponding to the control sum of the first potential meaningless lexical unit.
- the computer further executes: (i) subdividing the first potential meaningless lexical unit to obtain at least two smaller lexical units and determining at least one smaller lexical unit as a second potential meaningless lexical unit; (ii) determining a control sum of the second potential meaningless lexical unit; (iii) matching using a second parameter the first potential meaningless lexical unit with lexical units from the lexical units database, wherein matching using the second parameter comprises matching the control sum of the second potential meaningless lexical unit with control sums of meaningless lexical units from the lexical units database; (iv) determining the second potential meaningless lexical unit as a meaningless lexical unit if the lexical units database includes at least one meaningless lexical unit with a control sum corresponding to the control sum of the second potential meaningless lexical unit.
- the first potential meaningless lexical unit is a paragraph and the second potential meaningless lexical unit is a sentence from the paragraph.
- a control sum includes a plurality of control elements.
- a control element is any of: a number of characters in a lexical unit, a number of letters in the lexical unit, a number of capital letters in the lexical unit, a number of lower-case letters in the lexical unit, a number of spaces in the lexical unit, a number of numbers in the lexical unit, a number of special marks, a number of words in the lexical unit, a size of the lexical unit expressed in information handling and storage units.
- the matching using the first parameter is carried out using a first set of control elements and matching using the second parameter is carried out using a second set of control elements.
- the first set of control elements and the second set of control elements are the same.
- control sums are considered matching if the control sums are the same.
- the processor further executes: checking a measure of the difference between the control sums and determining the control sums as matching if the measure of the difference is within a predefined permissible amplitude of the difference.
- the measure of the difference is determined for each control element from a control sum and the amplitude of the difference is defined for each control element from a control sum.
- the processor is configured to render the computer operable to execute carrying out a character-by-character match of the potential meaningless lexical unit with this at least one meaningless lexical unit and, in response to a match of a character sequence of the potential meaningless lexical unit with a character sequence of the at least one meaningless lexical unit, determining the potential meaningless lexical unit as a meaningless lexical unit.
- a lexical unit from the plurality of lexical units from the lexical unit database is meaningless if its weight exceeds the predefined threshold value.
- the lexical unit database is generated on the basis of the plurality of lexical units which can be found in the plurality of text messages and in which a weight of each lexical unit is in direct proportion with the given lexical unit frequency in the plurality of lexical units which can be found in the plurality of the text messages.
- performing the syntax analysis of the text message comprises analyzing the markup language of the text message.
- analyzing the markup language of the text message comprises analyzing at least one of: a structure of a text message, a font type, a font size, a font face, punctuation marks, and special marks.
- carrying out the syntax analysis of the text message comprises executing a syntax analysis of a predefined number of paragraphs from the beginning of the text message.
- the text message is an e-mail message.
- the text message is an e-mail message and carrying out the syntax analysis of the text message comprises executing a syntax analysis of a most significant part of the e-mail message.
- the most significant part of the e-mail message is determined based on an analysis of a most significant logical block of HTML code from a plurality of the logical blocks of an HTML code of the e-mail message.
- the most significant logical block of HTML code is a block of HTML code which comprises text the size of which is larger than the size of any other logical block of HTML code of the given e-mail message.
- the most significant logical block of HTML code is a block of HTML code which comprises text; the text of the most significant logical block of HTML code comprises the majority of meaningful lexical units in comparison with the text of any other logical block of HTML code of the given e-mail message.
- the lexical unit is any of: (i) a word, (ii) a phrase, (iii) a sentence, (iv) a paragraph.
- determining at least one lexical unit as a potential meaningless lexical unit comprises determining at least one meaningful lexical unit.
- determining at least one lexical unit as a potential meaningless lexical unit is performed on the basis of the syntax analysis of one of: (i) an entire text of the text message and (ii) a part of the text of the text message.
- the computer further executes receiving the text message.
- a unique control sum is an ID of a unique lexical unit.
- a "server” is a computer program that is running on appropriate hardware and is capable of receiving requests (e.g. from client devices) over a network, and carrying out those requests, or causing those requests to be carried out.
- the hardware may be one physical computer or one physical computer system, but neither is required to be the case with respect to the present technology.
- the use of the expression a "server” is not intended to mean that every task (e.g. received instructions or requests) or any particular task will have been received, carried out, or caused to be carried out, by the same server (i.e.
- a "client device" is any computer hardware that is capable of running software appropriate to the relevant task at hand.
- in general, the term "client device" is associated with a user of the client device.
- client devices include personal computers (desktops, laptops, netbooks, etc.), smartphones, and tablets, as well as network equipment such as routers, switches, and gateways.
- a device acting as a client device in the present context is not precluded from acting as a server to other client devices.
- the use of the expression "a client device” does not preclude multiple client devices being used in receiving/sending, carrying out or causing to be carried out any task or request, or the consequences of any task or request, or steps of any method described herein.
- an e-mail message includes a file with text generated by the sender and intended for transmission to one or more recipients by e-mail.
- An e-mail message is a type of text message.
- source code is the human-readable text of a software application written in any programming language or markup language.
- source code is any input data for a translator.
- Source code is translated into executable code by a compiler before the program is run, or it can be executed directly by an interpreter.
- information includes information of any nature or kind whatsoever capable of being stored in a database.
- information includes, but is not limited to audiovisual works (images, movies, sound records, presentations etc.), data (location data, numerical data, etc.), text (opinions, comments, questions, messages, etc.), documents, spreadsheets, etc.
- component is meant to include software (appropriate to a particular hardware context) that is both necessary and sufficient to achieve the specific function(s) being referenced.
- computer information storage medium is intended to include media of any nature and kind whatsoever, including without limitation RAM, ROM, disks (CD-ROMs, DVDs, floppy disks, hard drives, etc.), USB keys, solid-state drives, tape drives, etc.
- a plurality of components may be combined to form the computer information storage medium, including two or more media components of a same type and/or two or more media components of different types.
- a “database” is any structured collection of data, irrespective of its particular structure, the database management software, or the computer hardware on which the data is stored, implemented or otherwise rendered available for use.
- a database may reside on the same hardware as the process that stores or makes use of the information stored in the database or it may reside on separate hardware, such as a dedicated server or plurality of servers.
- a "message analysis module” is a program or a part of the program executed on the corresponding hardware and able to execute a syntax analysis of a text.
- a message analysis module is able to execute a structural analysis of a text.
- the hardware may be one physical computer or one physical computer system, but neither is required to be the case with respect to the present technology.
- a “message analysis module” is not intended to mean that every task (e.g.
- the term "lexical unit" may mean any word, phrase, collocation, paragraph, abbreviation, character, date, acronym (including commonly accepted ones), lexically meaningful combining form of a compound word in a natural language, and also their equivalent code notation and symbolic notation of an artificial language.
- a lexical unit can be established in a text of an e-mail message by numbers, letters, hieroglyphic symbols, special marks or it can be composed of them.
- Implementations of the present technology each have at least one of the above-mentioned object and/or aspects, but do not necessarily have all of them. It should be understood that some aspects of the present technology that have resulted from attempting to attain the above-mentioned object may not satisfy this object and/or may satisfy other objects not specifically recited herein.
- FIG. 1 is a schematic image depicting an implementation of a network computer system 100, the network computer system 100 being implemented in accordance with non-limiting embodiments of the present technology.
- FIG. 2 depicts a text of an e-mail message 200, the e-mail message 200 having been sent by a user 141 of FIG. 1 to a user 121 depicted in FIG. 1.
- FIG. 3 depicts a portion of a web interface of an e-mail service (prior art).
- FIG. 4 depicts a portion of a web interface 400, the web interface being implemented in accordance with non-limiting embodiments of the present technology.
- FIG. 5 is a block diagram of a method 500 executed on a mail server 102 of the system of FIG. 1, the method being implemented in accordance with non-limiting embodiments of the present technology.
- FIG. 6 and FIG. 7 are block diagrams of a method 600 executed on the mail server 102 of FIG. 1, the method being implemented in accordance with non-limiting embodiments of the present technology.
- FIG. 1 depicts a schematic diagram of a network computer system 100, components of the network computer system 100 being connected with a communication network 112.
- the network computer system 100 is depicted as merely an illustrative implementation of the present technology. Thus, the description thereof that follows is intended to be only a description of illustrative examples of the present technology. This description is not intended to define the scope or set forth the bounds of the present technology. In some cases, what are believed to be helpful examples of modifications to the network computer system 100 may also be set forth below. This is done merely as an aid to understanding, and, again, not to define the scope or set forth the bounds of the present technology. These modifications are not an exhaustive list, and, as a person skilled in the art would understand, other modifications are likely possible. Further, where this has not been done (i.e.
- network computer system 100 may provide in certain instances simple implementations of the present technology, and that where such is the case they have been presented in this manner as an aid to understanding. As persons skilled in the art would understand, various implementations of the present technology may be of a greater complexity.
- the network computer system 100 comprises a mail server 102.
- the mail server 102 can be a conventional computer server.
- the mail server 102 is a DellTM PowerEdgeTM server using MicrosoftTM Windows ServerTM operating system.
- the mail server 102 can be implemented as any other suitable hardware and/or software application and/or firmware or combination thereof.
- the mail server 102 is a single server.
- functionality of the mail server 102 can be distributed and the functionality can be performed by several servers.
- the mail server 102 comprises, inter alia, a network communication interface (not shown) for two-way communication over the communications network 112; and a processor (not shown) coupled to the network communication interface, the processor being configured to execute various routines, including those described herein below.
- the processor may store or have access to computer readable commands which commands, when executed, cause the processor to execute the various routines described herein.
- Tasks of the mail server 102 include receiving e-mail messages for the user 121, storing them, and transmitting e-mail messages to the user 121 from the mailbox.
- a mail service can be implemented by any conventional means.
- the network computer system 100 can comprise (either instead of the mail server 102 or additionally to the mail server 102) an IM (instant messages) server or SMS (Short Message Service) server or other text message server(s).
- the mail server 102 is connected with the communications network 112 via a communication link (not separately numbered).
- the mail server 102 comprises a storage media 104, which can be used by the mail server 102.
- the storage media 104 can be storage media of any nature and kind whatsoever, including RAM, ROM, disks (CD-ROMs, DVDs, floppy disks, hard drives, etc.), USB keys, solid-state drives, tape drives, etc., and combinations thereof.
- the storage media 104 of the mail server 102 is intended for storing a mail service module (not shown), the mail service module comprises user mailboxes (including a mailbox of the user 121), e-mail messages (including e-mail messages for the user 121 and e-mail messages for other users) and computer-executable instructions to keep the services and various modules up and running.
- a mailbox is a part of the drive space of the storage media 104 used for storing user e-mail messages (including e-mail messages for the user 121); the mailbox is stored as a conventional file system catalog in that part of the drive space.
- E-mail messages are data files being stored in the file system catalog.
- the storage media 104 is also configured to store a message analysis module 106.
- the message analysis module 106 is a program or a part of the program executed on the corresponding hardware and able to execute a syntax and structural analysis of a text.
- the hardware for the message analysis module 106 may be one physical computer or one physical computer system, but neither is required to be the case with respect to the present technology.
- a "message analysis module 106" is not intended to mean that every task (e.g. received instructions or requests) or any particular task will have been received, carried out, or caused to be carried out, by the same software and/or hardware; it is intended to mean that any number of software elements or hardware devices may be involved in carrying out any task or request, or in processing the results of any task or request.
- a "syntax analysis" is a process of determining a grammatical tag of a linear sequence of text lexical units.
- the message analysis module 106 performs the syntax analysis for determining meaningful and meaningless lexical units in text messages, which text messages in this example are e-mail messages. In alternative implementations text messages can be instant messages, SMS etc.
- Meaningless lexical units are lexical units without significant meaning for the user. For example, it can be titles, parenthetical words, pleasantries, senders' addresses and the like. In contrast, meaningful lexical units may have a significant meaning for the user.
- the message analysis module 106 performs the syntax analysis for determining meaningful and meaningless lexical units in text messages in such a way that the meaningful and meaningless lexical units have meaning.
- a lexical unit can have relatively logically completed meaning and carry individual meaning.
- the text "Convention on the Civil Aspects of International Child Abduction” which is a part of the text "Please find enclosed the file with the text of "Convention on the Civil Aspects of International Child Abduction” [Ru, Eng] (Hague, October 25, 1980) you ordered.” can be a meaningful lexical unit.
- the potential lexical unit "you” or “Civil Aspects” can be meaningless.
- the syntax analysis can be or can comprise an e-mail message source code analysis.
- the e-mail message source code analysis can comprise (as an example and not by way of limitation) an e-mail message markup analysis.
- the syntax analysis can be performed for determining an e-mail message type, detecting e-mail message templates and also for determining lexical units to check (among other things) them as potential meaningless lexical units and potential meaningful lexical units.
- the e-mail message markup analysis can include a font size analysis.
- text parts with different font sizes may be considered to be different lexical units.
- the e-mail message markup analysis can include a font face analysis.
- a phrase in italics, a phrase in bold, or underlined phrase can potentially be one lexical unit.
- the e-mail message markup analysis can include a punctuation mark analysis.
- words are not considered as one lexical unit if there is a dot between them.
- a word sequence can be considered as one lexical unit if the word sequence is quoted and does not exceed a predefined number of words.
- each sentence can be considered as a separate lexical unit.
- the e-mail message markup analysis can include a special mark analysis.
- a mark can be a paragraph mark, a tabulation mark, a page break mark and the like.
- some of these marks can be considered as a sign of the fact that words, numbers and the like which are separated by these marks are not from the same lexical units.
- the mark «@» can be considered as a sign of the fact that surrounding (before this mark and next to this mark) letters, numbers and some other special marks (a dot, a dash, an underscore) are from the same lexical units (in this case an e-mail address).
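- The markup rules above (sentence boundaries, short quoted sequences, special marks and the «@» mark) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the analysis performed by the message analysis module 106: the function name `split_into_lexical_units`, the limit `MAX_QUOTED_WORDS` and the regular expressions are all hypothetical.

```python
import re

# Hypothetical limit: a quoted sequence longer than this is not one unit.
MAX_QUOTED_WORDS = 12

# Letters, digits, dots, dashes and underscores around "@" stay together.
EMAIL_RE = re.compile(r"[A-Za-z0-9._-]+@[A-Za-z0-9._-]+")
QUOTED_RE = re.compile(r'"([^"]+)"')
# A sentence ends at ".", "!" or "?" followed by whitespace.
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")


def split_into_lexical_units(text, max_quoted_words=MAX_QUOTED_WORDS):
    """Return candidate lexical units found in plain text."""
    units = []
    # Rule for "@": the surrounding characters form one unit (an address).
    for match in EMAIL_RE.finditer(text):
        units.append(match.group(0).strip("."))
    # Rule for quotes: a short quoted word sequence is one unit.
    for match in QUOTED_RE.finditer(text):
        phrase = match.group(1).strip()
        if len(phrase.split()) <= max_quoted_words:
            units.append(phrase)
    # Paragraph and tabulation marks separate units; each sentence is a unit.
    for chunk in re.split(r"[\n\t]+", text):
        for sentence in SENTENCE_END_RE.split(chunk.strip()):
            if sentence:
                units.append(sentence)
    return units


sample = (
    'Please find enclosed "Convention on the Civil Aspects". '
    "Write to johnsmith@company.com today.\nThank you!"
)
units = split_into_lexical_units(sample)
```

- The sketch returns overlapping candidates on purpose (a quoted phrase and the sentence containing it), since the text treats these as potential lexical units to be checked later.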
- message analysis module 106 can process and classify results of the syntax analyses of one e-mail message and/or a specific group of e-mail messages and/or the entire plurality of e-mail messages sent for the users which have an e-mail account on the mail server 102.
- the message analysis module 106 can process and classify the results of the syntax analyses of one e-mail message and/or specific group of e-mail messages and/or the entire plurality of e-mail messages which were written and/or sent by the users which have an e-mail account on the mail server 102.
- the message analysis module 106 can group identical lexical units into groups and then define a number of lexical units in each group of lexical units.
- the message analysis module 106 can also define a general number of lexical units in the entire plurality of e-mail messages sent to the user 121.
- the message analysis module 106 can also define a general number of lexical units in the entire plurality of e-mail messages sent to all the users.
- the message analysis module 106 can also define a general number of lexical units in the entire plurality of e-mail messages sent to specific user groups.
- groups can be user groups set by some characteristics.
- the groups can be set using such criteria as age, gender, user location, user time zone, client device type.
- Corresponding information about age, gender, location, client device type can be obtained from any available source.
- sources can be data from a mail service account (age, gender, location and the like), IP (location), data obtained by a client device mail agent.
- the message analysis module 106 can also define a general number of lexical units in the entire plurality of e-mail messages sent by a sender of a specific type to all the users or to a user group.
- the sender types can include: lending financial institutions (for example, banks, savings banks, credit unions), insurance companies, on-line shops, booking web-sites (for example, flight ticket booking, train ticket booking, theater ticket booking and the like), social networks (for example, FacebookTM, TwitterTM, LinkedInTM, VWalleteTM, OdnoklassnikiTM).
- the message analysis module 106 can also define types of messages from a specific sender.
- types of messages can be defined when the sender performs a mass distribution of a significant number of standardized messages using different templates.
- it can be messages of different types sent by FacebookTM social network.
- the message analysis module 106 can further or alternatively execute the following operations: receiving a plurality of specific sender's e-mail messages to a plurality of e-mail users which have mail accounts on the mail server 102; performing the syntax analysis of the plurality of specific sender's e-mail messages and determining types of the specific sender's e-mail messages; subdividing the specific sender's e-mail messages into paragraphs; including a plurality of paragraphs into the lexical unit database 108 and/or the lexical unit database 110 and each paragraph from the plurality of the paragraphs is associated with an ID of the given specific sender and with an ID of at least one type of an e-mail message of the sender.
- the message analysis module 106 can execute: receiving an e-mail message from a specific sender which performs a mass distribution of messages; determining a type of the specific sender's e-mail message; performing the syntax analysis of the e-mail message and subdividing the body of the e-mail message into a plurality of paragraphs; checking at least one paragraph using at least one lexical unit database to determine if the given paragraph is meaningful for the given type of the e-mail message from the sender.
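- A paragraph store keyed by sender ID and message type ID, as described above, might look like the following sketch. The class `ParagraphDatabase`, its methods and the sample IDs are hypothetical illustrations; the text does not specify how the lexical unit databases 108 and 110 are laid out.

```python
from dataclasses import dataclass, field


@dataclass
class ParagraphDatabase:
    """In-memory stand-in for a lexical unit database (an assumption)."""

    # Maps (sender_id, message_type_id) to {paragraph: is_meaningful}.
    _records: dict = field(default_factory=dict)

    def add(self, sender_id, type_id, paragraph, meaningful):
        """Associate a paragraph with a sender ID and a message type ID."""
        self._records.setdefault((sender_id, type_id), {})[paragraph] = meaningful

    def is_meaningful(self, sender_id, type_id, paragraph, default=True):
        """Look up a paragraph; unknown paragraphs fall back to `default`."""
        return self._records.get((sender_id, type_id), {}).get(paragraph, default)


db = ParagraphDatabase()
# Hypothetical sender and message type identifiers.
db.add("social-network", "friend-request", "You are receiving this e-mail because", False)
db.add("social-network", "friend-request", "John Smith wants to be friends", True)
```

- Treating unknown paragraphs as meaningful by default is a design choice of this sketch only; it avoids hiding text that the database has never seen.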
- the message analysis module 106 can calculate a lexical unit weight.
- the calculation of the lexical unit weight can be executed in respect to the entire array of the e-mail messages sent to the user 121. In this case the calculation can be performed using the first formula:
- $Q_{user121}$ is the total amount of lexical units in the entire array of the e-mail messages sent to the user 121.
- the calculation of the lexical unit weight can be executed by the message analysis module 106 also in respect to the entire array of the e-mail messages sent to all the users which have mail accounts on the mail server 102. In this case the calculation can be performed using the second formula:
- the lexical unit weight can be calculated by the message analysis module 106 separately in respect to one of: 1) different sender types; 2) different user groups; 3) a given type of a given mass sender; 4) different combinations of sender types and user groups, the user 121 alone and/or the entire plurality of receivers.
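- As an illustration of a weight calculated against a chosen array of messages, the sketch below assumes that a lexical unit's weight is its relative frequency, i.e. the number of identical lexical units in a group divided by the total amount of lexical units in the array. This is an assumption made for the example only; the function name `lexical_unit_weights` and the sample messages are hypothetical.

```python
from collections import Counter


def lexical_unit_weights(messages):
    """Map each lexical unit to its assumed weight (relative frequency).

    `messages` is a list of messages, each already split into lexical units.
    """
    # Group identical lexical units and count the units in each group.
    counts = Counter(unit for message in messages for unit in message)
    # Total amount of lexical units in the whole array.
    total = sum(counts.values())
    if total == 0:
        return {}
    return {unit: count / total for unit, count in counts.items()}


# Hypothetical array: messages sent to the user 121.
user_121_messages = [
    ["Dear John", "Your flight details have changed", "Best regards"],
    ["Dear John", "Early bird registration fees available", "Best regards"],
]
weights_user_121 = lexical_unit_weights(user_121_messages)
```

- The same function can be applied to a different array (all users, a user group, a sender type), which is why the weight of one lexical unit can differ between arrays.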
- the message analysis module 106 can take into account only some paragraphs of each e-mail message, and not the entire text of e-mail messages. A maximum number of such paragraphs can be predefined. In case when an e-mail message includes fewer paragraphs than the predefined number of paragraphs then the entire text of the message can be used for calculating lexical unit weights.
- the message analysis module 106 can take into account certain number of leading (from the beginning) paragraphs of each e-mail message, and not the entire text of e-mail messages. A maximum number of such leading paragraphs can be predefined. In case if an e-mail message includes fewer paragraphs than the predefined number of paragraphs then the entire text of the message can be used for calculating the lexical unit weights.
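- Limiting the calculation to a predefined number of leading paragraphs can be sketched as below. The function name `leading_paragraphs`, the default limit and the blank-line paragraph separator are assumptions made for illustration.

```python
import re


def leading_paragraphs(body, max_paragraphs=3):
    """Return at most `max_paragraphs` paragraphs from the start of `body`.

    Paragraphs are assumed to be separated by blank lines. If the message
    has fewer paragraphs than the limit, the entire text is returned.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", body) if p.strip()]
    return paragraphs[:max_paragraphs]


body = "Dear John,\n\nYour flight changed.\n\nDetails below.\n\nRegards"
first_two = leading_paragraphs(body, max_paragraphs=2)
```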
- the message analysis module 106 can take into account most significant parts of e-mail messages (as it is described below), and not the entire text of the e-mail messages.
- weight of the same lexical unit can differ depending on an array which was used to calculate the weight value and depending on a text which was used to calculate the weight value (entire texts of the e-mail messages or most significant parts of e-mail messages) and depending on a type of the fragment.
- a lexical unit weight can be used for generating different databases and while determining if a lexical unit is meaningful or meaningless in a database.
- the message analysis module 106 can determine a control sum of lexical units.
- the lexical unit control sum is a combination of the following elements: a number of words in the lexical unit, a number of letters in the lexical unit, a number of numbers in the lexical unit, a number of dots in the lexical unit, a number of commas in the lexical unit.
- the lexical unit control sum can be defined as a size of the corresponding lexical unit in bytes.
- the lexical unit control sum can be defined by a combination of any possible control elements such as a number of characters in a lexical unit, a number of letters in the lexical unit, a number of capital letters in the lexical unit, a number of lower-case letters in the lexical unit, a number of spaces in the lexical unit, a number of numbers in the lexical unit, a number of special marks, a number of words in the lexical unit, a size of the lexical unit expressed in information handling and storage units and the like.
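- The control sum variants described above can be sketched as follows. The tuple layout, the function names `control_sum` and `byte_size`, and the reading of "a number of numbers" as a count of digit characters are assumptions made for this example.

```python
def control_sum(unit):
    """Return (words, letters, digits, dots, commas) for a lexical unit."""
    return (
        len(unit.split()),                  # number of words
        sum(ch.isalpha() for ch in unit),   # number of letters
        sum(ch.isdigit() for ch in unit),   # number of digit characters
        unit.count("."),                    # number of dots
        unit.count(","),                    # number of commas
    )


def byte_size(unit, encoding="utf-8"):
    """Alternative control sum: size of the lexical unit in bytes."""
    return len(unit.encode(encoding))


checksum = control_sum("Hague, October 25, 1980.")
```

- Two lexical units with different control sums cannot be identical, so a comparison of control sums can serve as a cheap first check before comparing full text.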
- the message analysis module 106 can index lexical units.
- the message analysis module 106 can further execute a structural analysis of an e-mail message.
- a "structural analysis" is intended to mean a process of analyzing an e-mail message structure.
- analyzing an e-mail message structure is performed by means of analyzing the HTML markup of an e-mail message.
- Such an analysis allows to define logical blocks of HTML code with text. Such blocks, for example, can be large text blocks with a text, with a table cell text, paragraphs of text and the like.
- tags can be used, such as, for example, <div align="?"></div> (text formatting tags), <table></table> (table tags), <td></td> (table cell tags), <p></p> (paragraph tags) and the like.
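- Defining logical text blocks from such tags can be sketched with the Python standard library `html.parser` module. The class `TextBlockExtractor`, the set `BLOCK_TAGS` and the sample markup are hypothetical; the sketch simply cuts the text at the boundaries of the listed tags and is not the structural analysis of the present technology.

```python
from html.parser import HTMLParser

# Tags treated as boundaries of logical text blocks (taken from the text).
BLOCK_TAGS = {"div", "table", "td", "p"}


class TextBlockExtractor(HTMLParser):
    """Split message text at the boundaries of block-level tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buffer = []

    def _flush(self):
        # Collapse whitespace and store the block if it contains any text.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.blocks.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        self._buffer.append(data)


def extract_text_blocks(html):
    """Return the list of text blocks found in an HTML message body."""
    parser = TextBlockExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()  # keep any text that follows the last block tag
    return parser.blocks


sample_html = (
    '<div align="center">Moscow, Russia</div>'
    "<table><tr><td>Open Innovations</td><td>11 November 2014</td></tr></table>"
    "<p>Early bird registration fees available</p>"
)
blocks = extract_text_blocks(sample_html)
```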
- the message analysis module 106 interacts with the first lexical unit database 108.
- the first lexical unit database 108 is a structured data collection which includes lexical units.
- the first lexical unit database 108 is populated using the same hardware as used for a process which performs information storage or use; the information is recorded in the first database 108.
- the first lexical unit database 108 can also be implemented using separate hardware such as a single-unit server or a plurality of servers.
- the first lexical unit database 108 is a database which was generated using results of a syntax analysis of the entirety of the e-mail messages sent to the user 121 and received by the user 121 during lifetime of the account of the user 121 with the mail server 102.
- the first lexical unit database 108 can be generated using results of a syntax analysis of the entirety of all the e-mail messages sent to the user 121 and received by the user 121 during a specific period, for example, during the preceding year. As those skilled in the art will understand, such a period can be any period, more than one year or less than one year.
- Each of the plurality of the lexical units from the first database 108 can be marked as a meaningful lexical unit or as a meaningless lexical unit.
- meaningful lexical units and meaningless lexical units can be stored in the same database with an indication of their weight or with an indication of their different weights calculated using different criteria as it will be described below.
- determining a lexical unit as meaningful or meaningless can be performed by accessing the database and comparing a specific corresponding weight of corresponding lexical units with a corresponding predefined threshold value.
- a lexical unit from a plurality of lexical units from a lexical unit database is meaningless if its weight exceeds a predefined threshold value. Since the lexical unit weight and the predefined threshold value are present in the database it is possible to define the lexical unit as meaningful or meaningless directly by accessing the database.
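- The threshold comparison described above can be sketched as follows. The threshold value, the function name `is_meaningless` and the dictionary used in place of a lexical unit database are assumptions made for illustration.

```python
# Hypothetical predefined threshold value.
MEANINGLESS_THRESHOLD = 0.05


def is_meaningless(unit, weights, threshold=MEANINGLESS_THRESHOLD):
    """Return True if the stored weight of `unit` exceeds the threshold.

    `weights` stands in for a lexical unit database mapping each lexical
    unit to its weight. Units absent from it get weight 0.0 (an assumption).
    """
    return weights.get(unit, 0.0) > threshold


# Hypothetical stored weights.
stored_weights = {
    "Best regards": 0.2,
    "Your flight details have changed": 0.01,
}
```

- Under this rule a lexical unit that occurs very often across the array ("Best regards") is marked meaningless, while a rare one is kept as meaningful.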
- meaningful lexical units and meaningless lexical units can be stored in a separate database.
- the first lexical units database 108 can store only meaningless lexical units.
- Lexical units from the first lexical units database 108 can be associated with their weight calculated using the first formula, i.e. a weight in relation to the entire array of the e-mail messages sent to the user 121 for the lifetime of the account of the user 121 on the mail server 102.
- lexical units from the first lexical units database 108 can be associated with their weight calculated using the first formula, i.e. a weight in relation to the entire array of the e-mail messages sent to the user 121 for a predefined preceding period.
- the message analysis module 106 also interacts with the second lexical unit database 110.
- the second lexical unit database 110 is a structured data collection which includes lexical units.
- the second lexical unit database 110 is implemented using the same hardware as used for performing information storage or use; the information is recorded in the database.
- the second lexical unit database 110 much akin to the first lexical unit database 108 can be implemented also using separate hardware such as a single-unit server or plurality of servers.
- the second lexical unit database 110 is a database which was generated using results of a syntax analysis of the plurality of all the e-mail messages sent to all the users which have mail accounts on the mail server 102 and received by these users during the lifetime of their accounts.
- the second lexical unit database 110 can be generated using results of a syntax analysis of the plurality of all the e-mail messages sent to all the users which have mail accounts on the mail server 102 and received by these users during the preceding year.
- a period can be any period, more than one year or less than one year.
- Each of the plurality of the lexical units from the second database 110 can be marked as a meaningful lexical unit or as a meaningless lexical unit.
- meaningful lexical units and meaningless lexical units can be stored in separate databases.
- the database can store only meaningless lexical units.
- the second lexical units database 110 stores the information associated with weights of the lexical units which weights were calculated using the second formula, i.e. a weight in relation to the entire array of the e-mail messages sent to all the e-mail users which have accounts on the mail server 102.
- all the e-mail messages received during the lifetime of each account of each user which has an account on the mail server 102 are taken into account.
- the e-mail messages received during the preceding year are taken into account.
- such a period can be any period, more than one year or less than one year.
- the mail server 102 is connected with the communications network 112 over a communication link (not separately numbered).
- the communication network 112 can be the Internet.
- the communication network 112 can be implemented alternatively as a wide area network or local area network, private network and the like.
- a connection of the mail server 102 to the communication network 112 can be performed using wireless communications or an Ethernet-based connection.
- the mail server 102 is connected to the first client device 122 via the communication network 112.
- the first client device 122 is typically associated with the user 121.
- the user 121 is a person who has an e-mail account on the mail server 102.
- the first client device 122 is implemented as DellTM Precision T1700 MT CA033PT170011RUWS PC with Intel® XeonTM processor, CPU frequency 3300 MHz, video card NVIDIA Quadro K2000, running the Windows 7® Pro 64-bit operating system, the operating system installed and active.
- the implementation of the first client device 122 is not particularly limited. The first client device 122 may be implemented as a personal computer (desktops, laptops, netbooks, etc.), a wireless communication device (a cell phone, a smartphone, a tablet and the like), as well as other equipment.
- the first client device 122 includes the storage media 124 implemented as a 500 Gb hard drive.
- the storage media 124 can be implemented as storage media of any nature and kind whatsoever, including RAM, ROM, disks (CD-ROMs, DVDs, floppy disks, hard drives, etc.), USB keys, solid-state drives, tape drives, etc., and combinations thereof.
- the storage media 124 can store user's files and program instructions. More specifically, the storage media 124 can store software which executes functions of the browser 126. Generally, the purpose of the browser 126 is to enable the user 121 to connect to the mail server 102 and receive e-mail messages by means of a web-interface and show received and sent e-mail messages on a display 128.
- the browser 126 is implemented as the mobile browser YandexTM.
- the implementation of the browser 126 is not particularly limited. As non-limiting examples such browsers can be YandexTM browser, Google ChromeTM, Internet ExplorerTM, various mobile search applications and the like. It should be expressly understood that any other commercially available or proprietary application may be used for implementing non-limiting embodiments of the present technology.
- the first client device 122 further includes the display 128 implemented as a 21.5" DellTM E2214H 2214-7803 with 1920x1080 screen resolution, which can provide video information to the user 121.
- the user 121 is able to see on the display 128 in the interface of the browser 126 of the first client device 122 various objects, incoming and outgoing e-mail messages, and abstracts of the incoming e-mail messages.
- the mail server 102 is connected to a second client device 132 via the communication network 112.
- the second client device 132 is typically associated with the user 131.
- the user 131 is an individual person who utilizes his e-mail account for personal use purposes and sends (using the given account) personal e-mail messages.
- the structure and characteristics of a private e-mail message can differ from the structure and characteristics of e-mail messages of other types (for example, from e-mail messages which include, for example, e-tickets, promotions and other deals).
- computer methods of processing and analyzing e-mail messages carried out by the message analysis module 106 of the mail server 102 can identify and classify the messages sent by the user 131 to the user 121 as private messages.
- the user 131 can be a sender of e-mail messages to various users, including the user 121 and/or other users who have e-mail accounts on the mail server 102 or on any other mail server.
- An e-mail account of the user 131 can be hosted on any suitable mail server, including the mail server 102.
- the user 131 uses the second client device 132 implemented as an AppleTM iPhone 5S smartphone running iOS 7 operating system (installed and active), with Bluetooth, Wi-Fi, 3G, LTE, GPS (global position system).
- the second client device 132 may be implemented as a personal computer (desktops, laptops, netbooks, etc.), a wireless communication device (a cell phone, a smartphone, a tablet and the like), as well as other equipment.
- the second client device 132 includes the storage media 134 implemented as a 500 Gb hard drive.
- the storage media 134 of the second client device 132 can be implemented as storage media of any nature and kind whatsoever, including RAM, ROM, disks (CD-ROMs, DVDs, floppy disks, hard drives, etc.), USB keys, solid-state drives, tape drives, etc., and combinations thereof.
- the storage media 134 of the second client device 132 can store user's files and program instructions. More specifically, the storage media 134 of the second client device 132 can store software which executes functions of an e-mail client 136.
- the purpose of the e-mail client 136 is to enable the user 131 to connect to the mail server (in some cases it can be the mail server 102) and receive e-mail messages by means of the web-interface of the e-mail client 136 and show received and sent e-mail messages on the display 138.
- the e-mail client 136 is implemented as TriageTM. However, as those skilled in the art will understand, the implementation of the e-mail client 136 is not particularly limited.
- e-mail clients can be MailboxTM, EvomailTM, DispatchTM, Inky MailTM, SeedTM, myMailTM, BoxerTM etc.
- the functions of the e-mail client, i.e. message receiving and sending, e-mail message demonstration using the display 138
- the second client device 132 further includes the display 138 which is a 4" touch screen with 640x1136 resolution, which provides information to the user 131 and which can be used as an input device.
- the user 131 is able to see on the display 138 in the interface of the e-mail client 136 of the second client device 132 various objects, incoming and outgoing e-mail messages, and abstracts of the incoming e-mail messages.
- the mail server 102 is connected to a third client device 142 via the communication network 112.
- the third client device 142 is usually associated with the user 141.
- the user 141 is an employee of a marketing company, which user uses his e-mail account for the purpose defined by clients of the marketing company.
- the third client device user 141 can send a plurality of e-mail messages from the client device 142 which messages can be classified and grouped into some conventional groups using specific parameters.
- various e-mail messages sent by the user 141 using the client device 142 can be classified as adverts and/or information messages and/or transactional message and/or personal notifications and the like.
- the message classification can be carried out by means of both an analysis of message contents using key words and specific terms, and an analysis of an e-mail message code, for example, markup characteristics, determining the usage of specific HTML-templates and the like.
- An HTML-template can be a message layout, including the HTML-formatting which sets a design and location of all the design elements.
- computer e-mail processing and analyzing methods carried out by the message analysis module 106 of the mail server 102 can identify and classify the messages sent by the user 141 from the client device 142 to the user 121 as adverts and/or information messages and/or transactional messages and/or personal notifications and the like. Further, computer e-mail processing and analyzing methods carried out by the message analysis module 106 of the mail server 102 can identify in such a message logical blocks of the HTML-code, including HTML-code blocks which comprise text.
- the user 141 can be a sender of e-mail messages to various users, including the user 121 and/or other users who have e-mail accounts on the mail server 102 or on any other mail server.
- An e-mail account of the user 141 can be hosted on any suitable mail server, including the mail server 102.
- the third client device 142 includes a storage media (not depicted).
- the third client device 142 can execute a web-browser and/or e-mail client (not depicted).
- the third client device 142 can also comprise a display (not depicted). As those skilled in the art will understand, implementations of the third client device 142 are not particularly limited and well-known in the art.
- the third client device 142 may be implemented as a personal computer (desktops, laptops, netbooks, etc.), a wireless communication device (a cell phone, a smartphone, a tablet and the like), as well as other equipment. Hence, the third client device 142 will not be described in detail.
- the first client device 122, the second client device 132 and the third client device 142 can be implemented in such a way that they will be able to send other text messages, and the teachings presented herein should not be limited to e-mail messages.
- the first client device 122, the second client device 132 and the third client device 142 can be implemented as mobile phones which are able to send and receive SMS messages and to perform the syntax analysis of text messages.
- FIG. 2 is an illustration of an e-mail message 200 sent by the user 141 from the client device 142 depicted in Fig. 1 to a user 121 depicted in Fig. 1.
- the e-mail message 200 comprises a sender's e-mail address 201.
- the sender of the e-mail message 200 is the user 141.
- the e-mail message 200 also comprises a name 202 (John Smith) and a receiver's e-mail address (johnsmith@company.com).
- the receiver of the e-mail message 200 is the user 121.
- the e-mail message 200 also comprises a subject 204 of the e-mail message 200.
- the subject 204 of the e-mail message 200 is «Moscow, 11 November 2014: Open Innovations Conference».
- the e-mail message body comprises images and text. More specifically, the e-mail message body comprises text fragments 206, 208, 210, 212 and 214, which, as non-limiting examples, from the HTML-structure of the e-mail message 200 perspective, can be separate paragraphs and/or separate tables, and/or separate table cells.
- Fig. 3 is an image of a fragment of a web-interface 300 of the e-mail service of the user 121, in which an «Inbox» tab 302 (i.e. a tab of incoming messages) is active.
- Fig. 3 is an image of the fragment of the web-interface 300 of the e-mail service, the web-interface 300 implemented in accordance with known techniques.
- the line 304 also comprises the subject 204 of the e-mail message 200.
- the line 304 also comprises an abstract 310 of the incoming e-mail message 200 «Moscow, Russia», which is the text of the first line of the text 206, placed in the very beginning of the body of the e-mail message 200.
- the abstract 310 of the incoming e-mail message is an abstract which includes any lexical units - both potentially meaningful and potentially meaningless.
- the abstract 310 of the incoming e-mail message is generated without performing the HTML-structure analysis of the e-mail message 200 and without performing the lexical analysis of texts from HTML-code logical blocks of the e-mail message 200.
- Fig. 4 is an image of a fragment of a web-interface 400 of the e-mail service of the user 121, in which an «Inbox» tab 402 (i.e. a tab of incoming messages) is active.
- Fig. 4 is an image of the fragment of the web-interface 400 of the e-mail service as it can be implemented in accordance with one of non-limiting implementations of the present technology.
- the fragment of the web-interface 400 of the e-mail service shows the user 121 received the e-mail message 200 from the user 141 and the client device 142.
- a line 404 including the sender's e-mail address 201 is displayed.
- the sender of the e-mail message 200 is the user 141.
- the line 404 also comprises the subject 204 of the e-mail message 200.
- the line 404 also comprises an abstract 410 of the incoming e-mail message 200 «Early bird registration fees available», which is a part of the text 212, placed in the middle of the e-mail message 200.
- the abstract 410 of the incoming e-mail message 200 is an abstract which includes meaningful lexical units.
- the abstract 410 of the incoming e-mail message is generated with performing the HTML-structure analysis of the e-mail message 200 and with performing the lexical analysis of texts from HTML-code logical blocks of the e-mail message 200.
- Fig. 5 is a block-diagram of a method 500 executed on the mail server 102 of Fig. 1 and implemented in accordance with non-limiting embodiments of the present technology.
- a method 500 is a method of computer processing of an incoming text message sent to a user and in the given implementation of the present technology - of the e-mail message 200, which includes text which includes meaningful and meaningless lexical units.
- the method 500 can be executed on the mail server 102 depicted in Fig. 1.
- the mail server 102 includes the storage media 104 which stores computer-readable instructions, which when executed, are configured to cause the mail server 102 to execute the steps of the method 500.
- the method 500 can be executed on other servers.
- the mail server 102 receives, from a plurality of users of a variety of e-mail services, messages sent to different users, including the user 121.
- Step 502 performing the syntax analysis of the e-mail message 200 and determining at least one lexical unit as a potential meaningless lexical unit.
- the method 500 begins at step 502, where the mail server 102 depicted in Fig. 1 performs the syntax analysis of the e-mail message 200.
- the message analysis module 106 performs the syntax analysis for determining meaningful and meaningless lexical units in the e-mail message 200.
- performing the syntax analysis of the e-mail message 200 includes the markup language analysis of the e-mail message 200.
- the markup language analysis of the e-mail message 200 includes the analysis of a font type, a font size, a font face, punctuation marks and special marks.
- some sentences can be detected as separate lexical units. Further, the sentence determination can be a basis for further analysis using the analysis of a font type, a font size, a font face.
- An indication of the end of a sentence can be both punctuation marks (for example, a dot, an exclamation mark, an ellipsis and the like) and special marks (for example, a paragraph mark, a tabulation mark, a page break mark and the like).
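The sentence detection described above can be illustrated with a minimal sketch. It assumes that newline, tab and form-feed characters stand in for the paragraph, tabulation and page-break marks; the function name `split_sentences` and the regular expression are hypothetical and are not taken from the described technology.

```python
import re

# Assumed end-of-sentence indicators: ".", "!", "?", or an ellipsis followed by
# whitespace (punctuation marks), and newline, tab, form feed (stand-ins for
# the paragraph, tabulation and page-break marks).
_SENTENCE_END = re.compile(r"(?<=[.!?…])\s+|[\n\t\f]+")


def split_sentences(text: str) -> list[str]:
    """Split text into candidate lexical units at sentence boundaries."""
    parts = _SENTENCE_END.split(text)
    return [p.strip() for p in parts if p and p.strip()]


if __name__ == "__main__":
    print(split_sentences("Hi John! Your order is delivered.\nSee you soon..."))
```

A simple pattern like this also splits after abbreviations such as "Dr.", so a practical implementation would likely need additional rules.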
- performing the syntax analysis of the e-mail message 200 includes performing the syntax analysis of the most significant part of the e-mail message 200 and does not include performing the syntax analysis of other parts of the e-mail message 200.
- performing the syntax analysis of the e-mail message 200 can be performed using the entire text of the e-mail message or using separate parts of the message (for example, the first three paragraphs, or the first two paragraphs after a paragraph with a title or parts of the analysis can be chosen using any other parameter).
- Detecting at least one lexical unit as a potential meaningless lexical unit can be a result of performing the syntax analysis of the e-mail message 200.
- the lexical unit can be a separate word in any form.
- the word "Hi" is a lexical unit.
- a separate sentence can be a lexical unit as well.
- the sentence "Your order is delivered" can be a lexical unit.
- a phrase from a sentence can be a lexical unit.
- the phrase "Your order is delivered" can be a lexical unit.
- a paragraph from the e-mail message can be a lexical unit.
- lexical units can carry a meaning.
- the lexical units described above are meaningful, i.e. can be considered as some complete informational units.
- lexical units do not necessarily carry meaning. It also can be, for example, word combinations which can be incomplete informational units per se, out of context of other words and word combinations.
- performing the syntax analysis includes the markup language analysis of the e-mail message 200.
- determining at least one lexical unit as a potential meaningless lexical unit can be performed using other methods. For example, for determining a lexical unit, in addition to the markup analysis of the e-mail message 200 itself, an additional check for the presence in the first line of words typical for a title (for example, "Dear", "Good morning", "Hi" and the like) can be executed. The presence of such key words in combination with some markup templates can be used for determining lexical units.
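The additional first-line check can be sketched as follows. This is only an illustration: the key word list reuses the three examples from the text, the function name `starts_with_greeting` is hypothetical, and the combination with markup templates is not modelled.

```python
# Key words taken from the examples in the text; a real list would be longer.
GREETING_WORDS = ("dear", "good morning", "hi")


def starts_with_greeting(message_text: str) -> bool:
    """Return True if the first non-empty line begins with one of the key words."""
    for line in message_text.splitlines():
        stripped = line.strip().lower()
        if stripped:
            # Require a delimiter after the key word so that "History"
            # is not mistaken for "Hi".
            return any(
                stripped == g or stripped.startswith((g + " ", g + ",", g + "!"))
                for g in GREETING_WORDS
            )
    return False


if __name__ == "__main__":
    print(starts_with_greeting("Hi John,\nYour order is delivered."))
```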
- Step 506 - performing a first check of at least one potential meaningless lexical unit by means of comparison with meaningless lexical units from the first lexical unit database 108.
- a first check of at least one potential meaningless lexical unit by means of comparison with meaningless lexical units from the first lexical unit database 108 is performed.
- the first lexical unit database 108 is a database which was generated using results of a syntax analysis of the entirety of the e-mail messages sent to the user 121 and received by the user 121 during lifetime of the account of the user 121 on the mail server 102.
- the first lexical unit database 108 can be generated using results of a syntax analysis of the entirety of the e-mail messages sent to the user 121 and received by the user 121 during a specific period, for example, for the preceding year. As those skilled in the art will understand, such a period can be any period, more than one year or less than one year.
- each of the plurality of the lexical units from the first database 108 is marked as a meaningful lexical unit or as a meaningless lexical unit. Matching the potential meaningless lexical units will be performed with the meaningless lexical units.
- the presence of meaningful lexical units in the first database 108 can be based on the fact that weight of all the lexical units in the first database 108 can be adjusted as new messages arrive and as analysis of these messages is performed. Accordingly, the presence of meaningful lexical units in the first database 108 can be necessary for calculating and re-calculating weight of these lexical units and if the weight exceeds the predefined threshold value a meaningful lexical unit from the first database 108 can be considered as a meaningless lexical unit.
- performing the first check of at least one potential meaningless lexical unit by means of comparison with meaningless lexical units from the first lexical unit database 108 allows determining meaningfulness or meaninglessness of the potential meaningless unit using the database which was generated in relation to a plurality of e-mail messages sent to the user 121.
- the determination is made whether the potential meaningless unit is meaningful or meaningless specifically for the given user 121.
- matching the potential meaningless lexical unit with meaningless lexical units from the first lexical unit database 108 is performed by comparing the potential meaningless lexical unit with meaningless lexical units from the first lexical unit database 108 using a predefined parameter.
- such a parameter can be a sequence of characters in the potential meaningless lexical unit and in the lexical units from the first lexical unit database 108.
- the check of the potential meaningless lexical unit by means of comparison with meaningless lexical units from the first lexical unit database 108 is performed character by character.
- the potential meaningless lexical units and meaningless lexical units from the first lexical unit database 108 can have control sums. These control sums can be preliminary calculated and represented in bytes. In relation to the potential meaningless lexical units and lexical units from the first lexical unit database 108 which have the control sums the check can be performed in two stages. In the first stage a control sum of the potential meaningless lexical unit is compared with the control sums of the lexical units from the first lexical unit database 108.
- if the control sum of the potential meaningless lexical unit is the same as the control sum of any of the meaningless lexical units from the first lexical unit database 108, then in some implementations the potential meaningless lexical unit is immediately defined as a meaningless lexical unit. In alternative implementations, if the sums are the same, the method proceeds to an additional step in which the match is verified by a character-by-character comparison of the potential meaningless lexical unit with the lexical unit from the first lexical unit database 108 whose control sum is the same as the control sum of the potential meaningless lexical unit.
- at step 508, one of two decisions is made based on the results of the check.
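The two-stage check (control sums first, then an optional character-by-character verification) might look like the sketch below. The text does not say how the control sum is computed, so CRC-32 over the UTF-8 bytes is used here purely as an assumption; the names `crc_control_sum`, `build_index` and `is_meaningless` are hypothetical.

```python
import zlib
from collections import defaultdict


def crc_control_sum(unit: str) -> int:
    # Assumption: CRC-32 of the UTF-8 bytes stands in for the "control sum".
    return zlib.crc32(unit.encode("utf-8"))


def build_index(meaningless_units: list[str]) -> dict[int, list[str]]:
    """Precompute control sums for the meaningless units of a database."""
    index: dict[int, list[str]] = defaultdict(list)
    for unit in meaningless_units:
        index[crc_control_sum(unit)].append(unit)
    return index


def is_meaningless(candidate: str, index: dict[int, list[str]], verify: bool = True) -> bool:
    """Stage 1: compare control sums. Stage 2 (optional): compare the characters."""
    matches = index.get(crc_control_sum(candidate), [])
    if not matches:
        return False
    if not verify:
        return True  # implementations that stop after the control-sum match
    # String equality compares the two units character by character.
    return any(candidate == unit for unit in matches)


if __name__ == "__main__":
    index = build_index(["Unsubscribe", "View in browser"])
    print(is_meaningless("Unsubscribe", index))
```

Skipping the verification stage is cheaper but accepts the small risk that two different lexical units share a control sum.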
- the result of the check is positive (step 510)
- the method proceeds to step 522 where the potential meaningless lexical unit is defined as a meaningless lexical unit.
- defining the potential meaningless lexical unit as a meaningless lexical unit means that, when the abstract of the e-mail message 200 is subsequently generated, the given meaningless lexical unit is not included in this abstract.
- the method 500 then terminates.
- in case the result of the check is negative (step 512), i.e. when the check shows that the potential meaningless lexical unit is not the same as any of the meaningless lexical units from the first lexical unit database 108, the method proceeds to step 514.
- Step 514 - performing a second check of at least one potential meaningless lexical unit by means of comparison with meaningless lexical units from the second lexical unit database 110
- a second check of at least one potential meaningless lexical unit by means of comparison with meaningless lexical units from the second lexical unit database 110 is performed.
- the second lexical unit database 110 is a database that was generated using results of a syntax analysis of the plurality of all the e-mail messages sent to all the users who have mail accounts on the mail server 102 and received by these users during the lifetime of their accounts.
- the second lexical unit database 110 can be generated using results of a syntax analysis of the entirety of the e-mail messages sent to all the users who have mail accounts with the mail server 102 and received by these users during the preceding year. As those skilled in the art will understand, such a period can be any period, more than one year or less than one year.
- each of the plurality of the lexical units from the second database 110 is marked as a meaningful lexical unit or as a meaningless lexical unit. Matching the potential meaningless lexical units will be performed with the meaningless lexical units.
- the presence of meaningful lexical units in the second database 110 can be based on the fact that the weight of all the lexical units in the second database 110 can be adjusted as new messages arrive and as analysis of these messages is performed. Accordingly, the presence of meaningful lexical units in the second database 110 can be necessary for calculating and re-calculating the weight of these lexical units, and if the weight exceeds the predefined threshold value, a meaningful lexical unit from the second database 110 can be considered as a meaningless lexical unit.
- performing the second check of at least one potential meaningless lexical unit by means of comparison with meaningless lexical units from the second lexical unit database 110 allows determining meaningfulness or meaninglessness of the potential meaningless unit in relation to the database which was generated in relation to the entirety of e-mail messages sent to all the users who have accounts with the mail server 102 and received by these users during the lifetime of these accounts.
- the determination that is made during this analysis is whether the potential meaningless unit is meaningful or meaningless for entirety of the users, and not specifically for the user 121.
- matching the potential meaningless lexical unit with meaningless lexical units from the second lexical unit database 110 is performed by comparing the potential meaningless lexical unit with meaningless lexical units from the second lexical unit database 110 using a predefined parameter.
- such a parameter can be a sequence of characters in the potential meaningless lexical unit and in the lexical units from the second lexical unit database 110.
- the check of the potential meaningless lexical unit by means of comparison with meaningless lexical units from the second lexical unit database 110 is performed character by character.
- the potential meaningless lexical units and meaningless lexical units from the second lexical unit database 110 can have control sums. These control sums can be predetermined and expressed in bytes. In relation to the potential meaningless lexical units and lexical units from the second lexical unit database 110 which have the control sums, the check can be performed in two stages.
- a control sum of the potential meaningless lexical unit is compared with the control sums of the lexical units from the second lexical unit database 110. If the control sum of the potential meaningless lexical unit is the same as the control sum of any of the meaningless lexical units from the second lexical unit database 110, then in some implementations the potential meaningless lexical unit is determined as a meaningless lexical unit. In alternative implementations, if the sums are the same, the method proceeds to an additional step in which the match is verified by a character-by-character comparison of the potential meaningless lexical unit with the lexical unit from the second lexical unit database 110 whose control sum is the same as the control sum of the potential meaningless lexical unit.
- at step 516, one of two decisions is made based on the results of the check.
- the result of the check is positive (step 518), i.e. when the check shows that the potential meaningless lexical unit is the same as any of the meaningless lexical unit from the second lexical unit database 110
- the method proceeds to step 522 where the potential meaningless lexical unit is defined as a meaningless lexical unit. Defining the potential meaningless lexical unit as a meaningless lexical unit means that, when the abstract of the e-mail message 200 is subsequently generated, the given meaningless lexical unit is not included in this abstract.
- the method 500 then terminates.
- in case at step 516 the result of the check is negative (step 520), i.e. when the check shows that the potential meaningless lexical unit is not the same as any of the meaningless lexical units from the second lexical unit database 110, the method proceeds to step 524 where the potential meaningless lexical unit is defined as a meaningful lexical unit. Defining the potential meaningless lexical unit as a meaningful lexical unit means that, when the abstract of the e-mail message 200 is subsequently generated, the given meaningful lexical unit can be included in this abstract. The method 500 then terminates.
- the method 500 of computer processing of an incoming e-mail message sent to the user, which message includes a text, is carried out to determine meaningless lexical units in the entire e-mail message or in a part of the e-mail message.
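The decision flow of the method 500 can be summarised in a short sketch. It assumes, for simplicity, that each database is reduced to a set of its meaningless lexical units and that matching is exact; the function name `classify_unit` and the sample entries are hypothetical.

```python
def classify_unit(candidate: str, user_db: set[str], global_db: set[str]) -> str:
    """Return "meaningless" or "meaningful" following the two checks of method 500."""
    if candidate in user_db:       # first check against the per-user database 108
        return "meaningless"       # step 522
    if candidate in global_db:     # second check against the all-users database 110
        return "meaningless"       # step 522
    return "meaningful"            # step 524


if __name__ == "__main__":
    user_db = {"Best regards, Your Travel Agency"}   # illustrative entries only
    global_db = {"Unsubscribe", "View in browser"}
    for unit in ("Unsubscribe", "Your flight details have changed"):
        print(unit, "->", classify_unit(unit, user_db, global_db))
```

Checking the per-user database first lets a unit that is meaningless for the user 121 be rejected without consulting the larger shared database.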
- an abstract can be generated, which abstract does not include meaningless lexical units.
- the abstract can include meaningful lexical units only.
- the method includes generating the abstract 410 of the e-mail message 200 and the abstract 410 of the e-mail message 200 is generated in such a way that there are no meaningless lexical units in the abstract 410 of the e-mail message 200.
- the method includes generating the abstract 410 of a part of the e-mail message 200 and the abstract 410 of the part of the e-mail message 200 is generated in such a way that there are no meaningless lexical units in the abstract 410 of the part of the e-mail message 200.
- the abstract of the e-mail message 200 can be an abstract of a predefined number of paragraphs in the beginning of an e-mail message.
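Generating an abstract that contains no meaningless lexical units can be sketched as a simple filter. The length limit `max_chars` and the function name `build_abstract` are assumptions made for illustration; the text does not specify how long an abstract is or how the retained units are joined.

```python
def build_abstract(units: list[str], meaningless: set[str], max_chars: int = 80) -> str:
    """Join meaningful units in their original order until the length budget is used."""
    kept: list[str] = []
    length = 0
    for unit in units:
        if unit in meaningless:
            continue  # meaningless lexical units never reach the abstract
        extra = len(unit) + (1 if kept else 0)  # +1 for the joining space
        if kept and length + extra > max_chars:
            break
        kept.append(unit)
        length += extra
    return " ".join(kept)[:max_chars]


if __name__ == "__main__":
    units = ["Moscow, Russia", "Early bird registration fees available", "Unsubscribe"]
    # Illustrative set of units assumed to have been classified as meaningless.
    print(build_abstract(units, {"Moscow, Russia", "Unsubscribe"}))
```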
- generating the abstract 410 of the part of the e-mail message 200 is generating an abstract of the most significant part of the e-mail message.
- the most significant part of the e-mail message 200 is defined as the most significant logical block of the HTML code from the plurality of the logical blocks of the HTML code comprising text.
- Logical blocks of the HTML code can be defined by the e-mail message analysis module 106 of the mail server 102.
- the most significant logical block of HTML code is a block of HTML code which comprises text the size of which is larger than the size of any other logical block of HTML code of the given e-mail message.
- a size of a text can be defined using a number of characters including or excluding punctuation marks and spaces.
- the most significant logical block of HTML code is a block of HTML code which comprise text; the text of the most significant logical block of HTML code contributing the majority of meaningful lexical units in comparison with the text of any other logical block of HTML code of the given e-mail message. Meaningful lexical units can be defined by the e-mail message analysis module 106 of the mail server 102.
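Both criteria for the most significant logical block can be sketched briefly. The sketch assumes that the blocks have already been extracted as plain text (or as lists of lexical units) and that the size excluding punctuation marks and spaces is approximated by counting alphanumeric characters; all function names are hypothetical.

```python
def text_size(text: str, count_punct_and_spaces: bool = True) -> int:
    """Size of a text in characters, with or without punctuation marks and spaces."""
    if count_punct_and_spaces:
        return len(text)
    return sum(ch.isalnum() for ch in text)


def most_significant_block(blocks: list[str], count_punct_and_spaces: bool = True) -> str:
    """Criterion 1: the block whose text is the largest."""
    return max(blocks, key=lambda b: text_size(b, count_punct_and_spaces))


def most_meaningful_block(blocks: list[list[str]], meaningless: set[str]) -> list[str]:
    """Criterion 2: the block contributing the most meaningful lexical units."""
    return max(blocks, key=lambda units: sum(u not in meaningless for u in units))


if __name__ == "__main__":
    print(most_significant_block(["Hi", "Early bird registration fees available"]))
```

When several blocks tie, `max` returns the first of them, which is an arbitrary choice of this sketch.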
- Fig. 6 and Fig. 7 are block-diagrams of a method 600 executed on the mail server 102 of Fig. 1 and implemented in accordance with non-limiting embodiments of the present technology.
- the method 600 is a computer implemented two-stage method of determining meaningless lexical units in a text message.
- the text message is the e-mail message 200.
- the method 600 can be executed for performing a check using any lexical unit database.
- the check can be performed using the first lexical unit database 108 and/or the second lexical unit database 110 and/or the third lexical unit database (not depicted) which can be generated, and the like.
- the method 600 can be executed on the mail server 102 depicted in Fig. 1.
- the mail server 102 includes the storage media 104 which stores computer-readable instructions, which when executed, are configured to cause the mail server 102 to execute the steps of the method 600.
- the method 600 can be executed on other servers.
- the mail server 102 receives, from a plurality of users of a variety of e-mail services, messages sent to different users, including the user 121.
- Step 602 performing the syntax analysis of the e-mail message 200.
- the method 600 begins at step 602, where the mail server 102 depicted in Fig. 1 performs the syntax analysis of the e-mail message 200.
- the message analysis module 106 performs the syntax analysis for determining meaningful and meaningless lexical units in the e-mail message 200.
- performing the syntax analysis of the e-mail message 200 includes the markup language analysis of the e-mail message 200.
- the markup language analysis of the e-mail message 200 includes an analysis of HTML tags of the e-mail message.
- the message analysis module 106 defines message blocks comprising text using tags which mark the beginning and the end of text blocks, paragraphs, table cells.
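Defining text blocks from the tags that open and close them can be sketched with the Python standard library `html.parser` module. The set of block-delimiting tags (`p`, `td`, `div`) and the names `TextBlockParser` and `extract_text_blocks` are assumptions for illustration; the text only mentions text blocks, paragraphs and table cells.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "td", "div"}  # assumed set of tags that delimit text blocks


class TextBlockParser(HTMLParser):
    """Collect the text found between opening and closing block tags."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._depth = 0
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS and self._depth > 0:
            self._flush()
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)

    def _flush(self) -> None:
        # Collapse runs of whitespace and store the block if it has any text.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.blocks.append(text)
        self._buffer = []


def extract_text_blocks(html: str) -> list[str]:
    parser = TextBlockParser()
    parser.feed(html)
    parser.close()
    return parser.blocks


if __name__ == "__main__":
    sample = "<p>Moscow, Russia</p><table><tr><td>Early bird <b>registration</b> fees</td></tr></table>"
    print(extract_text_blocks(sample))
```

Inline tags such as `b` are ignored, so formatted words stay inside the block in which they appear.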
- the markup language analysis of the e-mail message can include the analysis of a font type, a font size, a font face, punctuation marks and special marks.
- some sentences can be detected as separate lexical units. Further, the sentence determination can be a basis for further analysis using the analysis of a font type, a font size, a font face.
- the indication of the end of a sentence can be both punctuation marks (for example, a dot, an exclamation mark, an ellipsis and the like) and special marks (for example, a paragraph mark, a tabulation mark, a page break mark and the like).
- performing the syntax analysis of a text message can be performed using the entire text of the message or using a particular part of the message (for example, the first three paragraphs, or the first two paragraphs after a paragraph with a title or parts of the analysis can be chosen using any other parameter) or the most significant part of the text message.
- the method can further include receiving the incoming e-mail message 200.
- Step 606 can be performed in the same way as step 502 described above, and it will therefore not be described in detail.
- the message analysis module 106 selected as a first potential meaningless lexical unit a text fragment which fragment is an entire paragraph (not depicted) which includes two sentences (not depicted).
- Step 606 determining a control sum of the first potential meaningless lexical unit.
- step 606 determining a control sum of the first potential meaningless lexical unit is performed.
- the control sum of the first potential meaningless lexical unit is a combination of the following control elements: a number of words in the lexical unit, a number of letters in the lexical unit, a number of numbers in the lexical unit, a number of dots in the lexical unit, a number of commas in the lexical unit.
- the message analysis module 106 detected that the first potential meaningless lexical unit comprises 44 words, 268 letters, 9 numbers, two dots and two commas.
- control sum of the first potential meaningless lexical unit can be a combination of any of control elements including the following: a number of characters in a lexical unit, a number of letters in the lexical unit, a number of capital letters in the lexical unit, a number of lower-case letters in the lexical unit, a number of spaces in the lexical unit, a number of numbers in the lexical unit, a number of special marks, a number of words in the lexical unit, a size of the lexical unit expressed in information handling and storage units.
- for determining the control element of the first potential meaningless lexical unit that is expressed in information handling and storage units, for example in bytes, the message analysis module 106 assesses the corresponding unformatted text of the lexical unit which is the first potential meaningless lexical unit. As those skilled in the art will understand, the selection of a specific assessment method is not particularly important; in other words, different methods can be selected. However, once selected, the method has to be applied consistently, so that the control elements expressed in information handling and storage units are identical when calculated for two or more identical lexical units.
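A control sum built from the five control elements named above can be sketched as follows. The text does not define precisely what counts as a word or a number, so this sketch assumes that a word is a run of letters and a number is a run of digits; the names `ControlSum` and `control_sum` are hypothetical.

```python
import re
from typing import NamedTuple


class ControlSum(NamedTuple):
    words: int
    letters: int
    numbers: int
    dots: int
    commas: int


def control_sum(unit: str) -> ControlSum:
    """Count the five control elements of a lexical unit."""
    return ControlSum(
        words=len(re.findall(r"[^\W\d_]+", unit)),  # assumption: word = run of letters
        letters=sum(ch.isalpha() for ch in unit),
        numbers=len(re.findall(r"\d+", unit)),      # assumption: number = run of digits
        dots=unit.count("."),
        commas=unit.count(","),
    )


if __name__ == "__main__":
    print(control_sum("Moscow, 11 November 2014."))
```

Because the result is a plain tuple of integers, two identical lexical units always yield identical control sums, which is the consistency requirement stated above.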
- Step 608 matching using the first parameter, the first potential meaningless lexical unit with lexical units from the lexical unit database.
- step 608 matching using the first parameter, the first potential meaningless lexical unit with lexical units from a plurality of lexical units from the lexical units database is performed, where matching using the first parameter is matching the control sum of the first potential meaningless lexical unit with control sums of meaningless lexical units from the second lexical units database 110.
- matching using the first parameter is matching using the first set of control elements, more specifically, the following five control elements from the database 110 in relation to every lexical unit: 1) a number of words in the lexical unit, 2) a number of letters in the lexical unit, 3) a number of numbers in the lexical unit, 4) a number of dots in the lexical unit, 5) a number of commas in the lexical unit.
- matching control sums using the first parameter can be matching hash-code of the first potential meaningless lexical unit with hash-codes of meaningless lexical units from the second database 110.
- the message analysis module 106 checks the lexical unit database 110 for the presence of lexical units which have the same control sums as the control sum of the first potential meaningless lexical unit.
- the control sum of the first potential meaningless lexical unit corresponds to a control sum of a meaningless lexical unit from the lexical unit database if (a) these two control sums are identical, or (b) these two control sums are not identical but the difference is insignificant, i.e. the difference is within a predefined permissible amplitude of the difference.
- step 610 checking for equivalence of control sums is performed.
- if such an accurate match is detected (step 612), the method 600 proceeds to step 626, where the potential meaningless lexical unit is defined as a meaningless lexical unit.
- if such an identical match is not detected (step 614), the method 600 proceeds to step 618, where a check for a measure of difference of control sums is performed.
- the measure of the difference is determined for each control element from a control sum and the amplitude of the difference is defined for each control element from a control sum.
- the amplitude is defined as a maximum permissible measure of deviation which is presented as coefficients of permissible deviation which are used in relation to control elements from the lexical unit database 110.
- coefficients of a permissible deviation are defined as follows: 0.018 for words, 0.01 for letters, 0.5 for numbers; 0 for commas. After the application of the deviation coefficients all results are rounded up.
- the first potential meaningless lexical unit is defined as a meaningless lexical unit (step 626).
- if, in the previous example, the deviation of at least one control element had exceeded the amplitude of permissible deviation, the control sums of the first potential meaningless lexical unit and the meaningless lexical unit from the database 110 would not have been considered as matching (step 622), and in this case the first potential meaningless lexical unit would not have been defined as a meaningless lexical unit. In this case the method 600 proceeds to step 628.
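The check of the measure of difference can be sketched using the coefficients listed above. Each coefficient is applied to the control element stored in the database and the result is rounded up. The text gives no coefficient for dots, so this sketch assumes 0 (exact match) for them; the stored values in the usage example are illustrative, and the name `within_tolerance` is hypothetical. Exact fractions are used instead of floating-point numbers so that rounding up is not affected by representation error.

```python
import math
from fractions import Fraction

# Coefficients of permissible deviation stated in the text. No coefficient is
# given for dots, so 0 (exact match) is ASSUMED here.
COEFFICIENTS = {
    "words": Fraction("0.018"),
    "letters": Fraction("0.01"),
    "numbers": Fraction("0.5"),
    "dots": Fraction(0),  # assumption, not stated in the text
    "commas": Fraction(0),
}


def within_tolerance(candidate: dict[str, int], stored: dict[str, int]) -> bool:
    """True if every control element deviates by no more than the permitted amplitude."""
    for name, coefficient in COEFFICIENTS.items():
        allowed = math.ceil(coefficient * stored[name])  # results are rounded up
        if abs(candidate[name] - stored[name]) > allowed:
            return False
    return True


if __name__ == "__main__":
    stored = {"words": 44, "letters": 268, "numbers": 9, "dots": 2, "commas": 2}
    candidate = {"words": 45, "letters": 270, "numbers": 12, "dots": 2, "commas": 2}
    print(within_tolerance(candidate, stored))
```

With these stored values the permitted deviations are 1 word (0.792 rounded up), 3 letters (2.68 rounded up) and 5 numbers (4.5 rounded up), while commas must match exactly.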
- Step 628 subdividing the first potential meaningless lexical unit into at least two smaller lexical units and determining at least one smaller lexical unit as the second potential meaningless lexical unit.
- step 628 subdividing the first potential meaningless lexical unit into at least two smaller lexical units and determining at least one smaller lexical unit as the second potential meaningless lexical unit is performed.
- Subdividing the first potential meaningless lexical unit to at least two smaller lexical units is performed by the message analysis module 106 by means of performing the syntax analysis of the first potential meaningless lexical unit as if the first potential meaningless lexical unit were an entire text message.
- the first potential meaningless lexical unit (which in this example is a paragraph) is subdivided into smaller lexical units, which can be sentences.
- the first potential meaningless lexical unit (which is a paragraph which includes two sentences) is subdivided into two smaller lexical units, which are sentences from the same paragraph.
- the first potential meaningless lexical unit is subdivided into two or more smaller lexical units, and such smaller lexical units can be words, phrases, collocations, sentences, abbreviations, characters, dates, acronyms (including commonly accepted ones), lexically meaningful combining forms of compound words from a natural language, their equivalent code notations and symbolic notations from an artificial language, and the like.
- the method 600 then proceeds to step 630.
- Step 630 determining a control sum of the second potential meaningless lexical unit.
- at step 630, determining a control sum of the second potential meaningless lexical unit is performed.
- the control sum of the second potential meaningless lexical unit which is a lexical unit is intended to mean any quantitative characteristic which characterizes the lexical unit in an unbiased manner.
- the control sum of the second potential meaningless lexical unit is a combination of the following control elements: a number of words in the lexical unit, a number of letters in the lexical unit, a number of numbers in the lexical unit, a number of dots in the lexical unit, a number of commas in the lexical unit.
- the message analysis module 106 detected that the second potential meaningless lexical unit comprises 19 words, 92 letters, 6 numbers, one dot and two commas.
- control sum of the second potential meaningless lexical unit can be a combination of any of control elements including the following: a number of characters in a lexical unit, a number of letters in the lexical unit, a number of capital letters in the lexical unit, a number of lower-case letters in the lexical unit, a number of spaces in the lexical unit, a number of numbers in the lexical unit, a number of special marks, a number of words in the lexical unit, a size of the lexical unit expressed in information handling and storage units.
- Step 632 - performing a second check of at least one potential meaningless lexical unit by means of comparison with meaningless lexical units from the lexical unit database 110.
- the message analysis module 106 performs matching (using the second parameter) the second potential meaningless lexical unit with lexical units from the lexical units database 110, where matching using the second parameter is matching the control sum of the second potential meaningless lexical unit with control sums of meaningless lexical units from the lexical units database.
- matching using the second parameter is matching using the second set of control elements, more specifically, the following five control elements from the database 110 in relation to every lexical unit: 1) a number of words in the lexical unit, 2) a number of letters in the lexical unit, 3) a number of numbers in the lexical unit, 4) a number of dots in the lexical unit, 5) a number of commas in the lexical unit.
- the first set of control elements and the second set of control elements are the same.
- the first set of control elements and the second set of control elements can be different.
- matching control sums using the first parameter can be matching the hash-code of the first potential meaningless lexical unit with hash-codes of meaningless lexical units from the second database 110.
- the message analysis module 106 checks the lexical unit database 110 for presence of such lexical units which have the same control sums as the control sum of the second potential meaningless lexical unit.
- the control sum of the second potential meaningless lexical unit matches the control sum of a meaningless lexical unit from the lexical unit database if (a) these two control sums are identical, or (b) these two control sums are not identical but the difference is insignificant, i.e. the difference is within a predefined permissible amplitude of the difference.
- if such an accurate match is detected (step 632), the method 600 proceeds to step 648, where the potential meaningless lexical unit is defined as a meaningless lexical unit. The method 600 then terminates.
- at step 640, a check for a measure of difference of control sums is performed.
- the measure of the difference is determined for each control element from a control sum and the amplitude of the difference is defined for each control element from a control sum.
- the amplitude is defined as the maximum permissible measure of deviation, expressed as coefficients of permissible deviation that are applied to the control elements from the lexical unit database 110.
- the coefficients of permissible deviation are defined as follows: 0.018 for words, 0.01 for letters, 0.5 for numbers, 0 for commas. After the deviation coefficients are applied, all results are rounded up. All calculations are otherwise performed in the same way as described above for checking the measure of deviation between the control sums of the first potential meaningless lexical unit and the meaningless lexical unit from the database 110.
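One reading of the coefficients above is that each is multiplied by the corresponding control element stored in the database and the product is rounded up to give the permissible amplitude. The sketch below follows that reading, which is an assumption. The name `permissible_amplitudes` and the sample values are hypothetical. No coefficient is stated for dots, so dots are left out rather than guessed.

```python
import math
from fractions import Fraction

# Coefficients as listed in the text. Exact fractions avoid
# floating-point surprises when rounding up.
COEFFICIENTS = {
    "words": Fraction("0.018"),
    "letters": Fraction("0.01"),
    "numbers": Fraction("0.5"),
    "commas": Fraction("0"),
}


def permissible_amplitudes(db_elements: dict) -> dict:
    """Return the rounded-up permissible deviation per control element."""
    return {
        name: math.ceil(coef * db_elements[name])
        for name, coef in COEFFICIENTS.items()
    }


if __name__ == "__main__":
    # Hypothetical database entry, for illustration only.
    entry = {"words": 50, "letters": 300, "numbers": 3, "commas": 4}
    print(permissible_amplitudes(entry))
```

With the hypothetical entry shown, the products are 0.9, 3, 1.5 and 0, which round up to 1, 3, 2 and 0. A coefficient of 0 means commas must match exactly.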
- if matching the measure of deviation at step 638 shows that the deviation is within the permissible deviation (step 646), the control sums of the meaningless lexical unit and the first potential meaningless lexical unit are considered corresponding, because the parameters of the first potential meaningless lexical unit are within the amplitude of permissible deviation.
- the second potential meaningless lexical unit is defined as a meaningless lexical unit (step 648).
- If in the previous example the deviation of at least one control element had exceeded the amplitude of permissible deviation, the control sums of the second potential meaningless lexical unit and the meaningless lexical unit from the database 110 would not have been considered as corresponding (step 642) and in this case the second potential meaningless lexical unit would have been defined as a meaningful lexical unit (step 644).
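The decision flow of steps 632 to 648 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: control sums are plain dictionaries, the amplitude is the rounded-up product of a coefficient and the database value, dots are omitted because no coefficient is given for them, and the names `within_amplitude` and `classify` are hypothetical.

```python
import math
from fractions import Fraction

# Coefficients as listed in the description (none is given for dots).
COEFFICIENTS = {
    "words": Fraction("0.018"),
    "letters": Fraction("0.01"),
    "numbers": Fraction("0.5"),
    "commas": Fraction("0"),
}


def within_amplitude(candidate: dict, reference: dict) -> bool:
    """Check every control element against its permissible amplitude."""
    return all(
        abs(candidate[name] - reference[name])
        <= math.ceil(coef * reference[name])
        for name, coef in COEFFICIENTS.items()
    )


def classify(candidate: dict, database: list) -> str:
    """Classify a potential meaningless lexical unit."""
    for reference in database:
        if candidate == reference:  # exact match (step 632)
            return "meaningless"  # step 648
    for reference in database:
        if within_amplitude(candidate, reference):  # steps 640, 646
            return "meaningless"  # step 648
    return "meaningful"  # steps 642, 644


if __name__ == "__main__":
    # Hypothetical database entry and candidate, for illustration only.
    db = [{"words": 50, "letters": 300, "numbers": 3, "commas": 4}]
    print(classify({"words": 51, "letters": 302, "numbers": 2, "commas": 4}, db))
```

A single control element outside its amplitude is enough to classify the unit as meaningful, which mirrors the "at least one control element" condition in the text.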
- retrieving an electronic or other signal from the corresponding client device can be used, and displaying on a screen of the device can be implemented as transmitting a signal to the screen, where the signal includes specific information that can further be interpreted as specific images and at least partially displayed on the screen of the client device.
- Sending and receiving the signal is not mentioned in some cases within the present description to simplify the description and as an aid to understanding.
- Signals can be transmitted using optical methods (for example, fiber-optic communication), electronic methods (wired or wireless communication), or mechanical methods (transmitting pressure, temperature and/or other physical parameters by means of which a signal can be transmitted).
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Strategic Management (AREA)
- Entrepreneurship & Innovation (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computer Hardware Design (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Economics (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Machine Translation (AREA)
- Information Transfer Between Computers (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
RU2014147904A RU2014147904A (ru) | 2014-11-28 | 2014-11-28 | Method for detecting meaningless lexical units in a text message, and computer |
RU2014147904 | 2014-11-28 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016083908A1 (en) | 2016-06-02 |
Family
ID=56073690
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IB2015/054486 WO2016083908A1 (en) | 2014-11-28 | 2015-06-12 | System and method for computer processing of an e-mail message and visual representation of a message abstract |
Country Status (2)
Country | Link |
---|---|
RU (1) | RU2014147904A (ru) |
WO (1) | WO2016083908A1 (ru) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2001029709A1 (en) * | 1999-10-20 | 2001-04-26 | Ali Hussam | System and method for location, understanding and assimilation of digital documents through abstract indicia |
US20020103836A1 (en) * | 1999-04-08 | 2002-08-01 | Fein Ronald A. | Document summarizer for word processors |
US20060206569A1 (en) * | 2005-03-11 | 2006-09-14 | Niklas Heidloff | Smart size reduction of a local electronic mailbox by removing unimportant messages based on an automatically generated user interest profile |
US7222299B1 (en) * | 2003-12-19 | 2007-05-22 | Google, Inc. | Detecting quoted text |
US20070244692A1 (en) * | 2006-04-13 | 2007-10-18 | International Business Machines Corporation | Identification and Rejection of Meaningless Input During Natural Language Classification |
US20080282153A1 (en) * | 2007-05-09 | 2008-11-13 | Sony Ericsson Mobile Communications Ab | Text-content features |
US20090197225A1 (en) * | 2008-01-31 | 2009-08-06 | Kathleen Marie Sheehan | Reading level assessment method, system, and computer program product for high-stakes testing applications |
US7836061B1 (en) * | 2007-12-29 | 2010-11-16 | Kaspersky Lab, Zao | Method and system for classifying electronic text messages and spam messages |
US20110258181A1 (en) * | 2010-04-15 | 2011-10-20 | Palo Alto Research Center Incorporated | Method for calculating semantic similarities between messages and conversations based on enhanced entity extraction |
US20130179800A1 (en) * | 2012-01-05 | 2013-07-11 | Samsung Electronics Co. Ltd. | Mobile terminal and message-based conversation operation method for the same |
US20140250219A1 (en) * | 2012-05-30 | 2014-09-04 | Douglas Hwang | Synchronizing translated digital content |
- 2014-11-28: RU application RU2014147904A (patent/RU2014147904A/ru), status: not active, Application Discontinuation
- 2015-06-12: WO application PCT/IB2015/054486 (patent/WO2016083908A1/en), status: active, Application Filing
Also Published As
Publication number | Publication date |
---|---|
RU2014147904A (ru) | 2016-06-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7163355B2 (ja) | Identification of tasks in messages | |
US9218568B2 (en) | Disambiguating data using contextual and historical information | |
US20160328378A1 (en) | Anaphora resolution for semantic tagging | |
JP6246951B2 (ja) | Data settings for user contact entries | |
KR101708508B1 (ko) | Method for calculating semantic similarities between messages and conversations based on enhanced entity extraction | |
US10588003B2 (en) | Notification of potentially problematic textual messages | |
US10552539B2 (en) | Dynamic highlighting of text in electronic documents | |
KR101716905B1 (ko) | Method for calculating entity similarities | |
US9971762B2 (en) | System and method for detecting meaningless lexical units in a text of a message | |
WO2019179022A1 (zh) | Text data quality inspection method, apparatus, device, and computer-readable storage medium | |
US20090055168A1 (en) | Word Detection | |
US9442916B2 (en) | Management of language usage to facilitate effective communication | |
CN111742337A (zh) | Message analysis using machine learning models | |
US10824657B2 (en) | Search document information storage device | |
US20140317495A1 (en) | Retroactive word correction | |
CN110785762B (zh) | System and method for composing electronic messages | |
US10803247B2 (en) | Intelligent content detection | |
US9875232B2 (en) | Method and system for generating a definition of a word from multiple sources | |
US20180349787A1 (en) | Analyzing communication and determining accuracy of analysis based on scheduling signal | |
US20240233427A1 (en) | Data categorization using topic modelling | |
US10944569B2 (en) | Comparison and validation of digital content using contextual analysis | |
WO2016083908A1 (en) | System and method for computer processing of an e-mail message and visual representation of a message abstract | |
US20170140022A1 (en) | Identifying an assumption about a user, and determining a veracity of the assumption | |
US20160323227A1 (en) | Method and system for providing a user with an indication of an unread e-mail count on a client device | |
US10176248B2 (en) | Performing a dynamic search of electronically stored records based on a search term format |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 15863581; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 15863581; Country of ref document: EP; Kind code of ref document: A1 |