WO2022071790A1

WO2022071790A1 - System and method for text processing

Info

Publication number: WO2022071790A1
Application number: PCT/MY2020/050177
Authority: WO
Inventors: Mohammad Arshi SALOOT; Duc Nghia PHAM
Original assignee: Mimos Berhad
Priority date: 2020-09-30
Filing date: 2020-11-30
Publication date: 2022-04-07

Abstract

The present invention relates to a system and method for text processing. The system (10) comprises a display unit (11) for presenting a plurality of multi-option questions and corresponding options to a user, wherein multiple options are selectable as an answer to each question. An input unit (12) receives a user reply with respect to each question, wherein the user reply includes one or more options selected by the user. A storage unit (13) stores a user log including conversations in which the user participated. A processing unit (14) processes the user reply and generates a text content based on the user reply and the user log. The display unit (11) presents the generated text content to the user, and the input unit (12) receives a user selection with respect to the generated text content. A publishing unit (15) publishes the generated text content as the user's opinion based on the user selection.

Description

SYSTEM AND METHOD FOR TEXT PROCESSING

FIELD OF THE DISCLOSURE

The present invention relates broadly to the field of text processing. More particularly, the present invention relates to a system and method for processing text for generating text content that reflects a user’s opinion.

BACKGROUND

Developments have been made to enable computer software applications to impersonate a human in real-time written conversation. For example, chatbots have been used to chat with a user or customer in order to provide solutions to issues raised. Mostly, such conversations are one-sided, wherein the user is provided with a set of options and, when the user selects one, a preset sentence or paragraph is provided as a reply. In some cases, the user is allowed to enter queries in the form of questions or sentences, and the chatbot extracts keywords and replies to the user with information related to the extracted keywords, wherein the information is retrieved from a database based on the keywords.

Even though such developments are useful to some extent in quickly addressing customer queries, it would be obvious to the customer that the queries are being dealt with by machines, which makes the customer uncomfortable. United States Patent No. US 8,655,889 B2 discloses an autonomous blog engine capable of autonomous generation of a blog, wherein whenever a picture is captured by a mobile phone, the mobile application determines a place of interest captured in the picture. Based on the determined place of interest, one or more pre-stored knowledge items including information on the place of interest are pulled from a database, and a blog entry on the place of interest is autonomously compiled and published along with the captured picture. However, this is a mere compilation of preexisting information, and the opinion of the user cannot be expressed in this approach. The technical paper titled “Towards Automatic Generation of Product Reviews from Aspect-Sentiment Scores”, Zang et al., discloses an improved method of generating a user review. This approach introduces a hierarchical structure with aligned attention in a Long Short-Term Memory (LSTM) decoder for generating descriptive Chinese reviews from aspect-sentiment scores representing users’ opinions. Scores for different aspects of a product, i.e. a car, are received from a user and reviews are generated from pre-stored review contents based on the scores. Even though this approach enables automatic generation of text content that expresses the user’s opinion, it is very limited as it requires pre-written text content from the same field of technology.

Hence, there is still a need in the art for a system and method for processing text for creating text content in a simple and effective manner without a need for structured text information from the same field of technology, wherein the text content reflects the user’s opinion with respect to a specific topic or query.

SUMMARY

The present disclosure proposes a system and method for text processing. The system comprises a display unit, an input unit, a storage unit, a processing unit and a publishing unit. The display unit presents a plurality of multi-option questions and corresponding options to a user, wherein one or more options are selectable as an answer to each question. The input unit receives a user reply with respect to each question, wherein the user reply includes one or more options selected by the user. The storage unit stores a user log, wherein the user log includes one or more conversations in which the user participated. The processing unit processes the user reply and generates a text content based on the user reply and the user log.

In one aspect of the present invention, the processing unit includes a sentence conversion module, an extraction module, a weighing module, a comparison module and a content generation module. The sentence conversion module converts each question and corresponding user reply into one or more declarative sentences based on a linguistic knowledge base and a linguistic ontological database. The extraction module extracts one or more keywords from each declarative sentence and extracts one or more conversations from the user log based on the keywords using robotic process automation (RPA).

The weighing module weighs the declarative sentences and the conversations based on keywords present in the declarative sentences and the conversations, respectively. The comparison module compares the weights of each declarative sentence and the corresponding conversations to determine the most similar conversation. The content generation module generates the text content based on the most similar conversation.

The display unit presents the generated text content to the user and the input unit receives a user selection with respect to the generated text content. A publishing unit publishes the generated text content as the user’s opinion based on the user selection, wherein the generated text content is published if the user selection includes an approval for publishing the created text content.

In another aspect of the present invention, the method comprises the steps of: presenting a plurality of multi-option questions and corresponding options to a user, wherein one or more options are selectable as an answer to each question; receiving a user reply with respect to each question, wherein the user reply includes one or more options selected by the user; processing the user reply for generating a text content based on the user reply and a user log, wherein one or more conversations in which the user participated are stored in a storage unit as the user log; presenting the generated text content to the user; receiving a user selection with respect to the generated text content; and publishing the generated text content as the user’s opinion based on the user selection using a publishing unit, wherein the generated text content is published if said user selection includes an approval for publishing said generated text content.

By converting the questions and corresponding user replies into the declarative sentences, the present invention is able to understand the pattern of an actual opinion with respect to the questions. Since the final text is created based on the conversations closest to the declarative sentences, the present invention is capable of generating text content in a simple and effective manner, wherein the text content reflects the user’s opinion with respect to a specific topic or query.

Various objects, features, aspects and advantages of the inventive subject matter will become more apparent from the following detailed description of preferred embodiments, along with the accompanying drawing figures in which like numerals represent like components.

BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS

In the figures, similar components and/or features may have the same reference numerals. Further, various components of the same type may be distinguished by following the reference numerals with a second numeral that distinguishes among the similar components. If only the first reference numeral is used in the specification, the description is applicable to any one of the similar components having the same first reference numeral irrespective of the second reference numeral.

FIGURE 1 shows a block diagram of the system for text processing, in accordance with an exemplary embodiment of the present invention.

FIGURE 2 shows a flow diagram of the method for text processing, in accordance with an exemplary embodiment of the present invention.

DETAILED DESCRIPTION

In accordance with the present disclosure, there is provided a system and method for text processing, which will now be described with reference to the embodiments shown in the accompanying drawings. The embodiments do not limit the scope and ambit of the disclosure. The description relates purely to the embodiments and suggested applications thereof.

The embodiments herein and the various features and advantageous details thereof are explained with reference to the non-limiting embodiment in the following description. Descriptions of well-known components and processes are omitted so as to not unnecessarily obscure the embodiments herein. The examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein may be practiced and to further enable those of skill in the art to practice the embodiment herein. Accordingly, the description should not be construed as limiting the scope of the embodiment herein.

The description hereinafter of the specific embodiment will so fully reveal the general nature of the embodiments herein that others can, by applying current knowledge, readily modify and/or adapt such specific embodiment for various applications without departing from the generic concept, and, therefore, such adaptations and modifications should be, and are intended to be, comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation.

FIGURE 1 shows a block representation of the system for text processing, in accordance with an exemplary embodiment of the present invention. The system (10) comprises a display unit (11), an input unit (12), a storage unit (13), a processing unit (14) and a publishing unit (15). The display unit (11) presents a plurality of multi-option questions and corresponding options to a user, wherein one or more options are selectable as an answer to each question.

The input unit (12) receives a user reply with respect to each question, wherein the user reply includes one or more options selected by the user. In a preferred embodiment, the display unit (11) and the input unit (12) are integrated into a user device such as a smartphone, tablet computer, laptop computer, desktop computer or any other computing device capable of executing a mobile application or a web application. Alternatively, the user device may be in the form of an automated teller machine (ATM), kiosk, point of sale (POS) device and the like.

The storage unit (13) stores a user log, wherein the user log includes one or more conversations in which the user participated. Preferably, the storage unit (13) is a remote database wirelessly connected to the user device. Alternatively, the storage unit (13) is a local memory device residing in the user device. The conversations may include, but are not limited to, textual data, audio data, still image data, clip art data and moving image data. The system (10) may be connected to the user’s email account, social media account, messaging account and storage folders within the user device for accumulating the user conversations through these means.

The processing unit (14) processes the user reply to each question and generates a text content based on the user reply and the user log. Preferably, the processing unit (14) includes a sentence conversion module (16), an extraction module (17), a weighing module (18), a comparison module (19) and a content generation module (20). The sentence conversion module (16) converts each question and corresponding user reply into one or more declarative sentences based on commonsense knowledge bases, such as ConceptNet from MIT Media Lab, and linguistic ontological databases, such as WordNet provided by Princeton University and DBpedia from OpenLink.

The sentence conversion module (16) converts the questions and user replies into the declarative sentences by identifying different parts of the questions and user replies. Additionally, the sentence conversion module (16) may also identify a type of each identified part of the questions and user replies, wherein the type of parts includes noun, pronoun, verb, adverb, adjective, conjunction or auxiliary verb. Furthermore, the sentence conversion module (16) generates one or more synonyms, hyponyms and hypernyms for each identified part of the questions and user replies.

Suppose the question “Do you enjoy cooking food?” and the options “a) Yes, I enjoy; b) No, I don’t; c) Sometimes” are presented to the user. If the user selects option ‘a’, the question and the user reply are converted into the declarative sentence “I enjoy cooking food”. Similarly, the declarative sentences for options ‘b’ and ‘c’ may be “I don’t enjoy cooking food” and “I sometimes enjoy cooking food”, respectively.
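The conversion in the example above can be illustrated with a minimal rule-based sketch. It is only an approximation under stated assumptions: the disclosure relies on knowledge bases and ontological databases, whereas this sketch handles only questions of the form "Do you ...?" and the three reply patterns of the example. The function name `to_declarative` is hypothetical and does not appear in the disclosure.

```python
import re


def to_declarative(question: str, reply: str) -> str:
    """Turn a 'Do you ...?' question and a selected reply into a first-person sentence.

    Hypothetical helper: a hand-written rule set standing in for the
    knowledge-base-driven sentence conversion module (16).
    """
    match = re.match(r"^Do you (?P<body>.+?)\?$", question.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError("only 'Do you ...?' questions are handled in this sketch")
    body = match.group("body")

    answer = reply.strip().lower()
    if answer.startswith("yes"):
        return f"I {body}"
    if answer.startswith("no"):
        return f"I don't {body}"
    if answer.startswith("sometimes"):
        return f"I sometimes {body}"
    raise ValueError(f"unsupported reply: {reply!r}")


if __name__ == "__main__":
    for option in ("Yes, I enjoy", "No, I don't", "Sometimes"):
        print(to_declarative("Do you enjoy cooking food?", option))
```

A production module would need to cover other question forms (for example "Which ...?" or either/or questions), which is where the part-of-speech identification described above would come in.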

The extraction module (17) extracts one or more keywords from each declarative sentence and extracts one or more conversations from the user log based on the keywords using robotic process automation (RPA). RPA, sometimes referred to as software robotics, is a software automation tool capable of developing a set of actions by observing a user performing those actions in a graphical user interface (GUI), and then automatically repeating those actions directly in the GUI.
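The two extraction steps can be sketched as follows. This is a simplified illustration, not the RPA-based retrieval of the disclosure: the user log is assumed to be an in-memory list of text conversations, keywords are obtained by dropping a small hand-picked stop-word list, and a conversation is retrieved when it shares at least one keyword with the declarative sentence. The names `STOP_WORDS`, `extract_keywords` and `find_conversations` are hypothetical.

```python
import re

# Hypothetical, deliberately small stop-word list for illustration only.
STOP_WORDS = {"i", "a", "an", "the", "do", "don't", "to", "of", "and", "is", "it", "my"}


def extract_keywords(sentence: str) -> set[str]:
    """Return the lower-cased tokens of a sentence that are not stop words."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return {token for token in tokens if token not in STOP_WORDS}


def find_conversations(keywords: set[str], user_log: list[str]) -> list[str]:
    """Return the logged conversations sharing at least one keyword, in log order."""
    return [entry for entry in user_log if keywords & extract_keywords(entry)]


if __name__ == "__main__":
    keywords = extract_keywords("I enjoy cooking food")
    log = ["Cooking pasta tonight, want some?", "The meeting moved to Friday."]
    print(find_conversations(keywords, log))
```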

The weighing module (18) weighs the declarative sentences and the extracted conversations based on keywords present in the declarative sentences and the conversations, respectively. Each generated declarative sentence receives a weight, which is determined based on the distance of the keywords from their synonyms, hyponyms and hypernyms in the knowledge base. The weight is generated using the equation weight = 1/x, where x is the number of times that the comparison module (19) has rejected the generated sentences.

In other words, the weight of each generated sentence is inversely related to the number of iterations. For example, the table below shows the sentences generated for a two-choice question: Should governments spend more on health or on education? Sentences No. 1 and 2 are generated in the first iteration; thus, they have the highest weight. However, no user content is similar to these sentences. The next sentences are No. 3 and 4, which have a very low similarity to the user’s content but have a weight of 0.5. After enough declarative sentences have been generated, the final answer generating module computes Σ(weight × similarity score) for each question option, namely education and health. In this example, the user is taken to believe that the government should spend more on education because the health option obtained a total score of 0.19 and the education option obtained a total score of 0.24.
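The scoring rule described above can be sketched as below. Two assumptions are made: x is taken to be the 1-based iteration in which a sentence was generated, so that first-iteration sentences have weight 1 and second-iteration sentences have weight 0.5, and the similarity scores are supplied by some external comparison step. The sample numbers are purely illustrative and are not the values of Table 1; the function names are hypothetical.

```python
from collections import defaultdict


def iteration_weight(iteration: int) -> float:
    """weight = 1/x, with x assumed to be the 1-based generation iteration."""
    if iteration < 1:
        raise ValueError("iteration must be 1 or greater")
    return 1.0 / iteration


def score_options(candidates: list[tuple[str, int, float]]) -> dict[str, float]:
    """Sum weight * similarity per option.

    Each candidate is (option, iteration, similarity_score).
    """
    totals: dict[str, float] = defaultdict(float)
    for option, iteration, similarity in candidates:
        totals[option] += iteration_weight(iteration) * similarity
    return dict(totals)


def best_option(totals: dict[str, float]) -> str:
    """Return the option with the highest total score."""
    return max(totals, key=totals.get)


if __name__ == "__main__":
    # Illustrative values only (not taken from Table 1).
    candidates = [
        ("health", 1, 0.0),
        ("education", 1, 0.0),
        ("health", 2, 0.2),
        ("education", 2, 0.3),
    ]
    totals = score_options(candidates)
    print(totals, best_option(totals))
```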

Table 1. Example declarative sentences and the corresponding weights and similarity scores

The comparison module (19) compares weights of each declarative sentence and the corresponding conversations to determine the most similar conversation. The content generation module (20) generates the text content based on the most similar conversation. Preferably, the comparison module (19) determines a conversation as the most similar conversation, if a difference between weights of the conversation and the corresponding declarative sentence is less than a threshold.

If none of the extracted conversations is determined as the most similar conversation, the comparison module (19) decreases the threshold by a predetermined value and then repeats the comparison process. Alternatively, the comparison module (19) may determine the most similar conversation by comparing the weights of the conversations, wherein the conversation with the highest weight is determined as the most similar conversation.
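The two selection rules can be sketched as follows. This is a minimal illustration under assumptions: weights are plain floats keyed by a hypothetical conversation identifier, the threshold test picks the conversation whose weight is closest to the sentence weight among those within the threshold, and the threshold adjustment loop is left out. Both function names are hypothetical.

```python
from typing import Optional


def most_similar_by_threshold(
    sentence_weight: float,
    conversation_weights: dict[str, float],
    threshold: float,
) -> Optional[str]:
    """Return the conversation whose weight differs least from the sentence weight,
    provided that the difference is below the threshold; otherwise return None."""
    best_id: Optional[str] = None
    best_diff = threshold
    for conversation_id, weight in conversation_weights.items():
        diff = abs(weight - sentence_weight)
        if diff < best_diff:
            best_id, best_diff = conversation_id, diff
    return best_id


def most_similar_by_weight(conversation_weights: dict[str, float]) -> str:
    """Alternative rule: the conversation with the highest weight wins."""
    return max(conversation_weights, key=conversation_weights.get)


if __name__ == "__main__":
    weights = {"c1": 0.9, "c2": 0.55, "c3": 0.2}
    print(most_similar_by_threshold(0.5, weights, threshold=0.1))
    print(most_similar_by_weight(weights))
```

When the threshold test returns `None`, the caller would adjust the threshold and call the function again, as described above.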

The display unit (11) presents the generated text content to the user, and the input unit (12) receives a user selection with respect to the presented text content. A publishing unit (15) publishes the presented text content as the user’s opinion based on the user selection, wherein the presented text content is published if the user selection includes an approval for publishing the presented text content. Preferably, the publishing unit (15) publishes the presented text content on a web page.

If the user selection includes a refusal to publish the presented text content, the processing unit (14) stops the publishing unit (15) from publishing the presented text content. By converting the questions and corresponding user replies into the declarative sentences, the present invention identifies a pattern of a potential opinion that may actually be provided by the user with respect to the questions. Since the final text is created based on the conversations closest to the declarative sentences, the present invention is capable of generating text content in a simple and effective manner, wherein the text content reflects the user’s opinion with respect to a specific topic or query.

FIGURE 2 shows a flow diagram of the method for text processing, in accordance with an exemplary embodiment of the present invention. The method (100) comprises the steps of: presenting, at a display unit, a plurality of multi-option questions and corresponding options to a user (101), wherein one or more options are selectable as an answer to each question; receiving, at an input unit, a user reply with respect to each question (102), wherein the user reply includes one or more options selected by the user; processing, at a processing unit, the user reply for generating a text content based on the user reply and a user log (103), wherein one or more conversations in which the user participated are stored in a storage unit as the user log; presenting, at the display unit, the generated text content to the user (104); receiving, at the input unit, a user selection with respect to the generated text content (105); and publishing the generated text content as the user’s opinion based on the user selection using a publishing unit (106), wherein the generated text content is published if the user selection includes an approval for publishing the generated text content.
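The order of steps 101 to 106 can be sketched as a small driver function. This is an assumption-laden outline rather than the claimed implementation: each unit is modelled as a plain callable passed in by the caller, and the names `Question`, `run_method`, `ask`, `generate`, `approve` and `publish` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Question:
    text: str
    options: Sequence[str]


def run_method(
    questions: Sequence[Question],
    ask: Callable[[Question], Sequence[str]],          # display + input units
    generate: Callable[[list], str],                   # processing unit
    approve: Callable[[str], bool],                    # display + input units
    publish: Callable[[str], None],                    # publishing unit
) -> Optional[str]:
    """Run steps 101-106 in order; return the published text, or None if refused."""
    replies = [(question, ask(question)) for question in questions]  # steps 101-102
    content = generate(replies)                                      # step 103
    if approve(content):                                             # steps 104-105
        publish(content)                                             # step 106
        return content
    return None


if __name__ == "__main__":
    published: list[str] = []
    question = Question("Do you enjoy cooking food?", ["Yes, I enjoy", "No, I don't", "Sometimes"])
    result = run_method(
        [question],
        ask=lambda q: [q.options[0]],
        generate=lambda replies: "I enjoy cooking food.",
        approve=lambda text: True,
        publish=published.append,
    )
    print(result, published)
```

Passing the units in as callables keeps the control flow testable without a real display, storage or publishing backend.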

Preferably, the conversations may include, but are not limited to, textual data, audio data, still image data, clip art data and moving image data. Furthermore, the conversations may be accumulated from the user’s email account, social media account, messaging account and storage folders within a user device including the display unit and the input unit.

Each question and corresponding user reply are converted into one or more declarative sentences using a sentence conversion module of the processing unit based on commonsense knowledge bases (e.g. ConceptNet) and linguistic ontological databases (e.g. WordNet). During the conversion process, different parts of the questions and user replies, and the type of each part, are identified, wherein the type of parts includes noun, pronoun, verb, adverb, adjective, conjunction or auxiliary verb. Furthermore, one or more synonyms, hyponyms and hypernyms for each identified part of the questions and user replies are generated by the sentence conversion module.

One or more keywords are extracted from each declarative sentence using an extraction module of the processing unit. Furthermore, one or more conversations are extracted from the user log by the extraction module based on the keywords using robotic process automation (RPA). Each of the declarative sentences and the corresponding extracted conversations is weighed using a weighing module of the processing unit based on keywords present in the declarative sentences and the conversations, respectively.

The weights of each declarative sentence and the corresponding conversations are compared using a comparison module of the processing unit to determine the most similar conversation. The text content is generated by a content generation module of the processing unit based on the most similar conversation. Preferably, a conversation is determined as the most similar conversation if a difference between the weights of the conversation and the corresponding declarative sentence is less than a threshold.

If none of the extracted conversations is determined as the most similar conversation, the threshold is decreased by a predetermined value and then the comparison process is repeated to determine the most similar conversation. Alternatively, the most similar conversation may also be determined by comparing the weights of the conversations, wherein the conversation with the highest weight is determined as the most similar conversation.

By converting the questions and corresponding user replies into the declarative sentences, the present invention identifies a pattern of a potential opinion that may actually be provided by the user with respect to the questions. Since the final text is created based on the conversations closest to the declarative sentences, the present invention is capable of generating text content in a simple and effective manner, wherein the text content reflects the user’s opinion with respect to a specific topic or query. The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting. As used herein, the singular forms "a", "an" and "the" may be intended to include the plural forms as well, unless the context clearly indicates otherwise.

The terms "comprises," "comprising," “including,” and “having,” are inclusive and therefore specify the presence of stated features, integers, steps, operations, elements, or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, or groups thereof.

The use of the expression “at least” or “at least one” suggests the use of one or more elements, as the use may be in one of the embodiments to achieve one or more of the desired objects or results.

While the foregoing describes various embodiments of the invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof. The scope of the invention is determined by the claims that follow. The invention is not limited to the described embodiments, versions or examples, which are included to enable a person having ordinary skill in the art to make and use the invention when combined with information and knowledge available to the person having ordinary skill in the art.

Claims

CLAIMS:

1. A system (10) for text processing, comprising: i. at least one display unit (11) for presenting a plurality of multi-option questions and corresponding options to a user, wherein one or more options are selectable as an answer to each question; ii. at least one input unit (12) for receiving a user reply with respect to each question, wherein said user reply includes one or more options selected by said user; iii. at least one storage unit (13) for storing user log, wherein said user log includes one or more conversations participated by said user; iv. at least one processing unit (14) for processing said user reply and for generating a text content based on said user reply and said user log, wherein said display unit (11) presents said generated text content to said user and said input unit (12) receives a user selection with respect to said created text content; and v. at least one publishing unit (15) for publishing said created text content as said user’s opinion based on said user selection, wherein said created text content is published if said user selection includes an approval for publishing said created text content, characterized in that said processing unit (14) includes:

- at least one sentence conversion module (16) for converting each question and corresponding user reply into one or more declarative sentences based on at least one of a commonsense knowledge-base and a linguistic ontological database;

- at least one extraction module (17) for extracting one or more keywords from each declarative sentence and for extracting one or more conversations from said user log based on said keywords using robotic process automation, RPA;

- at least one weighing module (18) for weighing said declarative sentences and said conversations based on keywords present in said declarative sentences and said conversations, respectively;

- at least one comparison module (19) for comparing weights of each declarative sentence and said corresponding conversations to determine the most similar conversation; and

- at least one content generation module (20) for generating said text content based on the most similar conversation.

2. The system (10) of claim 1, wherein said sentence conversion module (16) converts said questions and user replies into said declarative sentences by identifying different parts of said questions and user replies.

3. The system (10) of claim 2, wherein said sentence conversion module (16) identifies a type of each identified part of said questions and user replies.

4. The system (10) of claim 3, wherein said type of identified parts includes noun, pronoun, verb, adverb, adjective, conjunction or auxiliary verb.

5. The system (10) of claim 2, wherein said sentence conversion module (16) generates one or more synonyms, hyponyms and hypernyms for each identified part of said questions and user replies.

6. The system (10) of claim 1, wherein said conversations include at least one of textual data, audio data, still image data, clip art data and moving image data.

7. A method (100) for text processing, comprising the steps of: i. presenting, at at least one display unit, a plurality of multi-option questions and corresponding options to a user (101), wherein one or more options are selectable as an answer to each question; ii. receiving, at at least one input unit, a user reply with respect to each question (102), wherein said user reply includes one or more options selected by said user; iii. processing, at at least one processing unit, said user reply for generating a text content based on said user reply and a user log (103), wherein one or more conversations participated by said user are stored in a storage unit as said user log; iv. presenting, at said display unit, said generated text content to said user (104); v. receiving, at said input unit, a user selection with respect to said created text content (105); and vi. publishing said created text content as said user’s opinion based on said user selection using at least one publishing unit (106), wherein said created text content is published if said user selection includes an approval for publishing said created text content, characterized in that said step of processing said user reply includes:

- converting, at a sentence conversion module of said processing unit, each question and corresponding user reply into one or more declarative sentences based on at least one of a commonsense knowledge-base and a linguistic ontological database;

- extracting, at an extraction module of said processing unit, one or more keywords from each declarative sentence;

- extracting, at said extraction module, one or more conversations from said user log based on said keywords using robotic process automation, RPA;

- weighing said declarative sentences and said conversations based on keywords present in said declarative sentences and said conversations, respectively, using a weighing module of said processing unit;

- comparing weights of each declarative sentence and said corresponding conversations using a comparison module of said processing unit to determine the most similar conversation; and

- generating said text content based on the most similar conversation using a content generation module of said processing unit.

8. The method (100) of claim 7, wherein said step of converting said questions and corresponding user replies into one or more declarative sentences includes:

- identifying different parts of said questions and user replies;

- identifying a type of each identified part of said questions and user replies; and

- generating one or more synonyms, hyponyms and hypernyms for each identified part of said questions and user replies.

9. The method (100) of claim 8, wherein said type of identified parts includes noun, pronoun, verb, adverb, adjective, conjunction or auxiliary verb.

10. The method (100) of claim 7, wherein said conversations include at least one of textual data, audio data, still image data, clip art data and moving image data.