GB2520265A - Ranking Textual Candidates of controlled natural languages - Google Patents

Ranking Textual Candidates of controlled natural languages

Info

Publication number
GB2520265A
GB2520265A GB1319983.1A GB201319983A
Authority
GB
United Kingdom
Prior art keywords
context
hierarchy
textual
contexts
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1319983.1A
Other versions
GB201319983D0 (en)
Inventor
Thierry Kormann
Stephane Hillion
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp filed Critical International Business Machines Corp
Priority to GB1319983.1A priority Critical patent/GB2520265A/en
Publication of GB201319983D0 publication Critical patent/GB201319983D0/en
Priority to PCT/IB2014/065838 priority patent/WO2015071804A1/en
Publication of GB2520265A publication Critical patent/GB2520265A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/137 Hierarchical processing, e.g. outlines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/274 Converting codes to words; Guess-ahead of partial word inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/131 Fragmentation of text files, e.g. creating reusable text-blocks; Linking to fragments, e.g. using XInclude; Namespaces

Abstract

Disclosed is a method of ranking textual candidates of controlled natural languages, the textual candidates forming portions of a first context, a plurality of contexts being arranged in a hierarchy. The method comprises the steps of assigning a probability to text fragments forming the textual candidates in the first context in which it is desired to rank the textual candidate; assigning a probability to text fragments forming the textual candidates in contexts in the hierarchy other than the first context; calculating, for each of the textual candidates, a weighted sum of the assigned probability in the first context and the assigned probabilities in contexts in the hierarchy other than the first context, wherein the weighting applied to each of the assigned probabilities is inversely related to the hierarchical distance between the first context and contexts in the hierarchy other than the first context. The contexts may be paragraphs within the hierarchy of a document. A probability may be assigned according to how frequently a text fragment has been used within a context and/or how recently a text fragment has been used within a context.

Description

RANKING TEXTUAL CANDIDATES OF CONTROLLED NATURAL
LANGUAGES
FIELD OF THE INVENTION
[0001] The present invention relates to ranking textual candidates of controlled natural languages (CNLs) and more particularly, to the ranking of textual candidates by assigning a probability to text fragments forming the textual candidates using a hierarchical-driven ranking mechanism for text completions.
BACKGROUND
[0002] CNLs are subsets of natural languages that can be understood by computer systems because both the grammar and the vocabulary are restricted in order to reduce or remove ambiguity and complexity. Many computer systems, and more specifically editing tools, exist for CNLs. These often provide advanced features such as validation, syntax highlighting, or autocomplete. Autocomplete is a feature that automatically predicts the remaining words or phrases that the user wants to type, without the user having to type them completely. This feature is particularly effective when editing text written in highly structured, easy-to-predict languages such as CNLs. However, when a language has an extensive vocabulary, providing relevant textual candidates among the valid predictions computed by the system remains a challenge.
[0003] Another solution consists of ranking textual candidates based on the history of the most recently used and/or most frequently used words or phrases. This method provides interesting results but is mainly effective for repetitive tasks, or tasks that do not involve very frequent context switching.
[0004] A similar solution uses a word prediction algorithm and can use the semantics and the location of the text being entered to rank textual candidates. For instance, given a common text prefix, the completion menu of a code editor shows variables before class names within a method. This technique provides pertinent rankings but requires in-depth knowledge of the semantics of the entire language. Furthermore, the implementation of such algorithms is hard to achieve.
[0005] A further solution consists of annotating (or categorising) all the phrases of a vocabulary and declaring for each document (or part of it) which category or set of categories is permitted. This method can provide meaningful results but requires a difficult and time-consuming initial step. Furthermore, it may be difficult to anticipate user needs and find relevant categories for each sentence.
[0006] United States patent application US 2013/0041857 A1 discloses a system and method for the reordering of text predictions. The system and method reorders the text predictions based on modified probability values, wherein the probability values are modified according to the likelihood that a given text prediction will occur in the text inputted by a user. It further discloses that the ordering of predictions is allowed to be influenced by the likelihood that the predicted term or phrase belongs in the current contextual context, that is, in the current text sequence entered by a user. 'Non-local' context is allowed to be taken into account.
[0007] United States patent application US 2012/0029910 A1 discloses a system comprising a user interface configured to receive text input by a user, and a text prediction engine comprising a plurality of language models and configured to receive the input text from the user interface and to generate concurrently text predictions using the plurality of language models, wherein the text prediction engine is further configured to provide text predictions to the user interface for display and user selection. An analogous method and an interface for use with the system and method are also disclosed. The language model can be further configured to apply a topic filter. N-gram statistics yield estimates of prediction candidate probabilities based on local context, but global context also affects candidate probabilities. A topic filter actively identifies the most likely topic for a given piece of writing and reorders the candidate predictions accordingly. The topic filter takes into account the fact that topical context affects term usage. For instance, given the sequence "was awarded a", the likelihood of the following term being either "penalty" or "grant" is highly dependent on whether the topic of discussion is 'soccer' or 'finance'. Local n-gram context often cannot shed light on this, whilst a topic filter that takes the whole of a segment of text into account might be able to.
[0008] United States Patent 6,202,058 B1 discloses information presented to a user via an information access system being ranked according to a prediction of the likely degree of relevance to the user's interests. A profile of interests is stored for each user having access to the system. Items of information to be presented to a user are ranked according to their likely degree of relevance to that user and displayed in order of ranking. The prediction of relevance is carried out by combining data pertaining to the content of each item of information with other data regarding correlations of interests between users. A value indicative of the content of a document can be added to another value which defines user correlation, to produce a ranking score for a document. Alternatively, multiple regression analysis or evolutionary programming can be carried out with respect to various factors pertaining to document content and user correlation, to generate a prediction of relevance.
The user correlation data is obtained from feedback information provided by users when they retrieve items of information.
BRIEF SUMMARY OF THE INVENTION
[0009] Embodiments of the invention provide a method of ranking textual candidates of controlled natural languages, the textual candidates forming portions of a first context, a plurality of contexts being arranged in a hierarchy, the method comprising the steps of assigning a probability to text fragments forming the textual candidates in the first context in which it is desired to rank the textual candidate; assigning a probability to text fragments forming the textual candidates in contexts in the hierarchy other than the first context; calculating, for each of the textual candidates, a weighted sum of the assigned probability in the first context and the assigned probabilities in contexts in the hierarchy other than the first context, wherein the weighting applied to each of the assigned probabilities is inversely related to the hierarchical distance between the first context and contexts in the hierarchy other than the first context.
[0010] Preferably, the step of calculating takes the higher of (i) the weighted assigned probability in the first context and (ii) the weighted assigned probabilities in contexts in the hierarchy other than the first context.
[0011] Preferably, the weighting applied in the step of calculating in a first context to an assigned probability in a context in the hierarchy other than the first context differs from the weighting applied in a step of calculating in a context in the hierarchy other than the first context to an assigned probability in a first context.
[0012] In an embodiment, said contexts are paragraphs within the hierarchy of a document.
[0013] In another embodiment, said contexts are business rule packages within a business rule project.
[0014] In an embodiment, the method further comprises the steps of: receiving textual or non-textual input; and computing a set of textual candidates.
[0015] In an embodiment, a probability is assigned according to one or more of how frequently a text fragment has been used within a context and how recently a text fragment has been used within a context.
[0016] Embodiments of the invention further provide a system for ranking textual candidates of controlled natural languages, the textual candidates forming portions of a first context, a plurality of contexts being arranged in a hierarchy, the system comprising: a processing device for receiving text fragments forming the textual candidates; and a prediction ranker module for assigning a probability to the text fragments in the first context in which it is desired to rank the textual candidate, for assigning a probability to the text fragments in contexts in the hierarchy other than the first context, and for calculating, for each of the textual candidates, a weighted sum of the assigned probability in the first context and the assigned probabilities in contexts in the hierarchy other than the first context, wherein the weighting applied to each of the assigned probabilities is inversely related to the hierarchical distance between the first context and contexts in the hierarchy other than the first context.
[0017] Further embodiments of the invention provide a computer program product for ranking textual candidates of controlled natural languages, the textual candidates forming portions of a first context, a plurality of contexts being arranged in a hierarchy, the computer program product comprising: a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code adapted to perform the method described above when said program is run on a computer.
[0018] Embodiments of the invention provide the advantage that the ranking is done entirely by the method and system without the intervention of an expert. Another advantage is that the ranking dynamically takes into account any modifications made to documents. A further advantage is that hierarchically structured systems storing documents or text fragments tend to be naturally organised by topics, and therefore provide the appropriate information to compute a meaningful ranking. A yet further advantage is that assigning a probability to textual candidates based on where similar phrases have been used does not require in-depth knowledge of the language. Consequently, the approach is both relatively simple to implement and works for any CNL.
BRIEF DESCRIPTION OF THE DRAWINGS
[0019] Preferred embodiments of the present invention will now be described in more detail, by way of example only, with reference to the accompanying drawings, in which: Figure 1 shows an embodiment of a system for ranking textual candidates; Figure 2 shows an embodiment of a method of ranking textual candidates; Figure 3 shows a first embodiment having a rule project with local rankings and the global ranking of predictions within the "Upgrade" package; Figure 4 shows a table representing local scores of textual candidates on a per-package basis for use in the embodiment of figure 3; Figure 5 shows a table representing weights to apply when propagating phrases across packages for use in the embodiment of figure 3; Figure 6 shows the process of how to update the local ranking of predictions; Figure 7 shows the process of how the system updates weights of entities; Figure 8 shows a second embodiment having a document with local rankings and the global ranking of predictions within chapter 2, paragraph 2 of the document; Figure 9 shows a table representing local scores of textual candidates on a per-paragraph basis for use in the embodiment of figure 8; and Figure 10 shows a table representing weights to apply when propagating phrases across paragraphs for use in the embodiment of figure 8.
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0020] Embodiments of the present invention will be described hereinafter with reference to the implementation of the invention in a Business Rule Management System (BRMS). A BRMS is a software system enabling organizational policies and the repeatable decisions associated with those policies, such as claim approvals, pricing calculations and eligibility determinations to be defined, deployed, monitored and maintained separately from application code. Business rules include policies, requirements and conditional statements that are used to determine the tactical actions that take place in applications and systems.
However, the practical applications of embodiments of the present invention are not limited to this particular described environment. Embodiments of the present invention can find utility in any structured systems using controlled natural languages.
[0021] Embodiments of the present invention also relate to a method for suggesting relevant completions for an input text that is compliant to a CNL and typed to a text-oriented application running at a user computer. More specifically, the method and system provide a hierarchical-driven ranking mechanism for text completions.
[0022] Editing tools provide a way to organise or structure multiple text fragments.
There are many ways to organise text fragments using a CNL within a data processing system. In one embodiment, text fragments may be part of a single document. As an example, each paragraph may represent a separate entity and the document layout defines the structure and the relations between entities. In another embodiment, a system may store entities in a file system. How folders and files are organised defines the hierarchy and thus the relations between entities. Another embodiment consists of identifying the relations between entities by leveraging the grammar of the CNL. Each grammar construct helps define the structure and the relations across text fragments.
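By way of illustration only, the following Python sketch shows one possible in-memory representation of such a hierarchy of entities and their per-phrase local scores; the class, its field names and the example package layout are assumptions made for this sketch and are not part of the disclosed embodiments.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Entity:
    """A node of the hierarchy (e.g. a rule package or a document paragraph)."""
    name: str
    parent: Optional["Entity"] = None
    children: List["Entity"] = field(default_factory=list)
    # per-phrase usage counts within this entity, i.e. the local scores (Ls)
    local_scores: Dict[str, int] = field(default_factory=dict)

    def add_child(self, child: "Entity") -> "Entity":
        child.parent = self
        self.children.append(child)
        return child

# Example layout (assumed): a rule project as the root, with packages as children.
project = Entity("Rule Project 202")
for name in ("Checkout 204", "Upgrade 206", "Discount 208", "Refund 210"):
    project.add_child(Entity(name))
```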
[0023] Referring to figure 1, there is shown an embodiment of a system 150 for ranking textual candidates. Inputs are received by a processing device 152 which computes sets of textual candidates, recognises text fragments, detects that new text fragments have been input and analyses changes to the hierarchy. The results from the processing device 152 are then sent to a prediction ranker module 154 which ranks the sets of textual candidates and then makes suggestions to a user as to the most likely candidates, updates local scores (Ls) of phrases and updates weights associated with entities. The way in which these actions are achieved will be described below with reference to figures 2 to 10.
[0024] Referring to the example of a BRMS, a text fragment may be a business rule and an entity may be a rule package. A rule package may contain multiple business rules. Rule packages may be nested and thus define a hierarchy. A rule project has a set of top-level rule packages and represents the root of the hierarchy.
[0025] When the system determines that a user operation has changed a text fragment, such as a business rule, the textual candidates may be computed and ranked prior to being exposed to the user. For example, when a business expert changes a business rule, the rule editor may choose to parse the text and display all possible phrases that can be inserted at the current location.
[0026] Referring to figure 2, there is shown an embodiment of a method of ranking textual candidates. The method starts at step 102. At step 104, a processing device receives input.
The input may be non-textual input, such as, for example, digital ink input, speech input, or other input. With respect to the embodiment described below, the input is assumed to be text input.
[0027] At step 106, the input is recognised to compute a predicted set of textual candidates. The predicted set of textual candidates may be based on respective prefixes and one or more data sources such as the vocabulary.
[0028] At step 108, a prediction ranker module assigns a probability to each of the identified textual candidates in the predicted set of textual candidates. Step 108 will be described below in more detail with reference to figures 3 to 5. In one embodiment, the prediction ranker module ranks textual candidates prior to, at step 110, presenting the resulting sorted list to the user. In another embodiment, textual candidates are ranked in order to preselect, within a list sorted alphabetically, the textual candidate that has been identified as the most relevant one in the current authoring context. The method ends at step 112.
[0029] According to an embodiment of the invention, the prediction ranker module first assigns a score to each textual candidate by only considering the current entity being edited.
This score is referred to as a local score. The person skilled in the art may determine how to compute a relevant initial score for a given prediction according to any known method. In one embodiment, the local score may be how many times a phrase has been used within an entity. Another embodiment may choose to maintain the most frequently and the most recently used phrases within an entity and combine those values to get an initial score on a per-phrase basis.
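Purely as an illustration of the preceding paragraph, the sketch below combines a usage count with a recency bonus into a per-phrase local score (Ls); the half-life parameter, the particular combination and the function name are assumptions of this sketch rather than values prescribed by the described embodiments.

```python
import time

# Illustrative only: one way to combine how often and how recently a phrase
# was used within an entity into a single local score (Ls).
def local_score(frequency: int, last_used: float, now: float,
                recency_half_life: float = 3600.0) -> float:
    """frequency: times the phrase appears in the entity;
    last_used: timestamp of the most recent use;
    recency_half_life: seconds after which the recency bonus halves (assumed)."""
    age = max(now - last_used, 0.0)
    recency_bonus = 0.5 ** (age / recency_half_life)
    return frequency * (1.0 + recency_bonus)

# Example: a phrase used 8 times, last used 30 minutes ago
print(local_score(frequency=8, last_used=time.time() - 1800, now=time.time()))
```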
[0030] Referring to figure 3, shown are four rule packages (Refund 210, Discount 208, Upgrade 206 and Checkout 204) within a rule project 202, a local score (Ls) for each phrase and computed scores (Cs) 212 for predicting a textual candidate within the "Upgrade" 206 package.
[0031] The "Refund" 210 package contains several business rules and describes whether or not a refund should be done for a customer purchase. The business rules involved use three different phrases of the rule project vocabulary, that is, (i) the gross of invoice reffind amount; (ii) the service is authorized; and (iii) the gross charge of the service. For each of the different phrase, a local score is computed. In the example of figure 3, the local score is how many times each phrase has been used within the package. As can be seen from figure 3, the local scores for phrases (i), (ii) and (iii) arc 21, 13 and 4 respectively.
[0032] The "Discount" 208 package is used for computing a discount for a customer.
The phrases used by the business rules in the "Discount" 208 package are different from those used in the "Refund" 210 package to determine whether a refund should be given. The business rules involved use three different phrases of the rule project vocabulary, that is, (i) the category of the customer; (ii) the age of the customer; and (iii) the amount of the shopping cart. For each of the different phrases, a local score is computed. In the example of figure 3, the local score is how many times each phrase has been used within the package. As can be seen from figure 3, the local scores for phrases (i), (ii) and (iii) are 24, 10 and 20 respectively.
[0033] The "Upgrade" 206 package represents a package for managing customer categories. A user begins entering a text fragment in an existing business rule. Based on this context, the prediction ranker module generates for the "Upgrade" 206 package, a computed score (Cs) for each phrase 212 of the vocabulary as described below.
[0034] As outlined above, embodiments of the present invention disclose a method and system to rank textual candidates by leveraging the hierarchical structure in which the various text fragments are organised. Consequently, the prediction ranker module may rank textual candidates differently depending on where a prediction is requested. For instance, the computed score (Cs) of the phrase "the age of the customer" may be high in the "Discount" 208 package, but low in the "Refund" 210 package.
[0035] As further illustrated, several aspects of the hierarchy may be involved, to varying degrees, in the processing of a computed score (Cs). For a given location in the hierarchy and a given predicted phrase of the vocabulary, embodiments of the present invention compute a final score by considering each local score (Ls) of that phrase, weighing that local score (Ls) according to hierarchical characteristics, and taking the higher of the values.
[0036] In one embodiment, the distance between a pair of nodes, such as packages 204, 206, 208, 210, may be used to weight local scores. As an example, "the amount of the shopping cart" has a local score (Ls) of 20 in the "Discount" 208 package. The computed score (Cs) of this phrase in the "Upgrade" 206 package may be 10 if the weight to go from one node to its next sibling node is 0.5. Furthermore, this phrase also has a local score (Ls) of 8 in the "Upgrade" 206 package. The prediction ranker module may choose to use the maximum of all local scores (Ls) to get the computed score (Cs) of a textual candidate.
[0037] In another embodiment, the structure and content of any document (sections, paragraphs etc.) may provide a logical sequence of text fragments that can be used to rank predictions. For example, phrases intensively used in a paragraph may get a higher score in the immediately following paragraph than phrases used in a paragraph a few pages later.
[0038] In another embodiment, the number of different locations in which a phrase is used may also influence the computed score. For instance, a phrase that has been used three times in two separate packages may have a higher score than a phrase used six times in one package.
[0039] In another embodiment, the nature of a paragraph can also influence the probability associated with a textual candidate. For example, sentences or terms used in an introduction and in a conclusion of a document may have a higher probability associated with them than sentences and terms found in regular paragraphs of the document. An introduction is often the first section of a document and a conclusion is often the last. In this embodiment, the hierarchy of the document is leveraged to finely adjust probabilities. The introduction and conclusion are further apart in distance, but have the hierarchical relationship mentioned earlier in the paragraph. The nature of a paragraph can also influence the system. The prediction ranker module 154 can provide for special treatment for paragraphs at predefined locations within the document.
[0040] Figure 4 shows a table representing local scores of textual candidates on a per-package basis. The rows in the table represent the phrases used. The columns in the table represent the local scores (Ls) of the phrases in the package identified at the top of each column. The row and the column including "ellipsis" characters (...) indicate that other phrases and other packages have been omitted from the table for brevity and clarity. In the table, the phrase "the gross amount of invoice refund amount" has a local score (Ls) in the "Refund" 210 package of 21 and the phrase "the amount of the shopping cart" has a local score (Ls) in the "Upgrade" 206 package of 8. The other exemplary local scores (Ls) for other phrases and other packages can be seen in the table.
[0041] Figure 5 represents the weights to use when propagating a phrase from one package to another. The columns in the table represent the source packages of the local scores (Ls) of the phrases used. In the table of figure 5 they are shown in chronological order, but any other order may be used. The rows in the table represent the target packages of the local scores (Ls) of the phrases used. The row and the column including "ellipsis" characters (...) indicate that other packages have been omitted from the table for brevity and clarity. In the table, the weight to use when propagating a phrase from the source "Discount" 208 package to the target "Upgrade" 206 package can be seen to be equal to 0.5. Applying the weight to the example of figure 3, the local score (Ls) of the phrase "the amount of the shopping cart" in the source "Discount" 208 package is weighted by a factor of 0.5 to produce a computed score (Cs) in the target "Upgrade" 206 package for that phrase from the source package of 10, that is 0.5 times 20. Also in the table, the weight to use when propagating a phrase in the opposite direction, from the source "Upgrade" 206 package to the target "Discount" 208 package, can be seen to be equal to 0.4, which is different from the weighting of 0.5 used when propagating in the other direction. Applying the weight to the example of figure 3, the local score (Ls) of the phrase "the amount of the shopping cart" in the source "Upgrade" 206 package is weighted by a factor of 0.4 to produce a computed score (Cs) in the target "Discount" 208 package for that phrase from the source package of 3.2, that is 0.4 times 8. The other exemplary weights for other combinations of source and target packages can be seen in the table.
[0042] Various implementations can be realised but, as an example, a function returning the computed score of textual candidates may be:

Cs(p, e) = max{ w(x, e) * Ls(x, p) : x = 1, ..., n }

where:
Cs(p, e): a function giving the computed score of prediction 'p' within entity 'e';
w(x, e): a function returning the weight to apply for predictions propagating from entity 'x' to target entity 'e', as illustrated in Figure 5;
Ls(x, p): a function returning the local score within entity 'x' of prediction 'p', as illustrated in Figure 4;
n: the total number of entities (e.g. rule packages).

[0043] In the illustrated example, the ranking of textual candidates is obtained by combining, for each prediction, the local score (Ls) within a package and the weight associated with propagation from this package to the "Upgrade" 206 package. For example, the phrase "the category of the customer" has a computed score (Cs) of 12 because this phrase has a local score (Ls) of 24 in the "Discount" 208 package and the weight to go from the "Discount" 208 package to the "Upgrade" 206 package is 0.5.
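As a non-limiting illustration, the following Python sketch implements the function above using the values quoted in the text for figures 3 to 5; table entries that are not quoted are omitted, and the weight of 1.0 applied when the source and target entities coincide is an assumption of this sketch.

```python
from typing import Dict

# A minimal sketch of Cs(p, e) = max over x of w(x, e) * Ls(x, p),
# populated with the values quoted for figures 3-5. Missing entries count as 0.

# Ls(x, p): local score of phrase p within entity x (figure 4, partial)
local_scores: Dict[str, Dict[str, int]] = {
    "Discount": {"the category of the customer": 24,
                 "the age of the customer": 10,
                 "the amount of the shopping cart": 20},
    "Upgrade":  {"the amount of the shopping cart": 8},
}

# w(x, e): weight for propagating predictions from entity x to entity e (figure 5, partial)
weights: Dict[str, Dict[str, float]] = {
    "Discount": {"Upgrade": 0.5},
    "Upgrade":  {"Upgrade": 1.0, "Discount": 0.4},  # self-weight of 1.0 is assumed
}

def computed_score(phrase: str, target: str) -> float:
    """Cs(p, e): best weighted local score of `phrase` over all source entities."""
    return max((weights.get(source, {}).get(target, 0.0) * ls.get(phrase, 0)
                for source, ls in local_scores.items()), default=0.0)

# Reproduces the worked example: Cs = 12 for "the category of the customer"
# and Cs = max(8 * 1.0, 20 * 0.5) = 10 for "the amount of the shopping cart".
print(computed_score("the category of the customer", "Upgrade"))    # 12.0
print(computed_score("the amount of the shopping cart", "Upgrade")) # 10.0
```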
[0044] Another example is the computed score (Cs) within the "Upgrade" 206 package of the phrase "the amount of the shopping cart". The computed score is 10 because this is the maximum of 8 (the local score (Ls) within the "Upgrade" 206 package) and 10 (the local score (Ls) of 20 within the "Discount" 208 package multiplied by 0.5, the weight from Figure 5 to go from the source "Discount" 208 package to the target "Upgrade" 206 package).
[0045] In the example of figure 3, six textual candidates are shown. The prediction with the highest computed score (Cs) is "the category of the customer" with a computed score (Cs) of 12. This computed score (Cs) is obtained from the local score (Ls) of 24 in the "Discount" 208 package and the weighting of 0.5 applied from the table of figure 5 for propagation from the "Discount" 208 package to the "Upgrade" 206 package. The prediction for the phrase "the amount of the shopping cart" has a computed score (Cs) of 10, which is obtained as the higher of (i) the local score (Ls) of 8 in the "Upgrade" 206 package and (ii) the local score (Ls) of 20 in the "Discount" 208 package multiplied by the weighting of 0.5 applied from the table of figure 5 for propagation from the "Discount" 208 package to the "Upgrade" 206 package, giving a computed score (Cs) of 10.
[0046] Even though the phrase "the amount of the shopping cart" has a local score (Ls) of 8 and the phrase "the category of the customer" has a local score (Ls) of 0, the phrase "the category of the customer" has a higher computed score (Cs) because it is used more frequently in a package that is closely related in the hierarchy. This is in spite of the fact that the phrase "the category of the customer" has not been previously used in the "Upgrade" 206 package.
[0047] Once the prediction ranker engine has computed the final ranking based on the computed scores (Cs), textual candidates can be displayed to the user. It should, however, be realised that any other appropriate action can be taken. The global ranking of predictions represents the likely degree of relevance to the user's interests of each phrase at a given location in the hierarchy.
[0048] As the user continues to type, the system may recognise an operation that changes either the local scores (Ls) or the weights (shown in figure 5) associated with entities.
Figure 6 shows how the local ranking of textual candidates is updated.
[0049] Referring to figure 6, the method starts at step 502. At step 504, the processing device receives input. As described above with reference to figure 2, the input may be textual or non-textual input. At step 506, a text fragment is recognised and it is detected that a new phrase has been entered. In a typical embodiment, step 506 may involve a parser dedicated to the controlled natural language currently in use. At step 508, the prediction ranker module may be notified and one or more local scores of each phrase, within one or more packages, may be recalculated. The method ends at step 510.
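For illustration only, the sketch below shows the kind of update performed at step 508 when the local score is a simple usage count, as in the example of figure 3; the function and dictionary layout are illustrative assumptions.

```python
from typing import Dict

# Illustrative sketch of the update of figure 6: when the parser recognises that
# a phrase has been entered in the entity currently being edited, its local
# score within that entity is recalculated (here, a usage count is incremented).
def on_phrase_recognised(local_scores: Dict[str, Dict[str, int]],
                         entity: str, phrase: str) -> None:
    entity_scores = local_scores.setdefault(entity, {})
    entity_scores[phrase] = entity_scores.get(phrase, 0) + 1

# Example: the user types "the amount of the shopping cart" in the "Upgrade" package
scores = {"Upgrade": {"the amount of the shopping cart": 8}}
on_phrase_recognised(scores, "Upgrade", "the amount of the shopping cart")
print(scores["Upgrade"]["the amount of the shopping cart"])  # 9
```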
[0050] When the user performs an operation that modifies the hierarchy of how text fragments are organised, the system may need to check whether or not the weights need some adjustments. Figure 7 shows how the weights associated with each entity storing text fragments are updated.
[0051] Referring to figure 7, the method starts at step 602. At step 604, an application may receive an event that indicates that the hierarchy has been changed. For example, the application may be notified when a rule package has been added or when a new section has been inserted into a document. At step 606, the application may be configured to determine what kind of operation has been performed. For example, this step may be particularly useful in identifying what part of the hierarchy needs to be updated. At step 608, the prediction ranker module may update the weight associated with each entity. For example, when inserting a new rule package, the distance between two entities may get bigger and the respective weights associated with each of the entities may need to be adjusted. The method ends at step 610.
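As an illustration of step 608 only, the sketch below shows one way the weights could be recomputed after a hierarchy change so that they remain inversely related to the hierarchical distance; the exponential decay, the symmetric treatment of direction and the choice of base are assumptions of this sketch (figure 5, by contrast, shows direction-dependent weights).

```python
from typing import Dict, List

def distance(path_a: List[str], path_b: List[str]) -> int:
    """Number of edges between two nodes, given their paths from the root."""
    common = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)

def recompute_weights(paths: Dict[str, List[str]],
                      decay: float = 0.5 ** 0.5) -> Dict[str, Dict[str, float]]:
    """weights[source][target] = decay ** distance(source, target).

    The default decay is chosen so that two sibling packages (two edges apart)
    receive a weight of 0.5, matching the sibling weight used in figure 3."""
    return {src: {dst: decay ** distance(p_src, p_dst) for dst, p_dst in paths.items()}
            for src, p_src in paths.items()}

# Example: inserting an intermediate package below the project increases the
# distance between "Discount" and "Upgrade", so their mutual weight drops.
paths = {"Upgrade": ["project", "Upgrade"], "Discount": ["project", "Discount"]}
print(round(recompute_weights(paths)["Discount"]["Upgrade"], 2))  # 0.5

paths = {"Upgrade": ["project", "Upgrade"], "Discount": ["project", "New", "Discount"]}
print(round(recompute_weights(paths)["Discount"]["Upgrade"], 2))  # 0.35
```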
[0052] Referring to figure 8, shown are three paragraphs (Chapter 1, Paragraph 2 810; Chapter 2, Paragraph 1 808; and Chapter 2, Paragraph 2 806) within a document 802 having Chapters 814, 804, a local score (Ls) for each phrase and computed scores (Cs) 812 for predicting a textual candidate within Chapter 2, Paragraph 2 806.
[0053] Chapter 1, Paragraph 2 810 contains several phrases. The paragraphs involved use three different phrases of the vocabulary, that is, (i) "anger and rage"; (ii) "climate change"; and (iii) "the great recession". For each of the different phrases, a local score (Ls) is computed. In the example of figure 8, the local score (Ls) is how many times each phrase has been used within the paragraph. As can be seen from figure 8, the local scores (Ls) for phrases (i), (ii) and (iii) are 5, 8 and 4 respectively.
[0054] Chapter 2, Paragraph 1 808 also contains several phrases. The phrases used in Chapter 2, Paragraph 1 808 are different from those in Chapter 1, Paragraph 2 810. The paragraphs involved use three different phrases of the vocabulary, that is, (i) "linear no threshold"; (ii) "how's that working out for you?"; and (iii) "make no mistake about it". For each of the different phrases, a local score (Ls) is computed. In the example of figure 8, the local score (Ls) is how many times each phrase has been used within the paragraph. As can be seen from figure 8, the local scores (Ls) for phrases (i), (ii) and (iii) are 2, 6 and 7 respectively.
[0055] Chapter 2, Paragraph 2 806 also contains several phrases. A user begins entering a text fragment in the paragraph. Based on this context, the prediction ranker module 154 generates for Chapter 2, Paragraph 2 806, a computed score (Cs) for each phrase 812 of the vocabulary as described below.
[0056] As outlined above, embodiments of the present invention disclose a method and system to rank textual candidates by leveraging the hierarchical structure in which the various text fragments are organised. Consequently, the prediction ranker module 154 may rank textual candidates differently depending on where a prediction is requested. For instance, the computed score (Cs) of the phrase "how's that working out for you?" may be high in Chapter 2, Paragraph 1 808, but low in Chapter 1, Paragraph 2 810.
[0057] As further illustrated, several aspects of the hierarchy may be involved, to varying degrees, in the processing of a computed score (Cs). For a given location in the hierarchy and a given predicted phrase of the vocabulary, embodiments of the present invention compute a final score by considering each local score (Ls) of that phrase, weighing that local score (Ls) according to hierarchical characteristics, and taking the higher of the values.
[0058] In one embodiment, the distance between a pair of nodes, such as paragraphs 806, 808, 810, may be used to weight local scores. As an example, "the great recession" has a local score (Ls) of 4 in Chapter 1, Paragraph 2 810. The computed score (Cs) of this phrase in Chapter 2, Paragraph 2 806 may be 2 if the weight to go from one node to its next sibling node is 0.5. However, this phrase also has a local score (Ls) of 3 in Chapter 2, Paragraph 2 806. The prediction ranker module may choose to use the maximum of all local scores (Ls) to get the computed score (Cs) of a textual candidate.
[0059] In another embodiment, the structure and content of any document (sections, paragraphs etc.) may provide a logical sequence of text fragments that can be used to rank predictions. For example, phrases intensively used in a paragraph may get a higher score in the immediately following paragraph than phrases used in a paragraph a few pages later.
[0060] In another embodiment, the number of different locations in which a phrase is used may also influence the computed score. For instance, a phrase that has been used three times in two separate paragraphs may have a higher score than a phrase used six times in one paragraph.
[0061] Figure 9 shows a table representing local scores of textual candidates on a per-paragraph basis. In the table, the phrase "the great recession" has a local score (Ls) in Chapter 1, Paragraph 2 810 of 4 and the phrase "make no mistake about it" has a local score (Ls) in Chapter 2, Paragraph 1 808 of 7. The other exemplary local scores (Ls) for other phrases and other paragraphs can be seen in the table.
[0062] Figure 10 represents the weights to use when propagating a phrase from one paragraph to another. In the table, the weight to use when propagating a phrase from the source Chapter 2, Paragraph 1 808 to the target Chapter 2, Paragraph 2 806 can be seen to be equal to 0.25. Applying the weight to the example of figure 8, the local score (Ls) of the phrase "make no mistake about it" in the source Chapter 2, Paragraph 1 808 is weighted by a factor of 0.25 to produce a computed score (Cs) in the target Chapter 2, Paragraph 2 806 for that phrase from the source paragraph of 1.75, that is 0.25 times 7. Also in the table, the weight to use when propagating a phrase in the opposite direction, from the source Chapter 2, Paragraph 2 806 to the target Chapter 2, Paragraph 1 808, can be seen to be equal to 0.2, which is different from the weighting of 0.25 used when propagating in the other direction.
Applying the weight to the example of figure 8, the local score (Ls) of the phrase "make no mistake about it" in the source Chapter 2, Paragraph 2 806 is weighted by a factor of 0.2 to produce a computed score (Cs) in the target Chapter 2, Paragraph 1 808 for that phrase from the source paragraph of 0, that is 0.2 times 0. The other exemplary weights for other combinations of source and target paragraphs can be seen in the table.
[0063] Embodiments of the invention can take the form of a computer program accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer-readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus or device.
[0064] The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk read only memory (CD-ROM), compact disk read/write (CD-RW), and DVD.

Claims (16)

  1. A method of ranking textual candidates of controlled natural languages, the textual candidates forming portions of a first context, a plurality of contexts being arranged in a hierarchy, the method comprising the steps of: assigning a probability to text fragments forming the textual candidates in the first context in which it is desired to rank the textual candidate; assigning a probability to text fragments forming the textual candidates in contexts in the hierarchy other than the first context; calculating, for each of the textual candidates, a weighted sum of the assigned probability in the first context and the assigned probabilities in contexts in the hierarchy other than the first context, wherein the weighting applied to each of the assigned probabilities is inversely related to the hierarchical distance between the first context and contexts in the hierarchy other than the first context.
  2. A method as claimed in claim 1, wherein the step of calculating takes the higher of (i) the weighted assigned probability in the first context and (ii) the weighted assigned probabilities in contexts in the hierarchy other than the first context.
  3. A method as claimed in claim 1, wherein the weighting applied in the step of calculating in a first context to an assigned probability in a context in the hierarchy other than the first context differs from the weighting applied in a step of calculating in a context in the hierarchy other than the first context to an assigned probability in a first context.
  4. A method as claimed in claim 1, wherein said contexts are paragraphs within the hierarchy of a document.
  5. A method as claimed in claim 1, wherein contexts are business rule packages within the hierarchy of a business rule project.
  6. A method as claimed in claim 1, further comprising the steps of: receiving textual or non-textual input; and computing a set of textual candidates.
  7. A method as claimed in claim 1, wherein a probability is assigned according to one or more of how frequently a text fragment has been used within a context and how recently a text fragment has been used within a context.
  8. A system for ranking textual candidates of controlled natural languages, the textual candidates forming portions of a first context, a plurality of contexts being arranged in a hierarchy, the system comprising: a processing device for receiving text fragments forming the textual candidates; and a prediction ranker module for assigning a probability to the text fragments in the first context in which it is desired to rank the textual candidate, for assigning a probability to the text fragments in contexts in the hierarchy other than the first context, and for calculating, for each of the textual candidates, a weighted sum of the assigned probability in the first context and the assigned probabilities in contexts in the hierarchy other than the first context, wherein the weighting applied to each of the assigned probabilities is inversely related to the hierarchical distance between the first context and contexts in the hierarchy other than the first context.
  9. A system as claimed in claim 8, wherein the prediction ranker module calculates the weighted sum by taking the higher of (i) the weighted assigned probability in the first context and (ii) the weighted assigned probabilities in contexts in the hierarchy other than the first context.
  10. A system as claimed in claim 8, wherein the weighting applied in the step of calculating in a first context to an assigned probability in a context in the hierarchy other than the first context differs from the weighting applied in a step of calculating in a context in the hierarchy other than the first context to an assigned probability in a first context.
  11. A system as claimed in claim 8, wherein said contexts are paragraphs within the hierarchy of a document.
  12. A system as claimed in claim 8, wherein contexts are business rule packages within the hierarchy of a business rule project.
  13. A system as claimed in claim 8, wherein: the processing device receives textual or non-textual input; and the prediction ranker module computes a set of textual candidates.
  14. A system as claimed in claim 8, wherein a probability is assigned according to one or more of how frequently a text fragment has been used within a context and how recently a text fragment has been used within a context.
  15. A computer program product for ranking textual candidates of controlled natural languages, the textual candidates forming portions of a first context, a plurality of contexts being arranged in a hierarchy, the computer program product comprising: a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code adapted to perform the method of any one of claim 1 to claim 7 when said program is run on a computer.
  16. A method substantially as hereinbefore described, with reference to figures 1 to 10 of the accompanying drawings.
GB1319983.1A 2013-11-13 2013-11-13 Ranking Textual Candidates of controlled natural languages Withdrawn GB2520265A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB1319983.1A GB2520265A (en) 2013-11-13 2013-11-13 Ranking Textual Candidates of controlled natural languages
PCT/IB2014/065838 WO2015071804A1 (en) 2013-11-13 2014-11-06 Ranking prediction candidates of controlled natural languages or business rules depending on document hierarchy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1319983.1A GB2520265A (en) 2013-11-13 2013-11-13 Ranking Textual Candidates of controlled natural languages

Publications (2)

Publication Number Publication Date
GB201319983D0 GB201319983D0 (en) 2013-12-25
GB2520265A true GB2520265A (en) 2015-05-20

Family

ID=49818526

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1319983.1A Withdrawn GB2520265A (en) 2013-11-13 2013-11-13 Ranking Textual Candidates of controlled natural languages

Country Status (2)

Country Link
GB (1) GB2520265A (en)
WO (1) WO2015071804A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3611636A4 (en) * 2017-04-11 2020-04-08 Sony Corporation Information processing device and information processing method

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10089297B2 (en) * 2016-12-15 2018-10-02 Microsoft Technology Licensing, Llc Word order suggestion processing

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080072143A1 (en) * 2005-05-18 2008-03-20 Ramin Assadollahi Method and device incorporating improved text input mechanism
EP2109046A1 (en) * 2008-04-07 2009-10-14 ExB Asset Management GmbH Predictive text input system and method involving two concurrent ranking means
US20120029910A1 (en) * 2009-03-30 2012-02-02 Touchtype Ltd System and Method for Inputting Text into Electronic Devices
US20120296627A1 (en) * 2011-05-18 2012-11-22 Microsoft Corporation Universal text input
US20130041857A1 (en) * 2010-03-04 2013-02-14 Touchtype Ltd System and method for inputting text into electronic devices

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6377965B1 (en) * 1997-11-07 2002-04-23 Microsoft Corporation Automatic word completion system for partially entered data
US7296223B2 (en) * 2003-06-27 2007-11-13 Xerox Corporation System and method for structured document authoring
US7657423B1 (en) * 2003-10-31 2010-02-02 Google Inc. Automatic completion of fragments of text

Also Published As

Publication number Publication date
WO2015071804A1 (en) 2015-05-21
GB201319983D0 (en) 2013-12-25

Similar Documents

Publication Publication Date Title
US11720572B2 (en) Method and system for content recommendation
Gambhir et al. Recent automatic text summarization techniques: a survey
US10579656B2 (en) Semantic query language
US8346795B2 (en) System and method for guiding entity-based searching
Hong et al. Improving the estimation of word importance for news multi-document summarization
JP5243167B2 (en) Information retrieval system
US9588960B2 (en) Automatic extraction of named entities from texts
CN1871603B (en) System and method for processing a query
US7284009B2 (en) System and method for command line prediction
Verberne et al. Evaluation and analysis of term scoring methods for term extraction
US20160140123A1 (en) Generating a query statement based on unstructured input
RU2592395C2 (en) Resolution semantic ambiguity by statistical analysis
RU2579699C2 (en) Resolution of semantic ambiguity using language-independent semantic structure
US20120278341A1 (en) Document analysis and association system and method
US9639522B2 (en) Methods and apparatus related to determining edit rules for rewriting phrases
CA2701171A1 (en) System and method for processing a query with a user feedback
JPH1173417A (en) Method for identifying text category
GB2397147A (en) Organising, linking and summarising documents using weighted keywords
RU2579873C2 (en) Resolution of semantic ambiguity using semantic classifier
Roul et al. A new automatic multi-document text summarization using topic modeling
US20200311203A1 (en) Context-sensitive salient keyword unit surfacing for multi-language survey comments
KR100795930B1 (en) Method and system for recommending query based search index
WO2015071804A1 (en) Ranking prediction candidates of controlled natural languages or business rules depending on document hierarchy
JPH11120206A (en) Method and device for automatic determination of text genre using outward appearance feature of untagged text
Keith et al. Performance impact of stop lists and morphological decomposition on word–word corpus-based semantic space models

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)