GB2367914A - An iterative method for identification of an item from a dataset by selecting questions to eliminate the maximum number of items at each iteration


Info

Publication number
GB2367914A
Authority
GB
United Kingdom
Prior art keywords
questions
candidate items
items
question
user
Prior art date
Legal status
Withdrawn
Application number
GB0024633A
Other versions
GB0024633D0 (en)
GB2367914A9 (en)
Inventor
David Boris Johnson-Davies
Current Assignee
JOHNSON DAVIES DAVID BORIS
Original Assignee
JOHNSON DAVIES DAVID BORIS
Priority date
2000-10-09
Filing date
2000-10-09
Publication date
2002-04-17
Application filed by JOHNSON DAVIES DAVID BORIS
Priority to GB0024633A
Publication of GB0024633D0
Publication of GB2367914A
Publication of GB2367914A9
Withdrawn (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G06F16/532 Query formulation, e.g. graphical querying

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

A method and system for identifying an item from a dataset comprises the steps of presenting a series of questions to a user, each question being iteratively selected to maximise the number of candidate items eliminated. Items are stored in a dataset and a number of characteristics are associated with each item. The dataset further comprises a number of questions, each question being associated with a characteristic. In use, the questions are asked in series to narrow down the number of candidate items. On receiving each answer, each possible subsequent question is rated, and the question with the highest rating is presented to the user. Preferably, the rating formula uses a weighted probability function to calculate the number of items likely to be eliminated. If the user provides an answer indicating that none of the remaining characteristic features are present, no reduction is made in the number of items.

Description

A Computer-based Interactive Item Identification System
The present invention relates to a computer-based interactive system for identifying one or more candidate items from amongst a range of possible candidate items.
Many items can be grouped or collected into sets of similar items. Once such a set is formed, however, it can be difficult to select a desired item from the set according to its known characteristics, or to match a known item to one already in the set on the basis that such items have similar characteristics.
For example, a person may wish to identify a particular font used on printed matter. This may be necessary if it is desired to produce new additional printed matter with text in a font that matches the earlier printed matter.
Although workers in a printing company may have sufficient experience to identify a particular font, or at least to match it with another very similar font, this is a difficult task for a non-expert user.
It is an object of the present invention to provide a computer-based interactive system for identifying one or more candidate items from amongst a range of possible candidate items.
According to the invention, there is provided a computer-based apparatus for matching an unidentified item with one or more candidate items from amongst a set of a plurality of possible candidate items, comprising data processing means, a memory for storing data, user data entry means for entering information input into the computer, and a user display for displaying information output by the computer, wherein:
- the memory stores data relating to said set; said set data includes at least one characteristic feature associated with each item in said set, and a plurality of questions each of which, when presented to a user on the user display, asks about the presence or absence of one or more characteristic features;
- the data processing means is arranged to present in stages a series of said questions on the user display, the answers to said questions determining whether or not items remain possible candidate items; and
- the questions are selected by the data processing means at each stage in order to eliminate as many candidate items as possible at each stage of questioning.
Also according to the invention, there is provided a method for identifying one or more candidate items from amongst a set of a plurality of possible candidate items, using an apparatus comprising data processing means, a memory for storing data, user data entry means for entering information input into the computer, and a user display for displaying information output by the computer, wherein the method comprises the steps of:
- storing data relating to said set in the memory;
- including in said set data at least one characteristic feature associated with each item in said set, and a plurality of questions each of which, when presented to a user on the user display, asks about the presence or absence of one or more characteristic features;
- presenting in stages a series of said questions on the user display, the answers to said questions determining whether or not items remain possible candidate items; and
- selecting the questions at each stage in order to eliminate as many candidate items as possible at each stage of questioning.
In order to determine which questions should be asked at each stage of questioning, the data processing means may calculate which questions can be expected to eliminate the greatest number of candidates at each stage of questioning assuming that each of the remaining candidates is equally likely. If it is known that some candidates are more likely than others, for example based on a history of previous question and answer sessions, then the questions can be selected according to this unequal weighting of the remaining candidates.
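The paragraph above describes choosing the question expected to eliminate the most candidates, either treating every remaining candidate as equally likely or weighting them, for example by the history of earlier sessions. The following is a minimal Python sketch of that expected-elimination calculation, not the patent's own implementation: the function name, the dictionary layout and the per-candidate `weights` argument are illustrative assumptions. With equal weights, and when every remaining candidate has a value for the feature, it gives the same number as the rating R defined in step 5 of the identification procedure.

```python
from collections import defaultdict


def expected_eliminations(values, weights=None):
    """Expected number of candidates removed by asking about one feature.

    values:  dict mapping candidate name -> that candidate's value for the
             feature the question asks about (e.g. "square", "circle").
    weights: optional dict mapping candidate name -> relative likelihood
             (a hypothetical input, e.g. counts from earlier sessions).
             If omitted, all remaining candidates are equally likely.
    """
    if weights is None:
        weights = {cand: 1.0 for cand in values}
    n = len(values)
    count = defaultdict(int)    # n(v): how many candidates have value v
    mass = defaultdict(float)   # total weight of candidates with value v
    for cand, v in values.items():
        count[v] += 1
        mass[v] += weights[cand]
    total = sum(mass.values())
    # The answer is v with probability mass[v] / total; that answer
    # eliminates the n - count[v] candidates holding any other value.
    return sum(mass[v] / total * (n - count[v]) for v in count)
```

With 20 "square", 30 "circle" and 50 "diamond" candidates and no weights, this returns 62. Weighting the "diamond" candidates twice as heavily lowers the result to 58, because the answer "diamond", which removes only 50 candidates, becomes more likely.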
After all possible questions have been answered by the user, the computer-based system can present to the user the sole candidate, or the minimum number of possible candidates, that has been identified.
The invention will now be described in further detail by way of example, with reference to the accompanying drawings, in which: Figure 1 shows an embodiment of computer-based apparatus for identifying candidate items from amongst a set of a plurality of possible candidate items, including a screen displaying a question and a set of possible answers to that question; and
Figure 2 shows another screen display displaying another question and another set of possible answers to that question.
Figure 1 shows schematically an apparatus 1 for identifying a particular font or typeface. The apparatus includes a microcomputer 2. The microcomputer 2 is in this example a standard personal computer. The computer is connected 4 to a standard user display 6 for displaying information to a user, and also connected 8,12 to a standard keyboard 10 and mouse 14 by which the user may enter information. Optionally, the display 6 is a touch sensitive display by which the user may enter information via the display 6.
The computer 2 includes a microprocessor 16 and a memory 18, which term includes both solid state and disk-based memory.
The memory 18 is loaded with data, referred to herein as "set data", relating to a list of items (items I1, I2, ...) 20, and, associated with each item in the list, one or more characteristic features (CF) 22, which in the memory 18 are each represented by a different numerical value. For example, the first item I1 may have three characteristic features, including CF2, that are respectively represented in memory by discrete values. The second item I2 may have four characteristic features, including CF2 and CF3, that are respectively represented in memory by four distinct values. In general, an item IN will have "x" characteristic features CFa, CFb, CFc, ... that are respectively represented in memory by "x" discrete values, where "x" is an integer greater than or equal to 1.
The memory 18 also stores a separate list of questions 24 (Q1 to Qmax) not associated with any of the items 20 in particular. Associated with each question (Q) 24 are possible answers which, when given by the user, indicate the presence or absence of characteristic features (CF) 25, and which are represented in memory by distinct values.
For example, the first question Q1 may have two characteristic features, one of which is CF3. The second question Q2 may have four answers, including CF4 and CF5. In general, a question QN will have "x" characteristic features CFa, CFb, CFc, ..., where "x" is an integer greater than or equal to 1.
The answer to a question indicates the presence of one of the characteristic features 25 presented in the question, and this is then matched to the characteristic features 22 of the remaining items 20. Alternatively, the answer could be "not sure", in which case no match is possible. In another arrangement, the answers could be the mutually exclusive answers "affirmative", "negative" or "not sure". This will result in a characteristic feature 25 being present or absent, which is then matched to the characteristic features 22 of the remaining items 20.
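To make the matching step concrete, here is a minimal Python sketch of the set data and of applying one answer to the remaining items. The class and function names are hypothetical, and features are stored as readable strings rather than the numerical codes described above. As an assumption of this sketch, an item with no recorded value for the feature is treated as not matching.

```python
from __future__ import annotations

from dataclasses import dataclass, field

NOT_SURE = "not sure"


@dataclass
class Question:
    feature: str        # the feature the question asks about
    text: str           # the prompt shown on the user display
    options: list[str]  # characteristic features offered as answers


@dataclass
class Item:
    name: str
    features: dict[str, str] = field(default_factory=dict)  # feature -> value


def apply_answer(candidates, question, answer):
    """Keep the candidates consistent with one answer.

    A "not sure" answer keeps every candidate; any other answer keeps only
    the items whose value for the question's feature equals the answer.
    """
    if answer == NOT_SURE:
        return list(candidates)
    if answer not in question.options:
        raise ValueError(f"{answer!r} is not an option for {question.feature}")
    return [item for item in candidates
            if item.features.get(question.feature) == answer]
```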
In the particular example illustrated in Figure 1, the display 6 shows a question regarding the identification of a font style. Here, a question 124 is presented: "Is the 'a' single-story or double-story?". Three possible answers 125, 225, 325, which are the characteristic features "Double storey" and "Single storey" and the option "not sure", are presented together with appropriate graphic illustrations 30, 31, 32.
This question 124 can be answered via either the keyboard 10 or the mouse 14. If the answer is anything other than "not sure", the processor 16 is then able to eliminate those potential candidate items 20 that do not have the selected characteristic feature 22, that is, the single-storey or the double-storey "a". In this example, it is preferable to include the "not sure" answer, which indicates that neither of the characteristic features "Double storey" or "Single storey" can be said to be present, as the user may be trying to identify the font from a section of text that does not include the lower-case letter "a". If the answer is "not sure", then no reduction in the number of potential candidate items can be made.
Figure 2 shows another example of a screen display 33 in which a question 224 is asked: "What is the shape of the dot on the 'i' or 'j'?" There are five possible answers 425, 525, 625, 725, 825, representing the characteristic features "Circular dot", "Square dot", "Diamond-shaped dot" or "No dot", and the option "Not sure". These characteristic features are represented by appropriate graphic illustrations 34, 35, 36, 37, 38. In this example, it is preferable to include the "Not sure" answer, as the user may be trying to identify the font from a section of text that does not include the lower-case letter "i" or "j". If the answer is "Not sure", then none of the characteristic features can be said to be present, and no reduction in the number of potential candidate items can be made.
The system can be designed to accept more than one answer. For example, if the user selects both "Square dot" and "Diamond dot", this is equivalent to saying that neither the characteristic feature "Circular dot" nor "No dot" is present, and a corresponding reduction in the remaining candidate items can be made.
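Accepting several answers amounts to keeping every candidate whose value is one of the ticked options; every option left unticked is treated as absent. A minimal Python sketch follows, in which the function name, the dictionary layout and the feature key `"i_dot"` are illustrative assumptions.

```python
def apply_multi_answer(candidates, feature, accepted):
    """Keep candidates whose value for `feature` is one of the accepted values.

    candidates: dict mapping candidate name -> {feature: value}
    accepted:   set of the answers the user ticked; every value that was
                not ticked is treated as absent.
    """
    return {name: feats for name, feats in candidates.items()
            if feats.get(feature) in accepted}


fonts = {
    "F1": {"i_dot": "circle"},
    "F2": {"i_dot": "square"},
    "F3": {"i_dot": "diamond"},
    "F4": {"i_dot": "none"},
}
# Ticking "square" and "diamond" rules out both "circle" and "none".
remaining = apply_multi_answer(fonts, "i_dot", {"square", "diamond"})
```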
In the case of font identification, each font can be classified according to a number of features, each of which can have two or more mutually exclusive values.
Therefore, each font is one of the items in the list of items 20, and is characterised by numerous associated characteristic features 22. The identification procedure involves asking a series of the questions 24, each of which determines whether a characteristic feature 25 is present, until either a single candidate typeface (i.e. a particular item 20) is uniquely identified, or a greatly reduced number of potential candidate typefaces (i.e. a reduced number of items 20) has been identified.
The memory 18 only needs to contain information about characteristic features 22, 25 that will be particularly useful in distinguishing a typeface 20 from the other typefaces in the database.
The identification procedure is constructed so that at each stage the processor 16 selects a question 24 that is likely to be most effective in reducing the size of the list of potential typeface candidates 20.
As a result of this, the processor 16 does not ask an inappropriate question. For example, if all the remaining candidates are serif typefaces it will not ask a question relevant only to sans-serif typefaces.
Apart from the first question, the sequence of questions will in general vary from session to session depending on the earlier answers.
The mechanism of the identification procedure is as follows. The following terms are used in this explanation:
An 'answer' is the user's reply to a question about a particular characteristic feature, and is a value "v" of that feature.
'Answers' is the list of replies given by the user.
'Potential candidates' is the list of typefaces that remain as potential solutions to the identification procedure, because they have not been eliminated by previous questions.
'Potential questions' are the remaining questions, i.e. those questions relating to characteristic features that have not yet been identified by the user.
At any stage during the identification procedure a state can be defined by the list of answers already given by the user.
It should be noted that although Figure 1 shows the procedure running on a single computer, it would of course be possible for the process to be spread over a network or the Internet, for example with a user at a terminal computer interacting with a host at a remote location. In this case, the host would most likely have the processor 16 and memory 18, while the display 6 and data entry means 10, 14 would be at the user's site. In the Web-based implementation of the procedure, the characteristic features 25 selected in the user's answers are supplied to the remote Web server as an encoded string of letters and digits. Note that this is the only state information that needs to be given to the processor 16 at each stage in an identification procedure. The Web server processes this response to retain the candidate items 20 for which the corresponding characteristic features 22 are present.
From this a list of characteristic features that are present can be built up, together with a list of questions 24 that have already been asked. Initially the list of answers is blank.
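The description states only that the answers travel to the Web server as an encoded string of letters and digits. One possible encoding is sketched below in Python as an assumption, not the scheme actually used: each answer becomes a letter identifying the question (by index) followed by a digit identifying the chosen option, which limits this sketch to 26 questions and 10 options per question. The function names are hypothetical.

```python
import string

LETTERS = string.ascii_lowercase  # one letter per question index (at most 26)


def encode_answers(answers):
    """Pack [(question_index, option_index), ...] into a string like 'a1c0'."""
    parts = []
    for q, o in answers:
        if not (0 <= q < len(LETTERS) and 0 <= o <= 9):
            raise ValueError(f"cannot encode answer {(q, o)}")
        parts.append(LETTERS[q] + str(o))
    return "".join(parts)


def decode_answers(state):
    """Inverse of encode_answers: 'a1c0' -> [(0, 1), (2, 0)]."""
    if len(state) % 2:
        raise ValueError("state string must have an even length")
    return [(LETTERS.index(state[i]), int(state[i + 1]))
            for i in range(0, len(state), 2)]
```

Because the whole session state fits in this string, a server can rebuild the candidate list from scratch on every request and keep no per-user state of its own.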
Any feature can also have the value 'Not sure', corresponding to the case where the user clicks the 'Not sure' button in reply to a question. This is handled by a special case in the following procedure:
1) Calculate the set of candidates for the list of answers supplied, as follows:
a) Start with a list of all the typefaces 20 in the memory 18.
b) For each answer:
- if the value is 'Not sure', then ignore it;
- else remove all typefaces that have a different characteristic feature for that question.
2) If there is only one potential candidate left in the candidates list 20, display it as the identified typeface.
3) If there are no potential candidates left in the candidates list 20, display to the user that no typefaces match the sequence of answers given by the user.
4) Recalculate the set of potential questions 24 as follows:
a) Start with a list of all the questions 24 in the memory 18.
b) For each supplied answer 25, remove the question 24 corresponding to that answer.
5) Use the processor 16 to calculate the best question to ask, that is, the question that will eliminate as many possible candidate items as possible at each stage of questioning. This can be done by calculating a rating R for each particular characteristic feature 26 associated with each question 24 amongst the list of remaining potential questions. The rating R is calculated as follows:
a) For each value v of a potential characteristic feature, define n(v) to be the number of potential candidates that have value v.
b) Define a total T to be the sum of all the n(v) values for all potential characteristic features:
$$T = \sum_{v} n(v) \qquad (1)$$
c) Define N = total number of remaining candidates.
d) The rating R can then be defined as:
$$R = \frac{1}{N}\sum_{v} n(v)\,\bigl[T - n(v)\bigr] \qquad (2)$$
$$R = \frac{1}{N}\sum_{v} n(v)\,\Bigl[\sum_{v'} n(v') - n(v)\Bigr] \qquad (3)$$
For example, if the feature 'i dot shape' has values 'square', 'circle' or 'diamond', and there are respectively 20, 30 and 50 candidates with these values amongst the remaining potential candidates, then:
$$T = 20 + 30 + 50 = 100 \qquad (4)$$
$$R = \frac{20\cdot(100-20) + 30\cdot(100-30) + 50\cdot(100-50)}{100} \qquad (5)$$
$$R = \frac{20\cdot 80 + 30\cdot 70 + 50\cdot 50}{100} = \frac{6200}{100} = 62 \qquad (6)$$
This is equivalent to the expected number of candidates that will be weeded out by the question.
In the example of equations (4) to (6), there is an assumption that the user is equally likely to give any of the possible answers for any given question. It would, of course, be possible to include other factors into equations (4) to (6) in order to scale the various contributions to the rating from each question according either to a pattern deduced from previous answers, or a known likelihood of various answers based on previous experience or the known occurrence for various items in the item list 20.
6) The next step is to find the feature with the best rating.
a) If the best rating is zero, none of the remaining questions is any help in narrowing the list of candidates, so report the current list of candidates.
b) Otherwise, ask the user the question relating to the characteristic feature with the best rating.
The same procedure can be used to determine the first question to present to the user.
7) Add the answer to the list of answers already supplied.
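Steps 1) to 6) can be combined into a minimal Python sketch, with step 7) left to the caller, which appends each new answer to the list. The data layouts (items as a dict of name to {feature: value}, questions as a dict of feature to options, answers as a list of (feature, value) pairs) and the function names are illustrative assumptions. The rating follows equations (1) and (2). When ratings tie, this sketch simply takes the feature whose name sorts last, to keep its output deterministic.

```python
NOT_SURE = "Not sure"


def candidates_for(items, answers):
    """Step 1: start from every item and filter by each answer in turn."""
    remaining = dict(items)
    for feature, value in answers:
        if value == NOT_SURE:
            continue  # 'Not sure' eliminates nothing
        remaining = {name: feats for name, feats in remaining.items()
                     if feats.get(feature) == value}
    return remaining


def rating(remaining, feature):
    """Step 5: R = sum over v of n(v) * (T - n(v)), divided by N."""
    counts = {}  # n(v) for each value v of this feature
    for feats in remaining.values():
        if feature in feats:
            counts[feats[feature]] = counts.get(feats[feature], 0) + 1
    total = sum(counts.values())  # T, equation (1)
    n = len(remaining)            # N
    return sum(c * (total - c) for c in counts.values()) / n


def next_step(items, questions, answers):
    """Steps 1-6: decide what the system should do next."""
    remaining = candidates_for(items, answers)
    if len(remaining) == 1:                                  # step 2
        return ("identified", next(iter(remaining)))
    if not remaining:                                        # step 3
        return ("no match", None)
    asked = {feature for feature, _ in answers}
    potential = [f for f in questions if f not in asked]     # step 4
    rated = [(rating(remaining, f), f) for f in potential]   # step 5
    best_r, best_f = max(rated, default=(0, None))           # step 6
    if best_r == 0:                                          # step 6 a)
        return ("report", sorted(remaining))
    return ("ask", best_f)                                   # step 6 b)
```

Because `next_step` recomputes everything from the answer list, it fits the stateless Web arrangement described earlier: the only input carried between requests is the list of answers.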
Additional refinements and modifications can be made to this procedure. For example, the apparatus 1 (or a website hosting the identification service) may include an 'Identify from a sample' feature, which allows the user to specify a sample of text, such as a word or sentence. The program then only presents questions involving letters that are present in the sample, or general questions (such as whether the typeface is serif or sans serif).
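The 'Identify from a sample' refinement needs some link between each question and the letters it inspects; the description does not say how that link is stored. The Python sketch below assumes a hypothetical mapping from each question to a set of letters, with an empty set marking a general question. Letters are matched exactly, so case matters in this sketch.

```python
def questions_for_sample(question_letters, sample):
    """Return the questions that can be answered from a text sample.

    question_letters: dict mapping question -> set of letters it inspects
                      (an empty set marks a general question, such as
                      serif versus sans serif, which is always kept).
    sample:           the text the user typed in.
    """
    present = set(sample)
    return [q for q, letters in question_letters.items()
            if not letters or letters & present]


question_letters = {"a_storey": {"a"}, "i_dot": {"i", "j"}, "serif": set()}
```

For the sample "Hello", only the general "serif" question survives; for "jigsaw", all three questions do.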
Optionally, if several potential characteristic features have equal ratings in step 5) above, one corresponding question may be chosen by the processor 16 at random. This makes the identification process different even for the same sequence of answers, which is more entertaining for the user.
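A minimal Python sketch of this random tie-break follows. The function name and the dict of precomputed ratings are assumptions for illustration; the optional `rng` argument (any object with a `choice` method, such as `random.Random`) is there only so that a session can be replayed with a fixed seed.

```python
import random


def pick_best(ratings, rng=random):
    """Choose uniformly at random among the top-rated questions.

    ratings: dict mapping question -> rating R.
    Returns None when no question has a positive rating, i.e. when no
    remaining question can narrow the list of candidates.
    """
    if not ratings:
        return None
    best = max(ratings.values())
    if best == 0:
        return None
    tied = [q for q, r in ratings.items() if r == best]
    return rng.choice(tied)
```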
Although the invention has been described in terms of a system for identifying fonts, the invention may also be used to identify other types of item from a list of items 20. For example, the invention may be used to help identify house plants, trees, roses, wild flowers, garden pests and blights, antiques, mechanical spare parts or silver marks.
The invention can also be used for selecting a software product for a particular application from amongst a range of possible candidate software products. For example, the database could include information about the range of word-processing packages available. It would present questions such as "What platform do you use?" with answers such as "Macintosh", "PC" or "Unix", or questions such as "Do you need mail-merge capabilities?", and after several such questions it would present a suggested list of suitable software products.

Claims (9)

  1. A computer-based apparatus for matching an unidentified item with one or more candidate items from amongst a set of a plurality of possible candidate items, comprising data processing means, a memory for storing data, user data entry means for entering information input into the computer, and a user display for displaying information output by the computer, wherein: - the memory stores data relating to said set; said set data includes at least one characteristic feature associated with each item in said set, and a plurality of questions each of which when presented to a user on the user display asks about the presence or absence of one or more characteristic features; - the data processing means is arranged to present in stages a series of said questions on the user display, the answers to said questions determining whether or not items remain possible candidate items; and - the questions are selected by the data processing means at each stage in order to eliminate as many candidate items as possible at each stage of questioning.
  2. A method for identifying one or more candidate items from amongst a set of a plurality of possible candidate items, using an apparatus comprising data processing means, a memory for storing data, user data entry means for entering information input into the computer, and a user display for displaying information output by the computer, wherein the method comprises the steps of: - storing data relating to said set in the memory; - including in said set data at least one characteristic feature associated with each item in said set, and a plurality of questions each of which when presented to a user on the user display asks about the presence or absence of one or more characteristic features; - presenting in stages a series of said questions on the user display, the answers to said questions determining whether or not items remain possible candidate items; and - selecting the questions at each stage in order to eliminate as many candidate items as possible at each stage of questioning.
  3. A method as claimed in Claim 2, in which the plurality of possible candidate items are different fonts.
  4. A method as claimed in Claim 2 or Claim 3, in which the answers include an answer indicating that none of the characteristic features is present, which when selected as the answer to a question results in no reduction in the number of candidate items.
  5. A method as claimed in any of Claims 2 to 4, in which for at least one question, there are at least three possible answers.
  6. A method as claimed in any of Claims 2 to 5, in which for at least one question, more than one answer may be provided.
  7. A method as claimed in any of Claims 2 to 6, in which questions are selected at each stage by calculating a rating value "R" for each particular characteristic feature associated with each question, R being defined by the equation $R = \frac{1}{N}\sum_{v} n(v)\,[T - n(v)]$, where: "v" is a value of a particular characteristic feature identified by an answer to said question; "n(v)" is the number of potential candidates that have the value v; "T" is a total number being the sum of all the n(v) values for all potential characteristic features, $T = \sum_{v} n(v)$; and "N" is the total number of remaining candidates having possible values v for answers to said question.
  8. A computer-based apparatus for matching an unidentified item with one or more candidate items from amongst a set of a plurality of possible candidate items, substantially as herein described, with reference to the accompanying drawings.
  9. A method for identifying one or more candidate items from amongst a set of a plurality of possible candidate items, substantially as herein described, with reference to the accompanying drawings.

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB0024633A GB2367914A (en) 2000-10-09 2000-10-09 An iterative method for identification of an item from a dataset by selecting questions to eliminate the maximum number of items at each iteration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB0024633A GB2367914A (en) 2000-10-09 2000-10-09 An iterative method for identification of an item from a dataset by selecting questions to eliminate the maximum number of items at each iteration

Publications (3)

Publication Number Publication Date
GB0024633D0 (en) 2000-11-22
GB2367914A (en) 2002-04-17
GB2367914A9 (en) 2002-08-27

Family

ID=9900882

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0024633A Withdrawn GB2367914A (en) 2000-10-09 2000-10-09 An iterative method for identification of an item from a dataset by selecting questions to eliminate the maximum number of items at each iteration

Country Status (1)

Country Link
GB (1) GB2367914A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8904502B1 (en) * 2011-04-04 2014-12-02 Niels T. Koizumi Systems and methods for rating organizations using user defined password gates

Citations (4)

Publication number Priority date Publication date Assignee Title
US5842203A (en) * 1995-12-01 1998-11-24 International Business Machines Corporation Method and system for performing non-boolean search queries in a graphical user interface
US5983219A (en) * 1994-10-14 1999-11-09 Saggara Systems, Inc. Method and system for executing a guided parametric search
WO2000075863A2 (en) * 1999-06-04 2000-12-14 Microsoft Corporation Representations and reasoning for goal-oriented conversations
EP1069487A1 (en) * 1999-07-14 2001-01-17 Hewlett-Packard Company, A Delaware Corporation Automated diagnosis of printer systems using bayesian networks

Non-Patent Citations (2)

Title
Library High Tech 1999, MCB University Press ISSN 0737-8831 *
SEADLE - Bayesian Indexing: The Next Craze in Search Algorithms - Pg 336-337, Vol 17, *

Also Published As

Publication number Publication date
GB0024633D0 (en) 2000-11-22
GB2367914A9 (en) 2002-08-27

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)