US9684678B2 - Methods and system for investigation of compositions of ontological subjects
- Publication number
- US9684678B2 (application US14/694,887)
- Authority
- US
- United States
- Prior art keywords
- data
- composition
- ontological subjects
- ontological
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G06F17/30292—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/211—Schema design and management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
Definitions
- This invention generally relates to information processing, ontological subject processing, knowledge processing and discovery, computational genomics, knowledge retrieval, artificial intelligence, signal processing, information theory, natural language processing and the applications.
- compositions such as an alien-language composition, a body of knowledge unfamiliar to an individual investigator, a corporate database, a computer program, a collection of reports, genetic code strings, and the like, about whose meaning and implications (and those of the parts therein) we do not have any prior information. Investigating such compositions is of immense interest and value.
- a data processing system, such as a computer system comprised of data processing or computing devices/units, data storage units/devices, and/or environmental data acquisition units/devices, and/or data communication units/devices, and/or input/output units/devices, and/or limbs, to learn as much information and gain as much knowledge/data as possible by processing compositions of data of various forms, and/or to become able to produce new knowledge and useful data or compositions of data, and/or to make autonomous decisions according to some codes of conduct.
- the present invention discloses systematic, computer-implementable, process-efficient, and scalable method/s of investigation of all types of compositions of ontological subjects such as textual compositions, data files, networks and graphs, genetic codes, any types of string, and the like.
- the given methods, algorithms, and services are accompanied by theoretical modeling and mathematical formulations which, once implemented, result in robust and fundamental algorithms and processes for investigating various aspects of a composition and for numerous applications.
- a composition of ontological subjects is viewed as an unknown system or system of knowledge, and the purpose of the investigation is to obtain as much worthy information and knowledge as possible about such an unknown system.
- the present invention therefore investigates the “compositions of ontological subjects” or a “body of knowledge” or a “system of knowledge” (as they are called from time to time in this disclosure) by providing investigation methods for identifying the most significant constituent ontological subjects of a given body of knowledge or of the given compositions with respect to one or more significance aspect/s.
- the significance aspects generally include the “intrinsic significance aspects” and/or “associational/relational significance aspects”.
- value significance measures, or VSM/s for short
- association strength measures or ASM for short
- novelty value significance measures or NVSM for short
- XY_VSM in general form
- a composition of ontological subjects or a body of knowledge is broken down into its constituent ontological subjects, which are grouped into different sets, each set labeled with a different order, from which one or more arrays of data, representing the information of the participations of the constituent ontological subjects of different orders in each other, are formed.
- the data therefore is used to evaluate various significance values of the constituent ontological subjects of the different order according to the disclosed measures of various aspects of significance.
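The breakdown described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the formulation of the disclosure: words are taken as the lower-order OSs, sentences as the higher-order OSs, and the helper names `split_sentences`, `tokenize`, and `participation_matrix` are hypothetical. The resulting array records which word participates in which sentence.

```python
import re

def split_sentences(text):
    """Naive splitter (assumption: '.', '!' or '?' ends a sentence)."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

def tokenize(sentence):
    """Lowercased alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())

def participation_matrix(text):
    """Return (words, sentences, pm), where pm[i][j] is 1 if words[i]
    occurs in sentences[j] and 0 otherwise."""
    sentences = split_sentences(text)
    token_sets = [set(tokenize(s)) for s in sentences]
    words = sorted(set().union(*token_sets))
    pm = [[1 if w in ts else 0 for ts in token_sets] for w in words]
    return words, sentences, pm

words, sentences, pm = participation_matrix(
    "Cells divide. Stem cells differentiate. Cells signal cells."
)
for word, row in zip(words, pm):
    print(f"{word:15s} {row}")
```

Each row of `pm` corresponds to one word and each column to one sentence, so row and column sums give simple participation counts that later measures could build on.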
- measure/s are given for valuation of “value significances” of the ontological subjects of the composition. These values are intrinsic values of the ontological subjects of the composition based on their significance role which is calculated from the participations pattern/s of the ontological subjects of the composition with each other.
- In another aspect, various measures of “association strength” are given from which the relations of the ontological subjects of the composition can be revealed. Algorithms, formulations, and calculation methods are given to evaluate such “association strength” according to various exemplary association aspects.
- measures are given for evaluating the “relational association strengths” of the ontological subjects of different orders to each other or to one or more target ontological subject.
- measures are given for evaluating the “relational value significances” of the ontological subjects of different orders to each other or to one or more target ontological subject.
- associational novelty value significances are given for evaluating another type of the general “novelty value significance” involving the association of one or more target ontological subjects of the composition or the body of knowledge.
- the values are assigned to a predetermined list of ontological subjects (e.g. one or more of the special words that are usually used to express a particular attribute such as a novelty, a reasoning, or concluding remarks, such as ‘therefore’, ‘consequently’, ‘in spite of’, . . . ‘however’, ‘but’, . . . etc.).
- special significance conveyers to selectively amplify or dampen, in a predetermined way, the significances of such special OSs of a composition in the final output or result.
- an ontological subject of a composition is not only represented by a string of characters; vast additional information is also available for the ontological subject, corresponding to its type/s of significance and its relationships with other ontological subjects of the composition. Said additional information or data is learnt, through implementing the methods of the current disclosure and the references incorporated herein, from the ways these ontological subjects are used or composed together to make up a composition or, more generally, to form a body of knowledge.
- association strength measures, significance measure etc. are placed in one or more data structures which can be representative of data arrays corresponding to vectors or matrix for convenience of calculations by data processing devices.
- the data processing devices that carry out the calculations, storage, and data transportation between the various parts of one or more computer systems can be selected from technologies such as electronic or optical based processors, semiconductor based or quantum computers, application specific processing devices, and the like.
- the implicit information not recognizable, useable, or appreciable by a human can be extracted, stored and become useable by a data processing system or machine.
- Said data processing system or machine therefore will become able to use its superior processing speed and unmatched, by human, memory capacity or environmental data acquisition capabilities, to perform intelligent tasks.
- intelligent tasks could include, but are of course not limited to, conversing intelligently, evaluating the merit of a composition, recognizing visual objects, DNA analysis, knowledge discovery, automatic research and discovery, composing an essay or multimedia content, decision making, controlling physical action/reaction of a machine to its limbs, management of tasks and sessions, autonomous navigation, and in general such tasks that currently can only be done by human beings.
- Intelligent beings of various kinds, technologies, and forms (e.g. a humanoid robot maid, a genetically modified being, a transportation intelligent being such as an autonomous car or an autonomous agricultural machine, a robotic explorer, etc.) are exemplary beneficiaries of implementing and employing the methods and systems of the current disclosure.
- the invention provides data processing systems comprising computer hardware, software, internet infrastructure, and other customary appliances of an E-business, cloud computing, distributed networks, and services to perform and execute said methods in providing a variety of services for a client/user's desired applications or to provide a needed or requested data to a human/agent client.
- FIG. 1 shows one exemplary block diagram of a system or a software artifact that generates various outputs from a body of knowledge or a composition according to one embodiment of the present invention.
- FIG. 2 shows one exemplary illustration of the concept of association strength of a pair of OSs according to one embodiment of the present invention.
- FIG. 3 shows one exemplary embodiment of a directed asymmetric network or graph corresponding to a composition of ontological subjects.
- FIG. 4 shows a block diagram of one preferred embodiment of the method and the algorithm for calculating a number of exemplary “Value Significance Measures” of different types for the ontological subjects of a composition according to one embodiment of the present invention.
- FIG. 5 shows one exemplary block diagram of the method and the algorithm of building the “Ontological Subject Maps” (OSM) from the “Association Strength Matrix” (ASM) which is built for and from an input composition according to one embodiment of the present invention.
- FIGS. 6 a , 6 b , 6 c show the exemplary values and one way of representing the values of the different conveyers of the different types of the “value significance measures”.
- FIG. 7 shows one exemplary instance of implementing the formulations and algorithm/s illustrating one way of using the “participation matrix” (PM) and the “association strength matrix” (ASM) to calculate the two different types of the associations strength of the OSs of order 2 to the OSs of the order 1, according to one embodiment of the present invention.
- This Figure is to demonstrate the use of various VSM vectors (filters) in the calculations.
- FIG. 8 is a block diagram of the system and method of building at least two participation matrixes and calculating VSM for lth order partition, OS l , to calculate the “Value Significance Measures” (VSM) of other partitions of the compositions, OS l+r , and storing them for further use by the application servers according to one embodiment of the present invention.
- FIG. 9 is a block diagram of an exemplary application and the associated system for ranking, filtering, storing, indexing, and clustering the crawled webpages, from the internet or other repositories, using “Value Significance Measures” (VSM) according to one embodiment of the present invention.
- FIG. 10 is an exemplary system of investigating module/s for investigation of composition of ontological subjects providing one or more desired result/data/output according to one embodiment of the present invention.
- FIG. 11 is a block diagram of an exemplary application for investigation of a body of news feeds.
- FIG. 12 is another exemplary general system of using the investigator providing various services to the clients over a communication network (e.g. a private or public network) according to one embodiment of the present invention.
- This embodiment shows exemplary general architecture of a system in which one or more of the blocks are optional and can be omitted or one or more blocks can be added.
- FIG. 13 is another exemplary block diagram of a composition investigation service for a client request for service according to one embodiment of the present invention.
- One or more functional modules can be still added to this embodiment and/or one or more of the modules can be removed or disabled.
- FIG. 14 is an exemplary system of using the investigator providing various services to the clients in a private or public cloud environment according to one embodiment of the present invention.
- FIG. 15 is another exemplary block diagram of a system for providing various ubiquitous services to one or more clients over a network, wherein the system can be either localized or distributed, according to one embodiment of the present invention.
- a system of knowledge, here, means a composition or a body of knowledge in any field, narrow or wide, composed of data symbols such as alphabetical/numerical characters, any array of data, binary or otherwise, or any string of data, etc.
- a system of knowledge can be defined about the process of stem cell differentiation.
- a picture or a video frame consists of colored pixels that have participated in the picture to form and convey the information about the picture. In particular, some colored pixels of the picture are more significant or play a more distinguishing role in that picture. Moreover, their combination, or the way or the pattern in which they participate together in any small parts or segments of that picture, is also important in the way the pixels convey the information about the picture to an observer's eyes or a camera.
- composition or a body of knowledge could be a string of genetic codes, a DNA string, or a DNA strand, a whole genome, and the like.
- any system, simple or complicated, can be identified and explained by its constituent parts and the relation between the parts.
- any system or body of knowledge can also be represented by network/s or graph/s that shows the connection and relations of the individual parts of the system. The more accurate and detailed the identification of the parts and their relations the better the system is defined and designed and ultimately the better the corresponding tangible systems will function.
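As a rough illustration of representing a body of knowledge as a graph of its parts, the sketch below builds a weighted co-occurrence graph from a list of partitions. It is an assumption-laden example only: it links two OSs whenever they appear in the same partition and uses the co-occurrence count as the edge weight, which yields a symmetric graph rather than the directed asymmetric network of FIG. 3. The function name `cooccurrence_graph` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence_graph(partitions):
    """Build an undirected weighted graph as a dict of dicts.
    graph[a][b] is the number of partitions containing both a and b."""
    graph = defaultdict(dict)
    for part in partitions:
        # Deduplicate within a partition so each pair counts once per partition.
        for a, b in combinations(sorted(set(part)), 2):
            graph[a][b] = graph[a].get(b, 0) + 1
            graph[b][a] = graph[b].get(a, 0) + 1
    return {node: dict(neighbours) for node, neighbours in graph.items()}

graph = cooccurrence_graph([
    ["stem", "cells"],
    ["cells", "divide"],
    ["stem", "cells", "divide"],
])
for node in sorted(graph):
    print(node, graph[node])
```

Nodes are the parts of the system and edge weights indicate how often two parts appear together; richer relations (direction, asymmetry) would need a different weighting than the simple count assumed here.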
- Most of the information about any type of existing or new system can be found in the body of many textual compositions. Nevertheless, these vast bodies of knowledge are unstructured, dispersed, and unclear for non-experts in the field.
- the purpose of the investigation is to model and gain as much information and knowledge about an unknown system comprised of ontological subjects while the source of the information about such a system is a given composition of ontological subjects wherein the composition is readable by a computer. Therefore, some information about such an unknown system is supposedly embedded in a body of knowledge or system of knowledge or generally in the given composition. The investigator, hence, will have to be able to capture or produce as much knowledge about the system from the information in the given composition.
- the investigation is performed according to at least one significant/important aspect in the investigation of bodies of knowledge (i.e. compositions).
- the “investigation important aspect” can, for example, be one or more of the following goals:
- the “investigation important aspect” is to identify a relationship between two or more significant parts of the composition, the investigator may perform the following:
- the present invention gives a number of such investigation goals and the methods of achieving the desired outcome. Moreover, the present invention provides a variety of tools and investigation methods that enables a user to deal with investigation of compositions of ontological subjects for any kind of goals and any types of the composition.
- Ontological Subjects, or OS for short
- the “significance aspects”, based on which the significances of the OSs of compositions are defined and calculated, are various.
- one “significance aspect” could be an intrinsic significance of an OS which shows the overall or intrinsic significance of an OS in a body of knowledge.
- Another significance aspect is considered to be a significant aspect in relation or relative to one or more of the OSs of the body of knowledge.
- Yet another significance aspect is considered to be an intrinsic novelty value of an OS in a body of knowledge or a composition. And yet another significance aspect is defined as a relative or relational novelty value of an OS related to one or more of the OSs of the body of knowledge or a composition.
- a “significance aspect” is the orientation that one can use to reason on how to put a significance value on an ontological subject of a composition or a body of knowledge.
- a “significance aspect” is a qualitative quality that can polarize or differentiate the ontological subjects and be used to define “value significance measures” and consequently suggest or construct various value functions or significance weighting functions on the ontological subjects of a composition or a body of knowledge.
- relational value significances are defined here.
- the relational value significances are instrumental in clustering a collection of composition or clustering partitions of composition in regards to one or more of a target OS or the parts of the system of knowledge.
- Ontological Subject means generally any string of characters, but more specifically, characters, letters, numbers, words, binary codes, bits, mathematical functions, sound signal tracks, video signal tracks, electrical signals, chemical molecules such as DNAs and their parts, or any combinations of them, and more specifically all such string combinations that indicate or refer to an entity, concept, or quantity, and the incidences of such entities, concepts, and quantities.
- Ontological Subject/s and the abbreviation OS or OSs are used interchangeably.
- Ontological Subjects can be divided into sets with different orders depending on their length, attribute, and function. Basically, an order is assigned to a group or set of ontological subjects having at least one common predefined attribute, property, or characteristic. Usually the orders in this disclosure are denoted with alphanumeric characters such as 0, 1, 2, etc., or OS1, OS2, etc., or any other combination of characters, so as to distinguish one group or set of ontological subjects having at least one common predefined characteristic from another set or group of ontological subjects having at least one other common characteristic.
- This order/s will also be reflected in denoting/corresponding the data objects or the mathematical objects in the formulations to distinguish these data objects in relation to their corresponding ontological subject set or its order, as will be used and introduced throughout this disclosure.
- for ontological subjects of textual nature, one may characterize or label letters as zeroth order OS, words or multiple word phrases as the first order, sentences or multiple word phrases as the second order, paragraphs as the third order, pages or chapters as the fourth order, documents as the fifth order, corpuses as the sixth order OS, and so on.
- the order can be assigned to a group or set of ontological subjects based on at least one common predefined characteristic of the members of the set.
- a higher order OS is a combination of, or a set of, lower order OSs or lower order OSs are members of a higher order OS.
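To make the notion of orders concrete, the following sketch decomposes a short text into OSs of orders 0 to 3 using the textual labeling mentioned above (letters, words, sentences, paragraphs). The splitting rules are simplifying assumptions and the function name `decompose` is hypothetical; a real system would likely need more careful segmentation.

```python
import re

def decompose(text):
    """Map an order number to the list of OSs of that order found in `text`.
    Order 0: letters, 1: words, 2: sentences, 3: paragraphs."""
    # Assumption: paragraphs are separated by blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for p in paragraphs
                 for s in re.split(r"(?<=[.!?])\s+", p) if s.strip()]
    words = [w for s in sentences for w in re.findall(r"[A-Za-z0-9']+", s)]
    letters = [c for w in words for c in w]
    return {0: letters, 1: words, 2: sentences, 3: paragraphs}

orders = decompose("Genes encode proteins. Proteins fold.\n\nCells divide.")
for order in sorted(orders):
    print(order, len(orders[order]))
```

Each higher-order list is built from the one below it, which mirrors the statement that lower order OSs are members of a higher order OS.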
- bits can be defined as zeroth order OS, the bytes as first order, any sets of bytes as third order, and sets of sets of bytes, e.g. a frame, as fourth order OS and so on.
- the pixels with different colors can be regarded as first order OSs, a set whose members contain two or more pixels (e.g. a segment of a picture) can be regarded as OSs of second order, a set whose members contain two or more such segments as third order OSs, a whole frame as a fourth order OS, and a number of frames (like a certain duration of a movie such as a clip) as fifth order, and so on. Therefore, definitions of orders for ontological subjects are an arbitrary set of initial definitions that one can stick to in order to make sense of the methods and mathematical formulations presented herein and to be able to interpret the consequent results or outcomes in more sensible and familiar language.
- COMPOSITION is an OS composed of constituent ontological subjects of lower or the same order, particularly text documents written in natural language, genetic codes, encryption codes, data files, voice files, video files, and any mixture thereof.
- a collection, or a set, of compositions is also a composition. Therefore a composition is in fact an Ontological Subject of particular order which can be broken to lower order constituent Ontological Subjects.
- the preferred exemplary composition is a set of data containing ontological subjects, for example a webpage, papers, documents, books, a set of webpages, sets of PDF articles, multimedia files, or even simply words and phrases.
- compositions and bodies of knowledge are basically the same and are used interchangeably in this disclosure. Compositions are distinctly defined here for assisting the description in more familiar language than a technical language using only the defined OSs notations.
- a partition of a composition in general, is a part or whole, i.e. a subset, of a composition or collection of compositions. Therefore, a partition is also an Ontological Subject having the same or lower order than the composition as an OS. More specifically in the case of textual compositions, parts or partitions of a composition can be chosen to be characters, words, sentences, paragraphs, chapters, webpage, documents, etc.
- a partition of a composition is also any string of symbols representing any form of information bearing signals such as audio or videos, texts, DNA molecules, genetic letters, genes, and any combinations thereof.
- partitions of a composition in this disclosure are words, sentences, paragraphs, pages, chapters, documents, sets of documents, and the like, or webpages, and partitions of a collection of compositions can moreover include one or more of the individual compositions. Partitions are also distinctly defined here to assist the description in more familiar language than a technical language using only the general OS definitions.
- SIGNIFICANCE MEASURE: assigning a quantity, a number, a feature, or a metric to an OS from a set of OSs so as to assist in distinguishing or selecting one or more of the OSs from the set. More conveniently, and in most cases, the significance measure is a type of numerical quantity assigned to a partition of a composition. Therefore, significance measures are functions of OSs and one or more other related mathematical objects, wherein a mathematical object can, for instance, be a mathematical object containing information of participations of OSs in each other, whose values are used in the decisions about the constituent OSs of a composition.
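A deliberately simple stand-in for a significance measure is sketched below: it assigns each OS the fraction of partitions in which it participates. This is only an illustrative assumption and is not one of the “value significance measures” of the disclosure; the function name `participation_frequency` is hypothetical. It shows the general shape of a measure as a function of OSs and their participation information.

```python
def participation_frequency(partitions):
    """partitions: list of sets of lower-order OSs.
    Returns {os: fraction of partitions that contain it}."""
    n = len(partitions)
    counts = {}
    for part in partitions:
        for os_ in part:
            counts[os_] = counts.get(os_, 0) + 1
    return {os_: count / n for os_, count in counts.items()}

partitions = [
    {"stem", "cells"},
    {"cells", "divide"},
    {"genes", "cells"},
    {"genes", "divide"},
]
scores = participation_frequency(partitions)
for os_, score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{os_:8s} {score:.2f}")
```

The numerical values can then be used to distinguish or select OSs from the set, which is the role the definition above gives to a significance measure.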
- “Relational, and/or associational, and/or novel significances” are one form or a type of the general “significance measures” concept and are defined according to one or more the aspect of interest and/or in relation to one or more OSs of the composition.
- FILTRATION/SUMMARIZATION is a process of selecting one or more OS from one or more sets of OSs according to predetermined criteria with or without the help of value significance and ranking metric/s.
- the selection or filtering of one or more OS from a set of OSs is usually done for the purposes of representation of a body of data by a summary as an indicative of that body in respect to one or more aspect of interest.
- searching through a set of partitions or compositions, and showing the search results according to the predetermined criteria is considered a form of filtration/summarization.
- finding an answer to a query e.g. question answering, or finding a composition related or similar to an input composition etc. is also a form of searching through a set of partitions and therefore are a form of summarization or filtration according to the given definitions here.
- the methods and systems devised here solve the proposed problem of investigating compositions of ontological subjects by algorithmically manipulating, assigning, and calculating various “value significance” quantities for the constituent ontological subjects of a composition or a network of ontological subjects. The disclosure further describes methods of measuring the significance of the value/s so that the right “Value Significance Measure/s (VSM)” can be defined, synthesized, and calculated for a desired aspect of investigation and be used for further processing in many related applications or other measures.
- the methods and systems of the present invention can be used for applications ranging from document classification, search engine document retrieval, news analysis, knowledge discovery and research trajectory optimization, question answering, computer conversation, spell checking, summarization, categorization, clustering, distillation, automatic composition generation, genetics and genomics, and signal and image processing, to novel applications in economic systems by evaluating a value for economic entities, crime investigation, financial applications such as financial decision making, credit checking, decision support systems, stock valuation, and targeted advertising, as well as measuring the influence of a member in a social network, and/or any other problem that can be represented by graphs and for any group of entities with some kind of relation or association.
- the “Participation Matrix” is a matrix indicating the participation of one or more ontological subjects of particular order in one or more partitions of the composition.
- PMs indicate the participation of one or more lower order OSs in one or more OSs of higher or the same order.
- PM/s are the most important array of data in this disclosure that contains the raw information from which many other important functions, information, features, and desirable parameters can be extracted. Without intending any limitation on the value of PM entries, in the exemplary embodiments throughout most of this disclosure (unless stated otherwise) the PM is a binary matrix having entries of one or zero and is built for a composition or a set of compositions as the following:
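- Participation Matrix (usually a binary matrix) of the order kl ($PM^{kl}$), which can be represented as:
$$
PM^{kl} = \begin{array}{c|ccc} & OS_1^l & \cdots & OS_M^l \\ \hline OS_1^k & pm_{11}^{kl} & \cdots & pm_{1M}^{kl} \\ \vdots & \vdots & \ddots & \vdots \\ OS_N^k & pm_{N1}^{kl} & \cdots & pm_{NM}^{kl} \end{array} \qquad (1)
$$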
- the desired criteria, in the step 2 above, can be, for instance, to only select the content words or select certain partitions having certain length or, in another instance, selecting all and every word or character strings and/or all the partitions.
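A minimal Python sketch of building a binary participation matrix for a short text follows. It assumes words as the OSs of order k and sentences as the OSs of order l. The function name `build_participation_matrix`, the regular-expression tokenizer, and the optional stop-word filter (standing in for the "desired criteria") are illustrative assumptions, not part of the disclosure.

```python
import re

def build_participation_matrix(text, stop_words=frozenset()):
    """Return (words, sentences, pm) with pm[i][j] == 1 if word i occurs in sentence j."""
    # Partitions (order l): sentences, split naively on ., ! and ? (an assumption).
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    # OSs of order k: lower-cased word tokens, minus the optional stop words.
    tokenized = [
        [w for w in re.findall(r"[a-z0-9']+", s.lower()) if w not in stop_words]
        for s in sentences
    ]
    words = sorted({w for tokens in tokenized for w in tokens})
    index = {w: i for i, w in enumerate(words)}
    # N x M binary matrix: rows are words, columns are sentences.
    pm = [[0] * len(sentences) for _ in words]
    for j, tokens in enumerate(tokenized):
        for w in tokens:
            pm[index[w]][j] = 1
    return words, sentences, pm

words, sentences, pm = build_participation_matrix(
    "Cats chase mice. Mice eat cheese. Cats sleep."
)
```

Each row of `pm` corresponds to one word and each column to one sentence, matching the row and column labelling of Eq. 1.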
- PM carries much other useful information.
- Another applicable example is using PM data to obtain the “frequency of occurrences” of ontological subjects in a given composition by: $$FO_i^{k|l} = \sum_j pm_{ij}^{kl} \qquad (4)$$ wherein $FO_i^{k|l}$ is the frequency of occurrences of $OS_i^k$ in the OSs of order l.
- the latter two examples are given to demonstrate on how one can conveniently use the PM and the disclosed method/s to obtain many other desired data or information.
- the co-occurrence matrix is obtained as $COM^{k|l} = PM^{kl} * (PM^{kl})^{T}$ (5), where “T” and “*” denote the matrix transposition and multiplication operations, respectively.
- the COM is an N × N square matrix. It gives the co-occurrences of the ontological subjects of order k in the partitions (ontological subjects of order l) within the composition and is one indication of the association of OSs of order k, evaluated from their pattern of participation in the OSs of order l of the composition.
- $com_{ij}^{k|l}$ is an element of the “Co-Occurrence Matrix (COM)” and (in the case of binary PMs) essentially shows how many times $OS_i^k$ and $OS_j^k$ have participated jointly in the selected OSs of order l of the composition.
- COM can also be made binary, if desired, in which case it only shows the existence or non-existence of a co-occurrence between any two $OS^k$.
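The frequency of occurrences (Eq. 4) and the co-occurrence matrix (Eq. 5) can be sketched directly from a binary PM. The helper names below are hypothetical, and plain nested lists are used only to keep the example dependency-free; a sparse-matrix library would be the more practical choice for large compositions.

```python
def frequency_of_occurrences(pm):
    """Eq. 4: row sums of the participation matrix."""
    return [sum(row) for row in pm]

def co_occurrence_matrix(pm):
    """Eq. 5: COM = PM * PM^T, computed entry by entry."""
    n = len(pm)
    return [
        [sum(a * b for a, b in zip(pm[i], pm[j])) for j in range(n)]
        for i in range(n)
    ]

def binarize(matrix):
    """Keep only the existence or non-existence of a co-occurrence."""
    return [[1 if value > 0 else 0 for value in row] for row in matrix]

# Toy binary PM: 3 OSs of order k (rows) in 3 partitions of order l (columns).
pm = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 0],
]
fo = frequency_of_occurrences(pm)
com = co_occurrence_matrix(pm)
binary_com = binarize(com)
```

For a binary PM the main diagonal of `com` equals `fo`, since each OS co-occurs with itself in every partition in which it participates.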
- the “co-occurrence matrix” as defined in this disclosure carries the information of the relationships and associations of the OSs of the composition, which is further utilized in some embodiments of the present invention.
- the co-occurrences of ontological subjects can also be obtained by looking at, for instance, co-occurrences of a pair of ontological subject within certain (i.e. predefined) proximities in the composition (e.g. counting the number of times that a pair of ontological subjects have co-occurred within certain or predefined distances from each other in the composition) as was used in the incorporated reference the U.S. patent application Ser. No. 12/179,363.
- there are other ways to count the frequency of occurrences of an ontological subject, i.e. the $FO_i^{k|l}$
- the preferred embodiment is an efficient way of calculating these quantities or objects and should not be construed as the only way implementing the teachings of the present invention.
- the repeated co-occurrences of a pair of ontological subjects within certain proximities is an indication of some sort of association (e.g. a logical relationship) between the pair or else it would have made no sense to use them together in one or more partitions of the composition.
- each row of the PM can be stored in a dictionary, or the PM can be stored in a list or lists in list, or a hash table, or a SQL database, or any other convenient objects of any computer programming languages such as Python, C, Perl, Java, etc.
- Such practical implementation strategies can be devised by various people in different ways.
- the PM entries are binary for ease of manipulation and computational efficiency.
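As one illustration of the dictionary storage mentioned above, a binary PM can be kept sparsely by storing, for each OS, only the set of partition indices in which it participates. This is a sketch under that assumption; `sparse_pm` and `co_occurrence` are hypothetical names, and the input is assumed to be already tokenized.

```python
from collections import defaultdict

def sparse_pm(tokenized_partitions):
    """Map each OS to the set of partition indices in which it participates."""
    rows = defaultdict(set)
    for j, tokens in enumerate(tokenized_partitions):
        for token in tokens:
            rows[token].add(j)
    return dict(rows)

def co_occurrence(rows, a, b):
    """Number of partitions shared by two OSs (one entry of the COM)."""
    return len(rows.get(a, set()) & rows.get(b, set()))

rows = sparse_pm([
    ["cats", "chase", "mice"],
    ["mice", "eat", "cheese"],
    ["cats", "sleep"],
])
shared = co_occurrence(rows, "cats", "mice")
```

With this layout a single co-occurrence count is a set intersection, so the full N × N matrix never has to be materialized.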
- those skilled in the art can store, process or represent the information of the data objects of the present application (e.g. list of ontological subjects of various order, list of subject matters, participation matrix/es, association strength matrix/es, and various types of associational, relational, and novel matrices, various value significance measures, co-occurrence matrix, participation matrices, and other data objects introduced herein) or other data objects as introduced and disclosed in the incorporated references (e.g.
- the PMs, ASMs, OSM or co-occurrences of the ontological subjects etc. can be represented by a matrix, sparse matrix, table, database rows, NoSQL databases, JSON, dictionaries and the like, which can be stored in various forms of data structures.
- each part, section, or any subset of the objects of the current disclosure such as a PM, ASM, OSM, RNVSM, NVSM, and the like, or the ontological subject lists and index, or knowledge database/s, can be represented and/or stored in one or more data structures such as one or more dictionaries, one or more cell arrays, one or more rows/columns of an SQL database, or by any implementation of NoSQL database/s of different technologies or methods etc., one or more filing systems, one or more lists or lists in lists, hash tables, tuples, string format, zip format, sequences, sets, counters, JSON, or any combined form of one or more data structures, or any other convenient objects of any computer programming languages such as Python, C, Perl, Java, JavaScript etc.
- Such practical implementation strategies can be devised by various people in different ways.
- the processing units or data processing devices e.g. CPUs
- the processing units or data processing devices must be able to handle various collections of data. Therefore the computing or data processing units that implement the system have a compound processing speed equivalent to one thousand million or more instructions per second and a collective memory, or storage devices (e.g. RAM), able to store large enough chunks of data to enable the system to carry out the task and decrease the processing time significantly compared to a single generic personal computer available at the time of the present disclosure.
- the computing or executing system includes or has processing device/s such as graphical processing units for visual computations that are for instance, capable of rendering, synthesizing, and demonstrating the content (e.g. audio or video or text) or graphs/maps of the present invention on a display (e.g.
- the methods, teachings and the application programs of the present invention can be implemented by shared resources such as virtualized machines and servers (e.g. VMware virtual machines, Amazon Elastic Beanstalk, Amazon EC2) and storage (e.g. Amazon S3), and the like.
- specialized processing and storage units e.g. Application Specific Integrated Circuits ASICs, field programmable gate arrays (FPGAs) and the like
- the data communication network to implement the system and method of the present invention carries, transmits, receives, or transports data at a rate of 10 million bits per second or larger.
- storage device refers to all types of non-transitory computer readable media such as magnetic cassettes, flash memory cards, digital video discs, random access memories (RAMs), Bernoulli cartridges, optical memories, read only memories (ROMs), solid state discs, and the like, with the sole exception being a transitory propagating signal.
- This section begins to concentrate on value significance evaluation of a predetermined order OSs by several exemplary embodiments of the preferred methods to evaluate the value of an OS of the predetermined order, within a same order set of OSs of the composition, for the desired measure of significance.
- value significance measure various measures of value significances of OSs in a body of knowledge or a composition
- value significance measures can be calculated for evaluating the value significances of OSs of different orders of the compositions or different partitions of a composition.
- these various measures (usually have intrinsic significances) are grouped in different types and number to distinguish the variety and functionalities of these measures.
- the first type of “value significance measure” is defined as a function of the “Frequency of Occurrences” of $OS_i^k$, called here $FO_i^{k|l}$: $$vsm\_1_i^{k|l} = f_1(FO_i^{k|l}), \quad i = 1, 2, \ldots, N \qquad (6)$$ wherein $FO_i^{k|l}$ is the frequency of occurrences given by Eq. 4.
- $f_1$ in Eq. 6 is a predetermined function such that $f_1(x)$ might be a linear function (e.g. $ax+b$), a power function of x (e.g. $x^3$ or $x^{0.53}$), a logarithmic function (e.g. $1/\log_2(x)$), or a $1/x$ function, etc.
- $vsm\_1\_1_i^{k|l}$ (standing for number one of type one “value significance measure”), for instance, can be defined as: $$vsm\_1\_1_i^{k|l} = c \cdot FO_i^{k|l} \qquad (7)$$
- $vsm\_1\_1_i^{k|l}$ of Eq. 7 gives a high value to the most frequent $OS^k$.
- less frequent OSs are of more significance one may use the following vsm_1_2 i k
- $vsm\_2_i^{k|l}$ is defined as a function of the “Independent Occurrence Probability” (IOP) in the partitions, such as: $$vsm\_2_i^{k|l} = f_2(iop_i^{k|l}), \quad i = 1 \ldots N \qquad (9)$$ wherein $iop_i^{k|l}$ is the independent occurrence probability of $OS_i^k$ in the partitions.
- $vsm\_2\_1_i^{k|l}$, i.e. the number 1 type 2 vsm, can be defined as: $$vsm\_2\_1_i^{k|l} = -\log_2(iop_i^{k|l}), \quad i = 1 \ldots N \qquad (11)$$
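The frequency-based and probability-based measures can be sketched as follows. The definition of the independent occurrence probability is not reproduced in this part of the text, so the code assumes $iop_i = FO_i / M$, with M the number of partitions. That assumption and all function names are illustrative only.

```python
import math

def vsm_1_1(fo, c=1.0):
    """Type 1, number 1: proportional to the frequency of occurrences."""
    return [c * f for f in fo]

def independent_occurrence_probability(fo, num_partitions):
    """ASSUMPTION: iop_i = FO_i / M, i.e. the share of partitions containing OS i."""
    return [f / num_partitions for f in fo]

def vsm_2_1(iop):
    """Type 2, number 1: -log2(iop); assumes every iop is strictly positive."""
    return [-math.log2(p) for p in iop]

fo = [4, 2, 1]  # occurrences of three OSs over M = 4 partitions
iop = independent_occurrence_probability(fo, num_partitions=4)
frequent_first = vsm_1_1(fo)
rare_first = vsm_2_1(iop)
```

The two measures rank the same OSs in opposite order: `vsm_1_1` favours the most frequent OS, while `vsm_2_1` favours the least frequent one.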
- $vsm\_3_i^{k|l}$ is defined as a function of the co-occurrences of an $OS^k$ with others as: $$vsm\_3_i^{k|l} = f_3(com_{ij}^{k|l}), \quad i = 1 \ldots N \qquad (12)$$ wherein the $com_{ij}^{k|l}$ are the entries of the co-occurrence matrix.
- $vsm\_3\_1_i^{k|l}$ can be defined as: $$vsm\_3\_1_i^{k|l} = f_3(com_{ij}^{k|l}) = \sum_j com_{ij}^{k|l}, \quad i = 1 \ldots N \qquad (13)$$
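Read as a row sum over the co-occurrence matrix, Eq. 13 can be sketched as below. The toy matrix and the ranking step are illustrative additions, and the function name is hypothetical.

```python
def vsm_3_1(com):
    """Eq. 13 read as a row sum: total co-occurrences of each OS with all OSs."""
    return [sum(row) for row in com]

com = [
    [2, 1, 0],
    [1, 2, 1],
    [0, 1, 1],
]
scores = vsm_3_1(com)
# Indices of the OSs ordered from highest to lowest score.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```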
- This measure (Eq. 13) once combined with other measures can yet provide other measures. For instance when it is being divided by the vsm_1_1 i k
- another measure can be defined using one or more of the other $vsm_i^{k|l}$, e.g. of type 4 (x = 4) as a function of $vsm\_1\_2_i^{k|l}$: $$vsm\_4_j^{k|l} = f_4(vsm\_1\_2_i^{k|l}) = \sum_i com_{ij}^{k|l} \cdot (1/FO_i^{k|l}) = (1/FO_i^{k|l})^{T} \times COM^{k|l}, \quad i,j = 1 \ldots N \qquad (14)$$ wherein “T” stands for the matrix or vector transposition operation and wherein $1/FO_i^{k|l}$ is substituted for $vsm\_1\_2_i^{k|l}$.
- index “i” refers to the row number and the index “j” refers to the column number therefore the matrices with only the subscript of “i” usually are the column vectors and the matrices with only the subscript of “j” usually are row vectors.
- the $vsm\_x_i^{k|l}$ generally refer to the intrinsic value significance of an OS in the BOK.
- the $vsm\_x_i^{k}$ are more indicative of the intrinsic importance or significance of lower order constituent parts and can be used to separate one or more of these OSs for a variety of applications such as labeling, categorization, clustering, building maps, conceptual maps, ontological subject maps, or finding other significant parts or partitions of the composition or the BOK.
- these measures can readily be employed to score a set of documents or to select the most important parts or partitions of a composition by providing the tools and objects to weigh the significances of parts or partitions of a BOK.
- kl thereafter can be utilized for scoring, ranking, filtering, and/or be used by other functions and applications based on their assigned value significances.
- This section looks into another important attribute of the ontological subjects of a composition that is instrumental and desirable in investigating the composition of ontological subjects.
- association strengths between the ontological subjects of a composition or a BOK play an important role in investigating, analyzing, and modifying compositions of ontological subjects.
- association strength measures are introduced and disclosed here.
- the “association strength measures” play important role/s in many of the proposed applications and also in calculating and evaluating the different types of “value significance evaluation” of OSs of the compositions.
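- the values of an “association strength measure” can be shown as entries of a matrix called herein the “Association Strength Matrix” ($ASM^{k|l}$). The entries of $ASM^{k|l}$ are defined as a function of the co-occurrences $com_{ij}^{k|l}$ and of the value significances of the two OSs, e.g. $asm_{i \to j}^{k|l} = f(com_{ij}^{k|l}, vsm\_x_i^{k}, vsm\_y_j^{k})$.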
- vsm_x i k and/or the vsm_y j k are the same as vsm_x i k
- FIG. 2 shows one definition for association of two or more OSs of a composition to each other and shows how to evaluate the strength of the association between each two OSs of composition.
- association strength of each two OSs has been defined as a function of their co-occurrence in the composition or the partitions of the composition, and the value significances of each one of them.
- FIG. 2 moreover shows the concept and rationale of this definition for association strength according to this disclosure.
- the larger and thicker elliptical shapes are indicative of the value significances, e.g. probability of occurrences, of $OS_i^k$ and $OS_j^k$ in the composition that were derived from the data of $PM^{kl}$, and the small circles inside the area represent the $OS^l$s of the composition.
- the overlap area shows the common OS l between the OS i k and OS j k in which they have co-occurred, i.e. those partitions of the composition that includes both OS i k and OS j k .
- the co-occurrence number is shown by com ij k
- association strength measures asm i ⁇ j k
- l com ij k
- l . . . i,j 1 . . . N
- l com ij k
- the association strength defined by Eq. 16 is not usually symmetric, and generally $asm_{j \to i}^{k|l} \neq asm_{i \to j}^{k|l}$.
- association strength measure that in this application is labeled as asm_3_2 i ⁇ j k
- $com_{ij}^{k|l}$ are the individual entries of the $COM^{k|l}$
- $iop_i^{k|l}$ and $iop_j^{k|l}$ are the “independent occurrence probability” of $OS_i^k$ and $OS_j^k$ in the partitions, respectively, wherein the occurrence is happening in the partitions that are OSs of order l.
- the un-normalized “association strength measure” of each OS with itself is proportional to its frequency of occurrence (or self occurrence).
- association strength measure i.e. Eq. 20
- Eq. 20 basically states that if a less popular OS co-occurs with a highly popular OS, then the association of the less popular OS to the highly popular OS is much stronger than the association of the highly popular OS with the less popular OS (remembering that co-occurrence is symmetric). That makes sense, since the popular OSs obviously have many associations and are less strongly bound to any one of them, so by observing a highly popular OS one cannot gain much upfront information about the occurrence of less popular OSs. However, observing the occurrence of a less popular OS having a strong association to a popular OS can tip the information about the occurrence of the popular OS in the same partition, e.g. a sentence, of the composition.
- association strength measure As:
- OS is occurring less frequently and whenever it has occurred it has appeared more often with one particular OS then the association bond of the less frequently occurring OS is strongest with the particular OS that has co-occurred with, the most.
- its highest associated bond is from the OS with less independent occurrence probability.
- the association strength measure of Eq. 21 is the column-normalized version of the $asm\_3\_2_{i \to j}^{k|l}$ of Eq. 20 (when $c = 1/M$ in Eq. 21 and assuming binary PM) and is more useful in some instances and applications.
- This particular association strength measure can reveal a strong relationship from a less significant OS to the one who has co-occurred the most and is a useful measure to hunt for some types of novelty.
- another association strength definition is: $$asm\_4\_1_{i \to j}^{k|l} = c \cdot com_{ij}^{k|l} \cdot iop_j^{k|l}, \quad i,j = 1 \ldots N \qquad (22)$$
- $asm\_4\_1_{i \to j}^{k|l}$ attributes the strongest association bond from a first OS, say $OS_i^k$, to a second OS, say $OS_j^k$, when the product of their co-occurrences and the independent probability of occurrence of the second OS is the highest.
- This association strength measure usually is useful for discovering the real association of two important or significant OSs of the composition.
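A sketch of this association strength follows, based on the verbal description above: the product of the co-occurrence count and the independent occurrence probability of the second OS, scaled by a constant c. The function name and the toy values are assumptions for illustration.

```python
def asm_4_1(com, iop, c=1.0):
    """asm[i][j] = c * com[i][j] * iop[j]: association of OS i to OS j."""
    n = len(com)
    return [[c * com[i][j] * iop[j] for j in range(n)] for i in range(n)]

com = [
    [2, 1, 0],
    [1, 2, 1],
    [0, 1, 1],
]
iop = [0.5, 0.5, 0.25]  # illustrative probabilities for the three OSs
asm = asm_4_1(com, iop)

def strongest_associate(asm, i):
    """Index j != i with the largest association strength from OS i."""
    candidates = [j for j in range(len(asm)) if j != i]
    return max(candidates, key=lambda j: asm[i][j])
```

Although `com` is symmetric, `asm` is not: the weighting by the probability of the second OS makes `asm[1][2]` differ from `asm[2][1]`.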
- this measure can be defined to hunt for mutual associations bonds such as word phrases as the following:
- association strength measure (ASM_ x 1 k
- Eq. 26 takes into account the transformative or hidden association of OSs of order k (e.g. words of a textual composition or BOK) from one asm measure and combines it with the information of another or the same asm measure to give another measure of association that is not obvious or apparent from the start.
- This type of measure therefore takes the indirect or secondary associations into account and can reveal, or be used to suggest, new or hidden relationships between the OSs of the compositions, and can therefore be very instrumental in knowledge discovery and research.
- one use of the association strength measures of Eqs. 17-26 is to find the real associates of a word, e.g. a concept or an entity, from its pattern of usage in the partitions of textual compositions. Knowing the associates of words, e.g. finding the entities associated with a particular entity of interest, has many applications in knowledge discovery and information retrieval. In particular, one application is to quickly get a glance at the context of that concept or entity or of the whole composition under investigation. The choice and the evaluation method of the association strength measure are important for the desired application. Furthermore, these measures can be directly used as a database of semantically associated words or OSs.
- composition under investigation is the entire (or even a good part of) contents of Wikipedia
- entity e.g. a word, concept, noun, etc.
- from the association strength measures one can also obtain and derive various other “value significance measures” which possess more of an intrinsic type of significance.
- value significance measures e.g. Eq. 20-26
- l e.g. Eq. 20-26
- l few exemplary “value significance measures”, i.e. vsm i k
- for a given $OS_j^k$, suppose we want to find the strongest “associated with” OS (assume it is found to be $OS_i^k$). To do that we can use Eq. 21. One can also use Eq. 22 to find out which OS the given OS, say $OS_i^k$, is highly “associated to” (assume it is found to be $OS_j^k$).
- Eq. 26 is an important tool for knowledge discovery. For instance this measure can be used to hunt for the subject matters that can in fact be highly related, but one cannot find their relations in the literature explicitly.
- association strength values are important for many applications.
- One or more of such applications is to cluster or to find hidden relationships between the partitions of the compositions.
- the $asm_{i \to j}$ of the lower order OSs can show the association strength of the higher order OSs of the composition, so that they can be used for clustering, categorization, scoring, ranking and in general filtering and manipulating the higher order OSs.
- RASM Relational Association Strength measure
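- the first type “relational association strength measure” is given by: $$RASM\_1^{l \to k|kl} = rasm\_1_{i_l j_k}^{l \to k|kl} = (PM^{kl})^{T} \times ASM^{k|l}$$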
- $ASM^{k|l}$ is generally a square asymmetric matrix, whose transpose is not equal to itself, and therefore another, also important, type of “relational association strength measure” can be envisioned. Accordingly, in the same manner the “second type relational association strength measure” can be defined and calculated as: $$RASM\_2^{l \to k|kl} = rasm\_2_{i_l j_k}^{l \to k|kl} = (PM^{kl})^{T} \times (ASM^{k|l})^{T}$$
- $RASM\_2^{l \to k|kl}$ is the “second type relational association strength measure” of OSs of order l to OSs of order k, which is also an M × N matrix and is similar to $RASM\_1^{l \to k|kl}$
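The two relational association strength matrices can be sketched with plain matrix products. The first type multiplies the transposed PM by the ASM as written in the text. Using the transposed ASM for the second type is an assumption drawn from the remark that the ASM is not equal to its own transpose. Helper names and values are illustrative.

```python
def transpose(matrix):
    return [list(column) for column in zip(*matrix)]

def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    b_columns = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_columns] for row in a]

# N = 3 OSs of order k (rows), M = 2 partitions of order l (columns).
pm = [
    [1, 0],
    [1, 1],
    [0, 1],
]
# Illustrative asymmetric N x N association strength matrix.
asm = [
    [1.0, 0.5, 0.0],
    [0.25, 1.0, 0.5],
    [0.0, 0.75, 1.0],
]
rasm_1 = matmul(transpose(pm), asm)             # M x N
rasm_2 = matmul(transpose(pm), transpose(asm))  # M x N, ASSUMED form of type 2
```

Row `i` of either result scores partition `i` against every OS of order k, so both matrices have M rows and N columns.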
- partitions e.g. sentences or paragraphs etc.
- $RASM\_x^{l \to k|kl}$ can also be used to find out the association strength or relatedness of a particular OS of order k (e.g. the $j_k$-th word of the composition) to a particular OS of order l (e.g. the $i_l$-th sentence of the composition) through the following relationship: $$RASM\_x^{k \to l|kl} = (RASM\_x^{l \to k|kl})^{T}$$
- kl “Relational Association Strength Measure” of type x is to remind the fact that these types of association strength are not only between a higher order OS (e.g. a sentence, paragraph, or a document) with a lower order OS (e.g. a word or a keyword, phrase etc) but it is, in an indirect way, also between a higher order OS and the associations of a lower order OS.
- the name for the other way around relationship i.e. RASM_x k ⁇ l
- association strength between the OSs of order l e.g. an association strength measure between sentences of a textual composition
- kl rasm_ x i l j l l ⁇ l
- kl RASM_ x l ⁇ k
- kl , i l ,j l 1,2, . . .
- kl is indicative of one type of “relational association strength measure” between ith OS of order l and jth OS of order l.
- This matrix is particularly useful to find or select the higher order OSs of the composition or the partitions (e.g. sentences or paragraphs, or documents), that are highly associated with each other. In some applications, though, it would be desirable, for instance, to find out the partitions that have the least amount of associations with any other partitions etc.
- one or more of these “related associations measures” can be used (either normalized or not) to define and/or synthesize new RASMs.
- In the same manner, using “Participation Matrix/es” and other objects, other desired features can be quantified in a composition or a BOK, making it possible to select, cluster, or filter out the desired part or parts of the composition to look into, investigate, modify, re-compose, etc.
- Eqs. 27-30 make it easy to find the partitions of the compositions that have the highest relatedness or highest relative association with a keyword, or the other way around. Therefore a computer implemented method utilizing these formulations can essentially filter out the most related parts or partitions of a composition in relation to a target keyword.
- One immediate application is for scoring the relatedness of group of documents to a subject matter or a keyword.
- kl and the formulation, for instance, is to cluster and separate partitions of a BOK or a large corpus/s, etc into sets of partitions that are related to a particular subject matter.
- the relatedness is measured by one or more of the above measures and partitions that exhibited an association strength value greater (or sometimes smaller) than a predetermined threshold to a particular OS, can be grouped or clustered together.
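The threshold-based grouping described above can be sketched as a simple filter over a relational association strength matrix whose rows are partitions and whose columns are OSs of order k. The matrix values, the threshold, and the function name are illustrative assumptions.

```python
def partitions_related_to(rasm, target_index, threshold):
    """Indices of partitions whose association to the target OS exceeds the threshold."""
    return [i for i, row in enumerate(rasm) if row[target_index] > threshold]

# Rows: partitions (e.g. documents); columns: OSs of order k (e.g. keywords).
rasm = [
    [1.25, 1.5, 0.5],
    [0.25, 1.75, 1.5],
    [0.0, 0.1, 2.0],
]
cluster_for_os_2 = partitions_related_to(rasm, target_index=2, threshold=1.0)
cluster_for_os_0 = partitions_related_to(rasm, target_index=0, threshold=1.0)
```

Replacing `>` with `<` gives the variant in which partitions with an association strength smaller than the threshold are grouped.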
- association strength data structures, usually in the form of a matrix, are therefore instrumental in building such cognitive networks for a variety of tasks in general and for building neural nets in particular.
- the training iteration and the resource needed to train a neural net is significantly reduced using the information of the association strengths (and various other data objects or data structures introduced in this disclosure) of the ontological subjects obtained by investigating a body of knowledge as taught through this disclosure.
- FIG. 11 shows the procedure in which, using the concept of “value significance”, a number of head categories are selected from those OSs exhibiting the highest value significances, and consequently, using the “related association strength measure” concept, it was possible to separate the very many different news feeds into different categories automatically with satisfactory accuracy.
- RVSM relative or “relational value significance measures”
- an $OS_{i_l}^{l}$ in relation to the target $OS_{j_k}^{k}$ when operating (multiplying) on the participation matrix $PM^{kl}$, as the following: $$rvsm\_1\_x_{i_l j_k}^{l \to k|kl} = (pm_{i_k i_l}^{kl})^{T} \times asm\_y_{i_k \to j_k}^{k|l} \qquad (31)$$
- $rvsm\_1\_x_{i_l j_k}^{l \to k|kl}$ stands for type 1 of number x “relational value significance measure” of OSs of order l, $OS_{i_l}^{l}$, to a given $OS_{j_k}^{k}$, which is a row vector and is obtained by processing the participation data of $OS^k$ in $OS^l$, or in other words it has been derived from the data of $PM^{kl}$, and y is indicative of the type of the “association strength measure”.
- the x and y are the same type. Accordingly, as can be seen in this embodiment the first type “relational value significance measure”, rvsm_1 i l j k l ⁇ k
- Eq. 31, once executed, will assign values to OS l in which it amplifies the importance or significance values of the partitions (e.g. sentences) of the composition that contains the OSs (e.g. words) that have the highest association strength to the target OS j k (i.e. a target keyword) thereby to provide an instrument, i.e. a filtering function, for scoring and consequently selecting one or more highly related partitions to an OS j k .
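The filtering function described here can be sketched for a single target keyword: each partition is scored by summing, over the OSs it contains, their association strength to the target. The names `relational_scores` and `top_partitions` are hypothetical, and the matrices are toy values.

```python
def relational_scores(pm, asm, target):
    """Score partition l as the sum over i of pm[i][l] * asm[i][target]."""
    num_partitions = len(pm[0])
    return [
        sum(pm[i][l] * asm[i][target] for i in range(len(pm)))
        for l in range(num_partitions)
    ]

def top_partitions(scores, count):
    """Indices of the highest-scoring partitions, best first."""
    return sorted(range(len(scores)), key=lambda l: scores[l], reverse=True)[:count]

pm = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 0],
]
asm = [
    [1.0, 0.5, 0.0],
    [0.25, 1.0, 0.5],
    [0.0, 0.75, 1.0],
]
scores = relational_scores(pm, asm, target=2)
best = top_partitions(scores, count=2)
```

Partition 1 scores highest because it contains the target OS itself together with an OS that is associated to it.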
- the Eq. 31 can also be written in a matrix form wherein the rvsm i l j k l ⁇ k
- kl is a kind of “relational value significance measure” and can be used as, say, “first type relational value significance measure” (e.g. can be shown by RVSM_1 notation).
- RVSM_1 therefore, following Eqs. 27 and 31, can be given in the matrix form as: $$RVSM\_1\_x^{l \to k|kl} = RASM\_1^{l \to k|kl} = rvsm\_1_{i_l j_k}^{l \to k|kl} = (PM^{kl})^{T} \times ASM^{k|l}$$ wherein $ASM^{k|l}$ is an N × N matrix and $RASM\_1^{l \to k|kl}$ is an M × N matrix.
- a second type relative value significance measure (e.g. can be shown by RVSM_2 notation).
- $$RVSM\_2^{l \to k|kl} = rvsm\_2_{i_l j_k}^{l \to k|kl} = (PM^{kl})^{T} \times (ASM^{k|l})^{T} = RASM\_2^{l \to k|kl}$$
- kl is indicative of a degree that an OS of order l, OS i l , (e.g. sentences) containing the OSs of order k, OS k (e.g. the words) that are used to explain or express or provide information regarding the target OS j k (i.e. containing the words that are highly associated with the target OS).
- kl is indicative of a degree that an OS i l (e.g sentences) containing the OS k (e.g. the words) for which the target OS i k is used or participated to explain or express or provide information about them (i.e. containing the words that the target OS is highly associated with).
- kl vsm j k k
- kl vsm j k k
- RVSM_4 i l j k l ⁇ k
- kl vsm j k k
- kl vsm j k k
- kl can be rewritten as: RVSM_ x i l j k l ⁇ k
- kl ⁇ x (vsm j k k
- kl put an intrinsically high value on the significance of the partitions that are highly related to the high value significance OS k of the composition by taking the intrinsic value of the target OSs into account. Therefore these measures can be instrumental to, for example, representing a body of knowledge with the highest relational value significance or to summarize a composition. To do so one can simply select one or more partition of the BOK that scored the highest for these measures in order to present it as summary of a composition.
- kl rvsm_ x i l j l l ⁇ l
- kl RVSM_ x l ⁇ k
- kl ) T , i l ,j l 1,2, . . .
- kl is the relative value significance measure between OSs of order l so that it can directly measure the relatedness of partitions of the BOK such as sentences, paragraphs, or documents to each other. Again this measure therefore can readily be used to find the highly related partitions of the BOK either for retrieval purposes, rankings, document comparisons, question answering, conversation, or clustering and the like.
- the retrieved documents or the parts thereof should be the most relevant document and partition to a target OS which could be a keyword or set of keywords or even a composition itself.
- value significance measures can readily be applied using the method of this disclosure to retrieve and present the most relevant part (e.g. a word, a sentence, a paragraph, a chapter, a document) to the sought after subject matter or in response to a query.
- NVSM novelty value significance measures
- compositions yet other value significance measures are introduced and explored herein.
- this aspect of investigation in some instances it would become desirable to have found the words or the partitions of a composition expressing novel information about one or more subject matter/s.
- an instrument or a function to measure a novelty value of a subject matter e.g. an OS of the composition
- a novelty measure for the partitions it would become practical to spot the novel information and/or the partitions of the composition carrying novel information in the context of that compositions or a set of compositions or generally a body of knowledge (BOK) as we defined before.
- the first step is to define what constitutes a novelty in the context of a BOK and to identify the different aspects of a novelty investigation.
- Novelty is an attribute that is related to newness, surprising factors, entropy, not being well known, not seen before, and unpredictability.
- this attribute depends very much on the context and on the relations to other ontological subjects of the compositions. For instance, something which is new in one domain or context might be obvious in another domain, or something that is new now might become a very well known fact after some time.
- for instance, in news aggregation the novelty of the news is very much related to the time at which the news broke and to how many other news agencies have published the same news story. Therefore the novelty should be measured in relation to the context, time, and other partitions of the compositions.
- we look for novelty or novelties in the given composition for investigation and since we can treat time and/or a time stamp as an OS our method of investigation, therefore, would also work for time-related compositions such as news, as well.
- a valuable novelty occurrence is relational (i.e. more than one OS is participated where the novelty occurs) which should be investigated in the context of a composition.
- One of the situations is a novel relationship between two or more OSs in which case there could yet be envisioned at least two notable and important situations.
- a type of “relational novelty value significance measure” can be assigned to spot a novel or less known relationship between two important OSs.
- the relational novel value should be high because the two significant OSs are less seen with each other in a part or partitions of a composition or a BOK. Therefore the desired “relational novel significance measure” should be proportional to the value significances of each of the OSs and be inversely proportional to their “association strength bond”.
- $rnvsm\_1_{i \to j}^{k|l}$ stands for the type one “relational novelty value significance measure” of $OS_i^k$ to $OS_j^k$. This measure can be used to hunt for partitions that contain two or more significant OSs expressing a less known relationship. It gives a high value to pairs of OSs that are each intrinsically significant, so that the expressed relationship is more likely to be credible and significant, while their relationship with each other is novel in the context of the BOK.
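As a minimal sketch of the type one measure, assuming the simplest form consistent with the stated proportionality (the product of the two value significances divided by their association strength), the pairwise scores could be computed as below. The function name `rnvsm_1`, the `eps` guard against division by zero, and the sample numbers are illustrative assumptions, not the formula or data of the disclosure.

```python
def rnvsm_1(vsm, asm, eps=1e-12):
    """Assumed form: vsm[i] * vsm[j] / asm[i][j] for every pair i != j."""
    n = len(vsm)
    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # High when both OSs are significant but weakly associated.
                scores[i][j] = vsm[i] * vsm[j] / (asm[i][j] + eps)
    return scores

# Hypothetical data: OS 0 and OS 1 are both significant but rarely associated.
vsm = [0.5, 0.4, 0.1]
asm = [[1.0, 0.05, 0.9],
       [0.05, 1.0, 0.3],
       [0.9, 0.3, 1.0]]
scores = rnvsm_1(vsm, asm)
pairs = [(i, j) for i in range(3) for j in range(i + 1, 3)]
best_pair = max(pairs, key=lambda p: scores[p[0]][p[1]])
```

With these numbers the pair (0, 1) scores highest, which is the behaviour the text describes: two significant OSs that are seldom seen together.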
- Another situation of novel relationship between two or more OSs is a type of novelty between two OSs in which the novelty reveals less known information about one important OS of the interest (e.g. a target keyword, a high value significance subject of a BOK, etc.), regardless the significance of the other OSs.
- the intrinsic value of the target OS e.g. an intrinsic vsm
- the less known associations can be a guide to find the novel part or partitions or statement of a relationship between a significant OS with other OSs of the composition.
- $rnvsm\_2_{i \to j}^{k|l}$ stands for the second type of “relational novelty value significance measure” of $OS_i^k$ to $OS_j^k$.
- This measure puts a high relational novelty value on pairs in which at least one member, e.g. the target OS, has a high intrinsic value (i.e. the vsm of $OS_j^k$) while the other members are those that had the lowest co-occurrences with the target OS.
- This measure can be used to spot partitions that are novel and significant but in which the relationship expressed between the two OSs is perhaps less credible.
- this type of novelty value should be proportional to the value significance of the second OS, e.g. a target OS, and be inversely proportional to the value significance of the less significant OS and also be inversely proportional to their co-occurrences so that:
- $rnvsm\_3_{i \to j}^{k|l}$ stands for the third type of “relational novelty value significance measure” of $OS_i^k$ to $OS_j^k$. This measure can be used to spot highly novel but perhaps even less credible partitions of the BOK than those found by $rnvsm\_2_{i \to j}^{k|l}$.
- the significance and relational novelty value should be inversely proportional to the significances, i.e. VSMs, of each of the OSs and also proportional to their co-occurrences so that: rnvsm_4 i ⁇ j k
- This measure can be used to spot a highly novel relationship between two less known OSs but with some credibility. This measure can be used to spot the rare partitions that might be irrelevant to the context of the BOK but is important to be looked at.
- $rnvsm\_5_{i \to j}^{k|l}$ stands for the fifth type of “relational novelty value significance measure” of $OS_i^k$ to $OS_j^k$. This measure can be used to spot a highly novel relationship between two less known OSs but with even less credibility than $rnvsm\_4_{i \to j}^{k|l}$.
- This measure can be used to spot noise-like partitions that might be irrelevant to the context of the BOK but might be essential to look at in applications such as crime investigation, financial analysis, fraud detection and the like. This measure can also be used to filter out the irrelevant or noisy part of the composition, or be used in data compression, image compression and the like.
- a measure of relational novelty value can be defined based on their association strengths to each other as: rnvsm_6 i ⁇ j k
- This measure of novelty amplifies the asymmetry of the association strength value between the two OSs and therefore serves as a measure of anomaly and novelty. Either too large or too small a value of this measure can point to a novelty situation.
- to have a symmetric rnvsm using asm one might consider the following measure:
- $rnvsm\_7_{i \to j}^{k|l}$ stands for the seventh type of “relational novelty value significance measure” of $OS_i^k$ to $OS_j^k$. This measure is particularly good at spotting any symmetric kind of novelty or anomaly between $OS_i^k$ and $OS_j^k$. When the value of this measure is large, there is a novelty situation to look at between $OS_i^k$ and $OS_j^k$.
- l (OS i k ,OS j k ) g 2 (vsm i k
- l (OS i k ,OS j k ,OS p k ) ⁇ 1 ⁇ rnvsm_ x 1 k
- l (OS q k ,OS p k ) and q 1,2 . .
- kl (pm i k i l kl ) T ⁇ rnvsm_ x i k ⁇ j k k
- kl (PM kl ) T ⁇ RNVSM_ x k
- kl is the type x (x 1, 2, . . . ) “relational novelty value significance measure” of the partitions or OSs of order l to the OSs of the order k. It is noticed that RNVSM_x l ⁇ l
- kl is a M ⁇ N matrix indicating the type x (x 1, 2, . . .
- kl RNVSM_ x l ⁇ k
- kl stands for the “relational novelty value significance measure” of type x between the OSs of the order l, which is a M ⁇ M matrix.
- This measure and the data of such matrix can be used to find a novel partition, exhibiting a predetermined range of “relational novelty value”, for a given partition.
- these measures can be combined with other measures to obtain the desired parts of the compositions that one is looking for (e.g. in response to a query or a question).
- $anvsm\_1_{i \to j}^{k|l}(OS_i^k, OS_j^k) \propto \dfrac{asm\_x1_{p \to i}^{k|l} \cdot asm\_x2_{p \to j}^{k|l}}{asm\_x3_{i \to j}^{k|l}},\quad p = 1, 2, \ldots, N$ (51)
- l is indicative of the first type “association novelty value significance measure”
- the “ ⁇ ” shows the inner product or scalar multiplication of the asm_x1 p ⁇ i k
- This measure of novelty gives a high relational novelty value to those pairs that exhibit a strong hidden association correlation but are not explicitly strongly bonded. This measure is particularly useful for detecting hidden relationships between two OSs of interest, i.e. $OS_i^k$ and $OS_j^k$, and can be used to spot cases worthy of further research and investigation (e.g. in scientific discovery, medicine, crime investigation, genetics, market research, financial analysis, etc.).
- l is also one of the “relational novelty value significance measures” but in here it is preferred to be given a more distinct name as “association novelty value significance measure” (ANVSM) in order to have a distinct category for this kind of “value significance measure” in general.
- y1 and y2 indicate the types and numbers of the “value significance measure” used in this formula.
- the proportionality factor can be adjusted to account for normalization of the vectors when desired.
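The idea behind Eq. 51 can be sketched as follows: for each pair (i, j), take the inner product over p of the associations from every OS p to i and to j, then divide by the direct association between i and j. This is only an illustration under assumptions: a single matrix is used for `asm_x1`, `asm_x2`, and `asm_x3`, the convention `asm[p][i]` for the strength from p to i is chosen here, the proportionality factor is set to 1, and `eps` and the sample values are hypothetical.

```python
def anvsm_1(asm, eps=1e-12):
    """Hidden-association novelty: shared associations over direct association."""
    n = len(asm)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Inner product over p of the associations p -> i and p -> j.
            hidden = sum(asm[p][i] * asm[p][j] for p in range(n))
            out[i][j] = hidden / (asm[i][j] + eps)
    return out

# Hypothetical ASM: OS 0 and OS 1 share strong links to OS 2 and OS 3
# but are only weakly linked to each other.
asm = [[0.0, 0.01, 0.8, 0.7],
       [0.01, 0.0, 0.9, 0.6],
       [0.8, 0.9, 0.0, 0.5],
       [0.7, 0.6, 0.5, 0.0]]
novelty = anvsm_1(asm)
```

Here `novelty[0][1]` is far larger than `novelty[2][3]`, flagging the pair whose association is mostly indirect.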
- Eq. 51 can be re written in matrix form in general terms which is more useful as: ANVSM_1 k
- l [(ASM_ x 1 k
- l are column or row normalized.
- Eq. 51, 52 and 53 are generally the exemplary cases of the general form of: anvsm_ x i ⁇ j k
- l (OS i k ,OS j k ) g 3 (vsm_ y 1 i k
- l ), . . . p,i,j 1,2, . . . N, (54) wherein g 3 is predetermined or predefined function and y1, y2, x1 . . . x4 etc refer to the selected type of the respective kind and type of the “value significance measure”.
- NVSM novelty value significance measure
- $nvsm\_1_i^{k|l} = h_1(iop_i^{k|l}),\quad i = 1, 2, \ldots, N$ (55)
- $h_1$ is a predetermined function, such as a linear function (e.g. $ax + b$), a power of $x$ (e.g. $x^3$ or $x^{0.53}$), a logarithmic function (e.g. $a/\log_2(x)$), $1/x$, etc., wherein a or b might be a scalar constant or a vector.
- $nvsm\_1\_1_i^{k|l} = c / iop_i^{k|l},\quad i = 1, 2, \ldots, N$ (56) wherein c might be a scalar or a constant vector.
- $nvsm\_1\_2_i^{k|l} = c / \log_b(iop_i^{k|l}),\quad i = 1, 2, \ldots, N$ (57) or in another instance:
- $nvsm\_1\_3_i^{k|l} = c \cdot \log_b(1/iop_i^{k|l}) = -c \cdot \log_b(iop_i^{k|l}),\quad i = 1, 2, \ldots, N$ (58) or yet in another instance:
- $nvsm\_1\_4_i^{k|l} = \dfrac{-c \cdot \log_b(iop_i^{k|l})}{iop_i^{k|l}}$ (59)
- b is a constant
- c could be constant or a vector.
- c can be an auxiliary vector that, when multiplied with other vectors, suppresses or dampens the value of particular OSs of the compositions, such as the generic words in a textual composition.
- l is in fact obtained by multiplication of the nvsm_1_1 i k
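The four instances in Eqs. 56 to 59 can be sketched directly from the independent occurrence probabilities. The function names below mirror the equation labels; treating `c` as a scalar, defaulting `b` to 2, and the sample `iop` values are assumptions made for illustration. Note that for probabilities below 1 the form in Eq. 57 yields negative values and is undefined at `iop = 1`.

```python
import math

def nvsm_1_1(iop, c=1.0):
    """Eq. 56: c / iop."""
    return [c / p for p in iop]

def nvsm_1_2(iop, c=1.0, b=2.0):
    """Eq. 57: c / log_b(iop)."""
    return [c / math.log(p, b) for p in iop]

def nvsm_1_3(iop, c=1.0, b=2.0):
    """Eq. 58: c * log_b(1 / iop) = -c * log_b(iop)."""
    return [-c * math.log(p, b) for p in iop]

def nvsm_1_4(iop, c=1.0, b=2.0):
    """Eq. 59: -c * log_b(iop) / iop."""
    return [-c * math.log(p, b) / p for p in iop]

# Hypothetical independent occurrence probabilities of three OSs.
iop = [0.5, 0.25, 0.125]
inverse = nvsm_1_1(iop)
surprisal = nvsm_1_3(iop)
combined = nvsm_1_4(iop)
```

In every variant except Eq. 57, the rarer an OS is, the larger its novelty value; the variants differ in how steeply the value grows.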
- the novelty is observed in relation to, or in combination with, other OSs, since novelty occurs in a context and therefore in relation to other ontological subjects.
- the stand-alone or intrinsic “novelty value significance value” in this case is defined as the sum of the novelty values that an OS has with a desired number of other OSs.
- l (OS i k ) c ⁇ j rnvsm_ x i ⁇ j k
- l (OS j k ) c ⁇ i rnvsm_ x i ⁇ j k
- l (OS j k ) c ⁇ j anvsm_ x i ⁇ j k
- l (OS j k ) c ⁇ i anvsm_ x i ⁇ j k
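A short sketch of the summation just described: an intrinsic novelty value per OS is obtained by summing a relational (or association) novelty matrix over one of its indices and scaling by a constant c. The function name, the `over` switch, and the sample matrix are hypothetical; which index to sum over depends on the direction of novelty of interest.

```python
def intrinsic_novelty(relational, c=1.0, over="j"):
    """Sum a relational novelty matrix over index j (rows) or index i (columns)."""
    n = len(relational)
    if over == "j":
        return [c * sum(relational[i][j] for j in range(n)) for i in range(n)]
    return [c * sum(relational[i][j] for i in range(n)) for j in range(n)]

# Hypothetical relational novelty values between three OSs.
rel = [[0, 4, 1],
       [2, 0, 3],
       [5, 1, 0]]
novelty_from = intrinsic_novelty(rel)            # sum over j
novelty_to = intrinsic_novelty(rel, over="i")    # sum over i
```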
- l OS i k
- h NVSM_1 k
- h predetermined function
- y is the type and number of the particular NVSM k
- the parameters, vectors, and matrices of the present invention are transformations of the information hidden in the participation matrix, which can be used with ease, convenience and efficiency to investigate various aspects of interest in the BOK, such as extracting the most significant parts or partitions, finding the highly associated concepts, parts or partitions, finding the novel part/s or partition/s of the BOK, finding the most informative part of the composition, clustering and categorizing the partitions of the composition or the BOK, ranking and scoring partitions of a composition based on their relatedness to a subject matter (e.g. a query), excluding one or more partitions or OSs of the BOK or suppressing their role in the analysis, and numerous other applications.
- the mathematical objects and data arrays can easily be transformed to other forms: a desired part or segment of a matrix can be filtered out, and the role of one or more of the OSs of the composition can be amplified or suppressed, or their values altered numerically, without needing to manipulate the input composition string or file.
- the matrices or vectors can be normalized in order to make the comparisons more meaningful in the context of the BOK. Accordingly, one or more of such mathematical objects and data arrays (vectors, matrices etc.) might be column or row normalized, or further multiplied by other matrices or vectors acting as a mask or filter, etc.
- all these matrices can be regarded as an adjacency matrix for a corresponding graph, wherein the matrix carries the data of the connectivity between the nodes or objects of the graph. Therefore, from these connectivity matrices one can proceed to solve the corresponding eigenvalue equation/s in order to estimate and calculate other types of desirable value significance measures or, in general, any type of value significance.
- These measures of value calculated from the corresponding eigenvalue equations of the matrices are generally indications of the intrinsic significance values of the OSs. For instance, in the non-provisional U.S. patent application Ser. Nos. 12/547,879, 12/755,415 and 12/939,112, one or more of these matrices have been used to calculate the significance values of the OSs of the composition based on the centralities of their corresponding nodes in the graph represented by that matrix.
- the centrality value can be, for instance, the components of the eigenvector associated with the largest eigenvalue, as described in the application Ser. Nos. 12/547,879, 12/755,415 and 12/939,112, which are incorporated herein by reference.
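One common way to obtain such a centrality vector is power iteration, sketched below. This is a generic technique named here for illustration, not necessarily the procedure of the referenced applications. It assumes a matrix with positive entries so that the iteration converges to the dominant eigenvector; the function name, iteration limits, and the sample matrix are hypothetical.

```python
def power_iteration(matrix, iters=200, tol=1e-12):
    """Approximate the dominant (right) eigenvector, normalized to sum to 1."""
    n = len(matrix)
    v = [1.0 / n] * n
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(abs(x) for x in w)
        if norm == 0:
            return v
        w = [x / norm for x in w]
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    return v

# Hypothetical symmetric association matrix: OS 0 is strongly tied to both others.
asm = [[1.0, 2.0, 2.0],
       [2.0, 1.0, 0.5],
       [2.0, 0.5, 1.0]]
centrality = power_iteration(asm)
```

The resulting vector assigns the largest value to OS 0 and equal values to OS 1 and OS 2, reflecting their positions in the corresponding graph.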
- VSM values e.g vectors
- these vectors or filters can be designed in such a way as to amplify the significances of proper sentences of compositions written in a particular natural language such as English.
- the objective can be to give significance to particular types of partitions of the composition having particular feature/s, attribute/s, or form/s.
- those selected OS e.g. words or phrases such as “therefore”, “as a result”, “hence”, “consequently”, “so that” . . . etc.
- one might have list of OSs that it is not desirable to participate in the calculation e.g. stop words
- These pre-assigned vectors are called “special cases conveyers” herein or “significance value conveyer vectors” as shown in FIG. 6 c , that can be used solely or in combinations with other VSM value vectors to obtain the desired functionality from the investigation.
- These conveyers are assigned and used based upon the goal of investigation.
- the special conveyers can be designed and altered for various stage of the process and can be used in different stages of calculations and processes.
- the participation matrix can, for instance, routinely be transformed to other types of objects or participation matrices by applying one or more vectors or matrices to the PM. For example, one can multiply the PM from the right side by a diagonal matrix (M by M) whose diagonal values are the reciprocals of the number of constituent OSs of order k in the partitions or the higher order OSs of order l.
- the “resulting PM” matrix will become a column-normalized PM and the values of its entries will become the weighted participation factors.
- the PM matrix can be multiplied from the left side by a diagonal matrix (N by N) whose diagonal entries form a vector that puts a value on each OS of order k so that their participation weights are altered.
- N the diagonal of the left matrix
- the role of those particular words (e.g. the generic words) in the computations will be suppressed as well, without having to manipulate the original string of the compositions in order to achieve the same goal of suppressing the role of generic words.
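The two diagonal multiplications described above can be sketched without forming the diagonal matrices explicitly: right multiplication scales each column, and left multiplication scales each row. The layout (rows are OSs of order k, columns are partitions), the helper names, and the sample data are assumptions for illustration.

```python
def column_normalize(pm):
    """Equivalent to PM x diag(1 / number of OSs in each partition)."""
    n, m = len(pm), len(pm[0])
    col_sums = [sum(pm[i][j] for i in range(n)) for j in range(m)]
    return [[pm[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(m)]
            for i in range(n)]

def weight_rows(pm, weights):
    """Equivalent to diag(weights) x PM: rescales each OS's participation."""
    return [[w * x for x in row] for w, row in zip(weights, pm)]

# Hypothetical PM: 4 words (rows) by 3 sentences (columns).
pm = [[1, 1, 0],
      [1, 0, 1],
      [0, 0, 1],
      [1, 1, 1]]   # the last row plays the role of a generic word

normalized = column_normalize(pm)
# Dampen the generic word without touching the original text.
weighted = weight_rows(pm, [1.0, 1.0, 1.0, 0.1])
```

After normalization every column sums to one, and after weighting the generic word contributes only a tenth of its original participation.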
- auxiliary vectors i.e. filters
- filters can be built to dampen the significance of particular OSs of the composition by multiplying those vectors on the resulting vector objects such as one or more of the different types and number of the “value significance measures” vectors or matrices.
- the method/s can conveniently be used for compositions of different nature such as data file compositions, e.g. audio or video signals, DNA string investigation, textual strings and text files, corporate reports, corporate databases, etc.
- the investigation method disclosed herein can be readily used to investigate image and video files, such as spotting a novelty in an image or picture or video, edge detection in an image, feature/s extraction, compression of image and video signals, and manipulating the image etc.
- the disclosed methods of the present invention can readily be applied in applications such as, artificial intelligence, neural network training and learning, network training, machine learning, computer conversation, approximate reasoning, as well as computer vision, robotic vision, object tracking etc.
- the disclosed framework, along with the algorithms and methods, enables people in various disciplines, such as artificial intelligence, robotics, information retrieval, search engines, knowledge discovery, genomics and computational genomics, signal and image processing, information and data processing, encryption and compression, business intelligence, decision support systems, financial analysis, market analysis, public relation analysis, and generally any field of science and technology, to use the disclosed method/s of investigation of the compositions of ontological subjects and the bodies of knowledge to arrive at the desired form of information and knowledge with ease, efficiency, and accuracy.
- those skilled in the art can store, process or represent the information of the data objects of the present application (e.g. lists of ontological subjects of various orders, lists of subject matters, participation matrix/es, association strength matrix/es, the various types of associational, relational, and novelty matrices, co-occurrence matrices, and other data objects introduced herein) or other data objects as introduced and disclosed in the incorporated references (e.g.
- the PMs, ASMs, OSM or co-occurrences of the ontological subjects etc. can be represented by a matrix, sparse matrix, table, database rows, dictionaries and the like which can be stored in various forms of data structures.
- each layer of a PM, ASM, OSM, RNVSM, NVSM, and the like, or the ontological subject index, or knowledge database/s, can be represented and/or stored in one or more data structures such as one or more dictionaries, one or more cell arrays, one or more rows/columns of an SQL database, one or more filing systems, one or more lists or lists in lists, hash tables, tuples, string format, zip format, sequences, sets, counters, or any combined form of one or more data structures, or any other convenient objects of any computer programming language such as Python, C, Perl, Java, JavaScript etc.
- Such practical implementation strategies can be devised by various people in different ways.
- the processing units or data processing devices e.g. CPUs
- the processing units to implement the system have a compound processing speed equivalent to one thousand million or more instructions per second and a collective memory, or storage devices (e.g. RAM), able to store large enough chunks of data to enable the system to carry out the task and decrease the processing time significantly compared to a single generic personal computer available at the time of the present disclosure.
- the goal of the investigation is to produce useful data, information, and knowledge from one or more given or accessed compositions, according to at least one aspect of significance or the goal/s of the investigation.
- the result of the investigation can be represented in various forms and presentation style and various devices of modern information technology (private or public cloud computing, wired or wireless connections, etc.).
- the interaction between a client and an investigator, employing one or more of the disclosed algorithms, can be facilitated through various forms of data network accessibility to an investigator through various interfaces such as web interfaces, or data transferring facilities.
- the result of the investigation can be displayed or provided in various forms such as interactive page/device environment, graphs, reports, charts, summaries, maps, interactive navigation maps, email, image, video compositions, voice or vocal compositions, different nature composition such as transformation of a textual composition to visual or vice versa, encoded data, decoded data, data files, etc.
- a goal of investigation can be to find the OSs of the composition that score a significant enough novelty value in the context of the given BOK or an assembled BOK, wherein the OSs of the composition can be words, phrases, sentences, paragraphs, lines, documents or the like for the BOK under investigation.
- Another exemplary goal of investigation can be to get a summary of the credible statements from a BOK or to modify a part or partitions of a composition (e.g. a document, an image, a video clip etc.).
- another instance of investigation can be to obtain a map of relations between the most significant parts or partitions of the BOK.
- a patent attorney, inventor, or examiner can use the disclosed method to plan claim drafting by investigating the application disclosure and extracting its most valuable or novel parts from which to draft the claims.
- the method can be used for examining the application in comparison to one or more collection of one or more patent application disclosures.
- an intelligent being e.g. a software bot/robot, a humanoid, a machine, or an appliance
- a provider of such services e.g. conversing and doing tasks, or entertaining, or assisting in knowledge discovery etc.
- FIG. 1 it depicts one general flow process and the system that can provide one or more exemplary investigation's result, as services, utilizing the algorithms and the methods of the present invention.
- the required variables or the mathematical or data objects e.g. the matrices and the vectors values etc
- building the various filters, one can design, synthesize, and compose an output according to her/his/its need, goal of investigation, or informational requirements for an input composition. For example, if an application calls for getting the most credible and valuable partitions of an input composition, then she/he/it must choose (or select through an interface) the corresponding filter (i.e.
- the suitable XY_VSM/s and algorithm/s for which to obtain such a credible glance or summary of the composition.
- the user or the designer of such system and service can synthesize the suitable filter, using the tools, measures and methods of the present invention to provide the desired response, output or the service.
- the input composition is used to build or generate the one or more participation matrices while the ontological subjects of different orders are grouped, listed, and kept in the short term or more permanent storage media.
- the actual OSs or the partitions usually are used at the end of the processing and calculations of the desired quantity or quantities, when they are fetched again based on their corresponding value for one or more measures of the values introduced in previous sections.
- the system will calculate the desired mathematical objects such as COM, ASM/s, the desired VSM/s, one or more RASM if needed for the desired service, one or more RVSM/s if needed for the service, one or more of NVSM/s, or RNVSM/s or ANVSM/s if desired and so on.
- These data objects are used to synthesize the required filter, which provides the desired functionality once it operates on the PM.
- the output is further investigated for selection of suitable OSs of the composition for further processing or re-composing or presentation.
- the output can be presented in predetermined form/s or format, such as a file, displaying on a web-interface or an interactive web-interface, encoded data in a particular format for using by another system or software agent, sending by email, being displayed in a mobile device, projector and the like over a network, or sent to a client over the internet and the like.
- the desired mode of operation is to find out the novel partitions of the composition exhibiting enough novelty value while having enough significance then the corresponding filter will use the RNVSM of the Eq. 39 for finding, scoring and consequently selection of the suitable partitions for this requested service.
- once the composition data are transformed or transported into the participation matrix/matrices, we deal only with numerical calculations that determine the values of the members of the listed OSs. The OSs are tracked by their index in the list or by their row or column number in the participation matrix. Once the values for the corresponding measure are calculated, those OSs that exhibit the desired value or range of values are selected by the selector or a composer that provides the output data or content, e.g. as a service, according to predetermined formats for that service.
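The index-based flow described above might look like the following sketch: the text is converted once into a binary participation matrix plus an ordered list of OSs, all scoring happens on numbers, and the actual OSs are fetched by index only at selection time. The tokenizer, the helper names, the use of the occurrence fraction as the measure, and the sample sentences are all assumptions for illustration.

```python
import re

def build_participation_matrix(partitions):
    """Rows: distinct lowercase words (OSs of order k). Columns: partitions."""
    token_sets = [set(re.findall(r"[a-z]+", p.lower())) for p in partitions]
    os_list = sorted(set().union(*token_sets))
    index = {word: i for i, word in enumerate(os_list)}
    pm = [[0] * len(partitions) for _ in os_list]
    for j, tokens in enumerate(token_sets):
        for word in tokens:
            pm[index[word]][j] = 1
    return os_list, pm

def select_by_value(os_list, values, low, high):
    """Selector: fetch the actual OSs whose value lies within [low, high]."""
    return [os_list[i] for i, v in enumerate(values) if low <= v <= high]

partitions = ["Novelty depends on context.",
              "Context changes over time.",
              "Time is treated as an ontological subject."]
os_list, pm = build_participation_matrix(partitions)
# Placeholder measure: fraction of partitions in which each OS participates.
occurrence = [sum(row) / len(partitions) for row in pm]
selected = select_by_value(os_list, occurrence, 0.5, 1.0)
```

Any of the measures discussed earlier could replace the placeholder `occurrence` vector without changing the selection step.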
- association strength measure/s play an important role in the investigation of compositions of ontological subjects and also provide data that is valuable in itself. That is, knowing the association strengths of OSs to each other is important and can be used to build many other applications, especially in artificial intelligence.
- FIG. 2 it is shown one general form of conceptualizing and defining the association strength measures and consequently calculating the association strength values for those measures.
- the association strength of the OSs of order k that have co-occurred in one or more OSs of order l is given by a function of their number of co-occurrences and the respective value/s of one or more of the “value significance measure/s” (e.g. the independent probability of occurrence).
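To make the dependence on co-occurrence and occurrence probability concrete, here is a sketch with one hypothetical choice of the function: the co-occurrence count divided by the independent occurrence probability of the second OS. The specific form `com[i][j] / iop[j]`, the definition of `iop` as a share of all participations, and the sample matrix are assumptions and not the function defined in the disclosure. The sketch also assumes every OS occurs at least once.

```python
def co_occurrence(pm):
    """COM = PM x PM^T: partitions in which OS i and OS j both participate."""
    n, m = len(pm), len(pm[0])
    return [[sum(pm[i][p] * pm[j][p] for p in range(m)) for j in range(n)]
            for i in range(n)]

def independent_occurrence_probability(pm):
    """Share of all participations that belongs to each OS (assumed definition)."""
    counts = [sum(row) for row in pm]
    total = sum(counts)
    return [count / total for count in counts]

def association_strength(pm):
    """Hypothetical f: asm[i][j] = com[i][j] / iop[j], which is asymmetric."""
    com = co_occurrence(pm)
    iop = independent_occurrence_probability(pm)
    n = len(pm)
    return [[com[i][j] / iop[j] if i != j else 0.0 for j in range(n)]
            for i in range(n)]

# Hypothetical PM: OS 0 is frequent, OS 1 less so, OS 2 is rare.
pm = [[1, 1, 1, 1],
      [1, 1, 0, 0],
      [0, 0, 1, 0]]
asm = association_strength(pm)
```

Because the divisor differs for the two directions, `asm[0][1]` and `asm[1][0]` differ, which matches the asymmetric graph discussed in connection with the figures.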
- any composition of ontological subjects can in principle be represented by a graph, which in this preferred embodiment is shown as an asymmetric graph.
- the exemplified graph corresponds to one of the exemplary “association strength matrices”, i.e. an ASM, taken as its adjacency matrix.
- the nodes represent the desired group of OSs, the edges or arrows show the links between the associated nodes, and the values on the edges are representative of the association strength from one node to the connected one.
- This figure graphically exemplifies that compositions of ontological subjects and a network of ontological subjects can basically be investigated and dealt with in the same manner according to the teachings of the present invention.
- FIG. 4 there is shown again another embodiment for the process of calculating various value significance measures in more details.
- the data of the input composition is transformed to calculable quantities and data from which, employing the above methods and formulations, the desired value significance measures are calculated and/or are stored in the storage areas for further use or being used by other processes or programs or clients.
- FIG. 5 therefore shows the block diagram of one basic exemplary embodiment in which it demonstrates a method of using the association strengths matrix (ASM) to build an “Ontological Subject Map (OSM)” or a graph.
- the map is not only useful for graphical representation and navigation of an input body of knowledge but also can be used to evaluate the value significances of the OSs in the graph as explained in the patent application Ser. No. 12/547,879 entitled “System and Method of Ontological Subject Mapping for knowledge Processing Applications” filed on Aug. 26, 2009 by the same applicant. Utilization of the ASM introduced in this application can result in better justified Ontological Subject Map (OSM) and the resultant calculated significance value of the OSs.
- the association strength matrix could be regarded as the adjacency matrix of any graph, such as a social graph or any network of anything.
- the graphs can be built representing the relations between the concepts and entities or any other desired set of OSs in a special area of science, market, industry or any “body of knowledge”.
- the method becomes instrumental at identifying the value significance of any entity or concept in that body of knowledge and consequently be employed for building an automatic ontology.
- l and other mathematical objects can be very instrumental in knowledge discovery and research trajectories prioritizations and ontology building by indicating not only the important concepts, entities, parts, or partitions of the body of knowledge but also by showing their most important associations.
- values of different types of value significance measures can be shown as a vector in a multidimensional space.
- XY_VSM/s in general are matrices that might also carry the relational value significances, but any row or column of them (as shown in FIG. 6 a ) can still be shown as a discrete vector in a multidimensional space. These discrete vectors can also be treated as discrete signals, which can further be used for investigation of the compositions.
- Some types of XY_VSM that are intrinsic are vectors (e.g. FIG. 6 b ) and can readily be used to weigh other OSs or the partitions of the composition. Also shown in FIG. 6 c are some vectors that might be “special conveyer vectors”, labeled “significance conveyer vectors” in FIG. 6 c . These are usually predefined or predetermined and can be used for filtering out, dampening, amplifying, and/or shaping/synthesizing the VSMs of one or more of the predetermined OSs of the composition. FIG. 6 c demonstrates that special conveyer vectors or VSMs have basically the same characteristics as other XY_VSMs except that their values might have been set in advance.
- FIG. 7 shows one way of demonstrating (e.g. schematically) how two exemplary value significance vectors can be extracted from an exemplary “association strength matrix” (asm) which in this instance are also shown to be used to evaluate the associations of OSs of order l (e.g. sentences) to particular OS of order k (e.g. a word or keyword or phrase).
- FIG. 7 is for further clarification and instantiation of the actual meaning and their use and the way to manipulate and use, deal, and calculate the variables and data or mathematical objects that were introduced in the previous sections.
- the disclosed processes and methods with the given formulations should be enough to enable those of ordinary skill in the art to implement, execute, and apply the teachings of the present invention.
- an OS of order l can be selected by the investigator based on its strength of association to one or more OSs of the order k.
- the calculation and the selection method of OSs of order l can find an important application in document retrieval, question answering, and computer conversation, in which a suitable answer or output is sought from a knowledge repository (e.g. a given composition) in response to the input query or composition.
- an input statement or a query is parsed into its constituent OSs of order k, and then, from the association strength matrix (which might be constructed from and for said knowledge repository), the most closely related partitions of the stored composition (i.e.
- the knowledge repository is retrieved in response to an input query, which is a conversational statement or a question.
- the most closely related partition of the knowledge repository can be the partition (OS of order l) that has scored the highest average or cumulative association to the constituent OSs of the input query.
- the most closely related partition of the knowledge repository might be the one that scores the highest, for example, after multiplication of the association strength vectors of the OSs of the input query by the association strength matrix that has been built from the knowledge repository.
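One way to read the cumulative-association scoring is sketched below: the association rows of the query's OSs are summed into a single vector, which is then accumulated over each partition through the participation matrix. The function name, the matrix layout (rows are OSs of order k, columns are partitions), and the sample values are hypothetical, and this is only one plausible interpretation of the multiplication described.

```python
def rank_partitions(query_ids, asm, pm):
    """Rank partitions by cumulative association to the OSs of the query."""
    n, m = len(pm), len(pm[0])
    # Cumulative association of the query's OSs to every OS of order k.
    assoc = [sum(asm[q][i] for q in query_ids) for i in range(n)]
    # Accumulate over the OSs that participate in each partition.
    scores = [sum(assoc[i] * pm[i][j] for i in range(n)) for j in range(m)]
    order = sorted(range(m), key=lambda j: scores[j], reverse=True)
    return order, scores

# Hypothetical ASM over three OSs and a PM with three partitions.
asm = [[1.0, 0.8, 0.1],
       [0.8, 1.0, 0.2],
       [0.1, 0.2, 1.0]]
pm = [[1, 0, 0],
      [1, 1, 0],
      [0, 1, 1]]
order, scores = rank_partitions([0], asm, pm)
```

For a query consisting of OS 0, partition 0 ranks first because it contains OS 0 itself and its most strongly associated neighbour.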
- FIG. 8 shows schematically a block diagram of an exemplary system and process, further clarifying how to use the “value significances” data of one or more OSs of a particular order to evaluate and calculate one or more “value significances” of OSs of another order using one or more XY_VSM and one or more participation matrices.
- the XY in FIG. 8 is a placeholder that can be replaced with the desired type and number combination of the desired “value significance measure”. Therefore XY_VSM in FIG. 8 can be replaced with any of the different types of the “value significance measures” (such as RVSM, NVSM, ARASM, RSVM, etc.).
- the data objects can be stored, if desired, for later use so that the pre-calculated data and objects can easily be retrieved for the corresponding compositions and the desired application.
- the pre-made stored data can be used to speed up the process of composition investigation in a system that provides such services to one or more clients.
- FIG. 9 shows an instance of clustering, ranking, and sorting a number of webpages fetched from the internet, for example by crawling.
- This demonstrates the process of indexing and, consequently, of easily and efficiently finding the relevant information related to a keyword or a subject matter.
- This is a familiar but very important application of the present invention, namely its use in search engines.
- the pages/documents/compositions are investigated so that the associations of the desired part or partitions of such collections are calculated to other desired OSs of the collection of the compositions.
- Now, in such an exemplary search engine, once a client enters a query or a keyword, it would be straightforward to find the most relevant document, page, or composition to the input query, i.e. to a target OS.
- association strength matrix/es (indicated by XASM) or RVSMs etc.
- using the disclosed algorithms makes it possible to retrieve the documents with the highest degrees of relevancy to the input query or the target OS.
- This is one of the very important applications and implications of the disclosed teachings and materials since, as is experienced by many users of the commercial search engines, the relevancy of retrieved documents to the input query has been and is a major challenge in improving search engine performance.
- employing the investigation methods of the present invention through its various measures makes it possible to quickly and reliably retrieve the most semantically related document/page to the input query.
- special OSs can be selected for which the association strengths of pages are to be calculated.
- special OSs can be the content words such as nouns or named entities. Nevertheless, there is no limitation on the selection or choice of the target OSs, and they can basically be all possible types of words, or even sentences and higher-order partitions.
- OSs of high value significance can be identified so that the whole composition (i.e. the whole collection of the documents or pages) can be clustered or categorized into bodies of knowledge under one or more target subject matter or head categories (e.g. the high value OSs of lower order, such as words or phrases).
- the target OSs could usually be the keywords or phrases, or the words or any combinations of the characters, such as dates, special names, etc.
- the target OSs of such composition could be the extracted sentences, phrases, paragraphs, or even a whole document and the like.
- a service provider system, such as a search engine, question answering, or computer conversing system, which comprises or has access to the system of FIG. 9
- the system can simply parse the input query and extract all or some of the words of the input query (i.e. the OSs of order one) then by having calculated the associations strength of rasm_x 1 ⁇ 5
- the documents e.g. web-pages
- the engine can return, for instance, the document or the web-page that is composed of the partitions of high novelty values, either intrinsic or relative, to the target OS/s. Therefore the engine can also filter out and present the documents or webpages that have the most relevancy to the desired “significance aspect” based on the user preferences. So if novelty or credibility or information density of a document, in the context of a BOK, is important for the user, then these services can readily be implemented in light of the teachings of the present invention.
- FIG. 10 shows schematically a system of composition investigations that can provide numerous useful data and information to a client or user as a service. Such outputs or services can in principle be endless once combined in various modes for different applications. However, in FIG. 10 a few of the exemplary, important, and desirable outputs are illustrated.
- FIG. 10 illustrates a block diagram of a system composed of an investigator and/or analyzer and/or a transformer and/or a service provider that can receive or access a composition and provide a plurality of data or content as output.
- the investigator in fact implements at least one of the algorithms for calculating one of the measures in order to assign a value to the part or partitions of the compositions and, based on the assigned value, processes one or more of the partitions or OSs of the particular order as an output in the form of a service or data.
- the output could be simply one or more tags or OS/s that the input composition can be characterized with, i.e. significant keywords of the composition.
- the significant keywords or labels are selected based on their values corresponding to at least one of the aspectual XY_VSM, i.e. one of the value significance measures.
- the output or outcome of the investigator of FIG. 10 could be to provide the partitions of the input composition which have exhibited intrinsic value significances of above a predetermined threshold.
- Another output could be the novel parts or the OSs of the compositions that scored a predetermined level of a particular type of novelty value significance.
- the output could be the noisy part of a composition or a detected spam in a collection of compositions etc.
- Several other outputs or services of the system of FIG. 10 are depicted in FIG. 10 itself and are, in light of the foregoing, self-explanatory.
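Two of the outputs described for FIG. 10, the significant keywords and the partitions whose value significance exceeds a predetermined threshold, can be sketched as simple selections over pre-computed scores. The function names, the score dictionaries, and the numeric values below are hypothetical and serve only as an illustration; how the scores themselves are calculated is outside this sketch.

```python
from typing import Dict, List


def select_above_threshold(scores: Dict[str, float], threshold: float) -> List[str]:
    """Keep the items whose value significance is above the threshold,
    preserving their original order."""
    return [name for name, value in scores.items() if value > threshold]


def top_keywords(scores: Dict[str, float], n: int) -> List[str]:
    """Return the n items with the highest value significance."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical value significances of three partitions and four words.
partition_scores = {"s1": 0.82, "s2": 0.10, "s3": 0.55}
word_scores = {"ontology": 0.9, "the": 0.05, "matrix": 0.7, "of": 0.02}

print(select_above_threshold(partition_scores, 0.5))
print(top_keywords(word_scores, 2))
```

The same two selections apply to any of the measures (for example a novelty measure) by passing a different score dictionary.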
- FIG. 11 shows another instance and application of the present invention in which the processes, methods, algorithms, and formulations are used to investigate a number of news feeds and/or news contents automatically and present the result to a client.
- the news items are first categorized automatically by finding the significant head-categories, then clustered under such significant head-categories, and then one or more partitions of each cluster are selected to represent the content of that clustered news to a reader.
- Head-categories can simply be identified, by evaluating at least one of the significance measures introduced in the present invention, from those OSs that have exhibited a predetermined level of significance.
- the predetermined level of significance can be set dynamically depending on the compositions of the input news.
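The head-category selection and clustering described for FIG. 11 can be sketched as below. The dynamic threshold used here (mean plus `k` population standard deviations of the scores) is an assumption chosen for illustration, since the text does not specify how the level is set; the names `head_categories` and `cluster_news` and all values are likewise hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def head_categories(vsm: Dict[str, float], k: float = 1.0) -> List[str]:
    """Pick the OSs whose significance is at least mean + k * std of all scores.
    The cutoff adapts to the score distribution of the input news."""
    values = list(vsm.values())
    cutoff = mean(values) + k * pstdev(values)
    return [os_ for os_, value in vsm.items() if value >= cutoff]


def cluster_news(news: List[str], heads: List[str]) -> Dict[str, List[int]]:
    """Group news items (by index) under every head-category they mention."""
    clusters: Dict[str, List[int]] = {h: [] for h in heads}
    for i, text in enumerate(news):
        words = set(text.lower().split())
        for h in heads:
            if h in words:
                clusters[h].append(i)
    return clusters


# Hypothetical value significances of words across the input news.
vsm = {"election": 0.9, "market": 0.85, "the": 0.1, "rain": 0.2, "vote": 0.3}
news = [
    "Election results shake the market",
    "Rain delays the vote",
    "Market rallies after election",
]
heads = head_categories(vsm)
print(heads)
print(cluster_news(news, heads))
```

A representative partition for each cluster could then be chosen by applying one of the significance measures to the items inside that cluster.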
- FIG. 12 shows one general embodiment of a system implementing the process, methods and algorithms of the present invention to provide one or more services or output to the clients.
- This figure further illustrates how a particular output or service can be implemented in practice.
- the provider of the service or the outputs can utilize the various measures to select, or to synthesize, the desired sought-after part/s of an input composition.
- a feature to be noticed in this embodiment is that the system not only might accept an input composition for investigation but also has access to banks of BOKs if the service calls for additional resources related to the input composition, or as a result of the input composition investigation and the mode of the service.
- FIG. 12 has a BOK assembler that is able to assemble a BOK from various sources, such as the internet or other repositories, in response to an input request, and performs the methods of the present invention to provide an appropriate service or output data or content to one or more clients.
- the filtration can be done in several parallel or tandem stages, and the output could be provided after any number of filtration steps.
- the filters F 1 , F 2 , . . . F n can be one of the significance measures or any combinations of them so as to capture the sought after knowledge, information, data, partitions from the compositions.
- the output and the choice of the filter can be specified by the client or user as an option besides several default modes of the services of the system.
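The parallel and tandem filtering stages of FIG. 12 can be sketched as follows. This is an illustrative assumption about how such stages might be composed: each filter is modeled as a threshold on one pre-computed measure, "tandem" is read as passing every filter in sequence, and "parallel" is read as passing at least one filter. The names `make_filter`, `tandem`, `parallel`, and the score values are hypothetical.

```python
from typing import Callable, Dict, List

Scores = Dict[str, Dict[str, float]]  # partition -> measure name -> value
Filter = Callable[[str], bool]


def make_filter(scores: Scores, measure: str, threshold: float) -> Filter:
    """Build a filter F_i that keeps partitions scoring at least `threshold`
    on the given significance measure."""
    return lambda partition: scores[partition][measure] >= threshold


def tandem(partitions: List[str], filters: List[Filter]) -> List[str]:
    """Apply the filters one after another; a partition must pass all of them."""
    out = list(partitions)
    for f in filters:
        out = [p for p in out if f(p)]
    return out


def parallel(partitions: List[str], filters: List[Filter]) -> List[str]:
    """Apply the filters side by side; a partition must pass at least one."""
    return [p for p in partitions if any(f(p) for f in filters)]


# Hypothetical pre-computed measures for three partitions.
scores: Scores = {
    "p1": {"intrinsic": 0.9, "novelty": 0.2},
    "p2": {"intrinsic": 0.6, "novelty": 0.8},
    "p3": {"intrinsic": 0.1, "novelty": 0.3},
}
f1 = make_filter(scores, "intrinsic", 0.5)
f2 = make_filter(scores, "novelty", 0.5)

print(tandem(list(scores), [f1, f2]))
print(parallel(list(scores), [f1, f2]))
```

Because the output of either stage is again a list of partitions, the two forms can be nested, and an output can be taken after any stage.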
- Another block in FIG. 12 worth mentioning is the post-processing block, which has the responsibility to transform the output of the filter/s into a predetermined format, or transform the output semantically, or compose a new composition from the output/s of the filters of FIG. 12 as a presentable response to a client.
- a representation mode selection that based on the selected service the output is tailored for that service and the client in terms of, for instance, transmission mode, web-interfacing style, frontend engineering and designs, etc.
- FIG. 12 shows a network bus that facilitates the data exchange between the various parts of the system such as the BOK bank (e.g. containing file servers) and/or other storages (e.g. storages of Los 1 , Los 2 , Los 3 , etc. and/or list storage/data wherein Los stands for List of the Ontological Subjects and, for instance, Los 1 refers to the list of the OSs of order 1) and/or the processing engine/s and/or application servers and/or the connection to internet and/or connection to other networks.
- FIG. 13 shows another general embodiment block diagram of a system providing at least one service to a client.
- a composition investigator wherein the investigator has access to a bank of bodies of knowledge or has access to one or more modules that can assemble a body of knowledge for a client.
- Such a module can, for example, use search engines to assemble its BOK, or assemble it from another repository or database.
- the system can also provide one or more of the services of the FIG. 10 to a client.
- the system is connected to the client through communication means such as private or public data networks, wireless connection, internet and the like and either can receive a composition from the client or the system can assemble a composition or a body of knowledge for the client and/or the system can enrich or add materials to the client's input composition and perform the investigation and provide the result to the client.
- the system can automatically identify the subject matters related to the input composition and go on to assemble one or more BOKs related to at least one of the dominant OSs of the input composition. It can then offer further services or outputs, such as information regarding the degree of novelty of the input composition in comparison to one or more of said BOK/s, and/or score the input composition in terms of credibility or overall merit in comparison to said BOK/s, and/or identify and return the substantially valuable and/or novelty valuable part or partitions of the input composition to the user or other clients or agents.
- a software/hardware module for composition comparisons that provide one or more of the services or the output data of the just exemplified application.
- the mentioned exemplary application and service can, for instance, be of immense value to content creators, genetic scientists, or editors and referees of scientific journals, or in principle to any publishing/broadcasting shops such as printed or online publishing websites, online journals, online content sharing, and the like.
- Such a system can further provide, for instance, a web interface with required facilities for client's interaction/s with the system so as to send and receive the desired data and to select one or more desired services from the system.
- a client can, for example, be a machine, human, another software agent, an intelligent being, a remote server, or the like.
- One such optional module can be a module for conversation between the client and the computer or system. The conversation is conducted in such a way that the system of this exemplary embodiment with the “converse module” receives an input from a client, identifies the main subject/s of the input, and provides a related answer with the highest merit selected from its own bank of BOK/s, a particular BOK, or an available composition.
- the response from the system to the client can be tuned in such a way to always provide a related content according to a predetermined particular aspect of the conversation.
- the client might choose to receive only the content with highest novelty yet credibility value from the system.
- the “converse module” and/or the investigator module will find the corresponding piece of content (employing one or more of the “XY value significance measures”) in their repositories and provide it to the user.
- the user can demand to receive the most significant yet credible piece of knowledge or content related to her/his/its input.
- the client/system conversation hence, can be continued.
- Such conversation method can be useful and instrumental for variety of reasons/applications such as entertainment, amusement, educational purpose, questions and answering, knowledge seeking, customer relationship management and help desk, automatic examination, artificial intelligence, and very many other purposes.
- the system, for instance, can be used as a system for providing or generating visual and/or multimedia content as introduced in the U.S. patent application Ser. No. 12/908,856 entitled “System And Method Of Content Generation”, filed on Oct. 20, 2010, and/or for using the value significance measures and the maps and indexes to automatically generate content compositions as introduced in the U.S. patent application Ser. No. 12/946,838, filed on Nov. 15, 2010, now U.S. Pat. No. 8,560,599 B2 entitled: “Automatic Content Composition Generation”, which were incorporated entirely as references in this application.
- FIG. 14 further exemplifies and illustrates an embodiment of a system of composition investigation in which one or more clients are connected to the system directly and one or more clients can optionally be connected to the system through other means of communication, such as a private or public data network, e.g. wireless networks or the internet.
- the whole system can be a private system providing such services to its user or the system is composed of several hardware and necessary software modules over a private network wherein the users can use the services of composition investigation by the system directly or over the network.
- Such a system can, in one configuration, be characterized as a private cloud computing facility capable of interacting with clients and running one or more of the processes and algorithms, and/or implementing and executing one or more of the relational value significance calculation processes, or implementing one or more of the formulas or equivalent processes in its software module/s, to provide data/content and/or a desirable service of composition investigation to one or more clients.
- FIG. 15 shows another exemplary instance of a ubiquitous system and service provider in which the system can be a distributed system using resources from different locations in order to perform and provide one or more of the services.
- One or more of the functions performed as shown in FIG. 15 might be physically located across a distributed network. For instance, one or more of the calculations, one or more of the servers, the front end server, or the client's computer or device can be located in different places while the services are still performed over a distributed network.
- an ISP who is facilitating the connection for a client to such a distributed network is regarded as the service provider of such service. Therefore a facilitator that facilitated (e.g. through a switch, router or a gateway etc.) at least some of the request or response data either from the client or from any part of such a distributed service is regarded as instance of such a service provider system.
- the computing or executing system includes or has processing device/s such as graphical processing units for visual computations that are for instance, capable of rendering and demonstrating the graphs/maps of the present invention on a display (e.g.
- the methods, teachings, and the application programs of the present invention can be implemented by shared resources such as virtualized machines and servers (e.g. VMware virtual machines, Amazon Elastic Beanstalk, e.g. Amazon EC2 and storages, e.g. Amazon S3, and the like).
- specialized processing and storage units, e.g. Application Specific Integrated Circuits (ASICs), field programmable gate arrays (FPGAs), and the like
- the data communication network used to implement the system and method of the present invention carries, transmits, receives, or transports data at a rate of 10 million bits per second or higher.
- Furthermore, the terms “storage device”, “storage”, “memory”, and “computer-readable storage medium/media” refer to all types of non-transitory computer readable media such as magnetic cassettes, flash memory cards, digital video discs, random access memories (RAMs), Bernoulli cartridges, optical memories, read only memories (ROMs), solid state discs, and the like, with the sole exception being a transitory propagating signal.
- the disclosed framework, along with the algorithms and methods, enables people in various disciplines, such as artificial intelligence, robotics, information retrieval, search engines, knowledge discovery, genomics and computational genomics, signal and image processing, information and data processing, encryption and compression, business intelligence, decision support systems, financial analysis, market analysis, public relation analysis, and generally any field of science and technology, to use the disclosed method/s of the investigation of the compositions of ontological subjects and the bodies of knowledge to arrive at the desired form of information and knowledge with ease, efficiency, and accuracy. Since the disclosed underlying theory, methods, and applications are universal, it is worthwhile to implement the system executing the methods and products directly on processing chips/devices to further increase the speed and reduce the cost of such investigations of compositions. In one instance, for example, the data processing operations (e.g.
- ASICS Application Specific Integrated Circuits
- FPGA Field-Programmable Gate Arrays
- system-on-chip, or any computing and data processing device manufacturing platforms and technologies, such as silicon based, III-V semiconductors, and quantum computing artifacts, to name a few.
- the invention provides unified and integrated methods and systems for investigation of compositions of ontological subjects.
- the method can be implemented in a language-independent and grammar-free manner.
- the method is not based on the semantic and syntactic roles of symbols, words, or in general the syntactic role of the ontological subjects of the composition. This will make the method very process efficient, applicable to all types of compositions and languages, and very effective in finding valuable pieces of knowledge embodied in the compositions.
- Several valuable applications and services also were exemplified to demonstrate the possible implementation and the possible applications and services. These exemplified applications and services were given for illustration and exemplifications only and should not be construed as limiting application.
- the invention has broad implications and applications in many disciplines that were not mentioned or exemplified herein, but in light of the present invention's concepts, algorithms, methods, and teachings, these become apparent applications, with their corresponding systems, to those familiar with the art.
- the system and method have numerous applications in knowledge discovery, knowledge visualization, content creation, signal, image, and video processing, genomics and computational genomics and gene discovery, finding the best piece of knowledge related to a request for knowledge from one or more compositions, artificial intelligence, realization of artificial or new intelligent beings, computer vision, computer or man/machine conversation, approximate reasoning, as well as many other fields of science and generally ontological subject processing.
- the invention can serve knowledge seekers, knowledge creators, inventors, discoverers, as well as the general public to investigate and obtain highly valuable knowledge and contents related to their subjects of interest.
- the method and system are thereby instrumental in increasing the speed and efficiency of knowledge retrieval, discovery, creation, learning, and problem solving, and in accelerating the rate of knowledge discovery, to name a few.
Abstract
Methods and systems are given for investigation of compositions of ontological subjects in accordance with various aspects of significance. Accordingly, the present invention provides a unified method and process of investigating the compositions of ontological subjects, modeling an unknown system, and obtaining as much worthwhile information and knowledge as possible about the system or the composition or the body of knowledge, along with exemplary services utilizing such investigations.
Description
This application is a continuation in part of and claims the benefits of the U.S. patent application Ser. No. 13/608,333 entitled “Methods and Systems For investigation Of Compositions of Ontological Subjects”, filed on Sep. 10, 2012, which claims priority to U.S. provisional patent application No. 61/546,054 filed on Oct. 10, 2011 entitled the same.
This application also cross-references and claims the benefits of: the U.S. patent application Ser. No. 12/179,363 entitled “ASSISTED KNOWLEDGE DISCOVERY AND PUBLICATION SYSTEM AND METHOD”, filed on Jul. 24, 2008, which claims priority from Canadian Patent Application Ser. No CA 2,595,541, filed on Jul. 26, 2007, entitled the same; and
the U.S. patent application Ser. No. 13/789,644 filed on Mar. 7, 2013 which is a continuation of U.S. patent application Ser. No. 12/547,879 filed on Aug. 26, 2009, now U.S. Pat. No. 8,452,725, entitled “SYSTEM AND METHOD OF ONTOLOGICAL SUBJECT MAPPING FOR KNOWLEDGE PROCESSING APPLICATIONS”, which claims priority from the U.S. provisional patent application No. 61/093,952 filed on Sep. 3, 2008, entitled the same; and
the U.S. patent application Ser. No. 14/151,022 filed on Jan. 9, 2014 which is a continuation in part of the U.S. patent application Ser. No. 13/962,895, now U.S. Pat. No. 8,983,897, filed on Aug. 8, 2013, entitled “UNIFIED SEMANTIC RANKING OF COMPOSITIONS OF ONTOLOGICAL SUBJECTS” which is a divisional of and claims the benefit of the U.S. patent application Ser. No. 12/755,415, now U.S. Pat. No. 8,612,445, filed on Apr. 7, 2010, which claims priority from U.S. provisional patent application No. 61/177,696 filed on May 13, 2009 entitled: “System and Method for a Unified Semantic Ranking of Compositions of Ontological Subjects and the Applications Thereof”; and
the U.S. patent application Ser. No. 12/908,856 entitled “SYSTEM AND METHOD OF CONTENT GENERATION”, filed on Oct. 20, 2010, which claims priority from U.S. provisional application No. 61/253,511 filed on Oct. 21, 2009, entitled the same; and
the U.S. patent application Ser. No. 14/607,588, filed on Feb. 7, 2015, entitled “Association strengths and value significances of ontological subjects of networks and compositions” which is a divisional of and claims the benefits of the U.S. patent application Ser. No. 13/740,228, filed on Jan. 13, 2013, entitled “SYSTEM AND METHOD FOR VALUE SIGNIFICANCE EVALUATION OF ONTOLOGICAL SUBJECTS OF NETWORKS AND THE APPLICATION THEREOF” which is a divisional of and claims the benefits of the U.S. patent application Ser. No. 12/939,112, filed on Nov. 3, 2010, now U.S. Pat. No. 8,401,980, entitled “METHODS FOR DETERMINING CONTEXT OF COMPOSITIONS OF ONTOLOGICAL SUBJECTS AND THE APPLICATIONS THEREOF USING VALUE SIGNIFICANCE MEASURES (VSMS), CO-OCCURRENCES AND FREQUENCY OF OCCURRENCES OF THE ONTOLOGICAL SUBJECTS SYSTEM”, which claims priority from U.S. provisional application No. 61/259,640 filed on Nov. 10, 2009, entitled “SYSTEM AND METHOD FOR VALUE SIGNIFICANCE EVALUATION OF ONTOLOGICAL SUBJECTS OF NETWORKS AND THE APPLICATION THEREOF”; and
the U.S. patent application Ser. No. 14/018,102, filed on Sep. 4, 2014, which is a divisional of and claims the benefits of the U.S. patent application Ser. No. 12/946,838, filed on Nov. 15, 2010, now U.S. Pat. No. 8,560,599 B2 entitled: “AUTOMATIC CONTENT COMPOSITION GENERATION”, which claims priority from U.S. provisional application No. 61/263,685 filed on Nov. 23, 2009, entitled “Automatic Content Composition Generation”; and
the U.S. patent application Ser. No. 14/247,731, filed on May 11, 2014 which is a continuation of the U.S. patent application Ser. No. 12/955,496, filed on Nov. 29, 2010, now U.S. Pat. No. 8,775,365 entitled “INTERACTIVE AND SOCIAL KNOWLEDGE DISCOVERY SESSIONS” which claims priority from U.S. provisional patent application No. 61/311,368 filed on Mar. 7, 2010, entitled “Interactive and Social Knowledge Discovery Sessions”, all by the same applicant which are all incorporated entirely as references in this application.
This invention generally relates to information processing, ontological subject processing, knowledge processing and discovery, computational genomics, knowledge retrieval, artificial intelligence, signal processing, information theory, natural language processing and the applications.
In this day and age, when data is generated at an unprecedented rate, it is very hard for a human operator to analyze large bodies of data in order to extract the real information and the knowledge therein, spot a novelty, and use them to further advance the state of knowledge or to discover real knowledge about a subject matter.
For example, for any topic or subject there are vast amounts of textual repositories, or repositories convertible to textual characters, such as collections of research papers on any particular topic or subject, images, news feeds, interviews, talks, video collections, corporate databases, surveillance pictures and videos, and the like. Gaining any benefit from such unstructured collections of information needs a great deal of expertise, time, and many years of training just to separate the facts and extract value out of these immense amounts of data. Not every piece of data is worthy of attention and investigation, or of the investment of the expensive time of experts and professionals or of data processing resources.
Moreover, there is no guarantee that a human investigator or researcher can accurately analyze the vast collection of documents, data, and information. The results of the investigations are usually biased by the individual's knowledge, experiences, and background. The complexities of relations in the bodies of data limit the throughputs of knowledge-based professionals and the speed at which credible knowledge can be produced. The desired speed or rate of knowledge discovery apparently is much higher than the present rate of knowledge discovery and production.
There is a need to enhance the art of knowledge discovery and investigation methods in terms of accuracy, effectiveness on unknown compositions, thoroughness, speed, and throughput.
Additionally, in some instances, there could be compositions, such as an alien language composition, a body of knowledge unfamiliar to an individual investigator, a corporate database, a computer code program, a collection of reports, genetic code strings, and the like, for which we do not have any prior information about the meaning and implications of the compositions and the parts therein. Investigating such compositions is of immense interest and value.
It is also very desirable to enable a data processing system, such as a computer system comprising data processing or computing devices/units, data storage units/devices, and/or environmental data acquisition units/devices, and/or data communication units/devices, and/or input/output units/devices, and/or limbs, to learn as much information and gain as much knowledge/data as possible by processing compositions of data of various forms, and/or to become able to produce new knowledge and useful data or compositions of data, and/or to make autonomous decisions according to some codes of conduct. Such an enabled machine would be of immense assistance in advancing human civilization much further and much faster, leading to abundance, economic prosperity, biological and mental health, and the well-being of society in general.
Accordingly, the present invention discloses systematic, computer-implementable, process-efficient, and scalable method/s of investigation of all types of compositions of ontological subjects, such as textual compositions, data files, networks and graphs, genetic codes, any types of string, and the like. The given methods, algorithms, and services are accompanied by theoretical modeling and mathematical formulations which, once implemented, result in robust and fundamental algorithms and processes for investigating various aspects of a composition and for numerous applications.
According to the teachings of the present invention, any composition of ontological subjects is viewed as an unknown system or system of knowledge, and the purpose of the investigation is to obtain as much worthy information and knowledge as possible about such an unknown system.
The present invention therefore investigates the “compositions of ontological subjects” or a “body of knowledge” or a “system of knowledge” (as they are called from time to time in this disclosure) by providing investigation methods for identifying the most significant constituent ontological subjects of a given body of knowledge or given compositions with respect to one or more significance aspect/s. The significance aspects generally include the “intrinsic significance aspects” and/or “associational/relational significance aspects”.
In the general aspect of this invention, conceptual “measures of significances” are disclosed along with their rationale and justifications. These conceptual “measures of significances” are further accompanied by systematic methods of calculation and quantification of their values in order to provide the instrumental tools for implementing and utilizing the disclosed method/s of the investigation of compositions of ontological subjects. These measures are, for example, called “value significance measures” (VSM/s in short), “association strength measures” (or ASM for short), “novelty value significance measures” (or NVSM for short), and/or “relational/associational” type measures, and various combinations of them (referred to herein as XY_VSM in general form), and they are used to find and spot the “aspectual significant” parts or partitions of the composition for further investigation and/or further processing and/or presentation to a client.
According to one general embodiment of the disclosed method/s of the present invention, a composition of ontological subjects or a body of knowledge is break down to it's constituent ontological subjects which are grouped in different set which each set labeled with different orders, from which one or more array of data, respective of the information of the participations of the constituent ontological subjects of different orders into each other, are formed. The data therefore is used to evaluate various significance values of the constituent ontological subjects of the different order according to the disclosed measures of various aspects of significance.
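As a minimal illustrative sketch only (not one of the disclosed embodiments), the break down of a textual composition into words (taken here as first order OSs) and sentences (taken here as second order OSs), and the forming of an array of participation data, might look as follows in Python. The function name `build_participation_matrix`, the naive sentence splitting, and the binary entries are assumptions made purely for illustration.

```python
import re

def build_participation_matrix(text):
    """Return (words, sentences, pm), where pm[i][j] is 1 if word i occurs in sentence j."""
    # Second order OSs (assumed): sentences, split naively on '.', '!' and '?'.
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    # First order OSs (assumed): lowercase words, listed in order of first appearance.
    tokenized = [re.findall(r"[a-z0-9']+", s.lower()) for s in sentences]
    words = []
    for tokens in tokenized:
        for w in tokens:
            if w not in words:
                words.append(w)
    # One row per word, one column per sentence.
    pm = [[1 if w in tokens else 0 for tokens in tokenized] for w in words]
    return words, sentences, pm

words, sentences, pm = build_participation_matrix(
    "Stem cells divide. Stem cells differentiate. Cells signal."
)
```

Each row of `pm` records in which sentences a word participates; the same construction could be repeated for any other pair of orders (for example sentences in paragraphs).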
Accordingly, in one aspect of the present invention, measure/s are given for valuation of “value significances” of the ontological subjects of the composition. These values are intrinsic values of the ontological subjects of the composition based on their significance role which is calculated from the participations pattern/s of the ontological subjects of the composition with each other.
In another aspect various measures of “association strength” are given from which the relations of ontological subjects of the composition can be revealed. Algorithms and formulations and calculation methods are given to evaluate such “association strength” according to various exemplary association aspects.
According to another aspect of the present invention measures are given for evaluating the “relational association strengths” of the ontological subjects of different orders to each other or to one or more target ontological subject.
According to another aspect of the present invention measures are given for evaluating the “relational value significances” of the ontological subjects of different orders to each other or to one or more target ontological subject.
According to another aspect of the invention, various types of measures are given to evaluate the “novelty value significances” of the ontological subjects of the composition or the body of knowledge. Method/s are, therefore, given for efficient calculations and processing and presentation of the results.
Accordingly, in yet another aspect of the invention, various measures of the “relational novelty value significances” are given for evaluating one type of the general “novelty value significances” in relation to one or more target ontological subjects of the composition or the body of knowledge.
According to yet another aspect of the invention, various measures of the “associational novelty value significances” are given for evaluating another type of the general “novelty value significance” involving the association of one or more target ontological subjects of the composition or the body of knowledge.
According to yet another aspect of the invention, various measures of the “intrinsic novelty value significances” are given for evaluating yet another type of “novelty value significance”, which is an intrinsic novelty value of one or more of the ontological subjects of the composition or the body of knowledge.
According to another aspect of the invention, values are assigned to a predetermined list of ontological subjects (e.g. one or more of the special words that are usually used to express a particular attribute such as a novelty, a reasoning, or a concluding remark, such as ‘therefore’, ‘consequently’, ‘in spite of’, . . . ‘however’, ‘but’, . . . etc.). These are called “special significance conveyers” and are used to pre-selectedly amplify or dampen the significances of such special OSs of a composition in the final output or result.
Furthermore, specific examples and general forms and methods are given as to how to synthesize and/or shape a desired form of a “value significance measure” and how to build and calculate the respective filter for that “value significance measure” by combining one or more of the VSM vectors of one or more types or numbers of the XY-VSM.
These various “XY-value significance measures” can then be employed in many applications for which at least one “aspectual significance measure” is of interest and importance. Depending on the desired application, one can use the applicable and desirable embodiments for the intended application, such as web page ranking, document clustering, single and multi-document summarization/distillation, question answering, graphical representation of the compositions, context extraction and representation, knowledge discovery, novelty detection, composing new compositions, engineering new compositions, composition comparison, approximate reasoning, artificial intelligence, robotics, robotic vision, human/computer interaction, computer conversation, as well as other areas of science and technology such as genetic analysis and synthesis, signal processing, economics, marketing, customer care, and the like.
Throughout the disclosure, methods, formulations, and algorithms are given for efficient and versatile computer implementable evaluation of the various “value significance measures” of ontological subjects of different orders used in a system of knowledge. In essence, using the participation information of a set of lower order OSs in a set of the same or higher order OSs, the present invention provides a unified method and process of investigating the compositions of ontological subjects, modeling an unknown system, and obtaining as much worthwhile information and knowledge as possible about the system or the composition or the body of knowledge. The “aspectual investigation's goals” can be wide open; in light of the teachings of the present invention, however, achieving them becomes a straightforward, implementable, and practical possibility.
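The amplifying or dampening effect of such special significance conveyers can be pictured with a short Python sketch. The multiplier values in `CONVEYER_WEIGHTS` and the function name `apply_conveyers` are hypothetical and are not taken from the disclosure; they merely show one way a predetermined list could scale previously computed significance values.

```python
# Hypothetical, pre-selected multipliers for special words (assumed values).
CONVEYER_WEIGHTS = {
    "therefore": 1.5,     # amplify: often marks a conclusion
    "consequently": 1.5,  # amplify
    "however": 1.2,       # amplify slightly
    "but": 0.8,           # dampen
}

def apply_conveyers(scores, weights=CONVEYER_WEIGHTS):
    """Scale each OS's significance by its conveyer weight (1.0 if the OS is not listed)."""
    return {os_: value * weights.get(os_, 1.0) for os_, value in scores.items()}

adjusted = apply_conveyers({"therefore": 2.0, "cell": 3.0})
```

Ontological subjects that are not in the predetermined list keep their values unchanged.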
Accordingly, in another aspect of the invention, a number of exemplary applications are described and presented with the illustrating block diagrams of the method and algorithm along with the associated systems for performing such applications. These applications and systems are presented to exemplify the way that the present invention's methods of investigations might be employed to perform one or more of the desired processes to get the respective output or the content, answer, data, graphs, analysis, etc.
Therefore, an ontological subject of a composition is not only represented by a string of characters; there is also vast additional information available for the ontological subject corresponding to its type/s of significance and its relationships with other ontological subjects of the composition. Said additional information or data is learnt, through implementing the methods of the current disclosure and the references incorporated herein, from the ways these ontological subjects are used or composed together to make up a composition or, more generally, to form a body of knowledge.
These information, data, or values of different objects of this disclosure (e.g. association strength measures, significance measures, etc.) are placed in one or more data structures, which can be representative of data arrays corresponding to vectors or matrices, for convenience of calculation by data processing devices. The data processing devices that carry out the calculations, storage, and data transportation between the various parts of one or more computer systems can be selected from technologies such as electronic or optical based processors, semiconductor based or quantum computers, application specific processing devices, and the like. Different embodiments are given for ease of calculation and processing of the data of said one or more data structures or vectors or matrices, which can be implemented with information, computing, or data processing systems of certain processing speeds and/or storage media access speeds and capacities, such as certain RAM capacity, SSD, HD, and/or optical memories and the like with the required access time.
In this way the implicit information not recognizable, usable, or appreciable by a human (due to inherent biological limitations) can be extracted, stored, and made usable by a data processing system or machine. Said data processing system or machine will therefore become able to use its superior processing speed and its memory capacity or environmental data acquisition capabilities, unmatched by humans, to perform intelligent tasks. Examples of such intelligent tasks include, but are of course not limited to, conversing intelligently, evaluating the merit of a composition, recognizing visual objects, DNA analysis, knowledge discovery, automatic research and discovery, composing an essay or multimedia content, decision making, automatic knowledge discovery, controlling physical action/reaction of a machine to its limbs, management of tasks and sessions, autonomous navigation, and in general such tasks that currently can only be done by human beings. Intelligent beings (or artificially intelligent beings) of various kinds, technologies, and forms (e.g. a humanoid robot maid, a genetically modified being, a transportation intelligent being such as an autonomous car or an autonomous agricultural machine, a robotic explorer, etc.) are exemplary beneficiaries of implementing and employing the methods and systems of the current disclosure.
Further, in another aspect, the invention provides data processing systems comprising computer hardware, software, internet infrastructure, and other customary appliances of an E-business, cloud computing, distributed networks, and services to perform and execute said methods in providing a variety of services for a client/user's desired applications or to provide a needed or requested data to a human/agent client.
A system of knowledge, here, means a composition or a body of knowledge in any field, narrow or wide, composed of data symbols such as alphabetical/numerical characters, any array of data, binary or otherwise, or any string of data, etc. In this disclosure, however, for the sake and ease of explanation and comprehension, we mostly exemplify the compositions and bodies of knowledge with those that are expressed in natural language symbols with textual characters.
Accordingly, for instance, a system of knowledge can be defined about the process of stem cell differentiation. In this example there are many unknowns that are desired to be known. So consider that someone has collected many or all textual compositions about this subject. Evidently the collection contains much useful information about the subject that is important but can easily be overlooked by a human due to the limited processing capability and memory capacity of an individual's brain.
Another example of a body of knowledge according to the given definitions is a picture or a video signal. A picture or a video frame consists of colored pixels that participate in the picture to form and convey the information about the picture. Evidently some colored pixels of the picture are more significant or play a more distinguishing role in that picture. Moreover, their combinations, or the ways or patterns in which they participate together in any small parts or segments of that picture, are also important in the way the pixels convey the information about the picture to an observer's eyes or a camera.
Yet another example of a composition or a body of knowledge could be a string of genetic codes, a DNA string, or a DNA strand, a whole genome, and the like.
Moreover, any system, simple or complicated, can be identified and explained by its constituent parts and the relations between the parts. Additionally, any system or body of knowledge can also be represented by network/s or graph/s that show the connections and relations of the individual parts of the system. The more accurate and detailed the identification of the parts and their relations, the better the system is defined and designed, and ultimately the better the corresponding tangible systems will function. Most of the information about any type of existing or new system can be found in the body of many textual compositions. Nevertheless, these vast bodies of knowledge are unstructured, dispersed, and unclear to non-experts in the field.
In the present invention, the purpose of the investigation is to model and gain as much information and knowledge about an unknown system comprised of ontological subjects while the source of the information about such a system is a given composition of ontological subjects wherein the composition is readable by a computer. Therefore, some information about such an unknown system is supposedly embedded in a body of knowledge or system of knowledge or generally in the given composition. The investigator, hence, will have to be able to capture or produce as much knowledge about the system from the information in the given composition.
Consequently, according to the present disclosure, the investigation is performed according to at least one significant/important aspect in the investigation of bodies of knowledge (i.e. compositions).
The “investigation important aspect” can, for example, be one or more of the following goals:
1. identifying and recognizing the most significant constituent parts of the bodies of knowledge according to at least one “significance aspect”,
2. identifying the associated constituent parts of the bodies of knowledge, and
3. identifying and/or finding (through discovery and/or reasoning) the informative constituent parts and informative combinations of the constituent parts of the composition by, for example, finding or composing the expressions that show a relationship between two or more of constituent parts of the bodies of knowledge.
Each of these “important aspects” or stages (1, 2, and 3 above) of the investigation can, of course, be further broken down into two or more stages or steps, or be combined together, to perform a desirable investigation goal or to define the “investigation important aspect”.
For instance, according to one exemplary investigation method embodiment of the present invention, in which the “investigation important aspect” is to identify a relationship between two or more significant parts of the composition, the investigator may perform the following:
- 1. identifying the most significant constituent part/s,
- 2. identifying the associated constituent parts of the bodies of knowledge, and
- 3. finding or composing expressions that express the relationship between one or more significant parts having certain level of association to one or more of other significant parts.
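The three steps above can be pictured with the following Python sketch. It is only a rough illustration under stated assumptions: plain participation counts stand in for a significance measure, sentence-level co-occurrence counts stand in for an association measure, and the function name `investigate` and its `top_n` parameter are hypothetical. The measures actually disclosed are defined elsewhere in this disclosure and are not reproduced here.

```python
from collections import Counter
from itertools import combinations

def investigate(sentences, top_n=2):
    """Return (significant words, associated pairs, sentences expressing those pairs)."""
    tokenized = [set(s.lower().split()) for s in sentences]

    # Step 1 (stand-in): significance = number of sentences a word participates in.
    significance = Counter(w for tokens in tokenized for w in tokens)
    ranked = sorted(significance.items(), key=lambda kv: (-kv[1], kv[0]))
    significant = [w for w, _ in ranked[:top_n]]

    # Step 2 (stand-in): association = number of sentences in which two words co-occur.
    cooccurrence = Counter()
    for tokens in tokenized:
        for pair in combinations(sorted(tokens), 2):
            cooccurrence[pair] += 1
    pairs = [p for p in combinations(sorted(significant), 2) if cooccurrence[p] > 0]

    # Step 3: find the expressions (here, whole sentences) relating the associated parts.
    expressions = [
        s for s, tokens in zip(sentences, tokenized)
        if any(a in tokens and b in tokens for a, b in pairs)
    ]
    return significant, pairs, expressions

significant, pairs, expressions = investigate(
    ["stem cells differentiate", "stem cells divide", "signals guide cells"]
)
```

In this toy input the two most frequent words are found to co-occur, and the sentences containing both are returned as the expressions relating them.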
Therefore, depending on the goal of the investigation, the “investigation important aspect” can be defined and performed in more detailed processes. The present invention gives a number of such investigation goals and the methods of achieving the desired outcome. Moreover, the present invention provides a variety of tools and investigation methods that enable a user to deal with investigation of compositions of ontological subjects for any kind of goal and any type of composition.
As defined throughout this disclosure as well as in the references incorporated herein, the constituent parts of the bodies of knowledge are called “Ontological Subjects” (OS). The ontological subjects are further grouped into different sets labeled with orders, as will also be explained in the definitions section of this disclosure.
The “significance aspects”, based on which the significances of the OSs of compositions are defined and calculated, can be various. For instance, one “significance aspect” could be an intrinsic significance of an OS, which shows the overall or intrinsic significance of an OS in a body of knowledge. Another significance aspect is considered to be a significance in relation or relative to one or more of the OSs of the body of knowledge.
Yet another significance aspect is considered to be an intrinsic novelty value of an OS in a body of knowledge or a composition. And yet another significance aspect is defined as a relative or relational novelty value of an OS related to one or more of the OSs of the body of knowledge or a composition.
Many other desirable significance aspects might be defined by different people depending on the application and the goal of the investigation of a composition or a body of knowledge. Also, any combination of such significance aspects can be regarded as a significance aspect.
Accordingly a “significance aspect” is the orientation that one can use to reason on how to put a significance value on an ontological subject of a composition or a body of knowledge.
In other words, a “significance aspect” is a qualitative attribute that can polarize or differentiate the ontological subjects and can be used to define “value significance measures” and consequently to suggest or construct various value functions or significance weighting functions on the ontological subjects of a composition or a body of knowledge.
These functions, individually or in combination, therefore can be employed and utilized to spot and/or filter out the one or more ontological subjects of a composition or a body of knowledge for different purposes and applications or generally for investigation of bodies of knowledge.
For instance and in accordance with one aspect of the present disclosure, for the purpose of investigation of the compositions of ontological subjects, a general form of evaluating “value significances” of the ontological subjects of a composition or a body of knowledge or a network is given along with a number of exemplified such value significances and their applications. Such investigation method/s will speed up the research process and knowledge discovery, and design cycles by guiding the users to know the substantiality of each part in the system. Consequently dealing with all parts of the system based on the value significance priority or any other predetermined criteria can become a systematic process and more yielding to automation.
As will be explained in the next section, having constructed one or more arrays of data indicative of relations of constituent parts, it becomes necessary and desirable to spot the significant parts and/or separate the parts whose significance is defined in relation to a target part. Relational value significances are therefore defined here. The relational value significances are instrumental in clustering a collection of compositions or clustering partitions of a composition in regard to one or more target OSs or parts of the system of knowledge.
Furthermore exemplary algorithms and systems are given to be used for providing the respective data and/or such application/s as one or more services to the computer program agents as well as human users.
Applications of such methods and systems of investigation of compositions of ontological subjects would be many and various. For example, suppose that after or before a conference with many expert participants and many presented papers, one wants to compare the submitted contributing papers, draw some conclusions, and/or get the direction for future research or find the more important subjects to focus on. He or she could use the system, employing the disclosed methods, to find out the value significance of each concept along with its most important associations and interrelations. This is not an easy task for individuals who do not have many years of experience and a deep and wide breadth of knowledge in the respective domain of knowledge.
Or consider a market research analyst who is assigned to find out the real value of an enterprise by researching the various sources of information. Or rank an enterprise among its competitors by identifying the strength and weakness of the enterprise constituent parts or partitions. Or in another instance an enterprise, a blogger, a website owner, a content publisher, or a Facebook subscriber wants to find out the most valuable or the most interesting contents, comments, or any parts of such discussions. The investigation method of the present invention therefore can provide such information and knowledge with high confidence.
Many other subsequent applications such as search engines, question answering, summarization, categorization, distillation, computer conversing, artificial intelligence, genetics, etc. can be performed, enhanced, and benefit from having an estimation of the various “value significances” of the partitions of the body of knowledge and a thorough investigation method of such compositions.
In order to describe the disclosure in detail we first define a number of terms that are used frequently throughout this description. For instance, the information bearing symbols are called Ontological Subjects and are defined herein below, along with other terms, in the definitions section.
This disclosure uses the definitions that were introduced in the U.S. patent application Ser. No. 12/755,415 filed on Apr. 7, 2010, and Ser. No. 12/939,112 filed on Nov. 3, 2010, which are incorporated herein as references, and are recited here again along with more clarifying points according to their usage in this disclosure and the mathematical formulations herein.
1. ONTOLOGICAL SUBJECT: symbol or signal referring to a thing (tangible or otherwise) worthy of knowing about. Therefore Ontological Subject means generally any string of characters, but more specifically, characters, letters, numbers, words, binary codes, bits, mathematical functions, sound signal tracks, video signal tracks, electrical signals, chemical molecules such as DNAs and their parts, or any combinations of them, and more specifically all such string combinations that indicates or refer to an entity, concept, quantity, and the incidences of such entities, concepts, and quantities. In this disclosure Ontological Subject/s and the abbreviation OS or OSs are used interchangeably.
2. ORDERED ONTOLOGICAL SUBJECTS: Ontological Subjects can be divided into sets with different orders depending on their length, attribute, and function. Basically, an order is assigned to a group or set of ontological subjects having at least one common predefined attribute, property, or characteristic. Usually the orders in this disclosure are denoted with alphanumerical characters such as 0, 1, 2, etc., or OS1, OS2, etc., or any other combination of characters, so as to distinguish one group or set of ontological subjects, having at least one common predefined characteristic, from another set or group of ontological subjects having at least one other common characteristic. These orders are also reflected in the notation of the corresponding data objects or mathematical objects in the formulations, to distinguish these data objects in relation to their corresponding ontological subject set or its order, as will be used and introduced throughout this disclosure. For instance, for ontological subjects of textual nature, one may characterize or label letters as zeroth order OSs, words or multiple word phrases as the first order, sentences or multiple word phrases as the second order, paragraphs as the third order, pages or chapters as the fourth order, documents as the fifth order, corpuses as the sixth order OSs, and so on. As seen, the order can be assigned to a group or set of ontological subjects based on at least one common predefined characteristic of the members of the set. So a higher order OS is a combination of, or a set of, lower order OSs, or equivalently lower order OSs are members of a higher order OS. Equally, one can order the genetic codes in different orders of ontological subjects.
For instance, the four bases of a DNA molecule can be regarded as the zeroth order OSs, the base pairs as the first order, sets of pieces of DNA as the second order, genes as the third order, chromosomes as the fourth order, genomes as the fifth order, sets of similar genomes as the sixth order, sets of sets of genomes as the seventh order, and so on. The same can be defined for information bearing signals such as analogue and digital signals representing audio or video information. For instance, for digital signals representing a signal, bits (electrical One and Zero) can be defined as zeroth order OSs, the bytes as first order, any sets of bytes as third order, and sets of sets of bytes, e.g. a frame, as fourth order OSs, and so on. In yet another instance, for a picture or a video frame, the pixels with different colors can be regarded as first order OSs, a set whose members contain two or more pixels (e.g. a segment of a picture) can be regarded as OSs of second order, a set whose members contain two or more such segments as third order OSs, a whole frame as a fourth order OS, and a number of frames (like a certain duration of a movie, such as a clip) as fifth order, and so on. Therefore the definitions of orders for ontological subjects are an arbitrary set of initial definitions that one can stick to in order to make sense of the methods and mathematical formulations presented herein and to be able to interpret the consequent results or outcomes in more sensible and familiar language.
- More importantly Ontological Subjects can be stored, processed, manipulated, and transported by transferring, transforming, and using matter or energy (equivalent to matter) and hence the OS processing is an instance of physical transformation of materials and energy.
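The exemplary textual ordering given in definition 2 (letters, words, sentences, paragraphs) can be sketched in Python as below. The function name `ordered_subjects`, the blank-line paragraph convention, and the punctuation-based sentence splitting are assumptions chosen for illustration; as noted above, the assignment of orders is itself an arbitrary initial choice.

```python
import re

def ordered_subjects(document):
    """Group the constituent OSs of a text into sets keyed by an (assumed) order number."""
    # Third order: paragraphs, assumed to be separated by blank lines.
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    # Second order: sentences, split after '.', '!' or '?' followed by whitespace.
    sentences = [
        s.strip()
        for p in paragraphs
        for s in re.split(r"(?<=[.!?])\s+", p)
        if s.strip()
    ]
    # First order: distinct lowercase words.
    words = sorted({w for s in sentences for w in re.findall(r"[a-z0-9']+", s.lower())})
    # Zeroth order: distinct letters.
    letters = sorted({ch for w in words for ch in w if ch.isalpha()})
    return {0: letters, 1: words, 2: sentences, 3: paragraphs}

orders = ordered_subjects("Genes encode proteins. Proteins fold.\n\nCells divide.")
```

Each key of the returned dictionary is an order label, and each value is the set of OSs sharing the common characteristic that defines that order.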
3. COMPOSITION: is an OS composed of constituent ontological subjects of lower or the same order, particularly text documents written in natural language, genetic codes, encryption codes, data files, voice files, video files, and any mixture thereof. A collection, or a set, of compositions is also a composition. Therefore a composition is in fact an Ontological Subject of a particular order which can be broken into lower order constituent Ontological Subjects. In this disclosure, the preferred exemplary composition is a set of data containing ontological subjects, for example a webpage, papers, documents, books, a set of webpages, sets of PDF articles, multimedia files, or even simply words and phrases. Moreover, compositions and bodies of knowledge are basically the same, and the terms are used interchangeably in this disclosure. Compositions are distinctly defined here to assist the description in more familiar language than a technical language using only the defined OS notations.
4. PARTITIONS OF COMPOSITION: a partition of a composition, in general, is a part or whole, i.e. a subset, of a composition or collection of compositions. Therefore, a partition is also an Ontological Subject having the same or lower order than the composition as an OS. More specifically in the case of textual compositions, parts or partitions of a composition can be chosen to be characters, words, sentences, paragraphs, chapters, webpage, documents, etc. A partition of a composition is also any string of symbols representing any form of information bearing signals such as audio or videos, texts, DNA molecules, genetic letters, genes, and any combinations thereof. However one preferred exemplary definition of a partition of a composition in this disclosure is word, sentence, paragraph, page, chapters, documents, sets of documents, and the like, or WebPages, and partitions of a collection of compositions can moreover include one or more of the individual compositions. Partitions are also distinctly defined here for assisting the description in more familiar language than a technical language using only the general OSs definitions.
5. SIGNIFICANCE MEASURE: assigning a quantity, or a number or feature or a metric, to an OS from a set of OSs so as to assist in distinguishing or selecting one or more of the OSs from the set. More conveniently, and in most cases, the significance measure is a type of numerical quantity assigned to a partition of a composition. Therefore significance measures are functions of OSs and one or more other related mathematical objects, wherein a mathematical object can, for instance, be a mathematical object containing information of participations of OSs in each other, whose values are used in the decisions about the constituent OSs of a composition. For instance, “Relational, and/or associational, and/or novel significances” are one form or type of the general “significance measures” concept and are defined according to one or more aspects of interest and/or in relation to one or more OSs of the composition.
6. FILTRATION/SUMMARIZATION: is a process of selecting one or more OS from one or more sets of OSs according to predetermined criteria with or without the help of value significance and ranking metric/s. The selection or filtering of one or more OS from a set of OSs is usually done for the purposes of representation of a body of data by a summary as an indicative of that body in respect to one or more aspect of interest. Specifically, therefore, in this disclosure searching through a set of partitions or compositions, and showing the search results according to the predetermined criteria is considered a form of filtration/summarization. In this view finding an answer to a query, e.g. question answering, or finding a composition related or similar to an input composition etc. is also a form of searching through a set of partitions and therefore are a form of summarization or filtration according to the given definitions here.
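A simple instance of filtration/summarization in the sense of definition 6 can be sketched in Python as follows. The scoring rule (summing per-word significance values over the distinct words of a sentence), the function name `summarize`, and the parameter `k` are assumptions for illustration only; any predetermined criterion could be substituted.

```python
def summarize(sentences, word_scores, k=1):
    """Select the k highest scoring sentences, returned in their original order."""
    def score(sentence):
        # Assumed criterion: sum of the significance values of the distinct words.
        return sum(word_scores.get(w, 0.0) for w in set(sentence.lower().split()))

    # Rank sentence indices by descending score; ties keep the earlier sentence first.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))[:k]
    return [sentences[i] for i in sorted(ranked)]

sentences = ["cells divide", "stem cells differentiate", "weather is nice"]
word_scores = {"stem": 2.0, "cells": 3.0, "differentiate": 1.0, "divide": 1.0}
summary = summarize(sentences, word_scores, k=2)
```

Answering a query could be pictured the same way, with `word_scores` derived from the query terms rather than from the composition as a whole.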
7. THE USAGE OF QUOTATION MARKS “ ”: throughout the disclosure several compound names of concepts, variables, functions, and mathematical objects, and their abbreviations (such as “participation matrix”, or PM for short, “Co-Occurrence Matrix”, or COM for short, “value significance measure”, or VSM for short, and the like) will be introduced, either in singular or plural forms, that are placed one or more times between quotation marks (“ ”) to identify them as one object (or a regular expression that is used frequently in this disclosure); they must not be interpreted as direct quotes from literature outside this disclosure.
8. UNIVERSES OF COMPOSITIONS: in this disclosure “universe” is used frequently and has a few intended interpretations: when “universe x” (x being a number, letter, word, or combination thereof) is used, it means the universe of one or more compositions, called x, which contains none, one, or more ontological subjects. By “real universe” or “our universe” we mean our real life universe including everything in it (physical and its notions and/or so-called abstract and its notions), which is the largest universe intended and in existence. Furthermore, “universal” refers to the real universe.
Furthermore, in the following description, numerous specific details are set forth in order to provide a thorough understanding of the present embodiments. It will be apparent, however, to one having ordinary skill in the art that the specific details need not be employed to practice the present embodiments. In other instances, well-known materials or methods have not been described in detail in order to avoid obscuring the present embodiments.
- 1. Reference throughout this specification to “one embodiment”, “an embodiment”, “one example” or “an example” means that a particular feature, structure or characteristic described in connection with the embodiment or example is included in at least one embodiment of the present embodiments. Thus, appearances of the phrases “in one embodiment”, “in an embodiment”, “for instance”, “one example” or “an example” in various places throughout this specification are not necessarily all referring to the same embodiment or example. Furthermore, the particular features, structures or characteristics may be combined in any suitable combinations and/or sub-combinations in one or more embodiments or examples. In addition, it is appreciated that the figures provided herewith are for explanation purposes to persons ordinarily skilled in the art and that the drawings are not necessarily drawn to scale.
- 2. Embodiments in accordance with the present embodiments may be implemented as an apparatus, method, or computer program product. Accordingly, the present embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.), or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “module” or “system.” Furthermore, the present embodiments may take the form of a computer program product embodied in any tangible medium of expression having computer-usable program code embodied in the medium.
- 3. Any combination of one or more computer-usable or computer-readable media may be utilized. For example, a computer-readable medium may include one or more of a portable computer diskette, a hard disk, a random access memory (RAM) device, a read-only memory (ROM) device, an erasable programmable read-only memory (EPROM or Flash memory) device, a portable compact disc read-only memory (CDROM), an optical storage device, and a magnetic storage device. Computer program code for carrying out operations of the present embodiments may be written in any combination of one or more programming languages.
- 4. Embodiments may also be implemented in cloud computing environments. In this description and the following claims, “cloud computing” may be defined as a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned via virtualization and released with minimal management effort or service provider interaction, and then scaled accordingly. A cloud model can be composed of various characteristics (e.g., on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service, etc.), service models (e.g., Software as a Service (“SaaS”), Platform as a Service (“PaaS”), Infrastructure as a Service (“IaaS”)), and deployment models (e.g., private cloud, community cloud, public cloud, hybrid cloud, etc.).
- 5. The flowchart and block diagrams in the flow diagrams illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present embodiments. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It will also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, may be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. These computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks.
- 6. As used herein, the terms “comprises,” “comprising,” “includes,” “including,” “has,” “having,” or any other variation thereof, are intended to cover a non-exclusive inclusion. For example, a process, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements but may include other elements not expressly listed or inherent to such process, article, or apparatus.
- 7. Further, unless expressly stated to the contrary, “or” refers to an inclusive or and not to an exclusive or. For example, a condition A or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present).
- 8. Additionally, any examples or illustrations given herein are not to be regarded in any way as restrictions on, limits to, or express definitions of any term or terms with which they are utilized. Instead, these examples or illustrations are to be regarded as being described with respect to one particular embodiment and as being illustrative only. Those of ordinary skill in the art will appreciate that any term or terms with which these examples or illustrations are utilized will encompass other embodiments which may or may not be given therewith or elsewhere in the specification and all such embodiments are intended to be included within the scope of that term or terms. Language designating such nonlimiting examples and illustrations includes, but is not limited to: “for example,” “for instance,” “e.g.,” and “in one embodiment.”
Now the invention is disclosed in detail with reference to the accompanying Figures and exemplary cases and embodiments in the following subsections.
The methods and systems devised here address the problem of investigating compositions of ontological subjects by algorithmically manipulating, assigning, and calculating various “value significance” quantities for the constituent ontological subjects of a composition or a network of ontological subjects. The disclosure further describes methods of measuring the significance of the value/s so that the right “Value Significance Measure/s (VSM)” can be defined, synthesized, and calculated for a desired aspect of investigation and used for further processing in many related applications or other measures.
The methods and systems of the present invention can be used for applications ranging from document classification, search engine document retrieval, news analysis, knowledge discovery and research trajectory optimization, question answering, computer conversation, spell checking, summarization, categorization, clustering, distillation, automatic composition generation, genetics and genomics, and signal and image processing, to novel applications in economic systems by evaluating a value for economic entities, crime investigation, financial applications such as financial decision making, credit checking, decision support systems, stock valuation, and target advertising, as well as measuring the influence of a member in a social network, and/or any other problem that can be represented by graphs and for any group of entities with some kind of relations or associations.
Although the methods are general, with broad applications, implications, and implementation strategies and techniques, the disclosure is described by way of specific exemplary embodiments in order to describe the methods, implications, and applications in their simplest forms and senses.
Also, since most human knowledge and daily information production is recorded in the form of text (or can be converted to or represented by textual/numerical characters), the detailed description focuses on textual compositions to illustrate the teachings, the methods, and the systems. In what follows the invention is described in several sections and steps which, in light of the previous definitions, would be sufficient for those of ordinary skill in the art to comprehend and implement the methods, the systems, and the applications thereof. In the following section we first set the mathematical foundation of the disclosed method, from which we launch into introducing several “value significance measures” (VSMs), ways of calculating them, and their applications.
We explain the method/s and the algorithms with step-by-step formulations that are easy to implement by those of ordinary skill in the art, employing computer programming languages and computer hardware systems that can be optimized or customized by build or design of hardware to perform the algorithms efficiently and produce useful outputs for various desired applications.
Assuming we have an input composition of ontological subjects, e.g. an input text, the “Participation Matrix” (PM) is a matrix indicating the participation of one or more ontological subjects of a particular order in one or more partitions of the composition. In other words, in terms of our definitions, the PM indicates the participation of one or more lower order OSs in one or more OSs of higher or the same order. PMs are the most important arrays of data in this disclosure: they contain the raw information from which many other important functions, information, features, and desirable parameters can be extracted. Without intending any limitation on the value of PM entries, in the exemplary embodiments throughout most of this disclosure (unless stated otherwise) the PM is a binary matrix having entries of one or zero and is built for a composition or a set of compositions as follows:
1. break the composition into the desired number of partitions. For example, for a text document, break the document into chapters, pages, paragraphs, lines, and/or sentences, words, etc., and assign an order number (e.g. 0, 1, 2, 3, etc.) to any set of similar partitions, i.e. the ordered ontological subjects,
2. select a desired number N of OSs of order k and a desired number M of OSs of order l (these OSs are usually the partitions of the composition from step 1) according to certain predetermined criteria, and;
3. construct an N×M matrix in which the ith row ($R_i$) is a vector (e.g. a binary vector) of dimension M indicating the presence of the ith OS of order k (often extracted from the composition under investigation) in the OSs of order l (often extracted from the composition under investigation or sometimes from another referenced composition) by having a nonzero value, and its absence by having the value of zero.
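We call this matrix the “Participation Matrix” (usually a binary matrix) of the order kl ($\mathrm{PM}^{kl}$), which can be represented as an N×M array whose rows are labeled by $OS_1^k, \ldots, OS_N^k$ and whose columns are labeled by $OS_1^l, \ldots, OS_M^l$:
$$\mathrm{PM}^{kl} = \begin{pmatrix} \mathrm{PM}_{11}^{kl} & \cdots & \mathrm{PM}_{1M}^{kl} \\ \vdots & \ddots & \vdots \\ \mathrm{PM}_{N1}^{kl} & \cdots & \mathrm{PM}_{NM}^{kl} \end{pmatrix} \quad (1)$$
where $OS_p^k$ is the pth OS of the kth order (p = 1 . . . N), $OS_q^l$ is the qth OS of the lth order (q = 1 . . . M), usually extracted from the composition, and, according to one embodiment of this invention, $\mathrm{PM}_{pq}^{kl} = 1$ if $OS_p^k$ has participated, i.e. is a member, in $OS_q^l$, and 0 otherwise. The desired criteria in step 2 above can be, for instance, to select only the content words or only partitions having a certain length or, in another instance, to select each and every word or character string and/or all the partitions.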
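The three construction steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: words are taken as the order-k OSs, sentences as the order-l OSs, and the helper name `build_participation_matrix`, the regular-expression tokenizer, and the toy data are hypothetical choices that do not come from the disclosure.

```python
import re

def build_participation_matrix(sentences, vocabulary):
    """Return a binary N x M matrix as a list of rows.

    Row p corresponds to vocabulary[p] (an OS of order k) and column q to
    sentences[q] (an OS of order l). An entry is 1 if the word occurs in the
    sentence and 0 otherwise. Tokenization is a simplifying assumption.
    """
    tokenized = [set(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    return [[1 if word in tokens else 0 for tokens in tokenized]
            for word in vocabulary]

# Hypothetical toy composition: three sentences, four selected words.
sentences = ["The cat sat on the mat", "The dog chased the cat", "Dogs bark"]
vocabulary = ["cat", "dog", "mat", "bark"]

pm = build_participation_matrix(sentences, vocabulary)
for word, row in zip(vocabulary, pm):
    print(word, row)
```

Note that "Dogs" does not match "dog" here, since no stemming is applied; the selection criteria of step 2 would decide how such cases are treated.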
The participation matrix of order lk, i.e. $\mathrm{PM}^{lk}$, can also be defined; it is simply the transpose of $\mathrm{PM}^{kl}$, whose elements are given by:
$$\mathrm{PM}_{pq}^{lk} = \mathrm{PM}_{qp}^{kl} \quad (2)$$
Accordingly, without limiting the scope of the invention, the description is given by exemplary embodiments using the general participation matrix of the order kl, i.e. the $\mathrm{PM}^{kl}$ in which k ≤ l.
Furthermore, the PM carries much other useful information. For example, using binary PMs one can obtain a participation matrix whose entries are the number of times that a particular OS (e.g. a word) is repeated in another partition of particular interest (e.g. in a document), for instance by the following:
$$\mathrm{PM\_R}^{15} = \mathrm{PM}^{12} \times \mathrm{PM}^{25} \quad (3)$$
wherein $\mathrm{PM\_R}^{15}$ stands for the participation matrix of OSs of order 1 (e.g. words) into OSs of order 5 (e.g. documents), in which the nonzero entries show the number of times that a word has appeared in that document (however, possible repetitions of a word within an OS of order 2, e.g. a sentence, are not accounted for here). Another applicable example is using PM data to obtain the “frequency of occurrences” of ontological subjects in a given composition by:
$$FO_i^{k|l} = \sum_j pm_{ij}^{kl} \quad (4)$$
wherein $FO_i^{k|l}$ is the frequency of occurrence of the OS of order k, i.e. $OS_i^k$, in the OSs of order l, i.e. the $OS^l$. The latter two examples are given to demonstrate how one can conveniently use the PM and the disclosed method/s to obtain many other desired data or information.
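As a hedged illustration of Eqs. 3 and 4, the following sketch multiplies two small binary PMs stored as plain Python lists and sums the rows of one of them. The helper `matmul` and the toy matrices are assumptions made for the example only.

```python
def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

# pm12: 3 words x 4 sentences; pm25: 4 sentences x 2 documents (toy data).
pm12 = [[1, 1, 0, 1],
        [0, 1, 0, 0],
        [1, 0, 1, 1]]
pm25 = [[1, 0],
        [1, 0],
        [0, 1],
        [0, 1]]

# Eq. 3: entry (i, d) counts the sentences of document d that contain word i.
pm_r15 = matmul(pm12, pm25)

# Eq. 4: frequency of occurrence of each word over the sentences (row sums).
fo = [sum(row) for row in pm12]

print(pm_r15)
print(fo)
```

Because the PMs are binary, a word repeated inside one sentence is still counted once for that sentence, which matches the caveat stated in the text.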
More importantly, from $\mathrm{PM}^{kl}$ one can arrive at the “Co-Occurrence Matrix” $\mathrm{COM}^{k|l}$ for OSs of the same order as follows:
$$\mathrm{COM}^{k|l} = \mathrm{PM}^{kl} * (\mathrm{PM}^{kl})^{T} \quad (5)$$
where “T” and “*” denote the matrix transposition and multiplication operations respectively. The COM is an N×N square matrix. It gives the co-occurrences of the ontological subjects of order k in the partitions (ontological subjects of order l) within the composition, and is one indication of the association of OSs of order k evaluated from their pattern of participation in the OSs of order l of the composition. The co-occurrence number is denoted by $com_{ij}^{k|l}$, which is an element of the “Co-Occurrence Matrix (COM)” and (in the case of binary PMs) essentially shows how many times $OS_i^k$ and $OS_j^k$ have participated jointly in the selected OSs of order l of the composition. Furthermore, the COM can also be made binary, if desired, in which case it only shows the existence or non-existence of a co-occurrence between any two $OS^k$.
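A minimal sketch of Eq. 5 for a binary PM held as a list of rows is given below; the function name `co_occurrence_matrix` and the toy PM are illustrative assumptions. Each entry is the dot product of two rows of the PM, which is what the product of the PM with its transpose computes.

```python
def co_occurrence_matrix(pm):
    """Return COM = PM * PM^T for a PM given as a list of rows."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in pm]
            for row_i in pm]

# Toy binary PM: 3 OSs of order k (rows) in 4 partitions of order l (columns).
pm = [[1, 1, 0, 1],
      [0, 1, 0, 0],
      [1, 0, 1, 1]]

com = co_occurrence_matrix(pm)

# Optional binary version: only records whether a co-occurrence exists.
com_binary = [[1 if value else 0 for value in row] for row in com]

print(com)
print(com_binary)
```

For a binary PM the result is symmetric and its main diagonal holds the row sums of the PM, i.e. the frequencies of occurrence of Eq. 4.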
The importance of the “co-occurrence matrix” as defined in this disclosure is that it carries the information about the relationships and associations of the OSs of the composition, which is further utilized in some embodiments of the present invention.
It should be noted that the co-occurrences of ontological subjects can also be obtained by looking at, for instance, co-occurrences of a pair of ontological subjects within certain (i.e. predefined) proximities in the composition (e.g. counting the number of times that a pair of ontological subjects have co-occurred within certain or predefined distances from each other in the composition), as was used in the incorporated reference, U.S. patent application Ser. No. 12/179,363. Similarly, there are other ways to count the frequency of occurrences of an ontological subject (i.e. the $FO_i^{k|l}$). The preferred embodiment is an efficient way of calculating these quantities or objects, however, and should not be construed as the only way of implementing the teachings of the present invention. The repeated co-occurrence of a pair of ontological subjects within certain proximities is an indication of some sort of association (e.g. a logical relationship) between the pair, or else it would have made no sense to use them together in one or more partitions of the composition.
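The proximity-based alternative mentioned above can be sketched as follows. This is not the procedure of the incorporated reference, whose details are not given here; it is only one plausible reading, assuming a token sequence, a symmetric pair count, and a hypothetical `window` parameter that sets the maximum distance in tokens.

```python
from collections import Counter

def window_cooccurrences(tokens, window):
    """Count unordered pairs of distinct tokens at most `window` positions apart."""
    counts = Counter()
    for i, left in enumerate(tokens):
        # Look only ahead so that each pair of positions is counted once.
        for right in tokens[i + 1 : i + 1 + window]:
            if left != right:
                counts[tuple(sorted((left, right)))] += 1
    return counts

tokens = "a b c a b".split()  # toy composition
counts = window_cooccurrences(tokens, window=2)
print(counts)
```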
Those skilled in the art can store the information of the PMs, and also of other mathematical objects of the present invention, in equivalent forms without using the notion of a matrix. For example, each row of the PM can be stored in a dictionary, or the PM can be stored in a list or list of lists, a hash table, an SQL database, or any other convenient object of any computer programming language such as Python, C, Perl, Java, etc. Such practical implementation strategies can be devised by various people in different ways. Moreover, in the preferred exemplary embodiments the PM entries are binary for ease of manipulation and computational efficiency.
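One such equivalent form can be sketched with a Python dictionary that maps each OS to the set of partition indices in which it participates. The names `sparse_pm`, `frequency`, and `co_occurrence` and the toy labels are hypothetical; the sketch only shows that the frequency and co-occurrence counts of a binary PM remain available without a dense matrix.

```python
# Dense toy PM and labels for its rows (assumed data).
pm = [[1, 1, 0, 1],
      [0, 1, 0, 0],
      [1, 0, 1, 1]]
words = ["alpha", "beta", "gamma"]

# Dictionary form: each OS maps to the set of partitions it participates in.
sparse_pm = {word: {j for j, value in enumerate(row) if value}
             for word, row in zip(words, pm)}

def frequency(sparse, a):
    """Number of partitions containing `a` (a row sum of the binary PM)."""
    return len(sparse[a])

def co_occurrence(sparse, a, b):
    """Number of partitions shared by `a` and `b` (an entry of the COM)."""
    return len(sparse[a] & sparse[b])

print(sparse_pm)
print(frequency(sparse_pm, "alpha"), co_occurrence(sparse_pm, "alpha", "gamma"))
```

For large, sparse compositions this form stores only the nonzero entries, which is one reason a practitioner might prefer it over a dense array.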
However, in some applications it might be desired to have non-binary entries so as to account for partial participation of lower order ontological subjects in higher orders, to show or preserve the information about the location of occurrence/participation of a lower order OS in a higher order OS, to account for the number of occurrences of a lower order OS in a higher order OS, or to use any other desirable way of mapping/converting or conserving some or all of the information of a composition into a participation matrix. In light of the present disclosure such cases can also be readily dealt with, by those skilled in the art, through slight mathematical modifications of the methods disclosed herein.
Furthermore, as pointed out before, those skilled in the art can store, process, or represent the information of the data objects of the present application (e.g. lists of ontological subjects of various orders, lists of subject matters, participation matrix/es, association strength matrix/es, various types of associational, relational, and novel matrices, various value significance measures, co-occurrence matrices, and other data objects introduced herein) or other data objects introduced and disclosed in the incorporated references (e.g. association value spectrums, value significance measures, ontological subject maps, ontological subject indexes, lists of authors, and the like, and/or the functions and their values, association values, counts, co-occurrences of ontological subjects, vectors or matrices, lists or otherwise, and the like) in different or equivalent data structures, data arrays, or forms without any particular restriction.
For example, the PMs, ASMs, OSM, or co-occurrences of the ontological subjects can be represented by a matrix, sparse matrix, table, database rows, NoSQL databases, JSON, dictionaries, and the like, which can be stored in various forms of data structures. For instance, each part, section, or subset of the objects of the current disclosure, such as a PM, ASM, OSM, RNVSM, NVSM, and the like, or the ontological subject lists and index, or knowledge database/s, can be represented and/or stored in one or more data structures such as one or more dictionaries, one or more cell arrays, one or more rows/columns of an SQL database, any implementation of NoSQL database/s of different technologies or methods, one or more filing systems, one or more lists or lists of lists, hash tables, tuples, string format, zip format, sequences, sets, counters, JSON, any combined form of one or more data structures, or any other convenient object of any computer programming language such as Python, C, Perl, Java, JavaScript, etc. Such practical implementation strategies can be devised by various people in different ways.
The detailed description herein therefore describes exemplary way(s) of implementing the methods and the system of the present invention, employing the disclosed concepts. They should not be interpreted as the only way of formulating the disclosed concepts and algorithms, or of introducing the mathematical or computer-implementable objects, measures, parameters, and variables into the corresponding physical apparatuses and systems comprising data/information processing devices and/or units, storage devices and/or computer-readable storage media, data input/output devices and/or units, and/or data communication/network devices and/or units, etc.
The processing units or data processing devices (e.g. CPUs) must be able to handle various collections of data. Therefore the computing or data processing units that implement the system have a compound processing speed equivalent to one thousand million or more instructions per second and a collective memory, or storage devices (e.g. RAM), able to store large enough chunks of data to enable the system to carry out the task and to decrease the processing time significantly compared to a single generic personal computer available at the time of the present disclosure.
The data/information processing or computing system that is used to implement the method/s, system/s, and teachings of the present invention comprises storage devices with more than 1 (one) Giga Byte of RAM capacity and one or more processing devices or units (i.e. data processing or computing devices, e.g. silicon-based microprocessors, quantum computers, etc.) that can operate with clock or instruction speeds higher than 1 (one) Giga Hertz, or with compound processing speeds equivalent to one thousand million or more instructions per second (e.g. Intel Pentium 3, Dual core, i3, i7 series, and Xeon series processors, or equivalent or similar processors from other vendors, or equivalent processing power from other processing devices such as quantum computers utilizing quantum computing devices and units). These devices perform and execute the method once they have been programmed with computer-readable instructions/codes/languages or signals and instructed by the executable instructions. Additionally, according to another embodiment of the invention, the computing or executing system includes processing device/s such as graphical processing units for visual computations that are, for instance, capable of rendering, synthesizing, and demonstrating the content (e.g. audio, video, or text) or graphs/maps of the present invention on a display (e.g. LED displays and TVs, projectors, LCDs, touch-screen mobile and tablet displays, laser projectors, gesture-detecting monitors/displays, 3D holograms, and the like from various vendors such as Apple, Samsung, Sony, or the like) with good quality (e.g. using NVidia graphical processing units).
Also, the methods, teachings, and application programs of the present invention can be implemented by shared resources such as virtualized machines and servers (e.g. VMware virtual machines, Amazon Elastic Beanstalk, Amazon EC2, and storages, e.g. Amazon S3, and the like). Alternatively, specialized processing and storage units (e.g. Application Specific Integrated Circuits (ASICs), field programmable gate arrays (FPGAs), and the like) can be made and used in the computing system to enhance the performance, speed, and security of the computing system performing the methods and applications of the present invention.
Moreover, several such computing systems can be run in a cluster, network, cloud, mesh, or grid configuration, connected to each other by communication ports and data transfer apparatuses such as switches, data servers, load balancers, gateways, modems, internet ports, database servers, graphical processing units, storage area networks (SANs), and the like. The data communication network used to implement the system and method of the present invention carries, transmits, receives, or transports data at a rate of 10 million bits per second or more.
Furthermore, the terms “storage device”, “storage”, “memory”, and “computer-readable storage medium/media” refer to all types of non-transitory computer-readable media such as magnetic cassettes, flash memory cards, digital video discs, random access memories (RAMs), Bernoulli cartridges, optical memories, read only memories (ROMs), solid state discs, and the like, with the sole exception being a transitory propagating signal.
The detailed description herein therefore uses straightforward mathematical notions and formulas to describe exemplary ways of implementing the methods and should not be interpreted as the only way of formulating the concepts, algorithms, and the introduced measures and applications. Therefore the preferred or exemplary mathematical formulation here should not be regarded as a limitation or restriction on the scope and spirit of the invention, which is to investigate bodies of knowledge and compositions with systematic detailed accuracy and computational efficiency, thereby providing effective tools, products, and applications in knowledge discovery, scoring/ranking, decision making, navigation, conversing, man/machine collaboration and interaction, filtering or modification of partitions of a body of knowledge, string processing, information processing, signal processing, and the like.
Having constructed the PMkl, we now launch to explain the methods of defining and evaluating the “value significances” of the ontological subjects of the compositions for various important measures of significance. One of the advantages and benefits of transforming the information of a composition into participation matrices is that once we attribute something to the OSs of particular order then we can evaluate the merit of OSs of another order in regards to that attribute using the PMs. For instance, if we find words of particular importance in a textual composition then we can readily find the most important sentences of the composition wherein the most important sentences contain the most important words in regards to that particular importance measure or aspect. Moreover, as will be shown, the calculations become straightforward, language independent and computationally very efficient making the method practical, accurate to the extent of our definitions, and scalable in investigating large volumes of data or large bodies of knowledge.
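The idea of carrying an attribute from one order to another through the PM can be sketched as below. The aggregation rule, a plain sum of the scores of the participating words, is an assumption chosen for simplicity; the function name `score_partitions` and the numbers are hypothetical.

```python
def score_partitions(pm, os_scores):
    """Score each column (partition) by summing the scores of the OSs in it.

    Equivalent to multiplying the transposed PM by the score vector. The sum
    is only one possible way to aggregate scores.
    """
    n_rows, n_cols = len(pm), len(pm[0])
    return [sum(pm[p][q] * os_scores[p] for p in range(n_rows))
            for q in range(n_cols)]

# Toy binary PM: 3 words (rows) in 3 sentences (columns).
pm = [[1, 1, 0],
      [0, 1, 0],
      [1, 0, 1]]
word_scores = [0.5, 2.0, 1.0]  # assumed importance of each word

sentence_scores = score_partitions(pm, word_scores)
best_sentence = max(range(len(sentence_scores)), key=sentence_scores.__getitem__)
print(sentence_scores, best_sentence)
```

A sum favors long partitions; a practitioner might normalize by partition length, which is a design choice outside what the text specifies.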
The investigation method/s and the algorithm/s are now explained in the following sections and subsections with step-by-step formulations that are easy to implement by those of ordinary skill in the art, employing computer programming languages and computer hardware systems that can be optimized or customized by build or hardware design to perform the algorithms efficiently and produce useful outputs for various desired applications.
This section concentrates on the value significance evaluation of OSs of a predetermined order, through several exemplary embodiments of the preferred methods for evaluating the value of an OS of the predetermined order, within a same-order set of OSs of the composition, for the desired measure of significance.
Using these mathematical objects, various measures of the value significance of OSs in a body of knowledge or a composition (called “value significance measures”) can be calculated for evaluating the value significances of OSs of different orders of the compositions or of different partitions of a composition. Furthermore, these various measures (which usually have intrinsic significance) are grouped by type and number to distinguish their variety and functionality.
The first type of “value significance measure” is defined as a function of the “Frequency of Occurrences” of $OS_i^k$, called here $FO_i^{k|l}$, and can be given by:
$$\mathrm{vsm\_1}_i^{k|l} = f_1(FO_i^{k|l}), \quad i = 1, 2, \ldots, N \quad (6)$$
wherein $FO_i^{k|l}$ is obtained by counting the occurrences of OSs of the particular order, e.g. counting the appearances of a particular word in the text or counting its total occurrences in the partitions. More conveniently, it can be obtained from the $\mathrm{COM}^{k|l}$ (the elements on the main diagonal of the $\mathrm{COM}^{k|l}$), by using Eq. 4, or by any other way of counting the occurrences of $OS_i^k$ in the desired partitions of the composition.
Moreover, the $f_1$ in Eq. 6 is a predetermined function such that $f_1(x)$ might be a linear function (e.g. $ax + b$), a power function of x (e.g. $x^3$ or $x^{0.53}$), a logarithmic function (e.g. $1/\log_2(x)$), a $1/x$ function, etc.
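A small sketch of Eq. 6 follows, reading the frequencies of occurrence from the main diagonal of a COM built from a binary PM and applying a few of the function shapes listed above. The function name `vsm_1`, the coefficients, and the toy COM are assumptions made for illustration.

```python
# Toy COM obtained from a binary PM; its diagonal holds the FO values.
com = [[3, 1, 2],
       [1, 1, 0],
       [2, 0, 3]]
fo = [com[i][i] for i in range(len(com))]

def vsm_1(fo, f1):
    """Apply a predetermined function f1 to every frequency of occurrence."""
    return [f1(x) for x in fo]

linear = vsm_1(fo, lambda x: 2 * x + 1)   # a*x + b with assumed a=2, b=1
cubic = vsm_1(fo, lambda x: x ** 3)       # power function
reciprocal = vsm_1(fo, lambda x: 1 / x)   # favors less frequent OSs

# Note: 1/log2(x) is undefined at x = 1, so it needs care for OSs that
# occur exactly once; it is therefore left out of this toy run.
print(fo, linear, cubic, reciprocal)
```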
Accordingly, a $\mathrm{vsm\_1\_1}_i^{k|l}$ (which stands for number one of the type one “value significance measure”), for instance, can be defined as:
$$\mathrm{vsm\_1\_1}_i^{k|l} = c \cdot FO_i^{k|l} \quad (7)$$
wherein c is a constant or a pre-assigned vector. The $\mathrm{vsm\_1\_1}_i^{k|l}$ of Eq. 7 gives a high value to the most frequent $OS^k$. In other situations or applications where, for a desired aspect, less frequent OSs are of more significance, one may use the following $\mathrm{vsm\_1\_2}_i^{k|l}$ (number 2 of type 1 vsm)
Furthermore, another type of $\mathrm{vsm\_x}_i^{k|l}$ is defined as a function of the “Independent Occurrence Probability” (IOP) in the partitions, such as:
$$\mathrm{vsm\_2}_i^{k|l} = f_2(iop_i^{k|l}), \quad i = 1 \ldots N \quad (9)$$
wherein the independent occurrence probability ($iop_i^{k|l}$) may conveniently be given by:
and ƒ2 is a predetermined function. For instance a vsm_2_1i k|l (i.e. the
$$\mathrm{vsm\_2\_1}_i^{k|l} = -\log_2(iop_i^{k|l}), \quad i = 1 \ldots N \quad (11)$$
This measure gives a high value to those OSs of order k of the composition (e.g. the words when k=1) that convey the greatest amount of information as a result of their occurrence in the composition. Extreme values of this measure can point to either novelty or noise.
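Eq. 11 can be sketched as below. The independent occurrence probabilities are supplied directly as hypothetical input values, since how they are derived is not restated here; the function name `vsm_2_1` is likewise an illustrative choice.

```python
import math

def vsm_2_1(iop):
    """Eq. 11: information-style score, -log2 of each occurrence probability."""
    return [-math.log2(p) for p in iop]

# Assumed probabilities for three OSs (hypothetical values summing to 1).
iop = [0.5, 0.25, 0.25]
scores = vsm_2_1(iop)
print(scores)
```

Rarer OSs receive larger scores, which is why very high values can flag either genuinely novel terms or noise such as misspellings.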
Still another type of $\mathrm{vsm\_x}_i^{k|l}$ is defined as a function of the co-occurrences of an $OS^k$ with others, as:
$$\mathrm{vsm\_3}_i^{k|l} = f_3(com_{ij}^{k|l}), \quad i = 1 \ldots N \quad (12)$$
wherein $com_{ij}^{k|l}$ is the co-occurrence of $OS_i^k$ and $OS_j^k$ and $f_3$ is a predetermined function. For instance, a $\mathrm{vsm\_3\_1}_i^{k|l}$ can be defined as:
$$\mathrm{vsm\_3\_1}_i^{k|l} = f_3(com_{ij}^{k|l}) = \sum_j com_{ij}^{k|l}, \quad i = 1 \ldots N \quad (13)$$
This measure gives a high value to those frequent OSs of order k that have co-occurred with many other OSs of order k in the partitions of order l.
This measure (Eq. 13), once combined with other measures, can provide yet other measures. For instance, when it is divided by the $\mathrm{vsm\_1\_1}_i^{k|l}$ of Eq. 7 (e.g. divided by $FO_i^{k|l}$), the resultant measure can indicate the diversity of occurrence of that OS. This particular combined measure therefore usually gives a high value to generic words (since generic words can occur with many other words). Once the generic words are excluded from the list of OSs of order k, this measure can quickly identify the main subject matter of a composition, so that it can be used to label a composition or for classification, categorization, clustering, etc.
Accordingly, more $\mathrm{vsm\_x}_i^{k|l}$ can be defined using one or more of the other $\mathrm{vsm}_i^{k|l}$ or the variables. For instance, one can define a $\mathrm{vsm\_x}_i^{k|l}$ of type 4 (x=4) as a function of the $\mathrm{vsm\_1\_2}_i^{k|l}$ given by Eq. 8 and $\mathrm{com}_{ij}^{k|l}$ as follows:
$$\mathrm{vsm\_4\_1}_j^{k|l} = f_4(\mathrm{vsm\_1\_2}_i^{k|l}, \mathrm{com}_{ij}^{k|l}) = \sum_i \left(\mathrm{com}_{ij}^{k|l} \cdot \mathrm{vsm\_1\_2}_i^{k|l}\right) = \left(1/FO_i^{k|l}\right)^T \times COM, \quad i,j = 1 \ldots N \qquad (14)$$
wherein “T” stands for the matrix or vector transposition operation and wherein the $\mathrm{vsm\_1\_2}_i^{k|l}$ from Eq. 8 is substituted into Eq. 12 or 14. This measure also points to the diversity of the participations of the respective OS, especially when COM is made digital.
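A small sketch of Eq. 14 follows. The co-occurrence values are hypothetical toy data, and the form vsm_1_2_i = 1/FO_i is read off the right-hand side of Eq. 14 rather than taken from Eq. 8 directly, so it should be treated as an assumption.

```python
# Hypothetical co-occurrence matrix COM for four words and their
# frequencies FO (COM is symmetric and its diagonal equals FO).
com = [
    [4, 2, 2, 1],
    [2, 2, 0, 0],
    [2, 0, 2, 1],
    [1, 0, 1, 1],
]
fo = [4, 2, 2, 1]
N = len(fo)

# ASSUMPTION (read off Eq. 14): vsm_1_2_i = 1 / FO_i.
vsm_1_2 = [1 / f for f in fo]

# Eq. 14: vsm_4_1_j = sum over i of com_ij * vsm_1_2_i,
# i.e. the row vector (1/FO)^T multiplied by COM.
vsm_4_1 = [sum(com[i][j] * vsm_1_2[i] for i in range(N)) for j in range(N)]
```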
For mathematical accuracy, note that in our notation the index “i” refers to the row number and the index “j” refers to the column number; therefore matrices with only the subscript “i” are usually column vectors and matrices with only the subscript “j” are usually row vectors.
In a similar fashion, various $\mathrm{vsm\_x}_i^{k|l}$ (x=1, 2, 3, . . . ) vectors can be defined, synthesized, and calculated for $OS_i^k$ that are indicative of one or more significance aspects of an $OS_i^k$ in the composition or the BOK. These groups of $\mathrm{vsm\_x}_i^{k|l}$ generally refer to the intrinsic value significance of an OS in the BOK.
These “value significance measures” ($\mathrm{vsm\_x}_i^k$) are more indicative of the intrinsic importance or significance of lower order constituent parts and can be used to separate one or more of these OSs for a variety of applications such as labeling, categorization, clustering, building maps, conceptual maps, ontological subject maps, or finding other significant parts or partitions of the composition or the BOK. For instance, as disclosed in the incorporated references, the $\mathrm{vsm\_x}_i^{k|l}$ can readily be employed to score a set of documents or to select the most important parts or partitions of a composition by providing the tools and objects to weigh the significances of parts or partitions of a BOK.
Accordingly, from the $\mathrm{vsm\_x}_i^k$ vectors one can readily proceed to calculate the vsm_x of other OSs of a different order (i.e. an order l) utilizing the participation matrices $PM^{kl}$ by a multiplication operation:
$$\mathrm{vsm\_x}_j^{l|kl} = (\mathrm{vsm\_x}_i^{k})^T \times \mathrm{pm}_{ij}^{kl}, \quad j = 1, 2, \ldots M \text{ and } i = 1, 2, \ldots N \qquad (15)$$
wherein $\mathrm{vsm\_x}_j^{l|kl}$ is the type x value significance of OSs of order l obtained from the data of $PM^{kl}$. For a textual composition or a BOK, an OS of order l can be, for instance, a sentence (e.g. l=2), a paragraph (e.g. l=3), or a document (l=5). The $\mathrm{vsm\_x}_j^{l|kl}$ can thereafter be utilized for scoring, ranking, filtering, and/or be used by other functions and applications based on the assigned value significances.
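The multiplication of Eq. 15 can be sketched as a weighted column sum over the participation matrix. The toy matrix and the word-level vsm values below are hypothetical and only illustrate how sentence scores would follow from word scores.

```python
# Toy binary participation matrix (hypothetical data):
# rows are words (order k), columns are sentences (order l).
pm = [
    [1, 1, 1, 1],  # the
    [1, 0, 0, 1],  # cat
    [0, 1, 1, 0],  # dog
    [0, 1, 0, 0],  # barks
]
N, M = len(pm), len(pm[0])

# Hypothetical word-level value significances (one value per word).
vsm_words = [0.0, 1.0, 1.0, 2.0]

# Eq. 15: vsm_x_j (order l) = sum over i of vsm_x_i (order k) * pm_ij.
vsm_sentences = [sum(vsm_words[i] * pm[i][j] for i in range(N))
                 for j in range(M)]

# Rank sentence indices from highest to lowest score.
ranking = sorted(range(M), key=lambda j: vsm_sentences[j], reverse=True)
```

Because the score is an unnormalized sum, longer partitions tend to score higher; whether to normalize by partition length is left open here.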
Generally, many other “value significance measures” can be constructed or synthesized as functions of other “value significance measures” to obtain a desired new value significance measure.
Therefore, from the disclosure here, it becomes apparent how various filtering functions can be synthesized utilizing the participation matrix information of different orders and other derivative mathematical objects. The method is thereby easily implemented and is process efficient.
The theory and the associated methods, systems, and applications are immediately instrumental in processing natural language compositions and in building artificial intelligence capable of interacting with humans in an intelligent manner.
This section looks into another important attribute of the ontological subjects of a composition that is instrumental and desirable in investigating compositions of ontological subjects.
According to the theoretical discoveries, methods, systems, and applications of the present invention, the concept and evaluation methods of “association strengths” between the ontological subjects of a composition or a BOK play an important role in investigating, analyzing and modification of compositions of ontological subjects.
Accordingly, the “association strength measures” are introduced and disclosed here. The “association strength measures” play important role/s in many of the proposed applications and also in calculating and evaluating the different types of “value significance evaluation” of OSs of the compositions. The values of an “association strength measure” can be shown as entries of a matrix called herein the “Association Strength Matrix (ASMk|l)”.
The entries of $ASM^{k|l}$ are defined in such a way as to show the concept and rationale of association strength, according to one exemplary general embodiment of the present invention, as follows:
$$\mathrm{asm}_{i \to j}^{k|l} = f(\mathrm{com}_{ij}^{k|l}, \mathrm{vsm\_x}_i^{k}, \mathrm{vsm\_y}_j^{k}), \quad i,j = 1 \ldots N, \; x,y = 1, 2, \ldots \qquad (16)$$
where $\mathrm{asm}_{i \to j}^{k|l}$ is the “association strength” of $OS_i^k$ to $OS_j^k$ of the composition and $f$ is a predetermined or predefined function, $\mathrm{com}_{ij}^{k|l}$ are the individual entries of $COM^{k|l}$ showing the co-occurrence of $OS_i^k$ and $OS_j^k$ in the partitions or $OS^l$, and $\mathrm{vsm\_x}_i^k$ and $\mathrm{vsm\_y}_j^k$ are the values of one of the “value significance measures” of type x and type y of $OS_i^k$ and $OS_j^k$ respectively, wherein the occurrence of $OS^k$ is happening in the partitions that are OSs of order l. Usually $\mathrm{vsm\_x}_i^k$ and/or $\mathrm{vsm\_y}_j^k$ are the same as $\mathrm{vsm\_x}_i^{k|l}$ and/or $\mathrm{vsm\_y}_j^{k|l}$, which means they have been calculated from the participation data of the $OS^k$ in the OSs of order l.
Accordingly having selected the desired form of the function ƒ and introducing the exemplary quantities from Eq. 6, and/or 9 and/or Eq. 12 into Eq. 16 the value of the corresponding “association strength measure” can be calculated.
FIG. 2 shows one definition for the association of two or more OSs of a composition to each other and shows how to evaluate the strength of the association between any two OSs of the composition. In FIG. 2 the “association strength” of any two OSs is defined as a function of their co-occurrence in the composition or the partitions of the composition, and the value significance of each of them.
The various $\mathrm{asm}_{i \to j}^{k|l}$ can be grouped into types and numbers in order to distinguish them from other measures, in a fashion similar to the labeling and naming of the VSMs in the previous subsection. Consequently, a few exemplary types of “association strength measures”, $\mathrm{asm}_{i \to j}^{k|l}$, are given below:
$$\mathrm{asm\_1\_1}_{i \to j}^{k|l} = \mathrm{com}_{ij}^{k|l}, \quad i,j = 1 \ldots N \qquad (17)$$
$$\mathrm{asm\_2\_1}_{i \to j}^{k|l} = \mathrm{com}_{ij}^{k|l} / \mathrm{vsm\_x}_i^{k|l}, \quad i,j = 1 \ldots N, \; x,y = 1, 2, \ldots \qquad (18)$$
It is important to notice that the association strength defined by Eq. 16 is not usually symmetric and generally $\mathrm{asm}_{j \to i}^{k|l} \neq \mathrm{asm}_{i \to j}^{k|l}$. Therefore, one important aspect of Eq. 16 to be pointed out here is that associations of OSs of the compositions are not necessarily symmetric; in fact an asymmetric “association strength measure” is more rational and better reflects the actual semantic relationships of the OSs of the composition.
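The asymmetry can be seen in a short sketch of Eq. 18, taking the value significance vsm_x_i to be the frequency of occurrence FO_i (Eq. 7 with c = 1). The co-occurrence values are hypothetical toy data and the variable names are illustrative only.

```python
# Hypothetical symmetric co-occurrence matrix COM for four words;
# the diagonal holds each word's frequency of occurrence FO.
words = ["the", "cat", "dog", "barks"]
com = [
    [4, 2, 2, 1],
    [2, 2, 0, 0],
    [2, 0, 2, 1],
    [1, 0, 1, 1],
]
N = len(com)
fo = [com[i][i] for i in range(N)]

# Eq. 18 with vsm_x_i = FO_i: asm_2_1[i][j] = com_ij / FO_i.
asm_2_1 = [[com[i][j] / fo[i] for j in range(N)] for i in range(N)]

the, barks = 0, 3
# The rare word is strongly associated to the frequent one, not vice versa.
rare_to_frequent = asm_2_1[barks][the]
frequent_to_rare = asm_2_1[the][barks]
```

Although COM itself is symmetric, dividing each row by a per-word quantity is enough to make the resulting association matrix asymmetric.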
For instance, in patent application Ser. No. 12/939,112 the exemplary and preferred “association strength measure”, which in this application is labeled $\mathrm{asm\_3\_2}_{i \to j}^{k|l}$ (read as number 2 of type 3 “association strength measure”) to make it distinguishable from other measures, was defined as in Eq. 20,
where c is a predetermined constant, or a pre-assigned value vector, or a predefined function of other variables in Eq. 20; $\mathrm{com}_{ij}^{k|l}$ are the individual entries of $COM^{k|l}$ showing the co-occurrence of $OS_i^k$ and $OS_j^k$ in the partitions of order l; and $\mathrm{iop}_i^{k|l}$ and $\mathrm{iop}_j^{k|l}$ are the “independent occurrence probabilities” of $OS_i^k$ and $OS_j^k$ in the partitions, respectively, wherein the occurrence is happening in the partitions that are OSs of order l. In a particular case, it can be seen that in Eq. 20 the un-normalized “association strength measure” of each OS with itself is proportional to its frequency of occurrence (or self occurrence).
This exemplary choice of definition for the “association strength measure”, i.e. Eq. 20, is further illustrated here. Eq. 20 basically states that if a less popular OS co-occurs with a highly popular OS, then the association of the less popular OS to the highly popular OS is much stronger than the association of the highly popular OS to the less popular OS (remembering that co-occurrence is symmetric). That makes sense, since popular OSs obviously have many associations and are less strongly bound to any one of them, so by observing a highly popular OS one cannot gain much upfront information about the occurrence of less popular OSs. However, observing the occurrence of a less popular OS that has a strong association to a popular OS can tip the information about the occurrence of the popular OS in the same partition, e.g. a sentence, of the composition.
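The behavior described above can be sketched numerically. Eq. 20 is not reproduced in this text, so the form used below, asm_3_2[i][j] = c · com_ij · iop_j / iop_i with iop_i = FO_i / M, is an assumption chosen to agree with Eq. 25 and with the stated property that the self-association is proportional to the frequency of occurrence. The data are hypothetical.

```python
# Hypothetical co-occurrence matrix COM for four words over M = 4
# partitions; the diagonal holds each word's frequency FO.
words = ["the", "cat", "dog", "barks"]
com = [
    [4, 2, 2, 1],
    [2, 2, 0, 0],
    [2, 0, 2, 1],
    [1, 0, 1, 1],
]
M = 4
N = len(com)
fo = [com[i][i] for i in range(N)]

# ASSUMPTION: iop_i = FO_i / M.
iop = [f / M for f in fo]

# ASSUMPTION (inferred form of Eq. 20):
# asm_3_2[i][j] = c * com_ij * iop_j / iop_i.
c = 1.0
asm_3_2 = [[c * com[i][j] * iop[j] / iop[i] for j in range(N)]
           for i in range(N)]

the, barks = 0, 3
less_popular_to_popular = asm_3_2[barks][the]
popular_to_less_popular = asm_3_2[the][barks]
```

Under this assumed form, the less popular word's association to the popular word is larger than the reverse, and each diagonal entry equals c times the word's frequency.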
In another instance it may be more desirable to define the association strength measure as in Eq. 21.
This $\mathrm{asm\_2\_2}_{i \to j}^{k|l}$ measure effectively expresses that the association of an $OS_i^k$ to another one, say $OS_j^k$, is stronger when their co-occurrence is high and the probability of occurrence of $OS_i^k$ is low. In other words, if an OS occurs less frequently and, whenever it occurs, it appears more often with one particular OS, then the association bond of the less frequently occurring OS is strongest with the particular OS it has co-occurred with the most. Conversely, for a given co-occurrence number for a particular OS, say $OS_j^k$, its strongest association bond is from the OS with the lower independent occurrence probability. Mathematically, in fact, the $\mathrm{asm\_2\_2}_{i \to j}^{k|l}$ is the column normalized version of the $\mathrm{asm\_3\_2}_{i \to j}^{k|l}$ of Eq. 20 (when c=1/M in Eq. 21 and assuming binary PM) and is more useful in some instances and applications.
This particular association strength measure can reveal a strong relationship from a less significant OS to the one with which it has co-occurred the most and is a useful measure to hunt for some types of novelty.
In yet another instance, an application is found for the following association strength definition:
$$\mathrm{asm\_4\_1}_{i \to j}^{k|l} = c \cdot \mathrm{com}_{ij}^{k|l} \cdot \mathrm{iop}_j^{k|l}, \quad i,j = 1 \ldots N \qquad (22)$$
The $\mathrm{asm\_4\_1}_{i \to j}^{k|l}$ attributes the strongest association bond from a first OS, say $OS_i^k$, to a second OS, say $OS_j^k$, when the product of their co-occurrence and the independent probability of occurrence of the second OS is the highest. This association strength measure is usually useful for discovering the real association of two important or significant OSs of the composition.
Yet further, this measure can be defined to hunt for mutual association bonds, such as word phrases, as in Eq. 23.
This measure of association strength (i.e. Eq. 23) is symmetric and gives a high value to those pairs of OSs that frequently co-occur with each other, such as word phrases. It becomes equal to 1 (assuming c=1 in Eq. 23) when two words have always co-occurred with each other.
These are a few exemplary but useful types of association strength measures that are found to be instrumental in analyzing and investigating a composition of ontological subjects. However, by Eq. 16 it can be seen that numerous other association strength measures could be defined, synthesized, and calculated. Furthermore, considering that $\mathrm{com}_{ij}^{k|l}$ is also one type of “association strength measure”, Eq. 16 can be further generalized as:
$$\mathrm{asm\_x2}_{i \to j}^{k|l} = F(\mathrm{asm\_x1}_{i \to j}^{k|l}, \mathrm{vsm\_x}_i^{k}, \mathrm{vsm\_y}_j^{k}), \quad i,j = 1 \ldots N, \; x,y = 1, 2, \ldots, \; x1,x2 = 1, 2, \ldots \qquad (24)$$
wherein F is a predetermined function, x1 and x2 refer to different types of association strength measures, and x and y refer to the types of the “value significance measures” of $OS_i^k$ and $OS_j^k$. To illustrate this, one can see that the $\mathrm{asm\_3\_2}_{i \to j}^{k|l}$ can be expressed in terms of the $\mathrm{asm\_2\_2}_{i \to j}^{k|l}$ (Eq. 21) and the $\mathrm{vsm\_1}_j^{k|l}$ (Eq. 7) as:
$$\mathrm{asm\_3\_2}_{i \to j}^{k|l} = c \cdot \mathrm{asm\_2\_2}_{i \to j}^{k|l} \cdot \mathrm{vsm\_1}_j^{k|l} \qquad (25)$$
wherein c is a constant and “·” indicates an element-wise multiplication of two vectors and wherein Eqs. 7, 10, 20, 21 were combined to derive Eq. 25.
These illustrating examples are given to demonstrate that with the concept of “value significance” and “association strengths” there will be various ways to synthesize, perform, calculate and obtain the desired association strength for the particular application by those skilled in the art.
Also importantly, from one or more of the “association strength measures” one can go on to define a measure for evaluating the hidden association strength of OSs of order k even further by:
$$\mathrm{ASM\_x3}^{k|l} = (\mathrm{ASM\_x1}^{k|l})^T \times \mathrm{ASM\_x2}^{k|l} \qquad (26)$$
wherein $\mathrm{ASM\_x3}^{k|l}$ stands for the type x3 “association strength measure”, which is basically an N×N matrix. Eq. 26 takes the transformative or hidden association of OSs of order k (e.g. words of a textual composition or BOK) from one asm measure and combines it with the information of another or the same asm measure to give another measure of association that is not obvious or apparent from the start. This type of measure therefore takes the indirect or secondary associations into account and can reveal, or be used to suggest, new or hidden relationships between the OSs of the compositions, and can therefore be very instrumental in knowledge discovery and research.
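The following sketch applies Eq. 26 with both factors set to the co-occurrence measure of Eq. 17. The toy co-occurrence values and the helper names `transpose` and `matmul` are hypothetical; the point is only that two words that never co-occur directly can still receive a non-zero second-order association through a shared third word.

```python
def transpose(a):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two list-of-lists matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt]
            for row in a]


# Hypothetical co-occurrence matrix COM for four words.
words = ["the", "cat", "dog", "barks"]
com = [
    [4, 2, 2, 1],
    [2, 2, 0, 0],
    [2, 0, 2, 1],
    [1, 0, 1, 1],
]

# Eq. 17: asm_1_1 is simply the co-occurrence matrix.
asm_x1 = com
asm_x2 = com

# Eq. 26: ASM_x3 = (ASM_x1)^T x ASM_x2.
asm_x3 = matmul(transpose(asm_x1), asm_x2)

cat, dog = 1, 2
direct = com[cat][dog]        # no direct co-occurrence in this toy data
indirect = asm_x3[cat][dog]   # association through shared co-occurrences
```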
A very important, useful, and quick use of the exemplary “association strength measures” of Eqs. 17-26 is to find the real associates of a word, e.g. a concept or an entity, from their pattern of usage in the partitions of textual compositions. Knowing the associates of words, e.g. finding the entities associated with a particular entity of interest, has many applications in knowledge discovery and information retrieval. In particular, one application is to quickly get a glance at the context of that concept or entity or of the whole composition under investigation. The choice and the evaluation method of the association strength measure is important for the desired application. Furthermore, these measures can be directly used as a database of semantically associated words or OSs. For instance, if the composition under investigation is the entire (or even a good part of the) contents of Wikipedia, then the universal association of each entity (e.g. a word, concept, noun, etc.) can be calculated and stored for many other applications, such as in artificial intelligence, information retrieval, knowledge discovery, and numerous others.
Moreover, from the “association strength measures” one can also obtain and derive various other “value significance measures” which possess more of an intrinsic type of significance. For instance, in application Ser. No. 12/939,112 the $\mathrm{asm}_{i \to j}^{k|l}$ (e.g. Eqs. 20-26) was used to define and calculate a few exemplary “value significance measures”, i.e. $\mathrm{vsm}_i^{k|l}$, in order to evaluate the intrinsic importance and credibility of OSs of different orders.
In practice, for a given OS, e.g. $OS_j^k$, we want to find the OS that is most strongly “associated with” it (assume it is found to be $OS_i^k$). To do that we can use Eq. 21. One can also use Eq. 22 to find out which OS a given OS, say $OS_i^k$, is highly “associated to” (assume it is found to be $OS_j^k$).
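The two lookups described above amount to scanning a column or a row of an association strength matrix. The sketch below uses the measure of Eq. 18 (with vsm_x_i = FO_i) as a stand-in, since Eqs. 21 and 22 would be used in the same way; the data and the function names are hypothetical.

```python
# Hypothetical co-occurrence matrix COM for four words;
# the diagonal holds each word's frequency FO.
words = ["the", "cat", "dog", "barks"]
com = [
    [4, 2, 2, 1],
    [2, 2, 0, 0],
    [2, 0, 2, 1],
    [1, 0, 1, 1],
]
N = len(com)
fo = [com[i][i] for i in range(N)]

# Stand-in association matrix (Eq. 18 with vsm_x_i = FO_i):
# asm[i][j] is the association strength of word i TO word j.
asm = [[com[i][j] / fo[i] for j in range(N)] for i in range(N)]


def strongest_associated_with(j):
    """Index i (i != j) whose association TO word j is highest (column j)."""
    return max((i for i in range(N) if i != j), key=lambda i: asm[i][j])


def most_associated_to(i):
    """Index j (j != i) that word i is most strongly associated TO (row i)."""
    return max((j for j in range(N) if j != i), key=lambda j: asm[i][j])


dog = words.index("dog")
bound_to_dog = words[strongest_associated_with(dog)]
dog_bound_to = words[most_associated_to(dog)]
```

Self-associations are skipped here by choice; ties, if any, are resolved in favor of the lower index.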
To find the semantically or functionally related OSs one can use Eq. 26, which is an important tool for knowledge discovery. For instance, this measure can be used to hunt for subject matters that can in fact be highly related but whose relations cannot be found explicitly in the literature. The “association strength measure” of Eq. 26 can thereby point to interesting and important topics for further investigation or research, either by a human researcher or by an intelligent machine.
In the next subsection the rationale and definition of yet other types of instrumental measures, and ways of calculating them, are given.
As mentioned above, the association strength values are important for many applications. One such application is to cluster, or to find hidden relationships between, the partitions of the compositions. The $\mathrm{asm}_{i \to j}$ of the lower order OSs can show the association strength of the higher order OSs of the composition, so that they can be used for clustering, categorization, scoring, ranking, and in general filtering and manipulating the higher order OSs.
Accordingly, in this section we further disclose and explain the concept of the “Relational Association Strength measure” (RASM). In general terms, from a lower order “association strength matrix” we can proceed to calculate the association strength of higher order OSs to a lower order OS, which we call the “Relational Association Strength measure” (RASM) here.
One exemplary instance of such a “Relational Association Strength measure” can be given by:
$$\mathrm{RASM\_1}^{l \to k|kl} = \mathrm{rasm\_1}_{i_l j_k}^{l \to k|kl} = (PM^{kl})^T \times ASM^{k|l}, \quad i_l = 1, 2, \ldots M \text{ and } j_k = 1, 2, \ldots N \qquad (27)$$
wherein $\mathrm{rasm\_1}_{i_l j_k}^{l \to k|kl}$ or $\mathrm{RASM\_1}^{l \to k|kl}$ is the “first type relational association strength measure” of OSs of order l to OSs of order k, which is an M×N matrix and shows the degree to which an OS of order l (e.g. the $i_l$th sentence of the composition) is associated with, or related to, a particular OS of order k (e.g. the $j_k$th word of the composition).
It is noted that $ASM^{k|l}$ is generally a square asymmetric matrix, whose transpose is not equal to itself, and therefore another, also important, type of “relational association strength measure” can be envisioned. Accordingly, in the same manner the “second type relational association strength measure” can be defined and calculated as:
$$\mathrm{RASM\_2}^{l \to k|kl} = \mathrm{rasm\_2}_{i_l j_k}^{l \to k|kl} = (PM^{kl})^T \times (ASM^{k|l})^T, \quad i_l = 1, 2, \ldots M \text{ and } j_k = 1, 2, \ldots N \qquad (28)$$
wherein $\mathrm{rasm\_2}_{i_l j_k}^{l \to k|kl}$ or $\mathrm{RASM\_2}^{l \to k|kl}$ is the “second type relational association strength measure” of OSs of order l to OSs of order k, which is also an M×N matrix and is similar to $\mathrm{RASM\_1}^{l \to k|kl}$ except that the relational emphasis is from a different aspect. For instance, if the ASM used in Eq. 28 is from Eq. 20, then for a given OS of order k (e.g. a particular keyword) the $\mathrm{RASM\_1}^{l \to k|kl}$ shows a high relatedness for those partitions (e.g. sentences or paragraphs etc.) that contain the words that are highly bonded to the target OS. Under the same condition, using the $\mathrm{RASM\_2}^{l \to k|kl}$, those sentences that contain the words that the target OS is highly associated with show a strong relatedness to the target OS.
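Eqs. 27 and 28 can be sketched with two matrix products. The toy participation matrix, the helper names, and the choice of Eq. 18 (with vsm_x_i = FO_i) as the association matrix are assumptions made for illustration; any other ASM could be substituted.

```python
def transpose(a):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two list-of-lists matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt]
            for row in a]


# Toy binary participation matrix (hypothetical data):
# rows are words (order k), columns are sentences (order l).
pm = [
    [1, 1, 1, 1],  # the
    [1, 0, 0, 1],  # cat
    [0, 1, 1, 0],  # dog
    [0, 1, 0, 0],  # barks
]
N, M = len(pm), len(pm[0])

# ASSUMPTION: COM = PM x PM^T for a binary PM; FO is its diagonal.
com = matmul(pm, transpose(pm))
fo = [com[i][i] for i in range(N)]

# Stand-in ASM (Eq. 18 with vsm_x_i = FO_i): asm[i][j] = com_ij / FO_i.
asm = [[com[i][j] / fo[i] for j in range(N)] for i in range(N)]

# Eq. 27: RASM_1 = (PM)^T x ASM, an M x N matrix.
rasm_1 = matmul(transpose(pm), asm)

# Eq. 28: RASM_2 = (PM)^T x (ASM)^T, also an M x N matrix.
rasm_2 = matmul(transpose(pm), transpose(asm))
```

Row i of either result describes sentence i, and column j describes its relatedness to word j; the two matrices differ only because the ASM is asymmetric.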
Therefore, using the above relational rasm, one can conveniently find the partitions of a composition most related to one or more target OSs for the desired goal of the investigation (e.g. quick retrieval of documents, sentences, or paragraphs with high semantic relevancy).
Conversely, the $\mathrm{RASM\_2}^{l \to k|kl}$ or $\mathrm{RASM\_1}^{l \to k|kl}$ can also be used to find the association strength or relatedness of a particular OS of order k (e.g. the $j_k$th word of the composition) to a particular OS of order l (e.g. the $i_l$th sentence of the composition) through the following relationship:
$$\mathrm{RASM\_x}^{k \to l|kl} = (\mathrm{RASM\_x}^{l \to k|kl})^T \qquad (29)$$
The reason that the present invention calls $\mathrm{RASM\_x}^{l \to k|kl}$ the “Relational Association Strength Measure” of type x is to recall the fact that these types of association strength are not only between a higher order OS (e.g. a sentence, paragraph, or document) and a lower order OS (e.g. a word, keyword, phrase, etc.) but, in an indirect way, also between a higher order OS and the associations of a lower order OS. The name is also appropriate for the relationship the other way around (i.e. $\mathrm{RASM\_x}^{k \to l|kl}$), in which a lower order OS is not only associated with a higher order OS but is also related to the other constituent lower order OSs of the higher order OS.
Many more useful mathematical objects and relations can be obtained, in a fashion similar to that taught in the present invention, from which a variety of operations can be envisioned. For instance, we can proceed to calculate the association strength between the OSs of order l (e.g. an association strength measure between sentences of a textual composition) by the following:
$$\mathrm{RASM\_x}^{l \to l|kl} = \mathrm{rasm\_x}_{i_l j_l}^{l \to l|kl} = \mathrm{RASM\_x}^{l \to k|kl} \times \mathrm{RASM\_x}^{k \to l|kl}, \quad i_l, j_l = 1, 2, \ldots M \qquad (30)$$
wherein $\mathrm{rasm\_x}^{l \to l|kl}$ is indicative of one type of “relational association strength measure” between the ith OS of order l and the jth OS of order l. This matrix is particularly useful for finding or selecting the higher order OSs of the composition or the partitions (e.g. sentences, paragraphs, or documents) that are highly associated with each other. In some applications, though, it would be desirable, for instance, to find the partitions that have the least amount of association with any other partitions, etc.
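Combining Eqs. 29 and 30, the partition-to-partition measure is the product of an M×N RASM with its own transpose. The RASM values and helper names below are hypothetical toy inputs used only to show the shape of the computation.

```python
def transpose(a):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Multiply two list-of-lists matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt]
            for row in a]


# Hypothetical RASM (l -> k): 4 sentences (rows) by 4 words (columns).
rasm_l_to_k = [
    [2.0, 1.5, 0.5, 0.25],
    [3.0, 0.5, 2.5, 1.75],
    [2.0, 0.5, 1.5, 0.75],
    [2.0, 1.5, 0.5, 0.25],
]

# Eq. 29: RASM (k -> l) is the transpose of RASM (l -> k).
rasm_k_to_l = transpose(rasm_l_to_k)

# Eq. 30: RASM (l -> l) = RASM (l -> k) x RASM (k -> l), an M x M matrix.
rasm_l_to_l = matmul(rasm_l_to_k, rasm_k_to_l)

# Total association of each sentence with all other sentences; the smallest
# total marks the partition least associated with the rest.
M = len(rasm_l_to_l)
totals = [sum(rasm_l_to_l[i][j] for j in range(M) if j != i)
          for i in range(M)]
```

When the same RASM is used on both sides, as here, the result is symmetric; the values are unnormalized, so partitions with many strongly associated words dominate.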
In general, one or more of these “relational association strength measures” can be used (either normalized or not) to define and/or synthesize new RASMs.
In the same manner, using “Participation Matrix/es” and other objects, other desired features can be quantified in a composition or a BOK, consequently making it possible to select, cluster, or filter out the desired part or parts of the composition to look into, investigate, modify, re-compose, etc.
Eqs. 27-30 make it easy to find the partitions of the compositions that have the highest relatedness or highest relative association with a keyword, or the other way around, etc. Therefore a computer implemented method utilizing these formulations can essentially filter out the parts or partitions of a composition most related to a target keyword.
One immediate application, of course, is scoring the relatedness of a group of documents to a subject matter or a keyword. Another immediate application of the computer implemented method, utilizing the concept of $\mathrm{RASM\_x}^{l \to k|kl}$ and the formulation, is, for instance, to cluster and separate partitions of a BOK or a large corpus, etc. into sets of partitions that are related to a particular subject matter. The relatedness is measured by one or more of the above measures, and partitions that exhibit an association strength value greater (or sometimes smaller) than a predetermined threshold to a particular OS can be grouped or clustered together. Further, these data can readily be used to build a neural network type system (for learning, reasoning, etc.) whose edge/connection weights can be obtained from the association strength data of the ontological subjects (e.g. the nodes of a neural net). In this way the training of a neural net can be done much faster, or simply by reading a body of knowledge to attain the necessary data for building a learnt neural net (e.g. one whose weights would otherwise be adjusted by training through observing output/input, as is done currently without the teachings of this disclosure). The association strength data structures, usually in the form of a matrix, are therefore instrumental for building such cognitive networks for a variety of tasks in general and for building neural nets in particular. The training iterations and the resources needed to train a neural net are significantly reduced by using the information of the association strengths (and various other data objects or data structures introduced in this disclosure) of the ontological subjects obtained by investigating a body of knowledge as taught through this disclosure.
In light of the foregoing explanation, the algorithm and method of clustering become straightforward. For instance, a number of partitions of the composition or the BOK that have exhibited a predetermined threshold of relative association strength, or that satisfy predetermined criteria of sufficient association strength, to a target subject or to each other can be categorized or clustered together as a group.
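A threshold-based grouping of this kind can be sketched in a few lines. The RASM values, the sentences, the threshold, and the function name `cluster_by_target` are all hypothetical; the sketch only shows the selection step, not how a threshold would be chosen in practice.

```python
# Hypothetical vocabulary, sentences, and RASM (l -> k) matrix:
# rasm[i][j] is the relatedness of sentence i to word j.
words = ["the", "cat", "dog", "barks"]
sentences = ["the cat", "the dog barks", "the dog", "the cat"]
rasm = [
    [2.0, 1.5, 0.5, 0.25],
    [3.0, 0.5, 2.5, 1.75],
    [2.0, 0.5, 1.5, 0.75],
    [2.0, 1.5, 0.5, 0.25],
]


def cluster_by_target(rasm_matrix, target_index, threshold):
    """Return indices of partitions whose relatedness to the target OS
    meets or exceeds the given threshold."""
    return [i for i, row in enumerate(rasm_matrix)
            if row[target_index] >= threshold]


# Group sentences by their relatedness to each target word.
threshold = 1.0  # hypothetical, chosen by hand for this toy data
dog_cluster = cluster_by_target(rasm, words.index("dog"), threshold)
cat_cluster = cluster_by_target(rasm, words.index("cat"), threshold)
```

With these toy values the two target words separate the four sentences into two disjoint groups.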
As a practical example, these methods were successfully and effectively used for clustering and categorizing a large number of news feeds, as shown in FIG. 11, which will be explained in the next subsections (section II-II-I).
Nevertheless, as a short note here, FIG. 11 shows the procedure in which, using the concept of “value significance”, a number of head categories are selected from those OSs exhibiting the highest value significances and, consequently, using the “relational association strength measure” concept, it was possible to separate the very many different news feeds into different categories automatically with satisfactory accuracy.
In the next section, in accordance with another aspect of this disclosure, the relative or “relational value significance measures” (RVSM) are further introduced to evaluate the relative significances of various OSs in relation to a target OS in the context of the given BOK.
Consider the case wherein one is looking for an important partition of the BOK related to a target OS (e.g. $OS_j^k$), which could be a word, a phrase, a subject matter, a keyword, etc. One then needs a value significance measure/s that is measured in relation, or relative, to one or more target OSs. This conceptual measure can be called the “relational value significance measure” or RVSM.
Here the RVSM can simply be the association strengths of $OS_i^k$, i=1, 2, . . . N, to a target $OS_{j_k}^k$, i.e. $\mathrm{asm}_{i \to j_k}^{k|l}$ or the $j_k$th column of the $ASM^{k|l}$ matrix, which, when used as a VSM vector, can give a weighted importance of the partitions of the composition or the BOK (i.e. the $OS_{i_l}^l$) in relation to the target $OS_{j_k}^k$ when it operates on (multiplies) the participation matrix $PM^{kl}$, as follows:
$$\mathrm{rvsm\_1\_x}_{i_l j_k}^{l \to k|kl} = (\mathrm{pm}_{i_k i_l}^{kl})^T \times \mathrm{asm\_y}_{i_k \to j_k}^{k|l}, \quad i_k, j_k = 1, 2, \ldots N \text{ and } i_l = 1, 2, \ldots M \text{ and } x,y = 1, 2, \ldots \qquad (31)$$
wherein $\mathrm{rvsm\_1\_x}_{i_l j_k}^{l \to k|kl}$ stands for type 1 of number x “relational value significance measure” of OSs of order l, $OS_{i_l}^l$, to a given $OS_{j_k}^k$, which is a row vector and is obtained by processing the participation data of $OS^k$ in $OS^l$; in other words, it has been derived from the data of $PM^{kl}$, and y is indicative of the type of the “association strength measure”.
For the sake of simplicity, x and y are usually the same type. Accordingly, as can be seen in this embodiment, the first type “relational value significance measure”, $\mathrm{rvsm\_1}_{i_l j_k}^{l \to k|kl}$, is in fact the same as $\mathrm{rasm\_1}_{i_l j_k}^{l \to k|kl}$, the “first type relational association strength measure” introduced in Eq. 27.
Eq. 31, once executed, assigns values to $OS^l$ in a way that amplifies the importance or significance values of the partitions (e.g. sentences) of the composition that contain the OSs (e.g. words) that have the highest association strength to the target $OS_j^k$ (i.e. a target keyword), thereby providing an instrument, i.e. a filtering function, for scoring and consequently selecting one or more partitions highly related to an $OS_j^k$.
In fact, Eq. 31 can also be written in matrix form, wherein the $\mathrm{rvsm}_{i_l j_k}^{l \to k|kl}$ is an M by N matrix indicating the relative importance of the partitions to each $OS_j^k$. In other words, $\mathrm{rvsm}_{i_l j_k}^{l \to k|kl}$ is a kind of “relational value significance measure” and can be used as, say, the “first type relational value significance measure” (e.g. shown by the RVSM_1 notation).
The RVSM_1 therefore, following the Eqs. 27 and 31, can be given in the matrix form as:
RVSM_1_x l→k|kl=RASM_1l→k|kl=rvsm_1il j k l→k|kl=(PMkl)T×ASMk|l , i l=1,2, . . . M and j k=1,2, . . . N (32)
wherein the “T” shows the transposition matrix operation and RASM_1l→|kl is the “Relational Association Strength Matrix” and the RVSM_1 is the “first type relational value significance measure”. It is noticed that ASMk|l is a N×N matrix and RASM_1l→k|kl is a M×N matrix indicating the relatedness/association of OSi l (e.g. a sentence and i=1 . . . M) to a OSj k (e.g. a word and j=1 . . . N).
In a similar fashion, a second type of relational value significance measure (denoted, e.g., by RVSM_2) can be defined as:
$$RVSM\_2^{l \to k|kl} = rvsm\_2^{l \to k|kl}_{i_l j_k} = (PM^{kl})^{T} \times (ASM^{k|l})^{T}, \quad i_l = 1, 2, \dots, M \text{ and } j_k = 1, 2, \dots, N \tag{33}$$
or equivalently (see Eq. 28) by:
$$RVSM\_2^{l \to k|kl} = RASM\_2^{l \to k|kl} \tag{34}$$
wherein $RVSM\_2^{l \to k|kl}$, or $RASM\_2^{l \to k|kl}$, indicates the relatedness/association strength of $OS^l_i$ (e.g. a sentence, i=1 . . . M), or its “relational value significance”, to an $OS^k_j$ (e.g. a word, j=1 . . . N).
Recall that $ASM^{k|l}$ is in general asymmetric, and that its rows and columns have different interpretations: a row indicates the values of an OS's association to the others, whereas a column indicates the values of the others' association with it. Therefore $RVSM\_1^{l \to k|kl}$ indicates the degree to which an OS of order l, $OS^l_i$ (e.g. a sentence), contains the OSs of order k, $OS^k$ (e.g. words), that are used to explain, express, or provide information regarding the target $OS^k_j$ (i.e. it contains the words that are highly associated with the target OS). In contrast, $RVSM\_2^{l \to k|kl}$ indicates the degree to which an $OS^l_i$ (e.g. a sentence) contains the $OS^k$ (e.g. words) that the target $OS^k_j$ is used, or participates, to explain, express, or provide information about (i.e. it contains the words that the target OS is highly associated with).
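As a minimal sketch of Eqs. 32 and 33, the following pure-Python example builds both measures for a toy composition. The participation matrix, the association strength values, and the helper names (`transpose`, `matmul`, `best_partition`) are hypothetical and chosen only for illustration; an actual embodiment would obtain PM and ASM from the earlier steps of the method.

```python
# Toy sketch of Eqs. 32-33. All data below are invented for illustration.

def transpose(a):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

# PM: N x M participation matrix (rows = words, columns = sentences).
PM = [
    [1, 0, 1],  # word 0 appears in sentences 0 and 2
    [1, 1, 0],  # word 1 appears in sentences 0 and 1
    [0, 1, 0],  # word 2 appears in sentence 1
]

# ASM: N x N association strength matrix, deliberately asymmetric (assumed values).
ASM = [
    [1.0, 0.8, 0.1],
    [0.2, 1.0, 0.5],
    [0.4, 0.3, 1.0],
]

RVSM_1 = matmul(transpose(PM), ASM)             # Eq. 32: (PM)^T x ASM, an M x N matrix
RVSM_2 = matmul(transpose(PM), transpose(ASM))  # Eq. 33: (PM)^T x (ASM)^T, an M x N matrix

def best_partition(rvsm, target_word):
    """Index of the partition scoring highest for one target word (one column)."""
    column = [row[target_word] for row in rvsm]
    return max(range(len(column)), key=column.__getitem__)
```

Calling `best_partition(RVSM_1, j)` then acts as the filtering function described above: it returns the sentence most related to target word j under the first type of measure, and the same call on `RVSM_2` uses the transposed association direction.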
Yet a third type of “relational value significance measure” can be defined as:
$$RVSM\_3^{l \to k|kl}_{i_l j_k} = vsm^{k|l}_{j_k} \cdot RASM\_1^{l \to k|kl} = vsm^{k|l}_{j_k} \cdot \left((PM^{kl})^{T} \times ASM^{k|l}\right), \quad i_l = 1, 2, \dots, M \text{ and } j_k = 1, 2, \dots, N \tag{35}$$
wherein “·” indicates element-wise multiplication and $vsm^{k|l}_{j_k}$ can be the value of any one of the “value significance measures”.
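And yet a “fourth type relational value significance measure” can be defined and calculated as:
$$RVSM\_4^{l \to k|kl}_{i_l j_k} = vsm^{k|l}_{j_k} \cdot RASM\_2^{l \to k|kl} = vsm^{k|l}_{j_k} \cdot \left((PM^{kl})^{T} \times (ASM^{k|l})^{T}\right), \quad i_l = 1, 2, \dots, M \text{ and } j_k = 1, 2, \dots, N \tag{36}$$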
Various “relational value significance measures” can therefore be defined by incorporating the “intrinsic value significances” and the “relational association strengths”.
Accordingly, in general $RVSM\_x^{l \to k|kl}_{i_l j_k}$ can be written as:
$$RVSM\_x^{l \to k|kl}_{i_l j_k} = f_x\left(vsm^{k|l}_{j_k},\ RASM\_1^{l \to k|kl},\ RASM\_2^{l \to k|kl}\right) \tag{37}$$
wherein $RVSM\_x^{l \to k|kl}_{i_l j_k}$ is the “type x relational value significance measure” and $f_x$ is a predetermined function.
These measures, $RVSM\_3^{l \to k|kl}_{i_l j_k}$ and/or $RVSM\_4^{l \to k|kl}_{i_l j_k}$, put an intrinsically high value on the significance of the partitions that are highly related to the high value significance $OS^k$ of the composition, by taking the intrinsic value of the target OSs into account. These measures can therefore be instrumental in, for example, representing a body of knowledge (BOK) by its parts with the highest relational value significance, or in summarizing a composition. To do so, one can simply select the one or more partitions of the BOK that score the highest on these measures and present them as a summary of the composition.
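The selection step just described can be sketched as follows. The sketch weights each column of a given RASM_1 matrix by the value significance of the corresponding word (Eq. 35) and then ranks the partitions. Summing each row to obtain a single score per partition is an assumption made here for illustration, since the text does not prescribe how the scores over several target OSs are aggregated; the function names and numbers are likewise hypothetical.

```python
def rvsm_3(rasm_1, vsm):
    """Eq. 35: scale column j of the M x N matrix RASM_1 by vsm[j] (element-wise weighting)."""
    return [[vsm[j] * value for j, value in enumerate(row)] for row in rasm_1]

def top_partitions(scores, how_many):
    """Rank partitions by the sum of their row (assumed aggregation) and keep the best ones."""
    totals = [sum(row) for row in scores]
    ranked = sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)
    return ranked[:how_many]

# Hypothetical inputs: 3 sentences x 3 words, and one value significance per word.
RASM_1 = [
    [1.2, 1.8, 0.6],
    [0.6, 1.3, 1.5],
    [1.0, 0.8, 0.1],
]
VSM = [0.5, 0.3, 0.2]

RVSM_3 = rvsm_3(RASM_1, VSM)
summary = top_partitions(RVSM_3, 2)  # indices of the two highest-scoring sentences
```

With these toy numbers the summary consists of sentences 0 and 1, in that order. Restricting the row sum to a chosen subset of columns would focus the summary on particular target OSs.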
Furthermore, from $RVSM\_x^{l \to k|kl}_{i_l j_k}$ one can proceed to calculate the “relational value significance measures” between the OSs of the higher order l as:
$$RVSM\_x^{l \to l|kl} = rvsm\_x^{l \to l|kl}_{i_l j_l} = RVSM\_x^{l \to k|kl} \times \left(RVSM\_x^{l \to k|kl}\right)^{T}, \quad i_l, j_l = 1, 2, \dots, M \tag{38}$$
wherein $RVSM\_x^{l \to l|kl}$ is the relational value significance measure between OSs of order l, so that it directly measures the relatedness of partitions of the BOK, such as sentences, paragraphs, or documents, to each other. This measure can therefore readily be used to find the highly related partitions of the BOK for retrieval, ranking, document comparison, question answering, conversation, clustering, and the like.
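A short sketch of Eq. 38 follows, assuming a small M×N matrix of partition-to-word scores is already available. The helper names `partition_relatedness` and `most_related` and the numeric values are hypothetical. The rows are left unnormalized here, which tends to favor longer partitions; normalizing the rows first is a common alternative.

```python
def partition_relatedness(rvsm):
    """Eq. 38: RVSM^(l->k) x (RVSM^(l->k))^T, an M x M partition-to-partition matrix."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in rvsm] for row_i in rvsm]

def most_related(rel, i):
    """Index of the partition most related to partition i, excluding i itself."""
    candidates = [j for j in range(len(rel)) if j != i]
    return max(candidates, key=lambda j: rel[i][j])

# Hypothetical M x N scores (3 sentences x 3 words).
RVSM = [
    [1.2, 1.8, 0.6],
    [0.6, 1.3, 1.5],
    [1.0, 0.8, 0.1],
]
REL = partition_relatedness(RVSM)
```

Here `most_related(REL, 0)` returns 1, i.e. sentence 1 is the closest match to sentence 0 under this measure.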
The “relational value significance measures” serve the processing and investigation of compositions of ontological subjects, where it becomes important to have tools, measures, and filtering functions, as well as methods of building such filtering functions, to spot a partition relevant to another part or partition, or to a given composition or query.
For instance, in information retrieval it is increasingly important to retrieve the most relevant pieces of information. The retrieved documents, or parts thereof, should therefore be the documents and partitions most relevant to a target OS, which could be a keyword, a set of keywords, or even a composition itself. It would be very useful and desirable, for example, to find the document or piece of knowledge most relevant to an input query in the form of a natural language question, a paragraph, or even a whole text document. In this particular application, one or more of the various kinds and types of “value significance measures” introduced so far can readily be applied, using the method of this disclosure, to retrieve and present the part (e.g. a word, a sentence, a paragraph, a chapter, a document) most relevant to the sought-after subject matter or in response to a query.
Many other desirable outcomes and functionalities can be built in light of these teachings and the disclosed systematic, computer-implementable methods of investigation, not only for textual compositions but also for other types of compositions. In fact, the disclosed method has been applied to image and video compositions as well as genetic code compositions, which confirmed that the method is very effective in investigating compositions of ontological subjects to obtain a desired outcome, information, knowledge, or result.
In another aspect of the present invention, the next section presents the concept and definitions of “novelty value significance measures” (NVSM) as indications of various situations of novelty of OSs in the composition or the BOK.
According to another aspect of the investigation methods of compositions, yet other value significance measures are introduced and explored herein. In some instances it is desirable to find the words or the partitions of a composition that express novel information about one or more subject matters. If one has an instrument or a function to measure the novelty value of a subject matter (e.g. an OS of the composition) itself, or a novelty measure for the partitions, then it becomes practical to spot the novel information and/or the partitions carrying novel information in the context of that composition, a set of compositions, or generally a body of knowledge (BOK) as defined before.
However, the degree or value of novelty must somehow be measured in order to identify the novel parts or partitions and to evaluate them in terms of the significance of their novelty. In this disclosure these measures are called “novelty value significance measures” (NVSM). They can be categorized into different types, and we herein define them and show methods of evaluating them for the ontological subjects of a composition or a BOK.
In view of that, the first step is to define what constitutes a novelty in the context of a BOK and to identify the different aspects of a novelty investigation.
Several situations can be envisioned in which a novelty that is of value to the investigation process can occur. The detection and evaluation of novelty values can be important either to a knowledge consumer or for use in other applications, processes, and/or computer-implemented client programs.
Accordingly, in the present invention we explain a few exemplary instances of novelty having significance value in more detail, to demonstrate another method of investigating compositions according to novelty significance aspects.
Novelty is an attribute related to newness, surprise, entropy, not being well known, not having been seen before, and unpredictability. However, this attribute depends very much on the context and on the relations to other ontological subjects of the composition. For instance, something that is new in one domain or context might be obvious in another. Or something that is new now might become a well-known fact after some time. In news aggregation, for example, the novelty of a news item is closely related to the time the news broke and to how many other news agencies have published the same story. Novelty should therefore be measured in relation to the context, time, and other partitions of the composition. However, since we look for novelty in the given composition under investigation, and since time and/or a time stamp can be treated as an OS, our method of investigation also works for time-related compositions such as news.
Generally, therefore, a valuable novelty occurrence is relational (i.e. more than one OS participates where the novelty occurs) and should be investigated in the context of a composition. For instance, in the context of a body of knowledge (BOK) there may be many known or anticipated facts regarding the subject matters of the BOK, but there may also be some partitions, e.g. statements, that are less known and can be considered novel.
In this subsection, therefore, to identify relative or relational novelty with regard to a topic or to one or more OSs, several important situations of novelty occurrence are envisioned and exemplified in the following.
One such situation is a novel relationship between two or more OSs, in which case at least two notable and important situations can be envisioned.
In one situation of a novel relationship between two or more OSs, for example, a type of “relational novelty value significance measure” can be assigned to spot a novel or less known relationship between two important OSs. In this case the relational novelty value should be high because the two significant OSs are seldom seen with each other in a part or partition of a composition or a BOK. The desired “relational novelty value significance measure” should therefore be proportional to the value significance of each of the OSs and inversely proportional to their “association strength bond”.
Accordingly, one exemplary and simple measure of “relational novelty value significance” between two OSs of order k, say $OS^k_i$ and $OS^k_j$, can be given by:
wherein $rnvsm\_1^{k|l}_{i \to j}$ stands for the type one “relational novelty value significance measure” of $OS^k_i$ to $OS^k_j$. This measure can be used to hunt for those partitions that contain two or more significant OSs expressing a less known relationship. It gives a high value to a pair of OSs that are intrinsically significant, so that the expressed relationship is more likely to be credible and significant, yet whose relationship with each other is novel in the context of the BOK.
Another situation of a novel relationship between two or more OSs is a type of novelty that reveals less known information about one important OS of interest (e.g. a target keyword, a high value significance subject of a BOK, etc.), regardless of the significance of the other OSs. In this instance, the intrinsic value of the target OS, e.g. an intrinsic vsm, should be a significance factor for measuring and putting a value on the novelty. As for how to spot a novelty in relation to a significant target OS, the less known associations can serve as a guide to finding the novel parts, partitions, or statements of a relationship between a significant OS and the other OSs of the composition.
Therefore, another type of “relational novelty value significance measure” can be defined as:
wherein $rnvsm\_2^{k|l}_{i \to j}$ stands for the second type “relational novelty value significance measure” of $OS^k_i$ to $OS^k_j$. This measure puts a high relational novelty value on the pairs in which at least one OS, e.g. the target OS, has a high intrinsic value (i.e. the vsm of $OS^k_j$) while the other is one that had the lowest co-occurrences with the target OS. This measure can be used to spot the partitions that are novel and significant but in which the relationship expressed between the two OSs is perhaps less credible.
Moreover, further notable situations can be considered in which two or more OSs of the composition participate in a partition to convey novel knowledge or information.
Accordingly, for example, another type of relational novelty can occur between a less significant OS and a high significance target OS. In this case the novelty value should be proportional to the value significance of the second OS, e.g. a target OS, inversely proportional to the value significance of the less significant OS, and also inversely proportional to their co-occurrences, so that:
wherein $rnvsm\_3^{k|l}_{i \to j}$ stands for the third type of “relational novelty value significance measure” of $OS^k_i$ to $OS^k_j$. This measure can be used to spot partitions of the BOK that are highly novel but perhaps even less credible than those found by $rnvsm\_2^{k|l}_{i \to j}$.
And yet another type of novelty can occur between two less significant OSs. In this case the significance and relational novelty value should be inversely proportional to the significances, i.e. the VSMs, of each of the OSs and also proportional to their co-occurrences, so that:
$$rnvsm\_4^{k|l}_{i \to j}(OS^k_i, OS^k_j) \propto \frac{1}{vsm^{k|l}_j},\ \frac{1}{vsm^{k|l}_i},\ com^{k|l}_{ij} \tag{42}$$
wherein $rnvsm\_4^{k|l}_{i \to j}$ stands for the fourth type of “relational novelty value significance measure” of $OS^k_i$ to $OS^k_j$. This measure can be used to spot a highly novel relationship between two less known OSs, but with some credibility. It can also be used to spot the rare partitions that might be irrelevant to the context of the BOK but are important to look at.
And yet there can be another notable situation and measure of relational novelty, as:
wherein $rnvsm\_5^{k|l}_{i \to j}$ stands for the fifth type of “relational novelty value significance measure” of $OS^k_i$ to $OS^k_j$. This measure can be used to spot a highly novel relationship between two less known OSs, but with even less credibility than $rnvsm\_4^{k|l}_{i \to j}$. It can be used to spot the noise-like partitions that might be irrelevant to the context of the BOK but might be essential to look at, for example in crime investigation, financial analysis, fraud detection, and the like. This measure can also be used to filter out the irrelevant or noisy part of the composition, or be used in data compression, image compression, and the like.
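The five pair-wise situations above can be compared side by side in a small sketch. Only the fourth form is taken from Eq. 42, read here as the product of its three proportionalities. The forms used for types 1, 2, 3 and 5 are assumptions inferred from the verbal descriptions (their display equations are not reproduced in this text), with the co-occurrence count standing in for the “association strength bond”. The smoothing constant `EPS`, the function names, and all numbers are hypothetical.

```python
EPS = 1e-9  # assumed smoothing constant (not in the source) to avoid division by zero

def rnvsm(kind, vsm, com, i, j):
    """Pair-wise novelty of OS i to OS j; types 1, 2, 3, 5 are assumed forms, type 4 follows Eq. 42."""
    vi, vj, c = vsm[i], vsm[j], com[i][j]
    if kind == 1:  # both OSs significant, rarely seen together
        return vi * vj / (c + EPS)
    if kind == 2:  # significant target j, rare co-occurrence
        return vj / (c + EPS)
    if kind == 3:  # less significant i, significant target j, rare co-occurrence
        return vj / (vi * (c + EPS))
    if kind == 4:  # two less significant OSs that do co-occur (Eq. 42)
        return c / (vi * vj)
    if kind == 5:  # two less significant OSs, rare co-occurrence (assumed form)
        return 1.0 / (vi * vj * (c + EPS))
    raise ValueError("kind must be 1..5")

def best_pair(kind, vsm, com):
    """Ordered pair (i, j), i != j, with the highest novelty of the given type."""
    n = len(vsm)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    return max(pairs, key=lambda p: rnvsm(kind, vsm, com, p[0], p[1]))

# Hypothetical value significances and symmetric co-occurrence counts for four words.
VSM = [0.9, 0.8, 0.1, 0.05]
COM = [
    [0, 1, 8, 2],
    [1, 0, 6, 1],
    [8, 6, 0, 3],
    [2, 1, 3, 0],
]
```

With these toy values, type 1 singles out the two significant words 0 and 1 that rarely co-occur, whereas types 4 and 5 single out the two least significant words 2 and 3.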
In another notable instance, a measure of relational novelty value can be defined based on the association strengths of the OSs to each other, as:
$$rnvsm\_6^{k|l}_{i \to j}(OS^k_i, OS^k_j) \propto \frac{asm^{k|l}_{i \to j}}{asm^{k|l}_{j \to i}} \tag{44}$$
wherein $rnvsm\_6^{k|l}_{i \to j}$ stands for the sixth type of “relational novelty value significance measure” of $OS^k_i$ to $OS^k_j$. This measure of novelty amplifies the asymmetry of the association strength values between the two OSs and therefore serves as a measure of anomaly and novelty; both too large and too small a value of this measure can point to a novelty situation. However, to have a symmetric rnvsm using the asm, one might consider the following measure:
wherein $rnvsm\_7^{k|l}_{i \to j}$ stands for the seventh type of “relational novelty value significance measure” of $OS^k_i$ to $OS^k_j$. This measure is particularly good at spotting any symmetric kind of novelty or anomaly between $OS^k_i$ and $OS^k_j$. When the value of this measure is large, there is a novelty situation to look at between $OS^k_i$ and $OS^k_j$.
It can be noted that some of the exemplary $rnvsm\_x^{k|l}_{i \to j}$ (x=1, 2, 3 . . . ) are symmetric and two-sided, whereas others are asymmetric.
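The asymmetry ratio of Eq. 44 is straightforward to compute once an association strength matrix is available. In the sketch below, `rnvsm_6` follows Eq. 44 with a proportionality factor of one, while `asymmetry_score` is a hypothetical symmetric stand-in (the absolute log-ratio) chosen only to illustrate that both very large and very small ratios signal novelty; it is not claimed to be the seventh-type measure of the text. The matrix values are invented, and all entries are assumed to be nonzero.

```python
import math

def rnvsm_6(asm, i, j):
    """Eq. 44: ratio of the two directed association strengths (entries must be nonzero)."""
    return asm[i][j] / asm[j][i]

def asymmetry_score(asm, i, j):
    """Hypothetical symmetric score: large when the ratio is either very big or very small."""
    return abs(math.log(rnvsm_6(asm, i, j)))

# Hypothetical asymmetric association strength matrix for three words.
ASM = [
    [1.0, 0.8, 0.1],
    [0.2, 1.0, 0.5],
    [0.2, 0.3, 1.0],
]

pairs = [(i, j) for i in range(3) for j in range(i + 1, 3)]
most_asymmetric = max(pairs, key=lambda p: asymmetry_score(ASM, p[0], p[1]))
```

For this toy matrix the pair (0, 1) is the most asymmetric, since word 0 is associated with word 1 four times as strongly as the reverse.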
Once it is noted that co-occurrence is one of the measures and indications of the association between a pair of OSs, the $rnvsm\_x^{k|l}$ (x=1, 2, . . . ) can be further generalized as a function of the individual value significances of the OSs and their association strength measures. In general, therefore, the “relational novelty value significance measures” can be defined and calculated in the general form of:
$$rnvsm\_x^{k|l}_{i \to j}(OS^k_i, OS^k_j) = g_2\left(vsm^{k|l}_i,\ vsm^{k|l}_j,\ asm^{k|l}_{i \to j},\ asm^{k|l}_{j \to i}\right), \quad i, j = 1, 2, \dots, N, \ x = 1, 2, \dots \tag{46}$$
wherein $g_2$ is a predefined or predetermined function.
When there are multiple OSs of interest, the pair-wise value significances can be used in combination, perhaps with various weights, to achieve the same filtering effect for a set of OSs. For instance:
$$rnvsm^{k|l}_{q \to i,j,p}(OS^k_i, OS^k_j, OS^k_p) = \alpha_1 \cdot rnvsm\_x1^{k|l}(OS^k_q, OS^k_i) + \alpha_2 \cdot rnvsm\_x2^{k|l}(OS^k_q, OS^k_j) + \alpha_3 \cdot rnvsm\_x3^{k|l}(OS^k_q, OS^k_p), \quad q = 1, 2, \dots, N \tag{47}$$
wherein $\alpha_1$, $\alpha_2$, and $\alpha_3$ are predetermined weighting functions, such as $\alpha_1(OS^k_i) = 1/FO(OS^k_i)$ or $\alpha_1(OS^k_i) = \log_2(iop(OS^k_i))$, etc., or constants and/or normalization factors; x1, x2, and x3 indicate the type of the rnvsm (e.g. Eqs. 39-45); and “$OS^k_p$” indicates one or more combinations of the first OS to the particular target OS. Moreover, Eq. 47 is just one of the notable situations of novelty occurrence, and in another instance it might be more useful to multiply the pair-wise $rnvsm\_x^{k|l}$ by each other.
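A minimal sketch of the weighted combination in Eq. 47 is given below. Each term pairs a constant weight with a pre-computed pair-wise N×N matrix and one target index; representing the terms as `(alpha, matrix, target)` tuples, and the numbers themselves, are assumptions made for illustration. Weighting functions such as 1/FO could be evaluated beforehand and passed in as the constants.

```python
def combined_rnvsm(terms, n):
    """Eq. 47 style score for every q: sum of alpha * matrix[q][target] over the given terms."""
    return [sum(alpha * matrix[q][target] for alpha, matrix, target in terms) for q in range(n)]

# Two hypothetical pair-wise novelty matrices of different types (N = 3).
R1 = [
    [0, 2, 4],
    [2, 0, 1],
    [4, 1, 0],
]
R2 = [
    [0, 1, 3],
    [5, 0, 2],
    [1, 1, 0],
]

# Targets 0, 1 and 2 with weights 0.5, 0.3 and 0.2.
terms = [(0.5, R1, 0), (0.3, R2, 1), (0.2, R1, 2)]
scores = combined_rnvsm(terms, 3)
best_q = max(range(3), key=lambda q: scores[q])
```

For these values the combined scores are approximately 1.1, 1.2 and 2.3, so OS 2 is the most novel with respect to the whole target set. Replacing the sum with a product gives the multiplicative variant mentioned above.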
All these relationships (i.e. Eqs. 39-46) can be written in matrix form so that, once executed numerically, all combinations of relations between two or more of the $OS^k$ are pre-calculated and at hand.
Again, by operating these specially defined “value significance measures” on the PM, one can obtain the respective type of value for the partitions of the compositions, e.g. the OSs of order l or $OS^l$, by:
$$rnvsm\_x^{l \to k|kl}_{i_l \to j_k} = \left(pm^{kl}_{i_k i_l}\right)^{T} \times rnvsm\_x^{k|l}_{i_k \to j_k}, \quad i_k, j_k = 1, 2, \dots, N \text{ and } i_l = 1, 2, \dots, M \tag{48}$$
or in matrix form as:
$$RNVSM\_x^{l \to k|kl} = (PM^{kl})^{T} \times RNVSM\_x^{k|l}, \quad i_l = 1, 2, \dots, M \text{ and } j_k = 1, 2, \dots, N \tag{49}$$
wherein “T” denotes the matrix transposition operation and $RNVSM\_x^{l \to k|kl}$ is the type x (x=1, 2, . . . ) “relational novelty value significance measure” of the partitions, or OSs of order l, to the OSs of order k. Note that $RNVSM\_x^{l \to k|kl}$ is an M×N matrix indicating the type x (x=1, 2, . . . ) “relational novelty value significance measure” of $OS^l_i$ (e.g. a sentence, i=1, 2, . . . M) to an $OS^k_j$ (e.g. a word, j=1, 2, . . . N), and $RNVSM\_x^{k|l}$ is an N×N matrix indicating the type x (x=1, 2, . . . ) “relational novelty value significance measure” of $OS^k$ with $OS^k$.
In a fashion similar to the previous subsection, novelty-type relationships between the OSs of order l can be calculated, showing how each pair of partitions is related in terms of the significance of their relational novelty to each other, as:
$$RNVSM\_x^{l \to l|kl} = RNVSM\_x^{l \to k|kl} \times RNVSM\_x^{k \to l|kl} \tag{50}$$
wherein $RNVSM\_x^{l \to l|kl}$ stands for the “relational novelty value significance measure” of type x between the OSs of order l, which is an M×M matrix. This measure, and the data of such a matrix, can be used to find, for a given partition, a novel partition exhibiting a predetermined range of “relational novelty value”. These measures can also be combined with other measures to obtain the desired parts of the compositions that one is looking for (e.g. in response to a query or a question).
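Eqs. 49 and 50 can be sketched together as follows. The sketch assumes that $RNVSM\_x^{k \to l|kl}$ is the transpose of $RNVSM\_x^{l \to k|kl}$, by analogy with Eq. 38; the text does not state this explicitly. The participation matrix, the word-level novelty matrix, and the helper names are hypothetical.

```python
def transpose(a):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

# PM: N x M participation matrix (rows = words, columns = sentences), invented data.
PM = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 0],
]
# Hypothetical N x N word-to-word relational novelty matrix.
RNVSM_K = [
    [0, 5, 1],
    [5, 0, 2],
    [1, 2, 0],
]

RNVSM_LK = matmul(transpose(PM), RNVSM_K)         # Eq. 49: M x N
RNVSM_LL = matmul(RNVSM_LK, transpose(RNVSM_LK))  # Eq. 50 with the assumed transpose: M x M

def partitions_in_range(rel, i, low, high):
    """Partitions other than i whose relational novelty to partition i lies in [low, high]."""
    return [j for j in range(len(rel)) if j != i and low <= rel[i][j] <= high]
```

For example, `partitions_in_range(RNVSM_LL, 0, 20, 40)` returns `[2]`: only sentence 2 falls inside the chosen novelty range relative to sentence 0.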
Many associations are hidden, and when one is revealed it is obviously a case of novelty. For instance, when two OSs have little direct association but their association spectra are highly correlated, a novelty of high value may be revealed for further investigation. In these instances a measure to hunt for this type of novel association can be given by:
$$anvsm\_1^{k|l}_{i \to j}(OS^k_i, OS^k_j) \propto \frac{asm\_x1^{k|l}_{p \to i} \cdot asm\_x2^{k|l}_{p \to j}}{asm\_x3^{k|l}_{i \to j}}, \quad p, i, j = 1, 2, \dots, N \tag{51}$$
wherein $anvsm\_1^{k|l}$ is indicative of the first type “association novelty value significance measure”, and “·” denotes the inner product, or scalar multiplication, of the $asm\_x1^{k|l}_{p \to i}$ and $asm\_x2^{k|l}_{p \to j}$ vectors. The indices x1, x2, x3 (=1, 2, . . . etc.) are usually equal and can refer, for instance, to the first or the second type of association strength measure (given by Eq. 16 and/or Eqs. 17-26).
This measure of novelty gives a high value to the relational novelty of those pairs that exhibit a strong hidden association correlation but are not explicitly strongly bonded. It is particularly useful for detecting hidden relationships between two OSs of interest, i.e. $OS^k_i$ and $OS^k_j$, and can be used to spot the cases worthy of further research and investigation (e.g. in scientific discovery, medicine, crime investigation, genetics, market research, financial analysis, etc.).
Although $anvsm\_1^{k|l}$ is also one of the “relational novelty value significance measures”, it is given here the more distinct name “association novelty value significance measure” (ANVSM) in order to have a distinct category for this kind of “value significance measure” in general.
To further amplify the significance of the novelty of $anvsm\_1^{k|l}$, one can incorporate the intrinsic value significance of one or both of $OS^k_i$ and $OS^k_j$ as, for example, in the following:
wherein y1 and y2 indicate the types and numbers of the “value significance measures” used in this formula.
The proportionality factor can be adjusted to account for normalization of the vectors when desired.
Eq. 51 can be rewritten in matrix form, which is more useful in general, as:
$$ANVSM\_1^{k|l} = \left[(ASM\_x1^{k|l})^{T} \times ASM\_x2^{k|l}\right] \mathbin{\cdot/} ASM\_x3^{k|l} \tag{53}$$
wherein “×” denotes the matrix multiplication operator and “·/” denotes element-wise division. Usually, in the preferred exemplary embodiment, the $ASM\_x^{k|l}$ in Eq. 53 are column or row normalized.
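The matrix form of Eq. 53 can be illustrated with a small sketch in which x1, x2 and x3 all refer to the same association strength matrix. The matrix is column normalized to unit Euclidean length first, which is one of several possible normalizations, and a small `eps` is added to the denominator to avoid division by zero; both choices, like the data and function names, are assumptions made for illustration.

```python
import math

def column_normalize(a):
    """Scale every column of a square matrix to unit Euclidean length."""
    n = len(a)
    norms = [math.sqrt(sum(a[p][j] ** 2 for p in range(n))) for j in range(n)]
    return [[a[i][j] / norms[j] for j in range(n)] for i in range(n)]

def anvsm_1(asm, eps=1e-9):
    """Eq. 53 with x1 = x2 = x3: [(ASM)^T x ASM] ./ ASM on the column-normalized ASM."""
    a = column_normalize(asm)
    n = len(a)
    # Entry (i, j) is the inner product of association spectra (columns) i and j.
    spectrum = [[sum(a[p][i] * a[p][j] for p in range(n)) for j in range(n)] for i in range(n)]
    return [[spectrum[i][j] / (a[i][j] + eps) for j in range(n)] for i in range(n)]

# Words 0 and 1 are weakly associated directly (0.05) but share strong
# associations from words 2 and 3, i.e. their spectra are similar.
ASM = [
    [1.0, 0.05, 0.1, 0.1],
    [0.05, 1.0, 0.1, 0.1],
    [0.9, 0.9, 1.0, 0.1],
    [0.8, 0.8, 0.1, 1.0],
]

ANVSM_1 = anvsm_1(ASM)
off_diagonal = [(i, j) for i in range(4) for j in range(4) if i != j]
hidden_pair = max(off_diagonal, key=lambda p: ANVSM_1[p[0]][p[1]])
```

The highest off-diagonal score belongs to words 0 and 1, the pair with similar association spectra but a weak direct bond, which is the kind of hidden relationship this measure is meant to surface.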
As can be seen, Eqs. 51, 52, and 53 are exemplary cases of the general form:
$$anvsm\_x^{k|l}_{i \to j}(OS^k_i, OS^k_j) = g_3\left(vsm\_y1^{k|l}_i \cdot vsm\_y2^{k|l}_j,\ asm\_x1^{k|l}_{p \to i} \cdot asm\_x2^{k|l}_{p \to j},\ asm\_x3^{k|l}_{i \to j},\ asm\_x4^{k|l}_{i \to j}\right), \quad p, i, j = 1, 2, \dots, N \tag{54}$$
wherein $g_3$ is a predetermined or predefined function and y1, y2, x1 . . . x4, etc. refer to the selected type of the respective kind of “value significance measure”.
Numerous other forms of “value significance measures”, using one or more of the introduced “value significance measures” and the concepts behind them, can be devised depending on the application. They are not listed further here; in light of the teachings of the present invention they become obvious to those skilled in the art.
Another important situation of novelty occurrence is to spot and find the OSs and the partitions of the composition that are novel regardless of their relationships, simply for being intrinsically novel in the context of the composition or for conveying novelty wherever they appear in the composition or the BOK.
In this case we assign an intrinsic “novelty value significance measure” (NVSM) to each desired OS and then use the NVSM to weight the intrinsic novelty value of other partitions.
The first measure of novelty can, of course, be derived and defined based on the independent probability of occurrence, so that:
$$nvsm\_1^{k|l}_i = h_1\left(iop^{k|l}_i\right), \quad i = 1, 2, \dots, N \tag{55}$$
wherein $h_1$ is a predetermined function, such as a linear function (e.g. $ax+b$), a power of x (e.g. $x^3$ or $x^{0.53}$), a logarithmic function (e.g. $a/\log_2(x)$), $1/x$, etc., wherein a and b may be scalar constants or vectors.
The term “novelty” usually implies inverse proportionality to the popularity, frequency of occurrence, or independent probability of occurrence; $nvsm\_1^{k|l}_i$ is therefore usually better justified when $h_1$ is chosen such that it decreases as $iop_i$ increases. For instance, one good candidate for defining and calculating a “novelty value significance measure” as a vector is:
$$nvsm\_1\_1^{k|l}_i = \frac{c}{iop^{k|l}_i}, \quad i = 1, 2, \dots, N \tag{56}$$
wherein c may be a scalar or a constant vector. In another instance it might be defined as:
$$nvsm\_1\_2^{k|l}_i = \frac{c}{\log_b\left(iop^{k|l}_i\right)}, \quad i = 1, 2, \dots, N \tag{57}$$
or in another instance:
$$nvsm\_1\_3^{k|l}_i = c \cdot \log_b\left(1/iop^{k|l}_i\right) = -c \cdot \log_b\left(iop^{k|l}_i\right), \quad i = 1, 2, \dots, N \tag{58}$$
or in yet another instance:
$$nvsm\_1\_4^{k|l}_i = \frac{c \cdot \log_b\left(1/iop^{k|l}_i\right)}{iop^{k|l}_i} = -\frac{c \cdot \log_b\left(iop^{k|l}_i\right)}{iop^{k|l}_i}, \quad i = 1, 2, \dots, N \tag{59}$$
wherein b is a constant and c can be a constant or a vector. For example, c can be an auxiliary vector that, when multiplied by other vectors, suppresses or dampens the value of particular OSs of the composition, such as the generic words in a textual composition.
Accordingly, in the same manner, various “novelty value significance measures” can be defined if properly justified. For instance, by combining one or more of the $nvsm\_x^{k|l}_i$ or other variables, more sensible and useful novelty value significances can be defined. As can be seen in Eq. 59, $nvsm\_1\_4^{k|l}_i$ is in fact obtained by multiplying $nvsm\_1\_1^{k|l}_i$ and $nvsm\_1\_3^{k|l}_i$.
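The intrinsic measures of Eqs. 56 to 58, and their product, can be sketched directly from a vector of independent occurrence probabilities. The sketch assumes a scalar c, probabilities strictly between 0 and 1, and hypothetical function names. Note that the form of Eq. 57 yields negative values for probabilities below 1, so only its ordering is meaningful here.

```python
import math

def nvsm_1_1(iop, c=1.0):
    """Eq. 56: inverse of the independent occurrence probability."""
    return [c / p for p in iop]

def nvsm_1_2(iop, c=1.0, b=2.0):
    """Eq. 57: c / log_b(iop); undefined for iop == 1 and negative for iop < 1."""
    return [c / math.log(p, b) for p in iop]

def nvsm_1_3(iop, c=1.0, b=2.0):
    """Eq. 58: self-information, -c * log_b(iop)."""
    return [-c * math.log(p, b) for p in iop]

def nvsm_1_4(iop, c=1.0, b=2.0):
    """Product of the Eq. 56 and Eq. 58 measures (up to the constant c)."""
    return [c * u * v for u, v in zip(nvsm_1_1(iop), nvsm_1_3(iop, b=b))]

# Hypothetical occurrence probabilities of three words, from common to rare.
IOP = [0.5, 0.25, 0.125]
```

For these probabilities, `nvsm_1_1(IOP)` gives 2, 4 and 8, and `nvsm_1_4(IOP)` gives approximately 2, 8 and 24, so the rarer a word is, the more sharply the product measure separates it from common words.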
In another aspect, novelty is observed in relation to, or in combination with, other OSs, since novelty occurs in a context and therefore in relation to other ontological subjects. The stand-alone or intrinsic “novelty value significance measure” in this case is defined as the sum of the novelty that an OS has with a desired number of other OSs.
These measures of novelty are intrinsic since they add up all the pair-wise novelty values for each $OS^k$, so that an NVSM of type 2 can be defined as:
$$NVSM\_2^{k|l}(OS^k_i) = c \sum_{j} rnvsm\_x^{k|l}_{i \to j}(OS^k_i, OS^k_j) \tag{60}$$
wherein the pair-wise novelty measures are summed over the columns (i.e. the j subscript).
Similarly another type of intrinsic novelty value significance measure can be defined as:
$$NVSM\_3^{k|l}(OS^k_j) = c \sum_{i} rnvsm\_x^{k|l}_{i \to j}(OS^k_i, OS^k_j) \tag{61}$$
wherein the summation is over the rows (i.e. the i subscript).
The same can be calculated using $anvsm\_x^{k|l}_{i \to j}$ as:
$$NVSM\_4^{k|l}(OS^k_i) = c \sum_{j} anvsm\_x^{k|l}_{i \to j}(OS^k_i, OS^k_j) \tag{62}$$
and also:
$$NVSM\_5^{k|l}(OS^k_j) = c \sum_{i} anvsm\_x^{k|l}_{i \to j}(OS^k_i, OS^k_j) \tag{63}$$
Or, in a general form, any combination of them can still serve as an intrinsic measure of novelty of the OSs of the composition, as:
$$NVSM\_x^{k|l}(OS^k_i) = h\left(NVSM\_1^{k|l},\ NVSM\_2^{k|l},\ \dots,\ NVSM\_y^{k|l}\right) \tag{64}$$
wherein h is a predetermined function and y is the type and number of the particular $NVSM^{k|l}$ used in building other types of $NVSM\_x^{k|l}$.
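Eqs. 60 and 61 reduce to row and column sums of a pair-wise novelty matrix, as the following sketch shows. The matrix entry `[i][j]` is taken to hold the pair-wise value of OS i to OS j; the matrix values and function names are hypothetical. The same two functions apply unchanged to an anvsm matrix for Eqs. 62 and 63.

```python
def nvsm_row_sums(pairwise, c=1.0):
    """Eq. 60 (and 62): for each OS i, sum the pair-wise novelty over j."""
    return [c * sum(row) for row in pairwise]

def nvsm_column_sums(pairwise, c=1.0):
    """Eq. 61 (and 63): for each OS j, sum the pair-wise novelty over i."""
    return [c * sum(col) for col in zip(*pairwise)]

# Hypothetical asymmetric pair-wise novelty matrix for three words.
RNVSM = [
    [0, 2, 4],
    [1, 0, 3],
    [5, 1, 0],
]

NVSM_2 = nvsm_row_sums(RNVSM)
NVSM_3 = nvsm_column_sums(RNVSM)
```

Here `NVSM_2` is `[6.0, 4.0, 6.0]` and `NVSM_3` is `[6.0, 3.0, 7.0]`; the two differ because the underlying pair-wise measure is asymmetric.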
These various novelty value measures can find and have many applications in variety of applications and compositions which can be employed to investigate such composition to find and investigate the parts or partitions of novelty values. For instance they can be employed for textual composition processing such as question answering, summarization, knowledge discovery, as well as other kind of compositions like detecting novel and valuable parts in a genetic code strings, finding and filtering the junk DNA, as well as other compositions such as image and video compositions and signal processing such as edge detection, compression, deformations, re-composition to name a few.
The parameters, vectors, and matrices of the present invention are transformations of the information hidden in the participation matrix, which can be used for different applications with ease, convenience and efficiency to investigate various aspects of interest in the BOK, such as extracting the most significant parts or partitions, finding the highly associated concepts or parts and partitions, finding the novel part/s or partition/s of the BOK, finding the most informative part of the composition, clustering and categorization of the partitions of the composition or the BOK, ranking and scoring partitions of a composition based on their relatedness to a subject matter (e.g. a query), excluding one or more partitions or OSs of the BOK or suppressing their role in the analysis, and numerous other applications.
Moreover, the mathematical objects and data arrays can easily be transformed to other forms: the desired part or segment of a matrix can be filtered out, the role of one or more of the OSs of the composition can be amplified or suppressed, and/or their values can be altered numerically, all without needing to manipulate the input composition string or file. For instance, in many of the above calculations it is more useful to have the matrices or vectors normalized in order to make the comparisons more meaningful in the context of the BOK. Accordingly, one or more of such mathematical objects and data arrays (vectors, matrices, etc.) can be column or row normalized, or further multiplied by other matrices or vectors acting as a mask or filter.
Moreover, all these matrices (e.g. PM, COM, ASM/s, RASM, RVSMs, NVSM, RNVSMs, etc.) can be regarded as an adjacency matrix for a corresponding graph, wherein the matrix carries the data of the connectivity between the nodes or objects of the graph. Therefore, from these connectivity matrices one can proceed to solve the corresponding eigenvalue equation/s in order to estimate and calculate other types of desirable value significance measures or, in general, any type of value significance. These measures of value calculated from the corresponding eigenvalue equations of the matrices are generally indications of the intrinsic significance values of the OSs. For instance, in the non-provisional U.S. patent application Ser. Nos. 12/547,879, 12/755,415 and 12/939,112, one or more of these matrices have been used to calculate the significance values of the OSs of the composition based on the centralities of the corresponding nodes in the graph that could be represented by that matrix. The centrality values can be, for instance, the components of the eigenvector associated with the largest eigenvalue, as described in application Ser. Nos. 12/547,879, 12/755,415 and 12/939,112, which are incorporated herein by reference.
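To make the eigenvalue-based centrality concrete, a minimal sketch using power iteration is given here. It is not taken from the referenced applications: the function name `eigenvector_centrality`, the convention that a node is scored by its incoming edges, the normalization to a unit sum, and the sample matrix are all assumptions of this sketch.

```python
def eigenvector_centrality(adj, iterations=200, tol=1e-12):
    """Approximate the dominant eigenvector of a non-negative adjacency matrix.

    adj[i][j] is the (assumed) weight of the edge from node i to node j.
    The returned scores are scaled so that they sum to one.
    """
    n = len(adj)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # A node's new score is the weighted sum of the scores pointing to it.
        updated = [sum(adj[i][j] * scores[i] for i in range(n)) for j in range(n)]
        total = sum(updated)
        if total == 0:
            return scores
        updated = [x / total for x in updated]
        if max(abs(a - b) for a, b in zip(updated, scores)) < tol:
            return updated
        scores = updated
    return scores


# Hypothetical symmetric association strengths between three OSs.
asm = [
    [0, 2, 1],
    [2, 0, 1],
    [1, 1, 0],
]
centrality = eigenvector_centrality(asm)
```

In this sample the first two OSs are interchangeable and more strongly connected than the third, so they receive equal and higher scores. Power iteration is only one way to obtain the dominant eigenvector and may fail to settle on some graphs (bipartite ones, for instance), where a dedicated eigen-solver would be preferable.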
In many cases one wants to deliberately amplify and/or dampen or suppress one or more of the values of the OSs of the BOK in order to achieve the right functionality out of the analysis and investigation. Therefore there could be pre-built or pre-determined VSM values (e.g. vectors) that can be used when it is desired to alter and influence the significance values of one or more of the OSs of the compositions. For instance, these vectors or filters can be designed in such a way as to amplify the significance of proper sentences of compositions written in a particular natural language such as English. In another instance, the objective can be to give significance to particular types of partitions of the composition having particular feature/s, attribute/s, or form/s. For instance, when one would like to hunt for the partitions containing connecting or concluding remarks, one may construct a vector that assigns a low significance value to every OS except the selected OSs (e.g. words or phrases such as “therefore”, “as a result”, “hence”, “consequently”, “so that”, etc.). In another instance, when one has a list of OSs that should not participate in the calculation (e.g. stop words), one can provide a vector over the range of OSs having a value of one except for those selected OSs that must be omitted from the calculation.
These pre-assigned vectors are called “special cases conveyers” or “significance value conveyer vectors” herein, as shown in FIG. 6c, and can be used alone or in combination with other VSM value vectors to obtain the desired functionality from the investigation. These conveyers are assigned and used based upon the goal of the investigation. The special conveyers can be designed and altered for various stages of the process and can be used in different stages of calculations and processes.
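A conveyer vector of this kind can be sketched as a simple element-wise mask. The helper names, the sample OS list, the sample significance values, and the specific weights (0.0 for omitted stop words, 0.01 as the "low" value) are assumptions chosen for illustration only.

```python
def conveyer_vector(os_list, selected, selected_value, default_value):
    """Build a pre-assigned vector with one value per OS in os_list."""
    return [selected_value if subject in selected else default_value
            for subject in os_list]


def apply_conveyer(vsm, conveyer):
    """Element-wise product of a value significance vector and a conveyer."""
    return [value * weight for value, weight in zip(vsm, conveyer)]


os_list = ["the", "matrix", "therefore", "novelty", "of"]
vsm = [0.9, 0.6, 0.2, 0.7, 0.8]  # hypothetical significance values

# Stop-word conveyer: one everywhere except the OSs to be omitted.
stop_filter = conveyer_vector(os_list, {"the", "of"}, 0.0, 1.0)
filtered = apply_conveyer(vsm, stop_filter)

# Concluding-remark conveyer: low value everywhere except the selected OSs.
conclusion_filter = conveyer_vector(os_list, {"therefore"}, 1.0, 0.01)
emphasized = apply_conveyer(vsm, conclusion_filter)
```

Because the conveyer acts only on the numerical vectors, the input composition itself is left untouched.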
In accordance with another aspect of the methods of investigation of the compositions of ontological subjects of the present invention, the participation matrix can, for instance, routinely be transformed to other types of objects or participation matrices by operating one or more vectors or matrices on the PM. For example, one can multiply the PM from the right side by a diagonal matrix (M by M) whose diagonal values are the reciprocals of the number of constituent OSs of order k in the partitions, i.e. the higher order OSs of order l. The “resulting PM” will be a column normalized PM, and the values of its entries will be the weighted participation factors. For instance, from a binary PM one can get a partial PM in which, if a word has participated in a sentence with 5 words, its participation entry in the PM is ⅕, and if the same word has participated in a sentence with 10 words, its participation entry is 1/10, and so on. In another instance, in a similar situation, it may be desirable to have a “resulting PM” whose columns are geometrically unitary (i.e. the length of each column becomes one); in this case the elements of the diagonal matrix are the inverse of the square root of the sum of the squares of the individual elements of the respective column (or row) of the original PM.
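The two column normalizations described here can be sketched as follows. Right-multiplying by a diagonal matrix is equivalent to scaling each column, so the sketch scales columns directly rather than forming the M by M matrix. The function name `column_normalize`, its `mode` parameter, and the sample PM are assumptions of this sketch.

```python
def column_normalize(pm, mode="count"):
    """Scale each column of pm, as right-multiplication by a diagonal matrix would.

    mode="count": divide by the column sum (for a binary PM, the number of
                  constituent OSs in that partition).
    mode="unit":  divide by the Euclidean length of the column.
    """
    n_rows, n_cols = len(pm), len(pm[0])
    scale = []
    for j in range(n_cols):
        column = [pm[i][j] for i in range(n_rows)]
        if mode == "count":
            norm = sum(column)
        elif mode == "unit":
            norm = sum(x * x for x in column) ** 0.5
        else:
            raise ValueError("mode must be 'count' or 'unit'")
        scale.append(1.0 / norm if norm else 0.0)
    return [[pm[i][j] * scale[j] for j in range(n_cols)] for i in range(n_rows)]


# Binary PM: 5 words (rows) by 2 sentences (columns).
# Sentence 0 contains all 5 words, sentence 1 contains 2 of them.
pm = [
    [1, 1],
    [1, 1],
    [1, 0],
    [1, 0],
    [1, 0],
]
weighted = column_normalize(pm, "count")   # entries 1/5 in column 0, 1/2 in column 1
unitary = column_normalize(pm, "unit")     # every non-empty column has length one
```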
As another instance of transformation, the PM can be multiplied from the left side by a diagonal matrix (N by N) whose diagonal entries form a vector that puts a value on each OS of order k, so that their participation weights are altered. For instance, if the diagonal of the left matrix is one except for some particular words (such as the generic words of a natural language), for which the corresponding entries are suppressed (e.g. replaced with 0.1), then the role of those particular words (e.g. the generic words) in the computations will be suppressed as well, without having to manipulate the original string of the composition in order to achieve the same goal.
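The left multiplication amounts to scaling each row of the PM by the weight of its OS, which a short sketch can show. The word list, the set of generic words, and the helper name `weight_rows` are hypothetical; the suppression value 0.1 is the one used in the example above.

```python
def weight_rows(pm, row_weights):
    """Scale each row of pm, as left-multiplication by a diagonal matrix would."""
    return [[weight * entry for entry in row]
            for row, weight in zip(pm, row_weights)]


words = ["the", "gene", "of", "protein"]
pm = [
    [1, 1, 1],   # "the" participates in every sentence
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
]

generic_words = {"the", "of"}  # assumed list of generic words
weights = [0.1 if word in generic_words else 1.0 for word in words]
weighted_pm = weight_rows(pm, weights)
```

The original composition is never edited; only the numerical participation weights change.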
As another instance of transformation and alteration, one or more auxiliary vectors (i.e. filters) can be built to dampen the significance of particular OSs of the composition by multiplying those vectors on the resulting vector objects such as one or more of the different types and number of the “value significance measures” vectors or matrices.
Moreover, the method/s can conveniently be used for compositions of different natures, such as data file compositions (e.g. audio or video signals), DNA strings, textual strings and text files, corporate reports, corporate databases, etc. For instance, the investigation method disclosed herein can readily be used to investigate image and video files, such as spotting a novelty in an image, picture or video, edge detection in an image, feature/s extraction, compression of image and video signals, and manipulation of images. The disclosed methods of the present invention can readily be applied in applications such as artificial intelligence, neural network training and learning, network training, machine learning, computer conversation, approximate reasoning, as well as computer vision, robotic vision, object tracking, etc.
Numerous other forms of “value significance measures”, using one or more of the introduced value significance measures and the concepts behind them, can be devised and synthesized accordingly, depending on the application. They are not further listed here but, in light of the teachings of the present invention, become obvious to those skilled in the art.
The disclosed framework, along with the algorithms and methods, enables people in various disciplines, such as artificial intelligence, robotics, information retrieval, search engines, knowledge discovery, genomics and computational genomics, signal and image processing, information and data processing, encryption and compression, business intelligence, decision support systems, financial analysis, market analysis, public relation analysis, and generally any field of science and technology, to use the disclosed method/s of investigation of the compositions of ontological subjects and the bodies of knowledge to arrive at the desired form of information and knowledge with ease, efficiency, and accuracy.
Furthermore, as pointed out before, those skilled in the art can store, process or represent the information of the data objects of the present application (e.g. lists of ontological subjects of various orders, lists of subject matters, participation matrix/es, association strength matrix/es, the various types of associational, relational, and novelty matrices, co-occurrence matrices, and other data objects introduced herein) or other data objects as introduced and disclosed in the incorporated references (e.g. association value spectrums, ontological subject maps, ontological subject indexes, lists of authors, and the like, and/or the functions and their values, association values, counts, co-occurrences of ontological subjects, vectors or matrices, lists or otherwise, and the like) in or with different or equivalent data structures, data arrays or forms without any particular restriction.
For example, the PMs, ASMs, OSM or co-occurrences of the ontological subjects, etc. can be represented by a matrix, sparse matrix, table, database rows, dictionaries and the like, which can be stored in various forms of data structures. For instance, each layer of a PM, ASM, OSM, RNVSM, NVSM, and the like, or the ontological subject index, or knowledge database/s, can be represented and/or stored in one or more data structures such as one or more dictionaries, one or more cell arrays, one or more rows/columns of an SQL database, one or more filing systems, one or more lists or lists in lists, hash tables, tuples, string format, zip format, sequences, sets, counters, or any combined form of one or more data structures, or any other convenient objects of any computer programming language such as Python, C, Perl, Java, JavaScript, etc. Such practical implementation strategies can be devised by various people in different ways.
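As one illustration of such equivalent data structures, the same PM can be held either as a list of lists or as a dictionary of dictionaries that stores only the non-zero entries. The helper names `to_sparse` and `to_dense` and the sample matrix are hypothetical; this is a sketch of one possible choice, not a required storage format.

```python
def to_sparse(pm):
    """Store only the non-zero entries of pm as {row: {column: value}}."""
    sparse = {}
    for i, row in enumerate(pm):
        for j, value in enumerate(row):
            if value != 0:
                sparse.setdefault(i, {})[j] = value
    return sparse


def to_dense(sparse, n_rows, n_cols):
    """Rebuild the list-of-lists form from the dictionary form."""
    return [[sparse.get(i, {}).get(j, 0) for j in range(n_cols)]
            for i in range(n_rows)]


pm = [
    [1, 0, 1],
    [0, 0, 0],
    [0, 1, 0],
]
sparse_pm = to_sparse(pm)
restored = to_dense(sparse_pm, 3, 3)
```

The dictionary form tends to pay off when most OSs participate in only a few partitions, which is typical of large textual compositions.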
The detailed description herein therefore describes exemplary way(s) of implementing the methods and the system of the present invention, employing the disclosed concepts. They should not be interpreted as the only way of formulating the disclosed concepts and algorithms, or of introducing the mathematical or computer implementable objects, measures, parameters, and variables into the corresponding physical apparatuses and systems comprising data/information processing devices and/or units, storage devices and/or computer readable storage media, data input/output devices and/or units, and/or data communication/network devices and/or units, etc.
The processing units or data processing devices (e.g. CPUs) must be able to handle various collections of data. Therefore the computing units that implement the system have a compound processing speed equivalent to one thousand million or more instructions per second, and a collective memory or storage devices (e.g. RAM) able to store large enough chunks of data to enable the system to carry out the task and to decrease the processing time significantly compared to a single generic personal computer available at the time of the present disclosure.
This section describes a few exemplary systems that can be constructed in order to demonstrate the enabling benefits of deploying the disclosed method/s of investigation of compositions of ontological subjects in various challenging applications and important functionalities.
As was described throughout the description, the goal of the investigation is to produce useful data, information, and knowledge from a given or accessed composition/s, according to at least one aspect of significance or the goal/s of the investigation.
The result of the investigation can be represented in various forms and presentation styles and delivered through various devices and facilities of modern information technology (private or public cloud computing, wired or wireless connections, etc.). The interaction between a client and an investigator, employing one or more of the disclosed algorithms, can be facilitated through various forms of data network accessibility to the investigator, through interfaces such as web interfaces or data transferring facilities. The result of the investigation can be displayed or provided in various forms such as an interactive page/device environment, graphs, reports, charts, summaries, maps, interactive navigation maps, email, image, video compositions, voice or vocal compositions, compositions of a different nature (such as a transformation of a textual composition to a visual one or vice versa), encoded data, decoded data, data files, etc.
For instance, a goal of the investigation can be to find the OSs of the composition that score a significant enough novelty value in the context of the given BOK or an assembled BOK, wherein the OSs of the composition can be words, phrases, sentences, paragraphs, lines, documents or the like for the BOK under investigation.
Another exemplary goal of the investigation can be to get a summary of the credible statements from a BOK, or to modify a part or partitions of a composition (e.g. a document, an image, a video clip, etc.). Another instance of investigation can be to obtain a map of relations between the most significant parts or partitions of the BOK. For instance, a patent attorney, inventor, or examiner can use the disclosed method to plan his/her claim drafting by investigating the application disclosure to get the most valuable or novel part of the disclosure for drafting the claims, or to get the map of relationships between the components (i.e. the ontological subjects) of the disclosure in order to draft a summary, an abstract, an argument, one or more claims, litigation, etc. The method can also be used for examining the application in comparison to one or more collections of one or more patent application disclosures.
In another instance, an intelligent being (e.g. a software bot/robot, a humanoid, a machine, or an appliance) can use the system and methods internally, or by connecting/communicating to a provider of such services, to become enabled to interact intelligently with humans (e.g. conversing and doing tasks, entertaining, or assisting in knowledge discovery, etc.). Numerous other examples could use one or more of the tools, measures and method/s given in this disclosure to get information and to find or compose the knowledge that is desired or sought after.
Referring now to the accompanying drawings, a few exemplary embodiments of the methods, the systems and the applications are further illustrated and explained in order to demonstrate the deployment of the teachings of the present invention.
Referring to FIG. 1, it depicts one general flow process and the system that can provide one or more exemplary investigation results, as services, utilizing the algorithms and the methods of the present invention. As shown in the diagram, following the above formulations and methods of building the required variables or the mathematical or data objects (e.g. the matrices and the vector values, etc.) and building the various filters, one can design, synthesize, and compose an output for an input composition according to her/his/its need, goal of investigation, or informational requirements. For example, if an application calls for getting the most credible and valuable partitions of an input composition, then she/he/it must choose (or select through an interface) the corresponding filter (i.e. the suitable XY_VSM/s and algorithm/s) with which to obtain such a credible glance or summary of the composition. Moreover, the user or the designer of such a system and service can synthesize the suitable filter, using the tools, measures and methods of the present invention, to provide the desired response, output or service.
Alternatively, in another instance, if one is looking only to get the novel parts of the input composition then that can also be readily done following the teaching and computational process of the above to get the novel parts or partitions of the composition using the one or more of the novelty value significance measures.
Turning to FIG. 1 again, the input composition is used to build or generate the one or more participation matrices, while the ontological subjects of different orders are grouped, listed, and kept in short term or more permanent storage media. The actual OSs or the partitions are usually used at the end of the processing and calculation of the desired quantity or quantities, when they are fetched again based on their corresponding values for one or more of the measures introduced in previous sections. Accordingly, after having the PM/s, the system will calculate the desired mathematical objects such as the COM, ASM/s, the desired VSM/s, one or more RASM/s if needed for the desired service, one or more RVSM/s if needed for the service, one or more NVSM/s, RNVSM/s or ANVSM/s if desired, and so on.
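The first stage of FIG. 1, building a participation matrix while listing the OSs of each order, can be sketched for a textual composition as follows. The tokenization rules (splitting sentences on ".", "!" and "?", lower-casing words), the binary entries, and the function name `build_participation_matrix` are simplifying assumptions of this sketch.

```python
import re


def build_participation_matrix(text):
    """Return (words, sentences, pm) for a textual composition.

    pm[i][j] is 1 if word i participates in sentence j, else 0.
    """
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    tokenized = [re.findall(r"[a-z0-9']+", s.lower()) for s in sentences]
    words = sorted({word for tokens in tokenized for word in tokens})
    index = {word: i for i, word in enumerate(words)}
    pm = [[0] * len(sentences) for _ in words]
    for j, tokens in enumerate(tokenized):
        for word in tokens:
            pm[index[word]][j] = 1
    return words, sentences, pm


words, sentences, pm = build_participation_matrix(
    "Genes encode proteins. Proteins fold. Genes mutate!"
)
```

The lists `words` and `sentences` play the role of the stored OS lists: later calculations operate on `pm` alone, and the actual sentences are fetched by column index at the end.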
These data objects (e.g. matrix/es or vector/s) are used to synthesize the required filter, which provides the desired functionality once it is operated on the PM. After operating the filter on the PM, the output is further investigated for selection of suitable OSs of the composition for further processing, re-composing or presentation. The output can be presented in predetermined form/s or formats, such as a file, a display on a web-interface or an interactive web-interface, encoded data in a particular format for use by another system or software agent, an email, a display on a mobile device, projector and the like over a network, or data sent to a client over the internet and the like.
For instance, if the desired mode of operation is to find the novel partitions of the composition exhibiting enough novelty value while having enough significance, then the corresponding filter will use the RNVSM of Eq. 39 for finding, scoring and consequently selecting the suitable partitions for this requested service.
In other words, after the composition data are transformed or transported into the participation matrix/matrices, we only deal with numerical calculations that determine the values of the members of the listed OSs (identified by their index in the list or by their row or column number in the participation matrix). Once the values for the corresponding measure are calculated, those OSs that exhibit the desirable value or range of values are selected by the selector or a composer that provides the output data or content, e.g. as a service, according to predetermined formats for that service.
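The selection step can be sketched as an index-based lookup: the scores are computed on indices only, and the actual OSs are fetched at the end. The function name `select_os`, its `k` and `threshold` parameters, and the sample scores are assumptions of this sketch.

```python
def select_os(os_list, values, k=None, threshold=None):
    """Pick OSs by their calculated values, highest first.

    os_list[i] is the OS whose score is values[i]; the two lists share indices,
    as the rows or columns of a participation matrix would.
    """
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    if threshold is not None:
        ranked = [i for i in ranked if values[i] >= threshold]
    if k is not None:
        ranked = ranked[:k]
    return [(os_list[i], values[i]) for i in ranked]


partitions = ["sentence 0", "sentence 1", "sentence 2"]
scores = [0.2, 0.9, 0.5]  # hypothetical values of some measure

top_two = select_os(partitions, scores, k=2)
above_half = select_os(partitions, scores, threshold=0.5)
```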
Referring to FIG. 2 now, it involves the conceptualization of the association strength measure/s. As exemplified several times throughout the disclosure, the concept and values of the “association strength measure/s” play an important role in the investigation of compositions of ontological subjects, as well as providing data that is valuable in itself. That is, knowing the association strength of OSs to each other is important and can be used to build many other applications, especially in artificial intelligence.
Accordingly, FIG. 2 shows one general form of conceptualizing and defining the association strength measures and consequently calculating the association strength values for those measures. As seen in this exemplary embodiment, the association strength of the OSs of order k that have co-occurred in one or more OSs of order l is given by a function of their number of co-occurrences and the respective value/s of one or more of the “value significance measure/s” (e.g. the independent probability of occurrence). Several such exemplary association strength measures were given by Eq. 16-24. FIG. 2 was also illustrated in some detail in section II-III of this disclosure.
Referring to FIG. 3 now, it shows that any composition of ontological subjects can in principle be represented by a graph, which in this preferred embodiment is shown as an asymmetric graph. The exemplified graph corresponds to one of the exemplary “association strength matrices”, i.e. an ASM, which serves as its adjacency matrix. The nodes represent the desired group of OSs, the edges or arrows show the links between the associated nodes, and the values on the edges represent the association strength from one node to the connected one. This figure graphically exemplifies that compositions of ontological subjects and a network of ontological subjects can basically be investigated and dealt with in the same manner according to the teachings of the present invention.
In FIG. 4, there is shown another embodiment of the process of calculating various value significance measures in more detail. As seen, the data of the input composition is transformed to calculable quantities and data from which, employing the above methods and formulations, the desired value significance measures are calculated and/or stored in the storage areas for further use or for use by other processes, programs or clients.
In reference to FIG. 5, it becomes evident that at this stage, in accordance with the method and using one or more of the participation matrices and/or the consequent matrices, one can also evaluate the significance of the OSs by building a graph and calculating the centrality power of each node in the graph by solving the resultant eigenvalue equation of the adjacency matrix of the graph, as explained in patent application Ser. No. 12/547,879 and patent application Ser. No. 12/755,415.
The association strength matrix could be regarded as the adjacency matrix of any graph, such as a social graph or any network of anything. For instance, the graphs can be built to represent the relations between the concepts and entities or any other desired set of OSs in a special area of science, market, industry or any “body of knowledge”. Thereby the method becomes instrumental in identifying the value significance of any entity or concept in that body of knowledge, and consequently can be employed for building an automatic ontology. The VSM_1, 2, . . . xk|l and other mathematical objects can be very instrumental in knowledge discovery, prioritization of research trajectories, and ontology building by indicating not only the important concepts, entities, parts, or partitions of the body of knowledge but also their most important associations.
Referring to FIG. 6a, 6b, 6c now, they show one graphical representation of the concept of the different values of different “value significance measures”. As seen, the values of different types of value significance measures (labeled as XY_VSM, wherein XY is used to show the different types of VSM/s) can be shown as a vector in a multidimensional space. Though XY_VSM/s in general are matrices that might also carry the relational value significances, any row or column of them (as shown in FIG. 6a) can still be shown as a discrete vector in a multidimensional space. These discrete vectors can also be treated as discrete signals, which can further be used for investigation of the compositions. Some types of XY_VSM that are intrinsic are vectors (e.g. FIG. 6b), which can readily be used to weigh other OSs or the partitions of the composition. Also shown in FIG. 6c are some vectors that might be “special conveyer vectors”, labeled “significance conveyer vectors” in FIG. 6c. They are usually predefined or predetermined and can be used for filtering out, dampening, amplifying, and/or shaping/synthesizing the VSMs of one or more predetermined OSs of the composition. FIG. 6c demonstrates that special conveyer vectors or VSMs have basically the same characteristics as other XY_VSMs, except that the values might have been set in advance.
An application of the instance demonstrated in FIG. 7 is that an OS of order l can be selected by the investigator based on its strength of association to one or more OSs of order k. The calculation and selection of OSs of order l can find an important application in document retrieval, question answering, and computer conversation, in which a suitable answer or output is sought from a knowledge repository (e.g. a given composition) in response to an input query or composition. As an example of how to utilize the disclosed method/s, an input statement or query is parsed into its constituent OSs of order k, and then, using the association strength matrix (which might be constructed from and for said knowledge repository), the most related partitions of the stored composition (i.e. the knowledge repository) are retrieved in response to the input query, which may be a conversational statement or a question. For instance, the most related partition of the knowledge repository can be the partition (OS of order l) that has scored the highest average or cumulative association to the constituent OSs of the input query. The most related partition might have scored the highest, for example, after multiplication of the association strength vectors of the OSs of the input query by the association strength matrix that has been built from the knowledge repository.
Referring to FIG. 8 now, it shows schematically a block diagram of an exemplary system, as well as a process that further clarifies how to use the “value significances” data of one or more OSs of a particular order to evaluate and calculate the one or more “value significances” of OSs of another order, using the one or more XY_VSMs and one or more participation matrices. The XY in FIG. 8 is a placeholder that can be replaced with the desired type and number combination of the desired “value significance measure”. Therefore XY_VSM in FIG. 8 can be replaced with any of the different types of “value significance measures” (such as RVSM, NVSM, ARASM, RSVM, etc.). The data objects can be stored, if desired, for later use, so that the pre-calculated data and objects are pre-made and can easily be retrieved for the corresponding compositions and the desired application. The pre-made stored data can be used to speed up the process of composition investigation in a system that provides such service/s to one or more clients.
Referring to FIG. 9 now, it shows an exemplary system, process and application of the present invention. FIG. 9 shows an instance of clustering, ranking, and sorting a number of webpages fetched from the internet, for example by crawling the internet. This demonstrates the process of indexing and consequently easily and efficiently finding the relevant information related to a keyword or a subject matter. This is the familiar but very important application of the present invention in search engines. As seen, after crawling a number of webpages or documents from the internet (or from any other repository, in fact), the pages/documents/compositions are investigated so that the associations of the desired parts or partitions of such collections to other desired OSs of the collection of compositions are calculated. Then, in such an exemplary search engine, once a client enters a query or a keyword (i.e. a target OS), it is straightforward to find the document, page, or composition most relevant to the input query.
Accordingly, as discussed in the previous sections, having one or more of the “association strength matrix/es” (indicated by XASM) or RVSMs, etc., and using the disclosed algorithms makes it possible to retrieve the documents with the highest degrees of relevancy to the input query or the target OS. This is one of the very important applications and implications of the disclosed teachings and materials since, as experienced by many users of commercial search engines, the relevancy of retrieved documents to the input query has been and is a major challenge in the improvement of search engine performance. However, employing the investigation methods of the present invention, through its various measures, makes it possible to quickly and reliably retrieve the document/page most semantically related to the input query.
Furthermore, some special OSs can be selected for which the association strength of pages are to be calculated. For instance, special OSs can be the content words such as nouns or named entities. Nevertheless there would be no limitation on the selection or choice of the target OS and they can basically be all possible types of words, or even sentences and higher orders partitions.
Moreover, through the investigation of crawled pages, either in one step or in several steps, OSs of high value significance can be identified so that the whole composition (i.e. the whole collection of the documents or pages) can be clustered or categorized into bodies of knowledge under one or more target subject matter or head categories (e.g. the high value OSs of lower order, such as words or phrases).
The target OSs would usually be keywords or phrases, words, or any combinations of characters, such as dates, special names, etc. However, in an extreme but useful case, the target OSs of such a composition could be the extracted sentences, phrases, paragraphs, or even a whole document and the like.
As seen from the teachings of the present invention then it becomes readily straightforward to calculate the association and relevancy of each part of such a composition (such as the webpages or documents or their parts thereof) to each possible target OSs. These data are stored and therefore upon receiving a query (such as a keyword or a question in a natural language form, or in the form of a part of text etc.) the system will be able to retrieve the most relevant partitions (e.g. a sentence, and/or paragraph, and/or the webpage) and present it to the user in a predetermined format and order.
To explain this in more detail: when a service provider system such as a search engine, a question answering system, or a computer conversing system, which comprises or has access to the system of FIG. 9, receives a query from a user, the system can simply parse the input query and extract all or some of the words of the input query (i.e. the OSs of order one). Then, having calculated the association strengths rasm_x1→5|, one can easily calculate the association strength of each of the documents (e.g. web-pages) to the words of the input query, and eventually the documents which have an overall acceptable association strength with the selected words of the input query will be presented to the querier as the most relevant documents or content.
In another exemplary method of retrieval using this embodiment, the document or partition most related to the input query is identified and retrieved or fetched as follows:
- extract the OSs (e.g. words) of the input query,
- obtain the rasm_x1→1| vector (e.g. the association strength of words to each other, obtained from the investigation of the crawled repository consisting of one or more webpages/documents) for the input words of the query,
- make a common association strength spectrum or vector for the input words of the query by, for example, averaging the rasm_x1→1| vectors or multiplying them to each other,
- use the common association vector to identify the most related or associated documents, or sentences to the input query by multiplying the common association spectrum with the respective participation matrix (e.g. PM15 for document retrieval and PM12 for question answering or conversation as an example).
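The retrieval steps listed above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the vocabulary, the association values, and the binary word-by-document matrix standing in for PM15 are made-up toy data, and averaging is used as the combination rule (one of the two options named in the steps).

```python
# Hypothetical toy data: 4 words (OSs of order 1) and 3 documents.
words = ["gene", "protein", "market", "stock"]

# association[i][j]: assumed association strength of word i to word j.
association = [
    [1.0, 0.8, 0.1, 0.0],
    [0.8, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.9],
    [0.0, 0.1, 0.9, 1.0],
]

# participation[i][d] = 1 if word i occurs in document d (a stand-in for PM15).
participation = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
]

def common_spectrum(query_words):
    """Average the association vectors of the query words found in the vocabulary."""
    rows = [association[words.index(w)] for w in query_words if w in words]
    if not rows:
        return [0.0] * len(words)
    return [sum(col) / len(rows) for col in zip(*rows)]

def rank_documents(query_words):
    """Score each document as (common spectrum) x (participation matrix), best first."""
    spectrum = common_spectrum(query_words)
    n_docs = len(participation[0])
    scores = [
        sum(spectrum[i] * participation[i][d] for i in range(len(words)))
        for d in range(n_docs)
    ]
    order = sorted(range(n_docs), key=lambda d: scores[d], reverse=True)
    return order, scores

order, scores = rank_documents(["gene", "protein"])
```

With this toy data the first document, which contains both query words, receives the highest score. Replacing the participation matrix with a word-by-sentence matrix (the role the text assigns to PM12) would rank sentences instead of documents.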
Moreover, most of the calculation can be done in advance, even for each target OS (though not as a condition, usually the intrinsically significant OSs can be used as possible targets). Therefore, for each possible target OS there could be assembled a body of knowledge, pre-made, pre-categorized, and ready for retrieval upon receiving a query by a system which has access to these data and materials. The degree of relevancy of such retrieved pages to the target OSs (e.g. the user's queries) is semantically ensured, and the relevancy of such retrieved materials far exceeds that of the currently available search engines.
More importantly, in a similar manner the engine can return, for instance, the document or web-page that is composed of partitions of high novelty values, either intrinsic or relative, to the target OS/s. The engine can therefore also filter out and present the documents or webpages that are most relevant to the desired “significance aspect” based on the user preferences. So if the novelty, credibility, or information density of a document, in the context of a BOK, is important to the user, then these services can readily be implemented in light of the teachings of the present invention.
Referring now to FIG. 10, it shows schematically a system of composition investigation that can provide numerous useful data and information to a client or user as a service. Such outputs or services can in principle be endless once combined in various modes for different applications. However, in FIG. 10 a few of the exemplary, important, and desirable outputs are illustrated. FIG. 10 illustrates a block diagram of a system composed of an investigator and/or analyzer and/or a transformer and/or a service provider that can receive or access a composition and provide a plurality of data or content as output. The investigator in fact implements at least one of the algorithms for calculating one of the measures in order to assign a value to the parts or partitions of the composition and, based on the assigned values, processes one or more of the partitions or OSs of the particular order as an output in the form of a service or data. The output could simply be one or more tags or OS/s with which the input composition can be characterized, i.e. significant keywords of the composition. In this instance, the significant keywords or labels are selected based on their values corresponding to at least one of the aspectual XY_VSM, i.e. one of the value significance measures.
As another example, the output or outcome of the investigator of FIG. 10 could be the partitions of the input composition which have exhibited intrinsic value significances above a predetermined threshold. Another output could be the novel parts or the OSs of the composition that scored a predetermined level of a particular type of novelty value significance. Or the output could be the noisy part of a composition, or a detected spam in a collection of compositions, etc.
Several other outputs or services of the system of FIG. 10 are depicted in FIG. 10 itself and are, in light of the foregoing, self-explanatory.
Referring now to FIG. 11, it shows another instance and application of the present invention in which the processes, methods, algorithms, and formulations are used to investigate a number of news feeds and/or news contents automatically and present the result to a client. In this exemplary but important application system, the news items are first categorized automatically by finding the significant head-categories, then clustering and bunching the news into or under such significant head-categories, and then selecting one or more partitions of each such cluster to represent the content of that clustered news to a reader. Head-categories can simply be identified, by evaluating at least one of the significance measures introduced in the present invention, from those OSs that have exhibited a predetermined level of significance. The predetermined level of significance can be set dynamically depending on the compositions of the input news.
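As a rough illustration of the head-category flow just described, the following Python sketch picks head-categories, bunches news items under them, and selects one representative item per cluster. It rests on assumptions that are not part of the disclosure: plain document frequency is used as a stand-in for a value significance measure, the headlines are invented, and the dynamic threshold rule (`ratio` times the number of items, with a floor of 2) is hypothetical.

```python
from collections import Counter, defaultdict

# Invented example headlines; each one is treated as a news partition.
news = [
    "central bank raises interest rates",
    "bank profits climb as rates rise",
    "new vaccine trial shows strong results",
    "vaccine rollout expands to rural clinics",
    "local team wins championship",
]

def head_categories(items, ratio=0.3):
    """Pick words whose document frequency reaches a threshold scaled to the input size."""
    df = Counter(w for item in items for w in set(item.split()))
    threshold = max(2, ratio * len(items))  # hypothetical dynamic threshold
    return {w: c for w, c in df.items() if c >= threshold}

def cluster_news(items, heads):
    """Bunch each news item under every head-category it contains."""
    clusters = defaultdict(list)
    for item in items:
        for w in set(item.split()):
            if w in heads:
                clusters[w].append(item)
    return dict(clusters)

def representative(cluster, heads):
    """Choose the item that carries the most head-category weight."""
    return max(cluster, key=lambda item: sum(heads.get(w, 0) for w in set(item.split())))

heads = head_categories(news)
clusters = cluster_news(news, heads)
summary = {head: representative(items, heads) for head, items in clusters.items()}
```

A fuller implementation would substitute one of the disclosed significance measures for the document-frequency count and could select sentences rather than whole items as the representative partitions.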
It is important to notice that some of the data in respect of any of these features (e.g. association of OSs) can be obtained from one composition (e.g. a good-sized body of knowledge) in order to be used in the investigation of other compositions. For instance, it is possible to calculate the universal association of concepts by investigating the whole content of Wikipedia (using, for instance, exemplary teachings of the present invention) and use these data/knowledge about the association of concepts in calculating the relatedness of OSs of another composition (e.g. a single or multiple documents, or a piece or a bunch of news, etc.) to each other or to a query.
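The idea of learning associations from one composition and reusing them on another can be sketched as follows. The association measure here is an assumption chosen for brevity (sentence-level co-occurrence count divided by the count of the first word), not the measure defined in this disclosure, and the reference sentences are invented stand-ins for a large body of knowledge.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_association(sentences):
    """Assumed measure: co-occurrence count of (a, b) divided by the count of a."""
    word_count = Counter()
    pair_count = Counter()
    for s in sentences:
        ws = set(s.lower().split())
        word_count.update(ws)
        for a, b in combinations(sorted(ws), 2):
            pair_count[(a, b)] += 1
            pair_count[(b, a)] += 1
    return {pair: n / word_count[pair[0]] for pair, n in pair_count.items()}

# Reference composition (stand-in for a large body of knowledge).
reference = ["dna encodes protein", "protein folds", "dna replicates"]
assoc = cooccurrence_association(reference)

def score_sentence(sentence, query_word):
    """Relate a sentence of another composition to a query using the stored associations."""
    ws = set(sentence.lower().split())
    return sum(assoc.get((query_word, w), 0.0) for w in ws)

# A different composition, scored with associations learned from the reference.
other = ["the protein folds quickly", "markets fell today"]
scores = [score_sentence(s, "dna") for s in other]
```

Note that the first sentence of the second composition scores above zero for the query "dna" even though it never mentions that word, because the association with "protein" was learned from the reference composition.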
Moreover, other complementary representations, such as navigable ontological subject map/s, can accurately be built to accompany the represented news. Various display methods can be used to show the head-categories and their selected representative pieces of news, or parts of the pieces of news, so as to make it easy to navigate and get the most important and valuable news content for the desired category. Moreover, the categorization can be done in more than one step, wherein there could be a predetermined or automatic selection of major categories and then under each major category there could be one or more subcategories, so that the news items are highly relevant to the head category or the sub-categories or topics.
Furthermore, many more forms of services can be performed automatically for this exemplary, but important, application, such as identifying the most novel piece of news or the most novel part of the news related to a head category or, as labeled in this disclosure, to a target OS. Such services can periodically be updated to show the most recent significant and/or novel news content along with its automatic categorization label and/or navigation tools, etc.
Referring now to FIG. 12, it shows one general embodiment of a system implementing the processes, methods, and algorithms of the present invention to provide one or more services or outputs to the clients. This figure further illustrates how a particular output or service can in practice be implemented. The provider of the service or the outputs can basically utilize various measures to select, or to synthesize, the desired sought-after part/s of an input composition. A feature to be noticed in this embodiment is that the system not only might accept an input composition for investigation but also has access to banks of BOKs if the service calls for additional resources related to the input composition, or as a result of the input composition investigation and the mode of the service. Moreover, as shown, the exemplary embodiment of the system of FIG. 12 has a BOK assembler that is able to assemble a BOK from various sources, such as the internet or other repositories, in response to an input request and performs the methods of the present invention to provide an appropriate service or output data or content to one or more clients. The filtration can be done in several parallel or tandem stages, and the output could be provided after any number of the step/s of filtration. The filters F1, F2, . . . Fn can be one of the significance measures or any combination of them so as to capture the sought-after knowledge, information, data, or partitions from the compositions. The output and the choice of the filter can be identified by the client or user as an option besides several default modes of the services of the system.
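The tandem and parallel arrangement of the filters F1, F2, . . . Fn can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the partitions carry pre-computed scores under the hypothetical names `significance` and `novelty`, each filter is a simple threshold on one score, and the parallel mode is taken to mean the union of the individual filter outputs.

```python
# Hypothetical partitions with pre-computed scores; the measure names are illustrative.
partitions = [
    {"text": "sentence A", "significance": 0.9, "novelty": 0.2},
    {"text": "sentence B", "significance": 0.7, "novelty": 0.8},
    {"text": "sentence C", "significance": 0.3, "novelty": 0.9},
]

def threshold_filter(measure, threshold):
    """Return a filter keeping partitions whose score for `measure` meets the threshold."""
    def apply(items):
        return [p for p in items if p[measure] >= threshold]
    return apply

def tandem(items, filters):
    """Apply filters one after another; each stage sees the previous stage's output."""
    for f in filters:
        items = f(items)
    return items

def parallel(items, filters):
    """Apply every filter to the same input and merge the results, keeping input order."""
    kept = [f(items) for f in filters]
    return [p for p in items if any(p in k for k in kept)]

f1 = threshold_filter("significance", 0.5)
f2 = threshold_filter("novelty", 0.5)

tandem_out = [p["text"] for p in tandem(partitions, [f1, f2])]
parallel_out = [p["text"] for p in parallel(partitions, [f1, f2])]
```

In the tandem arrangement only partitions that pass every stage survive, while in the parallel arrangement a partition survives if any single filter accepts it; intermediate results could be emitted after any stage, as the text notes.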
Another block in FIG. 12 worth mentioning is the post-processing block, which has the responsibility of transforming the output of the filter/s into a predetermined format, or transforming the output semantically, or composing a new composition as a presentable response to a client from the output/s of the filters of FIG. 12. Also shown in this exemplary embodiment is a representation mode selection by which, based on the selected service, the output is tailored for that service and the client in terms of, for instance, transmission mode, web-interfacing style, frontend engineering and designs, etc.
Furthermore, the exemplary system embodiment of FIG. 12 shows a network bus that facilitates the data exchange between the various parts of the system such as the BOK bank (e.g. containing file servers) and/or other storages (e.g. storages of Los1, Los2, Los3, etc. and/or list storage/data, wherein Los stands for List of the Ontological Subjects and, for instance, Los1 refers to the list of the OSs of order 1) and/or the processing engine/s and/or application servers and/or the connection to the internet and/or connections to other networks.
The mentioned exemplary application and service can, for instance, be of immense value to content creators, genetic scientists, or editors and referees of scientific journals, or in principle to any publishing/broadcasting shops such as printed or online publishing websites, online journals, online content sharing, and the like.
Such a system can further provide, for instance, a web interface with required facilities for client's interaction/s with the system so as to send and receive the desired data and to select one or more desired services from the system.
For instance, it can be used as a system of interactive and social knowledge discovery as introduced in the U.S. patent application Ser. No. 12/955,496, now U.S. Pat. No. 8,775,365, entitled “Interactive and Social Knowledge discovery Sessions”, which was incorporated entirely as a reference in this application.
Also, as shown in FIG. 13, other optional modules can be made available to the client that use the main composition investigator and/or the BOK assembler or BOK banks. A client can, for example, be a machine, a human, another software agent, an intelligent being, a remote server, or the like. One such optional module can be a module for conversation between the client and the computer or between the client and the system. The conversation is done in such a way that the system of this exemplary embodiment with the “converse module” receives an input from a client, identifies the main subject/s of the input, and provides a related answer with the highest merit selected from its own bank of BOK/s or a particular BOK or an available composition. The response from the system to the client can be tuned so as to always provide related content according to a predetermined particular aspect of the conversation. For example, the client might choose to receive only the content with the highest novelty yet credibility value from the system. In this case the “converse module” and/or the investigator module will find the corresponding piece of content (employing one or more of the “XY value significance measures”) from their repositories and provide it to the user. Alternatively, for instance, the user can demand to receive the most significant yet credible piece of knowledge or content related to her/his/its input. The client/system conversation can hence be continued. Such a conversation method can be useful and instrumental for a variety of reasons/applications such as entertainment, amusement, educational purposes, question answering, knowledge seeking, customer relationship management and help desk, automatic examination, artificial intelligence, and very many other purposes.
The system, for instance, can be used as a system for providing or generating visual and/or multimedia content as introduced in the U.S. patent application Ser. No. 12/908,856 entitled “System And Method Of Content Generation”, filed on Oct. 20, 2010, and/or for using the value significance measures and the maps and indexes to automatically generate content compositions as introduced in the U.S. patent application Ser. No. 12/946,838, filed on Nov. 15, 2010, now U.S. Pat. No. 8,560,599 B2, entitled “Automatic Content Composition Generation”, which were incorporated entirely as references in this application.
In light of the teachings of this disclosure, such exemplified modules and services can readily be implemented by those skilled in the art by, for instance, employing or synthesizing one or more of the value significance measures and the disclosed methods of investigation, filtration, and modification of compositions or bodies of knowledge.
The data/information processing or computing system that is used to implement the method/s, system/s, and teachings of the present invention comprises storage devices with more than 1 (one) Giga Byte of RAM capacity and one or more processing devices or units (i.e. data processing or computing devices, e.g. silicon based microprocessors, quantum computers, etc.) that can operate with clock speeds higher than 1 (one) Giga Hertz or with compound processing speeds equivalent to one thousand million or more instructions per second (e.g. an Intel Pentium 3, Dual core, i3, i7 series, and Xeon series processors or equivalents or similar from other vendors, or equivalent processing power from other processing devices such as quantum computers utilizing quantum computing devices and units). These devices are used to perform and execute the method once they have been programmed by computer readable instructions/codes/languages or signals and instructed by the executable instructions. Additionally, for instance according to another embodiment of the invention, the computing or executing system includes or has processing device/s such as graphical processing units for visual computations that are, for instance, capable of rendering and demonstrating the graphs/maps of the present invention on a display (e.g. LED displays and TVs, projectors, LCDs, touch screen mobile and tablet displays, laser projectors, gesture detecting monitors/displays, 3D holograms, and the like from various vendors, such as Apple, Samsung, Sony, or the like) with good quality (e.g. using NVIDIA graphical processing units).
Also, the methods, teachings, and application programs of the present invention can be implemented by shared resources such as virtualized machines and servers (e.g. VMware virtual machines, Amazon Elastic Beanstalk, Amazon EC2, and storages, e.g. Amazon S3, and the like). Alternatively, specialized processing and storage units (e.g. Application Specific Integrated Circuits (ASICs), field programmable gate arrays (FPGAs), and the like) can be made and used in the computing system to enhance the performance, speed, and security of the computing system performing the methods and applications of the present invention. Moreover, several such computing systems can be run under a cluster, network, cloud, mesh, or grid configuration, connected to each other by communication ports and data transfer apparatuses such as switches, data servers, load balancers, gateways, modems, internet ports, database servers, graphical processing units, storage area networks (SANs), and the like. The data communication network used to implement the system and method of the present invention carries, transmits, receives, or transports data at a rate of 10 million bits per second or more.
Furthermore, the terms “storage device”, “storage”, “memory”, and “computer-readable storage medium/media” refer to all types of non-transitory computer readable media such as magnetic cassettes, flash memory cards, digital video discs, random access memories (RAMs), Bernoulli cartridges, optical memories, read only memories (ROMs), solid state discs, and the like, with the sole exception being a transitory propagating signal.
These applications and systems are presented to exemplify the way that the present invention method of investigation might be employed to perform one or more of the desired processes to get the respective output or the content, answer, data, graphs, analysis, and service/s etc. Several modes of services and further applications are exemplified herebelow.
- The processes and systems of FIGS. 8-15 can be an on premises system, an intelligent being, or a network system of computation and processing, storage medium, displays and interfaces, and the associated software.
- In another instance the systems and processes of FIGS. 8-15 can be a remote system providing the service in the form of a cloud environment for one or more clients, providing one or more of the services mentioned above.
- In yet another instance the system can be a combination of on premises private cloud/machine computation facilities connected to a public cloud service provider. These familiar modes of operation, characterized as public and/or private and/or hybrid cloud computing environments (either distributed or central, on premises or remote, private or public or hybrid), are known to those skilled in the art, and the disclosed methods of investigation of compositions of ontological subjects can be performed in a variety of topologies, each of which is regarded as a service provider system employing one or more of the methods of generating output data respective of one or more of the disclosed methods of the investigation of a composition of ontological subjects.
- An interesting mode of service is when, for an input composition and after investigation, the system further provides related compositions or bodies of knowledge to be looked at or investigated further in relation to one or more aspects of the input composition investigation. Another service mode is that the system provides various investigation diagnostic services for the input composition from the user.
- Another mode of use is when an intelligent being connects to or communicates with the system of composition investigation (i.e. the brain) by way of communication networks to provide desired services (e.g. conversing, telling stories, talking, instructing, providing consultancy, generating various content, manufacturing, etc.). In another instance the currently disclosed method/s and system/s are implemented within the intelligent being or used to realize new intelligent beings.
- Furthermore, the method and the associated system can be used as a platform so that the user can use the core algorithms of the composition investigation to build other applications that need or use the service of such investigation. For instance, a client might want to have her/his website investigated to find out the important aspects of the feedback given by its own users, visitors, or clients.
- In another application one can use the service to improve or create content after a thorough investigation of the literature.
- In another instance the methods and systems of the present invention can be employed to provide human/computer conversation and/or computer/computer conversation such as chat-bots, automatic customer care, question answering, fortunetelling, consulting, or in general any type or kind of conversation.
- In another mode a user might want to use the service of such a system and platform to compare and investigate her/his created content to find out the most closely related content available in one or more content repositories (e.g. a private, public, or subscribed library or knowledge database, etc.), or to find out the score of her/his creation in comparison to other similar or related content, or to find out the valuable parts of her/his creation, or to find a novel part, etc.
As seen, numerous instances of use, products, beings, and applications of such processes and methods of investigation can be envisioned, implemented, and utilized by those skilled in the art without departing from the scope and spirit of the present invention.
The disclosed framework, along with the algorithms and methods, enables people in various disciplines, such as artificial intelligence, robotics, information retrieval, search engines, knowledge discovery, genomics and computational genomics, signal and image processing, information and data processing, encryption and compression, business intelligence, decision support systems, financial analysis, market analysis, public relation analysis, and generally any field of science and technology, to use the disclosed method/s of the investigation of compositions of ontological subjects and bodies of knowledge to arrive at the desired form of information and knowledge with ease, efficiency, and accuracy. Since the disclosed underlying theory, methods, and applications are universal, it is worthwhile to implement the system executing the methods and products directly on processing chips/devices to further increase the speed and reduce the cost of such investigations of compositions. In one instance, for example, the data processing operations (e.g. vector/matrix manipulations, manipulating data structures, association spectrum calculations and manipulation, etc.), and even the storage of the data structures, are implemented with designs of Application Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs), or systems-on-chip, based on any computing and data processing device manufacturing platforms and technologies, such as silicon based, III-V semiconductors, and quantum computing artifacts, to name a few. Similarly, if the disclosed methods of investigation and applications are to be used in or with implementing neural or cognitive based types of computation, one can still implement the system on such chips and by said technologies.
Accordingly, those competent in the art can implement the disclosed methods for various applications/products in/with various data processing device manufacturing and designs on the physical material level.
The invention provides a unified and integrated method and systems for investigation of compositions of ontological subjects. The method can be implemented in a language independent and grammar free manner. The method is not based on the semantic and syntactic roles of symbols, words, or in general the syntactic role of the ontological subjects of the composition. This makes the method very process efficient, applicable to all types of compositions and languages, and very effective in finding valuable pieces of knowledge embodied in the compositions. Several valuable applications and services were also exemplified to demonstrate the possible implementations and the possible applications and services. These exemplified applications and services were given for illustration and exemplification only and should not be construed as limiting. The invention has broad implications and applications in many disciplines that were not mentioned or exemplified herein, but in light of the present invention's concepts, algorithms, methods, and teachings, such applications and their corresponding systems become apparent to those familiar with the art.
Among the many implications and applications, the system and method have numerous applications in knowledge discovery, knowledge visualization, content creation, signal, image, and video processing, genomics and computational genomics and gene discovery, finding the best piece of knowledge related to a request for knowledge from one or more compositions, artificial intelligence, realization of artificial or new intelligent beings, computer vision, computer or man/machine conversation, approximate reasoning, as well as many other fields of science and generally ontological subject processing. The invention can serve knowledge seekers, knowledge creators, inventors, and discoverers, as well as the general public, to investigate and obtain highly valuable knowledge and content related to their subjects of interest. The method and system are thereby instrumental in increasing the speed and efficiency of knowledge retrieval, discovery, creation, learning, and problem solving, and in accelerating the rate of knowledge discovery, to name a few.
It is understood that the preferred or exemplary embodiments, the applications, and examples described herein are given to illustrate the principles of the invention and should not be construed as limiting its scope. Those familiar with the art can yet envision, alter, and use the methods and systems of this invention in various situations and for many other applications. Various modifications to the specific embodiments could be introduced by those skilled in the art without departing from the scope and spirit of the invention as set forth in the following claims.
Claims (20)
1. A system and service of investigating a composition of ontological subjects comprising:
a computer system for generating and providing one or more data structures corresponding to relational association strengths of ontological subjects of the composition comprising:
one or more data processing or computing devices,
a hardware data storage, comprising one or more non-transitory storage devices and storing data of relational association strengths of ontological subjects of a composition by having executed a method comprising:
building one or more data structures corresponding to association strengths between a plurality of ontological subjects of a first predefined order of the composition, according to at least one association strength measure,
building one or more data structures corresponding to participations of ontological subjects of the first predefined order in ontological subjects of a second predefined order,
generating, using one or more data processing or computing devices, one or more data structure corresponding to a relational association strength value between an ontological subject of the second predefined order with an ontological subject of the first predefined order as a function of association strengths of the ontological subjects of the first predefined order and the data of said one or more data structures corresponding to the participation of the ontological subjects of the first predefined order into the ontological subjects of the second predefined order,
storing one or more of said one or more data structures of the data corresponding to the relational association strengths between a plurality of ontological subjects of the composition into said hardware data storage,
a client system comprising one or more data processing or storage devices generating network access requests through a data communication network;
receiving network access requests from said client system by the computer system, and processing the request from said client system and extracting a desired number of a predefined order ontological subjects of said request and providing a new composition to the client using said hardware data storage of relational association strength of the ontological subjects of the composition and at least one of the extracted ontological subject of the request.
2. The method of claim 1 , wherein said one or more data processing or computing devices having a singular or compound processing speed of one thousand million or larger than one thousand million instructions per second.
3. The method of claim 1 , wherein the association strength of the ontological subjects of the first predefined order is a function of value significance and co-occurrences, within predefined proximities in the composition, of the ontological subjects of the first predefined order.
4. The method of claim 1 , wherein the composition of ontological subject comprises one or more of:
a. one or more news content,
b. a picture,
c. a video,
d. a DNA string code,
e. a graph and/or a data array representative a graph,
f. a computer readable code,
g. a multimedia content,
h. a textual content,
i. a binary string,
j. a digital signal,
k. an electrical signal,
l. a data file,
m. one or more web pages,
n. a given corpus, and
o. a body of knowledge.
5. The method of claim 1 , wherein data of the relational association strength of ontological subjects is used to obtain an association strength value between two or more of ontological subjects of the second predefined order.
6. The method of claim 1 , wherein the data of relational association strength values of ontological subjects is represented as association strength spectrum vectors and is used to evaluate significances of ontological subjects of the first predefined order or the second predefined order, or relatedness of ontological subjects of a predefined order with each other.
7. The method of claim 1 , further comprising using the data of association strength of the ontological subjects of the first predefined order or the relational association strength between the ontological subjects of the second predefined order with the ontological subjects of the first predefined order to do at least one of followings:
a. identifying or finding related ontological subjects;
b. identifying or finding most important subjects related to another ontological subject,
c. identifying or finding indirect or novel relation of two or more ontological subjects,
d. composing a question,
e. calling a function,
f. anomaly detection,
g. composing a relevant composition to one or more ontological subjects, and
h. scoring partitions of the composition.
8. The method of claim 1 , wherein the data of the association strength of the ontological subjects of the first predefined order or the relational association strength of the ontological subjects of the second predefined order with the ontological subjects of the first predefined order is used to build a neural network with weight between nodes based on the association strength of ontological subjects.
9. The method of claim 6 , wherein the data of association strength spectrum vectors or the relational association strength value is used to visually or graphically show the relations of ontological subjects of the composition with a visual shape.
10. The method of claim 1, further comprising data communication devices/units to send and receive one or more said data structures over a data network.
11. The method of claim 1 , further comprising one or more data structures corresponding to one or more significance value conveyer vectors.
12. A system and service of investigating a composition of ontological subjects comprising:
a computer system for generating and providing one or more data structures corresponding to relational value significances of ontological subjects of the composition comprising:
one or more data processing or computing devices,
a hardware data storage, comprising one or more non-transitory storage devices and storing data corresponding to generation of relational value significances of ontological subjects of a composition by having executed a method comprising:
using one or more data processing or computing devices to build a first one or more data structure corresponding to a participation of ontological subjects of a first predefined order in ontological subjects of a second predefined order,
building, using one or more data processing or computing devices, a second one or more data structures corresponding to co-occurrences of a plurality of ontological subjects of the composition within predefined proximities in the composition,
building, using the second one or more data structure, a third one or more data structure corresponding to association strengths between a plurality of ontological subjects of the first predefined order of the composition, according to at least one association strength measure,
calculating, using one or more data processing or computing devices, a value significance of an ontological subject of the second predefined order in relation to an ontological subject of the first predefined order as a function of data of the third one or more data structure and the data of the first one or more data structures, and
storing one or more of said one or more data structures corresponding to the first, the second, and the third one or more data structures, or the data corresponding to said relational value significance, into said hardware data storage,
a client system comprising one or more data processing or storage devices generating network access requests through a data communication network,
receiving network access requests from said client system by the computer system, and
processing the request from said client system and extracting a desired number of a predefined order ontological subjects of said request and providing a new composition to the client using the data stored in said hardware data storage and at least one of the extracted ontological subjects of the request.
13. The method of claim 12, wherein said one or more data processing or computing devices have a singular or compound processing speed of one thousand million or larger than one thousand million instructions per second.
14. The method of claim 12, wherein the data of the association strength of the ontological subjects of the first predefined order or the relational value significances is used to construct a network of related ontological subjects of the composition using a function of association strengths of the ontological subjects of the composition.
15. A system and service of investigating a composition of ontological subjects comprising:
a computer system for generating and providing one or more data structures corresponding to relational novel value significances of ontological subjects of the composition comprising:
one or more data processing or computing devices,
a hardware data storage, comprising one or more non-transitory storage devices and storing data of relational novel value significances of ontological subjects of a composition by having executed a method comprising:
building, using one or more data processing or computing devices, a first one or more data structures corresponding to co-occurrences of a plurality of ontological subjects of the composition within predefined proximities in the composition,
building one or more data structure using data of the first one or more data structures, corresponding to value significances of a plurality of ontological subjects of the composition according to at least one value significance measure,
calculating, using one or more data processing or computing devices, a relational novel value significance measure between a pair of ontological subjects as a function of the value significance of one or more of the ontological subjects and inverse of their co-occurrence number of the pair of ontological subjects, and
building a database of relational novel value significances by storing said one or more data structures corresponding to said relational novel value significances of a plurality of pairs of ontological subjects of the composition into said hardware data storage,
a client system comprising one or more data processing or storage devices generating network access requests through a data communication network,
receiving network access requests from said client system by the computer system, and
processing the request from said client system and extracting a desired number of a predefined order ontological subjects of said request and providing a new composition to the client using said database of relational novel value significances of the ontological subjects of the composition and at least one of the extracted ontological subjects of the request.
16. The method of claim 15, wherein the composition of ontological subjects contains one or more of:
a. one or more news content,
b. a picture,
c. a video,
d. a DNA string code,
e. a graph and/or a data array representative of a graph,
f. a computer readable code,
g. a multimedia content,
h. a textual content,
i. a binary string,
j. a digital signal,
k. an electrical signal,
l. a data file,
m. one or more web pages,
n. one or more collection of one or more patent application disclosures,
o. a given corpus, and
p. a body of knowledge.
17. The method of claim 15, wherein the data of relational novel value significances is used to suggest a research topic.
18. The method of claim 15, further comprising using the data of the relational novel value significances of the ontological subjects of the composition to do at least one of the following:
a. identifying or finding related ontological subjects;
b. identifying or finding most important subjects related to another ontological subject,
c. identifying or finding indirect or novel relation of two or more ontological subjects,
d. identifying or finding indirect or novel relation of two or more partitions of the composition,
e. composing a question,
f. calling a function,
g. anomaly detection,
h. composing a relevant composition to one or more ontological subjects, and
i. scoring partitions of the composition.
19. The method of claim 15, wherein the data of relational novel value significances or the value significances of the ontological subjects is used to build a neural network with weight between nodes as a function of said data.
20. The method of claim 15, wherein the data of relational novel value significances or the value significances of the ontological subjects of the composition is used to visually or graphically show the relations of ontological subjects of the composition with a visual shape or content.
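The data structures recited in claims 12 and 15 can be illustrated with a small sketch. It is not the patented implementation: it assumes that first-order ontological subjects are words, that second-order subjects (partitions) are sentences, and that a "predefined proximity" means appearing in the same sentence. The function names and the three concrete measures (association strength as a co-occurrence share, value significance as a total co-occurrence count, and novelty as a product of value significances divided by one plus the co-occurrence count) are illustrative assumptions; the claims only require "at least one" such measure.

```python
from collections import Counter
from itertools import combinations


def tokenize(partitions):
    """Split each partition (assumed: a sentence) into its set of lower-cased words."""
    return [set(p.lower().split()) for p in partitions]


def participation(parts):
    """First structure: word -> indices of the partitions it participates in."""
    pm = {}
    for k, words in enumerate(parts):
        for w in words:
            pm.setdefault(w, set()).add(k)
    return pm


def cooccurrences(pm):
    """Second structure: unordered word pair -> number of partitions containing both."""
    com = Counter()
    for a, b in combinations(sorted(pm), 2):
        shared = len(pm[a] & pm[b])
        if shared:
            com[(a, b)] = shared
    return com


def pair_count(com, a, b):
    """Look up a co-occurrence count regardless of argument order (0 if absent)."""
    return com[tuple(sorted((a, b)))]


def association_strength(pm, com, a, b):
    """Assumed measure: share of a's partitions that also contain b."""
    return pair_count(com, a, b) / len(pm[a])


def relational_value(parts, pm, com, k, a):
    """Claim-12-style score of partition k relative to word a:
    sum of association strengths from a to the other words of the partition."""
    return sum(association_strength(pm, com, a, w) for w in parts[k] if w != a)


def value_significance(pm, com, a):
    """Assumed measure: total co-occurrence count of a with all other words."""
    return sum(pair_count(com, a, w) for w in pm if w != a)


def relational_novelty(pm, com, a, b):
    """Claim-15-style score: grows with the value significances of a and b and
    shrinks with their co-occurrence count (the +1 is an assumed smoothing term)."""
    product = value_significance(pm, com, a) * value_significance(pm, com, b)
    return product / (1 + pair_count(com, a, b))


if __name__ == "__main__":
    parts = tokenize(["cats chase mice", "cats eat fish", "dogs chase cats"])
    pm = participation(parts)
    com = cooccurrences(pm)
    print(relational_value(parts, pm, com, 0, "chase"))
    print(relational_novelty(pm, com, "dogs", "mice"))
```

Under these assumptions, a pair of words that never share a sentence keeps the full product of its value significances, while a frequently co-occurring pair is discounted, which matches the "inverse of their co-occurrence number" language of claim 15.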
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/694,887 US9684678B2 (en) | 2007-07-26 | 2015-04-23 | Methods and system for investigation of compositions of ontological subjects |
US15/597,080 US10795949B2 (en) | 2007-07-26 | 2017-05-16 | Methods and systems for investigation of compositions of ontological subjects and intelligent systems therefrom |
Applications Claiming Priority (25)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA2595541 | 2007-07-26 | ||
CA002595541A CA2595541A1 (en) | 2007-07-26 | 2007-07-26 | Assisted knowledge discovery and publication system and method |
US12/179,363 US20090030897A1 (en) | 2007-07-26 | 2008-07-24 | Assissted Knowledge Discovery and Publication System and Method |
US9395208P | 2008-09-03 | 2008-09-03 | |
US17769609P | 2009-05-13 | 2009-05-13 | |
US12/547,879 US8452725B2 (en) | 2008-09-03 | 2009-08-26 | System and method of ontological subject mapping for knowledge processing applications |
US25351109P | 2009-10-21 | 2009-10-21 | |
US25964009P | 2009-11-10 | 2009-11-10 | |
US26368509P | 2009-11-23 | 2009-11-23 | |
US31136810P | 2010-03-07 | 2010-03-07 | |
US12/755,415 US8612445B2 (en) | 2009-05-13 | 2010-04-07 | System and method for a unified semantic ranking of compositions of ontological subjects and the applications thereof |
US12/908,856 US20110093343A1 (en) | 2009-10-21 | 2010-10-20 | System and Method of Content Generation |
US12/939,112 US8401980B2 (en) | 2009-11-10 | 2010-11-03 | Methods for determining context of compositions of ontological subjects and the applications thereof using value significance measures (VSMS), co-occurrences, and frequency of occurrences of the ontological subjects |
US12/946,838 US8560599B2 (en) | 2009-11-23 | 2010-11-15 | Automatic content composition generation |
US12/955,496 US8775365B2 (en) | 2010-03-07 | 2010-11-29 | Interactive and social knowledge discovery sessions |
US201161546054P | 2011-10-11 | 2011-10-11 | |
US13/608,333 US9070087B2 (en) | 2011-10-11 | 2012-09-10 | Methods and systems for investigation of compositions of ontological subjects |
US13/740,228 US9183505B2 (en) | 2008-09-03 | 2013-01-13 | System and method for value significance evaluation of ontological subjects using association strength |
US13/789,644 US9069828B2 (en) | 2008-09-03 | 2013-03-07 | System and method of ontological subject mapping for knowledge processing applications |
US13/962,895 US8793253B2 (en) | 2009-05-13 | 2013-08-08 | Unified semantic ranking of compositions of ontological subjects |
US14/018,102 US20140006317A1 (en) | 2009-11-23 | 2013-09-04 | Automatic content composition generation |
US14/151,022 US9613138B2 (en) | 2008-09-03 | 2014-01-09 | Unified semantic scoring of compositions of ontological subjects |
US14/274,731 US20140258211A1 (en) | 2010-03-07 | 2014-05-11 | Interactive and Social Knowledge Discovery Sessions |
US14/607,588 US9842129B2 (en) | 2008-07-24 | 2015-01-28 | Association strengths and value significances of ontological subjects of networks and compositions |
US14/694,887 US9684678B2 (en) | 2007-07-26 | 2015-04-23 | Methods and system for investigation of compositions of ontological subjects |
Related Parent Applications (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/179,363 Continuation US20090030897A1 (en) | 2007-07-26 | 2008-07-24 | Assissted Knowledge Discovery and Publication System and Method |
US12/908,856 Continuation US20110093343A1 (en) | 2007-07-26 | 2010-10-20 | System and Method of Content Generation |
US13/608,333 Continuation-In-Part US9070087B2 (en) | 2007-07-26 | 2012-09-10 | Methods and systems for investigation of compositions of ontological subjects |
US13/789,644 Continuation US9069828B2 (en) | 2007-07-26 | 2013-03-07 | System and method of ontological subject mapping for knowledge processing applications |
US14/018,102 Continuation US20140006317A1 (en) | 2007-07-26 | 2013-09-04 | Automatic content composition generation |
US14/151,022 Continuation US9613138B2 (en) | 2007-07-26 | 2014-01-09 | Unified semantic scoring of compositions of ontological subjects |
US14/607,588 Continuation US9842129B2 (en) | 2007-07-26 | 2015-01-28 | Association strengths and value significances of ontological subjects of networks and compositions |
US14/694,887 Continuation US9684678B2 (en) | 2007-07-26 | 2015-04-23 | Methods and system for investigation of compositions of ontological subjects |
Related Child Applications (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/179,363 Continuation-In-Part US20090030897A1 (en) | 2007-07-26 | 2008-07-24 | Assissted Knowledge Discovery and Publication System and Method |
US13/740,228 Division US9183505B2 (en) | 2007-07-26 | 2013-01-13 | System and method for value significance evaluation of ontological subjects using association strength |
US13/789,644 Continuation US9069828B2 (en) | 2007-07-26 | 2013-03-07 | System and method of ontological subject mapping for knowledge processing applications |
US13/962,895 Division US8793253B2 (en) | 2007-07-26 | 2013-08-08 | Unified semantic ranking of compositions of ontological subjects |
US14/018,102 Division US20140006317A1 (en) | 2007-07-26 | 2013-09-04 | Automatic content composition generation |
US14/274,731 Continuation US20140258211A1 (en) | 2007-07-26 | 2014-05-11 | Interactive and Social Knowledge Discovery Sessions |
US14/694,887 Continuation US9684678B2 (en) | 2007-07-26 | 2015-04-23 | Methods and system for investigation of compositions of ontological subjects |
US15/597,080 Continuation-In-Part US10795949B2 (en) | 2007-07-26 | 2017-05-16 | Methods and systems for investigation of compositions of ontological subjects and intelligent systems therefrom |
Publications (2)
Publication Number | Publication Date |
---|---|
US20150227559A1 US20150227559A1 (en) | 2015-08-13 |
US9684678B2 US9684678B2 (en) | 2017-06-20 |
Family
ID=53775079
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/694,887 Active US9684678B2 (en) | 2007-07-26 | 2015-04-23 | Methods and system for investigation of compositions of ontological subjects |
Country Status (1)
Country | Link |
---|---|
US (1) | US9684678B2 (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11227219B2 (en) | 2018-05-16 | 2022-01-18 | Catalog Technologies, Inc. | Compositions and methods for nucleic acid-based data storage |
US11286479B2 (en) | 2018-03-16 | 2022-03-29 | Catalog Technologies, Inc. | Chemical methods for nucleic acid-based data storage |
US11306353B2 (en) | 2020-05-11 | 2022-04-19 | Catalog Technologies, Inc. | Programs and functions in DNA-based data storage |
US11341962B2 (en) | 2010-05-13 | 2022-05-24 | Poltorak Technologies Llc | Electronic personal interactive device |
US11379729B2 (en) | 2016-11-16 | 2022-07-05 | Catalog Technologies, Inc. | Nucleic acid-based data storage |
US11535842B2 (en) | 2019-10-11 | 2022-12-27 | Catalog Technologies, Inc. | Nucleic acid security and authentication |
US11610651B2 (en) | 2019-05-09 | 2023-03-21 | Catalog Technologies, Inc. | Data structures and operations for searching, computing, and indexing in DNA-based data storage |
US20230202513A1 (en) * | 2018-09-10 | 2023-06-29 | Drisk, Inc. | Systems and Methods for Graph-Based AI Training |
US11763169B2 (en) | 2016-11-16 | 2023-09-19 | Catalog Technologies, Inc. | Systems for nucleic acid-based data storage |
Families Citing this family (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9323832B2 (en) * | 2008-06-18 | 2016-04-26 | Ebay Inc. | Determining desirability value using sale format of item listing |
US9754020B1 (en) * | 2014-03-06 | 2017-09-05 | National Security Agency | Method and device for measuring word pair relevancy |
US10242090B1 (en) | 2014-03-06 | 2019-03-26 | The United States Of America As Represented By The Director, National Security Agency | Method and device for measuring relevancy of a document to a keyword(s) |
US11797641B2 (en) | 2015-02-03 | 2023-10-24 | 1Qb Information Technologies Inc. | Method and system for solving the lagrangian dual of a constrained binary quadratic programming problem using a quantum annealer |
CA2881033C (en) | 2015-02-03 | 2016-03-15 | 1Qb Information Technologies Inc. | Method and system for solving lagrangian dual of a constrained binary quadratic programming problem |
US10268773B2 (en) | 2015-06-30 | 2019-04-23 | International Business Machines Corporation | Navigating a website using visual analytics and a dynamic data source |
EP4036708A1 (en) | 2016-03-11 | 2022-08-03 | 1QB Information Technologies Inc. | Methods and systems for quantum computing |
US10855561B2 (en) * | 2016-04-14 | 2020-12-01 | Oracle International Corporation | Predictive service request system and methods |
US9537953B1 (en) * | 2016-06-13 | 2017-01-03 | 1Qb Information Technologies Inc. | Methods and systems for quantum ready computations on the cloud |
US10044638B2 (en) | 2016-05-26 | 2018-08-07 | 1Qb Information Technologies Inc. | Methods and systems for quantum computing |
US9870273B2 (en) | 2016-06-13 | 2018-01-16 | 1Qb Information Technologies Inc. | Methods and systems for quantum ready and quantum enabled computations |
CN106484839B (en) * | 2016-10-08 | 2018-07-06 | 大连理工大学 | A kind of journal impact appraisal procedure based on academic big data |
US11948116B2 (en) | 2017-09-22 | 2024-04-02 | 1Nteger, Llc | Systems and methods for risk data navigation |
US10997541B2 (en) * | 2017-09-22 | 2021-05-04 | 1Nteger, Llc | Systems and methods for investigating and evaluating financial crime and sanctions-related risks |
US10127511B1 (en) * | 2017-09-22 | 2018-11-13 | 1Nteger, Llc | Systems and methods for investigating and evaluating financial crime and sanctions-related risks |
US10373091B2 (en) * | 2017-09-22 | 2019-08-06 | 1Nteger, Llc | Systems and methods for investigating and evaluating financial crime and sanctions-related risks |
CN107818164A (en) * | 2017-11-02 | 2018-03-20 | 东北师范大学 | A kind of intelligent answer method and its system |
WO2020255076A1 (en) | 2019-06-19 | 2020-12-24 | 1Qb Information Technologies Inc. | Method and system for mapping a dataset from a hilbert space of a given dimension to a hilbert space of a different dimension |
US11055494B2 (en) * | 2019-08-09 | 2021-07-06 | Microsoft Technology Licensing, Llc. | Matrix based bot implementation |
EP4070205A4 (en) | 2019-12-03 | 2024-05-01 | 1QB Information Technologies Inc. | System and method for enabling an access to a physics-inspired computer and to a physics-inspired computer simulator |
CN111177591B (en) * | 2019-12-10 | 2023-09-29 | 深圳市数康云信息技术有限公司 | Knowledge graph-based Web data optimization method for visual requirements |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080115082A1 (en) * | 2006-11-13 | 2008-05-15 | Simmons Hillery D | Knowledge discovery system |
US20090012842A1 (en) * | 2007-04-25 | 2009-01-08 | Counsyl, Inc., A Delaware Corporation | Methods and Systems of Automatic Ontology Population |
US7512576B1 (en) * | 2008-01-16 | 2009-03-31 | International Business Machines Corporation | Automatically generated ontology by combining structured and/or semi-structured knowledge sources |
US8234285B1 (en) * | 2009-07-10 | 2012-07-31 | Google Inc. | Context-dependent similarity measurements |
2015-04-23: US application US14/694,887, patent US9684678B2 (en), status Active
Non-Patent Citations (3)
Title |
---|
"An Adaptation of the Vector-Space Model for Ontology-Based Information Retrieval"; Castells et al IEEE Transactions on Knowledge and Data Engineering, vol. 19, No. 2, Feb. 2007. * |
"Evaluating the Novelty of Text Mined Rules Using Lexical Knowledge"; Basu et al Proceedings of the Seventh International Conference on Knowledge Discovery and Data Mining KDD 2001 San Francisco, California, USA Copyright 2001. * |
"From Frequency to Meaning: Vector Space Models of Semantics"; Turney et al Journal of Artificial Intelligence Research 37 (2010) 141-188 Submitted Oct. 2009; published Feb. 2010 (c)2010 AI Access Foundation and National Research Council Canada. * |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11341962B2 (en) | 2010-05-13 | 2022-05-24 | Poltorak Technologies Llc | Electronic personal interactive device |
US11367435B2 (en) | 2010-05-13 | 2022-06-21 | Poltorak Technologies Llc | Electronic personal interactive device |
US11763169B2 (en) | 2016-11-16 | 2023-09-19 | Catalog Technologies, Inc. | Systems for nucleic acid-based data storage |
US11379729B2 (en) | 2016-11-16 | 2022-07-05 | Catalog Technologies, Inc. | Nucleic acid-based data storage |
US12001962B2 (en) | 2016-11-16 | 2024-06-04 | Catalog Technologies, Inc. | Systems for nucleic acid-based data storage |
US11286479B2 (en) | 2018-03-16 | 2022-03-29 | Catalog Technologies, Inc. | Chemical methods for nucleic acid-based data storage |
US12006497B2 (en) | 2018-03-16 | 2024-06-11 | Catalog Technologies, Inc. | Chemical methods for nucleic acid-based data storage |
US11227219B2 (en) | 2018-05-16 | 2022-01-18 | Catalog Technologies, Inc. | Compositions and methods for nucleic acid-based data storage |
US20230202513A1 (en) * | 2018-09-10 | 2023-06-29 | Drisk, Inc. | Systems and Methods for Graph-Based AI Training |
US12043280B2 (en) * | 2018-09-10 | 2024-07-23 | Drisk, Inc. | Systems and methods for graph-based AI training |
US11610651B2 (en) | 2019-05-09 | 2023-03-21 | Catalog Technologies, Inc. | Data structures and operations for searching, computing, and indexing in DNA-based data storage |
US12002547B2 (en) | 2019-05-09 | 2024-06-04 | Catalog Technologies, Inc. | Data structures and operations for searching, computing, and indexing in DNA-based data storage |
US11535842B2 (en) | 2019-10-11 | 2022-12-27 | Catalog Technologies, Inc. | Nucleic acid security and authentication |
US11306353B2 (en) | 2020-05-11 | 2022-04-19 | Catalog Technologies, Inc. | Programs and functions in DNA-based data storage |
Also Published As
Publication number | Publication date |
---|---|
US20150227559A1 (en) | 2015-08-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9684678B2 (en) | Methods and system for investigation of compositions of ontological subjects | |
US10795949B2 (en) | Methods and systems for investigation of compositions of ontological subjects and intelligent systems therefrom | |
US10846274B2 (en) | Ontological subjects of a universe and knowledge representations thereof | |
US10885073B2 (en) | Association strengths and value significances of ontological subjects of networks and compositions | |
Kaliyar et al. | DeepFakE: improving fake news detection using tensor decomposition-based deep neural network | |
Pereira et al. | A comparative evaluation of off-the-shelf distributed semantic representations for modelling behavioural data | |
US9842129B2 (en) | Association strengths and value significances of ontological subjects of networks and compositions | |
US10795919B2 (en) | Assisted knowledge discovery and publication system and method | |
US9070087B2 (en) | Methods and systems for investigation of compositions of ontological subjects | |
Lei et al. | Tag recommendation by text classification with attention-based capsule network | |
Guo et al. | Tapping on the potential of q&a community by recommending answer providers | |
US20210073191A1 (en) | Knowledgeable Machines And Applications | |
CA3004097A1 (en) | Methods and systems for investigation of compositions of ontological subjects and intelligent systems therefrom | |
US20220245109A1 (en) | Methods and systems for state navigation | |
Mabrouk et al. | Exploiting ontology information in fuzzy SVM social media profile classification | |
Flores et al. | User profiling and satisfaction inference in public information access services | |
Alahmary et al. | A semiautomatic annotation approach for sentiment analysis | |
Gupta et al. | Fuzzy logic-based approach to develop hybrid similarity measure for efficient information retrieval | |
Aydoğan et al. | TRSAv1: a new benchmark dataset for classifying user reviews on Turkish e-commerce websites | |
Kaur et al. | Semantic-based integrated plagiarism detection approach for english documents | |
Azzam et al. | A question routing technique using deep neural network for communities of question answering | |
Adebanji et al. | Sequential models for sentiment analysis: A comparative study | |
Dhar et al. | Hybrid approach for text categorization: A case study with Bangla news article | |
Bhatnagar et al. | Improving pseudo relevance feedback based query expansion using genetic fuzzy approach and semantic similarity notion | |
WO2021012040A1 (en) | Methods and systems for state navigation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| CC | Certificate of correction | |
| MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2551); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY; Year of fee payment: 4 |
| MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YR, SMALL ENTITY (ORIGINAL EVENT CODE: M2552); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY; Year of fee payment: 8 |