US20140006317A1 - Automatic content composition generation - Google Patents
- Publication number
- US20140006317A1 (application US 14/018,102)
- Authority
- US
- United States
- Prior art keywords
- knowledge
- ontological subjects
- partitions
- composition
- ontological
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/954—Navigation, e.g. using categorised browsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
Definitions
- This invention generally relates to content generation, knowledge and information processing, ontological subject processing, and web content service providers.
- This application addresses the problem of generating authoritative or novel compositions of a desired length that adequately represent a body of knowledge, or any important aspect of it. Such a composition should have significant substance, knowledge significance, and credibility, together with contextual coherence, usefulness, and sensibility for a knowledge-seeking user.
- The constituent components of the generated content composition are selected from the parts or partitions of one or more compositions, called here "the assembled body of knowledge" or simply the "body of knowledge".
- A collection of web pages is considered a body of knowledge, from which we desire to compose a new composition for use by a consumer.
- A plurality of web pages is obtained by querying a database, e.g. a search engine database, and one desires a new composition built from or about the content of these web pages. The user can then be sure of having in hand the most appropriate and complete content, overall or in regard to a particular aspect, carrying almost the same information about a particular subject as the whole collection of web pages returned by the search engine.
- The generated content can be a long authoritative article with related multimedia content embedded in it, or be as short as a single-sentence statement.
- The body of knowledge can be any content, from a single-paragraph article to longer compositions such as books, or any set of such compositions.
- The body of knowledge or set of compositions can include any form of content, such as audio, video or multimedia, DNA codes, etc.
- The present method of composing new content uses the methods and definitions introduced in patent application Ser. No. 12/939,112 to first evaluate the "Association Strength Matrix (ASM)" and the "Value Significance Measures (VSMs)" of the ontological subjects, parts and partitions of the assembled body of knowledge. Having evaluated the VSMs of the ontological subjects and/or the partitions, and the association strengths of the ontological subjects, this disclosure describes methods and algorithms for composing new content in a systematic manner. The resulting content conserves the most important knowledge and relations of the original body of knowledge while following a coherent and logical path, i.e. the composing plan, route or map.
- The method transforms the information about the usage, and the pattern of usage, of the ontological subjects of an input body of knowledge into matrices, and into graphs or networks in accordance with the proposed defined matrices.
- Automatic composition generation is regarded in general as composing ontological subjects of any order and any nature (e.g., text, audio, video, genetic code, electrical signal, etc.).
- The composition can specifically be composed of parts or partitions of other compositions, such as sentences, paragraphs or web pages obtained from larger compositions (i.e. higher order Ontological Subjects as defined in patent application Ser. Nos. 12/755,415 and 12/939,112).
- A composition can be composed of different parts of larger compositions or higher order ontological subjects with the same or different forms (e.g. text, video, audio, etc.) or any combination of them.
- The composition can be composed of ontological subjects or parts of larger compositions of a specific form, e.g.
- A method of selecting the constituent components of the composition, along with the principal route or composing plan for composing the composition out of ontological subjects, is disclosed. It starts with access to a collection of Ontological Subjects of different orders and different natures that are extracted from a body of knowledge. Then, by employing one or more of the preferred algorithms, a principal route for semantically composing the composition is determined. According to the route, and based on the merit or value significance measures of the partitions, i.e. ontological subjects of lower and higher orders, the most appropriate and meritorious partitions are selected to represent the intended semantic aspect along said principal route of the composition. The route may be selected dynamically as the new content composition is being formed.
- The method first follows the method of patent application Ser. No. 12/939,112 to identify the most valuable partitions of the body of knowledge by evaluating the value significance of the ontological subjects and/or the partitions, as described in that application.
- The method may further construct a principal map of knowledge for that body of knowledge by evaluating the association strengths of the OSs of the given composition (e.g. a body of knowledge) and select a principal route or composing plan from which a new composition is built. The principal route is identified according to the predetermined requirements, style, aspect, application, etc.
- A new composition is then constructed by selecting the most valued partitions of the body of knowledge that contain one or more of the associated OSs on the principal route and that explain the most significant OSs, in an order that follows the principal route or backbone of the composition.
- Depending on the allowed or desired length, substantive details are added based on their value significance measure/s and their relatedness or association with the OSs that need to be explained along the composition.
- A method and an associated exemplary system are introduced that provide knowledge consumers with verified and substantive knowledge about a topic or subject matter of interest.
- A body of knowledge or corpus is created or obtained.
- The most semantically or formally important partitions of the corpus are identified for inclusion in the composed content.
- the structure of the article (the content composition) is identified and organized. Once the structure of the article is identified for the semantics that need to be in the composition, then we find the best suited partitions to convey the necessary information about that semantic.
- the selected partitions can be further rephrased, edited, or replaced with semantically similar ontological subjects or parts if desired.
- A document representing the collective knowledge of a diverse set of compositions about a topic should, first of all, cover the most important aspects of the topic and its associated subtopics. Secondly, it should present the information according to the state of the collective knowledge and understanding of the masses about that topic. Thirdly, it should follow a logical path connecting the pieces of knowledge, so that a human can easily comprehend and follow the relations between the most important parts of the knowledge describing, analyzing or supporting the topic.
- FIG. 1 shows schematically the block diagram of the process flow, method and system of generating content according to one exemplary embodiment of the invention.
- FIG. 2 a shows conceptually a principal map of the Body Of Knowledge (BOK), according to one exemplary embodiment of such a map or graph.
- FIG. 2 b shows a principal route for composing content according to one exemplary embodiment.
- FIG. 3 shows one exemplary process of finding the most significant associates (MSA) using only the association strength matrix (ASM).
- FIG. 4 shows a schematic block diagram of the content composer in general.
- FIG. 5 shows schematics of one optional addition to the composer of FIG. 4, having different layers of editorial blocks.
- FIG. 6 shows the composing of content on demand or in response to a requested subject matter.
- FIG. 7 shows one exemplary schematic of a web service system having hardware and the embedded software and codes for providing content to users upon request.
- Systems and methods of generating freelanced or classified quality contents for and from a body of knowledge are disclosed so as to speed up the process of research and development, knowledge acquisition, sharing, and real (verified) information retrieval.
- authoritative content or article generation from a body of knowledge or a collection of compositions can be a desirable service or product.
- This is evidenced by the popularity of the free encyclopedia Wikipedia, which covers a large number of subject matters of importance and interest.
- Wikipedia still relies on a small group of people for each article, making it prone to errors and unverified facts.
- the capacity of content generation is limited due to the laborious process.
- Nos. 12/939,112 and 12/755,415) are semantically important and have significant value in the context of that body of knowledge but a generated composition, in the form of listing the important partitions, may lack the coherency and a logical route necessary for better comprehension of the generated composition by an average user.
- the invention discloses the method, algorithms, and the related systems and services of generating content composition/s from a body of knowledge.
- In section II-I, a summarized version of the formulation, which helps to explain the current invention, is recited again. The complete formulation is found in the incorporated referenced applications. In section II-II, the composing method is then explained with reference to the accompanying figures and the formulation of section II-I.
- The Participation Matrix (PM) is a matrix indicating the participation of each ontological subject in each partition of the composition.
- The PM indicates the participation of one or more lower order OSs in one or more OSs of higher or the same order.
- The PM is the most important array of data in this disclosure, containing the raw information from which many other important functions, information, features, and desirable parameters can be extracted. Without intending any limitation on the value of the PM entries, in the preferred embodiments throughout most of this disclosure (unless stated otherwise) the PM is a binary matrix having entries of one or zero, and is built for a composition or a set of compositions as follows:
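- $$PM^{kl} = \begin{matrix} OS_1^k \\ \vdots \\ OS_N^k \end{matrix} \overset{\begin{matrix} OS_1^l & \cdots & OS_M^l \end{matrix}}{\begin{pmatrix} pm_{11}^{kl} & \cdots & pm_{1M}^{kl} \\ \vdots & \ddots & \vdots \\ pm_{N1}^{kl} & \cdots & pm_{NM}^{kl} \end{pmatrix}} \qquad (1)$$
- where $OS_i^l$ is the ith OS of the lth order, $OS_i^k$ is the ith OS of the kth order extracted from the composition, and $pm_{ij}^{kl} = 1$ if $OS_i^k$ has participated in, i.e. is a member of, $OS_j^l$, and 0 otherwise.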
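- As an illustration of the binary participation matrix of Eq. 1, the following minimal sketch builds a word-by-sentence PM (k = 1, l = 2). The function name `build_participation_matrix`, the lower-casing, and the simple regular-expression tokenizer are assumptions made for this example and are not specified in the disclosure.

```python
import re

def build_participation_matrix(sentences):
    """Return (vocab, pm): pm[i][j] is 1 if word i occurs in sentence j, else 0."""
    # Assumed tokenizer: lower-cased alphanumeric runs stand in for order-1 OSs.
    tokenized = [set(re.findall(r"[a-z0-9]+", s.lower())) for s in sentences]
    vocab = sorted(set().union(*tokenized))
    pm = [[1 if word in tokens else 0 for tokens in tokenized] for word in vocab]
    return vocab, pm

sentences = ["Cats chase mice.", "Mice eat cheese.", "Cats sleep."]
vocab, pm = build_participation_matrix(sentences)
for word, row in zip(vocab, pm):
    print(word, row)
```

- Each row corresponds to one lower order OS and each column to one partition, matching the layout of Eq. 1.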
- Association strengths play an important role in the evaluation of some of the value significances of the OSs of the compositions and are, in fact, the entries of a new matrix called here the "Association Strength Matrix" ($ASM^{k|l}$).
- In Eq. 2, $c$ is a predetermined constant or a predefined function of the other variables in Eq. 2.
- $com_{ij}^{k|l}$ denotes the co-occurrences of $OS_i^k$ and $OS_j^k$ in the set of OSs of order l, and these are in fact the entries of the Co-Occurrence Matrix ($COM^{k|l}$).
- $iop_i^{k|l}$ and $iop_j^{k|l}$ are the "independent occurrence probabilities" of $OS_i^k$ and $OS_j^k$ respectively.
- The probability of independent occurrence is based on the "Frequency of Occurrences" ($FO_i^k$), i.e. the number of times an $OS^k$ has appeared in the composition or its partitions, divided by the total number of occurrences of all the other OSs of the same order in the composition, or divided by the number of possible occurrences of an OS in the partitions.
- The "Independent Occurrence Probability (IOP)" is therefore the frequency of occurrences scaled by a normalization factor, which is determined by the mathematical necessities in different situations. For example, when $iop_i^{k|l}$ refers to the independent probability of occurrence of $OS_i^k$ in the M partitions of the composition, the normalization factor is 1/M, where more than one occurrence of $OS_i^k$ in a partition is not counted.
- The frequency of occurrences can be obtained by counting the occurrences of OSs of the particular order in the composition or its partitions, e.g. counting the appearances of a particular word in the set of $OS^l$, or more conveniently from the main diagonal of $COM^{k|l}$.
- The association strength defined by Eq. 2 is not symmetric, and generally $asm_{ji}^{k|l} \neq asm_{ij}^{k|l}$.
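- The co-occurrence counts and the independent occurrence probabilities described above can be sketched as follows, assuming a binary PM and the 1/M normalization mentioned for the M-partition case. The function names are hypothetical, and the association strength of Eq. 2 is deliberately not computed because its exact form is not reproduced in this text.

```python
def co_occurrence(pm):
    """COM[i][j] = number of partitions in which OS i and OS j both participate."""
    n = len(pm)
    return [[sum(a * b for a, b in zip(pm[i], pm[j])) for j in range(n)]
            for i in range(n)]

def independent_occurrence_probability(com, num_partitions):
    """IOP from the main diagonal of COM, normalized by 1/M (assumed case)."""
    return [com[i][i] / num_partitions for i in range(len(com))]

# Rows: three order-1 OSs; columns: three partitions (binary participation).
pm = [[1, 0, 1],
      [1, 0, 0],
      [1, 1, 0]]
com = co_occurrence(pm)
iop = independent_occurrence_probability(com, num_partitions=3)
print(com)
print(iop)
```

- With a binary PM the diagonal entry of the co-occurrence matrix is the number of partitions containing the OS, so repeated occurrences inside one partition are not counted.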
- The value significance of higher order OSs can be evaluated either directly, in the same way as for lower order OSs, or derived from the value significances of the lower order OSs participating in the higher order OS.
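- One can use the participation matrices to arrive at the $VSMx_p^{l|k}$ of the partitions, i.e. the OSs of order l, from the value significances of the participating OSs of order k:
- $$VSMx_p^{l|k} = \sum_i VSMx_i^{k|l} \, pm_{ip}^{kl} \qquad (5)$$
- Eq. (5) can also be written in matrix form to obtain the whole vector of value significance measures of the OSs of order l.
- The scores of the partitions can further be scaled or normalized. For instance, the score or resultant VSM of a partition (i.e. $VSMx^{l|k}$) can be scaled in this way.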
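- A minimal sketch of deriving partition scores from lower order value significances, under the assumption that a partition's score is the sum of the VSMs of the OSs participating in it, optionally divided by the number of participating OSs. The function name `partition_scores` and the numeric values are illustrative only.

```python
def partition_scores(pm, vsm, normalize=False):
    """Score each partition (column of pm) by summing the VSMs of its members."""
    num_partitions = len(pm[0]) if pm else 0
    scores = []
    for j in range(num_partitions):
        total = sum(vsm[i] * pm[i][j] for i in range(len(pm)))
        if normalize:
            # Assumed normalization: divide by the number of OSs in the partition.
            members = sum(pm[i][j] for i in range(len(pm)))
            total = total / members if members else 0.0
        scores.append(total)
    return scores

pm = [[1, 0, 1],
      [1, 0, 0],
      [1, 1, 0]]
vsm = [4.0, 1.0, 2.0]  # hypothetical value significances of three OSs
raw = partition_scores(pm, vsm)
normalized = partition_scores(pm, vsm, normalize=True)
print(raw, normalized)
```

- Normalizing by partition size keeps long partitions from dominating purely because they contain more OSs.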
- FIG. 1 shows schematically one embodiment of the block diagram of the system and algorithm of generating new compositions from a body of knowledge.
- the notations and abbreviations are common with the patent application Ser. Nos. 12/939,112 and 12/755,415.
- the system has access to a body of knowledge.
- the body of knowledge can be a collection of compositions or a single composition.
- The body of knowledge can be assembled by querying a search engine and collecting a desired number of documents related to the query or the subject matter.
- The system has access to, or assembles, a body of knowledge or corpus related to one or more subject matters from the variety of repository sources that might be available to it, including all types of knowledge repositories, databases, etc.
- Our exemplary input body of knowledge is a written text or has been transformed into a written text.
- The corpus or the BOK is also called the input composition, from time to time, in this application and in the references cited herein.
- The input composition is partitioned into a desired number of partitions of different lengths, or preferably into syntactically correct semantic units (such as words, sentences, paragraphs, etc.).
- The input composition is parsed into its constituents: words as OSs of order 1, sentences as OSs of order 2, paragraphs as OSs of order 3, and so on.
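- The parsing step above can be sketched for plain text as follows. Splitting paragraphs on blank lines and sentences on terminal punctuation are simplifying assumptions of this example, as is the function name `extract_ontological_subjects`.

```python
import re

def extract_ontological_subjects(text):
    """Split text into OSs of order 1 (words), 2 (sentences), 3 (paragraphs)."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    sentences = [s.strip()
                 for p in paragraphs
                 for s in re.split(r"(?<=[.!?])\s+", p)
                 if s.strip()]
    words = sorted({w for s in sentences
                    for w in re.findall(r"[a-z0-9]+", s.lower())})
    return {1: words, 2: sentences, 3: paragraphs}

text = "Cats chase mice. Mice eat cheese.\n\nCats sleep."
os_by_order = extract_ontological_subjects(text)
print(os_by_order[2])
```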
- The extracted OSs of different orders of the BOK are stored in arrays of a format suitable for storage efficiency and ease of retrieval.
- The storage can be temporary or on more permanent computer readable media, so that it can be accessed by other programs or used in other similar sessions.
- Concurrently or subsequently, the desired number of Participation Matrix/es (PM/s), as described in section II-I, are built and also stored for further use.
- A participation matrix can be stored numerically or as any other programming language object, such as dictionaries, lists, lists of lists, cell arrays, databases or any array of data, which are essentially different representations of the data contained in the PM/s. It is apparent to those skilled in the art that the formulations, mathematical objects and described methods can be implemented in various ways using different computer programming languages or software packages that are suitable for performing the methods and the calculations.
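- As one example of such alternative representations, a binary PM can be kept as a dictionary mapping each OS to the set of partition indices in which it participates. The helper names `to_sparse` and `to_dense` are hypothetical and serve only to show that both forms carry the same data.

```python
def to_sparse(vocab, pm):
    """Dictionary form: OS -> set of partition indices where it participates."""
    return {os_: {j for j, value in enumerate(row) if value}
            for os_, row in zip(vocab, pm)}

def to_dense(sparse, vocab, num_partitions):
    """Rebuild the list-of-lists binary matrix from the dictionary form."""
    return [[1 if j in sparse.get(os_, set()) else 0
             for j in range(num_partitions)]
            for os_ in vocab]

vocab = ["cats", "chase", "mice"]
pm = [[1, 0, 1],
      [1, 0, 0],
      [1, 1, 0]]
sparse = to_sparse(vocab, pm)
print(sparse["cats"])
```

- The dictionary form stores only the nonzero entries, which can matter when most OSs appear in few partitions.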
- Any of the objects, arrays of data and calculations needed to implement the methods and systems of this invention can be handled by localized computing and storage media facilities or be distributed over a distributed computer facility or facilities, distributed databases, file systems, parallel computing facilities, distributed hardware nodes, distributed storage hubs, distributed data warehouses, distributed processing, cluster computing, storage networks, and in general any type of computing architecture, communication network, storage network and facility capable of implementing the methods and systems of this invention.
- The whole system and method can be implemented and performed in geographically distant computer environments, wherein one or more of the data objects and/or one or more of the operations and functions are stored, performed or processed in a geographically different location from the other parts storing, performing or processing one or more of the data objects and/or one or more of the operations or functions of this disclosure.
- The system builds the Association Strength Matrix/es (ASM/s) and also keeps them in a temporary or more permanent computer readable storage medium.
- The system can then proceed to evaluate at least one of the "Value Significance Measures (VSM/s)" of the partitions and OSs of the desired order from their usage and their pattern of participation in the input composition, as shown in FIG. 1.
- the system now can consider the ASM as an asymmetric directed graph as was explained in the patent application Ser. No. 12/939,112 referenced before, and use the ASM to build several other desirable graphs or maps.
- One of the desired maps in this application would be a map or a plan or a route that can show the relations between the OSs of the body of knowledge based on the “most significant associates (MSA)” which in turn can be based on their value significance and their strength of associations to each other.
- Such a map or route can be followed by the composer module to make sure that the generated composition is coherent and sensible and represents the same essence of knowledge as the input body of knowledge. Therefore as shown in FIG.
- a principal map can be obtained or envisioned, from which a composing backbone route or principal route is selected according to the method and algorithm explained with reference to FIGS. 2a and 2b of this application.
- the principal route can also be derived from the ASM directly as exemplified in the method shown in FIG. 3 .
- The composer block or module composes a new composition by assembling the scored partitions of the body of knowledge based on the VSMs of the partitions, according to the backbone or principal route/s, and by using the information on the participation of the partitions in each other.
- The composer may further have several other predetermined criteria to be considered in composing the output composition. Such criteria could be the length of the generated composition or its percentage ratio relative to the given BOK, the style, the type of substance (verified or novel), etc.
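- The assembling step can be sketched as a greedy selection: walk the principal route and, for each OS on it, pick the highest scored partition that contains that OS and has not been used yet, stopping at a length limit. The function `compose`, the dictionary layout and the use of a partition count as the length criterion are assumptions of this example, not the disclosed composer.

```python
def compose(route, partitions, pm_index, scores, max_partitions):
    """Pick, for each OS on the route, its best unused partition (greedy sketch)."""
    chosen = []
    for os_ in route:
        # Candidate partitions are those in which this OS participates.
        candidates = sorted(j for j in pm_index.get(os_, ()) if j not in chosen)
        if not candidates:
            continue
        chosen.append(max(candidates, key=lambda j: scores[j]))
        if len(chosen) >= max_partitions:
            break
    return [partitions[j] for j in chosen]

partitions = ["Cats chase mice.", "Mice eat cheese.", "Cats sleep."]
pm_index = {"cats": {0, 2}, "mice": {0, 1}, "cheese": {1}}
scores = [3.0, 2.0, 1.0]          # hypothetical partition VSMs
route = ["cats", "mice", "cheese"]  # hypothetical principal route
summary = compose(route, partitions, pm_index, scores, max_partitions=2)
print(" ".join(summary))
```

- Because the partitions are emitted in route order, the output follows the backbone rather than the order of the source documents.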
- The new composition will usually be composed or built as a summarization of the body of knowledge, a general or complete overview of the body of knowledge, or a presentation of novel aspects of the BOK.
- The aim is to give a much cleaner and more logical view of the body of knowledge in a much shorter and more structured composition, so that a consumer can save considerable research and trial time and be sure of having access to the most valuable knowledge related to his/her subject matter/s of interest.
- The new composition, or the system itself, which could in fact be used as a tool for knowledge seekers, may be named an answer, a summary, an essay, a response, a report, a content, etc., and be used in a variety of situations depending on the output length of the generated composition.
- FIG. 2 a shows one exemplary principal map of the knowledge of the input body of knowledge which can be formed, as one example, using the following protocol:
- FIG. 2a shows one exemplary embodiment of a principal map that can be derived from the ASM.
- the principal map can further be refined with more restrictive predetermined criteria to be used as the route or the plan for composing the new content composition.
- the refined map is called “the principal or backbone route” or “composing plan” here.
- FIG. 2 b shows one more exemplary principal route or composing plan or route.
- the principal route is the route of the strongest association to its above layer associates.
- the thicker line route is one exemplary principal or backbone route and is determined by:
- FIGS. 2a and 2b are just two exemplary reasonable maps that can be useful and insightful.
- FIG. 3 shows one actual exemplary selection process and the algorithm of finding the nodes of principal or backbone route using the ASM and VSM.
- From the first selected set of OSs, a desired number of them are then selected based on their value significance (i.e. $VSM^{k|l}$).
- the same process can be done for the second group of two or more OSs of CRN k
- The process can also be done dynamically: an OS is found or selected for inclusion in the composing route, the candidate partitions for inclusion in the new content composition are found, and the process then moves on to the next OS of the composing route, repeating until certain criteria are met.
- The route usually starts from the highest valued OS (having the highest VSM regarding the important aspects of the parts of the BOK) in the first level or layer and passes through the most significant associates of each of the OSs of the earlier layer.
- The most significant associate can mean the OS that has the highest association strength, the associates that have the highest VSM, or any desirable function of the association strength and the VSM.
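- One possible reading of this route selection is a greedy walk: start from the OS with the highest VSM and repeatedly move to the unvisited OS that maximizes a chosen function of its association strength to the current OS and its own VSM. The function `principal_route`, the product used as the default weight, and the convention that `asm[j][i]` holds $asm_{ji}$ are assumptions of this sketch.

```python
def principal_route(asm, vsm, length,
                    weight=lambda strength, value: strength * value):
    """Greedy backbone route: highest-VSM start, then best-weighted associates."""
    route = [max(range(len(vsm)), key=lambda i: vsm[i])]
    while len(route) < length:
        current = route[-1]
        candidates = [j for j in range(len(vsm)) if j not in route]
        if not candidates:
            break
        # asm[j][current] is read as the association strength of j to current.
        route.append(max(candidates,
                         key=lambda j: weight(asm[j][current], vsm[j])))
    return route

asm = [[0.0, 0.9, 0.1],
       [0.2, 0.0, 0.8],
       [0.7, 0.3, 0.0]]   # hypothetical asymmetric association strengths
vsm = [0.5, 0.3, 0.2]     # hypothetical value significances
route = principal_route(asm, vsm, length=3)
print(route)
```

- Passing a different `weight` function switches between the readings listed above, for example association strength alone or VSM alone.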
- The "most significant associates" of $OS_i^k$, denoted $MSA_i^{k|l}$, can be given by a set or a vector:
- $$MSA_i^{k|l} = \{\, OS_j^k : f(asm_{ji}^{k|l}) \geq \tau \,\}, \quad j = 1, 2, \ldots, N \qquad (6)$$
- where $f$ is a predefined function and $\tau$ is a predetermined value employed here as a threshold.
- The collection of the MSAs for all the OSs can again be represented by a matrix, called the "Most Significant Association Matrix" ($MSAM^{k|l}$).
- In the graph represented by $MSAM^{k|l}$, the edge between the nodes $OS_p^k$ and $OS_q^k$ is denoted by $msam_{pq}^{k|l}$.
- The principal or backbone route can be identified from $MSAM^{k|l}$.
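- The thresholding that defines the most significant associates can be sketched as below. The comparison `>=`, the identity default for the predefined function, the binary encoding of the resulting matrix and the function names are assumptions of this example, since the source only states that a predefined function of the association strength is compared against a threshold.

```python
def most_significant_associates(asm, i, threshold, f=lambda x: x):
    """Indices j whose transformed association strength asm[j][i] meets the threshold."""
    return [j for j in range(len(asm)) if f(asm[j][i]) >= threshold]

def most_significant_association_matrix(asm, threshold, f=lambda x: x):
    """Binary matrix: entry [p][q] is 1 if q is a most significant associate of p."""
    n = len(asm)
    msam = [[0] * n for _ in range(n)]
    for p in range(n):
        for q in most_significant_associates(asm, p, threshold, f):
            msam[p][q] = 1
    return msam

asm = [[0.0, 0.9, 0.1],
       [0.2, 0.0, 0.8],
       [0.7, 0.3, 0.0]]   # hypothetical asymmetric association strengths
msa_of_0 = most_significant_associates(asm, 0, threshold=0.5)
msam = most_significant_association_matrix(asm, threshold=0.5)
print(msa_of_0, msam)
```

- Raising the threshold prunes weak edges, leaving a sparser graph from which a backbone route is easier to read off.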
- composing routes or backbones can be devised, selected or identified based on the desired form and application of the generated content.
- criteria for the desired content could be to have information about the relations of the OSs demonstrating a predetermined range of association strength to each other or to one of most valued OSs.
- The final generated content could be a simple answer about a subject matter, a summarization of the BOK related to a subject matter, a tutorial paper about the subject matter, background information content, or content containing novel information from the BOK of a subject matter.
- a novel content can mostly include the less known (having lower VSM) OSs in the BOK but, optionally, with strong association to high valued OSs.
- One VSM for $OS_i^k$ is $-\log_b iop_i^{k|l}$.
- This measure is in fact a function of $VSM1_i^{k|l}$.
- It may also be called the self-information of $OS_i^k$.
- A high value of this measure indicates a high score in regard to the novelty aspect of a partition of the BOK.
- The scores of the partitions based on the chosen VSM can further be scaled or normalized when appropriate.
- For instance, the score or resultant VSM of a partition (e.g. the resultant $VSM6^{l|k}$) can be divided by the number of $OS^k$ contained in the partition, or by the total number of characters used in the partition, in order to have a fair comparison of the merits of a partition among a set of partitions of the BOK.
- l ⁇ 1 VSM 2 i k
- This value significance is $VSM7_i^{k|l} = -iop_i^{k|l} \log_b iop_i^{k|l}$.
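- The two logarithmic measures mentioned above, the self-information of an OS and its probability-weighted form, can be sketched from the independent occurrence probabilities. The function names and the default base 2 are assumptions of this example.

```python
import math

def self_information(iop, base=2.0):
    """-log_b(iop) per OS: rare OSs score high, which favors novelty."""
    return [-math.log(p, base) for p in iop]

def weighted_self_information(iop, base=2.0):
    """-iop * log_b(iop) per OS, the entropy-like contribution of each OS."""
    return [-p * math.log(p, base) for p in iop]

iop = [0.5, 0.25, 0.125]  # hypothetical independent occurrence probabilities
novelty = self_information(iop)
contribution = weighted_self_information(iop)
print(novelty, contribution)
```

- The first measure grows without bound as an OS becomes rarer, whereas the weighted form discounts very rare OSs again, so the two rank OSs differently.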
- An Ontological Subject Map (OSM), introduced in the US patent application entitled "System and Method of Ontological Subject Mapping for knowledge Processing Applications", filed on Aug. 26, 2009, application Ser. No. 12/547,879, can be used.
- any form of graphs representing the body of knowledge such as semantic networks or maps, social networks, ontology databases, ontology trees, and the like, can be utilized for identification of a principal, backbone, or composing route.
- FIG. 4 shows the composer in more specific, yet still general, detail. It shows an exemplary way in which the composer composes content from the partitions of the BOK. This is one exemplary embodiment and protocol for using the contents of the BOK, and the data derived from it, to generate a new composition of content from the BOK.
- The system can have a plurality of formats for generating content.
- The composer is designed to produce an authoritative article or content about the principal subject matter of the BOK.
- The system follows the method and teachings of the current invention to extract the partitions (OSs) of the BOK, build an association strength matrix for the desired OSs (usually the words or phrases used in the BOK), identify the backbone route, and obtain at least one VSM (value significance measure) for the desired OSs of the desired orders (usually the words and sentences or the paragraphs of the BOK). It also keeps arrays or lists of the OSs of the different orders, together with the PM information, in a database (temporarily or more permanently).
- the procedure can be repeated for different branches of the backbone route without departing too far from the principal or backbone route.
- Many distance measures and metrics can be defined to show the relevance and closeness of the selected partitions in each section to the backbone route. This helps ensure a certain level of coherence and semantic relevance in the generated content.
- each section and sub-section can have a localized composing plan of its own.
- The Introduction section, for example, can be regarded as a smaller content whose structure and criteria differ from those of the other subsections, which explain the details of the most significant associates of the subject matter, and so on.
- The block diagram of FIG. 4 is intended for generality and illustration and should not be interpreted as the only way of composing content or as a limitation on the composing methods disclosed herein. Those familiar with the art may devise other methods and systems of building the composer with fewer steps and different complexities without departing from the scope and spirit of this disclosure, whose emphasis is on generating new composed content from a body of knowledge.
- The body of knowledge and/or the collection of compositions in particular may include multimedia content, Unicode strings, mathematical formulas, pictures, figures, data files, etc.
- In case 1, the subject matter can itself be a lengthy content, or the subject matter could be extracted from content given by a user/client. For instance, a user can input, or give the address of, a content (e.g. a webpage) and ask for further investigation into this content using the method. Alternatively, the system can extract the subject matter/s of the given content, assemble a related body or bodies of knowledge, and then perform the method of content composition.
- The composer can further have several layers of editorial blocks that are responsible for making the generated content more readable, useful, coherent, and semantically and syntactically correct, so that it adequately represents the most important desired aspects (background, novelty, all the most significant subject matters, etc.) of a BOK.
- the editorial levels use the backbone route, (or can make yet a new route, considering the raw composed content as an input composition) and the retrieved selected partitions for the inclusion in the generated content, to make sure that the desired standards of syntactical and graphical appearances etc. are met.
- the content composing can be done with more than one iteration until certain measures of quality and knowledge substance are met.
- the preferred method and algorithm will depend on the processing power and the resources available for implementing the method and the algorithms.
- the generated content can again be analyzed and its principal map be compared against the principal map of the original body of knowledge.
- VSM spectrum of the generated content is compared to that of the BOK.
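- The iteration and comparison steps above can be illustrated with a minimal sketch. The text does not specify how the two VSM spectra are compared, so cosine similarity is used here purely as an assumed stand-in, and the names `spectrum_similarity`, `compose_until_similar`, `compose_once`, `score_content`, and the default threshold are hypothetical.

```python
import math


def spectrum_similarity(vsm_a, vsm_b):
    """Cosine similarity between two {ontological_subject: score} mappings.

    Assumption: cosine similarity is only one possible way to compare
    VSM spectra; the source text does not prescribe a measure.
    """
    keys = set(vsm_a) | set(vsm_b)
    dot = sum(vsm_a.get(k, 0.0) * vsm_b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in vsm_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vsm_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def compose_until_similar(compose_once, score_content, bok_vsm,
                          threshold=0.8, max_iterations=5):
    """Re-compose until the draft's spectrum is close enough to the BOK's.

    compose_once(iteration) returns a draft; score_content(draft) returns
    its {ontological_subject: score} spectrum. Both are hypothetical hooks.
    """
    best_content, best_score = None, -1.0
    for iteration in range(max_iterations):
        content = compose_once(iteration)
        score = spectrum_similarity(score_content(content), bok_vsm)
        if score > best_score:
            best_content, best_score = content, score
        if best_score >= threshold:
            break
    return best_content, best_score
```

The loop keeps the best draft seen so far, so a later, worse iteration never replaces a better earlier one.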
- the automatically generated content composition may also be further edited by human operators and editors for final quality check.
- FIG. 6 shows an important application of the method and the system of automatic content generation from a body of knowledge in response to a user's request.
- the system of FIG. 6 will assemble a body of knowledge for the client or user and then generate the requested form of the content with the predetermined or optional formats for the user.
- the user's request can be a keyword, a question posed in natural language, or in general any content short or long.
- the system may first extract the OSs of the input request, find the keywords from the input request, and assemble a BOK that is related to these keywords. Consequently, as shown in FIG. 6, by following the method and algorithms of this application the system provides the desired content in the form of an answer, a coherent summarization of the assembled BOK, a content explaining the novel aspects of the keywords in the context of the assembled BOK, a tutorial content, and the like, to provide an answer as a service to the user's request.
- the input request can further be an existing content such as a paper, a webpage, or a pre-built body of knowledge for which a user wants to have a composed content or would like to have further investigations in a larger scale of related knowledge and information.
- a user can request a service for investigating the submitted paper or content and demand a report of the investigation from the system in a variety of forms, such as the merit of the submitted content in comparison to a larger body of knowledge in the same field or context, or an authoritative report, summary, or essay regarding and related to the subject matter/s of the submitted content, etc.
- Those skilled in the art can envision various applications and further modes of operation for the system and methods disclosed here without departing from the scope and spirit of the invention.
- FIG. 7 shows an exemplary application system and/or online service provider system in which web service appliances are provided in the form of storage, servers, software, and hardware. These may contain pre-generated content for a list of subject matters, stored for easy retrieval in response to a user's request for content, or may create a content composition in response to a client input.
- the building blocks of the composer service engine are explained in FIG. 7 itself.
- the system will return the premade content related to the subject matter of the client's request. If the system does not have the requested content, or has it only in a format other than the requested one, then it will generate content with the desired format using the methods and systems of composing new content of the invention and by having access to repositories of knowledge and information.
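- The retrieve-or-generate behavior described above can be sketched as a simple lookup with a fallback. This is only an illustration: `serve_content`, the `premade` dictionary keyed by subject and format, and the `generate` callback are hypothetical names, and storing the newly generated content for later requests is an assumption about how the repository might be kept up to date.

```python
def serve_content(subject, fmt, premade, generate):
    """Return premade content if available, otherwise generate it.

    premade:  dict mapping (subject, format) -> content (hypothetical store)
    generate: callable (subject, format) -> content (hypothetical composer)
    """
    key = (subject.strip().lower(), fmt)
    if key in premade:
        # The requested content already exists in the requested format.
        return premade[key]
    # Missing, or only available in another format: compose new content.
    content = generate(subject, fmt)
    # Assumption: keep the result so later requests can be served directly.
    premade[key] = content
    return content
```

A second request for the same subject and format is then answered from the store without invoking the composer again.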
- the repositories of knowledge and information can be the available databases, corporate database/s, a publisher content collection, in-house repositories or otherwise, such as database of a search engine, or the whole internet. It also can include all types of different information representations such as multimedia.
- the system repositories of the premade content can further be classified under different subject matters, keywords, or possibly online journals, encyclopedias, wiki groups and the like.
- the system can at the same time work in real time to constantly incorporate the latest findings in a body of knowledge related to a subject matter and modify the generated content to reflect the latest findings, or add more contents to its repositories.
- the system can analyze a content or body of knowledge submitted by a user, or expand the content or the submitted body of knowledge, and generate new content compositions of the requested format, style, substance, etc. on demand.
- a document representing the collective knowledge of a diverse set of compositions containing information about a topic should first of all cover the most important aspects of the topic and its associated subtopics. Secondly, it should contain the information according to the state of the collective knowledge and understanding of the mass about that topic. Thirdly, it should follow a logical path toward connecting the information about the knowledge therein so that it is easy for a human to comprehend and follow the relations between the most important parts of knowledge describing, analyzing, or supporting a topic.
- the methods, algorithms, and systems disclosed in this application offer a great benefit to knowledge professionals and knowledge seekers by shortening their research time significantly, while the content generated according to the teachings, systems, and services proposed in this application can give them a valid account of a body of knowledge, without bias, overlooked facts, limitation on the subject matters or language, or compromise on the quality of knowledge.
- An important advantage of the methods disclosed herein is that they do not rely on the individual semantic or syntactic symbols and/or terms of the composition in order to provide a satisfactory service.
- the systems, methods and algorithms explained here are expected to accelerate the rate of knowledge discovery significantly, and make the task of learning and knowledge acquisition, research, and analysis of the knowledge and information much more efficient and effective.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Machine Translation (AREA)
- Information Transfer Between Computers (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The invention discloses methods, algorithms, and the related systems and services of generating contents from a body of knowledge.
Description
- The present application is a divisional of U.S. patent application Ser. No. 12/946,838 filed on Nov. 15, 2010, which claims priority from U.S. provisional patent application No. 61/263,685 filed on Nov. 23, 2009, entitled “Automatic Content Composition Generation” which is herein incorporated by reference,
- This application also cross-references U.S. Pat. No. 8,401,980 “Methods for determining Context of Compositions of Ontological subjects and the application thereof using value significance measures (VSMS), co-occurrences, and frequency of occurrences of ontological subjects” filed on Nov. 3, 2010; and
- US patent application entitled “System and Method of Content Generation”, filed on Oct. 20, 2010, application Ser. No. 12/908,856; and
- US patent application entitled “System And Method For A Unified Semantic Ranking Of Compositions Of Ontological Subjects And The Applications Thereof”, filed on Apr. 7, 2010, application Ser. No. 12/755,415; and
- U.S. Pat. No. 8,452,725 entitled “System and Method of Ontological Subject Mapping for knowledge Processing Applications” filed on Aug. 26, 2009; and
- US patent application entitled “Assisted Knowledge Discovery and Publication System, and Method” filed on Jul. 24, 2008, application Ser. No. 12/179,363, which are incorporated herein by references along with their contents.
- This invention generally relates to content generation, knowledge and information processing, ontological subject processing, web content service provider.
- Currently human knowledge and the information produced by humans in the forms of text, audio, video or multimedia contents are stored in vast repositories of corporate data centers, digital libraries, search engines, and storages of individual computer servers. The only effective tool at the disposal of a knowledge-seeking professional for attaining knowledge or information is the service of search engines, which provide a great many webpages and documents related to a keyword and a subject matter. Researchers still have to sift through countless documents to gain an obscure view of a body of knowledge related to their subject matter of interest. This process of knowledge seeking/acquisition needs highly trained professionals and is very time consuming, slow, and expensive for both corporations and individuals. Moreover, there is no guarantee of the quality, value, and completeness of the knowledge gained from a human investigation of the body of knowledge related to a subject matter.
- Therefore, having a representative content for a body of knowledge that can accurately show the essence and context of the body of knowledge can be beneficial. Composing the representative content by humans is very slow and time consuming, and needs highly trained professional authorities.
- On the other hand, automatic content generation attempts, using Markov models or summarization techniques, have had limited appeal since the results are not easy for users to read and comprehend. Moreover, there is no guarantee of the semantic significance of the automatically generated content that would allow it to be used as a credible representative content for a body of knowledge.
- Therefore, there exists a need to automatically generate quality contents without these shortcomings.
- In this invention it is noticed that the current automatic content generation methods and systems are not able to preserve the context and substance, nor can they represent the real significant essence, of a body of knowledge.
- This application is about solving the identified problem of generating authoritative or novel compositions (with the desired length) to adequately represent a body of knowledge or any important aspect of it by having a significant substance, knowledge significance, credibility, with the context coherency, usefulness, and sensibility for a knowledge seeker user.
- According to one preferred exemplary embodiment the constituent components of the generated content composition are selected from the parts or partitions of one or more compositions, which we call “the assembled body of knowledge” or simply “body of knowledge” herein. For instance, a collection of webpages is considered a body of knowledge from which we desire to compose a new composition for use by a consumer. In this instance a plurality of webpages is obtained from a database after querying the database, e.g. a search engine database, and one desires to have a new composition built from or about the content of this plurality of webpages, so that a user can be sure to have in hand the most appropriate and complete content, overall or in regard to a particular aspect, which has almost the same information about a particular subject as the whole collection of webpages returned by the search engine.
- The generated content can be a long authoritative article with related multimedia content embedded therein or be as short as a single-sentence statement. Similarly, the body of knowledge can be any contents from a single-paragraph article to longer compositions such as books, or any sets of these kinds of compositions. Furthermore, the body of knowledge or sets of compositions can include any form of content such as audio, video or multimedia, DNA codes, etc. However, in explaining the exemplary embodiments and methods of this disclosure we use, for the most part (for ease of explanation and familiarity), textual compositions, without intending any limitation on the application of this disclosure to any other type of composition.
- The present method of composing new contents uses the methods and definitions as introduced in the patent application Ser. No. 12/939,112 to first evaluate the “Association strength matrix (ASM)”, and “Value Significance Measures (VSMs)” of the ontological subjects, parts and partitions of the assembled body of knowledge. Having evaluated the VSMs of the ontological subjects, and/or the partitions, and the association strengths of the ontological subjects, the current disclosure discloses the methods and algorithms on how to compose a new content in a systematic manner. The resultant content will conserve the most important knowledge and relations of the original body of knowledge while having a coherent and logical path or the composing plan, route or map.
- The method transforms the information of the usage and pattern of usage of ontological subjects of an input body of knowledge into matrices and the graphs or networks in accordance with the proposed defined matrices.
- In this disclosure, we define automatic composition generation in general as composing ontological subjects of any order and any nature (e.g., text, audio, video, genetic code, electrical signal, etc.). The composition can specifically be composed of parts or partitions of other compositions, such as sentences, paragraphs, or web pages obtained from larger compositions (i.e. higher order Ontological Subjects as defined in the patent application Ser. Nos. 12/755,415 and 12/939,112). Additionally, a composition can be composed of different parts of larger compositions or higher order ontological subjects with the same or different forms (e.g. text, video, audio, etc.) or any combination of them. Yet additionally, the composition can be composed of ontological subjects or parts of larger compositions of a specific form, e.g. text, transformed or trans-mapped into other forms of ontological subjects, e.g. video or movie, as described in the patent application Ser. No. 12/908,856, entitled “System and Method of Content Generation”, filed on Oct. 20, 2010, which is also incorporated herein as reference.
- To make such a content composition, a method of selecting the constituent components of the composition, along with the principal route or composing plan for composing the composition out of ontological subjects, is disclosed. It starts by having access to a collection of Ontological Subjects of different orders and different natures (extracted from a body of knowledge). Then, by employing one or more of the preferred algorithms, a principal route for semantically composing the composition is determined. According to the route, and based on the merit or value significance measures of the partitions, i.e. ontological subjects of lower and higher orders, the most appropriate and meritorious partitions are selected to represent the intended semantic aspect according to said principal route of the composition. The route may be selected dynamically as the new content composition is being formed.
- According to one exemplary embodiment of the invention, the method first follows the method of the patent application Ser. No. 12/939,112 to identify the most valuable partitions of the body of knowledge by evaluating the value significance of the ontological subjects and/or the partitions as described therein. The method may further construct a principal map of knowledge for that body of knowledge by evaluating the association strengths of the OSs of the given composition (e.g. a body of knowledge) and select a principal route or composing plan from which a new composition is built. After identifying the principal route according to the predetermined requirements, style, aspect, application, etc., a new composition is constructed by selecting the most valued partitions of the body of knowledge that contain one or more of the associated OSs on the principal route and explain the most significant OSs, in an order that follows the principal route or backbone of the composition. Depending on the allowed or desirable length, substantive details will be added based on their value significance measure/s and their relatedness or association with the OSs that need to be explained along the composition.
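- The route-then-select procedure of this embodiment can be illustrated with a small sketch. The actual VSM and association strength definitions are given in the referenced application Ser. No. 12/939,112 and are not reproduced here, so the scores below are taken as given inputs. The greedy walk used to pick the route, the sum-of-word-scores partition value, and the names `principal_route` and `select_partitions` are all assumptions made for illustration only.

```python
import re


def principal_route(asm, vsm, length):
    """Greedy route: start at the highest-VSM OS, then repeatedly move to
    the unvisited OS most strongly associated with the current one.

    asm: {os: {associated_os: strength}}, vsm: {os: score}.
    Assumption: a greedy walk is only one possible way to pick a route.
    """
    current = max(vsm, key=vsm.get)
    route = [current]
    while len(route) < length:
        candidates = {o: s for o, s in asm.get(current, {}).items()
                      if o not in route}
        if not candidates:
            break
        current = max(candidates, key=candidates.get)
        route.append(current)
    return route


def select_partitions(route, partitions, vsm):
    """For each OS on the route, pick the unused partition that contains it
    and has the highest value (here: sum of its words' VSM scores)."""
    chosen = []
    for subject in route:
        best, best_value = None, float("-inf")
        for text in partitions:
            words = set(re.findall(r"[a-z0-9]+", text.lower()))
            if subject not in words or text in chosen:
                continue
            value = sum(vsm.get(w, 0.0) for w in words)
            if value > best_value:
                best, best_value = text, value
        if best is not None:
            chosen.append(best)
    return chosen
```

Because partitions are emitted in route order, the resulting composition follows the backbone; a length limit could be enforced by truncating the route or the chosen list.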
- According to another aspect of this disclosure, a method and an associated exemplary system are introduced that provide knowledge consumers with verified and substantive knowledge about a topic or subject matter of interest. For a given title, query, question, keyword, or any given content, etc., a body of knowledge or corpus is created or obtained. Using the summarization and clustering methods disclosed in the referenced applications, the most semantically or formally important partitions of the corpus are identified for inclusion in the composed content. Using the principal maps and/or principal route/s, the structure of the article (the content composition) is then identified and organized. Once the structure of the article is identified for the semantics that need to be in the composition, we find the best suited partitions to convey the necessary information about each semantic. Following the identified structure, one can compose a coherent and comprehensible content which can be used by a human consumer or another software agent. The selected partitions can be further rephrased, edited, or replaced with semantically similar ontological subjects or parts if desired.
- In essence, in this disclosure it is noticed that a document representing the collective knowledge of a diverse set of compositions containing information about a topic should first of all cover the most important aspects of the topic and its associated subtopics. Secondly, it should contain the information according to the state of the collective knowledge and understanding of the mass about that topic. Thirdly, it should follow a logical path toward connecting the information about the knowledge therein so that it is easy for a human to comprehend and follow the relations between the most important parts of the knowledge describing, analyzing, or supporting a topic.
- The methods, formulas, algorithms, the related systems, and a few exemplary applications will be explained in more detail in the detailed description sections of the application.
- FIG. 1: shows schematically the block diagram of the process flow, method and system of generating content according to one exemplary embodiment of the invention.
- FIG. 2a: shows conceptually a principal map of the Body Of Knowledge (BOK), according to one exemplary embodiment of such a map or graph.
- FIG. 2b: shows a principal route for composing content according to one exemplary embodiment.
- FIG. 3: shows one exemplary process of finding the most significant associates (MSA) using only the association strength matrix (ASM).
- FIG. 4: shows a schematic block diagram of the content composer in general.
- FIG. 5: shows schematics of one optional addition to the composer of FIG. 4, having different layers of editorial blocks.
- FIG. 6: shows the composing of content on demand or in response to a requested subject matter.
- FIG. 7: shows one exemplary schematic of a web service system having hardware and the embedded software and codes for providing content to users upon request.
- Systems and methods of generating freelanced or classified quality contents for and from a body of knowledge are disclosed so as to speed up the process of research and development, knowledge acquisition, sharing, and real (verified) information retrieval.
- In numerous situations, for example, authoritative content or article generation from a body of knowledge or a collection of compositions can be a desirable service or product. For instance, this is evidenced by the popularity of the free encyclopedia Wikipedia, which covers a large number of subject matters of importance and interest. However, Wikipedia still uses a small group of people for each article, making it prone to errors and unverified facts. Moreover, the capacity of content generation is limited due to the laborious process, and there are many more subject matters of importance and interest that are not covered there or are not up to date.
- Therefore, an automatic system and method of generating contents that is fast and has no limitation on the capacity and the number of subject matters would be a highly valuable and effective service. However, automatic generation of valuable and complete contents using the vast repositories of contemporary knowledge is a very challenging task.
- It is also important to notice that generating a content requires access to at least one body of knowledge (e.g. a dictionary at least, or an expert's knowledge). Therefore generating content cannot be considered without having a body of knowledge at one's disposal. So far, automatic content generation attempts using Markov models or summarization techniques have had limited appeal since the results are not easy for users to read and comprehend. That is because they mostly focus on the natural language analysis of contents and the syntactical correctness of the generated contents, using words and word relationship statistics to synthesize the sentences and paragraphs, and not necessarily on the significance, correctness, or credibility of the knowledge or semantics of the content composed from an input body of knowledge in a meaningful manner. Composing or generating content word by word or expression by expression does not guarantee the meaning and semantic coherency of the generated content, due to the inherent ambiguity of natural languages and multiple word senses. Natural language analysis methods rely on word roles and senses that are highly ambiguous and language dependent.
- Hence, in other words, current automatic content generation methods and systems are not able to preserve the context and substance of the input body of knowledge, nor can they represent the real significant essence of the body of knowledge.
- In the U.S. patent application Ser. Nos. 12/755,415 filed on Apr. 7, 2010 and 12/939,112 filed on Nov. 3, 2010, both by the same applicant, which are incorporated here as references, it was noticed and mentioned that many types of information processing services, such as those of search engines, summarizers, question answering and the like, are all a type of content generation from a body of contents or knowledge. Moreover, all these types of content generation can indeed be viewed or regarded as a form of summarization of a large body of content to a number of partitions of an input corpus or composition.
- Content generation therefore, in this view, is not a separate task from a summarization type involving the evaluation of the significance of the partitions of an input composition, as described in the U.S. patent application Ser. Nos. 12/939,112 and 12/755,415. Therefore, generating an authoritative content from a body of knowledge can also be done by using an efficient summarization method to consolidate the true, or conceived to be true, information related to the topic. However, such summarizations based on value significance measures of the partitions of the input composition usually lack the coherency and continuity that is needed for an average reader to enjoy the benefits of such summarizations from a diverse set of compositions related to a topic of interest. In other words, though the summarized parts (employing the methods of application Ser. Nos. 12/939,112 and 12/755,415) are semantically important and have significant value in the context of that body of knowledge, a generated composition in the form of a listing of the important partitions may lack the coherency and logical route necessary for better comprehension of the generated composition by an average user.
- Therefore in this description methods and systems are given for generating contents (or compositions) having the necessary substance, knowledge, and knowledge route to adequately convey the state of the knowledge about a subject matter.
- Now the invention is disclosed in detail in reference to the accompanying figures and exemplary cases and embodiments in the following subsections. The invention discloses the method, algorithms, and the related systems and services of generating content composition/s from a body of knowledge.
- This disclosure uses the definitions that were introduced in the U.S. patent application Ser. No. 12/939,112, which is incorporated as a reference, and are recited here again along with more clarifying points according to their usage in this disclosure and the mathematical formulations herein.
- 1. Ontological Subject: symbol or signal referring to a thing (tangible or otherwise) worthy of knowing about. Therefore Ontological Subject means generally any string of characters, but more specifically, characters, letters, numbers, words, bits, mathematical functions, sound signal tracks, video signal tracks, electrical signals, chemical molecules such as DNAs and their parts, or any combinations of them, and more specifically all such string combinations that indicate or refer to an entity, concept, quantity, and the incidences of such entities, concepts, and quantities. In this disclosure the term Ontological Subject/s and the abbreviations OS or OSs are used interchangeably.
- 2. Ordered Ontological subjects: Ontological Subjects can be divided into sets with different orders depending on their length, attribute, and function. For instance, for ontological subjects of textual nature, one may characterize letters as zeroth order OS, words as the first order, sentences as the second order, paragraphs as the third order, pages or chapters as the fourth order, documents as the fifth order, corpuses as the sixth order OS and so on. So a higher order OS is a combination or a set of lower order OSs, or lower order OSs are members of a higher order OS. Equally one can order the genetic codes in different orders of ontological subjects. For instance, the four bases of a DNA molecule as the zeroth order OS, the base pairs as the first order, sets of pieces of DNA as the second order, genes as the third order, chromosomes as the fourth order, genomes as the fifth order, sets of similar genomes as the sixth order, sets of sets of genomes as the seventh order and so on. Yet the same can be defined for information bearing signals such as analogue and digital signals representing audio or video information. For instance for digital signals representing a video signal, bits (electrical One and Zero) can be defined as zeroth order OS, the bytes as first order, any sets of bytes as third order, and sets of sets of bytes, e.g. a frame, as fourth order OS and so on. Therefore the definitions of orders for ontological subjects are an arbitrary set of initial definitions that one should stick to in order to make sense of the methods and mathematical formulations presented here and to be able to interpret the consequent results or outcomes in more sensible and familiar language.
- More importantly Ontological Subjects can be stored, processed, manipulated, and transported only by transferring, transforming, and using matter or energy (equivalent to matter) and hence the OS processing is a completely physical transformation of materials and energy.
- 3. Composition: is an OS composed of constituent ontological subjects of lower or the same order, particularly text documents written in natural language, genetic codes, encryption codes, data files, voice files, video files, and any mixture thereof. A collection, or a set, of compositions is also a composition. Therefore a composition is also an Ontological Subject which can be broken into lower order constituent Ontological Subjects. In this disclosure, the preferred exemplary composition is a set of data containing ontological subjects, for example a webpage, papers, documents, books, a set of webpages, sets of PDF articles, multimedia files, or simply words and phrases. Compositions are distinctly defined here to assist the description in more familiar language than a technical language using only the defined OS notations.
- 4. Partitions of composition: a partition of a composition, in general, is a part or whole, i.e. a subset, of a composition or collection of compositions. Therefore, a partition is also an Ontological Subject having the same or lower order than the composition as an OS. More specifically in the case of textual compositions, partitions of a composition can be chosen to be characters, words, sentences, paragraphs, chapters, webpage, etc. A partition of a composition is also any string of symbols representing any form of information bearing signals such as audio or videos, texts, DNA molecules, genetic letters, genes, and any combinations thereof. However our preferred exemplary definition of a partition of a composition in this disclosure is word, sentence, paragraph, page, chapters and the like, or WebPages, and partitions of a collection of compositions can moreover include one or more of the individual compositions. Partitions are also distinctly defined here for assisting the description in more familiar language than a technical language using only the general OSs definitions.
- 5. Value Significance Measure: assigning a quantity, or a number or feature or a metric for an OS from a set of OSs so as to assist the selection of one or more of the OSs from the set. More conveniently and in most cases the significance measure is a type of numerical quantity assigned to a partition of a composition. Therefore significance measures are functions of OSs and one or more of other related mathematical objects, wherein a mathematical object can, for instance, be a mathematical object containing information of participations of OSs in each other, whose values are used in the decisions about the constituent OSs of a composition.
- 6. Summarization: is a process of selecting one or more OSs from one or more sets of OSs according to predetermined criteria, with or without the help of value significance and ranking metric/s. The selection or filtering of one or more OSs from a set of OSs is usually done for the purpose of representing a body of data by a summary that is indicative of that body. Specifically, therefore, in this disclosure searching through a set of partitions or compositions and showing the search results according to the predetermined criteria is considered a form of summarization. In this view, finding an answer to a query, e.g. question answering, or finding a composition related or similar to an input composition, etc., is also a form of searching through a set of partitions and therefore a form of summarization according to the definitions given here.
- 7. Subject matter: generally is an ontological subject or a composition itself. Therefore subject matters and OSs have in principle the same characteristics and are not distinguishable from each other. Yet less generally and a bit more specifically, a subject matter (SM), in the preferred exemplary embodiments of this application, is a word or combination of words that shows a repeated pattern in many documents and that people, or some groups of people, come to recognize. Nouns and noun phrases, verbs and verb phrases, with or without adjectives, are examples of subject matters. For instance the word “writing” could be a subject matter, and the phrase “Good Writing” is also a subject matter. A subject matter can also be a sentence or any combination of a number of sentences. Subject matters are mostly related, but not limited, to nouns, noun phrases, entities, and things, real or imaginary. Preferably, however, a subject matter is most of the time a keyword, a set of keywords, a topic, or a title of interest.
- 8. Body of Knowledge: is a composition or set of compositions available or assembled from different sources. The body of knowledge can be related to one or more subject matter or just a free or random collection of compositions. The “Body of Knowledge” may be abbreviated from time to time as BOK in this application. The BOK can further include compositions of different forms for instance one part of an exemplary BOK can be a text and another part contains video, or picture, or a genetic code.
- 9. The usage of quotation marks “ ”: throughout the disclosure several compound names of variables, functions and mathematical objects (such as “participation matrix”, “conditional occurrence probability” and the like) will be introduced that are placed, once or more, between quotation marks (“ ”) to identify them as one object. They must not be interpreted as being direct quotes from the literature outside this disclosure (except the incorporated referenced patent applications).
- The invention is now disclosed in detail, with reference to the accompanying figures and several exemplary embodiments of the system and its blocks, in the following subsections.
- Although the method is general, with broad applications and implementations, the disclosure is described by way of specific exemplary embodiments in order to convey the implications and applications through embodiments of the simplest form.
- Without intending any restriction on the form of content, such as text, audio, video, pictures and the like, we start by describing the embodiments with regard to inputs as the body of knowledge in the form of text. However, for other forms of content the present methodology and process can be used once one considers that all types of content are different realizations of semantic representations of the universe. Therefore a semantic or knowledge representation transformation will make the current description applicable to all forms of content, and particularly all forms of electronic content available.
- Also, since most of human knowledge and daily information production is recorded in the form of text (or can be converted to text), the detailed description is focused on textual compositions to illustrate the teachings, the method and the system. In what follows the invention is described in several sections and steps which, in light of the previous definitions, would be sufficient for those of ordinary skill in the art to comprehend and implement the method, the systems and the applications.
- Following the formulation introduced in the patent application Ser. No. 12/939,112 (especially EQ. 1-14) we proceed to evaluate the value significance measures (VSMs) of the lower order and higher order OSs of the input body of knowledge (BOK). For instance, the VSMs of the words and the VSMs of the sentences or paragraphs of the BOK can be calculated using the formulation and algorithm of the patent application Ser. No. 12/939,112.
- However, in section II-I, a summarized version of the formulation which helps to explain the current inventions is recited here again. The complete formulation is found in the incorporated referenced applications. In section II-II, the composing method then is explained in reference to the accompanying figures and the formulation method in section II-I here.
- Assuming we have a given composition of ontological subjects, e.g. an input text, the Participation Matrix (PM) is a matrix indicating the participation of each ontological subject in each partition of the composition. In other words, in terms of our definitions, the PM indicates the participation of one or more lower order OSs in one or more OSs of higher or the same order. The PM is the most important array of data in this disclosure, containing the raw information from which many other important functions, information, features, and desirable parameters can be extracted. Without intending any limitation on the value of PM entries, in the preferred embodiments throughout most of this disclosure (unless stated otherwise) the PM is a binary matrix having entries of one or zero and is built for a composition or a set of compositions as follows:
- 1. break the composition into the desired number of partitions. For example, a text document can be broken into chapters, pages, paragraphs, lines, and/or sentences, words etc.,
- 2. identify the desired form, number, and order of the ontological subjects of the composition by an appropriate method, such as parsing a text document into its constituent words and phrases, sentences, etc.,
- 3. select a desired number N of OSs of order k and a desired number M of OSs of order l (these OSs are usually the partitions of the composition from step 1) of the composition, according to certain predefined criteria, and;
- 4. construct an N×M matrix in which the ith row (Ri) is a vector, with dimension M, indicating the presence of the ith OS of order k (often extracted from the composition under investigation) in the OSs of order l (often extracted from the same or another composition under investigation) by having the value of one (or a nonzero value), and its absence by having the value of zero.
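- We call this binary matrix the Participation Matrix of the order kl ($PM^{kl}$), which can be shown as:
$$PM^{kl} = \begin{array}{c|ccc} & OS_1^l & \cdots & OS_M^l \\ \hline OS_1^k & pm_{11}^{kl} & \cdots & pm_{1M}^{kl} \\ \vdots & \vdots & \ddots & \vdots \\ OS_N^k & pm_{N1}^{kl} & \cdots & pm_{NM}^{kl} \end{array} \qquad (1)$$
- where $OS_j^l$ is the jth OS of the lth order, $OS_i^k$ is the ith OS of the kth order, extracted from the composition, and $pm_{ij}^{kl}=1$ if $OS_i^k$ has participated in, i.e. is a member of, $OS_j^l$, and 0 otherwise.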
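The four construction steps above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `build_participation_matrix`, the naive punctuation-based sentence splitter, and the choice of words as order k and sentences as order l are not taken from the disclosure.

```python
import re

def build_participation_matrix(text):
    """Return (words, sentences, pm); pm[i][j] is 1 if words[i] occurs in sentences[j]."""
    # Step 1: break the composition into partitions (assumed here: sentences).
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    # Step 2: parse each partition into lower order OSs (assumed here: lowercase words).
    tokenized = [set(re.findall(r"[a-z0-9']+", s.lower())) for s in sentences]
    # Step 3: select the N OSs of order k (assumed here: every distinct word, sorted).
    words = sorted(set().union(*tokenized))
    # Step 4: fill the N x M binary matrix, one row per word, one column per sentence.
    pm = [[1 if w in toks else 0 for toks in tokenized] for w in words]
    return words, sentences, pm

words, sentences, pm = build_participation_matrix(
    "Good writing is clear. Clear writing takes practice. Practice helps."
)
for w, row in zip(words, pm):
    print(f"{w:10s} {row}")
```

A real system would presumably use a proper tokenizer and a selection criterion for the N words, but the shape of the result (rows for order k, columns for order l) is the same.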
- The association strengths play an important role in evaluation of some of the value significances of OSs of the compositions and, in fact, are entries of a new matrix called here the “Association Strength Matrix (ASMk|l)” whose entries will be defined as the following:
-
- where c is a predetermined constant or a predefined function of other variables in Eq. 2. However, in this disclosure we can conveniently consider the case where c=1.
In Eq. 2, $com_{ij}^{k|l}$ denotes the co-occurrences of $OS_i^k$ and $OS_j^k$ in the set of OSs of order l, $OS^l$, and is in fact an entry of the Co-Occurrence Matrix ($COM^{k|l}$), which is given by:
$$COM^{k|l} = PM^{kl} \cdot (PM^{kl})' \qquad (3)$$
- and $iop_i^{k|l}$ and $iop_j^{k|l}$ are the “independent occurrence probability” of $OS_i^k$ and $OS_j^k$ respectively. The probability of independent occurrence is the “Frequency of Occurrences” ($FO_i^k$), i.e. the number of times an $OS^k$ has appeared in the composition or its partitions, divided by the total number of occurrences of all the other OSs of the same order in the composition, or divided by the number of possible occurrences of an OS in the partitions. The “Independent Occurrence Probability (IOP)” is therefore given by:
$$iop_i^{k|l} = \gamma_n \cdot FO_i^k \qquad (4)$$
- wherein $\gamma_n$ is a normalization factor that is determined by the mathematical necessities in different situations. For example, when $iop_i^{k|l}$ refers to the independent probability of occurrence of $OS_i^k$ in the M partitions of the composition, then $\gamma_n = 1/M$, wherein more than one occurrence of $OS_i^k$ in a partition is not counted. The frequency of occurrences can be obtained by counting the occurrences of OSs of the particular order in the composition or its partitions, e.g. counting the appearances of a particular word in the set of $OS^l$, or more conveniently from the main diagonal of $COM^{k|l}$, i.e. $com_{ii}^{k|l}$, the self-occurrence.
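Eq. 3 and Eq. 4 can be checked on a small binary PM. The sketch below is a minimal illustration, assuming the case $\gamma_n = 1/M$ described above; the helper names `co_occurrence` and `independent_occurrence_probability` and the example matrix are made up for this illustration.

```python
def co_occurrence(pm):
    """Eq. 3: COM = PM * PM'. com[i][j] counts the partitions containing both OS i and OS j."""
    n = len(pm)
    return [[sum(a * b for a, b in zip(pm[i], pm[j])) for j in range(n)]
            for i in range(n)]

def independent_occurrence_probability(pm):
    """Eq. 4 with gamma_n = 1/M: iop_i = com_ii / M (frequency read off the main diagonal)."""
    m = len(pm[0])
    com = co_occurrence(pm)
    return [com[i][i] / m for i in range(len(pm))]

# Hypothetical 3 x 4 binary PM: three OSs of order k, four partitions of order l.
pm = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
com = co_occurrence(pm)
iop = independent_occurrence_probability(pm)
print(com)  # symmetric, with the frequency of occurrences on the diagonal
print(iop)
```

Because the PM is binary, the diagonal of COM counts each OS at most once per partition, which matches the remark that repeated occurrences within one partition are not counted.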
- It is important to notice that the association strength defined by Eq. 2 is not symmetric, and generally $asm_{ji}^{k|l} \neq asm_{ij}^{k|l}$.
- Following the formulation introduced in 12/939,112 (especially EQ. 3-14) one can proceed to evaluate the value significance measures (VSMs) of the lower order and higher order OSs of the input body of knowledge (BOK). For instance, the VSMs of the words and the VSMs of the sentences or paragraphs of the BOK can be calculated using the formulation and algorithm of the patent application Ser. No. 12/939,112. Moreover, other appropriate measures of significances other than those mentioned exemplary in the application Ser. No. 12/939,112 can be defined as functions of one or more of the exemplary VSMs or any other mathematical objects introduced in that application.
- The value significance of higher order OSs, e.g. order l here, can be evaluated either by direct value significance evaluation, similar to lower order OSs, or can be derived from the value significance of the lower order OSs participating in the higher order ones. Conveniently one can use the $VSMx_i^{k|l}$ (x=1, 2 . . . ) and the participation matrices to arrive at the $VSMx_j^{l|k}$ of the higher order OSs, or the partitions of the composition, as follows:
$$VSMx_j^{l|k} = \sum_i VSMx_i^{k|l} \cdot pm_{ij}^{kl} \qquad (5)$$
- Eq. (5) can also be written in its matrix form to get the whole vector of value significance measures of the OSs of order l|k (l given k), i.e. $VSMx^{l|k}$, as a function of the participation matrix, $PM^{kl}$, and the vector $VSMx^{k|l}$.
- If required the scores of the partitions, calculated based on the VSMk|l of the choice, can further be scaled or normalized. For instance the score or the resultant VSM of a partition (i.e. the VSMl|k in Eq. 5) can be divided by the number of the OSs contained in the partition or by the total number of the characters used in the partitions etc. in order to have a “density value significance measures” of the partitions of the BOK.
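As a small worked illustration of Eq. 5 and the density normalization, the following sketch scores three partitions from hypothetical word-level VSM values. The function names and the numbers are assumptions chosen for the example, and dividing by the OS count is only one of the normalizations mentioned above.

```python
def partition_scores(vsm_k, pm):
    """Eq. 5: VSM_j^{l|k} = sum_i VSM_i^{k|l} * pm_ij, one score per partition j."""
    m = len(pm[0])
    return [sum(vsm_k[i] * pm[i][j] for i in range(len(pm))) for j in range(m)]

def density_scores(vsm_k, pm):
    """Divide each partition score by the number of OSs the partition contains."""
    raw = partition_scores(vsm_k, pm)
    sizes = [sum(row[j] for row in pm) for j in range(len(raw))]
    return [r / s if s else 0.0 for r, s in zip(raw, sizes)]

# Hypothetical inputs: three lower order OSs, three partitions.
pm = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
]
vsm_k = [0.5, 0.25, 1.0]
scores = partition_scores(vsm_k, pm)
density = density_scores(vsm_k, pm)
print(scores, density)
```

In this toy case the second partition has the highest raw score, while the third has the highest density, which shows why the choice of normalization can change which partitions are selected.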
- II-II Methods for Composing a New Content from a BOK
- Having defined the pre-requisite variables, function, and matrices we now explain the process and method of composing new contents for and/or from a “body of knowledge (BOK)”.
- One preferred embodiment of the invention is now described in detail with reference to FIG. 1. FIG. 1 shows schematically one embodiment of the block diagram of the system and algorithm for generating new compositions from a body of knowledge. The notations and abbreviations are common with the patent application Ser. Nos. 12/939,112 and 12/755,415.
- As shown in FIG. 1, the system has access to a body of knowledge. The body of knowledge can be a collection of compositions or a single composition. The body of knowledge can be assembled by querying a search engine and collecting a desired number of documents related to the query or the subject matter. In general the system has access to, or assembles, a body of knowledge or a corpus related to one or more subject matters from the variety of repository sources that might be available to the system, including all types of knowledge repositories, databases etc.
- For simplicity and easier comprehension of the system according to the present invention, we assume that our exemplary input body of knowledge is a written text or has been transformed into a written text. Then the corpus or the BOK (also called the input composition in this application and the references herein from time to time) is partitioned into a desired number of partitions of different lengths, or preferably into syntactically correct semantic units (such as words, sentences, paragraphs, etc.). In the preferred method the input composition is parsed into its constituents: words as OS order 1, sentences as OS order 2, paragraphs as OS order 3, and so on.
- As shown in FIG. 1, the extracted OSs of different orders of the BOK are stored in arrays of a format suitable for storage efficiency and ease of retrieval. The storage can be temporary or more permanent computer readable media, so that the arrays can be accessed by other programs or be used in other similar sessions.
- Concurrently or subsequently the desired number of Participation Matrix/es (PM/s), as described in section II-I, are built and also stored for further use. A participation matrix can be stored numerically or by any other programming language objects such as dictionaries, lists, lists of lists, cell arrays, databases or any array of data etc., which are essentially different representation forms of the data contained in the PM/s. It is apparent to those skilled in the art that the formulations, mathematical objects and the described methods can be implemented in various ways using different computer programming languages or software packages that are suitable to perform the methods and the calculations.
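To illustrate the remark that a PM can be held in dictionaries or lists rather than as a dense numerical array, here is one possible sparse representation. It is a sketch under assumptions: the names `pm_to_postings` and `postings_to_pm` and the dictionary-of-sets layout are illustrative choices, not a format prescribed by the disclosure.

```python
from collections import defaultdict

def pm_to_postings(pm, os_labels):
    """Store a binary PM as {OS label: set of partition indices in which it participates}."""
    postings = defaultdict(set)
    for label, row in zip(os_labels, pm):
        for j, value in enumerate(row):
            if value:
                postings[label].add(j)
    return dict(postings)

def postings_to_pm(postings, os_labels, n_partitions):
    """Rebuild the dense binary matrix from the dictionary form."""
    return [[1 if j in postings.get(label, set()) else 0 for j in range(n_partitions)]
            for label in os_labels]

# Hypothetical labels and matrix.
labels = ["knowledge", "matrix", "route"]
pm = [
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],  # an OS that participates in no partition
]
postings = pm_to_postings(pm, labels)
restored = postings_to_pm(postings, labels, n_partitions=4)
print(postings)
```

For a large corpus most PM entries are zero, so a dictionary form like this can be much smaller than the dense matrix while carrying the same information.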
- Moreover, storage of any of the objects and arrays of data, and the calculations needed to implement the methods and the systems of this invention, can be done through localized computing and storage media facilities or be distributed over a distributed computer facility or facilities, distributed databases, file systems, parallel computing facilities, distributed hardware nodes, distributed storage hubs, distributed data warehouses, distributed processing, cluster computing, storage networks, and in general any type of computing architectures, communication networks, storage networks and facilities capable of implementing the methods and the systems of this invention. In fact the whole system and method can be implemented and performed by geographically distant computer environments, wherein one or more of the data objects and/or one or more of the operations and functions is stored or performed or processed in a geographically different location from the other parts storing or performing or processing one or more of the data objects and/or one or more of the operations or functions of this disclosure.
- Referring to FIG. 1 again, concurrently with making the PM or subsequently, and by following the formulation of section II-I and utilizing the algorithm and system of the patent application Ser. No. 12/939,112, the system builds the Association Strength Matrix/es (ASM/s) and also keeps them in a temporary or more permanent computer readable storage medium.
- Having built at least one of the PM/s and/or one of the ASM/s, the system can proceed to evaluate at least one of the “Value Significance Measures (VSM/s)” of the partitions and OSs of the desired order from their usage and their pattern of participation in the input composition, as shown in FIG. 1.
- Having built the ASM, the system can now consider the ASM as an asymmetric directed graph, as was explained in the patent application Ser. No. 12/939,112 referenced before, and use the ASM to build several other desirable graphs or maps. One of the desired maps in this application would be a map or a plan or a route that can show the relations between the OSs of the body of knowledge based on the “most significant associates (MSA)”, which in turn can be based on their value significance and their strength of association to each other. Such a map or route can be followed by the composer module to make sure that the generated composition is coherent and sensible and represents the same essence of knowledge as the input body of knowledge. Therefore, as shown in FIG. 1, a principal map can be obtained or envisioned from which a composing backbone route or principal route is selected according to the method and algorithm that will be explained with reference to FIG. 2 a and FIG. 2 b of this application. The principal route can also be derived from the ASM directly, as exemplified in the method shown in FIG. 3.
- Also shown in FIG. 1 is the composer block or module, which composes a new composition by assembling the scored partitions of the body of knowledge based on the VSMs of the partitions, according to the backbone or the principal route/s, and by using the participation information of the partitions into each other. The composer might further have several other predetermined criteria that should be considered in composing the output composition. Such criteria could be the length or percentage ratio of the generated composition relative to the given BOK, or the style, the type of substance (verified or novel), etc. The new composition will usually be composed or built as a summarization of the body of knowledge, a general or complete overview of the body of knowledge, or a presentation of novel aspects of the BOK.
- The advantage and value of such a new composition is that important partitions having significant value in the body of knowledge are identified and recomposed in a systematic and logical manner which can be automated, while the result is readable and comprehensible by a human consumer. Moreover, and more importantly, the generated composition will not overlook important issues, unlike a human composer. A human composer can easily get confused and lose the main points due to the sheer volume, diversity or size of the information or the knowledge embedded in the body of knowledge.
- The aim is to have a much cleaner and more logical view of the body of knowledge in a much shorter and more structured composition, so that a consumer can save considerable research and trial time while being assured of access to the most valuable knowledge related to his/her subject matter/s of interest. The new compositions, or the system, which in fact could be used as a tool for knowledge seekers, may be named an answer, a summary, an essay, a response, a report, a content etc. and be used in a variety of situations depending on the output length of the generated composition.
- Referring to FIG. 2 a now, it shows one exemplary principal map of the knowledge of the input body of knowledge which can be formed, as one example, using the following protocol:
- 1. from the ASM calculate one of the VSM measures (VSM2 or the ASN, for instance, is a good quality value measure) for an initial set of OSs of interest from the BOK,
- 2. select a first set of OSs, having one or more members and possessing the most significant value from said initial set regarding a predetermined aspect, and represent said first set of OSs in the first layer of a tree-like graph or map, as shown in FIG. 2 a, as first layer nodes,
- 3. identify a desired number of most significant associates (MSA) (having for instance the highest association strength) of each member of said first set of OSs, which form the second set of OSs and are represented by corresponding nodes in the second layer; and
- 4. repeat step 3 for said second set of OSs and represent the results as nodes of the graph in the third layer, 4th layer and so forth until predetermined criteria, such as the number of layers, the number of total nodes, the minimum strength of the edges between each two nodes, and the like, are met.
- FIG. 2 a shows one exemplary embodiment of a principal map that can be derived from the ASM. The principal map can further be refined with more restrictive predetermined criteria to be used as the route or the plan for composing the new content composition. The refined map is called “the principal or backbone route” or “composing plan” here.
- FIG. 2 b shows one more exemplary principal route or composing plan. In this embodiment the principal route is the route of the strongest association to its above layer associates. The thicker line route is one exemplary principal or backbone route and is determined by:
- 1. selecting at least one OS or node from the first layer,
- 2. selecting at least one OS from the next layer having the “Most Significant Association (MSA)” with said selected OSs of the first layer, and connecting the first layer OSs with their most significant associates, e.g. those of strongest association, in the second layer, and
- 3. repeating step 2 for the most significant associates of the first layer, to find the most significant associates of the second layer to form the third layer, and so on until a predetermined criterion is met.
- The actual depictions of the graphs are not necessary for composing the new composition. Moreover the backbone route can be derived directly from the ASM or other derivative matrices. The graphs are to demonstrate that there is more than one way to compose the composition after having the ASM and/or the VSMs of the ontological subjects and/or partitions of the body of knowledge. FIG. 2 a and FIG. 2 b are just two exemplary reasonable maps that can be useful and insightful.
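The layered protocol for forming a principal map can be sketched as follows. This is a hedged illustration only: the ASM and VSM values are invented, the function name `principal_map` and its parameters are hypothetical, and skipping already placed nodes to keep the map tree-like is an assumption of this sketch rather than something the text requires. Associates of $OS_i$ are read from column i of the ASM, following the column convention used in the description of FIG. 3.

```python
def principal_map(asm, vsm, n_roots=1, n_associates=2, n_layers=3):
    """Return (layers, edges): layers of node indices and (parent, child, strength) edges."""
    n = len(asm)
    # First layer: the OSs with the most significant value.
    roots = sorted(range(n), key=lambda i: vsm[i], reverse=True)[:n_roots]
    layers, edges, seen = [roots], [], set(roots)
    for _ in range(n_layers - 1):
        next_layer = []
        for i in layers[-1]:
            # Most significant associates of OS_i: largest asm[j][i] in column i.
            candidates = sorted((j for j in range(n) if j not in seen),
                                key=lambda j: asm[j][i], reverse=True)[:n_associates]
            for j in candidates:
                seen.add(j)
                next_layer.append(j)
                edges.append((i, j, asm[j][i]))
        if not next_layer:  # stopping criterion: nothing left to place
            break
        layers.append(next_layer)
    return layers, edges

# Hypothetical asymmetric ASM and word-level VSM values for four OSs.
asm = [
    [0.0, 0.2, 0.9, 0.1],
    [0.8, 0.0, 0.3, 0.4],
    [0.5, 0.7, 0.0, 0.6],
    [0.1, 0.6, 0.2, 0.0],
]
vsm = [0.9, 0.4, 0.6, 0.2]
layers, edges = principal_map(asm, vsm)
print(layers)
print(edges)
```

Other stopping criteria named in the protocol, such as a minimum edge strength or a total node budget, could be added as extra conditions inside the loop.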
- FIG. 3 shows one actual exemplary selection process and the algorithm of finding the nodes of the principal or backbone route using the ASM and VSM.
- As seen in this exemplary embodiment, we start with the most valuable OS of order k of the composition, whose value is shown as $vsm_j^{k|l}$ in FIG. 3 and which corresponds to $OS_j^k$. Looking into the jth column of the ASM, we find the most significant associate/s of $OS_j^k$ (in this example the one that has the highest asm in column j), which in this embodiment is assumed to be $OS_i^k$. We then go to the ith column of the ASM and find its most significant associate (the one that has the highest asm in column i of the ASM), which is assumed to be $OS_p^k$ as shown in FIG. 3, and then find the strongest associate of $OS_p^k$, which was found to be $OS_q^k$, and so on. Obviously more parameters, such as the VSMs of the ontological subjects, can also be considered besides the association strength in forming a decision regarding the selection of the OSs of the composing route.
- In this way we can make a list (or an ordered set) of the $OS^k$ (nodes) on the backbone or composing route, which is shown in FIG. 3 as the “Composing Route Nodes (CRN)” vector, list, or set and is denoted by $CRN^{k|l}$ in FIG. 3. The composer can start from the first two or more of the OSs in $CRN^{k|l}$ and find the partitions that contain the selected OSs (simply by doing an AND operation on the corresponding rows of those OSs in the PM). From this set of partitions (i.e. the first selected set of $OS^l$s) a desired number is then selected based on their value significance (i.e. $VSM^{l|k}$ in Eq. 5) for inclusion in the new composed content. The same process can then be repeated for the second group of two or more OSs of $CRN^{k|l}$ (e.g. just by shifting the index in the list) to find all the desired partitions as the ingredients, or the constituent semantic parts, of the new composition.
- It is noted that various other ways of composing a new content composition can be devised without departing from the scope, spirit and teachings of the invention. For example, the process can also be done dynamically, by finding or selecting an OS for inclusion in the composing route, then finding the candidate partitions for inclusion in the new content composition, and then moving on to find the next OS of the composing route, repeating the process until certain criteria are met.
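The column-walking selection of route nodes and the AND operation over PM rows can be sketched as below. All data are invented for illustration, the function names are hypothetical, and two choices are assumptions of this sketch that the text does not fix: already visited nodes are excluded from the walk, and each window of route nodes contributes its best partition not already chosen.

```python
def composing_route(asm, vsm, length):
    """Start at the highest-VSM OS; repeatedly move to the strongest associate in its column."""
    n = len(asm)
    current = max(range(n), key=lambda i: vsm[i])
    route = [current]
    while len(route) < length:
        candidates = [i for i in range(n) if i not in route]
        if not candidates:
            break
        current = max(candidates, key=lambda i: asm[i][current])
        route.append(current)
    return route

def partitions_containing(pm, os_indices):
    """AND of the PM rows: partitions in which every listed OS participates."""
    return [j for j in range(len(pm[0])) if all(pm[i][j] for i in os_indices)]

def select_partitions(route, pm, partition_vsm, group=2, top=1):
    """Slide a window of `group` route nodes; keep the best new partitions per window."""
    chosen = []
    for start in range(len(route) - group + 1):
        hits = partitions_containing(pm, route[start:start + group])
        hits.sort(key=lambda j: partition_vsm[j], reverse=True)
        chosen.extend([j for j in hits if j not in chosen][:top])
    return chosen

# Hypothetical ASM, VSMs, PM (4 OSs x 5 partitions) and partition scores.
asm = [
    [0.0, 0.2, 0.9, 0.1],
    [0.8, 0.0, 0.3, 0.4],
    [0.5, 0.7, 0.0, 0.6],
    [0.1, 0.6, 0.2, 0.0],
]
vsm = [0.9, 0.4, 0.6, 0.2]
pm = [
    [1, 1, 0, 0, 1],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
]
partition_vsm = [0.3, 0.9, 0.5, 0.4, 0.2]
crn = composing_route(asm, vsm, length=4)
picked = select_partitions(crn, pm, partition_vsm)
print(crn, picked)
```

The dynamic variant mentioned in the text would interleave the two functions, choosing one route node, selecting its partitions, and only then choosing the next node.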
- In general, unless looking for a specific part of the map, the route usually starts from the highest valued OS (having the highest VSM regarding the important aspects of the parts of the BOK) in the first level or layer and passes through the most significant associates of each of the OSs of the earlier layer. The most significant associate can mean the OS that has the highest association strength, or those associates that have the highest VSM, or any desirable function of the association strength and VSM. In general the “Most Significant Associates of $OS_i^k$ ($MSA_i^{k|l}$)” can be given by a set or a vector:
$$MSA_i^{k|l} = f(asm_{ji}^{k|l}, VSM_j^{k|l}) \geq \gamma, \quad j = 1, 2, \ldots, N \qquad (6)$$
- where ƒ is a predefined function and γ is a predetermined value employed here as a threshold. The collection of the MSA for all the OSs can again be represented by a matrix called the “Most Significant Association Matrix” (or $MSAM^{k|l}$), for which $MSA_i^{k|l}$ is the ith row. The edges of the graph between each two nodes of the principal route can therefore be obtained from $MSAM^{k|l}$; e.g. as shown in FIG. 2 b, the edge between the nodes $OS_p^k$ and $OS_q^k$ is denoted by $msam_{pq}^{k|l}$.
- In other words, generally, the principal or backbone route can be identified from $MSAM^{k|l}$, which is based on the predetermined form of the function ƒ in Eq. 6, and the desired number of nodes in the principal route or any other constraint on the value of the elements of $MSAM^{k|l}$.
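A thresholded reading of Eq. 6 can be sketched as follows. The choice ƒ(asm, VSM) = asm · VSM, the threshold value, and the convention of storing zero for entries below the threshold are all assumptions made for this illustration; the disclosure leaves ƒ and γ open.

```python
def most_significant_associates(asm, vsm, i, gamma, f=lambda a, v: a * v):
    """Eq. 6 read as a set: indices j with f(asm_ji, VSM_j) >= gamma."""
    return [j for j in range(len(asm)) if f(asm[j][i], vsm[j]) >= gamma]

def msa_matrix(asm, vsm, gamma, f=lambda a, v: a * v):
    """MSAM: row i holds f(asm_ji, VSM_j) where it passes the threshold, else 0."""
    n = len(asm)
    rows = []
    for i in range(n):
        values = [f(asm[j][i], vsm[j]) for j in range(n)]
        rows.append([v if v >= gamma else 0.0 for v in values])
    return rows

# Hypothetical ASM and VSM values.
asm = [
    [0.0, 0.2, 0.9, 0.1],
    [0.8, 0.0, 0.3, 0.4],
    [0.5, 0.7, 0.0, 0.6],
    [0.1, 0.6, 0.2, 0.0],
]
vsm = [0.9, 0.4, 0.6, 0.2]
msa_0 = most_significant_associates(asm, vsm, i=0, gamma=0.25)
msa_1 = most_significant_associates(asm, vsm, i=1, gamma=0.25)
msam = msa_matrix(asm, vsm, gamma=0.25)
print(msa_0, msa_1)
```

With this form of ƒ, an associate needs both a reasonably strong association and a reasonably high value of its own to pass the threshold; other forms of ƒ would weight the two differently.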
- Many different composing routes or backbones can be devised, selected or identified based on the desired form and application of the generated content. For instance, a criterion for the desired content could be to have information about the relations of the OSs demonstrating a predetermined range of association strength to each other or to one of the most valued OSs. The final generated content could be a simple answer about a subject matter, a summarization of the BOK related to a subject matter, a tutorial paper about the subject matter, background information content, or content containing novel information of the BOK of a subject matter. For instance, a novel content can mostly include the less known (having lower VSM) OSs in the BOK but, optionally, with strong association to high valued OSs. For example, to emphasize the novel aspects of the BOK one can use the following VSM for $OS_i^k$:
$$VSM6_i^{k|l} = -\log_b iop_i^{k|l} \qquad (7)$$
- wherein b is the logarithm base, for which one can choose b=2 for familiarity and convenience. This value significance ($VSM6_i^{k|l}$) is in fact a function of $VSM1_i^{k|l}$ that magnifies the novelty of an OS (e.g. $OS_i^k$) in the value significance of the partitions. $VSM6_i^{k|l}$ may also be called the self-information of $OS_i^k$. A partition containing more $OS^k$ of high $VSM6^{k|l}$ scores high with regard to the novelty aspect of a partition of the BOK.
- However, optionally the scores of the partitions based on the VSM of choice can further be scaled or normalized when that is more appropriate. For instance the score or the resultant VSM of a partition (i.e. the resultant $VSM6^{l|k}$ from Eq. 5) can be divided by the number of the $OS^k$ contained in the partition, or by the total number of characters used in the partition etc., in order to have a fair comparison of the merits of a partition among a set of partitions of the BOK.
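The novelty measure of Eq. 7 and its normalized partition score can be illustrated numerically. This is a small sketch with invented probabilities; the function names are hypothetical, and normalizing by the OS count is just one of the options mentioned above.

```python
import math

def self_information(iop, base=2.0):
    """Eq. 7: VSM6_i = -log_b(iop_i). Every iop_i is assumed to lie in (0, 1]."""
    return [-math.log(p) / math.log(base) for p in iop]

def novelty_density(vsm6, pm):
    """Eq. 5 with VSM6 as the OS score, divided by the number of OSs in each partition."""
    scores = []
    for j in range(len(pm[0])):
        members = [i for i in range(len(pm)) if pm[i][j]]
        total = sum(vsm6[i] for i in members)
        scores.append(total / len(members) if members else 0.0)
    return scores

# Hypothetical independent occurrence probabilities and PM.
iop = [0.5, 0.25, 0.125]
pm = [
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
]
vsm6 = self_information(iop)     # rarer OSs get larger values
density = novelty_density(vsm6, pm)
print(vsm6, density)
```

The partition containing the two rarest OSs ends up with the highest normalized novelty score, which is the behavior the text describes.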
- In another aspect one may want to select the partitions of substance and novelty for inclusion in the generated composition, and therefore she/he might define yet another VSM to be used for evaluation of the partitions as follows:
$$VSM7_i^{k|l} = \alpha_1 VSM2_i^{k|l} + \alpha_2 VSM6_i^{k|l} \qquad (8)$$
- wherein $\alpha_1$ and $\alpha_2$ can be some preselected constants. This value significance ($VSM7_i^{k|l}$) is in fact a function of $VSM2_i^{k|l}$ and $VSM6_i^{k|l}$ (i.e. a function of $VSM2_i^{k|l}$ and $VSM1_i^{k|l}$) that can be used as a balanced measure of substance and novelty of the partitions of the BOK, employing Eq. 7. Or one may find a VSM function of the following form more appropriate for her/his type of application:
$$VSMx_i^{k|l} = -iop_i^{k|l} \cdot \log_b iop_i^{k|l} - \log_b iop_i^{k|l} = -(1 + iop_i^{k|l}) \log_b iop_i^{k|l} \qquad (9)$$
- Obviously numerous other value significances or combinations of them can be defined and introduced by those skilled in the art without departing from the scope and spirit of this invention. Depending on the aspect of the application, and as mentioned in the patent application Ser. No. 12/939,112, various “value significance measures (VSMs)” can be defined as functions of other VSMs to serve the desired style, aspect, and purpose of the content composition generation. These VSMs play a role in filtering or selecting the most suitable parts or partitions of the composition (e.g. words, sentences, paragraphs, webpages, documents, etc.) based on and for the desired application/s or goal/s.
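Eq. 8 and Eq. 9 are simple enough to check numerically. In the sketch below the VSM2 values are placeholders, since VSM2 is defined in the referenced application and not here; the weights $\alpha_1 = \alpha_2 = 0.5$ and the function names are likewise assumptions for illustration.

```python
import math

def vsm7(vsm2, vsm6, alpha1=0.5, alpha2=0.5):
    """Eq. 8: weighted sum of a substance measure (VSM2) and a novelty measure (VSM6)."""
    return [alpha1 * s + alpha2 * n for s, n in zip(vsm2, vsm6)]

def vsm_eq9(iop, base=2.0):
    """Eq. 9: -(1 + iop_i) * log_b(iop_i). Every iop_i is assumed to lie in (0, 1]."""
    return [-(1.0 + p) * math.log(p) / math.log(base) for p in iop]

iop = [0.5, 0.25]
vsm2 = [0.8, 0.2]                                    # placeholder values, not computed here
vsm6 = [-math.log(p) / math.log(2.0) for p in iop]   # Eq. 7 with b = 2
balanced = vsm7(vsm2, vsm6)
combined = vsm_eq9(iop)
print(balanced, combined)
```

Both forms of Eq. 9 agree term by term: for iop = 0.5 the expanded form gives 0.5 + 1 and the factored form gives 1.5 · 1.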
- Also, although in these preferred exemplary embodiments we use the ASM to identify the route/s and map/s, other forms of association or any measure of significance of the associations between OSs of the BOK can be used to construct and identify the backbone route or the composing plan. For instance an Ontological Subject Map (OSM), introduced in the US patent application entitled “System and Method of Ontological Subject Mapping for knowledge Processing Applications” filed on Aug. 26, 2009, application Ser. No. 12/547,879, can be used. Generally any form of graph representing the body of knowledge, such as semantic networks or maps, social networks, ontology databases, ontology trees, and the like, can be utilized for identification of a principal, backbone, or composing route.
- Referring to FIG. 4 now, it shows the composer in more specific, though still general, detail. It shows an exemplary way in which the composer composes content from the partitions of the BOK. This is one exemplary embodiment and protocol for using the contents of the BOK and the data derived from the BOK to generate a new composition of content from the BOK.
- The system can have a plurality of formats for generating content. In one exemplary and important case, assume the composer is designed to produce an authoritative article or content about the principal subject matter of the BOK.
- Such a content or article needs a title and several sections, such as an “Introduction” or background, along with a number of sections presenting enough information about the most important aspects of the subject matter of the title.
- One exemplary protocol for composing such an authoritative article is described here for two general cases:
- 1. The subject matter of interest is known and we have assembled a number of contents related to this subject matter, so that we have a body of knowledge about the subject, but it is not well structured and is dispersed, or it is simply too long to be handled by a human.
- 2. There is a body of knowledge and we do not know what it is all about.
- For both cases, the system follows the method and teachings of the current invention: it extracts the partitions (OSs) of the BOK, makes an association strength matrix for the desired OSs (usually the words or phrases used in the BOK), identifies the backbone route, obtains at least one VSM (value significance measure) for the desired OSs of the desired orders (usually the words and sentences or the paragraphs of the BOK), and holds arrays or lists of the OSs of the different orders in a database (temporarily or more permanently) together with the PM information. The system and the composer then perform the following:
- identify the most significant OSs, e.g. words or $OS^1$, of the BOK by looking at the VSM (for instance the one which has the highest association strength number, i.e. ASN, as defined in the application 61/259,640), and consider the most significantly valued OS as the main subject matter of the new composition.
- if there is more than one OS with very close VSMs, the subject matter can contain either one of them or any combination of them.
- if the subject matter identified by the system is not the same as the subject matter for which the BOK has been labeled (case 1 above), then consider said labeled subject matter as the main OS in the first layer of the principal map and proceed to the next steps.
- identify the most significant sentence or statement from the array of stored $OS^l$s containing the identified most significant OSs or the subject matter, by looking at the PM and the VSM of the sentences (which can be calculated by employing Eq. 5),
- use this statement as a title, or simply put the subject matter/s as the title. The title can include more than one subject matter.
- for the introduction section, from the ASM or principal map or backbone route, identify the most significant associates (the MSA vector of Eq. 6) of the subject matter or the title, and find a desired number of sentences from the stored arrays of the $OS^2$s of the BOK (i.e. the sentences) which contain the subject matter and one or more of the most significant associates of the subject matter.
- after the introduction section, several further sections are added. These sections follow the backbone route and include the most valuable partitions of the BOK that explain a relationship between the most significant associates of that layer of the principal route. That means identifying the partitions that contain one or more of the associates of the associates of the subject matters, or any combination of them, and including them in the current section at the predetermined place. Moreover, for example, each important section can have a title (e.g. one that indicates one of the most significant associates of the subject matter, alone or in conjunction with the subject matter), and there could be assembled one or more paragraphs, composed of one or more sentences, which contain at least one OS from the title of the section or its most significant associates. These sentences (or paragraphs) can be identified (by identifying their index) from the $MSA_i^{1|2}$ (or $MSA_i^{1|3}$) vector of each $OS_i^1$: from the PM, find the partitions in which they have appeared together and, by looking at the VSM of the sentences (or paragraphs), select the desired number of high value sentences/paragraphs that contain the associates of the $OS_i^1$s, and then retrieve them from the stored array of $OS_j^2$s (or $OS_j^3$s) of the BOK.
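The composing steps listed above can be sketched end to end on a toy BOK. Everything in this example is an assumption made for illustration: the function name `compose_article`, the hand-written sentences and scores, the representation of the MSA as a dictionary of index lists, and the decision not to reuse introduction sentences in later sections. It follows only one layer of the route and is not the full composer of FIG. 4.

```python
def compose_article(words, sentences, pm, word_vsm, msa, n_intro=2, n_per_section=1):
    """Pick a title, an introduction and one section per associate of the subject."""
    n_sent = len(sentences)
    # Sentence scores via Eq. 5.
    sent_vsm = [sum(word_vsm[i] * pm[i][j] for i in range(len(words)))
                for j in range(n_sent)]

    def best(ids, limit):
        return sorted(ids, key=lambda j: sent_vsm[j], reverse=True)[:limit]

    # Main subject matter: the highest valued word (assumed to occur in some sentence).
    subject = max(range(len(words)), key=lambda i: word_vsm[i])
    with_subject = [j for j in range(n_sent) if pm[subject][j]]
    title = sentences[best(with_subject, 1)[0]]
    # Introduction: sentences with the subject and at least one of its associates.
    intro_ids = best([j for j in with_subject
                      if any(pm[a][j] for a in msa.get(subject, []))], n_intro)
    used = set(intro_ids)
    # One section per associate, filled with sentences that also contain its associates.
    sections = {}
    for a in msa.get(subject, []):
        ids = [j for j in range(n_sent) if j not in used and pm[a][j]
               and any(pm[b][j] for b in msa.get(a, []))]
        picked = best(ids, n_per_section)
        used.update(picked)
        sections[words[a]] = [sentences[j] for j in picked]
    return {"title": title,
            "introduction": [sentences[j] for j in intro_ids],
            "sections": sections}

words = ["data", "model", "training", "noise"]
sentences = [
    "Model training needs data",
    "Data contains noise",
    "Training reduces noise in the model",
    "Data drives every model",
]
pm = [
    [1, 1, 0, 1],  # data
    [1, 0, 1, 1],  # model
    [1, 0, 1, 0],  # training
    [0, 1, 1, 0],  # noise
]
word_vsm = [0.9, 0.6, 0.4, 0.3]   # hypothetical word-level VSMs
msa = {0: [1], 1: [2]}            # hypothetical associates: data -> model -> training
article = compose_article(words, sentences, pm, word_vsm, msa)
print(article)
```

A fuller composer would walk further layers of the backbone route and apply the length, style and distance criteria discussed in the surrounding text.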
- The procedure can be repeated for different branches of the backbone route without departing too far from the principal or backbone route. Many measures of distance and metrics can be defined to show the relevance and closeness of the selected partitions in each section to the backbone route. That will guarantee a certain level of coherency and semantic relevance in the generated content.
- Furthermore, each section and sub-section can have a localized composing plan of its own. For instance, the Introduction section can be regarded as a smaller content whose structure and criteria differ from those of the other subsections, which explain the details about the most significant associates of the subject matter, and so on.
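By way of illustration only, the selection of introduction sentences described in the steps above can be sketched as follows. The function name `select_intro_sentences`, the per-sentence value list `vsm`, the simple word-matching rule, and the `limit` parameter are all assumptions made for this sketch; the disclosure itself identifies associates and values through the MSA vector and the VSM, which are not reproduced here.

```python
from typing import List, Sequence


def select_intro_sentences(
    sentences: Sequence[str],
    vsm: Sequence[float],
    subject: str,
    associates: Sequence[str],
    limit: int = 3,
) -> List[str]:
    """Pick up to `limit` high-value sentences that mention the subject
    matter and at least one of its most significant associates.

    `vsm[i]` is an assumed, precomputed value significance for sentences[i].
    """
    subject = subject.lower()
    assoc = {a.lower() for a in associates}
    candidates = []
    for idx, sentence in enumerate(sentences):
        # Crude tokenization (hypothetical): lowercase, strip "." and ","
        words = set(sentence.lower().replace(".", " ").replace(",", " ").split())
        if subject in words and words & assoc:
            candidates.append((vsm[idx], idx))
    # Highest value first; ties broken by original position
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    # Emit the chosen sentences in their original order in the BOK
    chosen = sorted(idx for _, idx in candidates[:limit])
    return [sentences[idx] for idx in chosen]
```

The chosen sentences are returned in their original order rather than in value order, which is one possible way of keeping the assembled section readable; other orderings along the backbone route are equally conceivable.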
- The block diagram of FIG. 4 is intended for generality and illustration and should not be interpreted as the only way of composing content or as a limitation on the composing methods disclosed herein. Those familiar with the art may devise other methods and systems of building the composer with fewer steps and different complexities without departing from the scope and spirit of this disclosure, which is directed to generating new composed content from a body of knowledge. The body of knowledge and/or collection of compositions in particular may include multimedia content, Unicode strings, mathematical formulas, pictures, figures, data files, etc.
- Furthermore, in case one above (case 1) the subject matter can itself be a lengthy content, or the subject matter could be extracted from content given by a user/client. For instance, a user can input or give the address of a content (e.g. a webpage) and request further investigation into this content by using the method. Alternatively, the system can extract the subject matter/s of the given content, assemble a related body or bodies of knowledge, and then perform the method of content composition.
- Referring to FIG. 5 now, it shows that the composer can further have several layers of editorial blocks that are responsible for making the generated content yet more readable, useful, coherent, and semantically and syntactically correct, so that it can adequately represent the most important desired aspects (background, novelty, all the most significant subject matters, etc.) of a BOK. As shown, the editorial levels use the backbone route (or can make yet a new route, considering the raw composed content as an input composition) and the retrieved partitions selected for inclusion in the generated content, to make sure that the desired standards of syntactical and graphical appearance, etc. are met.
- Other checking measures of quality and substance can be devised and added to the composer for better quality of the composed content. Alternatively, the content composing can be done with more than one iteration until certain measures of quality and knowledge substance are met. The preferred method and algorithm will depend on the processing power and the resources available for implementing the method and the algorithms. For instance, the generated content can again be analyzed and its principal map compared against the principal map of the original body of knowledge, or the VSM spectrum of the generated content can be compared to that of the BOK. However, the automatically generated content composition may also be further edited by human operators and editors for a final quality check.
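As a non-limiting sketch of such a check, the VSM spectrum of the generated content can be compared with that of the BOK using a vector similarity. Cosine similarity is used here purely as a stand-in metric chosen for this example; the dictionaries mapping each OS to a value, the function names, and the `threshold` of 0.8 are likewise hypothetical and are not taken from the disclosure.

```python
import math
from typing import Dict


def spectrum_similarity(bok_vsm: Dict[str, float], gen_vsm: Dict[str, float]) -> float:
    """Cosine similarity between two value-significance spectra keyed by OS."""
    keys = set(bok_vsm) | set(gen_vsm)
    # An OS missing from one spectrum contributes a value of zero
    dot = sum(bok_vsm.get(k, 0.0) * gen_vsm.get(k, 0.0) for k in keys)
    norm_b = math.sqrt(sum(v * v for v in bok_vsm.values()))
    norm_g = math.sqrt(sum(v * v for v in gen_vsm.values()))
    if norm_b == 0.0 or norm_g == 0.0:
        return 0.0
    return dot / (norm_b * norm_g)


def passes_quality_check(
    bok_vsm: Dict[str, float], gen_vsm: Dict[str, float], threshold: float = 0.8
) -> bool:
    """Assumed acceptance rule: the spectra must be at least `threshold` similar."""
    return spectrum_similarity(bok_vsm, gen_vsm) >= threshold
```

A composer could, for example, repeat the composing step with a different selection of partitions whenever `passes_quality_check` returns `False`, which corresponds to the iterative option mentioned above.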
- Moreover, many other quantitative measures of the quality of the generated content can be devised without departing from the scope, spirit, and goal of the current invention. For instance, one can measure the real information of the BOK (using, for instance, the “differential conditional entropy measure” introduced in the patent application Ser. No. 12/939,112) and that of the generated content for comparison.
- It is worth mentioning that the method of generating content compositions according to this disclosure and the accompanying references will present the most credible and valuable parts of the body of knowledge (with regard to the desired aspect/s of the partitions), and therefore the generated content will carry a high level of confidence in accuracy and substance.
- Referring to FIG. 6 now, it shows an important application of the method and the system of automatic content generation from a body of knowledge in response to a user's request. The system of FIG. 6 will assemble a body of knowledge for the client or user and then generate the requested form of the content with the predetermined or optional formats for the user.
- The user's request can be a keyword, a question posed in natural language, or in general any content, short or long. The system may first extract the OSs of the input request, find the keywords from the input request, and assemble a BOK that is related to these keywords. Consequently, as shown in FIG. 6, by following the method and algorithms of this application, the system provides the desired content in the form of an answer, a coherent summarization of the assembled BOK, a content explaining the novel aspects of the keywords in the context of the assembled BOK, a tutorial content, and the like, as a service in response to the user's request.
- The input request can further be an existing content such as a paper, a webpage, or a pre-built body of knowledge for which a user wants a composed content or further investigation on a larger scale of related knowledge and information. In this case a user can request a service for investigating the submitted paper or content and demand a report of the investigation from the system in a variety of forms, such as the merit of the submitted content in comparison to a larger body of knowledge in the same field or context. The user can also demand an authoritative report, summary, or essay related to the subject matter/s of the submitted content, etc. Those skilled in the art can envision various applications and further modes of operation for the system and methods disclosed here without departing from the scope and spirit of the invention.
- FIG. 7 shows an exemplary application system and/or an online service provider system in which web service appliances are provided in the form of storage, servers, software, and hardware. These may contain pre-generated content for a list of subject matters, stored for easy retrieval in response to a user's request for content, or may create a content composition in response to a client input. The building blocks of the composer service engine are explained in FIG. 7 itself.
- Referring to FIG. 7, for instance, if the system has already generated content for the subject matter of the client's request, then it will return the premade content related to the subject matter of the client's request. If the system does not have the requested content, or does not have it in the requested format, then it will generate content with the desired format using the methods and systems of composing new content of the invention and by having access to repositories of knowledge and information. The repositories of knowledge and information can be the available databases, corporate database/s, a publisher's content collection, in-house repositories, or otherwise, such as the database of a search engine or the whole internet. They can also include all types of different information representations, such as multimedia.
- The system repositories of the premade content can further be classified under different subject matters, keywords, or possibly online journals, encyclopedias, wiki groups, and the like. The system can at the same time work in real time to constantly incorporate the latest findings in a body of knowledge related to a subject matter and modify the generated content to reflect the latest findings, or add more content to its repositories. Furthermore, the system can analyze a content or body of knowledge submitted by a user, or expand the submitted content or body of knowledge, and generate new content compositions of the requested format, style, substance, etc. on demand.
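The retrieve-or-generate behaviour described for FIG. 7 can be pictured with the following minimal sketch. The class name `ComposerService`, the in-memory dictionary used as the repository of premade content, and the `compose` callback standing in for the composing method are all assumptions introduced for illustration; an actual service would sit on the storage and servers described above.

```python
from typing import Callable, Dict, Tuple


class ComposerService:
    """Return premade content when available, otherwise compose and store it."""

    def __init__(self, compose: Callable[[str, str], str]) -> None:
        # `compose(subject, fmt)` is a hypothetical stand-in for the composer
        self._compose = compose
        self._store: Dict[Tuple[str, str], str] = {}

    def request(self, subject: str, fmt: str) -> str:
        # Premade content is keyed by subject matter and requested format
        key = (subject.lower(), fmt)
        if key not in self._store:
            self._store[key] = self._compose(subject, fmt)
        return self._store[key]
```

Keying the store on both the subject matter and the format reflects the passage above, in which content is regenerated when it exists but not in the requested format.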
- In conclusion, in this disclosure it is noted that a document representing the collective knowledge of a diverse set of compositions containing information about a topic should first of all cover the most important aspects of the topic and its associated subtopics. Secondly, it should contain the information according to the state of the collective knowledge and understanding of the masses about that topic. Thirdly, it should follow a logical path in connecting the information therein, so that it is easy for a human to comprehend and follow the relations between the most important parts of knowledge describing, analyzing, or supporting a topic.
- Moreover, the methods, algorithms, and systems disclosed in this application offer a great benefit to knowledge professionals and knowledge seekers by shortening their research time significantly, while the content generated according to the teachings, systems, and services proposed in this application can give them a valid account of a body of knowledge, without bias, overlooked facts, limitation on the subject matters or language, or compromise on the quality of knowledge. An important advantage of the methods disclosed herein is that they do not rely on the individual semantic or syntactic symbols and/or terms of the composition in order to provide a satisfactory service. The systems, methods, and algorithms explained here are expected to accelerate the rate of knowledge discovery significantly, and to make the tasks of learning, knowledge acquisition, research, and analysis of knowledge and information much more efficient and effective.
- It is understood that the preferred or exemplary embodiments and examples described herein are given to illustrate the principles of the invention and should not be construed as limiting its scope. Various modifications to the specific embodiments, formulations, and algorithms could be introduced by those skilled in the art without departing from the scope and spirit of the invention as set forth in the following claims.
Claims (20)
1. A non-transitory computer readable medium having computer executable instructions stored thereon that when executed by one or more processors, cause to construct one or more data arrays, from a composition of ontological subjects, respective of at least one route on a graph, said graph representing connections and associations of ontological subjects and/or value significances of ontological subjects of the composition.
2. The storage medium of claim 1 , wherein said instructions comprises instructions for calculating at least one quantitative measure indicative of the associations of the ontological subjects of the composition and/or one or more quantities indicative of one or more value significances of at least one ontological subject and/or one or more quantities indicative of one or more value significances of at least one partition of the composition.
3. The storage medium of claim 1 , wherein said instructions further comprises instructions for using the data of said one or more data arrays respective of the at least one route to select one or more partitions of the composition.
4. The storage medium of claim 1 , said instructions further comprises instructions for selecting one or more partitions of the composition based on a value of at least one function of at least one of said indicative quantities of value significance of the ontological subjects and/or those of the partitions.
5. The storage medium of claim 1 , wherein said composition is an assembled body of knowledge related to at least one of the ontological subjects contained in the body of knowledge.
6. The computer implemented method of claim 6 , wherein one or more constituent ontological subjects of the selected partitions are replaced with other ontological subjects that may not be a part of the partitions of the body of knowledge.
7. A method of calculating an association strength value for a pair of ontological subjects of a predefined order of a body of knowledge comprising:
partitioning, using one or more processors and one or more data storing mediums, the body of knowledge into plurality of partitions;
determining number of co-occurrences of the pair in the partitions of the body of knowledge;
estimating frequency of occurrences of one or both of the ontological subjects of said pair;
calculating the association strength value for said pair of ontological subjects of the predefined order of the body of knowledge as a function of said number of co-occurrences of the pair and frequency of occurrences of at least one of the ontological subjects of said pair of ontological subjects of the predefined order.
8. The method of claim 7 wherein further a composing route is identified using said association strengths of pairs of ontological subjects of the predefined order of the body of knowledge.
9. The method of claim 7 ; further comprising calculating a significance value for at least one of said ontological subject of the predefined order as a function of frequency of occurrences of one or more ontological subjects of the predefined order and/or co-occurrences of said at least one ontological subject of the predefined order with one or more of said ontological subjects of the predefined order.
10. The method of claim 9 , wherein further a composing route is identified as a function of said association strengths of one or more pairs of ontological subjects of predefined order of the body of knowledge and one or more said significance values of the ontological subjects of predefined order.
11. A method of generating content from a body of knowledge comprising:
finding, using one or more processors and one or more data storing mediums, connections and associations between constituent ontological subjects of the body of knowledge;
selecting one or more of the ontological subjects according to one or more predefined types of connections between the ontological subjects; and
selecting one or more partitions of the body of knowledge having predefined relations with one or more of said selected ontological subjects thereby to assemble a content composition employing one or more of said selected partitions.
12. The method of claim 11 , wherein the body of knowledge is assembled in response to a given content.
13. The method of claim 11 , wherein one or more constituent ontological subjects of the selected partitions are replaced with other ontological subjects that may not be a part of the partitions of the body of knowledge.
14. A computer implemented method of generating content composition comprising:
accessing, using one or more processors and one or more data storing mediums, a body of knowledge;
identifying, using one or more processors or one or more data storing mediums, at least one composing route or map; and
selecting, using one or more processors or one or more data storing mediums, one or more partitions of the body of knowledge according to the composing route or map and assembling a content composition.
15. The computer implemented method of claim 14 , wherein the body of knowledge is partitioned to a plurality of partitions and one or more partitions of the body of knowledge is decomposed to their constituent ontological subjects assigned with an order lower than the order of the partitions.
16. The computer implemented method of claim 14 , wherein the composing route or map is identified based on a function of one or more quantities respective of one or more of the followings:
co-occurrence numbers of said lower order ontological subjects;
association strengths of said lower order ontological subjects;
probability of occurrences of the lower order ontological subjects of the body of knowledge;
value significances of the lower order ontological subjects, and
value significances of the partitions of said body of knowledge.
17. The computer implemented method of claim 14 , wherein the composed content includes at least one partition of the body of knowledge having certain predetermined quantity level of at least one type of value significance measures and contain one or more ontological subjects from:
the ontological subjects on the composing route; and
associates of the ontological subjects of the composing route.
18. The computer implemented method of claim 14 , wherein the composed content is about one or more of predetermined ontological subjects.
19. The computer implemented method of claim 14 further comprising using instructions executable by one or more processing devices to perform a method for identifying one or more ontological subjects of a composition or the body of knowledge, comprising:
instructions for calculating quantities indicative of association strengths of the ontological subjects of the composition to each other;
instructions for calculating quantities indicative of at least one type of value significance of the ontological subjects of the composition;
instructions for identifying a set of ontological subjects based on their association strengths and/or the value significance quantities, wherein said set has at least one member.
20. The computer-readable medium of claim 14 , wherein said body of knowledge is assembled for an input content.
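For illustration only, the association strength calculation recited in claim 7 can be sketched as below. The claim requires only that the value be a function of the co-occurrence count of the pair and the frequency of occurrence of at least one member; the particular ratio used here (co-occurrences divided by the number of partitions containing the second subject) is an assumption of this sketch and not the formula of the disclosure. The function name and the representation of each partition as a list of ontological subjects are likewise hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def association_strengths(partitions: List[List[str]]) -> Dict[Tuple[str, str], float]:
    """Map each ordered pair (a, b) to an assumed association strength.

    Each partition is given as the list of ontological subjects it contains.
    """
    occurrences = Counter()     # number of partitions containing each subject
    co_occurrences = Counter()  # number of partitions containing both members
    for partition in partitions:
        subjects = sorted(set(partition))
        occurrences.update(subjects)
        for a, b in combinations(subjects, 2):
            co_occurrences[(a, b)] += 1
            co_occurrences[(b, a)] += 1
    # Assumed function: co-occurrence count normalized by the frequency of b
    return {
        (a, b): count / occurrences[b]
        for (a, b), count in co_occurrences.items()
    }
```

Because the normalization uses only the second subject's frequency, the resulting strengths are asymmetric, so the value for (a, b) need not equal the value for (b, a).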
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/018,102 US20140006317A1 (en) | 2009-11-23 | 2013-09-04 | Automatic content composition generation |
US14/616,687 US9679030B2 (en) | 2008-07-24 | 2015-02-07 | Ontological subjects of a universe and knowledge processing thereof |
US14/694,887 US9684678B2 (en) | 2007-07-26 | 2015-04-23 | Methods and system for investigation of compositions of ontological subjects |
US15/589,914 US10846274B2 (en) | 2007-07-26 | 2017-05-08 | Ontological subjects of a universe and knowledge representations thereof |
US15/597,080 US10795949B2 (en) | 2007-07-26 | 2017-05-16 | Methods and systems for investigation of compositions of ontological subjects and intelligent systems therefrom |
US17/080,245 US20210073191A1 (en) | 2007-07-26 | 2020-10-26 | Knowledgeable Machines And Applications |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US26368509P | 2009-11-23 | 2009-11-23 | |
US12/946,838 US8560599B2 (en) | 2009-11-23 | 2010-11-15 | Automatic content composition generation |
US14/018,102 US20140006317A1 (en) | 2009-11-23 | 2013-09-04 | Automatic content composition generation |
Related Parent Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/908,856 Continuation US20110093343A1 (en) | 2007-07-26 | 2010-10-20 | System and Method of Content Generation |
US12/946,838 Division US8560599B2 (en) | 2007-07-26 | 2010-11-15 | Automatic content composition generation |
US13/962,895 Division US8793253B2 (en) | 2007-07-26 | 2013-08-08 | Unified semantic ranking of compositions of ontological subjects |
US13/962,895 Continuation-In-Part US8793253B2 (en) | 2007-07-26 | 2013-08-08 | Unified semantic ranking of compositions of ontological subjects |
US14/694,887 Division US9684678B2 (en) | 2007-07-26 | 2015-04-23 | Methods and system for investigation of compositions of ontological subjects |
US15/589,914 Division US10846274B2 (en) | 2007-07-26 | 2017-05-08 | Ontological subjects of a universe and knowledge representations thereof |
Related Child Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/939,112 Continuation US8401980B2 (en) | 2007-07-26 | 2010-11-03 | Methods for determining context of compositions of ontological subjects and the applications thereof using value significance measures (VSMS), co-occurrences, and frequency of occurrences of the ontological subjects |
US14/151,022 Continuation US9613138B2 (en) | 2007-07-26 | 2014-01-09 | Unified semantic scoring of compositions of ontological subjects |
US14/274,731 Continuation US20140258211A1 (en) | 2007-07-26 | 2014-05-11 | Interactive and Social Knowledge Discovery Sessions |
US14/694,887 Continuation US9684678B2 (en) | 2007-07-26 | 2015-04-23 | Methods and system for investigation of compositions of ontological subjects |
Publications (1)
Publication Number | Publication Date |
---|---|
US20140006317A1 (en) | 2014-01-02 |
Family
ID=44062892
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/946,838 Active 2031-09-25 US8560599B2 (en) | 2007-07-26 | 2010-11-15 | Automatic content composition generation |
US14/018,102 Abandoned US20140006317A1 (en) | 2007-07-26 | 2013-09-04 | Automatic content composition generation |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/946,838 Active 2031-09-25 US8560599B2 (en) | 2007-07-26 | 2010-11-15 | Automatic content composition generation |
Country Status (2)
Country | Link |
---|---|
US (2) | US8560599B2 (en) |
CA (1) | CA2722287A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11107096B1 (en) * | 2019-06-27 | 2021-08-31 | 0965688 Bc Ltd | Survey analysis process for extracting and organizing dynamic textual content to use as input to structural equation modeling (SEM) for survey analysis in order to understand how customer experiences drive customer decisions |
WO2022226643A1 (en) * | 2021-04-27 | 2022-11-03 | Learnexperts Edtech Inc. | System and method for generating content based on other source content |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9595298B2 (en) | 2012-07-18 | 2017-03-14 | Microsoft Technology Licensing, Llc | Transforming data to create layouts |
US10282069B2 (en) | 2014-09-30 | 2019-05-07 | Microsoft Technology Licensing, Llc | Dynamic presentation of suggested content |
US20160092419A1 (en) * | 2014-09-30 | 2016-03-31 | Microsoft Technology Licensing, Llc | Structured Sample Authoring Content |
US10474726B2 (en) | 2015-01-30 | 2019-11-12 | Micro Focus Llc | Generation of digital documents |
US10078632B2 (en) * | 2016-03-12 | 2018-09-18 | International Business Machines Corporation | Collecting training data using anomaly detection |
CN108470025A (en) * | 2018-03-21 | 2018-08-31 | 北京理工大学 | Partial-Topic probability generates regularization own coding text and is embedded in representation method |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030028500A1 (en) * | 2001-06-21 | 2003-02-06 | Jameson Kevin Wade | Collection knowledge system |
US20050038805A1 (en) * | 2003-08-12 | 2005-02-17 | Eagleforce Associates | Knowledge Discovery Appartus and Method |
US20080154992A1 (en) * | 2006-12-22 | 2008-06-26 | France Telecom | Construction of a large coocurrence data file |
US20090030897A1 (en) * | 2007-07-26 | 2009-01-29 | Hamid Hatami-Hanza | Assissted Knowledge Discovery and Publication System and Method |
US7496593B2 (en) * | 2004-09-03 | 2009-02-24 | Biowisdom Limited | Creating a multi-relational ontology having a predetermined structure |
US20100174526A1 (en) * | 2009-01-07 | 2010-07-08 | Guangsheng Zhang | System and methods for quantitative assessment of information in natural language contents |
US8041702B2 (en) * | 2007-10-25 | 2011-10-18 | International Business Machines Corporation | Ontology-based network search engine |
2010
- 2010-11-15 CA CA2722287A (CA2722287A1), status: Abandoned
- 2010-11-15 US US12/946,838 (US8560599B2), status: Active
2013
- 2013-09-04 US US14/018,102 (US20140006317A1), status: Abandoned
Also Published As
Publication number | Publication date |
---|---|
US8560599B2 (en) | 2013-10-15 |
US20110125837A1 (en) | 2011-05-26 |
CA2722287A1 (en) | 2011-05-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |