WO2002021270A1 - Method and system for creating a database having a know-how structure - Google Patents

Method and system for creating a database having a know-how structure

Info

Publication number
WO2002021270A1
WO2002021270A1 PCT/JP2000/006041 JP0006041W WO0221270A1 WO 2002021270 A1 WO2002021270 A1 WO 2002021270A1 JP 0006041 W JP0006041 W JP 0006041W WO 0221270 A1 WO0221270 A1 WO 0221270A1
Authority
WO
WIPO (PCT)
Prior art keywords
know
database
data
knowledge
name
Prior art date
Application number
PCT/JP2000/006041
Other languages
English (en)
Japanese (ja)
Inventor
Tadamitsu Ryu
Original Assignee
Cai Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cai Co., Ltd. filed Critical Cai Co., Ltd.
Priority to JP2002524817A priority Critical patent/JPWO2002021270A1/ja
Priority to PCT/JP2000/006041 priority patent/WO2002021270A1/fr
Priority to TW089119249A priority patent/TW498229B/zh
Publication of WO2002021270A1 publication Critical patent/WO2002021270A1/fr

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • the present invention relates to a method for constructing a database having a know-how structure and a database constructing system having a know-how structure.
  • the present invention relates to a method for constructing a database having a know-how structure capable of integrating an existing relational database and an object-oriented database.
  • the present invention relates to a method and a data construction system having a know-how structure.
  • the information contained in the target material is stored by dividing the class that defines the structure and the instance that records the information according to such a structure, and stores the function and meaning of the data.
  • Representative methods are to organize the information in a tree structure with emphasis on the data, and to manage the target materials in a tabular format as seen in relational databases. If a specific structure such as a tree structure is adopted, specific information can be easily analyzed and searched, but analyzing and searching other information becomes very difficult.
  • a simple object-oriented database is simply composed of classes and instances as shown in Fig. 12 (a).
  • A1 to A3 indicate the "value concept" to be recorded in the instance. In the instance, "value" is recorded according to the class C rule described in address X.
  • the data class located at the lower level is a lower concept of the data class located at the upper level. It is possible to easily search for a desired data by flowing down the data.
  • concepts such as superordinate concepts and subordinate concepts can be converted into data, but implicit intellectual cases that cannot be understood as concepts cannot be converted into data.
  • a relational database can be simply described as consisting of individual records, each composed of schemas S1 to Sn and a plurality of tuples T11 to Tmn as their values, as shown in FIG. It has a data structure in which records are recorded in a table format. Therefore, when the data contains items other than the schema that existed from the beginning, the tuples corresponding to such items cannot be converted to data. To convert them into data, the entire database must be redesigned; once designed, the structure cannot be changed halfway. Because a relational database requires a whole new database to be redesigned, it cannot cope with real-world data that is updated daily.
  • the present invention responds to the above-mentioned demands by revising the conventional database structure consisting of names and values, and adding values (knowledge) and attributes (how) including constraints such as names, types, and possible ranges.
  • An object of the present invention is to provide a method for constructing a database having a know-how structure and a database construction system having a know-how structure, which can be referred to.
  • a first aspect of the present invention is a method for constructing materials belonging to many industries and fields as a database having a know-how structure, wherein words are classified for each name and the name as a classification name is provided.
  • a database having a know-how structure including a step of making know-how, and a step of assigning an ID to the know-how-made intellectualization data and storing it in the knowledge-making database To provide a method of construction.
  • the “name as a classification name” means “word type + property/characteristic”, such as “person name”, “place name”, “degree adjective”, “place noun”, “action verb”, etc.
  • the “word type” is a classification that can categorize words, such as parts of speech. Given the nature of the word, certain restrictions apply to the word itself and/or to other words in the text in which the word appears. For example, in the case of “person name”, the content or value contains characters, and there can be no numbers (excluding kanji numerals) or symbols.
  • the verb “go” (or its stem “line”) is connected with words representing “who,” “with whom,” “when,” “for what,” “where,” and “by what means.”
  • an intellectualized word is created by using the “name as a classification name” and the “restrictions on the type, size, and range that the name can take” as attributes, and the contents of the name as “value” to create an intellectual word. Recorded / stored in a structured word dictionary.
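As a rough illustration of the dictionary entry described above, the following Python sketch models a knowledge word as a classification name, a constraint (attribute), and a value. The class name `KnowledgeWord`, the field layout, and the `no_digits_or_symbols` check are assumptions made for this example; the source does not specify a concrete data layout or how constraints are evaluated.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class KnowledgeWord:
    """One entry of a knowledge word dictionary (illustrative layout only)."""
    name: str                          # classification name, e.g. "person name"
    constraint: Callable[[str], bool]  # attribute: restriction on allowed values
    value: str                         # the content of the name

    def is_valid(self) -> bool:
        # A value is acceptable only if it satisfies the name's constraint.
        return self.constraint(self.value)


def no_digits_or_symbols(text: str) -> bool:
    # Hypothetical "person name" restriction: alphabetic characters only.
    return text.isalpha()


ok = KnowledgeWord("person name", no_digits_or_symbols, "Tatsuzaki")
bad = KnowledgeWord("person name", no_digits_or_symbols, "R2-D2")
print(ok.is_valid(), bad.is_valid())
```

Keeping the constraint next to the name is what lets document analysis reject candidate values that cannot belong to a given classification.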
  • Various types of input information are converted to text data: a translator is used for foreign language documents, an OCR for printed materials, a voice recognition device for voice, a format converter for other types of electronic information, and an image recognition device for video information. The text data is then subjected to word decomposition and document analysis. At this time, correct and accurate document analysis is obtained by utilizing the knowledge words stored in the above-mentioned knowledge word dictionary, particularly the restrictions held as attributes. In the method of the present invention, the result of such document analysis is converted into know-how as a set of element units including attributes and values, and is used as knowledge data.
  • instead of the attribute, a name as a classification name configured to refer to the attribute in the knowledge word dictionary may be used, so that the knowledge data is converted into know-how in the form of a set of element units consisting of the name and the value.
  • the knowledge-based data that has been converted into know-how in this way is provided with an ID for discriminating it from others and stored in the knowledge-based database.
  • each record of the database is selected for a database already constructed as a relational database. And a step of converting a plurality of tuples of the record into a set of values and converting the schema into a set of attribute names.
  • a relational database many records consisting of multiple tuples corresponding to schemas are stored in a table format. First, this is selected for each record, the schema and the data consisting of one record are conceived, and the tuple set of the one record is determined. Replace the value set with the schema as the attribute name set. Thereby, the form is the same as the form of a set of element units composed of the attribute name and the value described in claim 1. That is, by such replacement, it becomes possible to make the relational database know-how, and it can be handled in the same manner as the knowledge data of the present invention.
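The replacement described above (schema becomes the attribute name set, the tuples of one record become the value set) can be sketched in a few lines of Python. The function name `records_to_knowledge` and the sample schema and rows are hypothetical and only serve to illustrate the idea.

```python
def records_to_knowledge(schema, rows):
    """Turn each relational record into a set of (attribute name, value) units."""
    knowledge = []
    for row in rows:
        if len(row) != len(schema):
            raise ValueError("record does not match schema")
        # One record becomes one piece of knowledge data: name -> value.
        knowledge.append(dict(zip(schema, row)))
    return knowledge


# Hypothetical schema and records, used only for illustration.
schema = ["person name", "place name", "action verb"]
rows = [("Tatsuzaki", "Tokyo", "live"), ("Tatsuzaki", "school", "go")]
knowledge = records_to_knowledge(schema, rows)
print(knowledge[0])
```

Because each record is converted independently, a record with additional items would simply yield a larger set of element units, without redesigning a table.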
  • a method for constructing a database having the know-how structure according to the first aspect, further applied to a database already constructed as an object-oriented database.
  • the method is characterized by including a step of converting the instances of the database into a set of values, converting the classes into a set of attribute names, and thereby converting them into know-how.
  • an object-oriented database is composed of a class indicating the number and arrangement order of data and an instance indicating its values. If the class is replaced with a set of attribute names and the instance is replaced with a set of values, the form is the same as the form of a set of element units consisting of attribute names and values described in claim 1. That is, by such replacement, it becomes possible to make know-how of the object-oriented database, and it can be handled in the same manner as the knowledge-based data of the present invention.
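A minimal Python sketch of the class/instance replacement follows. The `Record` class, its `fields` tuple, and the `instance_to_knowledge` helper are illustrative assumptions; the source only states that the class supplies the attribute names and the instance supplies the values.

```python
class Record:
    """Hypothetical class: fixes the number and order of the data items."""
    fields = ("A1", "A2", "A3")

    def __init__(self, *values):
        if len(values) != len(self.fields):
            raise ValueError("instance does not match class definition")
        self.values = values


def instance_to_knowledge(obj):
    # Class field names become attribute names; instance contents become values.
    return dict(zip(type(obj).fields, obj.values))


unit_set = instance_to_knowledge(Record("V1", "V2", "V3"))
print(unit_set)
```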
  • a habit cache memory in which a predetermined number of items with a large number of retrievals are extracted and recorded in a rewritable manner. When the database is accessed for a search, the habit cache memory is searched first.
  • a fifth aspect of the present invention there is provided the method for constructing a database having a know-how structure according to the fourth aspect, wherein a predetermined number of names having a high frequency of use in the intellectualized data frequently extracted as search targets And a step of creating a relational database having a schema as a schema and recording it in a cache memory.
  • the first thing that should be stored in the cache memory is the name of the attribute that appears frequently in the knowledge-based data that is frequently extracted as a search target.
  • This is a relational database created.
  • a predetermined number of names that are frequently used are selected as schemas, knowledgeable data corresponding to such schemas are created in a table format, and a relational database is obtained.
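The first cache mode can be sketched as follows: count how often each name appears across the frequently searched knowledge data, take the most frequent names as the schema, and lay the data out as a table. The function `build_cache_table`, the `top_n` parameter, and the sample data are assumptions for illustration; missing values are represented here as `None`, which the source does not specify.

```python
from collections import Counter


def build_cache_table(knowledge_items, top_n):
    """Build a small relational table whose schema is the top_n most frequent names."""
    counts = Counter(name for item in knowledge_items for name in item)
    schema = [name for name, _ in counts.most_common(top_n)]
    # Each piece of knowledge data becomes one row; absent names yield None.
    rows = [tuple(item.get(name) for name in schema) for item in knowledge_items]
    return schema, rows


# Hypothetical frequently searched knowledge data.
items = [
    {"person name": "Tatsuzaki", "place name": "Tokyo", "animal": "dog"},
    {"person name": "Tatsuzaki", "place name": "school"},
    {"person name": "Ryu"},
]
schema, rows = build_cache_table(items, top_n=2)
print(schema)
print(rows)
```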
  • the method for constructing a database having a know-how structure according to the fourth aspect, wherein some element units are common to the intellectualized data frequently extracted as a search target.
  • the method includes the steps of creating an object-oriented database in which the partial know-how data consisting of the common part is placed at the higher level and the know-how data including the differing element units is placed at the lower level, and recording it in the habit cache memory.
  • the second thing that should be stored in the habit cache memory is an object-oriented database created by combining, into higher-level concepts, the element units that are common to the intellectualized data frequently extracted as search targets. Among the knowledge data that is frequently extracted as search targets, those from which a tree-structured database can be created are prepared in this way to further speed up the search.
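One way to picture the second cache mode is to factor the element units shared by every item into a higher-level node and keep only the differing units below it. The sketch below assumes a non-empty list of items with hashable values; the function name `split_common` and the two-level result are simplifications chosen for this example, not a structure given in the source.

```python
def split_common(knowledge_items):
    """Return (common element units, per-item differing element units).

    Assumes knowledge_items is non-empty and all values are hashable.
    """
    common = set(knowledge_items[0].items())
    for item in knowledge_items[1:]:
        common &= set(item.items())
    parent = dict(common)  # higher level: units shared by every item
    children = [
        {k: v for k, v in item.items() if k not in parent}  # lower level
        for item in knowledge_items
    ]
    return parent, children


items = [
    {"person name": "Tatsuzaki", "place name": "Tokyo"},
    {"person name": "Tatsuzaki", "place name": "school"},
]
parent, children = split_common(items)
print(parent, children)
```

A search that matches the shared part only has to descend into the lower-level entries, which is the tree-style speed-up the text refers to.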
  • the knowledge-based data frequently extracted as a search target is converted into the know-how related to the absolute truth.
  • the third thing that should be stored in the cache memory is that if there is a large number of identical data on the relative truth that can change over time, this is used as the know-how data on the absolute truth. It is a replacement.
  • the search speed is improved by compressing the number of data.
  • a method for constructing a database having a know-how structure according to the fourth aspect, wherein knowledge-based data which is frequently extracted as a search target is classified by industry field and the degree of relevance between the words that appear in it is determined, the method being characterized by including a step of selecting, for each word, related words from those with the highest relevance and recording them in a cache memory.
  • the fourth thing that should be stored in the habit cache memory is the related words derived from the degree of relevance between words in each scene.
  • Related words are intended to allow the database to be searched with one word in place of another when the two words have frequently occurred together in a certain number of past knowledge data. An example is searching for the activity of a single player in a certain sport by searching for other star players who are awarded in the same sport. This method is used when the desired search results are not obtained by other means.
  • a second aspect of the present invention is a database construction system in which materials belonging to many industrial fields are accumulated as data having a know-how structure, comprising: an intellectualized word dictionary composed of a number of intellectualized words, each having as attributes a name as a word classification name and restrictions such as the type and size of the name, and having as a value the content of the name; control means for performing word decomposition and document analysis on a sentence input from the input means with reference to the intellectualized word dictionary, and for making know-how as a set of element units consisting of the value and either the attribute or the name which can refer to the attribute in the intellectualized word dictionary; and a knowledge database for assigning IDs to the know-how-ized knowledge data and storing it.
  • a database construction system having the know-how structure according to the ninth aspect, wherein the control means includes R-DB know-how control means which, for data already constructed as a relational database, selects each record of the database and makes know-how with the tuples of the record as the values and the schema as the attribute names.
  • the present invention according to claim 11 is a database construction system having a know-how structure according to claim 9, wherein the control means includes O-DB know-how control means which, for data already constructed as an object-oriented database, converts the instances into values and the classes into attribute names.
  • a database construction system having a know-how structure according to any one of the ninth to eleventh aspects, wherein a predetermined number is obtained for a large number of searches.
  • the present invention is characterized in that it includes a customizable cache memory that is recorded in a rewritable manner, and is configured to first search the customizable cache memory when the database is accessed for search.
  • the control means uses a name in the knowledge-based data frequently extracted as a search target.
  • a feature is that a relational database having a schema of a predetermined number of names with high frequency as a schema is created and recorded in a cache memory.
  • the present invention according to claim 14 is a database construction system having a know-how structure according to claim 12, wherein the control means is configured so that, if some element units are common to the knowledge-based data frequently extracted as a search target, an object-oriented database in which the partial know-how data consisting of the common parts is at the top and the know-how data containing the differing element units is at the bottom is created and stored in the cache memory.
  • the control means classifies the knowledge data frequently extracted as a search target into know-how data on absolute truth and know-how data on relative truth that can change with time, and if the latter exists in large numbers, replaces it with know-how data on absolute truth and records it in the cache memory.
  • the present invention according to claim 16 is a database construction system having a know-how structure according to claim 12, wherein the control means converts the knowledge-based data frequently extracted as a search target into industry-specific fields. (Scenes) to determine the degree of relevance between the words that appear in it, and for each word, select the relevant word from those with a high degree of relevance and record it in the cache memory. It is characterized by.
  • FIG. 1 is a flowchart of one embodiment of a method for constructing a database having a know-how structure according to the first embodiment of the present invention.
  • FIG. 2 is a table showing several examples of knowledge words.
  • FIGS. 3 (a) and (b) are explanatory diagrams each showing a state in which the sentence cited as an example sentence is word-decomposed.
  • FIGS. 4 (a) and (b) are explanatory diagrams showing an example of knowledge data obtained from the documents of FIGS. 3 (a) and (b).
  • Figure 5 is a table showing a conventional relational database.
  • FIG. 6 is an explanatory diagram of a relational database obtained by selecting one record from the relational database shown in FIG.
  • FIG. 7 is an explanatory diagram of the knowledge data obtained by replacing the tuples of the one record shown in the relational database of FIG. 6 with values and the schema with attribute names.
  • FIG. 8 is a diagram showing a conventional object-oriented database.
  • FIG. 9 is an explanatory diagram showing a procedure for creating an object-oriented database in which partial knowledge data including a common part is at the top and know-how data including different element units is at the bottom.
  • FIG. 10 is a table for explaining an example of a method of obtaining a related word having a high degree of relevance for a certain word.
  • FIG. 11 is a flowchart showing a flow of an embodiment of a method for constructing a database having a know-how structure according to the present invention.
  • FIGS. 12 (a) and (b) are a diagram for explaining the data structure in a conventional object-oriented database and a diagram for explaining the data structure in the data space, respectively. It is a schematic diagram.
  • FIG. 1 is a block diagram for explaining an overall image of various aspects of the present invention.
  • Reference numeral 10 indicates a set of materials belonging to a number of industry fields (a set of information on each industry field is called a "scene"). If the material is a foreign language document 10a, it is translated into Japanese via a translator 12a and sent to the know-how control means 20.
  • If the material is a document, it is sent to the know-how control means 20 after being pre-processed by known means such as an OCR 12b, or by voice recognition software 12c if it is voice. Further, in the case of various electronic information 10d, the format is unified by the format converter 12d, and the information is then sent to the know-how control means 20. In the case of image information 10e such as human faces and fingerprints, the information is pre-processed by the image recognition device 12e and then sent to the know-how control means 20.
  • the system of the present invention performs word decomposition and document analysis using a knowledge word dictionary and converts input information into know-how.
  • a knowledge word dictionary is used to classify words for each name and to use the name as the classification name and constraints such as the type, size, and possible range of the name as attributes, and the value of the name as a value.
  • the “name as a classification name” is the “word type, such as part of speech, + property/characteristic”, such as “person name”, “place name”, “degree adjective”, “location noun”, “action verb”, etc.
  • the addition of the property/characteristic causes certain restrictions on the word itself and/or other words in the text in which the word appears.
  • the verb “go” (or its stem, “line”) is connected with words representing “who,” “with whom,” “when,” “for what,” “where,” or “by what means.” In this case, for example, the content or value of “to go” must include a noun indicating the place. Therefore, only the “nouns representing a place”, that is, the “place names” among proper nouns and the words representing places among general nouns, are selected by their “name as a classification name” in the knowledge word dictionary and referenced during word decomposition to obtain the correct analysis of the sentence.
  • Fig. 2 is a table showing several examples of knowledgeable words.
  • the know-how control means 20 performs word decomposition and document analysis of the information input in various forms using the knowledge word dictionary 30 and converts it into know-how as a set of element units consisting of an attribute and a value. Alternatively, know-how can be obtained as a set of element units consisting of values and names that can refer to attributes in the knowledge word dictionary 30.
  • the storage capacity of the knowledge database 40 can be significantly reduced. As an example, the following describes the case of two texts, “Tatsuzaki lives elegantly with a dog in a Tokyo apartment,” and “Tatsuzaki goes to school.” FIGS. 3 (a) and (b) show the state of each sentence after word decomposition.
  • Constraint occurs.
  • the know-how control means 20 indicates the proper word of the reading "ryuzaki” before “ha” or “ga” indicating the position of the subject, and the name "person” stored in the knowledge word dictionary 30. Search for “noun” or “personal name” and extract “Tatsuzaki”.
  • FIGS. 4 (a) and 4 (b) are examples of the knowledge-based data thus created.
  • knowledge data is constructed as a set of element units consisting of attribute names and values.
  • the knowledge data can be formed and stored in the intellectualization database 40 as a set of element units consisting of attributes and values. In that case, the predetermined processing is performed by the know-how control means 20 without referring to the knowledge word dictionary 30, so there is an advantage that the processing speed is increased correspondingly.
  • the large-capacity internal or intellectualization database 40 records and accumulates the intellectualization data constructed in this way with an ID.
  • the ID is added so that each piece of knowledge data is used as an address when the knowledge control means 20 refers to the data.
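The role of the ID as an address can be illustrated with a small in-memory store. The class `KnowledgeDatabase`, the sequential integer IDs, and the dictionary backing store are assumptions for this sketch; the source does not state how IDs are generated or how the database is implemented.

```python
import itertools


class KnowledgeDatabase:
    """Illustrative in-memory store that assigns an ID to each knowledge data."""

    def __init__(self):
        self._next_id = itertools.count(1)  # hypothetical sequential IDs
        self._store = {}

    def add(self, element_units):
        data_id = next(self._next_id)
        self._store[data_id] = dict(element_units)  # copy: name -> value
        return data_id

    def get(self, data_id):
        # The ID works as an address for later reference.
        return self._store[data_id]


db = KnowledgeDatabase()
first = db.add({"person name": "Tatsuzaki", "place name": "Tokyo"})
second = db.add({"person name": "Tatsuzaki", "place name": "school"})
print(first, second, db.get(second))
```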
  • the know-how control means 20 converts the data already constructed as a relational database into know-how and records and accumulates it in the knowledge database 40.
  • R-DB know-how control means 20a that can be used in the same manner as the knowledge data.
  • Figure 5 is a table showing a conventional relational database.
  • S1 to Sn are attributes serving as search keys, that is, schemas, and T11 to Tmn are tuples that are contents or values.
  • Each row constitutes one record, but each record of the schema and database consisting of S1, S2, --- Sn is selected, and the record as shown in Fig. 6 is selected.
  • substitution is performed so that the record tuples become values and the schema becomes attribute names. Since such data has the same data structure as the above-mentioned knowledge data, it can be recorded and stored in the knowledge database 40 as it is, or after being corrected so that the schema matches the above-mentioned “name as a classification name”.
  • control means converts the data already constructed as the object-oriented data base into know-how, records and accumulates the data in the knowledge database 40, and uses the same as the above-described knowledge data.
  • O-DB know-how control means 20b to be obtained is included.
  • FIG. 12 (a) is a diagram showing a conventional object-oriented database.
  • A1 to A3 are the names of the data in the class,
  • V1 to V3 are the instances that are the contents or values.
  • Such data will have the same data structure as the intellectualized data of the present invention if each is replaced with the attribute and value described above. Therefore, the data can be recorded and stored in the knowledge database 40 as it is or after modifying the classification of the data in the class to match the “name as the classification name” described above.
  • a custom cache memory 50 which takes out a predetermined number of those having a large number of searches and records them in a rewritable manner. Then, when the knowledge-based database 40 is accessed for search, first, it is configured to search the custom cache memory 50. As will be described later, the memory 50 also stores data from various approaches in a rewritable manner, and it is possible to perform a search from the one that is considered to be the fastest search speed among them.
  • the search speed is remarkably increased by pre-accumulating the large number of searches in the custom cache memory and searching the custom cache memory first when the database is accessed for search. It has the effect of improving.
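The cache-first search order described above can be sketched as a small wrapper that counts searches, answers from the cache when it can, and periodically rewrites the cache with the most-searched entries. The class `HabitCache`, its `capacity` parameter, and the explicit `refresh` step are assumptions for illustration; the source does not say when or how the cache is rewritten.

```python
from collections import Counter


class HabitCache:
    """Illustrative cache that is searched before the main knowledge database."""

    def __init__(self, database, capacity):
        self.database = database  # dict: search key -> knowledge data
        self.capacity = capacity  # predetermined number of cached entries
        self.hits = Counter()     # how often each key has been searched
        self.cache = {}

    def search(self, key):
        self.hits[key] += 1
        if key in self.cache:     # search the cache first
            return self.cache[key], "cache"
        return self.database.get(key), "database"

    def refresh(self):
        # Rewrite the cache with the most frequently searched keys.
        top = [k for k, _ in self.hits.most_common(self.capacity)]
        self.cache = {k: self.database[k] for k in top if k in self.database}


hc = HabitCache({"a": 1, "b": 2, "c": 3}, capacity=1)
hc.search("a"); hc.search("a"); hc.search("b")
hc.refresh()
print(hc.search("a"), hc.search("b"))
```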
  • Experience has shown that the search content in all databases is essentially the same with little change, except for a few percent. Therefore, it is expected that such a search will be performed in advance. Then, data corresponding to such a search is prepared in the cache memory so as to be able to respond quickly.
  • First, the intellectualized data recorded in the habit cache memory 50 includes a relational database whose schema consists of a predetermined number of the most frequent names, for example the top 100 names by appearance frequency among the names that appear in, for example, 20,000 pieces of intellectualized data frequently extracted as a search target (see FIG. 8).
  • the know-how control means 20 is provided with a control unit 20c for creating such a relational database.
  • These numbers should be selected appropriately depending on the size of the knowledge-based data and the computing power of the computer, and are not limited to the above figures.
  • When a searcher 60 who is a user of the database inputs search conditions using various input devices 62 such as a keyboard and a microphone, and all or a predetermined number of the conditions match the schema of the relational database recorded in the custom cache memory 50, the relational database is searched and the result is output as search data.
  • the intellectualized data recorded in the habit cache memory 50 includes, in the case where some element units are common to intellectualized data that is frequently extracted as a search target, a common part
  • the know-how control means 20 is provided with a control unit 20d for creating such an object-oriented database.
  • the search speed for the target knowledge data becomes very fast.
  • Third, the knowledge-based data recorded in the habit cache memory 50 includes the following: knowledge-based data that is frequently extracted as a search target is classified into know-how data on absolute truth and know-how data on relative truth that can change with time, and when the latter exists in large numbers, it is replaced with know-how data on absolute truth.
  • the know-how control means 20 is provided with a control unit 20e for creating such data.
  • Fourth, there are related words: the intellectualized data that is frequently extracted as a search target is classified by industry field (scene), the degree of relevance between the words appearing in it is determined, and for each word the most relevant ones are selected.
  • the know-how control means 20 is provided with a control unit 20g for creating such an object-oriented database.
  • the database is searched using the other word instead of one word.
  • you search for the success of a professional baseball star player you can search for other star players who will be awarded alongside that player, for example, when you search for articles on the active life of “Tatama” player of Taiei Fawkes, you will find Alliance from
  • FIG. 10 is a table in which a large number of materials given the same scene identifier, that is, materials containing values, are decomposed into words and the words that appear in them are arranged in order of appearance frequency. Assuming that the number of words in the material is 100,000, for example, in material 1 the word V1 appears 10 times and the word V2 appears 7 times. Similarly, in material 2 the word V1 appears four times and the word V2 appears eight times.
  • in such a table, the word V4 appears in 58 out of 100 materials.
  • the word V1 appears in 98 out of 100 materials, but if the number of materials in which it appears in common with V4 is 48, then by searching the 100 materials for the word V1 instead of the word V4, 48 materials can be correctly extracted and 20 materials are missed.
  • a predetermined number of words having a high degree of relevance calculated in this way are selected and stored in the habit cache memory 50. Then, when the word V4 is given as a search condition from the searcher 60, the name as the classification name of the word V4 is specified by referring to the knowledge word dictionary 30, and the various types of data in the cache memory 50, for example those having the same name in a relational database or an object-oriented database, are searched, and the desired knowledge-based data is extracted. If the desired knowledge cannot be obtained, a word having a high degree of relevance recorded in the custom cache memory 50 is used, for example, to search the custom cache memory 50 in the same way.
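The source does not give a formula for the degree of relevance. As one possible reading, the sketch below uses the number of materials in which two words appear together as the measure and returns the highest-scoring words as related words. The function `related_words`, the use of plain co-occurrence counts, and the sample materials are all assumptions made for this example.

```python
from collections import Counter


def related_words(word, materials, top_n):
    """Rank other words by how many materials they share with `word`.

    materials: list of sets of words, one set per material.
    Co-occurrence count is an assumed relevance measure, not the source's.
    """
    with_word = [m for m in materials if word in m]
    counts = Counter(w for m in with_word for w in m if w != word)
    return counts.most_common(top_n)


# Hypothetical materials belonging to one scene.
materials = [{"V1", "V4"}, {"V1", "V2", "V4"}, {"V1"}, {"V2"}]
best = related_words("V4", materials, top_n=1)
print(best)
```

The selected related words would then be the entries recorded in the cache and used as substitute search terms when a direct search does not return the desired data.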
  • FIG. 11 is a flowchart showing a flow of an embodiment of a method for constructing a database having a know-how structure according to the present invention.
  • the method of constructing a database having the know-how structure of the present invention generally includes a knowledge word dictionary creation step S 1, input information, an existing relational database or an existing smart object.
  • the intellectualized word dictionary creation process S1 classifies words for each name, and accumulates in the intellectualized word dictionary 30 the names as the classification names and the constraints such as the type, size, and available range of the names as attributes.
  • the know-how control means 20 performs word decomposition / document analysis while referring to the knowledge word dictionary 30.
  • OCR is used for printed materials, a speech recognition device for voice, a format converter for other types of electronic information, and an image recognition device for video information.
  • the know-how control means 20 performs word decomposition / document analysis while referring to the knowledge word dictionary 30.
  • the existing relational database replaces the tuple set of one record with a value set, replaces the schema with the attribute name set, and handles it in the same way as ordinary knowledge data after know-how.
  • instances are replaced by sets of values and classes are replaced by sets of attribute names.
  • the result of the document analysis is converted into know-how as knowledge data as a set of element units configured including attributes and values.
  • knowledge data as a set of element units configured including attributes and values.
  • instead of attributes, a name as a classification name constructed so as to refer to attributes in the knowledge word dictionary can be used, and know-how can be made in the form of a set of element units consisting of the name and value to form knowledge data.
  • the know-how-ized knowledge-based data is assigned an ID and stored in the knowledge-based database 40.
  • A step S6 is provided for recording frequently used knowledge data in a cache memory so that searches are fast and/or accurate.
  • The habit cache memory has the following four usage modes.
  • The first is to create a relational database whose schema consists of the names in the knowledge data most frequently extracted as search targets, and to record it in the habit cache memory.
  • The second applies when some element units are common to the knowledge data frequently extracted as search targets: the partial knowledge data made up of the common element units and the knowledge data containing the differing element units are ranked relative to one another, and a database holding this knowledge data is created and stored in the cache memory.
  • The third is to classify the knowledge data frequently extracted as search targets into know-how concerning absolute truths and know-how concerning relative truths that can change over time; the latter exists in large numbers.
  • The fourth is to classify the knowledge data frequently extracted as search targets by industry (field or scene), compute the degree of relevance between the words that appear in the data, select the related words whose degree of relevance is high, and store them in the cache memory.
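
The role of the habit cache memory in step S6 can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `HabitCache`, the `capacity` parameter, and the representation of knowledge data as a mapping from ID to a set of (name, value) element units are all assumptions made for illustration. The sketch counts how often each item of knowledge data is extracted as a search target, keeps a predetermined number of the most frequent items in a rewritable cache, and consults that cache before the full knowledge database, as the abstract describes.

```python
from collections import Counter


class HabitCache:
    """Hypothetical sketch of step S6: keep frequently extracted knowledge data in a cache."""

    def __init__(self, database, capacity=2):
        self.database = database  # knowledge database: ID -> set of (name, value) element units
        self.capacity = capacity  # predetermined number of items kept in the cache
        self.hits = Counter()     # how often each ID was extracted as a search target
        self.cache = {}           # rewritable cache contents: ID -> element units

    def search(self, name, value):
        """Return (IDs containing the element unit, where they were found)."""
        unit = (name, value)
        found = [i for i, units in self.cache.items() if unit in units]
        source = "cache"
        if not found:
            # Nothing in the cache matched, so fall back to the full knowledge database.
            found = [i for i, units in self.database.items() if unit in units]
            source = "database"
        self.hits.update(found)
        self._rebuild()
        return found, source

    def _rebuild(self):
        """Rewrite the cache so that it holds the most frequently extracted items."""
        top = [i for i, _ in self.hits.most_common(self.capacity)]
        self.cache = {i: self.database[i] for i in top}
```

Because a cache hit ends the search, this sketch favors speed: knowledge data that matches but is not yet cached is returned only when the cache yields nothing. A fuller design would have to decide when to consult the knowledge database as well.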

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)

Abstract

Materials from various fields are collected so that a database having a know-how structure can be built. The method according to the invention comprises the following steps: (S1) classifying words by name, storing knowledge words that have as attributes the classification names and constraints such as the type, size, and permissible range of values of each name, and creating a knowledge word dictionary; (S2) applying word decomposition and document analysis to input information using the knowledge word dictionary, and creating a set of element units having a know-how structure, each comprising a value together with an attribute or a name designed so that the attributes in the knowledge word dictionary can be referenced; (S5) storing the knowledge data, which has a know-how structure and to which identifiers are assigned, in a knowledge database. A predetermined number of frequently retrieved words are extracted and stored rewritably in a habit cache memory. When the database is accessed for a search, the habit cache memory can be searched first.
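
Steps S1, S2, and S5 of the abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the method as claimed: the names `DictionaryEntry`, `WORD_DICTIONARY`, `to_element_units`, and `store`, the sample dictionary contents, and the reduction of constraints to a value type and a maximum size are all hypothetical. The word decomposition and document analysis of S2 is not implemented; the input is assumed to arrive already split into names and values, as it would for a record of an existing relational database.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class DictionaryEntry:
    """One knowledge word: a classification name plus its constraints (hypothetical layout)."""
    name: str
    value_type: type
    max_size: int


# S1: a tiny knowledge word dictionary with assumed sample contents.
WORD_DICTIONARY = {
    "product": DictionaryEntry("product", str, 20),
    "price": DictionaryEntry("price", int, 8),
}

KNOWLEDGE_DB = {}   # S5 target: ID -> set of element units
_next_id = count(1)  # source of identifiers


def to_element_units(names, values):
    """S2 (simplified): pair each name with its value and check the dictionary constraints."""
    units = []
    for name, value in zip(names, values):
        entry = WORD_DICTIONARY.get(name)
        if entry is None:
            raise KeyError(f"name {name!r} is not in the knowledge word dictionary")
        if not isinstance(value, entry.value_type) or len(str(value)) > entry.max_size:
            raise ValueError(f"value {value!r} violates the constraints of {name!r}")
        units.append((name, value))
    return frozenset(units)


def store(units):
    """S5: assign an ID to the knowledge data and store it in the knowledge database."""
    knowledge_id = next(_next_id)
    KNOWLEDGE_DB[knowledge_id] = units
    return knowledge_id
```

Calling `store(to_element_units(("product", "price"), ("pump", 1200)))` would convert one record into two element units and file them under a fresh ID. Storing each item as a set of (name, value) pairs mirrors the replacement of a schema by a set of attribute names and of a tuple by a set of values.
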
PCT/JP2000/006041 2000-09-06 2000-09-06 Procede et systeme pour creer une base de donnees presentant une structure de savoir-faire WO2002021270A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2002524817A JPWO2002021270A1 (ja) 2000-09-06 2000-09-06 ノウハウ構造を有するデータベースの構築方法及びノウハウ構造を有するデータベース構築システム
PCT/JP2000/006041 WO2002021270A1 (fr) 2000-09-06 2000-09-06 Procede et systeme pour creer une base de donnees presentant une structure de savoir-faire
TW089119249A TW498229B (en) 2000-09-06 2000-09-19 Construction method and system of a database which possesses the know-how structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2000/006041 WO2002021270A1 (fr) 2000-09-06 2000-09-06 Procede et systeme pour creer une base de donnees presentant une structure de savoir-faire

Publications (1)

Publication Number Publication Date
WO2002021270A1 true WO2002021270A1 (fr) 2002-03-14

Family

ID=11736431

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2000/006041 WO2002021270A1 (fr) 2000-09-06 2000-09-06 Procede et systeme pour creer une base de donnees presentant une structure de savoir-faire

Country Status (3)

Country Link
JP (1) JPWO2002021270A1 (fr)
TW (1) TW498229B (fr)
WO (1) WO2002021270A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114818720A (zh) * 2022-06-23 2022-07-29 北京惠每云科技有限公司 一种专病数据集构建方法、装置、电子设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH04205173A (ja) * 1990-11-29 1992-07-27 Shimadzu Corp 情報検索システム
JPH05113924A (ja) * 1991-10-23 1993-05-07 Nec Corp 情報管理モデル変換システム
JPH06195383A (ja) * 1992-09-25 1994-07-15 Nec Corp 知識ベース構築方式
JPH11203325A (ja) * 1998-01-16 1999-07-30 Tadamitsu Ryu データベース作成方法、そのプログラムを格納した記録媒体及びその作成方法で作成したデータ群を記録した記録媒体
JP2000056977A (ja) * 1998-06-02 2000-02-25 Internatl Business Mach Corp <Ibm> テキスト情報を処理する方法および装置

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
NAKAYAMA: "Chishiki joho kyouyuu wo sokushin suru multimedia modal hisho agent", EIZOU JOHO MEDIA GAKKAISHI, vol. 52, no. 4, 20 April 1998 (1998-04-20), SHADAN HOUJIN EIZOU MEDIA GAKKAI, JAPAN, pages 436 - 440, XP002934379 *
TERUHITO KANAZAWA ET AL.: "Bunsho kanren-sei wo kouryo shita kensaku houshiki (A retrieval method based on relevance of documents)", JOHO SHORI GAKKAI KENKYUU HOUKOKU (98-DBS-116), vol. 98, no. 58, 10 July 1998 (1998-07-10), SHADAN HOUJIN JOHO SHORI GAKKAI, JAPAN, pages 165 - 172, XP002934380 *

Also Published As

Publication number Publication date
TW498229B (en) 2002-08-11
JPWO2002021270A1 (ja) 2004-01-15

Similar Documents

Publication Publication Date Title
CN101404015B (zh) 自动生成词条层次
US20040054666A1 (en) Associative memory
CN110674252A (zh) 一种面向司法领域的高精度语义搜索系统
CN109522396B (zh) 一种面向国防科技领域的知识处理方法及系统
CN110956271B (zh) 一种海量数据的多级分类方法及装置
Sharaff et al. Analysing fuzzy based approach for extractive text summarization
Ashok Kumar et al. An efficient text-based image retrieval using natural language processing (NLP) techniques
JP4426041B2 (ja) カテゴリ因子による情報検索方法
CN114491079A (zh) 知识图谱构建和查询方法、装置、设备和介质
Rak et al. Multilabel associative classification categorization of MEDLINE articles into MeSH keywords
Petrus Soft and hard clustering for abstract scientific paper in Indonesian
WO2002021270A1 (fr) Procede et systeme pour creer une base de donnees presentant une structure de savoir-faire
KR20010107810A (ko) 웹 검색시스템 및 그 방법
Berenguer et al. Word embeddings for retrieving tabular data from research publications
CN104239295B (zh) 维汉翻译系统的多层次维语词法分析方法
CN110908989B (zh) 一种应用于数据清洗工具的数据匹配方法
JPH07325837A (ja) 抽象単語による通信文検索装置及び抽象単語による通信文検索方法
JP7168826B2 (ja) データ統合支援装置、データ統合支援方法、及びデータ統合支援プログラム
Daggupati Unsupervised duplicate detection (UDD) Of query results from multiple web databases
Rocha et al. Feature selection strategies for automated classification of digital media content
Alajmi et al. DACS Dewey index-based Arabic Document Categorization System
Prabhu ’Document Clustering for Information Retrieval–A General Perspective’
JP2006040058A (ja) 文書分類装置
Tajiri et al. A new approach for fuzzy classification in relational databases
JP6476638B2 (ja) 固有用語候補抽出装置、固有用語候補抽出方法、及び固有用語候補抽出プログラム

Legal Events

Date Code Title Description
AK Designated states Kind code of ref document: A1; Designated state(s): CN JP KR US
AL Designated countries for regional patents Kind code of ref document: A1; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE
121 Ep: the epo has been informed by wipo that ep was designated in this application
ENP Entry into the national phase Ref document number: 2003105808; Country of ref document: RU; Kind code of ref document: A; Format of ref document f/p: F
WWE Wipo information: entry into national phase Ref document number: 2002524817; Country of ref document: JP
122 Ep: pct application non-entry in european phase