TW200521732A - Information system with natural language parsing ability and processing method thereof
- Publication number
- TW200521732A TW92137597A
- Authority
- TW
- Taiwan
- Prior art keywords
- natural language
- word
- item
- scope
- patent application
Description
Description of the invention:

[Technical Field]
The present invention relates to an information system with natural language parsing capability and to its processing method. In particular, it uses a natural language parsing means to directly deconstruct the request that an information seeker expresses in natural language, matches the result against database content built with the same technique, and responds to the seeker with the best-matching data, so that the human-machine interface for information search becomes more user-friendly.

[Prior Art]
To modern users the Internet is a vast, seemingly boundless database whose content is practically inexhaustible. To help users find data, every major portal site provides a search engine: when a user enters a keyword in the search field and runs the search, many websites, web pages, or articles related to that keyword are returned. Whether this is an effective and practical method is open to discussion. For someone practised in computer operation and with a deep understanding of Internet resources, a search engine is indeed a practical tool.
Even when a search returns a very large volume of data, such a user has the skill to sift through it and extract what is needed. For users without that knowledge and skill, however, quickly finding the required information through a search engine is largely a matter of luck.

In fact, modern technology pays increasing attention to the friendliness of the human-machine interface. In other words, it tries to reduce the operating skill required as far as possible while still improving the accuracy and efficiency of operation. So-called artificial intelligence systems arose for this reason, and their purpose is simply to make operation more convenient for the user. For the information search described above, the most direct approach is for the user to express the need in natural language and for the system itself to parse that language, understand the need and the search content, and respond with data that meets it. For example, the user says or types "So fat, what should I do?" The system then automatically determines that the user hopes to find weight-loss methods or related information and, having worked out the user's expectation, directly finds material related to weight loss and returns it. Achieving this requires fairly advanced technical support; otherwise the scenario above remains only an aspiration.

The natural language techniques disclosed in the existing patent literature fall roughly into two kinds: one is the statistical and probabilistic approach, the other is the ontology approach. With the techniques currently known, the accuracy of the statistical and probabilistic approach is not high, because it has neither a parsing mechanism nor a knowledge base that works with it.
Moreover, the drawback of the ontology approach is that it requires a great deal of manpower to build the knowledge base. Even then, because the ontology approach does not parse the grammar of a sentence, it cannot effectively understand long sentences or compound sentences.

From the above it can be seen that applying natural language to the search of network resources can improve the friendliness of the human-machine interface, but the existing techniques suffer from low accuracy, the lack of an effective parsing mechanism, and the high cost of building a knowledge base. All of these problems remain to be overcome.

[Summary of the Invention]
The main object of the present invention is therefore to provide an information processing method with natural language parsing capability. The user states in natural language what data is being sought; a natural language parsing means directly deconstructs the content of that request, which is then matched against the database content, and the data with the highest matching degree is returned to the user. The strong parsing capability raises the accuracy of information search and makes it easier for users to obtain the data they need, and building the knowledge base with the same technique greatly reduces the construction cost.

The main technical means adopted to achieve the foregoing object is a method comprising the following steps:
inputting a natural language sentence;
executing a natural language parsing means, so as to convert the natural language into a "constructive concept script" format that carries a specific event background and request conditions;
executing a matching means, in which the generated "constructive concept script" is matched against the "constructive concept scripts" that a database has produced with the same technique, and the data with the highest matching degree is taken as the response to the request.

In this method, the step-by-step deconstruction of the natural language yields a "constructive concept script" that reflects a specific event and the request conditions, and this script is then matched against the database content. The database content is produced with the same technique, which converts a large body of knowledge into "constructive concept scripts" and records the response data for each. In the matching step, the newly generated "constructive concept script" is therefore matched against the "constructive concept scripts" in the database, and a logical judgment finds identical or similar content. Users can thus describe their needs in natural language, which provides a more convenient way to search for information.

The natural language parsing means comprises the following steps:
checking the sentence pattern, to confirm that the input sentence belongs to a sentence pattern that expresses a request;
word segmentation, which segments the input sentence into words;
domain classification, which assigns a domain attribute to each word after segmentation, for example domain word, general word, or new word;
keyword group check, which checks whether the request contains keyword groups that show the core of the need;
synonym or synonym group check, which checks whether the request contains synonyms of domain words or synonym groups of keyword groups;
generating a "constructive concept script" (Constructive Concept Script) that represents the user's need.

The natural language may also be spoken; it is first processed by speech recognition and then parsed as natural language.
The sentence pattern check is carried out by a sentence pattern comparison technique.
The domain classification compares each word with the content of a lexicon. If the word is a domain word in the lexicon, it is defined as a domain word; if not, the system checks whether it is a general word in the lexicon and, if so, defines it as a general word; otherwise it is defined as a new word.

The domain lexicon is built by computing a weight for each word from its frequency of occurrence in a specific domain across more than a certain number of articles and from its frequency of occurrence within a single article. According to the weight score, each word is then classified as a "domain word" (Domain) or a "general word" (generic).

The matching means comprises the following steps:
searching the database for identical or similar "constructive concept scripts";
applying a logical judgment to the requester's "constructive concept script" and the "constructive concept scripts" found in the database;
providing a parsed response according to the matching degree.

The content of a "constructive concept script" comprises a "key event" and a "condition", wherein:
the "key event" has a plurality of keyword groups under it;
the "condition" likewise has a plurality of keyword groups under it, and each keyword group in turn has a plurality of phrases under it.

The search for "constructive concept scripts" consists of the following steps:
using the domain words in the requesting "constructive concept script" to search the domain-word lexicon of each "constructive concept script" in the database;
finding the related keyword groups (N-Gram) in the database according to the domain words found;
searching the database for all related "key events" and "conditions" according to the keyword groups found;
finding the possible "constructive concept scripts" according to all the related "key events" and "conditions" found.
In the search steps above, once the domain words have been found, their synonyms are also looked up; likewise, once the keyword groups have been found, their synonym groups are also looked up.

The logical judgment compares two "constructive concept scripts" according to the differences between them. Based on the different kinds of difference between the subsets within the two scripts, four operators are defined: "EQU", "MAX", "MIN", and "XOR". These are used to compute the relationships between the various "keyword group" subsets, from which the relationships between the various "key event" and "condition" subsets can be derived, and finally the possible "constructive concept scripts" are found.

A further object of the present invention is to provide an information system with natural language parsing capability, comprising:
a "constructive concept script" database, in which a large body of knowledge is converted by a natural language parsing means into the "constructive concept script" format and stored;
a user interface, through which the user enters in natural language a request for related information, the input likewise being converted by the natural language parsing means into the "constructive concept script" format;
a matching mechanism, which matches the user's "constructive concept script" against the "constructive concept scripts" in the database;
a logic interpretation unit, which analyzes the result produced by the matching mechanism so that a response can be returned to the user.

The system further includes a back-end management mechanism that handles "constructive concept scripts" with low scores, so as to strengthen and expand the database content. Scripts with low scores, or whose responses the user found unsatisfactory, are handed to this mechanism, which includes at least:
a confirmation management interface, for confirming whether the question posed by the "constructive concept script" generated from the user's request already exists in the database with a score within the acceptable range; if so, no further processing is needed, and if not, the script is passed on to the following interfaces;
a synonym or synonym group check/generation interface, which observes whether synonyms or synonym groups in the question have interfered with the accuracy of matching and associates them with the existing lexicon, so as to exclude that interference and raise the accuracy of matching;
a normalization management interface, which simplifies the user's question, checks it for wrong characters, and so on; after normalization the question is matched again, and if the score is within the acceptable range no further processing is needed;
a new-word check interface, which after word segmentation checks whether the question contains new words; a word confirmed as new is added to the database and assigned a part of speech;
a definition correction management interface, which checks whether domain words and general words in the question have been misjudged; if a misjudgment is confirmed, the word is redefined and the question's "constructive concept script" is matched again.

[Embodiments]
The natural language parsing technique of the present invention lets users express directly in natural language their wish to find related information and the content of their need. The system's natural language parsing process converts that expression into a special "constructive concept script" format, which is then matched against the many "constructive concept scripts" in the database that were produced with the same technique, in order to find the related information that best meets the user's need.
For example, a person might say only: "The weather is so hot, and I have put on weight up to 100 kilograms. I don't know how I can get a little thinner." The meaning of this utterance reflects a need to find a way to lose weight. The system and method of the present invention can automatically parse and judge that meaning, understand the need, and then find related information accordingly. More particularly, while parsing the meaning they can automatically exclude redundant words that have nothing to do with the real intent. In this example the point is how a person weighing 100 kilograms can find a suitable weight-loss method; whether the weather is hot or cold is not the point and can be ignored. In this way users can express in natural language the information they want to find, and the present invention can parse its meaning directly to meet their needs. The specific technical content that achieves this is described in detail below.

Figure 1 is a schematic diagram of the system architecture of the present invention. The information system with natural language parsing capability comprises:
a "constructive concept script" database (10), in which a large body of knowledge is converted by a natural language parsing means (11) into a special "constructive concept script" format and stored;
a user interface (20), through which the user enters in natural language a request for related information, the input likewise being converted by a natural language parsing means (21) into the "constructive concept script" format;
a matching mechanism (30), which matches the user's "constructive concept script" against the "constructive concept scripts" in the database;
a logic interpretation unit (40), which analyzes the result produced by the matching mechanism and sends its result to the dispatcher (Dispatcher) (41) of the computer system, which responds to the user; wherein:
The natural language parsing means (11)(21) just described is implemented in software. Its workflow is shown in Figure 2 and comprises the following steps:
checking the sentence pattern (201), to confirm that the input sentence belongs to a sentence pattern that expresses a request, generally a question or an imperative sentence; this step confirms that the user intends to make a request, and when the input sentence fits a specific sentence pattern the user is taken to have a genuine need to search for information, so the next step is executed;
word segmentation (202), which segments the input sentence into words;
domain classification (203), which assigns a domain attribute and weight to each word after segmentation, for example domain word, general word, or new word; the detailed technique is described later;
keyword group check (204), which checks whether the question contains keyword groups that show the core of the need; these fall roughly into two classes, one representing a specific event or background and the other representing the various "conditions" related to the information;
synonym or synonym group check (205), which checks whether the question contains synonyms of domain words or synonym groups of keyword groups; the synonym check finds synonyms of domain words, and the synonym group check finds synonym groups of keyword groups;
generating a "constructive concept script" (Constructive Concept Script) that represents the user's need (206).
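The six parsing steps (201) to (206) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the request patterns, the regular-expression tokenizer, the tiny lexicon, the synonym table, the rule that numeric tokens act as conditions, and every name below (`parse_request`, `DOMAIN_WORDS`, `ConceptScript`, and so on) are hypothetical, and English input is used in place of Chinese for readability.

```python
import re
from dataclasses import dataclass, field

# Hypothetical tables; a real system would load these from its lexicon.
DOMAIN_WORDS = {"weight", "overweight", "diet"}
GENERAL_WORDS = {"it", "is", "so", "hot", "and", "i", "am", "at",
                 "how", "can", "lose", "the", "weather"}
SYNONYMS = {"fat": "overweight"}  # synonym -> domain word
REQUEST_PATTERNS = [re.compile(r"^(how|what|where|which)\b", re.I),
                    re.compile(r"\?$")]


@dataclass
class ConceptScript:
    key_event: list = field(default_factory=list)  # words standing in for keyword groups
    condition: list = field(default_factory=list)  # assumed: numeric tokens such as "100kg"
    new_words: list = field(default_factory=list)  # words unknown to the lexicon


def classify(word):
    """Step 203: label a word as domain, general, or new by lexicon lookup."""
    if word in DOMAIN_WORDS:
        return "domain"
    if word in GENERAL_WORDS:
        return "general"
    return "new"


def parse_request(sentence):
    # Step 201: accept only sentences that look like a request.
    if not any(p.search(sentence.strip()) for p in REQUEST_PATTERNS):
        return None
    # Step 202: word segmentation (a simple regex split as a stand-in).
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    # Step 205 (applied early in this sketch): map synonyms to domain words.
    words = [SYNONYMS.get(w, w) for w in words]
    script = ConceptScript()
    for w in words:
        label = classify(w)  # Step 203
        if label == "domain":
            # Step 204 (simplified): each domain word stands in for a keyword group.
            if w not in script.key_event:
                script.key_event.append(w)
        elif label == "new":
            if any(ch.isdigit() for ch in w):
                script.condition.append(w)
            else:
                script.new_words.append(w)
    return script  # Step 206


script = parse_request("It is so hot and I am fat at 100kg, how can I lose weight?")
print(script)
```

General words such as "hot" are simply dropped here, which mirrors the way the description ignores the weather in its weight-loss example.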
Among them: the constructive concept script is described in the third figure, which includes two groups, one is the key event (Key event), the other One is "concmion"; among them, the key event includes a sentence type (ntence) and the event-related keyword group (N_Gram), and each key is under the department group (N_Gram). Most include phrases (phrase), and constitute - tree .X "Jun condition f ,, the majority of the group still has a keyword group (N-G: m) the content tree, the phrase (Phrase) thereof. The sentence pattern check described above is achieved through the technique of sentence pattern comparison. Compare the classification of the professional field with the content of the _thesaurus. If it is a professional word in the thesaurus, say | 疋 我 冯 寻 业 词, if it is not, determine whether it is a general word in thesaurus. If the meaning is new word. ^ That is, the meaning is -the general word, if not, as for the establishment of the professional thesaurus, wait for a # ^ θ is for the appearance of the word mother in a particular field, the number of articles in the field of more than 艏 check β 士 廿 ㈣μ The frequency of occurrence and the frequency of appearance in a certain article, in order to calculate the word to distinguish it as, special, ",, m, according to the weight score for professional δ〇Ί (D〇_) and" general words " (Generic) In addition, the aforementioned means of matchmaking is as follows: ^ W Yuan does not include the following procedures to search the database for the same or similar (401); constructive concept script, so that the demander's "constructive Conceptual script π # Constructive sketches, make logical judgments (402,): and 12 200521732 searched in the database to provide analytic responses based on the low degree of media integration (4.03). Among them: The aforementioned "constructive concept script, the search method is achieved as shown in the fifth figure: (for the workflow, please refer to the sixth figure). 
To search the professional thesaurus for each constructive concept script in the database (50 ”); based on the searched professional words, find the relevant keyword group (N-Gram) in the database (502);
根據找出的關鍵詞組進一步搜尋資料庫中所有相關的 關鍵事件”、”條件”(5〇3); 根據搜尋到所有相關的”關鍵事件”、,,條件”找出可能^ ‘‘建構式概念腳本”(504)。 前述“建構式概念腳本,,搜尋步驟中,於找出專業择 後,將針對專業詞進-步找出其同義詞;又在找出關㈣ 組後,亦進一步找出關鍵詞組的同義詞紅,以提 準確性。 ‘According to the found keyword group, further search all relevant key events in the database "and" conditions "(503); according to the search all relevant" key events ", and conditions" find possible ^ "construction "Concept Script" (504). In the "Constructive Concept Script", in the search step, after finding the professional choice, it will further find the synonyms of the professional words; after finding the key group, it will further find Show the synonym red of the keyword group to improve accuracy. ‘
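Search steps (501) to (504) amount to walking from domain words to keyword groups, then to key events and conditions, and finally to the scripts that own them. The sketch below illustrates that walk with two in-memory indexes. It is an assumption-laden example rather than the patent's data layout: `DATABASE`, `DOMAIN_WORDS`, `build_indexes`, and `find_candidates` are hypothetical names, keyword groups are modelled as tuples of words, and the synonym expansion is omitted.

```python
from collections import defaultdict

# Hypothetical stored scripts: each part holds keyword groups (tuples of words).
DATABASE = {
    "s1": {"key_event": [("lose", "weight")], "condition": [("overweight", "adult")]},
    "s2": {"key_event": [("gain", "muscle")], "condition": [("adult",)]},
    "s3": {"key_event": [("weight", "training")], "condition": []},
}
DOMAIN_WORDS = {"weight", "overweight", "muscle"}


def build_indexes(database):
    """Index domain words -> keyword groups, and keyword groups -> (script, part)."""
    word_to_groups = defaultdict(set)
    group_to_parts = defaultdict(set)
    for script_id, script in database.items():
        for part in ("key_event", "condition"):
            for group in script[part]:
                group_to_parts[group].add((script_id, part))
                for word in group:
                    if word in DOMAIN_WORDS:
                        word_to_groups[word].add(group)
    return word_to_groups, group_to_parts


def find_candidates(query_domain_words, database):
    word_to_groups, group_to_parts = build_indexes(database)
    # Step 501: keep the query's domain words that the database lexicon knows.
    hits = [w for w in query_domain_words if w in word_to_groups]
    # Step 502: collect the keyword groups related to those words.
    groups = set()
    for w in hits:
        groups |= word_to_groups[w]
    # Step 503: collect the key events and conditions holding those groups.
    parts = set()
    for g in groups:
        parts |= group_to_parts[g]
    # Step 504: the scripts that own those parts are the candidates.
    return sorted({script_id for script_id, _ in parts})


print(find_candidates(["weight"], DATABASE))
```

The candidates returned here are only a shortlist; ranking them is left to the logical judgment that follows.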
After the search steps are completed, the logical judgment is carried out, as shown in Figure 7. Based on the different kinds of difference between the subsets within two "constructive concept scripts", four operators are defined: "EQU", "MAX", "MIN", and "XOR". These are used to compute the relationships between the various "keyword group" subsets, from which the relationships between the various "key event" and "condition" subsets can be derived, and finally the possible "constructive concept scripts" are found.

Figure 7 shows how the logical judgment works: it depicts the process of judging the newly generated "constructive concept script" (70) against the many "constructive concept scripts" (80) in the database. As the figure makes clear, each "constructive concept script" (80) in the database is associated with a plurality of "key events" (81) and "events" (82); each "key event" (81) and "event" (82) is in turn associated with a plurality of "keyword groups" (83)(84); and each "keyword group" (83)(84) is associated with a plurality of phrases (85)(86).

For the objects (801) to (80n) represented by the dashed boxes in Figure 8, Figure 9 shows the relationship between the newly generated "constructive concept script" (70) and each of the objects (801) to (80n). Scoring by means of this subordinate relationship gives the matching-degree score of each "constructive concept script".
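The text names the four operators but does not define them, so the following sketch rests on an explicit assumption: EQU means the two sets are equal, MAX means the query set strictly contains the stored set, MIN means the query set is strictly contained in the stored set, and XOR covers every other case, where each side holds something the other lacks. The function name `relate` is hypothetical. The same comparison can be applied first to keyword groups (sets of phrases) and then to key events or conditions (sets of keyword groups), which is one plausible reading of "deriving" the higher-level relationships.

```python
def relate(query, stored):
    """Return the assumed relation between a query set and a stored set."""
    query, stored = set(query), set(stored)
    if query == stored:
        return "EQU"
    if query > stored:
        return "MAX"  # assumed meaning: query holds everything stored, plus more
    if query < stored:
        return "MIN"  # assumed meaning: query is a proper subset of stored
    return "XOR"      # assumed meaning: each side holds something the other lacks


# Keyword-group level: sets of phrases.
print(relate({"lose", "weight"}, {"weight", "lose"}))
print(relate({"gain", "weight"}, {"lose", "weight"}))

# Key-event level: sets of keyword groups (frozensets so they can be set members).
query_event = {frozenset({"lose", "weight"})}
stored_event = {frozenset({"lose", "weight"}), frozenset({"healthy", "diet"})}
print(relate(query_event, stored_event))
```

How these relations are turned into a numeric matching score is not stated in the text, so no scoring formula is attempted here.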
The system also responds to the user according to the scores of the matching result. Some principles can be summarized from Figure 10: by comparing the newly generated "constructive concept script" with the "constructive concept script" that has the highest matching degree, the relationship between the two can be described as a pair of scores of the form XX : yy, where XX represents the newly generated "constructive concept script" and yy represents the "constructive concept script" with the highest matching score. The possible results of the analysis are as follows (bracketed parts of a label are reconstructed where the source is illegible):

100 : 100: "The system fully understands your question and has learned about it. Here is a suitable answer found for you."
70 or above : 100: "Some parts of your question are not very clear to the system. The system is still young, and what it has learned may not fully meet your needs. Here is the closest answer the system can find."
100 : 70 or above: "The system fully understands your question, but what it has learned may go deeper than your question. Here is a more detailed answer the system has found."
[100 : 70] or below: "The system fully understands your question, but what it has learned may go deeper and cover more than your question. Here is a more detailed answer the system has found."
[70 or] below : 100: "Many parts of your question are not very clear to the system, and its knowledge may not fully meet your needs. Here is the closest answer the system can find."
99~70 : 99~70: "The system roughly understands your question, but its knowledge may differ slightly from your question. Here is the closest answer the system has found."
69~40 : 69~40: "The system has not yet encountered most of your question, and its knowledge may differ considerably from your question. Here is the most likely answer the system has found."
Below 40 : below 40: "The system may not have encountered your question. It will expand its knowledge as soon as possible; your questions and interaction make the system smarter. Here is the most relevant answer the system has found."
0 : 0 : 100: "Sorry, the system may not have encountered your question, or your question was not clearly stated. Here are answers the system found that may be related."

As the above description shows, the information system of the present invention, built around a natural language parsing means, lets users conveniently state their information needs in natural language. The natural language may be text or speech; when it is spoken, it only needs to be processed by speech recognition and then parsed as natural language to achieve the same effect as the embodiment described above.
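The score-pair rules listed above can be expressed as a small lookup function. This is only a sketch under assumptions: scores are taken to be integers from 0 to 100, the two partly illegible labels are read as "100 : below 70" and "below 70 : 100", a query score of 0 is treated as the "sorry" case whatever the stored score is, and combinations that the list does not enumerate fall through to an `"unlisted"` category. The function name and the category names are hypothetical.

```python
def response_category(xx, yy):
    """Map a score pair (query script xx, best stored script yy) to a response type."""
    if xx == 0:
        return "sorry"                      # question unknown or unclear
    if xx == 100 and yy == 100:
        return "exact"                      # fully understood and learned
    if yy == 100:
        # Question only partly understood; give the closest answer.
        return "closest_partly_unclear" if xx >= 70 else "closest_mostly_unclear"
    if xx == 100:
        # Question fully understood; stored knowledge goes further.
        return "more_detailed" if yy >= 70 else "more_detailed_and_broader"
    if xx >= 70 and yy >= 70:
        return "roughly_understood"         # 99~70 : 99~70
    if 40 <= xx < 70 and 40 <= yy < 70:
        return "mostly_unfamiliar"          # 69~40 : 69~40
    if xx < 40 and yy < 40:
        return "not_encountered"            # below 40 : below 40
    return "unlisted"                       # combination not enumerated in the text


for pair in [(100, 100), (85, 100), (100, 50), (80, 90), (20, 30), (0, 50)]:
    print(pair, response_category(*pair))
```

Each category would then be paired with the corresponding message text from the list, which keeps the wording editable without touching the decision logic.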
As the information system should continuously absorb new knowledge to meet the needs of more users, its It means that the information demanded by the user is that the system cannot owe, or the response score is not high. In addition, if the user is not satisfied with the response status, in order to solve this problem, the present invention further enables the aforementioned system to have a back-end management mechanism. In the first stage, the following management interface is provided in order to send the response of low match score and user dissatisfaction to this mechanism. As shown in the first figure, it includes: Management interface (51) Confirm whether the problem has been constructed in the database according to the user's needs, such as a concept script. If it is within the acceptable range, if no, no further processing is needed. If not, then it will be sent to the next management interface; == Huawei management interface (52) 'It is performed on the user's question: such as checking, etc. The action "after normalization" is to perform the media again; if the score of the media & degree is acceptable range, that is, there is no need to perform another new word check interface (5 3), which starts to check the question after the word segmentation. = A new word appears 'If it is identified as a new word, a new word is added to the database and the part of speech is assigned; a definition correction management interface (54) is used to check the professional word I-general misjudgment' It is used to correct the professional words and general spirits in questions; if they are misjudged, if they are misjudged, they will be redefined, and the question 16 200521732 concept script will be re-matched. After a stage of correction processing, the script can be re-matched and stored in the database. 
The back-end management mechanism further includes a synonym or synonym-group checking/generation interface (55). Its main purpose is to determine whether a synonym or synonym group in the "Constructive Concept Script" has interfered with the accuracy of the response, or whether the question is genuinely new. For a new question, new synonyms or synonym groups can be generated either by a process defined by the system or manually.
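The first-stage flow through interfaces (51) to (53) can be sketched as a short triage function. This is a hedged illustration under stated assumptions, not the patent's implementation: the names `triage`, `toy_match`, `toy_normalize`, and `ACCEPTABLE`, the threshold value of 40, and the placeholder part of speech are all hypothetical, and the matcher is a toy dictionary lookup standing in for the matching of "Constructive Concept Scripts". Definition correction (54) and the synonym interface (55) are represented only by the final "needs_review" outcome.

```python
from typing import Callable, Dict, List, Tuple

ACCEPTABLE = 40.0  # hypothetical acceptable matching score


def triage(question: str,
           match: Callable[[str], float],
           normalize: Callable[[str], str],
           segment: Callable[[str], List[str]],
           lexicon: Dict[str, str]) -> Tuple[str, List[str]]:
    """Return (outcome, new words added to the lexicon)."""
    # (51) Confirmation: is the question already answerable as asked?
    if match(question) >= ACCEPTABLE:
        return "confirmed", []
    # (52) Normalization: clean the question up, then match again.
    normalized = normalize(question)
    if match(normalized) >= ACCEPTABLE:
        return "normalized", []
    # (53) New-word check: register words missing from the lexicon.
    new_words = [w for w in segment(normalized) if w not in lexicon]
    for word in new_words:
        lexicon[word] = "unassigned"  # part of speech to be set on review
    # (54)/(55) Definition correction and synonym review are left to a reviewer.
    return "needs_review", new_words


# Toy stand-ins, assumed for illustration only.
KNOWN_SCORES = {"how to lose weight": 90.0}


def toy_match(q: str) -> float:
    return KNOWN_SCORES.get(q, 0.0)


def toy_normalize(q: str) -> str:
    return " ".join(q.lower().split())


lexicon = {"how": "adv", "to": "prep", "lose": "verb", "weight": "noun"}
print(triage("How  to LOSE weight", toy_match, toy_normalize, str.split, lexicon))
print(triage("how to lose flab", toy_match, toy_normalize, str.split, lexicon))
```

Passing the matcher, normalizer, and segmenter in as callables keeps the sketch independent of any particular parsing technique; a real system would substitute its own components for the toy ones.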
As can be seen from the above, the main purpose of the present invention is to use a natural language parsing technique to deconstruct the structure of the natural language input by the user and understand its content, and then, through a special matching means, to find information content in the database that meets the user's needs. The invention makes the human-machine interface of an information system friendlier and makes searching for and obtaining information faster and more convenient. The invention therefore has significant practicality and inventive step and meets the requirements for an invention patent, and this application is filed in accordance with the law.

[Brief Description of the Drawings]

(1) Drawings

Figure 1: schematic diagram of the system architecture of the present invention.
Figure 2: flowchart of the method of the present invention.
Figure 3: schematic diagram of the content of the "Constructive Concept Script" of the present invention.
Figure 4: flowchart of the "matching means" of the present invention.
Figure 5: flowchart of the logical judgment in the "matching means" of the present invention.
Figure 6: working schematic diagram of the "matching means" of the present invention.
Figure 7: a working diagram of the logical judgment in the "matching means" of the present invention.
Figure 8: another working diagram of the logical judgment in the "matching means" of the present invention.
Figure 9: schematic diagram of the relationship between the newly generated "Constructive Concept Script" and each candidate "Constructive Concept Script".
Figure 10: schematic diagram of the response method of the present invention.
Figure 11: block diagram of the back-end management mechanism of the present invention.

(2) Reference numerals

(10) "Constructive Concept Script" database
(11) (21) natural language parsing means
(20) user interface
(30) matching mechanism
(40) logic interpretation unit
(41) scheduler
(51) confirmation management interface
(52) normalization management interface
(53) new-word checking interface
(54) definition correction management interface
(55) synonym or synonym-group checking/generation interface
(70) (80) "Constructive Concept Script"
(81) "key event"
(82) "event"
(83) (84) "keyword group"
(85) (86) word group
(801) to (80n) objects
Claims (1)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW92137597A TWI226560B (en) | 2003-12-31 | 2003-12-31 | Information system with natural language parsing ability and processing method thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW92137597A TWI226560B (en) | 2003-12-31 | 2003-12-31 | Information system with natural language parsing ability and processing method thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
TWI226560B (en) | 2005-01-11 |
TW200521732A (en) | 2005-07-01 |
Family
ID=35634256
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW92137597A TWI226560B (en) | 2003-12-31 | 2003-12-31 | Information system with natural language parsing ability and processing method thereof |
Country Status (1)
Country | Link |
---|---|
TW (1) | TWI226560B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI461938B (en) * | 2010-05-07 | 2014-11-21 | Ulysses Systems Uk Ltd | System and method for identifying relevant information for an enterprise |
CN105760359A (en) * | 2014-11-21 | 2016-07-13 | 财团法人工业技术研究院 | Question processing system and method thereof |
US9734130B2 (en) | 2012-02-08 | 2017-08-15 | International Business Machines Corporation | Attribution using semantic analysis |
US10019512B2 (en) | 2011-05-27 | 2018-07-10 | International Business Machines Corporation | Automated self-service user support based on ontology analysis |
TWI823091B (en) * | 2020-05-28 | 2023-11-21 | 日商杰富意鋼鐵股份有限公司 | information retrieval system |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI679548B (en) * | 2018-05-09 | 2019-12-11 | 鼎新電腦股份有限公司 | Method and system for automated learning of a virtual assistant |
2003-12-31: TW application TW92137597A, patent TWI226560B (en), status: not active (IP right cessation)
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI461938B (en) * | 2010-05-07 | 2014-11-21 | Ulysses Systems Uk Ltd | System and method for identifying relevant information for an enterprise |
US10410156B2 (en) | 2010-05-07 | 2019-09-10 | Dimitris Lyras | System and method for identifying relevant information for an enterprise |
US10019512B2 (en) | 2011-05-27 | 2018-07-10 | International Business Machines Corporation | Automated self-service user support based on ontology analysis |
US10037377B2 (en) | 2011-05-27 | 2018-07-31 | International Business Machines Corporation | Automated self-service user support based on ontology analysis |
US10162885B2 (en) | 2011-05-27 | 2018-12-25 | International Business Machines Corporation | Automated self-service user support based on ontology analysis |
US9734130B2 (en) | 2012-02-08 | 2017-08-15 | International Business Machines Corporation | Attribution using semantic analysis |
US10839134B2 (en) | 2012-02-08 | 2020-11-17 | International Business Machines Corporation | Attribution using semantic analysis |
CN105760359A (en) * | 2014-11-21 | 2016-07-13 | 财团法人工业技术研究院 | Question processing system and method thereof |
TWI823091B (en) * | 2020-05-28 | 2023-11-21 | 日商杰富意鋼鐵股份有限公司 | information retrieval system |
Also Published As
Publication number | Publication date |
---|---|
TWI226560B (en) | 2005-01-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | MM4A | Annulment or lapse of patent due to non-payment of fees | |