WO2012079245A1 - Knowledge acquisition device and method - Google Patents
Knowledge acquisition device and method
- Publication number
- WO2012079245A1 (PCT/CN2010/079937)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- arbitrary
- model
- lattice
- frame
- unit
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
Definitions
- The present invention relates to the field of natural language processing research, and in particular to a knowledge acquisition device and method.
- The word "plant" has two senses: "plant" (a living organism) and "workshop" (a factory). When "plant" appears in a sentence together with "life" or "eat", the "plant" sense is much more probable than the "workshop" sense; but when "plant" appears together with "manufacturing", the sense is mainly "workshop". If a computer is given the corresponding semantic analysis knowledge, it gains the corresponding semantic disambiguation ability.
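The co-occurrence reasoning in the "plant" example can be sketched in a few lines of Python. This is only an illustration: the counts in `COOCCURRENCE` are invented for the example, and the add-one scoring rule is an assumption, not a method stated in the patent.

```python
# Hypothetical co-occurrence counts: sense -> context word -> count.
# The numbers are made up purely for illustration.
COOCCURRENCE = {
    "plant": {"life": 40, "eat": 25, "manufacturing": 1},
    "workshop": {"life": 2, "eat": 1, "manufacturing": 50},
}


def disambiguate(context_words, table=COOCCURRENCE):
    """Return the sense whose smoothed context counts sum highest."""
    scores = {
        sense: sum(counts.get(word, 0) + 1 for word in context_words)
        for sense, counts in table.items()
    }
    return max(scores, key=scores.get)


print(disambiguate(["life", "eat"]))
print(disambiguate(["manufacturing"]))
```

With these invented counts, a context containing "life" or "eat" selects the "plant" sense and a context containing "manufacturing" selects "workshop".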
- Case grammar is a formal grammatical model that expresses linguistic structure in terms of "case frames" (see Formal Models of Natural Language Processing, Feng Zhiwei, University of Science and Technology of China Press, p. 293, first edition, January 2010).
- Case grammar was first proposed by the American linguist C. Fillmore, who defined cases such as agentive, patient, instrumental, objective, locative, dative, factitive, benefactive, time, source, goal, and comitative.
- Each case frame is centered on a verb or adjective and has corresponding case slots.
- Each case has corresponding attribute features, such as the agent case (the subject of the sentence), the object case (the object of the sentence), and attributes that represent information such as time, place, and instrument.
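A case frame as described above can be represented as a small data structure: a predicate plus a list of case slots. The following is a minimal sketch; the class names, field names, and the case labels assigned to the example sentence are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class CaseSlot:
    case: str                      # case label, e.g. "agent" or "instrument"
    filler: str                    # the word that fills the slot
    attributes: list = field(default_factory=list)  # semantic attributes


@dataclass
class CaseFrame:
    predicate: str                 # the verb or adjective at the center
    slots: list = field(default_factory=list)

    def cases(self):
        """Return the case labels of all slots, in order."""
        return [slot.case for slot in self.slots]


# Illustrative frame for a sentence meaning "he goes to the library by car".
frame = CaseFrame(
    predicate="go",
    slots=[
        CaseSlot("agent", "he", ["person"]),
        CaseSlot("instrument", "car", ["vehicle"]),
        CaseSlot("goal", "library", ["location"]),
    ],
)
print(frame.cases())
```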
- Because of the diversity and complexity of language, disambiguation is one of the fundamental tasks of natural language processing research. Disambiguation tasks arise in almost every area of natural language processing, such as word segmentation, part-of-speech tagging, syntactic structure analysis, semantic analysis, and target language generation; machine translation, speech recognition, dialogue systems, and information retrieval must also solve the problem of disambiguation. Among these problems, the disambiguation of syntactic structure is especially arduous.
- The syntactic structure of predicate components such as verbs is often the bridge from source language analysis to target language generation. It bears on the correctness and fluency of the generated language and is one of the key technologies of machine translation research.
- Syntactic structure disambiguation is one of the premises and key factors of semantic disambiguation.
- The difficulty of syntactic structure disambiguation lies in the fact that the same verb has many different structures, which is reflected in the diversity of verb case frames.
- Traditional natural language processing systems often construct verb case frames manually. However, because case frames have a large number of patterns, fully manual construction requires a great deal of human resources.
- Patent Document 1 proposes a machine learning method based on probabilistic dependency graphs to process case frames.
- Patent Document 1: Japanese Patent No. 3353578;
- Non-Patent Document 1: Daisuke Kawahara, Sadao Kurohashi. Construction of large-scale case frames from the Web using a high-performance computing environment;
- Non-Patent Document 2: Daisuke Kawahara, Sadao Kurohashi. Gradual automatic construction of a case frame dictionary, Japan Society of Natural Language Processing, Vol. 12, No. 2, pp. 109-131, 2005.
- A first object of the present invention is to provide an efficient knowledge acquisition device.
- A second object of the present invention is to provide an efficient knowledge acquisition method.
- The present invention provides a knowledge acquisition apparatus, including: a case frame feature extraction unit for extracting the case frame elements of a predicate component in an input sentence and their attribute information; a model library for storing optional-case models; and an optional-case determination unit for performing pattern matching between the extraction result of the case frame feature extraction unit and the optional-case models, and determining the optional-case information in the case frame of the predicate component.
- The present invention provides a knowledge acquisition method, including: extracting the case frame elements of a predicate component in an input sentence and their attribute information; and performing pattern matching between the extraction result and the stored optional-case models, and determining the optional-case information in the case frame of the predicate component.
- FIG. 1 is a flowchart of Embodiment 1 of the knowledge acquisition method of the present invention;
- FIG. 2 is a flowchart of Embodiment 2 of the knowledge acquisition method of the present invention;
- FIG. 3 is a flowchart of Embodiment 3 of the knowledge acquisition method of the present invention;
- FIG. 4 is a structural diagram of Embodiment 1 of the knowledge acquisition apparatus of the present invention;
- FIG. 5 is a structural diagram of Embodiment 2 of the knowledge acquisition apparatus of the present invention;
- FIG. 6 is a schematic diagram of the syntactic structure analysis of a Japanese sentence;
- FIG. 7 is a schematic diagram of the extracted verb case frame structure.
- The embodiments of the present invention are mainly based on the idea of identifying the optional cases in the case frame of a predicate component, for example in Japanese sentences.
- Separating out the optional cases simplifies the case frame structure and reduces the difficulty of sentence analysis, syntactic structure disambiguation, and semantic disambiguation in application systems with natural language understanding, such as machine translation and dialogue systems.
- FIG. 1 is a flowchart of Embodiment 1 of the knowledge acquisition method of the present invention. As shown in FIG. 1, this embodiment includes:
- Step 102: Extract the case frame elements of the predicate component in the input sentence and their attribute information.
- Step 104: Perform pattern matching between the extraction result and the stored optional-case models, and determine the optional-case information in the case frame of the predicate component.
- In this embodiment, pattern matching is performed between the stored optional-case models and the case frame of the predicate component, so that the obligatory cases and optional cases of the predicate component's case frame are acquired automatically and distinguished effectively, which improves structural analysis in natural language processing.
- FIG. 2 is a flowchart of Embodiment 2 of the knowledge acquisition method of the present invention.
- This embodiment uses the relationship between the obligatory cases and optional cases of Japanese verb case frames as an example. Those skilled in the art will understand that the embodiments of the present invention are not limited to Japanese and can be applied to any other language. As shown in FIG. 2, this embodiment includes:
- Step 201: Receive an input sentence, for example a Japanese sentence meaning [he goes to the library by car]; in a specific implementation, the received sentence may also be read into memory;
- Step 202: Perform lexical and syntactic analysis on the input sentence.
- Lexical analysis includes two steps: word segmentation and acquisition of the attribute features of each word.
- Word segmentation splits the sentence into words.
- The above sentence can be segmented as [he / (particle) / car / (particle) / library / (particle) / go].
- The attribute features of the words, such as part of speech and verb conjugation, can be obtained from a machine-readable dictionary;
- Step 203: Perform case frame feature extraction on the input sentence.
- The semantic and conceptual information of the keywords is obtained from the loaded knowledge base. When extracting the case frame features of a predicate component such as a verb, the feature elements of the predicate word to be extracted, such as word, part of speech, semantics, concept, and applicable field, must be determined in advance.
- The attribute values of the corresponding feature elements are then extracted from the analysis results of step 202 and from the knowledge base according to each of the specified feature elements. For the example sentence, these are attributes (or attribute information) such as the semantics and concepts of [he], [car], and [library], which belong to the case frame of the verb [go] extracted from the sentence.
- From the knowledge base, the attribute [person/animal] can be obtained for [he], [vehicle/item] for [car], [building/location] for [library], and so on. Those skilled in the art will understand that a specific knowledge base can be selected according to the input language and the selected features: when the input language is Japanese, the EDR dictionary developed by the Japanese information and communications organization can be used; WordNet can be used for English, HowNet for Chinese, and so on.
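The knowledge base lookup in step 203 can be sketched as a dictionary lookup that attaches semantic attributes to each case frame element. This is a toy illustration: `KNOWLEDGE_BASE` is a hypothetical stand-in for a real resource such as EDR, WordNet, or HowNet, and the particle placeholders `P1` to `P3` are assumptions because the original particles are not recoverable from the text.

```python
# Hypothetical miniature knowledge base: word -> semantic/concept attributes.
KNOWLEDGE_BASE = {
    "he": ["person/animal"],
    "car": ["vehicle/item"],
    "library": ["building/location"],
}


def attach_attributes(elements, kb=KNOWLEDGE_BASE):
    """Pair each (word, particle) element with its knowledge base attributes."""
    return [
        {"word": word, "particle": particle, "attributes": kb.get(word, [])}
        for word, particle in elements
    ]


# Case frame elements of the verb "go"; P1..P3 are placeholder particles.
elements = [("he", "P1"), ("car", "P2"), ("library", "P3")]
features = attach_attributes(elements)
for feature in features:
    print(feature)
```

Words missing from the knowledge base simply receive an empty attribute list here; a real system would need a fallback strategy, which the text does not specify.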
- Step 204: Perform pattern matching between the optional-case models stored in the model library and the case frame of the predicate word extracted in step 203, and determine the optional-case information in the case frame of the predicate word; the model library is explained below with reference to FIG. 3;
- Step 205: Output the determination result of step 204.
- The determination result may also be sent to the knowledge base for use by the case frame feature extraction unit, to improve the performance and efficiency of knowledge acquisition of the system;
- The output data may be combined in a given format as required, and may be output as a file or stored directly in a database. For example, the determination result of step 204 may be [car + particle] or [means of transport + particle], that is, the phrase determined to be an optional case, or a segment containing semantic information and a specific particle.
- To facilitate information processing and simplify the verb case frame, the determined optional case may also be output together with the predicate component of the sentence, or the extracted optional-case phrase and the sentence with that phrase removed may both be output.
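The output variants described for step 205 can be illustrated as follows. The sketch assumes the optional-case elements have already been identified by index; the function names, the `word+particle` rendering, and the placeholder particles `P1` to `P3` are all hypothetical.

```python
def split_optional(elements, optional_indices):
    """Split case frame elements into (optional, remainder) lists."""
    optional = [e for i, e in enumerate(elements) if i in optional_indices]
    remainder = [e for i, e in enumerate(elements) if i not in optional_indices]
    return optional, remainder


def render(elements, predicate=None):
    """Render elements as 'word+particle' tokens, optionally with the predicate."""
    parts = [f"{word}+{particle}" for word, particle in elements]
    if predicate is not None:
        parts.append(predicate)
    return " ".join(parts)


elements = [("he", "P1"), ("car", "P2"), ("library", "P3")]
optional, remainder = split_optional(elements, {1})

print(render(optional))              # the optional-case phrase alone
print(render(optional, "go"))        # the optional case with the predicate
print(render(remainder, "go"))       # the sentence with the phrase removed
```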
- In this embodiment, the obligatory cases and optional cases in the case frame are correctly distinguished by judging the relationship between the case frame of predicate components such as verbs and the optional cases in the sentence, so that the structure of predicate components such as verbs becomes simple. This greatly improves the coverage of verb case frames, improves the accuracy of structural and semantic disambiguation in syntactic and semantic analysis, and provides an efficient and reliable knowledge acquisition method for natural language understanding research fields such as information retrieval, machine translation, and dialogue systems.
- FIG. 3 is a flowchart of Embodiment 3 of the knowledge acquisition method of the present invention. It mainly explains the process of constructing the model library by a machine learning method. Those skilled in the art will understand that the model library can be established from learning data with various machine learning methods. The following uses a support vector machine (SVM) as an example to explain how the model library is established by machine learning. As shown in FIG. 3,
- the embodiment includes: Step 301: feature extraction;
- For the theory and algorithms of support vector machines, see the following non-patent document: [Non-Patent Document 3] Fang Ruiming, Support Vector Machine Theory and Its Application Analysis; China Electric Power Press, ISBN: 9787508360379.
- http://www.cs.cornell.edu/People/tj/svm_light/old/svm_light_v4.00.html
- The learning module can be run through the learning command of the tool used, such as the svm-learning command of SVM Light; the function parameters are selected by presetting the parameters of the command. Using a support vector machine also involves feature generation, feature selection, and feature weight calculation.
- The feature vector space can be built from the learning data, for example by segmenting text files into words, computing word frequencies or probabilities, or the occurrence frequencies or probabilities of N-gram models, and removing some high-frequency words, thereby completing feature selection;
- There are many methods for calculating feature weights, such as Boolean weights, absolute term frequency (TF), inverse document frequency (IDF), TF-IDF, TFC, ITC, entropy weights, and TF-IWF;
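Three of the weighting schemes named above (Boolean weights, absolute term frequency, and TF-IDF) can be sketched as follows. The toy documents and vocabulary are invented for illustration, and the IDF definition used here, log(N / df), is one common variant chosen as an assumption; the patent does not state which formula it uses.

```python
import math
from collections import Counter


def boolean_weights(doc, vocab):
    """1 if the term occurs in the document, else 0."""
    present = set(doc)
    return [1 if term in present else 0 for term in vocab]


def tf_weights(doc, vocab):
    """Absolute term frequency of each vocabulary term."""
    counts = Counter(doc)
    return [counts[term] for term in vocab]


def idf(term, docs):
    """Inverse document frequency, log(N / df); 0.0 for unseen terms."""
    df = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / df) if df else 0.0


def tfidf_weights(doc, vocab, docs):
    """Term frequency multiplied by inverse document frequency."""
    counts = Counter(doc)
    return [counts[term] * idf(term, docs) for term in vocab]


# Toy tokenized documents (illustrative only).
docs = [["he", "car", "go"], ["he", "library", "go"], ["car", "car", "go"]]
vocab = ["he", "car", "library", "go"]

print(boolean_weights(docs[2], vocab))
print(tf_weights(docs[2], vocab))
print(tfidf_weights(docs[2], vocab, docs))
```

A term such as "go" that occurs in every document gets an IDF of zero under this definition, which is why TF-IDF suppresses it while Boolean and TF weights do not.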
- When using the SVM classifier, the learning data must be preprocessed. In addition to the steps above (mapping the features to the vector space, feature selection, and choosing a feature weighting method), each example must be labelled with its class, with negative examples identified as class -1;
- In addition, all positive and negative examples in the learning data must be format-converted according to the elements of the feature vector space.
- In this format conversion, the line number of each feature element in the feature vector space is generally used in place of the corresponding word or phrase in the learning data. For example:
- word frequency statistics are computed, and the vector space (that is, the extracted features) shown in Table 1 is assumed; this is an example and should not be interpreted as a limitation;
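The format conversion described above, in which each word or phrase is replaced by the line number of its feature element, can be sketched for the SVM Light training format, where each line has the form `<target> <index>:<value> ...` with indices in increasing order. The vocabulary below is hypothetical (Table 1 is not reproduced in the text), the particle placeholders are assumptions, and Boolean weights are used as in this embodiment.

```python
def build_index(vocab):
    """Map each feature element to its 1-based line number in the vocabulary."""
    return {term: i for i, term in enumerate(vocab, start=1)}


def to_svmlight_line(label, tokens, index):
    """Convert one labelled example to '<target> <index>:1 ...' with sorted indices."""
    ids = sorted({index[t] for t in tokens if t in index})
    target = "+1" if label > 0 else "-1"
    return " ".join([target] + [f"{i}:1" for i in ids])


# Hypothetical feature vocabulary: semantic class plus placeholder particle.
vocab = ["means-of-transport+P2", "person+P1", "location+P3"]
index = build_index(vocab)

print(to_svmlight_line(+1, ["means-of-transport+P2"], index))
print(to_svmlight_line(-1, ["location+P3", "person+P1"], index))
```

Tokens that are not in the vocabulary are silently dropped in this sketch, which mirrors feature selection having already fixed the vector space.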
- Step 302: Build the model from the extracted features with the machine learning method. As described above, if SVMLight is used, the svm-learning command can complete the machine learning task and produce the SVM-based model library; the models in the obtained model library are as follows:
- The feature elements of the predicate word extracted in step 203 of FIG. 2 must conform to the constituent elements of the models in the model library; for example, when the SVM classifier is used,
- the feature vector space used for SVM learning should contain the semantics, concepts, applicable fields, etc. from the knowledge base.
- This embodiment is explained using an SVM learning method based on words and Boolean weights. In a specific implementation, at least one of other methods can be used, such as supervised learning, unsupervised learning, semi-supervised learning, clustering algorithms, related algorithms, complex feature sets and unification operations, probabilistic context-free grammars, unitary models, hidden Markov models, naive Bayes, decision tree models, maximum entropy models, error-driven transformation methods, neural networks, conditional random fields (CRF), bootstrapping, and co-training.
- FIG. 4 is a structural diagram of Embodiment 1 of the knowledge acquisition apparatus of the present invention.
- The method embodiments shown in FIGS. 1-3 can be applied to this embodiment.
- The embodiment includes: a case frame feature extraction unit 420 for extracting the case frame elements of the predicate component in the input sentence and their attribute information; a model library 4020 for storing the optional-case models; and an optional-case determination unit 430 for performing pattern matching between the extraction result of the case frame feature extraction unit and the optional-case models, and determining the optional-case information in the case frame of the predicate component.
- The input sentence memory unit 400 is configured to receive an input sentence. In a specific implementation it can use various general-purpose input modules, such as a keyboard, a pointing device, handwritten character recognition, an optical character reader, or voice input recognition, or the sentence may be input in the form of a text file or a database; the input sentence memory unit 400 may be any existing input unit capable of obtaining language information for processing;
- The lexical parsing unit 410 is configured to perform word segmentation and syntactic structure analysis on the input sentence. Word segmentation splits the input sentence into words and assigns each word its related attribute features. Syntactic structure analysis determines the structure of the input sentence, for example, for a Chinese sentence, the subject, predicate, object, attributive, adverbial, and complement. The knowledge base 4010 supplies attributes such as the semantics and concepts of the words or phrases that make up the sentence in the output of the lexical parsing unit 410; for example, WordNet for English, HowNet for Chinese, etc. The purpose of adding semantic and conceptual attribute features is to abstract the extracted case frame, as with the Japanese example sentence.
- The case frame feature extraction unit 420 is configured to extract the case frame features of the target verb from the output of the lexical parsing unit 410 and the attribute features such as semantics and concepts acquired from the knowledge base 4010,
- providing the data and basis for the pattern matching that the optional-case determination unit 430 performs against the model library 4020. The case frame feature extraction unit 420 can use many feature selection methods, generally including document-frequency-based feature extraction, information gain, the χ² statistic, mutual information, and so on.
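Of the feature selection methods listed above, the χ² statistic is easy to show concretely. The sketch below uses the standard 2x2 contingency form of the statistic for a feature and a class; the parameter naming (`a`, `b`, `c`, `d`) and the `select_features` helper are illustrative assumptions, not part of the patent.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic of a feature for a class from a 2x2 table.

    a: examples in the class that contain the feature
    b: examples outside the class that contain the feature
    c: examples in the class that lack the feature
    d: examples outside the class that lack the feature
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def select_features(table, k):
    """Keep the k features with the highest chi-square score."""
    ranked = sorted(table, key=lambda f: chi_square(*table[f]), reverse=True)
    return ranked[:k]


# Hypothetical contingency counts per feature.
counts = {"transport+P2": (10, 0, 0, 10), "person+P1": (5, 5, 5, 5)}
print(select_features(counts, 1))
```

A feature that occurs equally often inside and outside the class scores zero and is dropped first, while a feature that perfectly separates the classes scores highest.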
- The model library 4020 can be built with statistical methods from learning data.
- It is used to judge the case frame features extracted by the case frame feature extraction unit 420, thereby determining and distinguishing the obligatory cases and optional cases among the case frame elements of predicate components such as verbs.
- Models in the model library can be obtained from learning data with statistical machine learning methods, such as support vector machines, decision trees, and the like;
- The optional-case determination unit 430 is configured to perform pattern matching between the verb case frame features extracted by the case frame feature extraction unit 420 and the model library 4020, judge the elements of the case frame of the predicate component such as the verb, and distinguish the obligatory cases from the optional cases.
- Specifically, for a model library 4020 established with a support vector machine (SVM), suppose the model library 4020 contains an optional-case model such as [means of transport + particle].
- The word [car] in the example sentence obtains the semantic information [means of transport] from the knowledge base, and [means of transport + particle] is known from the model library 4020 to be an optional case.
- Therefore [car + particle] is determined to be an optional case;
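The determination in the [car] example can be sketched as a lookup: abstract the word to its semantic class through the knowledge base, then check the (class, particle) pattern against the model library. Note the swap: this sketch replaces the SVM-based model library with a plain set of patterns purely for illustration, and the dictionary contents and particle placeholders `P1` to `P3` are hypothetical.

```python
# Hypothetical knowledge base: word -> semantic class.
KNOWLEDGE_BASE = {"car": "means of transport", "he": "person", "library": "location"}

# Stand-in for the model library: (semantic class, particle) patterns
# that mark an optional case. A set lookup replaces the SVM here.
OPTIONAL_CASE_MODELS = {("means of transport", "P2")}


def is_optional(word, particle, kb=KNOWLEDGE_BASE, models=OPTIONAL_CASE_MODELS):
    """Return True if the element matches an optional-case pattern."""
    semantic_class = kb.get(word)
    return (semantic_class, particle) in models


def partition(elements):
    """Split case frame elements into (obligatory, optional) lists."""
    optional = [e for e in elements if is_optional(*e)]
    obligatory = [e for e in elements if not is_optional(*e)]
    return obligatory, optional


elements = [("he", "P1"), ("car", "P2"), ("library", "P3")]
obligatory, optional = partition(elements)
print(obligatory)
print(optional)
```

Because matching is done on the semantic class rather than the surface word, any other word that the knowledge base maps to "means of transport" would match the same pattern.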
- The output unit 440 is configured to output the result of the optional-case determination unit 430. The output may take various forms, such as file output or display output. For the input sentence processed by the optional-case determination unit 430, the output may be [car + particle], or [car + particle] together with the rest of the sentence, and may also be adapted to the needs of users.
- The output unit 440 may write its output to the knowledge base 4010 for direct use by the case frame feature extraction unit 420, to improve the performance and efficiency of knowledge acquisition of the system.
- The optional-case determination unit 430 divides the case elements in the case frame of a verb into obligatory cases and optional cases, and separates the optional cases from the verb case frame.
- This simplifies the verb case frame and reduces the number of case frames. At the same time, it reduces the difficulty of syntactic structure disambiguation and semantic disambiguation, improves the accuracy of syntactic and semantic analysis, and helps advance related research and applications in machine translation, information retrieval, and speech recognition.
- FIG. 5 is a structural diagram of Embodiment 2 of the knowledge acquisition apparatus of the present invention.
- The method embodiments shown in FIGS. 1-3 can be applied to this embodiment.
- The constituent units and connection relationships of this embodiment are substantially the same as those of the knowledge acquisition apparatus shown in FIG. 4; the difference is that a database 5030 for storing learning data (such as a large-scale corpus) and a machine learning unit 510 are added.
- The machine learning unit 510 can perform machine learning on the data in the learning database 5030 using methods such as support vector machines and decision trees, thereby constructing the model library 4020, as explained in detail with reference to FIG. 3.
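The role of the machine learning unit, which turns labelled learning data into a model, can be sketched with a small linear classifier. Note the swap: a perceptron is used here instead of the support vector machine or decision tree named in the text, only because it fits in a few self-contained lines. The feature indices and the learning data are hypothetical; +1 marks an optional case and -1 an obligatory case.

```python
def train_perceptron(examples, epochs=10):
    """Train a linear model on sparse {index: value} features with +1/-1 labels."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for features, label in examples:
            score = bias + sum(weights.get(i, 0.0) * v for i, v in features.items())
            if label * score <= 0:  # misclassified (or on the boundary): update
                for i, v in features.items():
                    weights[i] = weights.get(i, 0.0) + label * v
                bias += label
    return weights, bias


def predict(model, features):
    """Return +1 (optional case) or -1 (obligatory case)."""
    weights, bias = model
    score = bias + sum(weights.get(i, 0.0) * v for i, v in features.items())
    return 1 if score > 0 else -1


# Hypothetical learning data; feature 1 stands for "means of transport + particle".
LEARNING_DATA = [
    ({1: 1}, +1),
    ({2: 1}, -1),
    ({3: 1}, -1),
    ({1: 1, 4: 1}, +1),
]

model = train_perceptron(LEARNING_DATA)
print(predict(model, {1: 1}))
print(predict(model, {3: 1}))
```

The trained `(weights, bias)` pair plays the part of one entry in the model library; a real system following the text would instead store the model file produced by the SVM tool.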
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201080069243.0A CN103119585B (zh) | 2010-12-17 | 2010-12-17 | Knowledge acquisition device and method |
PCT/CN2010/079937 WO2012079245A1 (zh) | 2010-12-17 | 2010-12-17 | Knowledge acquisition device and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2010/079937 WO2012079245A1 (zh) | 2010-12-17 | 2010-12-17 | Knowledge acquisition device and method |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2012079245A1 (zh) | 2012-06-21 |
Family
ID=46243987
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2010/079937 WO2012079245A1 (zh) | Knowledge acquisition device and method | 2010-12-17 | 2010-12-17 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN103119585B (zh) |
WO (1) | WO2012079245A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103714053A (zh) * | 2013-11-13 | 2014-04-09 | 北京中献电子技术开发中心 | A Japanese verb recognition method for machine translation |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108959240A (zh) * | 2017-05-26 | 2018-12-07 | 上海醇聚信息科技有限公司 | A system and method for automatically generating a proprietary ontology |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1255213A (zh) * | 1997-03-04 | 2000-05-31 | 石仓博 | Language analysis system and method |
JP2007206888A (ja) * | 2006-01-31 | 2007-08-16 | Toyota Central Res & Dev Lab Inc | Response generation device, method, and program |
WO2008117432A1 (ja) * | 2007-03-27 | 2008-10-02 | Fujitsu Limited | Electronic document concealment program |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7689411B2 (en) * | 2005-07-01 | 2010-03-30 | Xerox Corporation | Concept matching |
US8301435B2 (en) * | 2006-02-27 | 2012-10-30 | Nec Corporation | Removing ambiguity when analyzing a sentence with a word having multiple meanings |
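JP5128328B2 (ja) * | 2008-03-13 | 2013-01-23 | 日本放送協会 | Ambiguity evaluation device and program |
KR100956794B1 (ko) * | 2008-08-28 | 2010-05-11 | 한국전자통신연구원 | Translation apparatus applying multi-stage verb-phrase patterns, and application method and extraction method therefor |
CN101887443B (zh) * | 2009-05-13 | 2012-12-19 | 华为技术有限公司 | A text classification method and device |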
-
2010
- 2010-12-17 WO PCT/CN2010/079937 patent/WO2012079245A1/zh active Application Filing
- 2010-12-17 CN CN201080069243.0A patent/CN103119585B/zh active Active
Also Published As
Publication number | Publication date |
---|---|
CN103119585A (zh) | 2013-05-22 |
CN103119585B (zh) | 2015-12-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
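CN108170749B (zh) | Artificial-intelligence-based dialogue method, device, and computer-readable medium | |
CN111143576A (zh) | An event-oriented dynamic knowledge graph construction method and device | |
WO2020232943A1 (zh) | Knowledge graph construction method for event prediction and event prediction method | |
CN108334495A (zh) | Short-text similarity calculation method and system | |
CN110209806A (zh) | Text classification method, text classification device, and computer-readable storage medium | |
CN104391942A (zh) | Short-text feature expansion method based on a semantic graph | |
KR101627428B1 (ko) | Method for building a syntactic parsing model using deep learning and device performing the same | |
CN113377897B (zh) | Multilingual medical terminology standardization system and method based on deep adversarial learning | |
CN110717341B (zh) | A method and device for constructing a Lao-Chinese bilingual corpus with Thai as the pivot | |
CN103314369B (zh) | Machine translation device and method | |
WO2017198031A1 (zh) | Method and device for parsing semantics | |
CN112420024A (zh) | A fully end-to-end mixed Chinese-English air traffic control speech recognition method and device | |
Alsallal et al. | Intrinsic plagiarism detection using latent semantic indexing and stylometry | |
CN114217766A (zh) | Semi-automatic requirement extraction method based on pre-trained language model fine-tuning and dependency features | |
Zhan et al. | Survey on event extraction technology in information extraction research area | |
Sun et al. | Multi-channel CNN based inner-attention for compound sentence relation classification | |
CN114722774B (zh) | Data compression method, device, electronic equipment, and storage medium | |
Kessler et al. | Extraction of terminology in the field of construction | |
CN115033753A (zh) | Training corpus construction method, text processing method, and device | |
Yuwana et al. | On part of speech tagger for Indonesian language | |
CN112632272A (zh) | Microblog sentiment classification method and system based on syntactic analysis | |
WO2012079245A1 (zh) | Knowledge acquisition device and method | |
CN106021225A (zh) | A method for recognizing Chinese maximal-length noun phrases based on Chinese simple noun phrases | |
CN113590768B (zh) | Training method and device for a text relevance model, and question answering method and device | |
Wang et al. | Attention-based recurrent neural model for named entity recognition in Chinese social media |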
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 201080069243.0 Country of ref document: CN |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 10860718 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC DATED 07.10.2013 |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 10860718 Country of ref document: EP Kind code of ref document: A1 |