WO2016127459A1 - Method and device for recognizing an unregistered word in an intelligent interaction system
- Publication number
- WO2016127459A1 (application PCT/CN2015/073842)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- word
- user
- dictionary
- input
- unregistered
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
Definitions
- The invention relates to the technical field of computer science, and in particular to a method and a device for identifying unregistered words in an intelligent interactive system.
- A sentence must be segmented first, but existing word segmentation performs poorly when a sentence contains unregistered words. This in turn degrades the subsequent sentence-similarity calculation and reduces the intelligence of the intelligent interactive system.
- The effect of word segmentation depends on the word segmentation algorithm and the word segmentation dictionary.
- Word segmentation algorithms already achieve good results and are unlikely to improve much further, whereas the completeness of the word segmentation dictionary directly affects the segmentation result. A word that the dictionary does not contain is an unregistered word, and such a word is difficult to segment correctly.
- In the intelligent interactive system, when some users use the search engine, they consciously perform keyword queries, that is, queries containing special characters such as spaces,
- The main object of the present invention is to provide an unregistered word recognition method in an intelligent interactive system that enriches the user dictionary, so that when sentences input by the user must be segmented based on the user dictionary, the word segmentation effect and the intelligence level of the intelligent interactive system are improved.
- the present invention provides a method for identifying an unregistered word in an intelligent interactive system, and the method for identifying an unregistered word in the intelligent interactive system includes the following steps:
- S60: determine whether the word input by the user is a word in a network entry; if so, add the word input by the user to the user dictionary as an unregistered word and delete it from the user input word dictionary; otherwise, ignore the word input by the user.
- the method for identifying an unregistered word in the intelligent interaction system further includes the following steps:
- S90 Establish a user dictionary in which commonly used words of a user-specific application domain are stored.
- the method for identifying an unregistered word in the intelligent interaction system further includes the following steps:
- the step S10 includes:
- The present invention also provides a device for identifying an unregistered word in an intelligent interactive system, wherein the device includes:
- a first-level identification module configured to determine whether the length of the word input by the user is equal to 1 or greater than 4, and if so, to ignore the word input by the user;
- a second-level identification module configured to determine, when the length of the word input by the user is greater than 1 and less than or equal to 4, whether the word input by the user already exists in a preset word segmentation dictionary or a user dictionary, and if so, to ignore the word input by the user;
- a third-level identification module configured to determine, when the word input by the user is not a word in the word segmentation dictionary or the user dictionary, whether the word input by the user is contained in a word of the word segmentation dictionary or the user dictionary, and if so, to ignore the word input by the user;
- a user input word dictionary update module configured to add the word input by the user to the user input word dictionary as a possible unregistered word when the word is not contained in a word of the word segmentation dictionary or the user dictionary;
- a fourth-level identification module configured to add the word input by the user to the user dictionary as an unregistered word and delete it from the user input word dictionary when the word input by the user is a word in a network entry, and otherwise to ignore the word input by the user.
- the obtaining module is specifically configured to:
- the device for identifying the unregistered word in the intelligent interaction system further includes:
- a user input word dictionary word frequency statistics module configured to count the word frequency of each word in the user input word dictionary;
- a user dictionary update module configured to add a word to the user dictionary as an unregistered word, and delete it from the user input word dictionary, if the word frequency of that word in the user input word dictionary is greater than a preset value.
- the device for identifying the unregistered word in the intelligent interaction system further includes:
- a user dictionary building module configured to establish a user dictionary in which the commonly used words of the user-specific application domain and the unregistered words are stored.
- the device for identifying the unregistered word in the intelligent interaction system further includes:
- a user input word dictionary building module configured to establish a user input word dictionary and store possible unregistered words in it.
- The technical solution of the present invention recognizes, level by level, whether the length of the word input by the user is equal to 1 or greater than 4, whether the word already exists in the preset word segmentation dictionary or the user dictionary, and whether it is contained in a word of the word segmentation dictionary or the user dictionary. Possible unregistered words that pass these filters are placed in the user input word dictionary for temporary recording. When a word input by the user is further recognized as a word in a network entry, it is added to the user dictionary and deleted from the user input word dictionary.
- The embodiment of the present invention identifies the words input by the user level by level and adds the unregistered words to the user dictionary, enriching it. When a sentence input by the user must be segmented based on the user dictionary, the word segmentation effect is improved, which raises the intelligence level of the intelligent interactive system.
- FIG. 1 is a schematic flow chart of a first preferred embodiment of an unregistered word recognition method in an intelligent interactive system according to the present invention
- FIG. 2 is a schematic flow chart of a second preferred embodiment of an unregistered word recognition method in the intelligent interactive system of the present invention
- FIG. 3 is a schematic structural diagram of a first preferred embodiment of an unregistered word recognition apparatus in the intelligent interactive system of the present invention
- FIG. 4 is a schematic structural diagram of a second preferred embodiment of an unregistered word recognition apparatus in the intelligent interactive system of the present invention.
- Natural language processing is an important direction in the field of computer science and artificial intelligence.
- words are the smallest language unit. Chinese does not have a specific mark between words, so it is necessary to perform Chinese word segmentation in advance when performing automatic processing.
- the large number of unregistered words has become a technical bottleneck affecting the effect of Chinese word segmentation.
- Unregistered word recognition is the process of automatically detecting and identifying, in a corpus, words that do not appear in the dictionary. It is an important basic technology in natural language processing, with wide application in Chinese automatic word segmentation, dictionary compilation, information extraction, information retrieval, and machine translation.
- The main object of the present invention is to provide an unregistered word recognition method in an intelligent interactive system that enriches the user dictionary, so that when sentences input by the user must be segmented based on the user dictionary, the word segmentation effect and the intelligence level of the intelligent interactive system are improved.
- the present invention provides a method for identifying an unregistered word in an intelligent interactive system.
- FIG. 1 is a schematic flowchart diagram of a first preferred embodiment of an unregistered word recognition method in an intelligent interactive system according to the present invention.
- the intelligent interaction system in the embodiment of the present invention includes a client and a server.
- The client is used to obtain the content input by the user, and the server is used to process that content and feed back the results.
- the method for identifying an unregistered word in the intelligent interactive system includes the following steps:
- the embodiment of the present invention acquires a word input by a user through a client.
- The user inputs content from an input terminal. Because commonly used input methods, such as the Sogou Pinyin and Baidu Pinyin input methods, mostly have a memory function, users are accustomed to inputting a sentence word by word.
- The word input by the user can be obtained by asynchronous transmission.
- Asynchronous transmission means that each time the user inputs a word, that word is transmitted to the server end of the intelligent interactive system.
- The complete sentence is transmitted to the server as a whole. That is, both the words and the sentences entered by the user are transmitted asynchronously to the server.
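The asynchronous transmission described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the in-memory `asyncio.Queue` stands in for the network link between client and server, and the names `client`, `server`, and `demo`, the message tuples, and the sample Chinese words are all assumptions made for illustration.

```python
import asyncio

async def client(queue, words):
    # Hypothetical client: send each word as soon as it is entered,
    # then send the whole sentence once input is finished.
    for word in words:
        await queue.put(("word", word))
    await queue.put(("sentence", "".join(words)))
    await queue.put(None)  # sentinel: no more input

async def server(queue):
    # Hypothetical server: collect messages in arrival order.
    received = []
    while True:
        message = await queue.get()
        if message is None:
            break
        received.append(message)
    return received

async def _run(words):
    queue = asyncio.Queue()
    _, received = await asyncio.gather(client(queue, words), server(queue))
    return received

def demo(words):
    """Run client and server concurrently and return what the server saw."""
    return asyncio.run(_run(words))
```

In this sketch the server receives every word before it receives the sentence, so word-level recognition can begin while the user is still typing.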
- Second-level recognition is performed on the word input by the user to determine whether it is a word in the preset word segmentation dictionary or the user dictionary.
- The preset word segmentation dictionary in the embodiment of the present invention is a Chinese word segmentation dictionary from the prior art. The user dictionary is a pre-established set of words specific to a certain application field; for the health management application field, for example, such words include watching movies, diet, physiotherapy, etc.
- The user dictionary described in the embodiment of the present invention may also start empty and be added to and enriched during subsequent user input.
- A person skilled in the art may use various methods, for example searching the word segmentation dictionary entry by entry for the word input by the user, or establishing an index in advance and performing search matching based on that index. The method is not limited here, as long as it can be determined whether the word input by the user already exists in the preset word segmentation dictionary or the user dictionary. If the word already exists in either dictionary, it is ignored; otherwise, third-level recognition is performed.
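The two lookup strategies mentioned above, entry-by-entry traversal and a pre-built index, can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and using a Python `set` as the index is an assumption, since the text does not specify an index structure.

```python
def exists_by_traversal(word, entries):
    """Entry-by-entry search: compare the word against every dictionary entry."""
    return any(entry == word for entry in entries)

def build_index(segmentation_dict, user_dict):
    """Assumed index: merge both dictionaries into one hash-based set."""
    return set(segmentation_dict) | set(user_dict)

def exists_by_index(word, index):
    """Index-based search: a single hash lookup."""
    return word in index

def second_level_ignore(word, segmentation_dict, user_dict):
    """Return True when the word is already known and should be ignored."""
    return exists_by_index(word, build_index(segmentation_dict, user_dict))
```

Both strategies give the same answer; the index trades memory for lookup time that does not grow with dictionary size.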
- When the second-level recognition determines that the word input by the user does not exist in the preset word segmentation dictionary or the user dictionary, it is further determined whether the word input by the user is contained in a word of the word segmentation dictionary or the user dictionary.
- Containment here means that the word input by the user is entirely contained within a word of the word segmentation dictionary or the user dictionary.
- For example, if the word entered by the user is “Hello” and a word of the word segmentation dictionary or the user dictionary is “Hello”, then “Hello” is contained in “Hello”.
- If the word input by the user is “you are beautiful” and the word of the word segmentation dictionary or the user dictionary is “hello”, the word input by the user is considered not to be contained in a word of the word segmentation dictionary or the user dictionary.
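The containment test of the third-level recognition can be sketched as a substring check. The function name is hypothetical, and the Chinese strings are assumed back-translations used only for illustration (for instance "你好" for "hello" and "你漂亮" for "you are beautiful"); the source gives the examples in translated form only.

```python
def contained_in_dictionary_word(word, segmentation_dict, user_dict):
    """Return True if the whole input word occurs inside any dictionary entry."""
    entries = list(segmentation_dict) + list(user_dict)
    # `word in entry` is Python's substring test on strings.
    return any(word in entry for entry in entries)
```

A word that is fully contained in a known entry is ignored at this level; only words that fail the check move on as possible unregistered words.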
- The word input by the user is added as a possible unregistered word to the user input word dictionary.
- The user input word dictionary temporarily stores words that the user enters in the course of inputting a sentence and that the level-by-level recognition above identifies as possible unregistered words.
- S60: determine whether the word input by the user is a word in a network entry; if so, add the word input by the user to the user dictionary as an unregistered word and delete it from the user input word dictionary; otherwise, ignore the word input by the user.
- The network entry preferably refers to an entry currently provided by Baidu Encyclopedia. Baidu Encyclopedia adheres to the spirit of equality, collaboration, sharing and freedom. It advocates equality before the network. All people work together to write an encyclopedia, so that knowledge can be continuously combined and expanded under certain technical rules and cultural contexts.
- The entries of Baidu Encyclopedia include the most popular new words at present, so unregistered words can be identified to the greatest extent. If the word is in a network entry, the word input by the user is added to the user dictionary as an unregistered word and deleted from the user input word dictionary; otherwise the word input by the user is ignored.
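Step S60 can be sketched as below. The sketch assumes the user input word dictionary is a `collections.Counter` of candidate words and represents the network-entry check as a caller-supplied function `is_network_entry`; that callable is a placeholder, not a real Baidu Encyclopedia API, and all names here are hypothetical.

```python
from collections import Counter

def fourth_level(word, pending, user_dict, is_network_entry):
    """Promote a candidate word if it is found in a network entry.

    pending:          Counter acting as the user input word dictionary
    user_dict:        set acting as the user dictionary
    is_network_entry: hypothetical callable, word -> bool
    """
    if is_network_entry(word):
        user_dict.add(word)      # record as an unregistered word
        pending.pop(word, None)  # remove from the temporary dictionary
        return True
    return False                 # otherwise the word is ignored at this level
```

A word that is not found stays in the temporary dictionary, which is what allows a later word-frequency pass to reconsider it.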
- Steps S10 to S60 are sequentially used to identify all words input by the user.
- The server side matches the content to be returned from the existing database, using existing word segmentation, similarity calculation, and matching algorithms based on the word segmentation dictionary and the user dictionary.
- The embodiment of the present invention recognizes, level by level, whether the length of the word input by the user is equal to 1 or greater than 4, whether the word already exists in the preset word segmentation dictionary or the user dictionary, and whether it is contained in a word of the word segmentation dictionary or the user dictionary.
- Possible unregistered words are filtered into the user input word dictionary for temporary recording. When a word input by the user is further recognized as a word in a network entry, it is added to the user dictionary and, at the same time, deleted from the user input word dictionary.
- The embodiment of the present invention identifies the words input by the user level by level and adds the unregistered words to the user dictionary, enriching it. When a sentence input by the user must be segmented based on the user dictionary, the word segmentation effect is improved, which raises the intelligence level of the intelligent interactive system.
- the step S10 includes:
- The change in the content of the text box is obtained as the user inputs content.
- When entering a sentence, the user will habitually input it word by word.
- For example, the user enters “Excuse me”, “Technology Park”, “How”, “Go” in turn, and the client obtains each change in the content of the text box. It first obtains “Excuse me” and takes it as the word input by the user; unregistered word recognition is then performed on “Excuse me” according to the flow of the first preferred embodiment of the unregistered word recognition method in the intelligent interactive system of the present invention.
- The server then matches the content to be returned from the preset database, using existing word segmentation, similarity calculation, and matching algorithms based on the preset word segmentation dictionary and the updated user dictionary.
- After the user inputs the sentence, it can be segmented using the existing word segmentation dictionary and the updated user dictionary, which further improves the word segmentation effect and the intelligence level of the intelligent interactive system.
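One way to derive the word just entered from successive text box contents is to compare each snapshot with the previous one. The sketch below assumes that each input-method commit appends text to the end of the box; that assumption, the function names, and the sample Chinese sentence (an assumed back-translation of the example above) are illustrative and not taken from the source.

```python
def appended_text(previous, current):
    """Return the text appended since the previous snapshot, or None.

    None is returned when the change is not a pure append (for example,
    when the user deleted characters).
    """
    if len(current) > len(previous) and current.startswith(previous):
        return current[len(previous):]
    return None

def words_from_snapshots(snapshots):
    """Turn a sequence of text box contents into the words entered in order."""
    words = []
    previous = ""
    for current in snapshots:
        word = appended_text(previous, current)
        if word is not None:
            words.append(word)
        previous = current
    return words
```

Each extracted word can then be sent to the server for level-by-level recognition before the full sentence arrives.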
- FIG. 2 is a schematic flowchart diagram of a second preferred embodiment of an unregistered word recognition method in an intelligent interactive system according to the present invention.
- The method for identifying an unregistered word in the intelligent interactive system further includes the following steps:
- The user input word dictionary temporarily stores the possible unregistered words identified level by level through the above steps while the user inputs a sentence.
- The word frequency is the frequency with which a word appears in the user input word dictionary. These are words that the user inputs often but that do not exist in a network entry.
- The word frequency of each word in the user input word dictionary is counted. Words whose frequency is greater than the preset value (which may be common words not included in the preset word segmentation dictionary or the user dictionary) are added to the user dictionary, further enriching it, and are deleted from the user input word dictionary.
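The frequency-based promotion can be sketched as follows, assuming the user input word dictionary is held as a `collections.Counter` that maps each candidate word to the number of times it was entered. The function name and the threshold value used in any call are hypothetical; the source only speaks of "a preset value".

```python
from collections import Counter

def promote_frequent_words(pending, user_dict, threshold):
    """Move words whose count is strictly greater than threshold into user_dict.

    pending:   Counter acting as the user input word dictionary
    user_dict: set acting as the user dictionary
    Returns the list of promoted words.
    """
    promoted = [word for word, count in pending.items() if count > threshold]
    for word in promoted:
        user_dict.add(word)  # enrich the user dictionary
        del pending[word]    # delete from the temporary dictionary
    return promoted
```

The comparison is strict, matching "greater than a preset value": a word whose count equals the threshold stays in the temporary dictionary.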
- the method for identifying an unregistered word in the intelligent interaction system further includes the following steps:
- S90 Establish a user dictionary in which commonly used words of a user-specific application domain are stored.
- the user dictionary in the embodiment of the present invention is a set of words unique to the field that are pre-established in a certain application field, such as a health management application field, such as watching movies, diet therapy, physical therapy, and the like. After pre-establishment, the user dictionary can be added and enriched during subsequent user input.
- the present invention also provides an apparatus for identifying an unregistered word in an intelligent interactive system.
- FIG. 3 is a schematic structural diagram of a first preferred embodiment of an unregistered word recognition apparatus in an intelligent interactive system according to the present invention.
- the device for identifying an unregistered word in the intelligent interaction system includes:
- An obtaining module 10 configured to acquire a word input by a user
- the obtaining module 10 acquires a word input by a user through a client.
- The user inputs content from an input terminal. Because commonly used input methods, such as the Sogou Pinyin and Baidu Pinyin input methods, mostly have a memory function, users are accustomed to inputting a sentence word by word.
- The word input by the user can be obtained by asynchronous transmission.
- Asynchronous transmission means that each time the user inputs a word, that word is transmitted to the server end of the intelligent interactive system.
- The complete sentence is transmitted to the server as a whole. That is, the words entered by the user are transmitted asynchronously to the server.
- The first-level identification module 20 is configured to determine whether the length of the word input by the user is equal to 1 or greater than 4, and if so, to ignore the word input by the user;
- The first-level identification module 20 first performs first-level recognition: it calculates the length of the word input by the user and determines whether that length is equal to 1 or greater than 4, that is, whether the input is a single character or a word of more than 4 characters. If so, the word is ignored, which filters out single characters and words longer than 4 characters; otherwise, second-level recognition is performed.
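The first-level length filter amounts to a single range check. In this sketch, length is measured with Python's `len`, which counts Unicode code points and therefore counts one per Chinese character; treating that as the patent's notion of word length is an assumption, and the function name is hypothetical.

```python
def passes_first_level(word):
    """Return True if the word should proceed to second-level recognition.

    Words of length 1 or of length greater than 4 are ignored.
    """
    return 1 < len(word) <= 4
```

For example, a single character is rejected, a three-character word passes, and a five-character input is rejected.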
- The second-level identification module 30 is configured to determine, when the length of the word input by the user is greater than 1 and less than or equal to 4, whether the word input by the user already exists in the preset word segmentation dictionary or the user dictionary, and if so, to ignore the word input by the user;
- The second-level identification module 30 performs second-level recognition on the word input by the user, determining whether it is a word in the preset word segmentation dictionary or the user dictionary.
- The preset word segmentation dictionary in the embodiment of the present invention is a Chinese word segmentation dictionary from the prior art. The user dictionary is a pre-established set of words specific to a certain application field; for the health management application field, for example, such words include watching movies, diet, physiotherapy, etc.
- The user dictionary described in the embodiment of the present invention may also start empty and be added to and enriched during subsequent user input.
- A person skilled in the art may use various methods, for example searching the word segmentation dictionary entry by entry for the word input by the user, or establishing an index in advance and performing search matching based on that index. The method is not limited here, as long as it can be determined whether the word input by the user already exists in the preset word segmentation dictionary or the user dictionary. If the word already exists in either dictionary, it is ignored; otherwise, third-level recognition is performed.
- The third-level identification module 40 is configured to determine, when the word input by the user is not a word in the word segmentation dictionary or the user dictionary, whether the word input by the user is contained in a word of the word segmentation dictionary or the user dictionary, and if so, to ignore the word input by the user;
- The third-level identification module 40 further determines whether the word input by the user is contained in a word of the word segmentation dictionary or the user dictionary. Containment here means that the word input by the user is entirely contained within a word of the word segmentation dictionary or the user dictionary.
- For example, if the word entered by the user is “Hello” and a word of the word segmentation dictionary or the user dictionary is “Hello”, then “Hello” is contained in “Hello”.
- If the word input by the user is “you are beautiful” and the word of the word segmentation dictionary or the user dictionary is “hello”, the word input by the user is considered not to be contained in a word of the word segmentation dictionary or the user dictionary.
- The user input word dictionary update module 50 is configured to add the word input by the user to the user input word dictionary as a possible unregistered word when the third-level identification module 40 determines that the word is not contained in a word of the word segmentation dictionary or the user dictionary;
- The user input word dictionary update module 50 adds the words input by the user to the user input word dictionary as possible unregistered words.
- The user input word dictionary temporarily stores words that the user enters in the course of inputting a sentence and that the level-by-level recognition above identifies as possible unregistered words.
- A fourth-level identification module 60 is configured to add the word input by the user to the user dictionary as an unregistered word and delete it from the user input word dictionary when the word input by the user is a word in a network entry, and otherwise to ignore the word input by the user.
- After the user input word dictionary update module 50 adds the word input by the user to the user input word dictionary as a possible unregistered word, the fourth-level identification module 60 determines whether the word input by the user is a word in a network entry. The network entry preferably refers to an entry currently provided by Baidu Encyclopedia.
- Baidu Encyclopedia adheres to the spirit of equality, collaboration, sharing and freedom. It advocates equality before the network. All people work together to write an encyclopedia, so that knowledge can be continuously combined and expanded under certain technical rules and cultural contexts.
- The entries of Baidu Encyclopedia include the most popular new words at present, so unregistered words can be identified to the greatest extent. If the word is in a network entry, the word input by the user is added to the user dictionary and deleted from the user input word dictionary; otherwise the word input by the user is ignored.
- The server side matches the content to be returned from the preset database, using existing word segmentation, similarity calculation, and matching algorithms based on the word segmentation dictionary and the user dictionary. Because the unregistered words have been added to the user dictionary, the word segmentation effect is improved when the sentences input by the user must be segmented based on the user dictionary, which raises the intelligence level of the intelligent interactive system.
- The embodiment of the present invention recognizes, level by level, whether the length of the word input by the user is equal to 1 or greater than 4, whether the word already exists in the preset word segmentation dictionary or the user dictionary, and whether it is contained in a word of the word segmentation dictionary or the user dictionary.
- Possible unregistered words are filtered into the user input word dictionary for temporary recording. When a word input by the user is further recognized as a word in a network entry, it is added to the user dictionary and, at the same time, deleted from the user input word dictionary.
- The embodiment of the present invention identifies the words input by the user level by level and adds the unregistered words to the user dictionary, enriching it. When a sentence input by the user must be segmented based on the user dictionary, the word segmentation effect is improved, which raises the intelligence level of the intelligent interactive system.
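The way modules 20 to 60 hand a word from one level to the next can be sketched as a single class. This is a rough illustration under stated assumptions rather than the patented device: the class name, the returned status strings, the `Counter` used for the user input word dictionary, and the `is_network_entry` callable (a stand-in for the network-entry lookup, not a real API) are all hypothetical.

```python
from collections import Counter

class UnregisteredWordRecognizer:
    def __init__(self, segmentation_dict, user_dict, is_network_entry):
        self.segmentation_dict = set(segmentation_dict)
        self.user_dict = set(user_dict)
        self.pending = Counter()  # user input word dictionary
        self.is_network_entry = is_network_entry

    def process(self, word):
        # Module 20: ignore single characters and words longer than 4.
        if not 1 < len(word) <= 4:
            return "ignored: length"
        known = self.segmentation_dict | self.user_dict
        # Module 30: ignore words that already exist in either dictionary.
        if word in known:
            return "ignored: already in dictionary"
        # Module 40: ignore words contained in a dictionary entry.
        if any(word in entry for entry in known):
            return "ignored: contained in entry"
        # Module 50: record the word as a possible unregistered word.
        self.pending[word] += 1
        # Module 60: promote the word if it appears in a network entry.
        if self.is_network_entry(word):
            self.user_dict.add(word)
            del self.pending[word]
            return "added to user dictionary"
        return "kept as candidate"
```

Words that end as "kept as candidate" remain counted in `pending`, so a separate frequency pass could still promote them later.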
- The obtaining module is specifically configured to:
- obtain the change in the content of the text box as the user inputs content.
- When entering a sentence, the user will habitually input it word by word.
- For example, the user enters “Excuse me”, “Technology Park”, “How”, “Go” in turn, and the client obtains each change in the content of the text box. It first obtains “Excuse me” and takes it as the word input by the user; unregistered word recognition is then performed on “Excuse me” according to the flow of the first preferred embodiment of the unregistered word recognition method in the intelligent interactive system of the present invention.
- The server then matches the content to be returned from the preset database, using existing word segmentation, similarity calculation, and matching algorithms based on the preset word segmentation dictionary and the updated user dictionary.
- After the user inputs the sentence, it can be segmented using the existing word segmentation dictionary and the updated user dictionary, which further improves the word segmentation effect and the intelligence level of the intelligent interactive system.
- FIG. 4 is a schematic structural diagram of a second preferred embodiment of an unregistered word recognition apparatus in an intelligent interactive system according to the present invention.
- the device for identifying the unregistered word in the intelligent interactive system further includes:
- The user input word dictionary word frequency statistics module 70 is configured to count the word frequency of each word in the user input word dictionary;
- The user dictionary update module 80 is configured to add a word to the user dictionary as an unregistered word, and delete it from the user input word dictionary, if the word frequency of that word in the user input word dictionary is greater than a preset value.
- The user input word dictionary temporarily stores words that the user enters in the course of inputting a sentence and that the level-by-level recognition above identifies as possible unregistered words. These are words that the user inputs often but that do not exist in a network entry.
- The word frequency of each word in the user input word dictionary is counted. Words whose frequency is greater than the preset value (which may be common words not included in the preset word segmentation dictionary or the user dictionary) are added to the user dictionary, further enriching it, and are deleted from the user input word dictionary.
- The device for identifying an unregistered word in the intelligent interactive system further includes:
- a user dictionary building module configured to establish a user dictionary in which commonly used words of a user-specific application domain are stored.
- The user dictionary in the embodiment of the present invention is a pre-established set of words specific to a certain application field; for the health management application field, for example, such words include watching movies, diet therapy, physical therapy, and the like. After it is established, the user dictionary can be added to and enriched during subsequent user input.
- The device for identifying an unregistered word in the intelligent interactive system further includes:
- a user input word dictionary building module configured to establish the user input word dictionary and store the possible unregistered words that the user inputs while entering a sentence.
- The main stored content is described under the user input word dictionary update module.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
Disclosed is a method for recognizing an unregistered word in an intelligent interaction system. The method comprises: recognizing, level by level, whether the length of a word input by a user is equal to 1 or greater than 4, whether the word input by the user is a word existing in a preset word segmentation dictionary or a user dictionary, and whether the word input by the user is contained in a word of the word segmentation dictionary or the user dictionary, so as to filter out a possible unregistered word, add it to a user input word dictionary, and create a temporary record; and, when the word input by the user is further recognized as a word in a network entry, adding the word input by the user to the user dictionary and simultaneously deleting it from the user input word dictionary. By recognizing the word input by the user level by level, a possible unregistered word is added to the user dictionary so as to enrich the user dictionary; when the sentence input by the user is segmented based on the user dictionary, the word segmentation effect can be improved, and the intelligence level of the intelligent interaction system can be improved.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510074982.3 | 2015-02-12 | | |
CN201510074982.3A CN104714940A (zh) | 2015-02-12 | 2015-02-12 | Method and device for identifying unregistered words in an intelligent interactive system |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016127459A1 (fr) | 2016-08-18 |
Family ID: 53414286
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2015/073842 WO2016127459A1 (fr) | Method and device for recognizing an unregistered word in an intelligent interaction system | 2015-02-12 | 2015-03-07 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN104714940A (fr) |
WO (1) | WO2016127459A1 (fr) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
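CN113010665A (zh) * | 2019-12-20 | 2021-06-22 | 北京搜狗科技发展有限公司 | Word processing method and related device |
CN113111655A (zh) * | 2021-05-12 | 2021-07-13 | 数库(上海)科技有限公司 | Method for constructing a separation dictionary, and word segmentation method and device based on the separation dictionary |
CN115221872A (zh) * | 2021-07-30 | 2022-10-21 | 苏州七星天专利运营管理有限责任公司 | Vocabulary expansion method and system based on near-synonym expansion |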
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108877939A (zh) * | 2018-05-10 | 2018-11-23 | 重庆大学 | Health management system with intelligent feature extraction |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1629836A (zh) * | 2003-12-17 | 2005-06-22 | 北京大学 | Method and device for learning new Chinese words |
CN1912872A (zh) * | 2006-07-25 | 2007-02-14 | 北京搜狗科技发展有限公司 | Method and system for extracting new words |
CN101079027A (zh) * | 2007-06-27 | 2007-11-28 | 腾讯科技(深圳)有限公司 | Chinese word segmentation method and system |
CN101118556A (zh) * | 2007-09-17 | 2008-02-06 | 中国科学院计算技术研究所 | New-word discovery method and system for short texts |
CN101539940A (zh) * | 2009-05-04 | 2009-09-23 | 清华大学 | Method and device for acquiring new words |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE19830225A1 (de) * | 1998-07-07 | 2000-01-13 | Wolfgang Hilberg | Electronic system for the flexible recording, output, and storage of longer texts |
US7403888B1 (en) * | 1999-11-05 | 2008-07-22 | Microsoft Corporation | Language input user interface |
CN100595760C (zh) * | 2007-08-31 | 2010-03-24 | 北京搜狗科技发展有限公司 | Method and device for acquiring colloquial word entries, and input method system |
CN101556596B (zh) * | 2007-08-31 | 2012-04-18 | 北京搜狗科技发展有限公司 | Input method system and intelligent word-formation method |
CN101751386B (zh) * | 2009-12-28 | 2012-05-23 | 华建机器翻译有限公司 | Method for identifying unregistered words |
CN103020034A (zh) * | 2011-09-26 | 2013-04-03 | 北京大学 | Chinese word segmentation method and device |
CN103678684B (zh) * | 2013-12-25 | 2017-05-31 | 沈阳美行科技有限公司 | Chinese word segmentation method based on navigation information retrieval |
CN104156349B (zh) * | 2014-03-19 | 2017-08-15 | 邓柯 | System and method for unregistered-word discovery and word segmentation based on a statistical dictionary model |
CN103942190B (zh) * | 2014-04-16 | 2017-08-25 | 科大讯飞股份有限公司 | Text word segmentation method and system in speech synthesis |
2015
- 2015-02-12: CN application CN201510074982.3A (CN104714940A, zh), status: not active, Withdrawn
- 2015-03-07: WO application PCT/CN2015/073842 (WO2016127459A1, fr), status: active, Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN104714940A (zh) | 2015-06-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020009297A1 (fr) | | Apparatus and method for improving language understanding performance based on domain extraction |
WO2017143692A1 (fr) | | Smart television and voice control method thereof |
WO2016127459A1 (fr) | | Method and device for recognizing an unregistered word in an intelligent interaction system |
WO2017156893A1 (fr) | | Voice control method and smart television |
WO2018034426A1 (fr) | | Method for automatically correcting errors in a tagged corpus using kernel PDR rules |
WO2019177182A1 (fr) | | Multimedia content search apparatus and search method using attribute information analysis |
WO2019080406A1 (fr) | | Television voice interaction method, voice interaction control device, and storage medium |
WO2012134180A2 (fr) | | Emotion classification method for analyzing the emotions inherent in a sentence, and emotion classification method for multiple sentences using context information |
WO2017028601A1 (fr) | | Voice control method and device for a smart terminal, and television system |
WO2015131803A1 (fr) | | Application recommendation method and system |
WO2013170662A1 (fr) | | Method and device for adding friend information, and computer storage medium |
WO2016167424A1 (fr) | | Automatic response recommendation device, and automatic sentence completion system and method |
WO2019242090A1 (fr) | | Intelligent customer service response method, device, and apparatus, and storage medium |
WO2013139239A1 (fr) | | Method for recommending users in a social network and associated system |
WO2017197802A1 (fr) | | Method and apparatus for fuzzy matching of character strings |
WO2018023926A1 (fr) | | Interaction method and system for a television and a mobile terminal |
WO2015144089A1 (fr) | | Application recommendation method and apparatus |
WO2020224247A1 (fr) | | Blockchain-based data provenance method, apparatus, and device, and readable storage medium |
WO2019169814A1 (fr) | | Method, apparatus, and device for automatically generating Chinese annotations, and storage medium |
WO2019051902A1 (fr) | | Terminal control method, air conditioner, and computer-readable storage medium |
WO2012130145A1 (fr) | | Method and device for acquiring and searching relevant knowledge information |
WO2016032021A1 (fr) | | Apparatus and method for recognizing voice commands |
WO2019218527A1 (fr) | | Multi-system combined natural language processing method and apparatus |
WO2019085543A1 (fr) | | Television system and television control method |
WO2019062112A1 (fr) | | Method and device for controlling an air conditioning apparatus, air conditioning apparatus, and computer-readable medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 15881621; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 15881621; Country of ref document: EP; Kind code of ref document: A1 |