WO2016127459A1 - Method and device for recognizing an unregistered word in an intelligent interaction system - Google Patents

Method and device for recognizing an unregistered word in an intelligent interaction system

Info

Publication number
WO2016127459A1
WO2016127459A1 PCT/CN2015/073842 CN2015073842W WO2016127459A1 WO 2016127459 A1 WO2016127459 A1 WO 2016127459A1 CN 2015073842 W CN2015073842 W CN 2015073842W WO 2016127459 A1 WO2016127459 A1 WO 2016127459A1
Authority
WO
WIPO (PCT)
Prior art keywords
word
user
dictionary
input
unregistered
Prior art date
Application number
PCT/CN2015/073842
Other languages
English (en)
Chinese (zh)
Inventor
张贯京
陈兴明
葛新科
张少鹏
方静芳
高伟明
梁艳妮
周荣
梁昊原
周亮
Original Assignee
深圳市前海安测信息技术有限公司
深圳市易特科信息技术有限公司
深圳市贝沃德克生物技术研究院有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市前海安测信息技术有限公司, 深圳市易特科信息技术有限公司, 深圳市贝沃德克生物技术研究院有限公司
Publication of WO2016127459A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis

Definitions

  • The invention relates to the technical field of computer science, and in particular to a method and a device for identifying unregistered words in an intelligent interactive system.
  • A sentence must be segmented first, but existing word segmentation performs poorly when a sentence contains unregistered words. This in turn degrades the subsequent sentence-similarity calculation and reduces the intelligence of the intelligent interactive system.
  • The effect of word segmentation depends on the word segmentation algorithm and the word segmentation dictionary.
  • Word segmentation algorithms already achieve good results and are difficult to improve much further, so the completeness of the word segmentation dictionary directly affects the segmentation effect. A word that the word segmentation dictionary does not contain is an unregistered word, and it is difficult to segment correctly.
  • As with a search engine, some users of the intelligent interactive system deliberately perform keyword queries, that is, queries whose terms are separated by special characters such as spaces.
  • The main object of the present invention is to provide a method for identifying unregistered words in an intelligent interactive system that enriches the user dictionary, so that when sentences input by the user are segmented based on the user dictionary, the word segmentation effect and the intelligence level of the intelligent interactive system are improved.
  • The present invention provides a method for identifying an unregistered word in an intelligent interactive system, which includes the following steps:
  • S60: determining whether the word input by the user is a word in a network entry; if so, adding the word to the user dictionary as an unregistered word and deleting it from the user input word dictionary; otherwise, ignoring the word input by the user.
  • the method for identifying an unregistered word in the intelligent interaction system further includes the following steps:
  • S90 Establish a user dictionary in which commonly used words of a user-specific application domain are stored.
  • the step S10 includes:
  • The present invention also provides a device for identifying an unregistered word in an intelligent interactive system, which includes:
  • a first-level identification module configured to determine whether the length of the word input by the user is equal to 1 or greater than 4, and if so, to ignore the word;
  • a second-level identification module configured to determine, when the length of the word is greater than 1 and less than or equal to 4, whether the word already exists in a preset word segmentation dictionary or in a user dictionary, and if so, to ignore the word;
  • a third-level identification module configured to determine, when the word does not exist in the word segmentation dictionary or the user dictionary, whether the word is contained in a word of the word segmentation dictionary or of the user dictionary, and if so, to ignore the word;
  • a user input word dictionary update module configured to add the word, as a possible unregistered word, to the user input word dictionary when the word is not contained in any word of the word segmentation dictionary or the user dictionary;
  • a fourth-level identification module configured to add the word to the user dictionary as an unregistered word and delete it from the user input word dictionary when the word is a word in a network entry, and otherwise to ignore the word.
  • the obtaining module is specifically configured to:
  • the device for identifying the unregistered word in the intelligent interaction system further includes:
  • a user input word dictionary word-frequency statistics module configured to count the word frequency of each word in the user input word dictionary;
  • a user dictionary update module configured to add a word to the user dictionary as an unregistered word, and delete it from the user input word dictionary, if the word frequency of that word in the user input word dictionary is greater than a preset value.
  • the device for identifying the unregistered word in the intelligent interaction system further includes:
  • a user dictionary building module is configured to establish a user dictionary in which commonly used words of the user-specific application domain and the unregistered words are stored.
  • the device for identifying the unregistered word in the intelligent interaction system further includes:
  • a user input word dictionary building module configured to establish the user input word dictionary and to store possible unregistered words in it.
  • The technical solution of the present invention recognizes, level by level, whether the length of the word input by the user is equal to 1 or greater than 4, whether the word already exists in the preset word segmentation dictionary or the user dictionary, and whether the word is contained in a word of the word segmentation dictionary or the user dictionary. Possible unregistered words that pass these filters are recorded temporarily in the user input word dictionary. When such a word is further recognized as a word in a network entry, it is added to the user dictionary and deleted from the user input word dictionary.
  • By identifying the words input by the user level by level, the embodiment of the present invention adds unregistered words to the user dictionary and thereby enriches it. When a sentence input by the user must be segmented based on the user dictionary, the word segmentation effect and the intelligence level of the intelligent interactive system are improved.
  • FIG. 1 is a schematic flow chart of a first preferred embodiment of an unregistered word recognition method in an intelligent interactive system according to the present invention
  • FIG. 2 is a schematic flow chart of a second preferred embodiment of an unregistered word recognition method in the intelligent interactive system of the present invention
  • FIG. 3 is a schematic structural diagram of a first preferred embodiment of an unregistered word recognition apparatus in the intelligent interactive system of the present invention
  • FIG. 4 is a schematic structural diagram of a second preferred embodiment of an unregistered word recognition apparatus in the intelligent interactive system of the present invention.
  • Natural language processing is an important direction in the field of computer science and artificial intelligence.
  • Words are the smallest language unit. Chinese has no explicit delimiter between words, so Chinese word segmentation must be performed before automatic processing.
  • The large number of unregistered words has become a technical bottleneck that limits the effect of Chinese word segmentation.
  • Unregistered word recognition is the process of automatically detecting and identifying, in a corpus, words that do not appear in the dictionary. It is an important basic technology in natural language processing, with a wide range of applications in Chinese automatic word segmentation, dictionary compilation, information extraction, information search, and machine translation.
  • The main object of the present invention is to provide a method for identifying unregistered words in an intelligent interactive system that enriches the user dictionary, so that when sentences input by the user are segmented based on the user dictionary, the word segmentation effect and the intelligence level of the intelligent interactive system are improved.
  • the present invention provides a method for identifying an unregistered word in an intelligent interactive system.
  • FIG. 1 is a schematic flowchart diagram of a first preferred embodiment of an unregistered word recognition method in an intelligent interactive system according to the present invention.
  • The intelligent interaction system in the embodiment of the present invention includes a client and a server. The client obtains the content input by the user; the server processes that content and feeds back the results.
  • the method for identifying an unregistered word in the intelligent interactive system includes the following steps:
  • the embodiment of the present invention acquires a word input by a user through a client.
  • When the user inputs content at the input terminal, commonly used input methods such as Sogou Pinyin and Baidu Pinyin mostly have a memory function, so users are accustomed to entering a sentence word by word.
  • The word input by the user can be obtained by asynchronous transmission.
  • Asynchronous transmission means that each time the user inputs a word, that word is transmitted to the server of the intelligent interactive system as a word input by the user.
  • The sentence is also transmitted to the server as a whole. That is, both the individual words and the sentence entered by the user are transmitted asynchronously to the server.
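  • The word-then-sentence transmission described above can be sketched as follows. This is only an illustration under stated assumptions: an in-process `asyncio.Queue` stands in for the network link, and the names `client`, `server`, `demo`, and the message kinds `"word"`, `"sentence"`, and `"done"` are hypothetical and do not come from the patent.

```python
import asyncio
from typing import List, Tuple

Message = Tuple[str, str]  # (kind, payload); kinds are hypothetical labels


async def client(words: List[str], outbox: asyncio.Queue) -> None:
    """Send each word as soon as it is entered, then the whole sentence."""
    for word in words:
        await outbox.put(("word", word))
    await outbox.put(("sentence", "".join(words)))
    await outbox.put(("done", ""))  # sentinel so the server loop can stop


async def server(inbox: asyncio.Queue) -> List[Message]:
    """Collect messages in arrival order until the sentinel is seen."""
    received: List[Message] = []
    while True:
        kind, payload = await inbox.get()
        if kind == "done":
            return received
        received.append((kind, payload))


async def _run(words: List[str]) -> List[Message]:
    queue: asyncio.Queue = asyncio.Queue()  # stand-in for the network
    _, received = await asyncio.gather(client(words, queue), server(queue))
    return received


def demo(words: List[str]) -> List[Message]:
    return asyncio.run(_run(words))
```

  • In this sketch the server sees every word before it sees the complete sentence, so word-level recognition could already have updated the dictionaries by the time the sentence is segmented.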
  • Second-level recognition is performed on the word input by the user: it is determined whether the word already exists in the preset word segmentation dictionary or the user dictionary.
  • The preset word segmentation dictionary in the embodiment of the present invention is a Chinese word segmentation dictionary from the prior art. The user dictionary is a pre-established set of words specific to a certain application field; in the health management field, for example, such words include watching movies, diet, and physiotherapy.
  • The user dictionary described in the embodiment of the present invention may also start empty and be added to and enriched during subsequent user input.
  • A person skilled in the art may implement the lookup in various ways, for example by traversing the word segmentation dictionary entry by entry and comparing each entry with the word input by the user, or by building an index in advance and searching it for the word. The method is not limited here, as long as it can determine whether the word input by the user already exists in the preset word segmentation dictionary or the user dictionary. If the word already exists in either dictionary, it is ignored; otherwise, third-level recognition is performed.
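  • The two lookup strategies mentioned above, entry-by-entry traversal and a pre-built index, might look like this. The function names and sample words are hypothetical, and a Python `set` is used here as one possible kind of index; the patent does not prescribe a data structure.

```python
from typing import Iterable, List, Set


def exists_by_traversal(word: str, entries: List[str]) -> bool:
    """Compare the word with every dictionary entry in turn (linear scan)."""
    for entry in entries:
        if entry == word:
            return True
    return False


def build_index(*dictionaries: Iterable[str]) -> Set[str]:
    """Build a hash-based index over all entries of the given dictionaries."""
    index: Set[str] = set()
    for dictionary in dictionaries:
        index.update(dictionary)
    return index


def exists_by_index(word: str, index: Set[str]) -> bool:
    """Look the word up in the pre-built index (average constant time)."""
    return word in index
```

  • Both functions answer the same question; the index costs memory and build time up front but avoids scanning the whole dictionary for every word the user enters.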
  • When second-level recognition determines that the word input by the user does not exist in the preset word segmentation dictionary or the user dictionary, it is further determined whether the word is contained in a word of the word segmentation dictionary or the user dictionary.
  • Containment here means that the word input by the user is entirely included in a word of the word segmentation dictionary or the user dictionary.
  • For example, if the word entered by the user is “Hello” and a word of the word segmentation dictionary or user dictionary is “Hello”, then “Hello” is contained in “Hello”.
  • If the word input by the user is “you are beautiful” and the word of the word segmentation dictionary or the user dictionary is “hello”, the word input by the user is considered not to be contained in a word of the word segmentation dictionary or the user dictionary.
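  • A minimal sketch of the containment test, assuming that "entirely included" means the input word appears as a contiguous substring of a dictionary entry. The function name and the sample words are hypothetical and chosen only to contrast full containment with partial overlap.

```python
from typing import Iterable


def is_contained_in_known_word(word: str, *dictionaries: Iterable[str]) -> bool:
    """Return True if the whole word occurs inside some dictionary entry."""
    return any(word in entry for dictionary in dictionaries for entry in dictionary)
```

  • With a segmentation dictionary holding "科技园", the input "科技" is contained, whereas "科技馆" only shares characters with the entry and is therefore not contained.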
  • The word input by the user is added to the user input word dictionary as a possible unregistered word.
  • The user input word dictionary temporarily stores words that the user entered while inputting a sentence and that the level-by-level recognition above has flagged as possible unregistered words.
  • S60: determining whether the word input by the user is a word in a network entry; if so, adding the word to the user dictionary as an unregistered word and deleting it from the user input word dictionary; otherwise, ignoring the word input by the user.
  • After the word input by the user has been added to the user input word dictionary as a possible unregistered word, it is determined whether the word is a word in a network entry. The network entry preferably refers to an entry currently provided by Baidu Encyclopedia. Baidu Encyclopedia adheres to the spirit of equality, collaboration, sharing, and freedom. It advocates equality before the network: everyone works together to write an encyclopedia, so that knowledge can be continuously combined and expanded under certain technical rules and cultural contexts.
  • The entries of Baidu Encyclopedia include the most popular new words at present, so unregistered words can be identified to the greatest extent. If the word is in a network entry, it is added to the user dictionary as an unregistered word and deleted from the user input word dictionary; otherwise, the word is ignored.
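  • Step S60 can be sketched as below. The network-entry lookup is passed in as a callable because the patent does not specify how the encyclopedia is queried; `is_network_entry` and the local set used in the usage note are hypothetical stand-ins, not a real Baidu Encyclopedia API.

```python
from typing import Callable, Set


def apply_fourth_level(
    word: str,
    user_dict: Set[str],
    user_input_dict: Set[str],
    is_network_entry: Callable[[str], bool],  # hypothetical lookup
) -> bool:
    """Promote a candidate word if it is a network entry.

    Returns True when the word was moved into the user dictionary.
    Otherwise the word is left where it is.
    """
    if is_network_entry(word):
        user_dict.add(word)            # now a registered user-dictionary word
        user_input_dict.discard(word)  # no longer just a candidate
        return True
    return False
```

  • A caller could pass `lambda w: w in {"给力"}` as the lookup during testing and substitute a real encyclopedia query later.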
  • Steps S10 to S60 are sequentially used to identify all words input by the user.
  • Based on the word segmentation dictionary and the user dictionary, the server then uses existing word segmentation, similarity calculation, and matching algorithms to match the content to be returned from the existing database.
  • The embodiment of the present invention recognizes, level by level, whether the length of the word input by the user is equal to 1 or greater than 4, whether the word already exists in the preset word segmentation dictionary or the user dictionary, and whether the word is contained in a word of the word segmentation dictionary or the user dictionary.
  • Possible unregistered words that pass these filters are recorded temporarily in the user input word dictionary. When such a word is further recognized as a word in a network entry, it is added to the user dictionary and deleted from the user input word dictionary.
  • By identifying the words input by the user level by level, the embodiment of the present invention adds unregistered words to the user dictionary and thereby enriches it. When a sentence input by the user must be segmented based on the user dictionary, the word segmentation effect and the intelligence level of the intelligent interactive system are improved.
  • the step S10 includes:
  • The client obtains the changed content of the text box as the user inputs content.
  • Users habitually enter a sentence one word at a time.
  • For example, a user may enter "Excuse me" (or "Please" and "ask" separately), "Technology Park", "how", and "go" in turn. The client obtains each change in the content of the text box; for example, it first obtains "Excuse me" and takes it as the word input by the user. "Excuse me" is then put through unregistered-word recognition according to the flow of the first preferred embodiment of the unregistered word recognition method in the intelligent interactive system of the present invention.
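  • One way to derive the newly entered word from successive text box contents is sketched below. It assumes the user only appends text; edits and deletions are simply skipped. The function names are hypothetical, and the sample sentence "请问科技园怎么走" is an assumed Chinese rendering of the example above.

```python
from typing import List, Optional


def extract_new_word(previous: str, current: str) -> Optional[str]:
    """Return the text appended since the previous snapshot, if any."""
    if len(current) > len(previous) and current.startswith(previous):
        return current[len(previous):]
    return None  # deletion or in-place edit: nothing to report


def words_from_snapshots(snapshots: List[str]) -> List[str]:
    """Turn a sequence of text box contents into the words entered."""
    words: List[str] = []
    for previous, current in zip(snapshots, snapshots[1:]):
        word = extract_new_word(previous, current)
        if word is not None:
            words.append(word)
    return words
```

  • Each extracted word would then be sent to the server as a word input by the user.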
  • Once the user has finished the sentence, the server uses the preset word segmentation dictionary and the updated user dictionary, together with existing word segmentation, similarity calculation, and matching algorithms, to match the content to be returned from the preset database.
  • Because the user dictionary has already been updated by the time the user finishes the sentence, segmentation can draw on both the existing word segmentation dictionary and the updated user dictionary, which further improves the word segmentation effect and the intelligence level of the intelligent interactive system.
  • FIG. 2 is a schematic flowchart diagram of a second preferred embodiment of an unregistered word recognition method in an intelligent interactive system according to the present invention.
  • The method for identifying an unregistered word in the intelligent interactive system further includes the following steps:
  • The user input word dictionary temporarily stores possible unregistered words that were recognized level by level through the above steps while the user was inputting a sentence.
  • The word frequency is the frequency with which a word appears in the user input word dictionary. These are words that the user inputs often but that do not exist in a network entry.
  • The word frequency of each word in the user input word dictionary is counted. Words whose frequency is greater than the preset value (possibly common words that are not included in the preset word segmentation dictionary or the user dictionary) are added to the user dictionary to further enrich it and are deleted from the user input word dictionary.
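  • The frequency-based promotion can be sketched with a counter. Here the user input word dictionary is modeled as a `collections.Counter` mapping each candidate word to how often it was entered; that representation, the function name, and the threshold value in the usage note are assumptions, since the patent only speaks of a "preset value".

```python
from collections import Counter
from typing import List, Set


def promote_frequent_words(
    user_input_counts: Counter,
    user_dict: Set[str],
    preset_value: int,
) -> List[str]:
    """Move candidates whose frequency exceeds the preset value.

    Promoted words are added to the user dictionary and removed from
    the candidate counts. Returns the promoted words in sorted order.
    """
    promoted = sorted(w for w, n in user_input_counts.items() if n > preset_value)
    for word in promoted:
        user_dict.add(word)
        del user_input_counts[word]
    return promoted
```

  • For instance, with counts of 5 for "囧途" and 2 for "尬聊" and a preset value of 3, only "囧途" is promoted; "尬聊" remains a candidate.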
  • the method for identifying an unregistered word in the intelligent interaction system further includes the following steps:
  • S90 Establish a user dictionary in which commonly used words of a user-specific application domain are stored.
  • the user dictionary in the embodiment of the present invention is a set of words unique to the field that are pre-established in a certain application field, such as a health management application field, such as watching movies, diet therapy, physical therapy, and the like. After pre-establishment, the user dictionary can be added and enriched during subsequent user input.
  • the present invention also provides an apparatus for identifying an unregistered word in an intelligent interactive system.
  • FIG. 3 is a schematic structural diagram of a first preferred embodiment of an unregistered word recognition apparatus in an intelligent interactive system according to the present invention.
  • the device for identifying an unregistered word in the intelligent interaction system includes:
  • An obtaining module 10 configured to acquire a word input by a user
  • the obtaining module 10 acquires a word input by a user through a client.
  • When the user inputs content at the input terminal, commonly used input methods such as Sogou Pinyin and Baidu Pinyin mostly have a memory function, so users are accustomed to entering a sentence word by word.
  • The word input by the user can be obtained by asynchronous transmission.
  • Asynchronous transmission means that each time the user inputs a word, that word is transmitted to the server of the intelligent interactive system as a word input by the user.
  • The sentence is also transmitted to the server as a whole. That is, the words entered by the user are transmitted asynchronously to the server.
  • a first-level identification module 20 configured to determine whether the length of the word input by the user is equal to 1 or greater than 4, and if so, to ignore the word;
  • The first-level identification module 20 first performs first-level recognition: it calculates the length of the word input by the user and determines whether that length is equal to 1 or greater than 4, that is, whether the word is a single character or has more than 4 characters. If so, the word is ignored, so single characters and words longer than 4 characters are filtered out; otherwise, second-level recognition is performed.
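  • The first-level length filter is small enough to show directly. This sketch assumes length is measured in characters, which for typical Chinese text matches what Python's `len` returns for a `str`; the function names and sample words are hypothetical.

```python
from typing import List


def passes_length_filter(word: str) -> bool:
    """Keep only words whose length is greater than 1 and at most 4."""
    return 1 < len(word) <= 4


def filter_by_length(words: List[str]) -> List[str]:
    """Drop single characters and words longer than 4 characters."""
    return [word for word in words if passes_length_filter(word)]
```

  • Words that pass this filter go on to second-level recognition; everything else is ignored.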
  • a second-level identification module 30 configured to determine, when the length of the word input by the user is greater than 1 and less than or equal to 4, whether the word already exists in the preset word segmentation dictionary or the user dictionary, and if so, to ignore the word;
  • The second-level identification module 30 performs second-level recognition on the word input by the user: it determines whether the word already exists in the preset word segmentation dictionary or the user dictionary.
  • The preset word segmentation dictionary in the embodiment of the present invention is a Chinese word segmentation dictionary from the prior art. The user dictionary is a pre-established set of words specific to a certain application field; in the health management field, for example, such words include watching movies, diet, and physiotherapy.
  • The user dictionary described in the embodiment of the present invention may also start empty and be added to and enriched during subsequent user input.
  • A person skilled in the art may implement the lookup in various ways, for example by traversing the word segmentation dictionary entry by entry and comparing each entry with the word input by the user, or by building an index in advance and searching it for the word. The method is not limited here, as long as it can determine whether the word input by the user already exists in the preset word segmentation dictionary or the user dictionary. If the word already exists in either dictionary, it is ignored; otherwise, third-level recognition is performed.
  • a third-level identification module 40 configured to determine, when the word input by the user does not exist in the word segmentation dictionary or the user dictionary, whether the word is contained in a word of the word segmentation dictionary or the user dictionary, and if so, to ignore the word;
  • The third-level identification module 40 further determines whether the word input by the user is contained in a word of the word segmentation dictionary or the user dictionary. Containment here means that the word input by the user is entirely included in a word of the word segmentation dictionary or the user dictionary.
  • For example, if the word entered by the user is “Hello” and a word of the word segmentation dictionary or user dictionary is “Hello”, then “Hello” is contained in “Hello”.
  • If the word input by the user is “you are beautiful” and the word of the word segmentation dictionary or the user dictionary is “hello”, the word input by the user is considered not to be contained in a word of the word segmentation dictionary or the user dictionary.
  • a user input word dictionary update module 50 configured to add the word input by the user to the user input word dictionary as a possible unregistered word when the third-level identification module 40 determines that the word is not contained in any word of the word segmentation dictionary or the user dictionary;
  • The user input word dictionary update module 50 adds the word input by the user to the user input word dictionary as a possible unregistered word.
  • The user input word dictionary temporarily stores words that the user entered while inputting a sentence and that the level-by-level recognition above has flagged as possible unregistered words.
  • a fourth-level identification module 60 configured to add the word input by the user to the user dictionary as an unregistered word and delete it from the user input word dictionary when the word is a word in a network entry, and otherwise to ignore the word.
  • After the user input word dictionary update module 50 has added the word input by the user to the user input word dictionary as a possible unregistered word, the fourth-level identification module 60 determines whether the word is a word in a network entry. The network entry preferably refers to an entry currently provided by Baidu Encyclopedia.
  • Baidu Encyclopedia adheres to the spirit of equality, collaboration, sharing, and freedom. It advocates equality before the network: everyone works together to write an encyclopedia, so that knowledge can be continuously combined and expanded under certain technical rules and cultural contexts.
  • The entries of Baidu Encyclopedia include the most popular new words at present, so unregistered words can be identified to the greatest extent. If the word is in a network entry, it is added to the user dictionary and deleted from the user input word dictionary; otherwise, the word is ignored.
  • Based on the word segmentation dictionary and the user dictionary, the server uses existing word segmentation, similarity calculation, and matching algorithms to match the content to be returned from the preset database. Because unregistered words have been added to the user dictionary, the word segmentation effect and the intelligence level of the intelligent interactive system are improved when a sentence input by the user is segmented based on the user dictionary.
  • The embodiment of the present invention recognizes, level by level, whether the length of the word input by the user is equal to 1 or greater than 4, whether the word already exists in the preset word segmentation dictionary or the user dictionary, and whether the word is contained in a word of the word segmentation dictionary or the user dictionary.
  • Possible unregistered words that pass these filters are recorded temporarily in the user input word dictionary. When such a word is further recognized as a word in a network entry, it is added to the user dictionary and deleted from the user input word dictionary.
  • By identifying the words input by the user level by level, the embodiment of the present invention adds unregistered words to the user dictionary and thereby enriches it. When a sentence input by the user must be segmented based on the user dictionary, the word segmentation effect and the intelligence level of the intelligent interactive system are improved.
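  • As a rough illustration of how the modules of the first preferred embodiment could be wired together, the sketch below chains the four levels in one class. The class name, attribute names, return labels, and the injected `is_network_entry` callable are all hypothetical; the patent describes the modules functionally and does not prescribe this structure.

```python
from typing import Callable, Set


class UnregisteredWordRecognizer:
    """Chains the four identification levels for one input word."""

    def __init__(
        self,
        segmentation_dict: Set[str],
        user_dict: Set[str],
        is_network_entry: Callable[[str], bool],  # hypothetical lookup
    ) -> None:
        self.segmentation_dict = segmentation_dict
        self.user_dict = user_dict
        self.user_input_dict: Set[str] = set()  # temporary candidates
        self.is_network_entry = is_network_entry

    def process(self, word: str) -> str:
        known = self.segmentation_dict | self.user_dict
        # Level 1: ignore single characters and words longer than 4.
        if len(word) == 1 or len(word) > 4:
            return "ignored: length"
        # Level 2: ignore words already present in either dictionary.
        if word in known:
            return "ignored: already known"
        # Level 3: ignore words wholly contained in a known word.
        if any(word in entry for entry in known):
            return "ignored: contained in a known word"
        # Record the word as a possible unregistered word.
        self.user_input_dict.add(word)
        # Level 4: promote the word if it is a network entry.
        if self.is_network_entry(word):
            self.user_dict.add(word)
            self.user_input_dict.discard(word)
            return "added to user dictionary"
        return "kept as candidate"
```

  • A word that fails the fourth level stays in the user input word dictionary in this sketch, which leaves it available for the word-frequency check of the second preferred embodiment.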
  • the acquiring module is specifically configured to:
  • obtain the changed content of the text box as the user inputs content.
  • Users habitually enter a sentence one word at a time.
  • For example, a user may enter "Excuse me" (or "Please" and "ask" separately), "Technology Park", "how", and "go" in turn. The client obtains each change in the content of the text box; for example, it first obtains "Excuse me" and takes it as the word input by the user. "Excuse me" is then put through unregistered-word recognition according to the flow of the first preferred embodiment of the unregistered word recognition method in the intelligent interactive system of the present invention.
  • Once the user has finished the sentence, the server uses the preset word segmentation dictionary and the updated user dictionary, together with existing word segmentation, similarity calculation, and matching algorithms, to match the content to be returned from the preset database.
  • Because the user dictionary has already been updated by the time the user finishes the sentence, segmentation can draw on both the existing word segmentation dictionary and the updated user dictionary, which further improves the word segmentation effect and the intelligence level of the intelligent interactive system.
  • FIG. 4 is a schematic structural diagram of a second preferred embodiment of an unregistered word recognition apparatus in an intelligent interactive system according to the present invention.
  • the device for identifying the unregistered word in the intelligent interactive system further includes:
  • a user input word dictionary word-frequency statistics module 70 configured to count the word frequency of each word in the user input word dictionary;
  • a user dictionary update module 80 configured to add a word to the user dictionary as an unregistered word, and delete it from the user input word dictionary, if the word frequency of that word in the user input word dictionary is greater than a preset value.
  • The user input word dictionary temporarily stores words that the user entered while inputting a sentence and that the level-by-level recognition above has flagged as possible unregistered words. These are words that the user inputs often but that do not exist in a network entry.
  • The word frequency of each word in the user input word dictionary is counted. Words whose frequency is greater than the preset value (possibly common words that are not included in the preset word segmentation dictionary or the user dictionary) are added to the user dictionary to further enrich it and are deleted from the user input word dictionary.
  • the device for identifying an unregistered word in the intelligent interactive system further includes:
  • a user dictionary building module, configured to establish a user dictionary that stores the commonly used words of a specific application domain of the user.
  • the user dictionary in the embodiment of the present invention is a pre-established set of words specific to a certain application field. In the health management application field, for example, it may contain words such as watching movies, diet therapy, physical therapy, and the like. After it has been established, the user dictionary can be extended and enriched during subsequent user input.
  • the device for identifying an unregistered word in the intelligent interactive system further includes:
  • a user input word dictionary building module, configured to establish the user input word dictionary, which stores the possible unregistered words that the user inputs when entering sentences.
  • the main stored content is described under the user input word dictionary update module.
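
The frequency-based update performed by modules 70 and 80 can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions, not the patented implementation: the function names `record_candidate` and `promote_frequent_words`, the use of a `Counter` as the user input word dictionary, and the use of a `set` as the user dictionary are all hypothetical choices made here for clarity.

```python
from collections import Counter


def record_candidate(input_word_counts: Counter, word: str) -> None:
    """Record one more occurrence of a possible unregistered word.

    `input_word_counts` stands in for the user input word dictionary
    (hypothetical representation: word -> word frequency).
    """
    input_word_counts[word] += 1


def promote_frequent_words(input_word_counts: Counter,
                           user_dictionary: set,
                           preset_value: int) -> list:
    """Move words whose frequency exceeds `preset_value` into the user dictionary.

    Each promoted word is deleted from the user input word dictionary,
    mirroring what the user dictionary update module is described as doing.
    """
    # Collect first, so the Counter is not modified while it is iterated.
    promoted = [w for w, count in input_word_counts.items() if count > preset_value]
    for w in promoted:
        user_dictionary.add(w)
        del input_word_counts[w]
    return promoted
```

The comparison is strict (`>`), following the wording "greater than a preset value"; a word whose frequency merely equals the preset value stays in the temporary dictionary.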

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a method for recognizing an unregistered word in an intelligent interactive system. The method comprises: progressively recognizing, level by level, whether the length of a word input by a user is equal to 1 or greater than 4, whether the word is an existing word in a preset word segmentation dictionary or a user dictionary, and whether the word is contained in a word of the word segmentation dictionary or the user dictionary, so as to filter out a possible unregistered word, add it to a user input word dictionary, and create a temporary record; and, when the word input by the user is also recognized as a word in a network entry, adding the word to the user dictionary and simultaneously deleting it from the user input word dictionary. By recognizing the words input by the user progressively, level by level, possible unregistered words are added to the user dictionary so as to enrich it; when the sentence input by the user is segmented according to the user dictionary, the word segmentation effect can be improved, and the intelligence level of the intelligent interactive system can be raised.
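
The level-by-level screening summarized in the abstract can be sketched in Python as follows. This is a hedged illustration, not the claimed method itself. It assumes that a word is rejected as a candidate when its length is 1 or greater than 4, when it already exists in either dictionary, or when it is contained in an existing dictionary word; the abstract lists these checks without spelling out each outcome. The names `is_possible_unregistered` and `process_word`, the plain `set` used for `network_entries`, and the returned status strings are all hypothetical.

```python
def is_possible_unregistered(word: str,
                             segmentation_dict: set,
                             user_dict: set) -> bool:
    """Apply the three screening levels in order (assumed rejection rules)."""
    # Level 1: length equal to 1 or greater than 4.
    if len(word) == 1 or len(word) > 4:
        return False
    known = segmentation_dict | user_dict
    # Level 2: the word already exists in one of the dictionaries.
    if word in known:
        return False
    # Level 3: the word is contained in some dictionary word.
    if any(word in entry for entry in known):
        return False
    return True


def process_word(word: str,
                 segmentation_dict: set,
                 user_dict: set,
                 input_word_dict: dict,
                 network_entries: set) -> str:
    """Screen one input word and update the dictionaries accordingly."""
    if not is_possible_unregistered(word, segmentation_dict, user_dict):
        return "rejected"
    # Temporary record in the user input word dictionary (word -> frequency).
    input_word_dict[word] = input_word_dict.get(word, 0) + 1
    # `network_entries` is a stand-in for a lookup of network entries.
    if word in network_entries:
        user_dict.add(word)
        del input_word_dict[word]
        return "added_to_user_dictionary"
    return "kept_as_candidate"
```

A word that passes the screening but is not found among the network entries simply stays in the temporary dictionary with its count incremented, where a later frequency check can still promote it.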
PCT/CN2015/073842 2015-02-12 2015-03-07 Method and device for recognizing an unregistered word in an intelligent interactive system WO2016127459A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510074982.3 2015-02-12
CN201510074982.3A CN104714940A (zh) 2015-02-12 2015-02-12 智能交互系统中未登录词的识别方法和装置

Publications (1)

Publication Number Publication Date
WO2016127459A1 (fr) 2016-08-18

Family

ID=53414286

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/073842 WO2016127459A1 (fr) 2015-02-12 2015-03-07 Method and device for recognizing an unregistered word in an intelligent interactive system

Country Status (2)

Country Link
CN (1) CN104714940A (fr)
WO (1) WO2016127459A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113010665A (zh) * 2019-12-20 2021-06-22 北京搜狗科技发展有限公司 一种词处理的方法及相关装置
CN113111655A (zh) * 2021-05-12 2021-07-13 数库(上海)科技有限公司 分离词典的构建方法、基于分离词典的分词方法及设备
CN115221872A (zh) * 2021-07-30 2022-10-21 苏州七星天专利运营管理有限责任公司 一种基于近义扩展的词汇扩展方法和系统

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108877939A (zh) * 2018-05-10 2018-11-23 重庆大学 一种具有智能特征提取功能的健康管理系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1629836A (zh) * 2003-12-17 2005-06-22 北京大学 学习中文新词的方法与装置
CN1912872A (zh) * 2006-07-25 2007-02-14 北京搜狗科技发展有限公司 一种提取新词的方法和系统
CN101079027A (zh) * 2007-06-27 2007-11-28 腾讯科技(深圳)有限公司 一种中文分词方法及系统
CN101118556A (zh) * 2007-09-17 2008-02-06 中国科学院计算技术研究所 一种短文本的新词发现方法和系统
CN101539940A (zh) * 2009-05-04 2009-09-23 清华大学 获取新词的方法和装置

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE19830225A1 (de) * 1998-07-07 2000-01-13 Wolfgang Hilberg Elektronisches System für die flexible Aufnahme, Abgabe und Speicherung längerer Texte
US7403888B1 (en) * 1999-11-05 2008-07-22 Microsoft Corporation Language input user interface
CN100595760C (zh) * 2007-08-31 2010-03-24 北京搜狗科技发展有限公司 一种获取口语词条的方法、装置以及一种输入法系统
CN101556596B (zh) * 2007-08-31 2012-04-18 北京搜狗科技发展有限公司 一种输入法系统及智能组词的方法
CN101751386B (zh) * 2009-12-28 2012-05-23 华建机器翻译有限公司 一种未登录词的识别方法
CN103020034A (zh) * 2011-09-26 2013-04-03 北京大学 中文分词方法和装置
CN103678684B (zh) * 2013-12-25 2017-05-31 沈阳美行科技有限公司 一种基于导航信息检索的中文分词方法
CN104156349B (zh) * 2014-03-19 2017-08-15 邓柯 基于统计词典模型的未登录词发现和分词系统及方法
CN103942190B (zh) * 2014-04-16 2017-08-25 科大讯飞股份有限公司 语音合成中文本分词方法及系统

Also Published As

Publication number Publication date
CN104714940A (zh) 2015-06-17

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15881621

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15881621

Country of ref document: EP

Kind code of ref document: A1