WO2016127459A1

WO2016127459A1 - Method and device for recognizing unlogged word in intelligent interaction system

Info

Publication number: WO2016127459A1
Application number: PCT/CN2015/073842
Authority: WO
Inventors: 张贯京; 陈兴明; 葛新科; 张少鹏; 方静芳; 高伟明; 梁艳妮; 周荣; 梁昊原; 周亮
Original assignee: 深圳市前海安测信息技术有限公司; 深圳市易特科信息技术有限公司; 深圳市贝沃德克生物技术研究院有限公司
Priority date: 2015-02-12
Filing date: 2015-03-07
Publication date: 2016-08-18
Also published as: CN104714940A

Abstract

A method for recognizing an unlogged word in an intelligent interaction system. The method comprises: screening possible unlogged words by recognizing, level by level, whether the length of a word input by a user is equal to 1 or greater than 4, whether the word is an existing word in a preset word segmentation dictionary or a user dictionary, and whether the word is contained in a word of the word segmentation dictionary or the user dictionary; adding each possible unlogged word to a user input word dictionary as a temporary record; and, when the word input by the user is further recognized as a word in a network entry, adding it to the user dictionary and simultaneously deleting it from the user input word dictionary. Because possible unlogged words are added to the user dictionary through this level-by-level recognition, the user dictionary is enriched; when a sentence input by the user is segmented on the basis of the user dictionary, both the word segmentation effect and the intelligence level of the intelligent interaction system can be improved.

Description

Method and device for identifying unregistered words in an intelligent interactive system

Technical field

The invention relates to the technical field of computer science, and in particular to a method and a device for identifying unregistered words in an intelligent interactive system.

Background art

In an intelligent interactive system, whether indexing questions or calculating the similarity between a user's question and the question-and-answer library, the sentence must first be segmented into words. However, because some sentences contain unregistered words, the results of existing word segmentation are not ideal, which in turn degrades the subsequent calculation of sentence similarity and lowers the intelligence of the intelligent interactive system.

In the prior art, the effect of word segmentation depends on the segmentation algorithm and the word segmentation dictionary. Segmentation algorithms already achieve good results and are difficult to improve substantially, so the completeness of the word segmentation dictionary directly determines the segmentation effect. If the dictionary does not contain a word, that word is an unregistered word and is difficult to segment correctly.

When using a search engine, some users consciously perform keyword queries, that is, they separate query terms with special characters such as spaces, "|", or quotation marks. The search engine can then identify new words from users' query records and expand its user dictionary, enabling faster and more accurate queries. In a question-and-answer system, however, users are accustomed to querying with continuous sentences, so the same method cannot be used to identify unregistered words.

Based on this, it is necessary to provide a method and a device for recognizing unregistered words in an intelligent interactive system that enrich the user dictionary, so that when the input of the user is segmented based on the user dictionary, the segmentation effect and the intelligence level of the intelligent interactive system are improved.

Summary of the invention

The main object of the present invention is to provide a method for recognizing unregistered words in an intelligent interactive system that enriches the user dictionary, so that when sentences input by the user are segmented based on the user dictionary, the segmentation effect and the intelligence level of the intelligent interactive system are improved.

To achieve the above object, the present invention provides a method for identifying unregistered words in an intelligent interactive system, comprising the following steps:

S10: Obtain a word input by a user;

S20: determining whether the length of the word input by the user is equal to 1 or greater than 4; if so, ignoring the word, otherwise executing S30;

S30: determining whether the word input by the user is an existing word in a preset word segmentation dictionary or in the user dictionary; if so, ignoring the word, otherwise executing S40;

S40: determining whether the word input by the user is contained in a word of the word segmentation dictionary or the user dictionary; if so, ignoring the word, otherwise executing S50;

S50: adding the word input by the user to the user input word dictionary as a possible unregistered word;

S60: determining whether the word input by the user is a word in a network entry; if so, adding the word to the user dictionary as an unregistered word and deleting it from the user input word dictionary, otherwise ignoring the word.
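Steps S20 to S60 above can be sketched as a single Python function. This is a minimal illustration under stated assumptions, not the claimed implementation: the dictionaries are modeled as in-memory sets, the network-entry check is a hypothetical callable `in_network_entry` supplied by the caller (the text does not specify how entries are queried), and the function name `recognize_word`, its return labels, and the sample Chinese words are illustrative.

```python
from itertools import chain
from typing import Callable


def recognize_word(
    word: str,
    seg_dict: set[str],
    user_dict: set[str],
    input_dict: set[str],
    in_network_entry: Callable[[str], bool],
) -> str:
    """Run one user-input word through S20-S60 and report where it stopped.

    `in_network_entry` is a hypothetical stand-in for the network-entry check.
    """
    # S20: ignore words of length 1 or greater than 4 (empty strings are skipped too)
    if not 2 <= len(word) <= 4:
        return "S20: ignored (length)"
    # S30: ignore words already present in either dictionary
    if word in seg_dict or word in user_dict:
        return "S30: ignored (existing word)"
    # S40: ignore words wholly contained in some dictionary word
    if any(word in entry for entry in chain(seg_dict, user_dict)):
        return "S40: ignored (contained)"
    # S50: record as a possible unregistered word
    input_dict.add(word)
    # S60: confirm against network entries
    if in_network_entry(word):
        user_dict.add(word)
        input_dict.discard(word)
        return "S60: added to user dictionary"
    return "S60: kept in user input word dictionary"


if __name__ == "__main__":
    seg_dict = {"你好吗", "科技"}   # illustrative dictionary contents
    user_dict: set[str] = set()
    input_dict: set[str] = set()
    fake_entries = {"科技园"}       # stands in for real network entries
    for w in ["你", "你好吗", "你好", "科技园", "怎么走"]:
        print(w, recognize_word(w, seg_dict, user_dict, input_dict, fake_entries.__contains__))
    print(user_dict, input_dict)
```

Sets give average constant-time membership tests for S30; the S40 containment scan is linear in the size of the dictionaries and is kept naive here for readability.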

Preferably, the method for identifying an unregistered word in the intelligent interaction system further includes the following steps:

S90: establishing a user dictionary in which words commonly used in the user's specific application domain are stored;

S100: establishing a user input word dictionary in which possible unregistered words are stored.

Preferably, the step S10 includes:

S11: obtaining the changed content of the text box as the user types;

S12: using the changed content of the text box as the word input by the user.

Preferably, the method further includes the following steps:

S70: counting the word frequency of each word in the user input word dictionary;

S80: if the word frequency of a word in the user input word dictionary is greater than a preset value, adding the word to the user dictionary as an unregistered word and deleting it from the user input word dictionary.

In addition, the present invention also provides an apparatus for identifying unregistered words in an intelligent interactive system, the apparatus comprising:

an obtaining module, configured to obtain a word input by a user;

a first-level identification module, configured to determine whether the length of the word input by the user is equal to 1 or greater than 4, and if so, to ignore the word;

a second-level identification module, configured to determine, when the length of the word input by the user is greater than 1 and less than or equal to 4, whether the word is an existing word in a preset word segmentation dictionary or in a user dictionary, and if so, to ignore the word;

a third-level identification module, configured to determine, when the word input by the user is not a word in the word segmentation dictionary or the user dictionary, whether the word is contained in a word of the word segmentation dictionary or the user dictionary, and if so, to ignore the word;

a user input word dictionary update module, configured to add the word input by the user to the user input word dictionary as a possible unregistered word when the word is not contained in any word of the word segmentation dictionary or the user dictionary;

a fourth-level identification module, configured to add the word input by the user to the user dictionary as an unregistered word and delete it from the user input word dictionary when the word is a word in a network entry, and otherwise to ignore the word.

Preferably, the obtaining module is specifically configured to:

obtain the changed content of the text box as the user types, and use that changed content as the word input by the user.

Preferably, the device for identifying the unregistered word in the intelligent interaction system further includes:

a user input word dictionary word frequency statistics module, configured to count the word frequency of each word in the user input word dictionary;

a user dictionary update module, configured to add a word to the user dictionary as an unregistered word if its word frequency in the user input word dictionary is greater than a preset value, and to delete the word from the user input word dictionary;

a user dictionary building module, configured to establish a user dictionary in which words commonly used in the user's specific application domain, together with the unregistered words, are stored;

a user input word dictionary building module, configured to establish a user input word dictionary in which possible unregistered words are stored.

With the above technical solution, the present invention screens possible unregistered words by recognizing whether the length of a word input by the user is equal to 1 or greater than 4, whether the word is an existing word in a preset word segmentation dictionary or the user dictionary, and whether it is contained in a word of the word segmentation dictionary or the user dictionary. The possible unregistered words are recorded temporarily in the user input word dictionary, and when a word input by the user is further recognized as a word in a network entry, it is added to the user dictionary and simultaneously deleted from the user input word dictionary. By identifying the words input by the user level by level, the embodiments of the present invention add unregistered words to the user dictionary and thereby enrich it. When sentences input by the user are segmented based on the user dictionary, the segmentation effect is improved, which in turn improves the intelligence level of the intelligent interactive system.

DRAWINGS

FIG. 1 is a schematic flowchart of a first preferred embodiment of the unregistered word recognition method in an intelligent interactive system according to the present invention;

FIG. 2 is a schematic flowchart of a second preferred embodiment of the unregistered word recognition method in an intelligent interactive system according to the present invention;

FIG. 3 is a schematic structural diagram of a first preferred embodiment of the unregistered word recognition apparatus in an intelligent interactive system according to the present invention;

FIG. 4 is a schematic structural diagram of a second preferred embodiment of the unregistered word recognition apparatus in an intelligent interactive system according to the present invention.

The implementation, functional features, and advantages of the present invention will be further described in conjunction with the embodiments.

Detailed description

It is understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Natural language processing is an important direction in computer science and artificial intelligence. In natural language processing, words are the smallest language units. Written Chinese has no explicit delimiters between words, so Chinese text must first be segmented into words before it can be processed automatically. The large number of unregistered words has become a technical bottleneck for Chinese word segmentation. Unregistered Word Recognition (UWI) is the process of automatically detecting and identifying, from a corpus, words that do not appear in the dictionary. It is an important basic technology in natural language processing, with a wide range of applications in Chinese automatic word segmentation, dictionary compilation, information extraction, information retrieval, and machine translation.

To achieve the above object, the present invention provides a method for identifying an unregistered word in an intelligent interactive system.

Referring to FIG. 1, FIG. 1 is a schematic flowchart diagram of a first preferred embodiment of an unregistered word recognition method in an intelligent interactive system according to the present invention.

To better explain the embodiments of the present invention: the intelligent interactive system in these embodiments includes a client and a server. The client obtains the content input by the user, and the server processes that content and feeds back the results.

In an embodiment, as shown in FIG. 1, the method for identifying an unregistered word in the intelligent interactive system includes the following steps:

S10: Obtain a word input by a user;

In this embodiment, the word input by the user is obtained through the client. Because most commonly used input methods, such as Sogou Pinyin and Baidu Pinyin, have a memory function, users are accustomed to entering a sentence word by word. The words input by the user can therefore be obtained through asynchronous transmission while the user is still entering the sentence. Asynchronous transmission here means that each time the user inputs a word, that word is transmitted to the server of the intelligent interactive system as a word input by the user; when the user finishes entering the sentence, the sentence is transmitted to the server as a whole. That is, the words and the sentence entered by the user are transmitted to the server asynchronously.
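The asynchronous transmission described above can be simulated in-process with Python's `asyncio`. The sketch below assumes, for illustration only, that client and server exchange messages through an `asyncio.Queue`; a real deployment would use a network transport that the text does not specify. The function names, the `("word", ...)` / `("sentence", ...)` message tags, and the sample words are hypothetical.

```python
import asyncio


async def client(typed_words: list[str], queue: asyncio.Queue) -> None:
    """Simulate a client that sends each word as soon as it is typed."""
    for word in typed_words:
        await queue.put(("word", word))          # sent while the user keeps typing
    await queue.put(("sentence", "".join(typed_words)))  # sent when the sentence is finished
    await queue.put(None)                        # sentinel: no more messages


async def server(queue: asyncio.Queue, log: list[tuple[str, str]]) -> None:
    """Simulate a server that receives words and the final sentence separately."""
    while (msg := await queue.get()) is not None:
        kind, text = msg
        # A real server would run S20-S60 on "word" messages and
        # segmentation/matching on the "sentence" message.
        log.append((kind, text))


async def main(typed_words: list[str]) -> list[tuple[str, str]]:
    queue: asyncio.Queue = asyncio.Queue()
    log: list[tuple[str, str]] = []
    await asyncio.gather(client(typed_words, queue), server(queue, log))
    return log


if __name__ == "__main__":
    print(asyncio.run(main(["请问", "科技园", "怎么", "走"])))
```

Because the queue is first-in, first-out, the server sees every word before the assembled sentence, mirroring the order described in the text.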

After the word input by the user is obtained, first-level recognition (S20) is performed: the length of the word is calculated to determine whether it is equal to 1 or greater than 4, that is, whether the word is a single character or has more than 4 characters. If so, the word is ignored, i.e. single characters and words longer than 4 characters are filtered out; otherwise, second-level recognition is performed.
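For Chinese text, "length" in first-level recognition is most naturally measured in characters rather than bytes. A minimal sketch of the S20 filter, assuming Python 3 strings (where `len()` counts Unicode code points); the function name and sample words are illustrative choices, not taken from the source:

```python
def passes_length_filter(word: str, min_len: int = 2, max_len: int = 4) -> bool:
    """First-level recognition (S20): keep words of 2 to 4 characters."""
    # len() on a Python str counts Unicode code points, so each Chinese
    # character counts as 1, not as its 3-byte UTF-8 encoding.
    return min_len <= len(word) <= max_len


if __name__ == "__main__":
    for w in ["走", "请问", "科技园", "请问科技园怎么走"]:
        # character count, UTF-8 byte count, and whether the word survives S20
        print(w, len(w), len(w.encode("utf-8")), passes_length_filter(w))
```

Measuring bytes instead would misclassify almost every Chinese word, since a two-character word already occupies 6 bytes in UTF-8.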

When the length of the word input by the user is greater than 1 and less than or equal to 4, second-level recognition (S30) is performed to determine whether the word already exists in the preset word segmentation dictionary or the user dictionary. In this embodiment, the preset word segmentation dictionary is an existing Chinese word segmentation dictionary from the prior art; the user dictionary is a pre-established set of words specific to a particular application field, for example words such as watching movies, diet therapy, and physiotherapy in the health management field. The user dictionary may also start out empty and be added to and enriched as users subsequently input content. A person skilled in the art may perform this lookup in various ways, for example by traversing the word segmentation dictionary word by word to match the word input by the user, or by pre-building an index and matching against it; no limitation is imposed here, as long as it can be determined whether the word input by the user is an existing word in the preset word segmentation dictionary or the user dictionary. If the word already exists in either dictionary, it is ignored; otherwise, third-level recognition is performed.
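The two lookup strategies mentioned above, word-by-word traversal and a pre-built index, can be contrasted in a short sketch. Here the index is assumed to be a Python hash set (`frozenset`); the function names and the sample dictionary words (including 食疗 "diet therapy" and 理疗 "physiotherapy") are illustrative.

```python
def in_dicts_by_traversal(word: str, *dictionaries: list[str]) -> bool:
    """Second-level check by traversing every entry: O(total entries) per query."""
    for dictionary in dictionaries:
        for entry in dictionary:
            if entry == word:
                return True
    return False


def build_index(*dictionaries: list[str]) -> frozenset[str]:
    """Pre-build a hash index once; each later lookup is O(1) on average."""
    return frozenset(entry for dictionary in dictionaries for entry in dictionary)


if __name__ == "__main__":
    seg_dict = ["你好", "科技", "怎么"]
    user_dict = ["食疗", "理疗"]  # illustrative health-management words
    index = build_index(seg_dict, user_dict)
    for w in ["理疗", "科技园"]:
        print(w, in_dicts_by_traversal(w, seg_dict, user_dict), w in index)
```

Both functions give the same answers; the index costs one pass over the dictionaries up front and must be rebuilt or updated whenever the user dictionary grows.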

When second-level recognition determines that the word input by the user does not exist in the preset word segmentation dictionary or the user dictionary, third-level recognition (S40) further determines whether the word is contained in a word of the word segmentation dictionary or the user dictionary. "Contained" here means that the word input by the user appears in its entirety within some word of the word segmentation dictionary or the user dictionary. For example, if the user enters "hello" and the word segmentation dictionary or user dictionary contains a longer word of which "hello" is a part, then "hello" is contained in that word; a word covering "hello" already exists in the dictionary, so the word entered by the user is ignored. If instead the user enters "you are beautiful" and the dictionary word is "hello", the entered word is not contained in that dictionary word, so it is considered not contained in a word of the word segmentation dictionary or the user dictionary.
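Third-level recognition is a substring test. The sketch below shows the naive check alongside one possible optimization that is an assumption, not something the text prescribes: because S20 already limits candidates to 2 to 4 characters, all 2- to 4-character substrings of dictionary words can be precomputed into a set. The function names and sample entries are illustrative.

```python
def contained_naive(word: str, entries: set[str]) -> bool:
    """Third-level check (S40): is `word` a substring of some dictionary entry?"""
    return any(word in entry for entry in entries)


def build_substring_index(entries: set[str], min_len: int = 2, max_len: int = 4) -> set[str]:
    """Precompute every substring of length min_len..max_len of every entry.

    For words of 2-4 characters, `word in index` gives the same answer as
    contained_naive(word, entries), but with a single hash lookup.
    """
    index: set[str] = set()
    for entry in entries:
        for size in range(min_len, max_len + 1):
            for start in range(len(entry) - size + 1):
                index.add(entry[start:start + size])
    return index


if __name__ == "__main__":
    entries = {"你好吗", "科技园区"}  # illustrative dictionary words
    index = build_substring_index(entries)
    for w in ["你好", "你好美", "园区"]:
        print(w, contained_naive(w, entries), w in index)
```

The index trades memory for speed: an entry of n >= 4 characters contributes at most 3n - 6 substrings.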

When third-level recognition determines that the word input by the user is not contained in any word of the word segmentation dictionary or the user dictionary, the word is added to the user input word dictionary as a possible unregistered word (S50). The user input word dictionary temporarily stores words that the user enters while composing a sentence and that, after being screened level by level through the above steps, are identified as possible unregistered words.

At the same time, fourth-level recognition (S60) determines whether the word input by the user is a word in a network entry; network entries here preferably refer to the entries currently provided by Baidu Encyclopedia. Baidu Encyclopedia is built on the principles of equality, collaboration, sharing, and freedom: all users can jointly write the encyclopedia under shared technical rules and cultural conventions, so that knowledge is continuously combined and expanded, and it is tightly integrated with a search engine to meet users' information needs at different levels. The words in Baidu Encyclopedia entries therefore include the most popular new words, which allows unregistered words to be identified to the greatest possible extent. If the word is in a network entry, it is added to the user dictionary as an unregistered word and deleted from the user input word dictionary; otherwise, the word is ignored.
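Because fourth-level recognition queries an external source for every candidate word, caching repeated lookups is a natural design choice. The sketch below assumes a hypothetical `lookup` callable standing in for the Baidu Encyclopedia query (no real Baidu API is used or implied) and wraps it with the standard-library `functools.lru_cache`. The function names are illustrative.

```python
from functools import lru_cache
from typing import Callable


def make_entry_checker(lookup: Callable[[str], bool], maxsize: int = 10_000) -> Callable[[str], bool]:
    """Wrap a hypothetical network-entry lookup with an in-memory LRU cache."""
    return lru_cache(maxsize=maxsize)(lookup)


def fourth_level(word: str, is_entry: Callable[[str], bool],
                 user_dict: set[str], input_dict: set[str]) -> bool:
    """S60: promote `word` to the user dictionary if it is a network entry."""
    if is_entry(word):
        user_dict.add(word)
        input_dict.discard(word)  # delete from the user input word dictionary
        return True
    return False  # ignored: the word simply stays in the user input word dictionary
```

A cache can go stale as new entries are created online, so a real system might bound how long results are kept; that policy is outside what the text describes.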

Steps S10 to S60 are applied in turn to every word the user inputs. After the user finishes entering the sentence, the server retrieves the content to be returned from the existing database using existing segmentation, similarity calculation, and matching algorithms based on the word segmentation dictionary and the user dictionary. Because unregistered words have been added to the user dictionary, segmenting user input based on the user dictionary yields a better segmentation effect, which improves the intelligence level of the intelligent interactive system.

This embodiment screens possible unregistered words by recognizing whether the length of the word input by the user is equal to 1 or greater than 4, whether the word exists in the preset word segmentation dictionary or the user dictionary, and whether it is contained in a word of the word segmentation dictionary or the user dictionary, and records them temporarily in the user input word dictionary. When a word input by the user is further recognized as a word in a network entry, it is added to the user dictionary and deleted from the user input word dictionary. Identifying user input level by level in this way adds unregistered words to the user dictionary and enriches it, so that segmenting user sentences based on the user dictionary yields better results and the intelligence level of the intelligent interactive system is improved.
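The text does not name the segmentation algorithm. To illustrate why adding an unregistered word to the user dictionary can change the result, the sketch below uses forward maximum matching, a classic dictionary-based method, as a stand-in; the sample sentence 请问科技园怎么走 (roughly "Excuse me, how do I get to the technology park?") and the dictionary contents are illustrative.

```python
def fmm_segment(sentence: str, vocabulary: set[str], max_len: int = 4) -> list[str]:
    """Forward maximum matching: at each position take the longest dictionary word.

    Characters that start no dictionary word are emitted as single characters.
    """
    tokens: list[str] = []
    i = 0
    while i < len(sentence):
        # try the longest candidate first, shrinking down to one character
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens


if __name__ == "__main__":
    seg_dict = {"请问", "科技", "怎么"}
    print(fmm_segment("请问科技园怎么走", seg_dict))               # before
    print(fmm_segment("请问科技园怎么走", seg_dict | {"科技园"}))  # after adding to user dictionary
```

Without 科技园 in the vocabulary the place name is split into 科技 and a stray 园; once it is in the user dictionary it is kept whole.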

As a preferred implementation, the step S10 includes:

S11: obtaining the changed content of the text box as the user types;

S12: using the changed content of the text box as the word input by the user.

When the user enters a sentence in the text box of the client, the changes to the text box are captured as the user types, following the user's input habits. For example, when entering the sentence "Excuse me, how do I get to the technology park?", the user habitually types it piece by piece: "excuse me" (or "please" and "ask" separately), "technology park", "how", and "go". The client captures each change to the text box; for example, it first obtains "excuse me" and treats it as a word input by the user, and that word is then screened according to the flow of the first preferred embodiment of the unregistered word recognition method to determine whether it is an unregistered word. When the user finishes the sentence, the sentence is transmitted to the server, which retrieves the content to be returned from the preset database using existing segmentation, similarity calculation, and matching algorithms based on the preset word segmentation dictionary and the updated user dictionary. Using the changes to the text box as the words input by the user allows the words and the sentence to be transmitted asynchronously, so that after the user finishes the sentence it can be segmented with the preset word segmentation dictionary and the updated user dictionary, further improving the segmentation effect and the intelligence level of the intelligent interactive system.
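Steps S11 and S12 can be sketched by comparing successive snapshots of the text box. The sketch assumes snapshots are taken after each committed input (for example after the input method commits characters) and treats the text following the longest common prefix of the old and new contents as the change; both are assumptions, since the text does not define how changes are computed. The function names are hypothetical, and the snapshots use the illustrative Chinese sentence 请问科技园怎么走.

```python
def text_box_change(previous: str, current: str) -> str:
    """S11: return the text newly added to the text box, or '' if none.

    The change is whatever follows the longest common prefix of the old
    and new contents, so pure deletions yield ''.
    """
    common = 0
    for old_ch, new_ch in zip(previous, current):
        if old_ch != new_ch:
            break
        common += 1
    return current[common:]


def words_from_snapshots(snapshots: list[str]) -> list[str]:
    """S12: treat each non-empty change between snapshots as a user-input word."""
    words: list[str] = []
    previous = ""
    for current in snapshots:
        change = text_box_change(previous, current)
        if change:
            words.append(change)
        previous = current
    return words


if __name__ == "__main__":
    snaps = ["请问", "请问科技园", "请问科技园怎么", "请问科技园怎么走"]
    print(words_from_snapshots(snaps))
```

If the user edits earlier text, the computed change includes everything after the first differing character, which may be longer than the edit itself.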

Referring to FIG. 2, FIG. 2 is a schematic flowchart diagram of a second preferred embodiment of an unregistered word recognition method in an intelligent interactive system according to the present invention.

In an embodiment, as shown in FIG. 2, based on the first preferred embodiment shown in FIG. 1, the method for identifying unregistered words in the intelligent interactive system further includes the following steps:

S70: counting the word frequency of each word in the user input word dictionary;

S80: if the word frequency of a word in the user input word dictionary is greater than a preset value, adding the word to the user dictionary as an unregistered word and deleting it from the user input word dictionary.

The user input word dictionary temporarily stores the possible unregistered words identified level by level through the above steps while the user enters sentences. Word frequency here refers to how often a word occurs in the user input word dictionary. Words with a high frequency there are words that the user often inputs but that do not exist in any network entry. The word frequency of each word in the user input word dictionary is counted, and words whose frequency is greater than the preset value (these may be common words that are not included in the preset word segmentation dictionary or the user dictionary) are added to the user dictionary to further enrich it, and are deleted from the user input word dictionary.
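S70 and S80 amount to counting and thresholding. A minimal sketch, assuming each S50 addition increments a `collections.Counter` and that "greater than a preset value" is a strict comparison as worded; the function name `promote_frequent` and the sample words are illustrative.

```python
from collections import Counter


def promote_frequent(input_counts: Counter, user_dict: set[str], threshold: int) -> list[str]:
    """S70-S80: move words whose frequency exceeds `threshold` into the user dictionary."""
    # S70: frequencies are the Counter values; select those strictly above the threshold
    promoted = [word for word, count in input_counts.items() if count > threshold]
    for word in promoted:
        user_dict.add(word)      # S80: add to the user dictionary
        del input_counts[word]   # S80: delete from the user input word dictionary
    return promoted


if __name__ == "__main__":
    counts = Counter(["怎么走", "怎么走", "怎么走", "好看吗", "好看吗"])
    user: set[str] = set()
    print(promote_frequent(counts, user, threshold=2), user, counts)
```

Collecting the promoted words into a list first avoids mutating the counter while iterating over it.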

In an embodiment, the method for identifying unregistered words in the intelligent interactive system further includes the following steps:

S90: establishing a user dictionary in which words commonly used in the user's specific application domain are stored;

S100: establishing a user input word dictionary in which possible unregistered words are stored.

The user dictionary in this embodiment is a pre-established set of words specific to a particular application field, for example words such as watching movies, diet therapy, and physiotherapy in the health management field. After it is established, the user dictionary can be added to and enriched during subsequent user input.

To achieve the above object, the present invention also provides an apparatus for identifying an unregistered word in an intelligent interactive system.

Referring to FIG. 3, FIG. 3 is a schematic structural diagram of a first preferred embodiment of an unregistered word recognition apparatus in an intelligent interactive system according to the present invention.

In an embodiment, as shown in FIG. 3, the device for identifying an unregistered word in the intelligent interaction system includes:

An obtaining module 10, configured to acquire a word input by a user;

Specifically, the obtaining module 10 in this embodiment obtains the word input by the user through the client. Because most commonly used input methods, such as Sogou Pinyin and Baidu Pinyin, have a memory function, users are accustomed to entering a sentence word by word, so the words input by the user can be obtained through asynchronous transmission while the sentence is being entered. Asynchronous transmission here means that each time the user inputs a word, that word is transmitted to the server of the intelligent interactive system; when the user submits the question, the sentence is transmitted to the server as a whole. That is, the words entered by the user are transmitted to the server asynchronously.

The first-level identification module 20 is configured to determine whether the length of the word input by the user is equal to 1 or greater than 4, and if yes, ignore the word input by the user;

Specifically, after the obtaining module 10 obtains the word input by the user, the first-level identification module 20 performs first-level recognition: it calculates the length of the word to determine whether it is equal to 1 or greater than 4, that is, whether the word is a single character or has more than 4 characters. If so, the word is ignored, i.e. single characters and words longer than 4 characters are filtered out; otherwise, second-level recognition is performed.

The second-level identification module 30 is configured to determine, when the length of the word input by the user is greater than 1 and less than or equal to 4, whether the word is an existing word in a preset word segmentation dictionary or the user dictionary, and if so, to ignore the word;

Specifically, when the first-level identification module 20 determines that the length of the word input by the user is greater than 1 and less than or equal to 4, the second-level identification module 30 performs second-level recognition on the word to determine whether it is a word in the preset word segmentation dictionary or the user dictionary. In this embodiment, the preset word segmentation dictionary is an existing Chinese word segmentation dictionary from the prior art; the user dictionary is a pre-established set of words specific to a particular application field, for example words such as watching movies, diet therapy, and physiotherapy in the health management field. The user dictionary may also start out empty and be added to and enriched as users subsequently input content. A person skilled in the art may perform this lookup in various ways, for example by traversing the word segmentation dictionary word by word to match the word input by the user, or by pre-building an index and matching against it; no limitation is imposed here, as long as it can be determined whether the word input by the user is an existing word in the preset word segmentation dictionary or the user dictionary. If the word already exists in either dictionary, it is ignored; otherwise, third-level recognition is performed.

The third-level identification module 40 is configured to determine, when the word input by the user is not a word in the word segmentation dictionary or the user dictionary, whether the word is contained in a word of the word segmentation dictionary or the user dictionary, and if so, to ignore the word;

Specifically, when the second-level identification module 30 determines that the word input by the user does not exist in the preset word segmentation dictionary or the user dictionary, the third-level identification module 40 further determines whether the word is contained in a word of the word segmentation dictionary or the user dictionary, where "contained" means that the word input by the user appears in its entirety within some dictionary word. For example, if the user enters "hello" and the word segmentation dictionary or user dictionary contains a longer word of which "hello" is a part, then "hello" is contained in that word; a word covering "hello" already exists in the dictionary, so the word entered by the user is ignored. If instead the user enters "you are beautiful" and the dictionary word is "hello", the entered word is not contained in that dictionary word, so it is considered not contained in a word of the word segmentation dictionary or the user dictionary.

The user input word dictionary update module 50 is configured to add the word input by the user to the user input word dictionary as a possible unregistered word when the third-level identification module 40 determines that the word is not contained in any word of the word segmentation dictionary or the user dictionary;

Specifically, when the three-level identification module 40 determines that the word input by the user is not included in a word dictionary or a word in the user dictionary, the user input word dictionary update module 50 inputs the user. The words are added to the user input word dictionary as possible unregistered words. The user input word dictionary is used to temporarily store words that are input after the user inputs the sentence but are deleted, and are identified step by step through the above steps and will eventually be recognized as possible unregistered words.

a four-level identification module 60, configured to add a word input by the user as an unregistered word into the user dictionary when the word input by the user is a word in a network entry, and input the word entered by the user Deleted from the user input word dictionary, otherwise the words entered by the user are ignored.

Specifically, when the three-level identification module 40 determines that the word input by the user is not included in a word dictionary or a word in the user dictionary, the user input word dictionary update module 50 inputs the user. The word is added as a possible unregistered word to the user input word dictionary, and the four-level identification module 60 determines whether the word input by the user is a word in a network entry, and the network entry priority refers to the current Baidu Encyclopedia Can provide the terms. Baidu Encyclopedia adheres to the spirit of equality, collaboration, sharing and freedom. It advocates equality before the network. All people work together to write an encyclopedia, so that knowledge can be continuously combined and expanded under certain technical rules and cultural contexts. Provide users with a creative network platform, emphasizing user participation and dedication, fully mobilizing the power of all users of the Internet, bringing together the wisdom of hundreds of millions of users, actively communicating and sharing, and achieving perfect integration with search engines, from different At the level of the user to meet the needs of information. Therefore, the words in the Baidu Encyclopedia entry include the most popular new words at present, which can identify the unregistered words to the maximum extent. If the words are in the network entry, the words input by the user are added to the user dictionary, and the words input by the user are deleted from the user input word dictionary, otherwise the words input by the user are ignored.

Through the above modules, every word input by the user is identified in turn. After the user inputs a sentence, the server side matches the content to be returned from the preset database using the existing word segmentation, similarity calculation and matching algorithms based on the word segmentation dictionary and the user dictionary. Because unregistered words have been added to the user dictionary, segmenting the user's input on the basis of the user dictionary gives better results, which raises the intelligence level of the intelligent interactive system.
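The patent does not name the segmentation algorithm, so the effect of enriching the user dictionary is shown here with forward maximum matching. This is a common dictionary-based segmenter used as a stand-in, not the method of the patent. The sample sentence and dictionaries are hypothetical.

```python
def fmm_segment(sentence: str, dictionary: set) -> list:
    """Forward maximum matching: greedily take the longest dictionary word."""
    # Longest window to try; at least 1 so the loop always advances.
    max_len = max([len(w) for w in dictionary] + [1])
    tokens, i = [], 0
    while i < len(sentence):
        # Try the longest window first, falling back to a single character.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


if __name__ == "__main__":
    seg_dict = {"我", "想", "买", "血压"}
    sentence = "我想买血压计"
    print(fmm_segment(sentence, seg_dict))
    # ['我', '想', '买', '血压', '计']
    user_dict = {"血压计"}  # e.g. a word promoted by the recognition steps
    print(fmm_segment(sentence, seg_dict | user_dict))
    # ['我', '想', '买', '血压计']
```

Without "血压计" the product name is split into "血压" and a stray "计". Once the word is in the user dictionary it is kept whole, which is the kind of improvement the similarity calculation downstream depends on.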

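As a preferred embodiment, the acquisition module is specifically configured to obtain the changed content of the text box as the user types, and to use the changed content of the text box as the word input by the user.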
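A minimal sketch of how the changed content of the text box might be obtained, assuming, hypothetically, that the client compares successive snapshots of the text box with Python's `difflib.SequenceMatcher`. Deleted spans are reported as well as inserted ones, since the user input word dictionary is described as holding words that were typed and then deleted.

```python
import difflib


def text_box_changes(before: str, after: str) -> list:
    """Return (kind, text) pairs describing how the text box content changed."""
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            changes.append(("inserted", after[j1:j2]))  # newly typed text
        if tag in ("delete", "replace"):
            changes.append(("deleted", before[i1:i2]))  # text the user removed
    return changes


if __name__ == "__main__":
    print(text_box_changes("我想买", "我想买血压计"))  # [('inserted', '血压计')]
    print(text_box_changes("我想买血压计", "我想买"))  # [('deleted', '血压计')]
```

Each extracted span would then be passed to the screening steps as the word input by the user.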

Referring to FIG. 4, FIG. 4 is a schematic structural diagram of a second preferred embodiment of an unregistered word recognition apparatus in an intelligent interactive system according to the present invention.

In an embodiment, as shown in FIG. 4, based on the first preferred embodiment of the unregistered word recognition device in the intelligent interactive system of the present invention shown in FIG. 3, the device for identifying the unregistered word in the intelligent interactive system further includes:

The user input word dictionary word frequency statistics module 70 is configured to count the word frequency of each word in the user input word dictionary;

The user dictionary update module 80 is configured to, if the word frequency of a word in the user input word dictionary is greater than a preset value, add the word to the user dictionary as an unregistered word and delete it from the user input word dictionary.

The user input word dictionary temporarily stores words that the user typed while entering a sentence but then deleted, which have been identified step by step and recognized as possible unregistered words. The words remaining in it are words that the user often inputs but that do not appear in any network entry. The word frequency of each word in the user input word dictionary is therefore counted. Words whose frequency is greater than the preset value (these may be common words that are not included in the preset word segmentation dictionary or the user dictionary) are added to the user dictionary to enrich it further, and are deleted from the user input word dictionary.
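Modules 70 and 80 can be sketched by modelling the user input word dictionary as a `collections.Counter` that maps each pending word to how often it was recorded. The Counter representation and the threshold value are assumptions for illustration; the patent only speaks of "a preset value".

```python
from collections import Counter


def promote_frequent(pending: Counter, user_dict: set, threshold: int) -> list:
    """Steps S70-S80: move words whose frequency exceeds `threshold`."""
    # Collect first so the Counter is not modified while it is being iterated.
    promoted = [w for w, freq in pending.items() if freq > threshold]
    for w in promoted:
        user_dict.add(w)  # register as an unregistered word
        del pending[w]    # delete it from the user input word dictionary
    return promoted


if __name__ == "__main__":
    pending = Counter()
    for w in ["血糖仪", "血糖仪", "血糖仪", "随便词"]:
        pending[w] += 1  # S70: count each time the word is recorded
    user_dict = set()
    print(promote_frequent(pending, user_dict, threshold=2))  # ['血糖仪']
    print(pending)  # Counter({'随便词': 1})
```

Words below the threshold stay in the user input word dictionary and keep accumulating counts until they qualify.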

As a preferred embodiment, the device for identifying an unregistered word in the intelligent interactive system further includes:

A user dictionary building module is configured to establish a user dictionary in which commonly used words of a user-specific application domain are stored.

A user input word dictionary building module is configured to establish a user input word dictionary that stores the possible unregistered words entered by the user while inputting a sentence. What it mainly stores is described under the user input word dictionary update module.

The above are only preferred embodiments of the present invention and are not intended to limit its scope. Any equivalent structural or process transformation made using the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included in the scope of patent protection of the present invention.

Claims

A method for identifying an unregistered word in an intelligent interactive system, characterized in that the method for identifying an unregistered word in the intelligent interactive system comprises the following steps:

S10: Obtain a word input by a user;

S20: determining whether the length of the word input by the user is equal to 1 or greater than 4, and if so, ignoring the word input by the user, otherwise executing S30;

S30: determining whether the word input by the user is a word existing in a preset word segmentation dictionary or in the user dictionary, and if so, ignoring the word input by the user, otherwise executing S40;

S40: determining whether the word input by the user is contained in a word of the word segmentation dictionary or the user dictionary, and if so, ignoring the word input by the user, otherwise executing S50;

S50: adding the word input by the user as a possible unregistered word to the user input word dictionary;

S60: determining whether the word input by the user is a word in a network entry, and if so, adding the word input by the user to the user dictionary as an unregistered word and deleting it from the user input word dictionary, otherwise ignoring the word input by the user.
The method for identifying an unregistered word in the intelligent interactive system according to claim 1, wherein the method for identifying the unregistered word in the intelligent interactive system further comprises the following steps:

S90: Establish a user dictionary in which commonly used words of a user-specific application domain are stored.
The method for identifying an unregistered word in the intelligent interactive system according to claim 1, wherein the method for identifying the unregistered word in the intelligent interactive system further comprises the following steps:

S100: Establish a user input word dictionary, and store possible unregistered words in the user input word dictionary.
The method for identifying an unregistered word in the intelligent interactive system according to claim 1, wherein the step S10 comprises:

S11: Obtain the changed content of the text box as the user types;

S12: Use the changed content of the text box as the word input by the user.
The method for identifying an unregistered word in the intelligent interactive system according to claim 4, wherein the method for identifying the unregistered word in the intelligent interactive system further comprises the following steps:

S90: Establish a user dictionary in which commonly used words of a user-specific application domain are stored.
The method for identifying an unregistered word in the intelligent interactive system according to claim 4, wherein the method for identifying the unregistered word in the intelligent interactive system further comprises the following steps:

S100: Establish a user input word dictionary, and store possible unregistered words in the user input word dictionary.
The method for identifying an unregistered word in the intelligent interactive system according to claim 1, wherein the method for identifying the unregistered word in the intelligent interactive system further comprises the following steps:

S70: Count the word frequency of each word in the user input word dictionary;

S80: If the word frequency of a word in the user input word dictionary is greater than a preset value, the word is added to the user dictionary as an unregistered word, and the word is deleted from the user input word dictionary.
The method for identifying an unregistered word in the intelligent interactive system according to claim 7, wherein the method for identifying the unregistered word in the intelligent interactive system further comprises the following steps:

S90: Establish a user dictionary in which commonly used words of a user-specific application domain are stored.
The method for identifying an unregistered word in the intelligent interactive system according to claim 7, wherein the method for identifying the unregistered word in the intelligent interactive system further comprises the following steps:

S100: Establish a user input word dictionary, and store possible unregistered words in the user input word dictionary.
An apparatus for identifying an unregistered word in an intelligent interactive system, wherein the apparatus for identifying an unregistered word in the intelligent interactive system includes:

An acquisition module for obtaining a word input by a user;

a first-level identification module, configured to determine whether the length of the word input by the user is equal to 1 or greater than 4, and if yes, ignore the word input by the user;

a second-level identification module, configured to determine, when the length of the word input by the user is greater than 1 and less than or equal to 4, whether the word input by the user is a word existing in a preset word segmentation dictionary or a user dictionary, and if so, ignore the word input by the user;

a third-level identification module, configured to determine, when the word input by the user is not a word in the word segmentation dictionary or the user dictionary, whether the word input by the user is contained in a word of the word segmentation dictionary or the user dictionary, and if so, ignore the word input by the user;

a user input word dictionary update module, configured to add the word input by the user to the user input word dictionary as a possible unregistered word when the word input by the user is not contained in a word of the word segmentation dictionary or the user dictionary;

a fourth-level identification module, configured to, when the word input by the user is a word in a network entry, add the word input by the user to the user dictionary as an unregistered word and delete it from the user input word dictionary, and otherwise ignore the word input by the user.
The device for identifying an unregistered word in the intelligent interactive system according to claim 10, wherein the acquisition module is specifically configured to:

obtain the changed content of the text box as the user types, and use the changed content of the text box as the word input by the user.
The device for identifying an unregistered word in the intelligent interactive system according to claim 10, wherein the device for identifying the unregistered word in the intelligent interactive system further comprises:

a user input word dictionary word frequency statistics module, configured to count the word frequency of each word in the user input word dictionary;

a user dictionary update module, configured to, if the word frequency of a word in the user input word dictionary is greater than a preset value, add the word to the user dictionary as an unregistered word and delete it from the user input word dictionary.
The device for identifying an unregistered word in the intelligent interactive system according to claim 10, wherein the device for identifying the unregistered word in the intelligent interactive system further comprises:

A user dictionary building module is configured to establish a user dictionary in which commonly used words of the user-specific application domain and the unregistered words are stored.
The device for identifying an unregistered word in the intelligent interactive system according to claim 10, wherein the device for identifying the unregistered word in the intelligent interactive system further comprises:

a user input word dictionary building module, configured to establish a user input word dictionary and store possible unregistered words in the user input word dictionary.