CN110825840A - Word bank expansion method, device, equipment and storage medium - Google Patents

Word bank expansion method, device, equipment and storage medium

Info

Publication number
CN110825840A
Authority
CN
China
Prior art keywords
word
target
intention
category
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911086956.7A
Other languages
Chinese (zh)
Other versions
CN110825840B (en)
Inventor
高志伟
陈孝良
苏少炜
常乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sound Intelligence Technology Co Ltd
Original Assignee
Beijing Sound Intelligence Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sound Intelligence Technology Co Ltd
Priority to CN201911086956.7A
Publication of CN110825840A
Application granted
Publication of CN110825840B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/35 Clustering; Classification

Abstract

The application discloses a lexicon expansion method, apparatus, device and storage medium, and belongs to the technical field of intelligent voice. The method comprises: acquiring at least one word to be added and the word category of the at least one word; comparing the at least one word with an intention word bank and determining, among the at least one word, a target word that is not contained in the intention word bank, wherein the intention word bank stores words used for expressing intention, organized by word category; and adding the target word to the intention word bank, stored in correspondence with the word category of the target word. Because the technical scheme provides a function for adding words to the intention word bank, a user can add words on their own according to their needs, and the intention word bank is thereby expanded.

Description

Word bank expansion method, device, equipment and storage medium
Technical Field
The present application relates to the field of intelligent speech technologies, and in particular, to a method, an apparatus, a device, and a storage medium for expanding a lexicon.
Background
With the development of intelligent speech technology, intelligent voice interaction has gradually become a popular mode of human-computer interaction. An intelligent speech recognition system recognizes speech input by a user through automatic speech recognition and analyzes the user's intention through natural language processing. Because Chinese sentences are complex, manufacturers of intelligent speech recognition systems generally establish a dedicated intention word bank for intention analysis.
At present, when a user inputs speech into an intelligent speech recognition system, the system performs speech recognition on it, queries the intention word bank for intention words corresponding to the recognized words, and determines the user's intention from the intention words found. However, Chinese has a very large vocabulary, and new words keep appearing over time, so a manufacturer cannot cover every word when establishing the intention word bank. If a user speaks a word that is not in the intention word bank, the system cannot determine the user's intention from the speech. A lexicon expansion method is therefore urgently needed to expand the intention word bank.
Disclosure of Invention
The embodiment of the application provides a word bank expansion method, a device, equipment and a storage medium, which can expand an intention word bank. The technical scheme is as follows:
In a first aspect, a lexicon expansion method is provided, which includes:
acquiring at least one word to be added and a word category of the at least one word;
comparing the at least one word with an intention word bank, and determining a target word which is not contained in the intention word bank in the at least one word, wherein the intention word bank is used for storing words used for expressing intention according to word categories;
and adding the target word into the intention word library, and storing the target word corresponding to the word category of the target word.
In one possible implementation, the obtaining at least one word to be added and a word category of the at least one word includes:
receiving an access instruction of a first user to a first interface, and acquiring word categories in the intention word library;
displaying word categories in the intention word library;
and acquiring the at least one word and the word category of the at least one word selected from the word categories in the intention word library.
In one possible implementation, the obtaining the at least one word and the word category of the at least one word selected from the word categories in the intention word library includes:
and acquiring at least one target file through a second interface, wherein the at least one target file is generated according to the at least one word and the word category of the at least one word, and the same target file comprises words in the same word category.
In one possible implementation, the comparing the at least one word with the intention word bank to determine a target word of the at least one word that is not included in the intention word bank includes:
for the words of each word category in the at least one word, comparing the words of the word category with the words of the same word category in the intention word library, removing the existing words in the intention word library from the words of the word category, and determining the remaining words as the target words.
In one possible implementation, the comparing the words of the word category with the words of the same word category in the intention word library includes:
performing de-duplication processing on the words in the word category;
and comparing the words after the duplication removal processing with the words in the same word category in the intention word library.
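The per-category comparison with de-duplication described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function name `find_target_words`, the dictionary layout (word category mapped to a set of words), and the sample words are all hypothetical.

```python
def find_target_words(lexicon, words_by_category):
    """Return, per word category, the submitted words missing from the lexicon.

    lexicon: dict mapping a word category to the set of words stored under it
             (hypothetical layout, chosen for illustration).
    words_by_category: dict mapping a word category to the list of words to add.
    """
    targets = {}
    for category, words in words_by_category.items():
        # De-duplicate the submitted words first, keeping their original order.
        unique_words = list(dict.fromkeys(words))
        # Compare only against words stored under the same word category.
        existing = lexicon.get(category, set())
        # Whatever remains after removing existing words is a target word.
        targets[category] = [w for w in unique_words if w not in existing]
    return targets


lexicon = {"music": {"play", "pause"}, "weather": {"forecast"}}
submitted = {"music": ["play", "shuffle", "shuffle"], "weather": ["humidity"]}
targets = find_target_words(lexicon, submitted)
print(targets)
```

De-duplicating before the comparison means each submitted word is checked against the intention word library only once, however many times it was submitted.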
In one possible implementation, before the obtaining of the at least one word to be added and the word category of the at least one word, the method further includes:
when first voice of a second user is acquired, performing voice recognition on the first voice to acquire first text information of the first voice;
performing word segmentation processing on the first text information to obtain a plurality of words;
when a first word in the plurality of words is not included in the intention word bank, outputting user prompt information for prompting that the first word is not included in the intention word bank;
and receiving a confirmation adding instruction of the second user, wherein the confirmation adding instruction is used for indicating confirmation to add words.
In one possible implementation, before the receiving the confirmation add instruction of the second user, the method further includes:
receiving a word adding instruction of the second user, wherein the word adding instruction is used for indicating word adding;
and outputting confirmation prompt information, wherein the confirmation prompt information is used for prompting whether to confirm the word addition.
In one possible implementation, the obtaining at least one word to be added and a word category of the at least one word includes:
acquiring the first word as the at least one word to be added, and acquiring the word category of the first word as the word category of the at least one word.
In one possible implementation, the obtaining at least one word to be added and a word category of the at least one word includes:
acquiring a second word input by the second user and a word category of the second word, wherein the second word is the same as or different from the first word;
and acquiring the second word as the at least one word to be added, and acquiring the word category of the second word as the word category of the at least one word.
In one possible implementation, after the comparing the at least one word with the intention word bank, the method further comprises:
when the target word is determined not to exist in the at least one word, outputting first prompt information, wherein the first prompt information is used for indicating that the at least one word exists in the intention word bank.
In one possible implementation, the number of target words is one or more,
the adding the target word to the intention word library and storing the target word corresponding to the word category of the target word comprises:
for each target word, comparing the target word with a sensitive word bank;
and when the target word is not contained in the sensitive word stock, adding the target word into the intention word stock and storing the target word in correspondence with the word category of the target word.
In one possible implementation, the method further includes:
and when each target word is contained in the sensitive word bank, outputting second prompt information, wherein the second prompt information is used for prompting that the sensitive word is not allowed to be added.
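The sensitive-word check can be sketched as below. It is an illustration only: the function name `add_non_sensitive`, the data layout, and the wording of the second prompt are assumptions that do not come from the source.

```python
def add_non_sensitive(lexicon, sensitive_words, targets):
    """Add target words that are not sensitive; return (added, second_prompt).

    lexicon: dict mapping a word category to a set of words (hypothetical layout).
    sensitive_words: set of words that must never be added.
    targets: list of (word, category) pairs already known to be new.
    """
    added = []
    for word, category in targets:
        if word in sensitive_words:
            continue  # a sensitive word is never stored in the intention lexicon
        lexicon.setdefault(category, set()).add(word)
        added.append((word, category))
    # The second prompt is produced only when every target word was sensitive.
    second_prompt = None
    if targets and not added:
        second_prompt = "Sensitive words are not allowed to be added."  # hypothetical text
    return added, second_prompt


lexicon = {"chat": {"hello"}}
sensitive = {"badword"}
added, prompt = add_non_sensitive(lexicon, sensitive, [("badword", "chat"), ("howdy", "chat")])
print(added, prompt)
```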
In one possible implementation, the method further includes:
and when the target word is not contained in the sensitive word stock and the semantic similarity between the target word and any word in the sensitive word stock is greater than a similarity threshold, adding the target word to the sensitive word stock.
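A sketch of the similarity-based extension of the sensitive word stock follows. The source does not say how semantic similarity is computed, so the similarity function is passed in as a parameter. The default used here, `difflib.SequenceMatcher`, measures character-level similarity only and is a placeholder for a real semantic model. The threshold of 0.8 and all names are likewise assumptions.

```python
from difflib import SequenceMatcher


def string_similarity(a, b):
    # Placeholder only: character-level similarity, NOT semantic similarity.
    return SequenceMatcher(None, a, b).ratio()


def maybe_extend_sensitive(sensitive_words, target,
                           similarity=string_similarity, threshold=0.8):
    """Add target to sensitive_words if it is close to any existing entry.

    Returns True when the target word was added to the sensitive word stock.
    The threshold value is hypothetical.
    """
    if target in sensitive_words:
        return False  # already present, nothing to add
    if any(similarity(target, s) > threshold for s in sensitive_words):
        sensitive_words.add(target)
        return True
    return False


sensitive = {"scam"}
print(maybe_extend_sensitive(sensitive, "scamm"))  # near-duplicate spelling
print(maybe_extend_sensitive(sensitive, "hello"))  # unrelated word
```

Passing the similarity function as an argument keeps the threshold logic independent of whichever similarity model a real system would use.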
In one possible implementation, the method further includes:
acquiring a network new word;
sending the network new words to a manual review system;
receiving an audit result returned by the manual audit system, wherein the audit result is used for indicating whether the network new word is a sensitive word or not;
and when the auditing result indicates that the network new word is a sensitive word, adding the network new word into the sensitive word bank.
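The review loop for network new words might look like the following sketch. The manual review system is represented by a plain callable that returns `True` for a sensitive word; that interface, the function name, and the sample words are assumptions made for illustration.

```python
def review_network_new_words(new_words, sensitive_words, review):
    """Send each network new word for review and record the sensitive ones.

    review: callable taking a word and returning True if the reviewer judged
            it sensitive (a stand-in for the manual review system).
    Returns the list of words that were added to the sensitive word bank.
    """
    flagged = []
    for word in new_words:
        if review(word):  # audit result indicates a sensitive word
            sensitive_words.add(word)
            flagged.append(word)
    return flagged


# Stub reviewer: a fixed set plays the role of human judgement in this demo.
judged_sensitive = {"spamword"}
sensitive_bank = set()
flagged = review_network_new_words(
    ["spamword", "selfie"], sensitive_bank, lambda w: w in judged_sensitive
)
print(flagged, sensitive_bank)
```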
In one possible implementation, after the adding the target word to the intention word bank, the method further comprises:
outputting third prompt information for prompting that the at least one word has been successfully added to the intention word bank.
In a second aspect, a lexicon expansion apparatus is provided, comprising:
the acquisition module is used for acquiring at least one word to be added and the word category of the at least one word;
the determining module is used for comparing the at least one word with an intention word bank and determining a target word which is not contained in the intention word bank in the at least one word, wherein the intention word bank is used for storing words used for expressing intention according to word categories;
and the adding module is used for adding the target words into the intention word bank and storing the target words corresponding to the word categories of the target words.
In one possible implementation, the obtaining module is configured to:
receiving an access instruction of a first user to a first interface, and acquiring word categories in the intention word library;
displaying word categories in the intention word library;
and acquiring the at least one word and the word category of the at least one word selected from the word categories in the intention word library.
In a possible implementation manner, the obtaining module is configured to obtain at least one target file through a second interface, where the at least one target file is generated according to the at least one word and a word category of the at least one word, and a same target file includes words in a same word category.
In one possible implementation manner, the determining module is configured to, for the words of each word category in the at least one word, compare the words of the word category with the words of the same word category in the intention word library, remove existing words in the intention word library from the words of the word category, and determine the remaining words as the target words.
In one possible implementation, the determining module is configured to:
performing de-duplication processing on the words in the word category;
and comparing the words after the duplication removal processing with the words in the same word category in the intention word library.
In one possible implementation, the apparatus further includes:
the recognition module is used for carrying out voice recognition on first voice of a second user when the first voice is obtained, so as to obtain first text information of the first voice;
the word segmentation module is used for carrying out word segmentation processing on the first text information to obtain a plurality of words;
an output module, configured to output user prompt information when a first word of the plurality of words is not included in the intention word bank, where the user prompt information is used to prompt that the first word is not included in the intention word bank;
and the receiving module is used for receiving a confirmation adding instruction of the second user, wherein the confirmation adding instruction is used for indicating confirmation to add words.
In a possible implementation manner, the receiving module is further configured to receive a word adding instruction of the second user, where the word adding instruction is used to instruct word adding;
the output module is also used for outputting confirmation prompt information, and the confirmation prompt information is used for prompting whether to confirm word addition.
In one possible implementation manner, the obtaining module is configured to obtain the first word as the at least one word to be added, and obtain a word category of the first word as a word category of the at least one word.
In one possible implementation, the obtaining module is configured to:
acquiring a second word input by the second user and a word category of the second word, wherein the second word is the same as or different from the first word;
and acquiring the second word as the at least one word to be added, and acquiring the word category of the second word as the word category of the at least one word.
In one possible implementation, the apparatus further includes:
an output module, configured to output first prompt information when it is determined that the target word does not exist in the at least one word, where the first prompt information is used to indicate that the at least one word already exists in the intention word bank.
In one possible implementation, the number of target words is one or more,
the adding module is used for:
for each target word, comparing the target word with a sensitive word bank;
and when the target word is not contained in the sensitive word stock, adding the target word into the intention word stock and storing the target word in correspondence with the word category of the target word.
In one possible implementation, the apparatus further includes:
and the output module is used for outputting second prompt information when each target word is contained in the sensitive word stock, wherein the second prompt information is used for prompting that the sensitive words are not allowed to be added.
In one possible implementation, the apparatus further includes:
the adding module is further used for adding the target word to the sensitive word stock when the target word is not contained in the sensitive word stock and the semantic similarity between the target word and any word in the sensitive word stock is greater than a similarity threshold.
In one possible implementation, the apparatus further includes:
the acquisition module is also used for acquiring network new words;
the sending module is used for sending the network new words to the manual review system;
the receiving module is used for receiving an auditing result returned by the manual auditing system, and the auditing result is used for indicating whether the network new words are sensitive words or not;
the adding module is further used for adding the network new word into the sensitive word bank when the auditing result indicates that the network new word is a sensitive word.
In one possible implementation, the apparatus further includes:
an output module, configured to output third prompt information, where the third prompt information is used to prompt that the at least one word has been successfully added to the intention word bank.
In a third aspect, an electronic device is provided, where the electronic device includes a processor and a memory, where the memory stores at least one program code, and the at least one program code is loaded and executed by the processor to implement the method for word library expansion according to the first aspect or any implementation manner of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, in which at least one program code is stored, and the at least one program code is loaded and executed by a processor to implement the method for word bank expansion according to the first aspect or any implementation manner of the first aspect.
The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:
after the words and word categories to be added are obtained, target words which are not contained in the intention word library are added into the intention word library according to the word categories. The technical scheme provides a function of adding words to the intention word bank, so that a user can automatically add words to the intention word bank according to own requirements, and the intention word bank is expanded.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an implementation environment of a method for expanding a lexicon according to an embodiment of the present application;
FIG. 2 is a flowchart of a lexicon expansion method according to an embodiment of the present application;
FIG. 3 is a flowchart of a lexicon expansion method according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a lexicon expansion device according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device 500 according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an implementation environment of a lexicon expansion method provided in an embodiment of the present application. Referring to fig. 1, there are two types of implementation environment: a first implementation environment, which may include an electronic device 101, as shown in (a) in fig. 1, and a second implementation environment, which may include an electronic device 102 and a server 103, as shown in (b) in fig. 1.
In a first implementation environment, the electronic device 101 is any intelligent device capable of performing voice interaction with a user, and is capable of implementing intelligent voice recognition functions, including automatic voice recognition and natural language processing. For example, the user inputs a voice on the electronic device 101, and the electronic device 101 determines an operation instruction corresponding to the voice and then executes the operation instruction.
In a second implementation environment, the electronic device 102 may implement an intelligent voice recognition function through interaction with the server 103. For example, a user inputs a voice on the electronic device 102, and the electronic device 102 sends the voice to the server 103; after receiving the voice, the server 103 determines the operation instruction corresponding to the voice and returns it to the electronic device 102, which executes the operation instruction after receiving it. Of course, the electronic device 102 may also process the user's voice first and then send the processed voice to the server 103.
Fig. 2 is a flowchart of a lexicon expansion method according to an embodiment of the present application. Referring to fig. 2, the method includes:
201. Acquire at least one word to be added and a word category of the at least one word.
202. Compare the at least one word with an intention word bank, and determine a target word which is not contained in the intention word bank in the at least one word, wherein the intention word bank is used for storing words used for expressing intention according to word categories.
203. Add the target word into the intention word bank and store the target word corresponding to the word category of the target word.
According to the method provided by the embodiment of the application, after the words and the word categories to be added are obtained, the target words which are not contained in the intention word library are added into the intention word library according to the word categories. The technical scheme provides a function of adding words to the intention word bank, so that a user can automatically add words to the intention word bank according to own requirements, and the intention word bank is expanded.
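Steps 201 to 203 can be put together in a small sketch. The class `IntentionLexicon`, its method names, and the in-memory storage are hypothetical and are meant only to make the flow concrete. In particular, creating a missing word category on the fly is an assumption of this sketch; the source has the user choose among categories already in the intention word bank.

```python
class IntentionLexicon:
    """In-memory intention word bank storing words by word category (illustrative)."""

    def __init__(self, words_by_category):
        self._words = {c: set(ws) for c, ws in words_by_category.items()}

    def categories(self):
        return sorted(self._words)

    def contains(self, word, category):
        return word in self._words.get(category, set())

    def expand(self, candidates):
        """candidates: list of (word, category) pairs acquired in step 201."""
        # Step 202: keep only candidates not yet stored under their category.
        # dict.fromkeys drops repeated pairs while preserving order.
        targets = [(w, c) for w, c in dict.fromkeys(candidates)
                   if not self.contains(w, c)]
        # Step 203: store each target word under its word category.
        for word, category in targets:
            # Assumption: an unknown category is created rather than rejected.
            self._words.setdefault(category, set()).add(word)
        return targets


bank = IntentionLexicon({"music": ["play", "pause"]})
added = bank.expand([("play", "music"), ("shuffle", "music"), ("shuffle", "music")])
print(added)
```

Returning the list of words actually added lets a caller decide which prompt to show, for example that every submitted word already existed when the list is empty.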
In one possible implementation, the obtaining at least one word to be added and a word category of the at least one word includes:
receiving an access instruction of a first user to a first interface, and acquiring word categories in the intention word library;
displaying word categories in the intention word library;
and acquiring the at least one word and the word category of the at least one word selected from the word categories in the intention word library.
In one possible implementation, the obtaining the at least one word and the word category of the at least one word selected from the word categories in the intention word bank includes:
and acquiring at least one target file through a second interface, wherein the at least one target file is generated according to the at least one word and the word category of the at least one word, and the same target file comprises words in the same word category.
In one possible implementation, the comparing the at least one word with the intention word bank and determining a target word of the at least one word that is not included in the intention word bank comprises:
for the words of each word category in the at least one word, comparing the words of the word category with the words of the same word category in the intention word library, removing the existing words in the intention word library from the words of the word category, and determining the remaining words as the target words.
In one possible implementation, the comparing the word in the word category with the word in the same word category in the intention word library includes:
carrying out duplication elimination processing on the words in the word category;
and comparing the words after the duplication removal processing with the words of the same word category in the intention word library.
In one possible implementation, before the obtaining of the at least one word to be added and the word category of the at least one word, the method further includes:
when first voice of a second user is acquired, performing voice recognition on the first voice to acquire first text information of the first voice;
performing word segmentation processing on the first text information to obtain a plurality of words;
when a first word in the plurality of words is not contained in the intention word bank, outputting user prompt information, wherein the user prompt information is used for prompting that the first word is not contained in the intention word bank;
and receiving a confirmation adding instruction of the second user, wherein the confirmation adding instruction is used for indicating confirmation to perform word adding.
In one possible implementation, before the receiving the confirmation add instruction of the second user, the method further includes:
receiving a word adding instruction of the second user, wherein the word adding instruction is used for indicating word addition;
and outputting confirmation prompt information which is used for prompting whether the word addition is confirmed or not.
In one possible implementation, the obtaining at least one word to be added and a word category of the at least one word includes:
and acquiring the first word as the at least one word to be added, and acquiring the word category of the first word as the word category of the at least one word.
In one possible implementation, the obtaining at least one word to be added and a word category of the at least one word includes:
acquiring a second word input by the second user and a word category of the second word, wherein the second word is the same as or different from the first word;
and acquiring the second word as the at least one word to be added, and acquiring the word category of the second word as the word category of the at least one word.
In one possible implementation, after the comparing the at least one word with the intention word bank, the method further comprises:
when it is determined that the target word does not exist in the at least one word, outputting first prompt information, wherein the first prompt information is used for indicating that the at least one word already exists in the intention word bank.
In one possible implementation, the target word is one or more in number,
the adding the target word to the intention word library and storing the target word corresponding to the word category of the target word comprise:
for each target word, comparing the target word with a sensitive word library;
and when the target word is not contained in the sensitive word stock, adding the target word into the intention word stock and storing the target word corresponding to the word category of the target word.
In one possible implementation, the method further comprises:
when each target word is contained in the sensitive word stock, outputting second prompt information, wherein the second prompt information is used for prompting that the sensitive word is not allowed to be added.
In one possible implementation, the method further comprises:
and when the target word is not contained in the sensitive word stock and the semantic similarity between the target word and any word in the sensitive word stock is greater than a similarity threshold, adding the target word to the sensitive word stock.
In one possible implementation, the method further comprises:
acquiring a network new word;
sending the network new word to a manual review system;
receiving an audit result returned by the manual audit system, wherein the audit result is used for indicating whether the network new word is a sensitive word or not;
and when the auditing result indicates that the network new word is a sensitive word, adding the network new word into the sensitive word bank.
In one possible implementation, after the target word is added to the intention word bank, the method further comprises:
outputting third prompt information for prompting that the at least one word has been successfully added to the intention word bank.
All the above optional technical solutions may be combined arbitrarily to form optional embodiments of the present application, and are not described herein again.
Fig. 3 is a flowchart of a lexicon expansion method according to an embodiment of the present application. Taking execution of the method by an electronic device as an example, referring to fig. 3, the method includes:
301. Acquire at least one word to be added and a word category of the at least one word.
The at least one word to be added refers to a word that the user wants to add to the intention word bank. The user may be a first user, that is, a developer user accessing the system, such as an enterprise user or a community user, or a second user, that is, an ordinary user, such as an individual user. Word categories may also be referred to as domain classifications.
For the first user, in one possible implementation, the obtaining at least one word to be added and a word category of the at least one word includes: receiving an access instruction of a first user to a first interface, and acquiring word categories in the intention word library; displaying word categories in the intention word library; and acquiring the at least one word and the word category of the at least one word selected from the word categories in the intention word library.
The first user may first access a first interface of the system, namely an API (Application Programming Interface), which may be an interface for querying data. The system may return all the word categories in the intention word library, and the electronic device may display them. The first user can input at least one word to be added on the electronic device and select the word category of the at least one word from the displayed word categories, so that the electronic device acquires the at least one word and its word category. The user may select a single word category, for example when all of the words belong to the same word category, or multiple word categories, for example when the words belong to different word categories. Because all the categories of the intention word library are displayed, the user can select the word category of the word to be added from among them.
In one possible implementation, the obtaining the at least one word and the word category of the at least one word selected from the word categories in the intention word bank includes: and acquiring at least one target file through a second interface, wherein the at least one target file is generated according to the at least one word and the word category of the at least one word, and the same target file comprises words in the same word category.
The first user can write the words to be added into target files according to their word categories and then send the target files to a second interface of the system through a POST request. The second interface may be an interface for importing data, through which the electronic device acquires the at least one target file. A target file may be a file in JSON (JavaScript Object Notation) format. Writing the words to be added into target files by word category allows words to be imported in batches.
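The batch import described above can be sketched as follows. This is only an illustration: the source says that each target file is in JSON format and holds words of a single word category, but it does not specify the layout, so the `category` and `words` keys and the function name `build_target_files` are assumptions.

```python
import json
from collections import defaultdict


def build_target_files(entries):
    """Group (word, category) pairs into one JSON document per category.

    The keys "category" and "words" are a hypothetical layout; the source
    only states that each target file holds words of one word category.
    """
    by_category = defaultdict(list)
    for word, category in entries:
        by_category[category].append(word)
    # One JSON string per category, usable as the body of a POST request.
    return {
        category: json.dumps({"category": category, "words": words},
                             ensure_ascii=False)
        for category, words in by_category.items()
    }


files = build_target_files([
    ("word 1", "A"), ("word 2", "A"), ("word 3", "B"),
])
```

Each value in `files` could then be sent to the import interface in its own request, which keeps every target file limited to one word category.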
For the second user, in one possible implementation, before the obtaining of the at least one word to be added and the word category of the at least one word, the method further includes: when first voice of a second user is acquired, performing voice recognition on the first voice to acquire first text information of the first voice; performing word segmentation processing on the first text information to obtain a plurality of words; when a first word in the plurality of words is not contained in the intention word bank, outputting user prompt information, wherein the user prompt information is used for prompting that the first word is not contained in the intention word bank; and receiving a confirmation adding instruction of the second user, wherein the confirmation adding instruction is used for indicating confirmation to perform word adding.
The second user can input the first voice on the electronic device. The electronic device may first perform voice recognition on the first voice to obtain the first text information, and then apply a word segmentation algorithm to the first text information to obtain a plurality of words. The electronic device can then compare the plurality of words with the intention word bank and, when it confirms that a first word in the plurality of words is not contained in the intention word bank, prompt the second user to that effect by outputting prompt information. The manner in which the electronic device outputs the prompt information includes, but is not limited to, voice output and text display. After receiving the prompt, the second user may trigger a confirmation adding instruction, which may be a confirmation statement, by performing a confirmation operation on the electronic device, for example by voice. On receiving the confirmation adding instruction, the electronic device knows that the second user has confirmed word addition, and it can then start a word adding scene to carry out the subsequent word adding process. Prompting the user when the intention word bank does not contain a word spoken by the user allows the user to add the word in time.
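A minimal sketch of the check performed after word segmentation is shown below. Voice recognition is assumed to have already produced the text, and whitespace splitting merely stands in for a real word segmentation algorithm; the function name, the sample bank, and the prompt wording are all hypothetical.

```python
def find_unknown_words(text, intention_bank, segment=str.split):
    """Return words of `text` that appear in no category of the intention bank.

    `segment` stands in for a real word segmentation algorithm; whitespace
    splitting is only a placeholder for this sketch.
    """
    known = set().union(*intention_bank.values())
    unknown = []
    for word in segment(text):
        if word not in known and word not in unknown:
            unknown.append(word)
    return unknown


bank = {"music": {"play", "song"}, "weather": {"forecast"}}
recognized_text = "play jazz song"  # assumed output of voice recognition
missing = find_unknown_words(recognized_text, bank)
prompt = f'"{missing[0]}" is not in the intention word bank.' if missing else ""
```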
In one possible implementation, before receiving the confirmation add instruction of the second user, the method further includes: receiving a word adding instruction of the second user, wherein the word adding instruction is used for indicating word addition; and outputting confirmation prompt information which is used for prompting whether the word addition is confirmed or not.
When the second user finds that a spoken word is not contained in the intention word library, that is, the word cannot be matched, the second user can initiate a word adding instruction. On receiving the word adding instruction, the electronic device knows that the user wants to add a word and can output confirmation prompt information so that the user confirms whether to add the word.
The word adding instruction may be a specific command statement, and the confirmation prompt information may be a confirmation statement. Confirming word addition through interaction with the user helps ensure accuracy.
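The exchange of word adding instruction, confirmation prompt, and confirmation adding instruction can be pictured as a small state machine. The sketch below is an assumption throughout: the source does not fix the command statement, the confirmation statement, or the prompt wording, so `ADD_COMMAND`, `CONFIRM_STATEMENT`, and the returned text are placeholders.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    AWAITING_CONFIRMATION = auto()
    ADDING_WORDS = auto()


# Hypothetical command and confirmation statements; the source does not fix them.
ADD_COMMAND = "add word"
CONFIRM_STATEMENT = "yes"


def step(state, utterance):
    """Advance the dialog by one user utterance and return (state, prompt)."""
    if state is State.IDLE and utterance == ADD_COMMAND:
        # Word adding instruction received: ask the user to confirm.
        return State.AWAITING_CONFIRMATION, "Do you want to add a word?"
    if state is State.AWAITING_CONFIRMATION:
        if utterance == CONFIRM_STATEMENT:
            # Confirmation adding instruction received: start the word adding scene.
            return State.ADDING_WORDS, ""
        return State.IDLE, ""
    return state, ""
```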
For the second user, in one possible implementation, the obtaining at least one word to be added and a word category of the at least one word includes: and acquiring the first word as the at least one word to be added, and acquiring the word category of the first word as the word category of the at least one word.
Because, for the second user, the word adding process is triggered when the first word in the first voice is not contained in the intention word bank, the first word is likely to be the word that the second user needs to add. The electronic device can therefore directly take the first word as the word to be added and take the word category of the first word as the word category of the word to be added, so that it can add the word to the intention word bank according to the word category.
In one possible implementation, the obtaining at least one word to be added and a word category of the at least one word includes: acquiring a second word input by the second user and a word category of the second word, wherein the second word is the same as or different from the first word; and acquiring the second word as the at least one word to be added, and acquiring the word category of the second word as the word category of the at least one word.
The second user can also input, on the electronic device, a second word that needs to be added. The input may be by voice or by text. The electronic device can take the second word currently input by the second user as the word to be added and obtain the word category of the second word. Having the second user input the word to be added ensures that the word added by the system is the one the second user actually wants to add, which improves the accuracy of word addition.
302. Compare the at least one word with an intention word bank and determine a target word, among the at least one word, that is not contained in the intention word bank, where the intention word bank is used for storing, by word category, words used for expressing intention.
After obtaining at least one word to be added by the user, the electronic device may compare the at least one word with existing words in an intention word bank, and if one or more words in the at least one word are not included in the intention word bank, the one or more words may be used as target words, that is, the number of the target words is one or more.
In one possible implementation, the comparing the at least one word with the intention word bank and determining a target word of the at least one word that is not included in the intention word bank comprises: for the words of each word category in the at least one word, comparing the words of the word category with the words of the same word category in the intention word library, removing the existing words in the intention word library from the words of the word category, and determining the remaining words as the target words.
For the at least one word to be added, the electronic device may compare the words with the existing words in the intention word bank by word category and remove from the at least one word those that already exist in the intention word bank. For a first user, the at least one word to be added exists in the form of at least one target file, from which the electronic device can extract the at least one word. Because different target files correspond to different word categories, the electronic device can extract the words of one word category from one target file and then compare them with the existing words of that category in the intention word library.
Take as an example the case in which the words to be added include two words of category A (word 1, word 2) and three words of category B (word 3, word 4, word 5). The electronic device may compare the two words of category A with the words of category A in the intention word bank; if the words of category A in the intention word bank include word 1, the electronic device removes word 1 from the two words and takes the remaining word 2 as a target word. Similarly, the electronic device may compare the three words of category B with the words of category B in the intention word bank; if the words of category B in the intention word bank include word 4, the electronic device removes word 4 from the three words and takes the remaining word 3 and word 5 as target words. The target words determined by the electronic device are thus word 2, word 3, and word 5. Comparing by word category, rather than against every existing word in the intention word bank, improves comparison efficiency.
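The category A and category B example above can be reproduced with a short sketch. The dictionary-of-sets representation of the intention word bank and the function name are assumptions made for illustration; the source does not describe how the bank is stored.

```python
def determine_target_words(words_by_category, intention_bank):
    """Return, per category, the submitted words the bank does not yet hold."""
    targets = {}
    for category, words in words_by_category.items():
        # Only the words of the same category are consulted.
        existing = intention_bank.get(category, set())
        remaining = [w for w in words if w not in existing]
        if remaining:
            targets[category] = remaining
    return targets


submitted = {"A": ["word 1", "word 2"], "B": ["word 3", "word 4", "word 5"]}
bank = {"A": {"word 1"}, "B": {"word 4"}}
targets = determine_target_words(submitted, bank)
# targets == {"A": ["word 2"], "B": ["word 3", "word 5"]}
```

Looking up only the set for the matching category is what keeps the comparison limited to words of the same word category.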
In one possible implementation, the comparing the word in the word category with the word in the same word category in the intention word library includes: carrying out duplication elimination processing on the words in the word category; and comparing the words after the duplication removal processing with the words of the same word category in the intention word library.
Among the words of the same word category in the at least one word to be added, identical words may exist. Before comparing the words of a word category with the intention word library, the electronic device may first deduplicate the words of that category and then compare the words remaining after deduplication with the intention word library, which further improves comparison efficiency. Continuing the above example, word 1 and word 2 in category A may be the same word; in that case, before word 1 and word 2 are compared with the intention word library, they are deduplicated so that only one of them remains, and that remaining word is compared with the words of category A in the intention word library.
In this step, it is also possible that no target word exists in the at least one word. In one possible embodiment, after the electronic device compares the at least one word with the intention word library, the method further includes: when it is determined that the target word does not exist in the at least one word, outputting first prompt information, wherein the first prompt information is used for indicating that the at least one word already exists in the intention word bank.
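Deduplication within one word category can be sketched in a single line. Keeping the first occurrence and preserving input order is a choice made here for readability, not something the source requires.

```python
def deduplicate(words):
    """Drop repeated words while keeping the first occurrence of each."""
    return list(dict.fromkeys(words))


category_a = ["word 1", "word 1", "word 2"]
unique_a = deduplicate(category_a)
```

The deduplicated list is then what gets compared with the words of the same category in the intention word library.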
After the electronic device compares the at least one word with the intention word library, if the at least one word is found to be included in the intention word library, it may be determined that the target word does not exist in the at least one word, and thus the user may be prompted by outputting a first prompt message, which may be a voice prompt message or a text prompt message. By outputting the first prompt information, the user can know that the words which the user wants to add exist in the intention word bank.
It should be noted that the number of target words determined in this step 302 is one or more.
303. For each target word, compare the target word with the sensitive word library.
The sensitive word library is used for storing sensitive words, such as words related to violence, pornography, politics, and the like. A dynamic updating mechanism can be established for the sensitive word library so that it is updated periodically, where an update operation includes an adding operation or a deleting operation. As the network develops, new words, also called "new network words" or "network terms", gradually appear on the network; these are informal expressions popular online, such as "mars". Such new network words may be sensitive words and may be used to expand the sensitive word library. In one possible implementation, the method for expanding the sensitive word library with new network words may include: acquiring a new network word; sending the new network word to a manual review system; receiving a review result returned by the manual review system, where the review result indicates whether the new network word is a sensitive word; and when the review result indicates that the new network word is a sensitive word, adding the new network word to the sensitive word library. Because the new network word is reviewed manually and is added to the sensitive word library only according to the manual review result, accuracy can be ensured.
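The review loop for new network words might look like the sketch below. The `review` callable is a stand-in for the manual review system, whose interface the source does not describe, and skipping words that are already in the library is an added assumption.

```python
def expand_sensitive_lexicon(new_words, sensitive_words, review):
    """Add to `sensitive_words` every new word the reviewer flags as sensitive.

    `review` is a stand-in for the manual review system: it takes a word and
    returns True when the reviewer judged it sensitive.
    """
    added = []
    for word in new_words:
        if word in sensitive_words:
            continue  # assumption: known sensitive words are not re-reviewed
        if review(word):
            sensitive_words.add(word)
            added.append(word)
    return added


sensitive = {"known bad word"}
flagged_by_reviewer = {"new word x"}  # hypothetical review outcome
added = expand_sensitive_lexicon(
    ["new word x", "new word y", "known bad word"],
    sensitive,
    review=lambda w: w in flagged_by_reviewer,
)
```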
For each target word, the electronic device may perform sensitive word detection on the target word, and specifically, the electronic device may determine whether the target word is a sensitive word by comparing the target word with a sensitive word library in which the sensitive word is stored.
For example, for each word category, after determining a target word in one word category, the electronic device may compare the target word in the word category with the sensitive word library, determine a target word in a next word category, and compare the target word in the next word category with the sensitive word library. Of course, the electronic device may also compare all the target words with the sensitive words in a unified manner after determining the target words in each word category, which is not limited in the embodiment of the present application.
304. When the target word is not contained in the sensitive word stock, add the target word to the intention word stock and store it in correspondence with the word category of the target word.
For each target word, if the target word is not contained in the sensitive word stock, that is, if the target word differs from every sensitive word in the sensitive word stock, the target word is a non-sensitive word. In this case, the electronic device may add the target word to the intention word stock and store it in correspondence with the word category of the target word, so that the words of that word category in the intention word stock include the target word. If the target word is contained in the sensitive word stock, the target word can be removed and only the remaining target words are added to the intention word stock.
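Steps 303 and 304 can be sketched together as a filter followed by a category-keyed insert. Exact string matching against the sensitive word stock, the dictionary-of-sets bank, and the choice of "word 5" as the sensitive word are illustrative assumptions.

```python
def add_non_sensitive(targets, intention_bank, sensitive_words):
    """Add non-sensitive target words to the bank, keyed by word category.

    Returns (added, rejected) so the caller can decide which prompt to show.
    """
    added, rejected = [], []
    for category, words in targets.items():
        for word in words:
            if word in sensitive_words:
                rejected.append(word)  # sensitive words are removed, not stored
            else:
                intention_bank.setdefault(category, set()).add(word)
                added.append(word)
    return added, rejected


targets = {"A": ["word 2"], "B": ["word 3", "word 5"]}
bank = {"A": {"word 1"}, "B": {"word 4"}}
sensitive = {"word 5"}  # hypothetical: pretend word 5 is sensitive
added, rejected = add_non_sensitive(targets, bank, sensitive)
```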
In one possible implementation, the method further comprises: and when the target word is not contained in the sensitive word stock and the semantic similarity between the target word and any word in the sensitive word stock is greater than a similarity threshold, adding the target word to the sensitive word stock.
The electronic equipment can analyze the word senses of the target words and the words in the sensitive word bank, and can add the target words to the sensitive word bank if the target words are different from the sensitive words in the sensitive word bank and are close to the semantics of the sensitive words. By adding the words close to the sensitive words in semantics into the sensitive word stock, the effect of expanding the sensitive word stock can be achieved, and the accuracy of detecting the sensitive words by using the sensitive word stock is improved.
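One way to realize the similarity test is to compare word vectors with cosine similarity, as sketched below. The source does not say how semantic similarity is computed, so cosine similarity, the `embed` function, the toy vectors, and the threshold of 0.8 are all assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def maybe_flag_as_sensitive(word, sensitive_words, embed, threshold=0.8):
    """Add `word` to the sensitive stock if it is close in meaning to an entry."""
    if word in sensitive_words:
        return False  # already present, nothing to add
    vec = embed(word)
    if any(cosine_similarity(vec, embed(s)) > threshold for s in sensitive_words):
        sensitive_words.add(word)
        return True
    return False


# Toy two-dimensional vectors standing in for a real word embedding model.
vectors = {"violence": (1.0, 0.0), "brutality": (0.9, 0.1), "weather": (0.0, 1.0)}
sensitive = {"violence"}
flagged = maybe_flag_as_sensitive("brutality", sensitive, vectors.__getitem__)
ignored = maybe_flag_as_sensitive("weather", sensitive, vectors.__getitem__)
```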
This step is described taking as an example the case in which some target words are not included in the sensitive word stock. It is also possible that all the target words are included in the sensitive word stock. Accordingly, in one possible implementation, after comparing each target word with the sensitive word stock, the method further includes: when each target word is contained in the sensitive word stock, outputting second prompt information, wherein the second prompt information is used for prompting that the sensitive word is not allowed to be added.
When each target word is contained in the sensitive word stock, it indicates that each target word is a sensitive word, and in this case, the electronic device may prompt the user by outputting second prompt information, where the second prompt information may be voice prompt information or text prompt information. By outputting the second prompt information, the user can know that the word which the user wants to add is a sensitive word and is not allowed to be added to the intention word bank.
It should be noted that, steps 303 and 304 are one possible implementation manner of adding the target word to the intention word library, and storing the target word in correspondence with the word category of the target word. In one possible embodiment, the electronic device may also directly add the target term determined in step 302 to the intent word library.
305. Output third prompt information for prompting that the at least one word has been successfully added to the intention word bank.
After the target word is successfully added to the intention word bank, the electronic device can prompt the user by outputting third prompt information, which can be voice prompt information or text prompt information. From the third prompt information, the user learns that the word to be added has been successfully added to the intention word bank, and the newly added word can then be used in matching the user's voice intention.
It should be noted that step 305 is an optional step. In one possible embodiment, the electronic device may not output the addition success information after the target word is added to the intention word bank.
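The three prompts described in steps 302 to 305 depend only on the outcome of the two comparisons, which the sketch below makes explicit. The prompt wording is hypothetical; the source only states the purpose of each prompt.

```python
def choose_prompt(targets, added):
    """Pick the prompt for one add request.

    targets: submitted words not yet in the intention word bank (step 302)
    added:   target words that passed the sensitive-word check (step 304)
    """
    if not targets:
        # First prompt: every submitted word already exists.
        return "first", "These words are already in the intention word bank."
    if not added:
        # Second prompt: every target word is a sensitive word.
        return "second", "Sensitive words are not allowed to be added."
    # Third prompt: at least one word was added (step 305, optional).
    return "third", "The words were added to the intention word bank."
```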
In the related art, if a word does not exist in the current intention word bank, then when the user speaks that word it cannot be matched with any word in the intention word bank, and intention analysis therefore fails. In this embodiment, such a word can be added to the intention word bank, so that when the user speaks the word again, the word can be recognized and the user's intention can be matched.
It should be noted that the embodiment of the present application takes execution of the above steps by the electronic device as an example. It is understood that the above steps may also be implemented through interaction between the electronic device and a server. For example, step 301 may be performed by the electronic device; steps 302 to 304 may be performed by the server (for example, after performing step 301, the electronic device may send the at least one word to be added and its word category to the server); and step 305 may be performed by the electronic device (for example, after performing step 304, the server may generate the third prompt information and send it to the electronic device for output). This is not limited in the embodiment of the present application.
According to the method provided by the embodiment of the application, after the words and the word categories to be added are obtained, the target words which are not contained in the intention word library are added into the intention word library according to the word categories. The technical scheme provides a function of adding words to the intention word bank, so that a user can automatically add words to the intention word bank according to own requirements, and the intention word bank is expanded.
Fig. 4 is a schematic structural diagram of a lexicon expansion device according to an embodiment of the present application. Referring to fig. 4, the apparatus includes:
an obtaining module 401, configured to obtain at least one word to be added and a word category of the at least one word;
a determining module 402, configured to compare the at least one word with an intention word library, and determine a target word in the at least one word that is not included in the intention word library, where the intention word library is configured to store words for expressing an intention according to a word category;
and an adding module 403, configured to add the target word to the intention word library, where the target word is stored in correspondence with a word category of the target word.
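The division into modules 401 to 403 can be mirrored by one method per module, as in the sketch below. The class name, the in-memory bank, and the representation of a request as (word, category) pairs are assumptions; the source describes the modules only functionally.

```python
from dataclasses import dataclass, field


@dataclass
class LexiconExpansionDevice:
    """Toy counterpart of modules 401 to 403 working on an in-memory bank."""
    intention_bank: dict = field(default_factory=dict)

    def obtain(self, request):
        # Obtaining module 401: collect (word, category) pairs.
        return list(request)

    def determine(self, entries):
        # Determining module 402: keep words absent from their category.
        return [(w, c) for w, c in entries
                if w not in self.intention_bank.get(c, set())]

    def add(self, targets):
        # Adding module 403: store each target word under its category.
        for word, category in targets:
            self.intention_bank.setdefault(category, set()).add(word)

    def expand(self, request):
        targets = self.determine(self.obtain(request))
        self.add(targets)
        return targets


device = LexiconExpansionDevice({"A": {"word 1"}})
result = device.expand([("word 1", "A"), ("word 2", "A")])
```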
In one possible implementation, the obtaining module is configured to:
receiving an access instruction of a first user to a first interface, and acquiring word categories in the intention word library;
displaying word categories in the intention word library;
and acquiring the at least one word and the word category of the at least one word selected from the word categories in the intention word library.
In a possible implementation manner, the obtaining module is configured to obtain at least one target file through the second interface, where the at least one target file is generated according to the at least one word and the word category of the at least one word, and a same target file includes words in a same word category.
In one possible implementation manner, the determining module is configured to, for the words of each word category in the at least one word, compare the words of the word category with the words of the same word category in the intention word library, remove the existing words in the intention word library from the words of the word category, and determine the remaining words as the target words.
In one possible implementation, the determining module is configured to:
carrying out duplication elimination processing on the words in the word category;
and comparing the words after the duplication removal processing with the words of the same word category in the intention word library.
In one possible implementation, the apparatus further includes:
the recognition module is used for carrying out voice recognition on the first voice to obtain first text information of the first voice when the first voice of the second user is obtained;
the word segmentation module is used for carrying out word segmentation processing on the first text information to obtain a plurality of words;
the output module is used for outputting user prompt information when a first word in the plurality of words is not contained in the intention word bank, wherein the user prompt information is used for prompting that the first word is not contained in the intention word bank;
and the receiving module is used for receiving a confirmation adding instruction of the second user, wherein the confirmation adding instruction is used for indicating confirmation to add words.
In a possible implementation manner, the receiving module is further configured to receive a word adding instruction of the second user, where the word adding instruction is used to instruct to perform word addition;
the output module is also used for outputting confirmation prompt information, and the confirmation prompt information is used for prompting whether to confirm the word addition.
In a possible implementation manner, the obtaining module is configured to obtain the first term as the at least one term to be added, and obtain a term category of the first term as a term category of the at least one term.
In one possible implementation, the obtaining module is configured to:
acquiring a second word input by the second user and a word category of the second word, wherein the second word is the same as or different from the first word;
and acquiring the second term as the at least one term to be added, and acquiring the term category of the second term as the term category of the at least one term.
In one possible implementation, the apparatus further includes:
and the output module is used for outputting first prompt information when the target word does not exist in the at least one word, wherein the first prompt information is used for indicating that the at least one word exists in the intention word bank.
In one possible implementation, the target word is one or more in number,
the adding module is used for:
for each target word, comparing the target word with a sensitive word library;
and when the target word is not contained in the sensitive word stock, adding the target word into the intention word stock and storing the target word corresponding to the word category of the target word.
In one possible implementation, the apparatus further includes:
and the output module is used for outputting second prompt information when each target word is contained in the sensitive word stock, and the second prompt information is used for prompting that the sensitive words are not allowed to be added.
In one possible implementation, the adding module 403 is further configured to add the target word to the sensitive word stock when the target word is not included in the sensitive word stock and the semantic similarity between the target word and any word in the sensitive word stock is greater than a similarity threshold.
In one possible implementation, the apparatus further includes:
the obtaining module 401 is further configured to obtain a network new word;
the sending module is used for sending the network new words to the manual review system;
the receiving module is used for receiving an auditing result returned by the manual auditing system, and the auditing result is used for indicating whether the network new words are sensitive words or not;
the adding module 403 is further configured to add the new network word to the sensitive word bank when the result of the review indicates that the new network word is a sensitive word.
In one possible implementation, the apparatus further includes:
and the output module is used for outputting third prompt information, and the third prompt information is used for prompting that the at least one word is successfully added to the intention word bank.
It should be noted that, when the word stock expansion apparatus provided in the above embodiment expands the word stock, the division into the above functional modules is used only for illustration. In practical applications, the functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the word bank expansion device embodiment and the word bank expansion method embodiment provided above belong to the same concept; the specific implementation process is detailed in the method embodiments and is not described herein again.
Fig. 5 is a schematic structural diagram of an electronic device 500 according to an embodiment of the present disclosure. The electronic device 500 may vary considerably depending on its configuration or performance, and may include one or more processors (central processing units, CPUs) 501 and one or more memories 502, where the memory 502 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 501 to implement the methods provided by the method embodiments. The electronic device may further have components such as a wired or wireless network interface, a keyboard, and an input/output interface for performing input and output, and may further include other components for implementing the functions of the device, which are not described herein again.
In an exemplary embodiment, there is also provided a computer-readable storage medium, such as a memory, storing a computer program which, when executed by a processor, implements the thesaurus expansion method in the above embodiments. For example, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (13)

1. A method for lexicon expansion, the method comprising:
acquiring at least one word to be added and a word category of the at least one word;
comparing the at least one word with an intention word bank, and determining a target word which is not contained in the intention word bank in the at least one word, wherein the intention word bank is used for storing words used for expressing intention according to word categories;
and adding the target word into the intention word library, and storing the target word corresponding to the word category of the target word.
2. The method of claim 1, wherein the obtaining at least one term to be added and a term category of the at least one term comprises:
receiving an access instruction of a first user to a first interface, and acquiring word categories in the intention word library;
displaying word categories in the intention word library;
and acquiring the at least one word and the word category of the at least one word selected from the word categories in the intention word library.
3. The method of claim 2, wherein the obtaining the at least one word and a word category of the at least one word selected from word categories in the intention word library comprises:
and acquiring at least one target file through a second interface, wherein the at least one target file is generated according to the at least one word and the word category of the at least one word, and the same target file comprises words in the same word category.
4. The method of claim 1, wherein prior to obtaining the at least one term to be added and the term category of the at least one term, the method further comprises:
when first voice of a second user is acquired, performing voice recognition on the first voice to acquire first text information of the first voice;
performing word segmentation processing on the first text information to obtain a plurality of words;
when a first word in the plurality of words is not included in the intention word bank, outputting user prompt information for prompting that the first word is not included in the intention word bank;
and receiving a confirmation adding instruction of the second user, wherein the confirmation adding instruction is used for indicating confirmation to add words.
5. The method of claim 4, wherein the obtaining at least one term to be added and a term category of the at least one term comprises:
acquiring the first word as the at least one word to be added, and acquiring the word category of the first word as the word category of the at least one word.
6. The method of claim 1, wherein after comparing the at least one word with the intention word bank, the method further comprises:
when the target word is determined not to exist in the at least one word, outputting first prompt information, wherein the first prompt information is used for indicating that the at least one word exists in the intention word bank.
7. The method of claim 1, wherein the number of target words is one or more, and
adding the target word to the intention word bank and storing the target word in correspondence with the word category of the target word comprises:
for each target word, comparing the target word with a sensitive word bank;
and when the target word is not contained in the sensitive word bank, adding the target word to the intention word bank and storing the target word in correspondence with the word category of the target word.
8. The method of claim 7, further comprising:
when each target word is contained in the sensitive word bank, outputting second prompt information, wherein the second prompt information indicates that sensitive words are not allowed to be added.
9. The method of claim 7, further comprising:
when the target word is not contained in the sensitive word bank and the semantic similarity between the target word and any word in the sensitive word bank is greater than a similarity threshold, adding the target word to the sensitive word bank.
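Claims 7 to 9 together describe a filter in front of the intention word bank, which might be sketched as below. Several things here are assumptions rather than the patent's method: the claims do not say how semantic similarity is computed, so the sketch accepts any scoring function and, purely as a runnable stand-in, uses the character-level ratio from Python's `difflib`, which is not a semantic measure. The threshold of 0.8, the in-memory data layout, and all names are hypothetical. The sketch follows the claim wording literally, so a word that passes the claim 7 check is stored in the intention word bank even if claim 9 then also adds it to the sensitive word bank.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Iterable, List, Set, Tuple


def string_similarity(a: str, b: str) -> float:
    """Stand-in for semantic similarity: character-level ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def add_target_words(
    targets: Iterable[Tuple[str, str]],  # (word, word category) pairs
    intention_bank: Dict[str, Set[str]],
    sensitive_bank: Set[str],
    similarity: Callable[[str, str], float] = string_similarity,
    threshold: float = 0.8,  # hypothetical value
) -> List[str]:
    """Add non-sensitive words to the intention bank; return the rejected words."""
    rejected: List[str] = []
    for word, category in targets:
        if word in sensitive_bank:
            rejected.append(word)  # claim 8: sensitive words are not added
            continue
        # Claim 7: store the word under its word category.
        intention_bank.setdefault(category, set()).add(word)
        # Claim 9: a near match to a sensitive word extends the sensitive bank.
        if any(similarity(word, s) > threshold for s in sensitive_bank):
            sensitive_bank.add(word)
    return rejected


# Hypothetical data.
intention_bank: Dict[str, Set[str]] = {}
sensitive_bank = {"badword"}
rejected = add_target_words(
    [("jazz", "genre"), ("badword", "genre"), ("badwordz", "genre")],
    intention_bank,
    sensitive_bank,
)
second_prompt = "Sensitive words are not allowed to be added." if rejected else None
```

In practice the `similarity` argument would be replaced by a real semantic measure, for example cosine similarity over word embeddings.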
10. The method of claim 7, further comprising:
acquiring a new network word;
sending the new network word to a manual review system;
receiving a review result returned by the manual review system, wherein the review result indicates whether the new network word is a sensitive word;
and when the review result indicates that the new network word is a sensitive word, adding the new network word to the sensitive word bank.
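The review loop of claim 10 can be sketched with the manual review system modeled as a callable that returns `True` when a reviewer marks a word as sensitive. The interface of that system is not described in the claims, so `submit_for_review`, `fake_reviewer`, and the sample words are hypothetical.

```python
from typing import Callable, Iterable, List, Set


def review_network_words(
    new_words: Iterable[str],
    sensitive_bank: Set[str],
    submit_for_review: Callable[[str], bool],
) -> List[str]:
    """Send each new network word for review; add the sensitive ones to the bank."""
    added: List[str] = []
    for word in new_words:
        is_sensitive = submit_for_review(word)  # review result for this word
        if is_sensitive:
            sensitive_bank.add(word)
            added.append(word)
    return added


def fake_reviewer(word: str) -> bool:
    """Stand-in for the manual review system, for demonstration only."""
    return word in {"slur1"}


sensitive_words = {"badword"}
newly_added = review_network_words(["meme", "slur1"], sensitive_words, fake_reviewer)
```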
11. A word bank expansion device, characterized in that the device comprises a plurality of functional modules for executing the word bank expansion method of any one of claims 1 to 10.
12. An electronic device, comprising a processor and a memory, wherein at least one program code is stored in the memory, and the at least one program code is loaded and executed by the processor to implement the word bank expansion method of any one of claims 1 to 10.
13. A computer-readable storage medium having at least one program code stored therein, the at least one program code being loaded and executed by a processor to implement the word bank expansion method of any one of claims 1 to 10.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911086956.7A CN110825840B (en) 2019-11-08 2019-11-08 Word bank expansion method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911086956.7A CN110825840B (en) 2019-11-08 2019-11-08 Word bank expansion method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110825840A (en) 2020-02-21
CN110825840B (en) 2023-02-17

Family

ID=69553534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911086956.7A Active CN110825840B (en) 2019-11-08 2019-11-08 Word bank expansion method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110825840B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101986310A (en) * 2010-11-16 2011-03-16 无敌科技(西安)有限公司 Method and device for updating cyberword dictionary
CN105138663A (en) * 2015-09-01 2015-12-09 百度在线网络技术(北京)有限公司 Word bank query method and device
CN105389349A (en) * 2015-10-27 2016-03-09 上海智臻智能网络科技股份有限公司 Dictionary updating method and apparatus
CN105426357A (en) * 2015-11-06 2016-03-23 武汉卡比特信息有限公司 Fast voice selection method
CN107515877A (en) * 2016-06-16 2017-12-26 百度在线网络技术(北京)有限公司 The generation method and device of sensitive theme word set
CN109933774A (en) * 2017-12-15 2019-06-25 腾讯科技(深圳)有限公司 Method for recognizing semantics, device storage medium and electronic device
US20190205326A1 (en) * 2018-01-04 2019-07-04 Fujitsu Limited Search result output method, search result output method, and non-transitory computer-readable storage medium for storing program
CN108536821A (en) * 2018-04-09 2018-09-14 北京信息科技大学 A kind of construction method of race News Field dictionary
CN109408818A (en) * 2018-10-12 2019-03-01 平安科技(深圳)有限公司 New word identification method, device, computer equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHENXI CUI: "Recognize user intents in online interactions from massive social media data", 《2017 IEEE 2ND INTERNATIONAL CONFERENCE ON BIG DATA ANALYSIS》 *
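Liu Zhe (刘哲): "Research on Sentiment Lexicon Construction and New Network Word Discovery Algorithms", 《China Master's Theses Full-text Database, Information Science and Technology》 *
Qi Fumin (齐富民) et al.: "Application of SVM-Based Intelligent Word Bank Updating Technology in Search Classification", 《Computer Engineering and Design》 *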

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111400439A (en) * 2020-02-26 2020-07-10 平安科技(深圳)有限公司 Network bad data monitoring method and device and storage medium
CN111581971A (en) * 2020-06-04 2020-08-25 腾讯科技(深圳)有限公司 Word stock updating method and device, terminal and storage medium
CN111581971B (en) * 2020-06-04 2024-01-23 腾讯科技(深圳)有限公司 Word stock updating method, device, terminal and storage medium
CN115456589A (en) * 2022-09-19 2022-12-09 国网河南省电力公司信息通信公司 Contract auditing method and device based on deep learning

Also Published As

Publication number Publication date
CN110825840B (en) 2023-02-17

Similar Documents

Publication Publication Date Title
CN106570180B (en) Voice search method and device based on artificial intelligence
CN110825840B (en) Word bank expansion method, device, equipment and storage medium
TW202020691A (en) Feature word determination method and device and server
JP7289330B2 (en) Novel category tag mining method and apparatus, electronic device, computer readable medium, and computer program product
US20160188569A1 (en) Generating a Table of Contents for Unformatted Text
CN111339751A (en) Text keyword processing method, device and equipment
CN104573099A (en) Topic searching method and device
TWI536183B (en) System and method for eliminating language ambiguity
WO2023024975A1 (en) Text processing method and apparatus, and electronic device
CN111488468A (en) Geographic information knowledge point extraction method and device, storage medium and computer equipment
CN111859013A (en) Data processing method, device, terminal and storage medium
CN113961768B (en) Sensitive word detection method and device, computer equipment and storage medium
US11822589B2 (en) Method and system for performing summarization of text
CN113128205B (en) Scenario information processing method and device, electronic equipment and storage medium
CN115858776B (en) Variant text classification recognition method, system, storage medium and electronic equipment
CN116226681B (en) Text similarity judging method and device, computer equipment and storage medium
CN112287077A (en) Statement extraction method and device for combining RPA and AI for document, storage medium and electronic equipment
CN112183114A (en) Model training and semantic integrity recognition method and device
WO2019148797A1 (en) Natural language processing method, device, computer apparatus, and storage medium
CN112183074A (en) Data enhancement method, device, equipment and medium
CN117591624B (en) Test case recommendation method based on semantic index relation
CN116992111B (en) Data processing method, device, electronic equipment and computer storage medium
CN115905456B (en) Data identification method, system, equipment and computer readable storage medium
CN114003685B (en) Word segmentation position index construction method and device, and document retrieval method and device
CN117874170A (en) Domain model retrieval method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant