WO2017107518A1 - Method and apparatus for analyzing voice content - Google Patents

Method and apparatus for analyzing voice content

Info

Publication number
WO2017107518A1
Authority
WO
WIPO (PCT)
Prior art keywords
phrase
word
corpus
probability
voice content
Prior art date
Application number
PCT/CN2016/096186
Other languages
English (en)
Chinese (zh)
Inventor
周蕾蕾
Original Assignee
乐视控股(北京)有限公司
乐视致新电子科技(天津)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 乐视控股(北京)有限公司 and 乐视致新电子科技(天津)有限公司
Publication of WO2017107518A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/237 - Lexical tools
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/237 - Lexical tools
    • G06F40/242 - Dictionaries
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/205 - Parsing
    • G06F40/211 - Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/205 - Parsing
    • G06F40/216 - Parsing using statistical methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/253 - Grammatical analysis; Style critique

Definitions

  • The present application relates to the field of information processing, and in particular to a method and apparatus for parsing voice content.
  • Natural language processing technology helps people communicate better with machines.
  • The voice recognition module in a computer recognizes the voice content sent by the user and parses it to obtain the corresponding semantics.
  • The computer then performs the related operations based on the parsed semantics.
  • The general way for a machine to parse voice content sent by a user is as follows. Step 1: establish a language model. Before the language model is established, some commonly used corpora usually have to be annotated manually; for example, for the user input "I want to see Andy Lau's concert", the corpus is annotated so that "I" is marked as a personal pronoun, "Andy Lau" is marked as a star name, and so on. The words in the corpus are then classified according to these annotations, for example personal pronouns form one class and star names form another; completing this classification of phrases completes the establishment of the language model. Step 2: segment the content input by the user according to the phrases in the established language model, usually with a CRF (Conditional Random Field) model.
  • CRF: Conditional Random Field.
  • For example, a corpus may be segmented into "what / time / have / Andy Lau / 's / concert" or into "what / time / have / Andy Lau / 's / singing / meeting", because the language model contains both the phrase "singing" and the phrase "concert". In this case the probabilities with which the two phrases appear in the corpus must be compared; if "singing" appears in the corpus with a higher probability than "concert", the corpus is preferentially segmented into "what / time / have / Andy Lau / 's / singing / meeting". Step 3: match the segmented phrases against the grammar files in the machine to resolve the semantics of the user's voice content; BNF (Backus-Naur Form) is a grammar frequently used for this purpose.
  • BNF: Backus-Naur Form.
  • For example, as above, if "singing" has a greater probability of appearing in the corpus than "concert", the corpus is preferentially segmented into "what / time / have / Andy Lau / 's / singing / meeting", which obviously does not match the semantics of the voice content sent by the user.
  • The embodiments of the present application provide a method and apparatus for parsing voice content, which are used to solve the problem that the machine mis-parses voice content entered by the user because, when the language model is established, the number of corpora in a specific domain is small.
  • An embodiment of the present application provides a method for parsing voice content, the method comprising: combining phrases in a specific domain with phrases in a non-specific domain to generate a first word-cut dictionary, and segmenting the corpus stored in the machine according to the first word-cut dictionary to obtain the phrases in the corpus; counting the probability or frequency with which each phrase in the corpus occurs among the phrases in the corpus, and adjusting the probability or frequency according to a predetermined rule so that the probability or frequency with which phrases in the specific domain occur among the phrases in the corpus increases; combining the phrases in the corpus with the adjusted probabilities or frequencies to generate a second word-cut dictionary, segmenting the voice content sent by the user according to the second word-cut dictionary to obtain the phrases in the voice content, and parsing the phrases in the voice content according to a grammar file to obtain the corresponding semantics.
  • Combining the phrases in the specific domain and the phrases in the non-specific domain to generate the first word-cut dictionary comprises:
  • selecting a preset number of phrases from the phrases of the specific domain in the corpus, and combining the selected phrases with the phrases in the non-specific domain to generate the first word-cut dictionary.
  • Segmenting the voice content sent by the user according to the second word-cut dictionary comprises:
  • segmenting the voice content sent by the user with both the backward maximum matching method and the forward minimum matching method; if the phrases obtained by the two segmentation methods differ, looking up in the second word-cut dictionary the probability or frequency corresponding to the differing phrases, and selecting the phrase with the larger probability or frequency as the final segmentation result.
  • The second word-cut dictionary includes an address area and a phrase area:
  • the address area guides the machine in finding, in the second word-cut dictionary, the position of a phrase in the segmented voice content sent by the user;
  • the phrase area stores the phrases corresponding to the address area.
  • Parsing the phrases in the voice content according to the grammar file specifically includes:
  • the keyword matching specifically includes:
  • the phrases of the specific domain comprise at least one of the following:
  • An apparatus for parsing voice content comprises: a combining unit, a statistics unit, a word-cutting unit, and a parsing unit; wherein
  • the combining unit is configured to combine phrases in a specific domain with phrases in a non-specific domain to generate a first word-cut dictionary, and to segment the corpus stored in the machine according to the first word-cut dictionary to obtain the phrases in the corpus;
  • the statistics unit is configured to count the probability or frequency with which each phrase in the corpus occurs among the phrases in the corpus, and to adjust the probability or frequency according to a predetermined rule so that the probability or frequency with which phrases in the specific domain occur among the phrases in the corpus increases;
  • the word-cutting unit is configured to combine the phrases in the corpus with the adjusted probabilities or frequencies to generate a second word-cut dictionary, and to segment the voice content sent by the user according to the second word-cut dictionary to obtain the phrases in the voice content;
  • the parsing unit is configured to parse the phrases in the voice content according to a grammar file to obtain the corresponding semantics.
  • The combining unit includes a word-cutting subunit, a statistics subunit, and a combining subunit; wherein
  • the word-cutting subunit is configured to segment the corpus stored in the machine according to the phrases of the specific domain, to obtain the phrases of the specific domain in the corpus;
  • the statistics subunit is configured to count the probability or frequency with which each phrase of the specific domain in the corpus occurs among the phrases of the specific domain in the corpus;
  • the combining subunit is configured to select a preset number of phrases from the phrases of the specific domain in the corpus according to the ranking of the probabilities or frequencies, and to combine the selected phrases with the phrases in the non-specific domain to generate the first word-cut dictionary.
  • The word-cutting unit comprises:
  • a combining subunit configured to combine the phrases in the corpus with the adjusted probabilities or frequencies to generate the second word-cut dictionary;
  • a word-cutting subunit configured to segment the voice content sent by the user according to the second word-cut dictionary, using both backward maximum matching and forward minimum matching;
  • a lookup subunit configured to, when the phrases obtained by the two segmentation methods differ, look up in the second word-cut dictionary the probability or frequency corresponding to the differing phrases, and to select the phrase with the larger probability or frequency as the final segmentation result.
  • An embodiment of the present application provides an electronic device including the apparatus for parsing voice content according to any of the foregoing embodiments.
  • An embodiment of the present application provides a non-transitory computer-readable storage medium that stores computer instructions which, when executed, implement some or all of the steps of the various implementations of the method for parsing voice content provided by the embodiments of the present application.
  • An embodiment of the present application provides an electronic device, including: one or more processors and a memory, wherein the memory stores instructions executable by the one or more processors, the instructions being configured to perform the method for parsing voice content of any of the above embodiments of the present application.
  • An embodiment of the present application provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the method for parsing voice content according to any of the above embodiments of the present application.
  • In the embodiments of the present application, the probability or frequency with which each phrase in the specific domain occurs among all phrases in the corpus stored in the machine is increased, thereby improving the accuracy with which the machine parses the semantics of the user's voice content.
  • FIG. 1 is a schematic flowchart of a method for parsing voice content according to Embodiment 1 of the present application;
  • FIG. 2 is a schematic flowchart of a language model adaptation according to Embodiment 1 of the present application;
  • FIG. 3 is a schematic diagram of an address area portion in a second word dictionary provided by Embodiment 1 of the present application;
  • FIG. 4 is a schematic diagram of a portion of a phrase region in a second word dictionary provided by Embodiment 1 of the present application;
  • FIG. 5 is a schematic flowchart of a method for segmenting the user's voice content by combining backward maximum matching and forward minimum matching, according to Embodiment 1 of the present application;
  • FIG. 6 is a schematic diagram of a grammar written in the form of a syntax tree, according to Embodiment 1 of the present application.
  • FIG. 7 is a schematic flowchart of a method for matching voice content sent by a user according to a grammar file according to Embodiment 1 of the present application;
  • FIG. 8 is a schematic flowchart of a complete method for parsing voice content according to Embodiment 1 of the present application.
  • FIG. 9 is a schematic structural diagram of an apparatus for analyzing voice content according to Embodiment 2 of the present application.
  • FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • The embodiments of the present application provide a method and apparatus for parsing voice content, which are used to solve the problem that, when the language model is built, the corpora in a specific domain are few, causing the machine to mis-parse the voice content entered by the user.
  • FIG. 1 is a schematic flowchart of a method for parsing voice content according to an embodiment of the present application. The method is as follows:
  • Step 11: Combine the phrases in the specific domain with the phrases in the non-specific domain to generate a first word-cut dictionary, and segment the corpus stored in the machine according to the first word-cut dictionary to obtain the phrases in the corpus.
  • First, dictionaries of specific domains are selected and combined to generate a full dictionary; for example, phrases in fields such as computing, machinery, and entertainment are combined into specific-domain dictionaries, and the specific-domain dictionaries are combined into a full dictionary. CRF segmentation is then applied to the corpus stored in the machine according to the phrases in the full dictionary (step 21 of FIG. 2), obtaining the phrases of the specific domains in the corpus. Next, the probability or frequency with which each specific-domain phrase occurs among all specific-domain phrases in the corpus is counted, and, according to the ranking of these probabilities or frequencies, a preset number of phrases is selected to form a dynamic dictionary (step 22 of FIG. 2).
  • Phrases in the non-specific domain can include personal pronouns, such as you, me, and him; they can also include common verbs, such as playing, thinking, wanting, and taking.
  • The first word-cut dictionary therefore contains both phrases in the specific domain and phrases in the non-specific domain.
  • For example, suppose the corpus stored in the machine is "I want to see Andy Lau's concert", and CRF segmentation is applied to this corpus according to the first word-cut dictionary, whose phrases include: I, think, see, want to see, Andy Lau, 's, concert. According to these phrases the corpus can be segmented into "I / think / see / Andy Lau / 's / concert" or into "I / want to see / Andy Lau / 's / concert". The probability or frequency of the phrases "think" and "want to see" in the corpus is then compared; if the probability or frequency of the latter is greater, "I", "want to see", "Andy Lau", "'s", and "concert" are taken as the training corpus of the language model.
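  • The comparison between the two candidate segmentations above can be pictured with a minimal Python sketch; the dictionary phrases and probabilities are illustrative only, and scoring by joint probability stands in for the CRF model mentioned in the patent:

```python
# Minimal sketch: choose between alternative segmentations of a corpus
# sentence by comparing phrase probabilities from the first word-cut
# dictionary. Phrases and probabilities are illustrative only.

first_dictionary = {
    "I": 0.08, "think": 0.03, "see": 0.04, "want to see": 0.06,
    "Andy Lau": 0.05, "'s": 0.10, "concert": 0.05,
}

def score(segmentation):
    """Joint probability of a segmentation; unknown phrases get a small floor."""
    p = 1.0
    for phrase in segmentation:
        p *= first_dictionary.get(phrase, 1e-6)
    return p

candidates = [
    ["I", "think", "see", "Andy Lau", "'s", "concert"],
    ["I", "want to see", "Andy Lau", "'s", "concert"],
]

best = max(candidates, key=score)
print(" / ".join(best))   # the higher-probability segmentation is kept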
  • Step 12: Count the probability or frequency with which each phrase in the corpus occurs among the phrases in the corpus, and adjust the probability or frequency according to a predetermined rule so that the probability or frequency with which phrases in the specific domain occur among the phrases in the corpus increases.
  • In step 11 the training corpus of the language model is obtained, that is, the phrases in all the corpora in the machine are obtained.
  • The language model then needs to be trained (step 25 in FIG. 2), which can be done with the SRILM tool.
  • Training the language model may include, but is not limited to, counting the probability or frequency with which each phrase in all corpora in the machine occurs among all phrases.
  • The SRILM language-model training tool is only an exemplary description; other training methods may also be used, and no specific limitation is made here.
  • After training the language model, the training results need to be tested, for example by checking the probability of occurrence of each phrase.
  • When checking the probability of occurrence of each phrase, it may be found that although phrases in some specific domains often appear in the corpus, their probability of occurrence is low relative to some similar phrases in the non-specific domain. As a result, when the relevant corpus is segmented, the phrases in these specific domains may be drowned out by other similar phrases, causing segmentation errors, so that the machine cannot correctly parse the user's voice content.
  • For this reason the probabilities are redistributed: the probability of occurrence of each phrase in the non-specific domain is divided by the sum Psum1 of the probabilities of the non-specific-domain phrases to obtain P1, and the probability of occurrence of each specific-domain phrase is divided by the corresponding sum Psum2 to obtain P2; finally, P1 is multiplied by a weight coefficient k1 and P2 is multiplied by a weight coefficient k2 to obtain, respectively, the final probability with which each non-specific-domain phrase and each specific-domain phrase appears in all corpora. The user can set the values of k1 and k2 according to individual needs, but the sum of k1 and k2 is 1, and, so that the final probability of occurrence of each phrase in the dynamic dictionary is greater than the final probability of occurrence of each phrase in the non-specific domain, the weight coefficient k1 is chosen smaller than Psum1.
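  • A minimal Python sketch of this redistribution follows; the phrases, probabilities, and the value of k1 are illustrative assumptions, not values from the patent:

```python
# Minimal sketch of the probability redistribution: non-specific-domain
# phrase probabilities are normalised by their sum Psum1 and weighted by
# k1; specific-domain (dynamic-dictionary) phrase probabilities are
# normalised by Psum2 and weighted by k2, with k1 + k2 = 1 and k1 < Psum1.
# All numbers are illustrative.

corpus_probs = {                      # probability of each phrase over all corpora
    "I": 0.12, "want to see": 0.07, "'s": 0.15,                       # non-specific domain
    "Andy Lau": 0.02, "concert": 0.01, "Infernal Affairs": 0.008,     # specific domain
}
specific_domain = {"Andy Lau", "concert", "Infernal Affairs"}

psum1 = sum(p for w, p in corpus_probs.items() if w not in specific_domain)  # 0.34
psum2 = sum(p for w, p in corpus_probs.items() if w in specific_domain)      # 0.038

k1 = 0.3            # chosen smaller than Psum1, so non-specific phrases are scaled down
k2 = 1.0 - k1       # the remaining weight goes to the specific-domain phrases

final_probs = {
    phrase: (k2 * p / psum2) if phrase in specific_domain else (k1 * p / psum1)
    for phrase, p in corpus_probs.items()
}

for phrase, p in sorted(final_probs.items(), key=lambda item: -item[1]):
    print(f"{phrase}: {p:.3f}")       # specific-domain phrases now rank highest
```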
  • Step 13: Combine the phrases in the corpus with the adjusted probabilities or frequencies to generate a second word-cut dictionary, and segment the voice content sent by the user according to the second word-cut dictionary to obtain the phrases in the voice content.
  • At this point the adaptation of the language model is complete (step 26 of FIG. 2), and the machine can output the adapted language model (step 27 of FIG. 2).
  • The adapted language model contains both the phrases obtained after training and the redistributed probability or frequency corresponding to each phrase. The adapted language model then needs to be converted into the second word-cut dictionary.
  • The second word-cut dictionary can have many structures; its main purpose is to help the machine segment the voice content sent by the user faster and more accurately.
  • The structure of one second word-cut dictionary is described as an example; it includes two parts, an address area and a phrase area.
  • The address information in the address area helps the machine find the position of a phrase in the second word-cut dictionary according to the phrases of the user's segmented voice content; the phrases stored in the phrase area are the phrases corresponding to the address area.
  • The address area may include address information corresponding to the 10 Arabic numerals (0 to 9), the 26 uppercase or lowercase letters (A to Z or a to z), and the commonly used Chinese characters.
  • The numerals and letters are in full-width format, and each numeral or letter itself occupies two bytes.
  • The address information corresponding to each numeral, letter, or Chinese character occupies four bytes. Assuming there are 6768 commonly used Chinese characters in the second word-cut dictionary, the address information corresponding to the numerals, letters, and Chinese characters occupies (10 + 26 + 6768) * 4 = 27216 bytes in total. If the first address of the address area is uniDict, the first address of the phrase area is uniDict + 27216, as shown in FIG. 3.
  • At the first address of the phrase area, uniDict + 27216, the phrases whose first character is the numeral "0" are stored; the address information corresponding to the letter "A" is located at uniDict + 40 and points to the phrases whose first character is the letter "A".
  • FIG. 4 is a schematic diagram of the phrases in the phrase area: the first address corresponding to "0" is uniDict + 27216, and a phrase whose first character is "0" can be seen there, for example "05 mm". If the user wants to find a phrase whose first character is "0", the search proceeds downward from the first address uniDict + 27216 until the guard mark is encountered.
  • The guard mark here indicates that the last phrase in the second word-cut dictionary whose first character is "0" has been reached.
  • The first character of a phrase need not be stored in the phrase area; for example, "05 mm" shown in FIG. 4 is stored in the dictionary as "5 mm".
  • Wordlen indicates the length of the phrase
  • The second word-cut dictionary can include numerals, letters, and Chinese characters, which improves the accuracy with which the machine parses the semantics of the user's voice content.
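  • The address-area/phrase-area layout can be sketched in memory as follows; the entries are illustrative, and ordinary Python structures stand in for the byte offsets of FIG. 3 and FIG. 4:

```python
# Minimal sketch of the two-part word-cut dictionary: an address area maps
# the first character of a phrase to the position of its group in a flat
# phrase area, and the phrase area stores the remaining characters plus
# the phrase length ("wordlen"). Entries are illustrative only.

from collections import defaultdict

phrases = ["05 mm", "0 degrees", "Andy Lau", "西游记", "西游记2"]

groups = defaultdict(list)
for p in sorted(phrases):
    groups[p[0]].append((p[1:], len(p)))     # the first character is not repeated

phrase_area = []     # flat storage, one first-character group after another
address_area = {}    # first character -> (start index, group size), like the uniDict offsets
for first, entries in groups.items():
    address_area[first] = (len(phrase_area), len(entries))
    phrase_area.extend(entries)

def phrases_starting_with(first_char):
    """Follow the address area into the phrase area to list matching phrases."""
    start, count = address_area.get(first_char, (0, 0))
    return [first_char + rest for rest, _wordlen in phrase_area[start:start + count]]

print(phrases_starting_with("0"))    # ['0 degrees', '05 mm']
print(phrases_starting_with("西"))   # ['西游记', '西游记2']
```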
  • the voice content input by the user is "when to play Journey 2”
  • the word is cut In the dictionary, only "Journey to the West", without the number "2”, may cut the above voice content into "what / time / play / Journey to the West / ah", which may lead to machine parsing errors.
  • There are many ways to segment the voice content sent by the user according to the second word-cut dictionary.
  • The segmentation can be performed with backward maximum matching, or with forward minimum matching.
  • With forward matching, the phrase at the front of the voice content is searched first; for example, the word "juvenile" is searched for in the word-cut dictionary first, and the corresponding phrase is found; the search then continues with the text after "juvenile", that is, "Bao Qing" is searched for; since no corresponding phrase is found in the word-cut dictionary, one more character is added and "Bao Qingtian" is searched for. The same method is used until the segmentation of the voice content is complete.
  • After segmenting the user's voice content with the combination of backward maximum matching and forward minimum matching described above, if the two segmentation results differ, that is, the obtained phrases differ, the final segmentation is determined by comparing the probabilities or frequencies of the differing phrases in the word-cut dictionary. As shown in FIG. 5, the voice content "Juvenile Bao Qingtian broadcasts on the TV" is segmented with both backward maximum matching and forward minimum matching, and the two results are compared to obtain the final segmentation.
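  • A minimal Python sketch of this combined segmentation follows; the dictionary entries and probabilities are illustrative (they are not the contents of the second word-cut dictionary), and differences between the two results are resolved in favour of the segmentation whose phrases have the larger joint probability, as a simple proxy for the comparison described above:

```python
# Minimal sketch: segment the same sentence with backward maximum matching
# and with forward minimum matching, then keep the result whose phrases
# have the larger joint dictionary probability. Entries are illustrative.

DICT = {
    "少年": 0.02, "包青天": 0.03, "在": 0.05,
    "电视": 0.02, "电视台": 0.015, "台": 0.01, "播出": 0.02,
}
MAX_LEN = max(len(w) for w in DICT)

def forward_min_cut(text):
    """Take the shortest dictionary phrase at each position, growing only when needed."""
    out, i = [], 0
    while i < len(text):
        for j in range(i + 1, min(i + MAX_LEN, len(text)) + 1):
            if text[i:j] in DICT:
                out.append(text[i:j]); i = j; break
        else:
            out.append(text[i]); i += 1          # unknown character stands alone
    return out

def backward_max_cut(text):
    """Take the longest dictionary phrase ending at each position, scanning from the end."""
    out, j = [], len(text)
    while j > 0:
        for i in range(max(0, j - MAX_LEN), j):  # longest candidate first
            if text[i:j] in DICT:
                out.append(text[i:j]); j = i; break
        else:
            out.append(text[j - 1]); j -= 1
    return out[::-1]

def score(seg):
    """Joint probability of a segmentation, used to resolve differing results."""
    p = 1.0
    for w in seg:
        p *= DICT.get(w, 1e-6)
    return p

sentence = "少年包青天在电视台播出"     # "Young Bao Qingtian broadcasts on TV"
backward, forward = backward_max_cut(sentence), forward_min_cut(sentence)
final = backward if backward == forward or score(backward) >= score(forward) else forward
print(" / ".join(final))               # 少年 / 包青天 / 在 / 电视台 / 播出
```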
  • Step 14: Parse the phrases in the voice content according to the grammar file to obtain the corresponding semantics.
  • Take the BNF grammar as an example.
  • The basic rules of the BNF grammar include, but are not limited to, the following aspects:
  • content enclosed in square brackets [ ] is optional, meaning that it can be skipped;
  • &keyword(textFrag, key, defaultValue, showValue): this function is used to extract the keywords of the input text.
  • The function is illustrated with an example below.
  • For example, suppose the function defined in the machine is: &keyword(Beijing, …)
  • If the input contains none of the listed keywords, the value taken is the defaultValue of the function; the defaultValue here is "local".
  • The "tomorrow" entered by the user is then matched against the keywords in the "&keyword" function and matches "tomorrow" in the function; because showValue is not defined in that function, the time entered by the user is directly assigned as "tomorrow". Finally, the "rain" input by the user is matched against the keywords in "&keyword" (rain, snow, weather, …) and matches "rain" successfully; because that function defines showValue, and its showValue is "weather", the "rain" input by the user is replaced with "weather".
  • In this way, the machine matches the "when it rains tomorrow" input by the user into "local tomorrow weather" and performs the related operations.
  • The order in which the content input by the user is matched in the above example is only exemplary; the matching order is not specifically limited here.
  • For the words "tomorrow" and "rain" input by the user, "tomorrow" can be matched first, or "rain" can be matched first, or both words can be matched at the same time.
  • &duplicate(TextFrag, least, most): this function indicates that TextFrag is repeated m times, where the value range of m is least ≤ m ≤ most. For example, with the definition &duplicate(TextFrag, 1, 3), the output content is: TextFrag[TextFrag][TextFrag];
  • &comb(textFrag1, textFrag2, ..., textFragN): this function indicates that the grammar fragments textFrag1, textFrag2, ..., textFragN are permuted and combined. For example, with the definition &comb(TextFrag1, TextFrag2), the output content is: (TextFrag1 TextFrag2 …
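  • The behaviour of a &keyword-style rule can be sketched as a small Python function; the signature, keyword lists, and values below are illustrative assumptions and do not reproduce the exact BNF syntax of the patent:

```python
def keyword(text_frag, keys, default_value=None, show_value=None):
    """Toy analogue of &keyword(textFrag, key, defaultValue, showValue)."""
    for key in keys:
        if key in text_frag:
            # a listed keyword was found; showValue, if defined, replaces it
            return show_value if show_value is not None else key
    # nothing in the keyword list matched: fall back to the default value
    return default_value

# "when it rains tomorrow" -> "local tomorrow weather" (values illustrative)
place = keyword("when it rains tomorrow", ["Beijing", "Shanghai"], default_value="local")
time  = keyword("when it rains tomorrow", ["today", "tomorrow"])
topic = keyword("when it rains tomorrow", ["rain", "snow", "weather"], show_value="weather")
print(place, time, topic)     # local tomorrow weather
```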
  • The parsing of a grammar file is illustrated with an example: the name of the grammar file is "video on demand", and the grammar file has three keywords: type, movie, and year. Specifically, for the text content "play the 2002 film Infernal Affairs", the defined grammar file can be:
  • <category list> &keyword(movie
  • Each grammar is written in the form of a syntax tree, and the complete grammar file is finally written in the form of a "grammar forest".
  • The syntax tree written in this way is shown in FIG. 6: the first level of the syntax tree shows the file name, "video on demand"; the second level has four parts: the first part is "play", the second part is "year", the third part is "of", and the fourth part is "film list" and "category list", where the "film list" can be a movie or a TV show.
  • The machine can match the voice content sent by the user against the grammar file, and the matching manner includes two types: full matching and keyword matching.
  • The specific matching process is shown in FIG. 7: first, the voice content input by the user is fully matched against the grammar file (step 71 in FIG. 7), where the voice content is the segmented voice content; the matching result is then judged (step 72 in FIG. 7); if the full match succeeds, the matching result is output (step 73 in FIG. 7); if the full match fails, keyword matching is performed (step 74 in FIG. 7), which means searching for the corresponding keywords in the keyword list of the grammar file, and if the match succeeds, the matching result is output.
  • For example, suppose the voice content input by the user is "I want to play the 2002 Infernal Affairs movie".
  • The machine converts the voice content into the corresponding text content and segments the text content; the segmentation result is "I want / play / 2002 / / movie / Infernal Affairs".
  • "I want" is not covered by any corresponding word in the grammar file, so the full match fails; keyword matching is then performed, as follows:
  • in the keyword matching process, as long as the keywords in the input text match the keywords in the keyword list of the grammar file, the match succeeds. Compared with full matching, keyword matching is more flexible and places fewer constraints on the input text content, which improves the probability of a successful match.
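  • A minimal Python sketch of this two-stage matching follows, with an illustrative grammar phrase set and keyword list rather than the actual "video on demand" grammar file:

```python
# Minimal sketch: a full match requires every segmented phrase to be
# covered by the grammar; if it fails, keyword matching only requires the
# grammar keywords to appear among the phrases. Contents are illustrative.

GRAMMAR_PHRASES = {"play", "2002", "of", "movie", "Infernal Affairs"}
KEYWORDS = {"movie", "Infernal Affairs", "2002"}    # e.g. type, movie name, year

def full_match(phrases):
    return all(p in GRAMMAR_PHRASES for p in phrases)

def keyword_match(phrases):
    return [p for p in phrases if p in KEYWORDS]

segmented = ["I want", "play", "2002", "movie", "Infernal Affairs"]

if full_match(segmented):
    print("full match:", " / ".join(segmented))
else:
    hits = keyword_match(segmented)
    if hits:
        print("keyword match:", " / ".join(hits))   # "I want" is simply ignored
    else:
        print("no match")
```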
  • When writing a grammar file, the grammar should be as comprehensive as possible. Examples can be written against the grammar rules: first design the user scenario, then write example sentences, and finally check that the written grammar covers the example sentences.
  • The keywords should be clear, which makes it easy for the machine to perform keyword matching.
  • For example, suppose the grammar fragment in the grammar file is "[today][of][Guangzhou][weather]".
  • Because every element is optional, the grammar fragment can also cover text content such as "of / weather", which obviously does not conform to human language habits; such serious over-generation reduces the advantages of the grammar file structure.
  • To avoid this, the grammar file can be split into several sub-entries.
  • For the grammar fragment described above, it can be written as: the first-level sub-entry is "[today][of][weather]"; the second-level sub-entry is "[today][Guangzhou][of][weather]"; the third-level sub-entry is "[today][of][Guangzhou][weather]". In this way the over-generation in the grammar file can be reduced.
  • The phrases in the grammar file should be kept as consistent as possible with the phrases in the word-cut dictionary, which lets the machine parse the user's voice content more accurately. For example, if "I want to know" is segmented into "I want / know" according to the word-cut dictionary, the phrases in the grammar file should be consistent with this, for example "[I want]<know>" rather than "[I][want]<know>", and so on.
  • For example, suppose the voice content sent by the user is "I want to make a call", and the machine segments the voice content into "I want / make / telephone". Although the machine's segmentation is wrong here, the grammar file should still allow the content to be parsed according to "make a call"; this can reduce parsing errors caused by segmentation errors.
  • When the grammar file is written as a syntax tree, at least one mandatory item should be included under the root node; otherwise the grammar over-covers the input text and causes the machine to parse incorrectly.
  • For example, suppose the grammar fragment in the grammar file is "[today][of][Guangzhou][weather]", in which all phrases are optional. If the voice content input by the user is "Today's Shanghai weather", it can also match the phrases in this grammar fragment, which obviously leads to a machine parsing error.
  • The complete process of the method for parsing voice content is described with reference to FIG. 8. Step 1: the adaptation of the language model (step 81 in FIG. 8), which specifically means adjusting the probability or frequency with which each phrase in the corpus occurs among the phrases in the corpus, so that the probability or frequency with which phrases in the specific domain occur among the phrases in the machine's corpus increases. Step 2: segment the voice content sent by the user according to the word-cut dictionary (step 82 in FIG. 8). Step 3: perform a full match of the segmented voice content against the grammar file (step 83 in FIG. 8), at which point the machine judges whether the full match succeeds (step 84 in FIG. 8); if the match succeeds, the matching result is output (step 85 in FIG. 8); here the grammar file can be in the form of a syntax tree. Step 4: if the full match fails, keyword matching is performed (step 86 in FIG. 8), and the matching result is output after the keyword matching succeeds.
  • The process of completing the matching is the process by which the machine parses the user's voice content.
  • The word-cut dictionary in the embodiments of the present application includes an address area and a phrase area, and the phrase area is partitioned by the first character of each phrase, so that the machine can quickly find the position of the corresponding phrase in the word-cut dictionary.
  • The phrases in the phrase area contain numerals, letters, and Chinese characters, which increases the accuracy with which the machine parses the semantics of the user's voice content.
  • The embodiments of the present application extend the existing BNF grammar rules and provide writing techniques for grammar rules, which improves the readability of the grammar file and the accuracy with which the machine parses the semantics of the user's voice content.
  • the non-transitory computer readable storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).
  • A method for parsing voice content is provided in Embodiment 1 above.
  • Correspondingly, an embodiment of the present application provides an apparatus for parsing voice content, which is used to improve the accuracy with which the machine parses the semantics of the user's voice content.
  • The apparatus for parsing voice content comprises: a combining unit 91, a statistics unit 92, a word-cutting unit 93, and a parsing unit 94; wherein
  • the combining unit 91 may be configured to combine phrases in a specific domain with phrases in a non-specific domain to generate a first word-cut dictionary, and to segment the corpus stored in the machine according to the first word-cut dictionary to obtain the phrases in the corpus;
  • the statistics unit 92 may be configured to count the probability or frequency with which each phrase in the corpus occurs among the phrases in the corpus, and to adjust the probability or frequency according to a predetermined rule so that the probability or frequency with which phrases in the specific domain occur among the phrases in the corpus increases;
  • the word-cutting unit 93 may be configured to combine the phrases in the corpus with the adjusted probabilities or frequencies to generate a second word-cut dictionary, and to segment the voice content sent by the user according to the second word-cut dictionary to obtain the phrases in the voice content;
  • the parsing unit 94 is configured to parse the phrases in the voice content according to a grammar file to obtain the corresponding semantics.
  • The working process of the above apparatus embodiment is as follows. Step 1: the combining unit 91 combines the phrases in the specific domain with the phrases in the non-specific domain to generate a first word-cut dictionary, and segments the corpus stored in the machine according to the first word-cut dictionary to obtain the phrases in the corpus.
  • Step 2: the statistics unit 92 counts the probability or frequency with which each phrase in the corpus occurs among the phrases in the corpus, and adjusts the probability or frequency according to a predetermined rule so that the probability or frequency with which phrases in the specific domain occur among the phrases in the corpus increases.
  • Step 3: the word-cutting unit 93 combines the phrases in the corpus with the adjusted probabilities or frequencies to generate a second word-cut dictionary, and segments the voice content sent by the user according to the second word-cut dictionary to obtain the phrases in the voice content.
  • Step 4: the parsing unit 94 parses the phrases in the voice content according to the grammar file to obtain the corresponding semantics.
  • The combining unit 91 includes: a word-cutting subunit, a statistics subunit, and a combining subunit; wherein
  • the word-cutting subunit can be used to segment the corpus stored in the machine according to the phrases of the specific domain, to obtain the phrases of the specific domain in the corpus; the phrases of the specific domain can be obtained by the manual annotation method of the prior art;
  • the statistics subunit can be used to count the probability or frequency with which each specific-domain phrase in the corpus occurs among the specific-domain phrases in the corpus;
  • the combining subunit may be configured to select a preset number of phrases from the specific-domain phrases in the corpus according to the ranking of the probabilities or frequencies, and to combine the selected phrases with the phrases in the non-specific domain to generate the first word-cut dictionary. Selecting the specific-domain phrases with the highest probability or frequency, that is, generating the first word-cut dictionary from the phrases that often appear in the corpus, can improve the efficiency of the machine's segmentation.
  • The word-cutting unit 93 includes:
  • a combining subunit, which combines the phrases in the corpus with the adjusted probabilities or frequencies to generate the second word-cut dictionary; here, adjusting the probability or frequency with which each phrase in the corpus occurs among the phrases in the corpus increases the probability and frequency with which the specific-domain phrases occur among the phrases in the machine's corpus, thereby increasing the accuracy with which the machine parses the semantics of the user's voice content;
  • a word-cutting subunit, which may be configured to segment the voice content sent by the user according to the second word-cut dictionary, using both backward maximum matching and forward minimum matching;
  • a lookup subunit, which may be configured to, when the phrases obtained by the two segmentation methods differ, look up in the second word-cut dictionary the probability or frequency corresponding to the differing phrases, and to select the phrase with the larger probability or frequency as the final segmentation result.
  • The above word-cutting subunit and lookup subunit segment the user's voice content with a segmentation method that combines backward maximum matching and forward minimum matching, making the segmentation result more accurate.
  • An embodiment of the present application also provides an electronic device including the apparatus for parsing voice content according to any of the foregoing embodiments.
  • A non-transitory computer-readable storage medium is also provided; the non-transitory computer-readable storage medium stores computer-executable instructions which can execute the method for parsing voice content in any of the above method embodiments.
  • FIG. 10 is a schematic diagram of the hardware structure of an electronic device for performing the method for parsing voice content according to an embodiment of the present application. As shown in FIG. 10, the device includes:
  • one or more processors 1010 and a memory 1020; one processor 1010 is taken as an example in FIG. 10.
  • The apparatus that performs the method for parsing voice content may further include an input device 1030 and an output device 1040.
  • The processor 1010, the memory 1020, the input device 1030, and the output device 1040 may be connected by a bus or by other means; connection by a bus is taken as an example in FIG. 10.
  • The memory 1020, as a non-transitory computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for parsing voice content in the embodiments of the present application (for example, the combining unit 91, the statistics unit 92, the word-cutting unit 93, and the parsing unit 94 shown in FIG. 9).
  • The processor 1010 executes various functional applications and data processing of the server by running the non-volatile software programs, instructions, and modules stored in the memory 1020, that is, it implements the method for parsing voice content of the above method embodiments.
  • The memory 1020 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application required by at least one function, and the data storage area may store data created according to the use of the apparatus that parses the voice content, and the like.
  • memory 1020 can include high speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device.
  • memory 1020 can optionally include memory remotely disposed relative to processor 1010, which can be connected to a device that parses the voice content over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • Input device 1030 can receive input numeric or character information, as well as generate key signal inputs related to user settings and function control of the device that parses the voice content.
  • the output device 1040 can include a display device such as a display screen.
  • the one or more modules are stored in the memory 1020, and when executed by the one or more processors 1010, perform the method of parsing voice content in any of the above method embodiments.
  • The electronic device of the embodiments of the present application exists in various forms, including but not limited to:
  • Mobile communication devices: these devices are characterized by mobile communication functions and are mainly aimed at providing voice and data communication. Such terminals include smart phones (such as the iPhone), multimedia phones, feature phones, and low-end phones.
  • Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing functions, and generally also have mobile Internet access. Such terminals include PDAs, MIDs, and UMPC devices, such as the iPad.
  • Portable entertainment devices: these devices can display and play multimedia content. Such devices include audio and video players (such as the iPod), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.
  • Servers: a server consists of a processor, a hard disk, a memory, a system bus, and so on. A server is similar in architecture to a general-purpose computer, but because it needs to provide highly reliable services, it has higher requirements in terms of processing power, stability, reliability, security, scalability, and manageability.
  • The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment, and those of ordinary skill in the art can understand and implement it without creative effort.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to a method and apparatus for analyzing voice content. The method comprises: generating a first word-segmentation dictionary by combining a group of words in a specified field with a group of words in a non-specified field, and performing word segmentation on a corpus stored in a machine according to the first word-segmentation dictionary to obtain the groups of words in the corpus (11); computing statistics on the probability or frequency of occurrence of each group of words among the groups of words in the corpus, and adjusting the probability or frequency according to a predefined rule so that the probability or frequency of occurrence of the groups of words in the specified field among the groups of words in the corpus increases (12); generating a second word-segmentation dictionary by combining the groups of words in the corpus with the adjusted probability or frequency, and performing word segmentation on voice content sent by a user according to the second word-segmentation dictionary to obtain the groups of words in the voice content (13); and analyzing the groups of words in the voice content according to a grammar file to obtain the corresponding semantics (14). With this method, the probability of occurrence of a group of words in a specified field among all the groups of words in the machine increases, which improves the accuracy of the machine's analysis of the semantics of voice content.
PCT/CN2016/096186 2015-12-25 2016-08-22 Method and apparatus for analyzing voice content WO2017107518A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510995231.5A CN105912521A (zh) 2015-12-25 2015-12-25 一种解析语音内容的方法及装置
CN201510995231.5 2015-12-25

Publications (1)

Publication Number Publication Date
WO2017107518A1 true WO2017107518A1 (fr) 2017-06-29

Family

ID=56744050

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/096186 WO2017107518A1 (fr) 2015-12-25 2016-08-22 Method and apparatus for analyzing voice content

Country Status (2)

Country Link
CN (1) CN105912521A (fr)
WO (1) WO2017107518A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019034957A1 (fr) * 2017-08-17 2019-02-21 International Business Machines Corporation Domain-specific lexically-driven pre-parser
CN110390002A (zh) * 2019-06-18 2019-10-29 深圳壹账通智能科技有限公司 通话资源配置方法、装置、计算机可读存储介质及服务器
US10769375B2 (en) 2017-08-17 2020-09-08 International Business Machines Corporation Domain-specific lexical analysis
CN112016297A (zh) * 2020-08-27 2020-12-01 深圳壹账通智能科技有限公司 意图识别模型测试方法、装置、计算机设备和存储介质

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108399919A (zh) * 2017-02-06 2018-08-14 中兴通讯股份有限公司 一种语义识别方法和装置
CN107193973B (zh) * 2017-05-25 2021-07-20 百度在线网络技术(北京)有限公司 语义解析信息的领域识别方法及装置、设备及可读介质
US10599645B2 (en) * 2017-10-06 2020-03-24 Soundhound, Inc. Bidirectional probabilistic natural language rewriting and selection
CN109447863A (zh) * 2018-10-23 2019-03-08 广州努比互联网科技有限公司 一种4mat实时分析方法及系统
CN109446376B (zh) * 2018-10-31 2021-06-25 广东小天才科技有限公司 一种通过分词对语音进行分类的方法及系统
CN111831832B (zh) * 2020-07-27 2022-07-01 北京世纪好未来教育科技有限公司 词表构建方法、电子设备及计算机可读介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050289141A1 (en) * 2004-06-25 2005-12-29 Shumeet Baluja Nonstandard text entry
CN1949211A (zh) * 2005-10-13 2007-04-18 中国科学院自动化研究所 一种新的汉语口语解析方法及装置
US20070233458A1 (en) * 2004-03-18 2007-10-04 Yousuke Sakao Text Mining Device, Method Thereof, and Program
CN101788989A (zh) * 2009-01-22 2010-07-28 蔡亮华 词汇信息处理方法及系统
CN104077275A (zh) * 2014-06-27 2014-10-01 北京奇虎科技有限公司 一种基于语境进行分词的方法和装置
CN105096933A (zh) * 2015-05-29 2015-11-25 百度在线网络技术(北京)有限公司 分词词典的生成方法和装置及语音合成方法和装置

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101404035A (zh) * 2008-11-21 2009-04-08 北京得意音通技术有限责任公司 一种基于文本或语音的信息搜索方法
US9569425B2 (en) * 2013-03-01 2017-02-14 The Software Shop, Inc. Systems and methods for improving the efficiency of syntactic and semantic analysis in automated processes for natural language understanding using traveling features
CN103294666B (zh) * 2013-05-28 2017-03-01 百度在线网络技术(北京)有限公司 语法编译方法、语义解析方法以及对应装置

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070233458A1 (en) * 2004-03-18 2007-10-04 Yousuke Sakao Text Mining Device, Method Thereof, and Program
US20050289141A1 (en) * 2004-06-25 2005-12-29 Shumeet Baluja Nonstandard text entry
CN1949211A (zh) * 2005-10-13 2007-04-18 中国科学院自动化研究所 一种新的汉语口语解析方法及装置
CN101788989A (zh) * 2009-01-22 2010-07-28 蔡亮华 词汇信息处理方法及系统
CN104077275A (zh) * 2014-06-27 2014-10-01 北京奇虎科技有限公司 一种基于语境进行分词的方法和装置
CN105096933A (zh) * 2015-05-29 2015-11-25 百度在线网络技术(北京)有限公司 分词词典的生成方法和装置及语音合成方法和装置

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019034957A1 (fr) * 2017-08-17 2019-02-21 International Business Machines Corporation Domain-specific lexically-driven pre-parser
US10445423B2 (en) 2017-08-17 2019-10-15 International Business Machines Corporation Domain-specific lexically-driven pre-parser
US10496744B2 (en) 2017-08-17 2019-12-03 International Business Machines Corporation Domain-specific lexically-driven pre-parser
GB2579957A (en) * 2017-08-17 2020-07-08 Ibm Domain-specific lexically-driven pre-parser
US10769375B2 (en) 2017-08-17 2020-09-08 International Business Machines Corporation Domain-specific lexical analysis
US10769376B2 (en) 2017-08-17 2020-09-08 International Business Machines Corporation Domain-specific lexical analysis
CN110390002A (zh) * 2019-06-18 2019-10-29 深圳壹账通智能科技有限公司 通话资源配置方法、装置、计算机可读存储介质及服务器
CN112016297A (zh) * 2020-08-27 2020-12-01 深圳壹账通智能科技有限公司 意图识别模型测试方法、装置、计算机设备和存储介质

Also Published As

Publication number Publication date
CN105912521A (zh) 2016-08-31

Similar Documents

Publication Publication Date Title
WO2017107518A1 (fr) Method and apparatus for analyzing voice content
JP6675463B2 (ja) 自然言語の双方向確率的な書換えおよび選択
US10997370B2 (en) Hybrid classifier for assigning natural language processing (NLP) inputs to domains in real-time
US10810272B2 (en) Method and apparatus for broadcasting search result based on artificial intelligence
CN108304375B (zh) 一种信息识别方法及其设备、存储介质、终端
Zajic et al. Multi-candidate reduction: Sentence compression as a tool for document summarization tasks
JP5819860B2 (ja) 複合語分割
US7158930B2 (en) Method and apparatus for expanding dictionaries during parsing
US20050154580A1 (en) Automated grammar generator (AGG)
WO2014187096A1 (fr) Procédé et système d'ajout de signes de ponctuation à des fichiers vocaux
WO2012083892A1 (fr) Procédé et dispositif destinés au filtrage des informations préjudiciables
WO2015127747A1 (fr) Procédé et dispositif d'ajout de fichier multimédia
CN106649253B (zh) 基于后验证的辅助控制方法及系统
CN106294460B (zh) 一种基于字和词混合语言模型的汉语语音关键词检索方法
US10553203B2 (en) Training data optimization for voice enablement of applications
JP2001101185A (ja) 辞書の自動切り換えが可能な機械翻訳方法および装置並びにそのような機械翻訳方法を実行するためのプログラムを記憶したプログラム記憶媒体
US20190138270A1 (en) Training Data Optimization in a Service Computing System for Voice Enablement of Applications
US10037321B1 (en) Calculating a maturity level of a text string
WO2012079257A1 (fr) Procédé et dispositif de traduction automatique
WO2014036827A1 (fr) Procédé et équipement utilisateur de correction de texte
CN111680129B (zh) 语义理解系统的训练方法及系统
CN109190116B (zh) 语义解析方法、系统、电子设备及存储介质
US20210312901A1 (en) Automatic learning of entities, words, pronunciations, and parts of speech
CN112149403A (zh) 一种确定涉密文本的方法和装置
Mrva et al. A PLSA-based language model for conversational telephone speech.

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16877330

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16877330

Country of ref document: EP

Kind code of ref document: A1