WO2020244150A1 - Voice retrieval method and apparatus, computer device, and storage medium (语音检索方法、装置、计算机设备及存储介质) - Google Patents


Info

Publication number
WO2020244150A1
Authority
WO
WIPO (PCT)
Prior art keywords
corpus
gram model
model
training
result
Prior art date
Application number
PCT/CN2019/117872
Other languages
English (en)
French (fr)
Inventor
黄锦伦
陈磊
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020244150A1 publication Critical patent/WO2020244150A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval of audio data
    • G06F16/63 Querying
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using context dependencies, e.g. language models
    • G10L15/19 Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 Feedback of the input speech

Definitions

  • This application relates to the field of voice recognition technology, and in particular to a voice retrieval method, device, computer equipment and storage medium.
  • the embodiments of the present application provide a voice retrieval method, device, computer equipment, and storage medium, aiming to solve the problem that the voice recognition system in the prior art has a low voice recognition accuracy rate in a supermarket scenario, resulting in inaccurate recognition results.
  • an embodiment of the present application provides a voice retrieval method, which includes:
  • an embodiment of the present application provides a voice retrieval device, which includes:
  • the model training unit is configured to receive a training set corpus, input the training set corpus into an initial N-gram model for training, and obtain an N-gram model; wherein the N-gram model is an N-ary language model;
  • the voice recognition unit is used to receive the voice to be recognized, and to recognize the voice to be recognized through the N-gram model to obtain a recognition result;
  • the word segmentation unit is used to segment the recognition result to obtain a sentence segmentation result corresponding to the recognition result;
  • the part-of-speech analysis unit is used to perform lexical analysis on the sentence word segmentation result to obtain the noun part-of-speech keywords corresponding to the sentence word segmentation result;
  • the retrieval unit is configured to search a pre-stored recommended corpus for corpus entries whose similarity to the noun part-of-speech keywords exceeds a preset similarity threshold, to obtain a retrieval result.
  • an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor implements the voice retrieval method described in the first aspect when executing the computer program.
  • the embodiments of the present application also provide a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to execute the voice retrieval method described in the first aspect.
  • Figure 1 is a schematic diagram of an application scenario of a voice retrieval method provided by an embodiment of the application
  • FIG. 2 is a schematic flowchart of a voice retrieval method provided by an embodiment of the application
  • FIG. 3 is a schematic diagram of a sub-flow of a voice retrieval method provided by an embodiment of this application.
  • FIG. 4 is a schematic diagram of another sub-flow of the voice retrieval method provided by an embodiment of this application.
  • Figure 5 is a schematic block diagram of a voice retrieval device provided by an embodiment of the application.
  • FIG. 6 is a schematic block diagram of subunits of a voice retrieval device provided by an embodiment of the application.
  • FIG. 7 is a schematic block diagram of another subunit of the voice retrieval device provided by an embodiment of the application.
  • FIG. 8 is a schematic block diagram of a computer device provided by an embodiment of the application.
  • Referring to FIG. 1 and FIG. 2: FIG. 1 is a schematic diagram of an application scenario of the voice retrieval method provided by an embodiment of this application, and FIG. 2 is a schematic flowchart of the voice retrieval method. The voice retrieval method is applied in a server and is executed by application software installed in the server.
  • the method includes steps S110 to S150.
  • the server can receive the training set corpus for training to obtain an N-gram model, and use the N-gram model to recognize the to-be-recognized voice uploaded to the server by the front-end voice collection terminal set in the smart supermarket.
  • the training set corpus is a mixed library of a general corpus and a consumer product corpus;
  • the consumer product corpus is a corpus that includes a large number of product names (such as product brands and product names); the general corpus differs from the consumer product corpus in that its vocabulary is not biased toward any specific field.
  • in an embodiment, step S110 includes:
  • S111: obtaining a consumer product corpus, and inputting the consumer product corpus into a first initial N-gram model for training to obtain a first N-gram model;
  • S112: obtaining a general corpus, and inputting the general corpus into a second initial N-gram model for training to obtain a second N-gram model;
  • S113: fusing the first N-gram model and the second N-gram model according to a set model fusion ratio to obtain the N-gram model.
  • the consumer product corpus is a corpus that includes a large number of product names.
  • the difference between the general corpus and the consumer product corpus is that the vocabulary in the general corpus is not biased toward a specific field, but includes vocabulary from every field.
  • the N-gram model is a language model (Language Model, LM).
  • the language model is a probability-based discriminative model: its input is a sentence (an ordered sequence of words), and its output is the probability of the sentence, that is, the joint probability of those words.
  • Assuming that the sentence T is composed of the word sequence w1, w2, w3, ..., wn, the N-Gram language model is expressed as follows:
  • P(T) = p(w1) * p(w2|w1) * p(w3|w1w2) * ... * p(wn|w1w2...wn-1)
  • the conditional probability of each word in the sentence T can be obtained by counting occurrences in the corpus; the n-ary model is then:
  • p(wi|w1w2...wi-1) = C(wi-n+1, ..., wi) / C(wi-n+1, ..., wi-1)
  • where C(wi-n+1, ..., wi) denotes the number of times the string wi-n+1, ..., wi appears in the corpus.
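The counting formula described above can be made concrete with a toy bigram estimator. This is a minimal sketch, not the patent's implementation; the `<s>` sentence-start marker and the tiny corpus are illustrative assumptions.

```python
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over a pre-segmented corpus.

    `sentences` is a list of word lists; "<s>" marks the sentence
    start, mirroring p(w1|begin) in the Bi-Gram formula.
    """
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word):
    # p(w_i | w_{i-1}) = C(w_{i-1}, w_i) / C(w_{i-1})
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

def sentence_prob(unigrams, bigrams, words):
    # P(T) = p(w1|<s>) * p(w2|w1) * ... * p(wn|w_{n-1})
    p, prev = 1.0, "<s>"
    for w in words:
        p *= bigram_prob(unigrams, bigrams, prev, w)
        prev = w
    return p
```

For example, after training on `[["i", "want", "noodles"], ["i", "want", "tea"]]`, the probability of "i want noodles" is 1.0 * 1.0 * 0.5 = 0.5.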
  • according to the set model fusion ratio (for example, if the ratio of consumer product corpus to general corpus is set to 2:8, the fusion ratio of the first N-gram model to the second N-gram model is likewise 2:8), the first N-gram model and the second N-gram model are fused to finally obtain the N-gram model used for speech recognition. Because the ratio of consumer product corpus to general corpus is set in advance, the fused N-gram model effectively improves the accuracy of speech recognition in the smart supermarket scenario.
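One common way to realize such a 2:8 fusion is linear interpolation of the two models' probabilities. The sketch below is an illustrative assumption about the fusion mechanism, not necessarily the patent's exact method.

```python
def fuse(p_consumer, p_general, ratio=(2, 8)):
    """Linearly interpolate a consumer-model probability with a
    general-model probability.

    The 2:8 fusion ratio from the text becomes weights 0.2 and 0.8;
    this per-probability interpolation is one standard way to fuse
    two n-gram language models.
    """
    a, b = ratio
    w = a / (a + b)
    return w * p_consumer + (1 - w) * p_general
```

A word that only the consumer-product model knows still receives 20% of its consumer-model probability in the fused model, so product names stay recognizable without hurting general speech.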
  • step S111 includes:
  • S1111 perform word segmentation on the consumer product corpus based on a probability statistical word segmentation model, and obtain a first word segmentation result corresponding to the consumer product corpus;
  • the word segmentation process for each sentence in the consumer goods corpus is as follows:
  • for example, let C = C1C2...Cm be the character string to be segmented, and let W = W1W2...Wn be a segmentation result, with Wa, Wb, ..., Wk being all possible segmentations of C; the word segmentation model based on probability statistics finds the target word string W such that W satisfies:
  • P(W|C) = MAX(P(Wa|C), P(Wb|C), ..., P(Wk|C))
  • that is, W is the word string with the maximum estimated probability. Concretely: for a string S to be segmented, all candidate words w1, w2, ..., wn are taken out from left to right; the probability P(wi) of each candidate word is looked up in the dictionary and all of its left-neighbor words are recorded; the cumulative probability of each candidate word is computed while its best left-neighbor is determined by comparison; if the current word wn is the tail word of S and its cumulative probability P(wn) is maximal, wn is the end word of S; finally, starting from wn, the best left-neighbor of each word is output in right-to-left order, which is the segmentation result of S.
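The left-to-right, best-left-neighbor procedure is essentially a Viterbi-style search for the maximum-probability segmentation. A minimal sketch follows; the dictionary probabilities are invented toy values, and it is assumed that every character of the input appears in the dictionary.

```python
def segment(s, word_probs):
    """Maximum-probability word segmentation.

    Scans left to right, tracking for each position the best
    cumulative probability and the start of the best last word
    (the 'best left-neighbor'), then traces back from the tail.
    `word_probs` maps dictionary words to probabilities.
    """
    n = len(s)
    best = [0.0] * (n + 1)   # best cumulative probability ending at i
    prev = [0] * (n + 1)     # start index of the word ending at i
    best[0] = 1.0
    for i in range(1, n + 1):
        for j in range(max(0, i - 6), i):  # candidate word s[j:i]
            w = s[j:i]
            if w in word_probs and best[j] * word_probs[w] > best[i]:
                best[i] = best[j] * word_probs[w]
                prev[i] = j
    # trace back from the end word, outputting right to left
    words, i = [], n
    while i > 0:
        words.append(s[prev[i]:i])
        i = prev[i]
    return list(reversed(words))
```

With toy probabilities, "我要买方便面" segments as 我 / 要 / 买 / 方便面, because the single word 方便面 outscores 方便 + 面.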
  • after the first word segmentation result corresponding to the consumer product corpus is obtained, it is input into the first initial N-gram model for training to obtain the first N-gram model, which has a high sentence recognition accuracy in the smart supermarket scenario.
  • correspondingly, the second N-gram model has a high recognition rate for sentences in ordinary daily-life scenes (that is, a high recognition rate for sentences not biased toward any particular scene).
  • S120 Receive a voice to be recognized, and recognize the voice to be recognized through the N-gram model to obtain a recognition result.
  • the recognition result is a whole sentence, such as "I want to buy XX brand instant noodles";
  • through the N-gram model, the voice to be recognized can be effectively recognized, and the sentence with the highest recognition probability is taken as the recognition result.
  • step S130 includes:
  • the recognition result is segmented based on the probability and statistics word segmentation model, and a sentence segmentation result corresponding to the recognition result is obtained.
  • when performing word segmentation on the recognition result in step S130, the specific process of word segmentation based on the probability-statistics word segmentation model can refer to step S1111. After the recognition result is segmented, part-of-speech analysis can be performed further.
  • S140 Perform lexical analysis according to the sentence word segmentation result to obtain the noun part-of-speech keywords corresponding to the sentence word segmentation result.
  • step S140 includes:
  • the sentence segmentation result is used as the input of the pre-trained joint lexical analysis model to obtain the noun part of speech keywords in the sentence segmentation result.
  • the input of the lexical analysis task is a string ("sentence” is used to refer to it later), and the output is the word boundary, part of speech, and entity category in the sentence.
  • Sequence labeling is a classic modeling method of lexical analysis.
  • a network structure based on GRUs (Gated Recurrent Units) is used to learn features, and the learned features are connected to a CRF (Conditional Random Field) decoding layer to complete the sequence labeling.
  • the CRF decoding layer essentially replaces the linear model in the traditional CRF with a nonlinear neural network, based on the sentence-level likelihood probability, which can better solve the label bias problem.
  • the input of the joint lexical analysis model is expressed in one-hot mode.
  • each word is represented by an id, and the one-hot sequence is converted through the word table into a word-vector sequence represented by real-valued vectors; the word-vector sequence is used as the input of a bidirectional GRU to learn a feature representation of the input sequence, yielding a new feature-representation sequence, in which two layers of bidirectional GRU are stacked to increase learning capacity; the CRF layer takes the features learned by the GRU as input and uses the tag sequence as the supervision signal to label the part of speech of each word in the sentence segmentation result.
  • the noun part-of-speech keywords are most likely to be product brands or product names, so the noun part-of-speech keywords corresponding to the sentence segmentation result are selected as the screening result for further product retrieval.
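Once the tagger has labeled each token, keeping the noun keywords is a simple filter. In this sketch, `tagged` stands in for the GRU-CRF tagger's output, and the tag set ('r' pronoun, 'v' verb, 'n' common noun, 'nz' brand/proper noun) is an illustrative assumption, not the patent's actual tag set.

```python
def noun_keywords(tagged):
    """Keep only noun-tagged tokens from a (word, tag) sequence.

    Any tag starting with 'n' (noun family) is treated as a
    candidate product keyword.
    """
    return [word for word, tag in tagged if tag.startswith("n")]
```

For "I want to buy XX brand instant noodles", only the brand and product tokens survive the filter and go on to the corpus search.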
  • S150 Search a pre-stored recommended corpus for a corpus whose similarity with the nominal keyword exceeds a preset similarity threshold to obtain a retrieval result.
  • each noun part-of-speech keyword is searched for in the preset recommended corpus, and words with a high similarity to the keyword are obtained as the retrieval result;
  • specifically, a Word2Vec model (an efficient tool for representing words as real-valued vectors) is used to obtain the word vector corresponding to each noun keyword, and the similarity between that vector and the word vector corresponding to each entry in the pre-stored recommended corpus is then calculated;
  • the similarity between two vectors is calculated from the Euclidean distance between them; if the pre-stored recommended corpus contains entries whose similarity to the noun keyword exceeds the preset similarity threshold, each such entry is taken as one of the retrieval results, that is, the multiple corpus entries whose similarity to the noun keyword exceeds the preset similarity threshold together constitute the retrieval result.
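The threshold search over the recommended corpus can be sketched as below. The text only says similarity is computed via Euclidean distance; mapping distance d to similarity 1/(1+d) is our illustrative choice, and `corpus_vecs` stands in for word vectors produced by a trained Word2Vec model.

```python
import math

def euclidean(u, v):
    # Euclidean distance between two equal-length vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def search(keyword_vec, corpus_vecs, threshold):
    """Return corpus entries whose similarity to the keyword vector
    exceeds `threshold`, most similar first.

    Similarity is derived from Euclidean distance as 1/(1+d), so
    identical vectors score 1.0 and far-apart vectors approach 0.
    """
    results = []
    for entry, vec in corpus_vecs.items():
        sim = 1.0 / (1.0 + euclidean(keyword_vec, vec))
        if sim > threshold:
            results.append((entry, sim))
    return sorted(results, key=lambda x: x[1], reverse=True)
```

All entries clearing the threshold together form the retrieval result, matching the multi-entry result described in the text.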
  • in an embodiment, after step S150, the method further includes step S160: obtaining, from a consumer goods database, the placement position information of the target consumer goods corresponding to the noun keywords in the retrieval result, and sending the placement position information to a display terminal.
  • since the consumer goods that the user needs to purchase can be learned from the noun keywords in the retrieval result, their placement position information needs to be obtained from the consumer goods database;
  • for example, if a certain type of consumer goods is placed in column ZZ of layer YY of shelf number XX, then column ZZ of layer YY of shelf number XX is its placement position information;
  • the placement position information is sent to the display terminal, whose display can effectively prompt the user to quickly find and purchase the consumer goods.
  • in an embodiment, the method further includes: sorting the target consumer goods corresponding to the noun keywords in the retrieval result in descending order of their sales volume data, to obtain a sorted display list of target consumer goods.
  • in order to visually display the consumer goods that the user may need to purchase, the target consumer goods retrieved from the user's voice to be recognized can be sorted in descending order of sales data to obtain the sorted display list; in this way, users can intuitively see the products with high sales.
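The descending sort by sales can be sketched in one line. The (name, sales) record shape is an illustrative assumption standing in for rows from the consumer goods database.

```python
def sort_by_sales(products):
    """Sort retrieved products in descending order of sales volume.

    `products` is a list of (name, sales_volume) pairs; the product
    with the highest sales appears first in the display list.
    """
    return sorted(products, key=lambda p: p[1], reverse=True)
```

The resulting list is what the display terminal would show, best sellers first.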
  • This method uses speech recognition technology and obtains noun part-of-speech keywords through lexical analysis of the speech recognition result, so as to achieve more accurate retrieval results in the recommended corpus based on the noun part-of-speech keywords.
  • the embodiment of the present application also provides a voice retrieval device, which is used to execute any embodiment of the aforementioned voice retrieval method.
  • FIG. 5 is a schematic block diagram of a voice retrieval device provided in an embodiment of the present application.
  • the voice retrieval device 100 can be configured in a server.
  • the voice retrieval device 100 includes a model training unit 110, a voice recognition unit 120, a recognition result word segmentation unit 130, a part of speech analysis unit 140, and a retrieval unit 150.
  • the model training unit 110 is configured to receive a training set corpus, input the training set corpus into an initial N-gram model for training, and obtain an N-gram model; wherein the N-gram model is an N-ary language model.
  • the server can receive the training set corpus for training to obtain an N-gram model, and use the N-gram model to recognize the to-be-recognized voice uploaded to the server by the front-end voice collection terminal set in the smart supermarket.
  • the training set corpus is a mixed library of general corpus and consumer product corpus
  • the consumer product corpus is a corpus that includes a large number of product names (such as product brands and product names); the general corpus differs from the consumer product corpus in that its vocabulary is not biased toward any specific field.
  • the N-gram model for speech recognition can be obtained by inputting the training set corpus to the initial N-gram model for training.
  • the model training unit 110 includes:
  • the first training unit 111 is configured to obtain a consumer product corpus, input the consumer product corpus into a first initial N-gram model for training, and obtain the first N-gram model;
  • the second training unit 112 is configured to obtain a general corpus, input the general corpus into a second initial N-gram model for training, and obtain a second N-gram model;
  • the model fusion unit 113 is configured to fuse the first N-gram model and the second N-gram model according to the set model fusion scale to obtain an N-gram model.
  • the consumer product corpus is a corpus that includes a large number of product names.
  • the difference between the general corpus and the consumer product corpus is that the vocabulary in the general corpus is not biased toward a specific field.
  • the N-gram model is a language model (Language Model, LM).
  • the language model is a probability-based discriminative model: its input is a sentence (an ordered sequence of words), and its output is the probability of the sentence, that is, the joint probability of those words.
  • Assuming that the sentence T is composed of the word sequence w1, w2, w3, ..., wn, the N-Gram language model is expressed as follows:
  • P(T) = p(w1) * p(w2|w1) * p(w3|w1w2) * ... * p(wn|w1w2...wn-1)
  • the conditional probability of each word in the sentence T can be obtained by counting occurrences in the corpus; the n-ary model is then:
  • p(wi|w1w2...wi-1) = C(wi-n+1, ..., wi) / C(wi-n+1, ..., wi-1)
  • where C(wi-n+1, ..., wi) denotes the number of times the string wi-n+1, ..., wi appears in the corpus.
  • according to the set model fusion ratio (for example, if the ratio of consumer product corpus to general corpus is set to 2:8, the fusion ratio of the first N-gram model to the second N-gram model is likewise 2:8), the first N-gram model and the second N-gram model are fused to finally obtain the N-gram model used for speech recognition. Because the ratio of consumer product corpus to general corpus is set in advance, the fused N-gram model effectively improves the accuracy of speech recognition in the smart supermarket scenario.
  • the first training unit 111 includes:
  • the word segmentation unit 1111 is configured to segment the consumer product corpus based on a probability statistical word segmentation model to obtain a first word segmentation result corresponding to the consumer product corpus;
  • the word segmentation training unit 1112 is configured to input the first word segmentation result into the first initial N-gram model for training, and obtain the first N-gram model.
  • the word segmentation process for each sentence in the consumer goods corpus is as follows:
  • for example, let C = C1C2...Cm be the character string to be segmented, and let W = W1W2...Wn be a segmentation result, with Wa, Wb, ..., Wk being all possible segmentations of C; the word segmentation model based on probability statistics finds the target word string W such that W satisfies:
  • P(W|C) = MAX(P(Wa|C), P(Wb|C), ..., P(Wk|C))
  • that is, W is the word string with the maximum estimated probability.
  • after the first word segmentation result corresponding to the consumer product corpus is obtained, it is input into the first initial N-gram model for training to obtain the first N-gram model, which has a high sentence recognition accuracy in the smart supermarket scenario.
  • correspondingly, the second N-gram model has a high recognition rate for sentences in ordinary daily-life scenes (that is, a high recognition rate for sentences not biased toward any particular scene).
  • the voice recognition unit 120 is configured to receive the voice to be recognized, and to recognize the voice to be recognized through the N-gram model to obtain a recognition result.
  • the recognition result is a whole sentence, such as "I want to buy XX brand instant noodles";
  • through the N-gram model, the voice to be recognized can be effectively recognized, and the sentence with the highest recognition probability is taken as the recognition result.
  • the recognition result segmentation unit 130 is configured to segment the recognition result to obtain a sentence segmentation result corresponding to the recognition result.
  • the recognition result word segmentation unit 130 is further used for:
  • the recognition result is segmented based on the probability and statistics word segmentation model, and a sentence segmentation result corresponding to the recognition result is obtained.
  • the specific process of segmentation based on the probability and statistics word segmentation model may refer to the word segmentation unit 1111. After the recognition result is segmented, further part of speech analysis can be performed.
  • the part-of-speech analysis unit 140 is configured to perform morphological analysis according to the sentence segmentation result to obtain the noun part-of-speech keywords corresponding to the sentence segmentation result.
  • the part-of-speech analysis unit 140 is further used to:
  • the sentence segmentation result is used as the input of the pre-trained joint lexical analysis model to obtain the noun part of speech keywords in the sentence segmentation result.
  • the input of the lexical analysis task is a character string ("sentence” is used to refer to it later), and the output is the word boundary, part of speech, and entity category in the sentence.
  • Sequence labeling is a classic modeling method of lexical analysis.
  • a network structure based on GRUs (Gated Recurrent Units) is used to learn features, and the learned features are connected to a CRF (Conditional Random Field) decoding layer to complete the sequence labeling.
  • the CRF decoding layer essentially replaces the linear model in the traditional CRF with a nonlinear neural network, based on the sentence-level likelihood probability, which can better solve the label bias problem.
  • the input of the joint lexical analysis model is expressed in one-hot mode.
  • each word is represented by an id, and the one-hot sequence is converted through the word table into a word-vector sequence represented by real-valued vectors; the word-vector sequence is used as the input of a bidirectional GRU to learn a feature representation of the input sequence, yielding a new feature-representation sequence, in which two layers of bidirectional GRU are stacked to increase learning capacity; the CRF layer takes the features learned by the GRU as input and uses the tag sequence as the supervision signal to label the part of speech of each word in the sentence segmentation result.
  • the noun part-of-speech keywords are most likely to be product brands or product names, so the noun part-of-speech keywords corresponding to the sentence segmentation result are selected as the screening result for further product retrieval.
  • the retrieval unit 150 is configured to search for a corpus whose similarity with the nominal keyword exceeds a preset similarity threshold in a pre-stored recommended corpus to obtain a retrieval result.
  • each noun part-of-speech keyword is searched for in the preset recommended corpus, and words with a high similarity to the keyword are obtained as the retrieval result;
  • specifically, a Word2Vec model (an efficient tool for representing words as real-valued vectors) is used to obtain the word vector corresponding to each noun keyword, and the similarity between that vector and the word vector corresponding to each entry in the pre-stored recommended corpus is then calculated;
  • the similarity between two vectors is calculated from the Euclidean distance between them; if the pre-stored recommended corpus contains entries whose similarity to the noun keyword exceeds the preset similarity threshold, each such entry is taken as one of the retrieval results, that is, the multiple corpus entries whose similarity to the noun keyword exceeds the preset similarity threshold together constitute the retrieval result.
  • the device uses speech recognition technology and obtains noun part-of-speech keywords through lexical analysis of the speech recognition result, so as to achieve more accurate retrieval results in the recommended corpus based on the noun part-of-speech keywords.
  • the above-mentioned voice retrieval device can be implemented in the form of a computer program, and the computer program can be run on a computer device as shown in FIG. 8.
  • FIG. 8 is a schematic block diagram of a computer device according to an embodiment of the present application.
  • the computer device 500 is a server, and the server may be an independent server or a server cluster composed of multiple servers.
  • the computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.
  • the non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032; when the computer program 5032 is executed, it can cause the processor 502 to execute the voice retrieval method.
  • the processor 502 is used to provide computing and control capabilities, and support the operation of the entire computer device 500.
  • the internal memory 504 provides an environment for running the computer program 5032 stored in the non-volatile storage medium 503; when the computer program 5032 is executed by the processor 502, it can cause the processor 502 to execute the voice retrieval method.
  • the network interface 505 is used for network communication, such as providing data information transmission.
  • the structure shown in FIG. 8 is only a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device 500 to which the solution of the present application is applied.
  • the specific computer device 500 may include more or fewer components than shown in the figure, or combine certain components, or have a different component arrangement.
  • the processor 502 is configured to run a computer program 5032 stored in a memory to implement the voice retrieval method of the embodiment of the present application.
  • the embodiment of the computer device shown in FIG. 8 does not constitute a limitation on the specific configuration of the computer device.
  • the computer device may include more or fewer components than those shown in the figure, combine certain components, or have a different component arrangement.
  • the computer device may only include a memory and a processor; in such an embodiment, the structures and functions of the memory and the processor are consistent with the embodiment shown in FIG. 8 and will not be repeated here.
  • the processor 502 may be a central processing unit (Central Processing Unit, CPU), and the processor 502 may also be other general-purpose processors, digital signal processors (Digital Signal Processors, DSPs), Application Specific Integrated Circuit (ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor.
  • a computer-readable storage medium may be a non-volatile computer-readable storage medium.
  • the computer-readable storage medium stores a computer program, where the computer program is executed by a processor to implement the voice retrieval method of the embodiment of the present application.
  • the storage medium is a physical, non-transitory storage medium, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a magnetic disk, an optical disk, or any other medium that can store program code.

Abstract

Provided are a voice retrieval method and apparatus, a computer device, and a storage medium. The method includes: receiving a training set corpus and inputting it into an initial N-gram model for training to obtain an N-gram model (S110); receiving a voice to be recognized and recognizing it through the N-gram model to obtain a recognition result (S120); performing word segmentation on the recognition result to obtain a sentence word segmentation result corresponding to the recognition result (S130); performing lexical analysis on the sentence word segmentation result to obtain the noun part-of-speech keywords corresponding to it (S140); and searching a pre-stored recommended corpus for corpus entries whose similarity to the noun keywords exceeds a preset similarity threshold, to obtain a retrieval result (S150).

Description

Voice retrieval method and apparatus, computer device, and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on June 6, 2019, with application number 201910492599.8 and application title "Voice retrieval method and apparatus, computer device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of voice recognition technology, and in particular to a voice retrieval method and apparatus, a computer device, and a storage medium.
Background
At present, smart supermarkets use voice recognition to retrieve goods, usually matching them through fuzzy queries; this requires analyzing the voice recognition result to intelligently obtain the name of the item the user needs to purchase. Users often speak whole sentences, such as "I want to buy XXX" or "I want to eat XXX", and current voice recognition systems cannot accurately judge their purchase intention.
Summary
The embodiments of this application provide a voice retrieval method and apparatus, a computer device, and a storage medium, aiming to solve the problem in the prior art that voice recognition systems have low recognition accuracy in the supermarket scenario, leading to inaccurate recognition results.
In a first aspect, an embodiment of this application provides a voice retrieval method, including:
receiving a training set corpus, and inputting the training set corpus into an initial N-gram model for training to obtain an N-gram model, wherein the N-gram model is an N-ary model;
receiving a voice to be recognized, and recognizing the voice to be recognized through the N-gram model to obtain a recognition result;
performing word segmentation on the recognition result to obtain a sentence word segmentation result corresponding to the recognition result;
performing lexical analysis on the sentence word segmentation result to obtain noun part-of-speech keywords corresponding to the sentence word segmentation result; and
searching a pre-stored recommended corpus for corpus entries whose similarity to the noun keywords exceeds a preset similarity threshold, to obtain a retrieval result.
In a second aspect, an embodiment of this application provides a voice retrieval apparatus, including:
a model training unit, configured to receive a training set corpus and input the training set corpus into an initial N-gram model for training to obtain an N-gram model, wherein the N-gram model is an N-ary model;
a voice recognition unit, configured to receive a voice to be recognized and recognize it through the N-gram model to obtain a recognition result;
a word segmentation unit, configured to segment the recognition result to obtain a sentence word segmentation result corresponding to the recognition result;
a part-of-speech analysis unit, configured to perform lexical analysis on the sentence word segmentation result to obtain noun part-of-speech keywords corresponding to it; and
a retrieval unit, configured to search a pre-stored recommended corpus for corpus entries whose similarity to the noun keywords exceeds a preset similarity threshold, to obtain a retrieval result.
In a third aspect, an embodiment of this application provides a computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor implements the voice retrieval method of the first aspect when executing the computer program.
In a fourth aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to execute the voice retrieval method of the first aspect.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of this application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are some embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of an application scenario of the voice retrieval method provided by an embodiment of this application;
FIG. 2 is a schematic flowchart of the voice retrieval method provided by an embodiment of this application;
FIG. 3 is a schematic diagram of a sub-flow of the voice retrieval method provided by an embodiment of this application;
FIG. 4 is a schematic diagram of another sub-flow of the voice retrieval method provided by an embodiment of this application;
FIG. 5 is a schematic block diagram of the voice retrieval apparatus provided by an embodiment of this application;
FIG. 6 is a schematic block diagram of subunits of the voice retrieval apparatus provided by an embodiment of this application;
FIG. 7 is a schematic block diagram of another subunit of the voice retrieval apparatus provided by an embodiment of this application;
FIG. 8 is a schematic block diagram of a computer device provided by an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some rather than all of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
It should be understood that when used in this specification and the appended claims, the terms "comprising" and "including" indicate the presence of the described features, integers, steps, operations, elements and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or combinations thereof.
It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit this application. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Referring to FIG. 1 and FIG. 2: FIG. 1 is a schematic diagram of an application scenario of the voice retrieval method provided by an embodiment of this application, and FIG. 2 is a schematic flowchart of the voice retrieval method. The voice retrieval method is applied in a server and is executed by application software installed in the server.
As shown in FIG. 2, the method includes steps S110 to S150.
S110: Receive a training set corpus, and input the training set corpus into an initial N-gram model for training to obtain an N-gram model, wherein the N-gram model is an N-ary model.
In this embodiment, the technical solution is described from the perspective of the server. The server can receive the training set corpus and train on it to obtain an N-gram model, and then use the N-gram model to recognize the voice to be recognized that is uploaded to the server by a front-end voice collection terminal installed in the smart supermarket.
In this embodiment, the training set corpus is a mixed library of a general corpus and a consumer product corpus. The consumer product corpus is a corpus including a large number of product names (such as product brands and product names); the general corpus differs from it in that its vocabulary is not biased toward any specific field. By inputting the training set corpus into the initial N-gram model for training, the N-gram model used for speech recognition can be obtained.
In an embodiment, as shown in FIG. 3, step S110 includes:

S111: Acquire a consumer-goods corpus, and input the consumer-goods corpus into a first initial N-gram model for training to obtain a first N-gram model;

S112: Acquire a general corpus, and input the general corpus into a second initial N-gram model for training to obtain a second N-gram model;

S113: Fuse the first N-gram model and the second N-gram model according to a set model fusion ratio to obtain the N-gram model.

In this embodiment, the consumer-goods corpus is a corpus containing a large number of product names; the general corpus differs from it in that its vocabulary is not biased toward any specific domain but covers all domains.
The N-gram model is a language model (LM). A language model is a probability-based model whose input is a sentence (an ordered sequence of words) and whose output is the probability of that sentence, i.e. the joint probability of its words.

Assume sentence T consists of the word sequence w_1, w_2, w_3, ..., w_n. The N-gram language model is then expressed as:

P(T) = P(w_1 w_2 ... w_n)

= p(w_1) * p(w_2|w_1) * p(w_3|w_1 w_2) * ... * p(w_n|w_1 w_2 ... w_{n-1})

The most commonly used N-gram models are the Bi-gram and the Tri-gram, expressed respectively as:

Bi-gram:

P(T) = p(w_1|begin) * p(w_2|w_1) * p(w_3|w_2) * ... * p(w_n|w_{n-1})

Tri-gram:

P(T) = p(w_1|begin_1, begin_2) * p(w_2|w_1, begin_1) * p(w_3|w_2 w_1) * ... * p(w_n|w_{n-1}, w_{n-2});

It can be seen that the conditional probability of each word in sentence T can be obtained by counting occurrences in the corpus. The n-gram model is thus:

p(w_i|w_{i-n+1}, ..., w_{i-1}) = C(w_{i-n+1}, ..., w_i) / C(w_{i-n+1}, ..., w_{i-1});

where C(w_{i-n+1}, ..., w_i) is the number of occurrences of the word string w_{i-n+1}, ..., w_i in the corpus.

According to the set model fusion ratio (for example, with the ratio of consumer-goods corpus to general corpus set to 2:8, the fusion ratio of the first N-gram model to the second N-gram model is also 2:8), the first N-gram model and the second N-gram model are fused to obtain the final N-gram model used for speech recognition. Because the ratio of consumer-goods corpus to general corpus is set up front, the fused N-gram model effectively improves speech recognition accuracy in the smart supermarket scenario.
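As a concrete illustration of the count-based n-gram formula and the 2:8 fusion described above, the following Python sketch trains two tiny bigram models from whitespace-tokenized sentences and fuses them by linear interpolation. This is a minimal sketch under assumptions, not the embodiment's implementation: the toy corpora and the `train_bigram`/`fuse` helper names are made up, and "fusion" is assumed to mean linear interpolation of the two models' probabilities.

```python
from collections import Counter

def train_bigram(corpus):
    """Count-based bigram model: p(cur | prev) = C(prev, cur) / C(prev)."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split()
        for prev, cur in zip(tokens, tokens[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1
    def prob(prev, cur):
        return bigrams[(prev, cur)] / unigrams[prev] if unigrams[prev] else 0.0
    return prob

def fuse(p_domain, p_general, ratio=(0.2, 0.8)):
    """Fuse two models by linear interpolation at the set 2:8 fusion ratio."""
    a, b = ratio
    return lambda prev, cur: a * p_domain(prev, cur) + b * p_general(prev, cur)

# illustrative toy corpora standing in for the consumer-goods and general corpora
product_lm = train_bigram(["i want to buy instant noodles"])
general_lm = train_bigram(["i want to go home", "i want to buy a book"])
fused_lm = fuse(product_lm, general_lm)
```

In practice the fusion ratio would be tuned to the deployment scenario; the 2:8 split here simply mirrors the example ratio given in the text.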
In an embodiment, as shown in FIG. 4, step S111 includes:

S1111: Segment the consumer-goods corpus with a probabilistic-statistics segmentation model to obtain a first segmentation result corresponding to the consumer-goods corpus;

S1112: Input the first segmentation result into the first initial N-gram model for training to obtain the first N-gram model.

In this embodiment, each sentence in the consumer-goods corpus is segmented by the probabilistic-statistics segmentation model as follows:

For example, let C = C1C2...Cm be the Chinese character string to be segmented, let W = W1W2...Wn be a segmentation result, and let Wa, Wb, ..., Wk be all possible segmentations of C. The probabilistic-statistics segmentation model is then the model that finds the target word string W satisfying P(W|C) = MAX(P(Wa|C), P(Wb|C), ..., P(Wk|C)); that is, the word string W it produces is the one with the maximum estimated probability. Specifically:

For a substring S to be segmented, take out all candidate words w1, w2, ..., wi, ..., wn from left to right; look up the probability P(wi) of each candidate word in the dictionary and record all of its left-neighbor words; compute the cumulative probability of each candidate word and, by comparison, determine its best left neighbor; if the current word wn is the tail word of string S and its cumulative probability P(wn) is maximal, then wn is the terminal word of S; starting from wn, output the best left neighbor of each word in right-to-left order, which yields the segmentation result of S. After the first segmentation result corresponding to the consumer-goods corpus is obtained, it is input into the first initial N-gram model for training to obtain the first N-gram model, which achieves high sentence recognition accuracy in the smart supermarket scenario.

Similarly, the general corpus is segmented by the probabilistic-statistics segmentation model to obtain a second segmentation result corresponding to the general corpus; the second segmentation result is input into the second initial N-gram model for training to obtain the second N-gram model, which achieves high sentence recognition accuracy for ordinary everyday scenarios (i.e., it is not biased toward high recognition accuracy for any particular scenario).
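The left-to-right candidate-word procedure above can be sketched as a small dynamic program over dictionary word probabilities. This is an illustrative sketch under assumptions: the dictionary `word_probs` and the example sentence are made up, and a segmentation is scored by multiplying independent word probabilities, which is one common simplification of P(W|C).

```python
def max_prob_segment(text, word_probs):
    """Maximum-probability segmentation: among all ways to split `text` into
    dictionary words, keep the split whose word-probability product is largest."""
    n = len(text)
    best = [0.0] * (n + 1)   # best[i]: best cumulative probability of text[:i]
    best[0] = 1.0
    back = [0] * (n + 1)     # back[i]: start index of the word ending at position i
    max_len = max(len(w) for w in word_probs)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            word = text[j:i]
            if word in word_probs and best[j] * word_probs[word] > best[i]:
                best[i] = best[j] * word_probs[word]
                back[i] = j
    # recover the split by walking the best left neighbours back from the tail word
    words, i = [], n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]

# hypothetical dictionary probabilities for "我要买方便面" ("I want to buy instant noodles")
probs = {"我": 0.1, "要": 0.1, "买": 0.1, "方便": 0.02, "面": 0.05, "方便面": 0.04}
```

Here the three-character word 方便面 (0.04) beats the two-word split 方便 + 面 (0.02 × 0.05 = 0.001), so the longer product name survives, which is exactly the behavior a consumer-goods dictionary needs.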
S120: Receive speech to be recognized, and recognize the speech to be recognized through the N-gram model to obtain a recognition result.

When the speech to be recognized is recognized through the N-gram model, the result is a whole sentence, for example "I want to buy XX-brand instant noodles". The N-gram model recognizes the speech effectively and takes the sentence with the highest recognition probability as the recognition result.

S130: Perform word segmentation on the recognition result to obtain a sentence segmentation result corresponding to the recognition result.

In an embodiment, step S130 includes:

segmenting the recognition result with the probabilistic-statistics segmentation model to obtain the sentence segmentation result corresponding to the recognition result.

In this embodiment, segmenting the recognition result in step S130 also uses the probabilistic-statistics segmentation model; for the specific process, refer to step S1111. Once the recognition result has been segmented, part-of-speech analysis can proceed.
S140: Perform lexical analysis on the sentence segmentation result to obtain noun-POS keywords corresponding to the sentence segmentation result.

In an embodiment, step S140 includes:

feeding the sentence segmentation result into a pre-trained joint lexical analysis model to obtain the noun-POS keywords in the sentence segmentation result.

In this embodiment, lexical analysis with the joint lexical analysis model proceeds as follows:

The input of the lexical analysis task is a character string (hereinafter referred to as a "sentence"), and the output is the word boundaries, parts of speech, and entity categories in the sentence. Sequence labeling is the classic way of modeling lexical analysis. The joint lexical analysis model (i.e., the LAC model) uses a GRU-based (gated recurrent unit) network structure to learn features, and feeds the learned features into a CRF (conditional random field) decoding layer to complete the sequence labeling. The CRF decoding layer essentially replaces the linear model of a traditional CRF with a nonlinear neural network and works on sentence-level likelihood, so it better resolves the label bias problem.

The input of the joint lexical analysis model is represented in one-hot form: each character is represented by an id, and the one-hot sequence is converted via a character table into a sequence of character vectors represented as real-valued vectors. The character-vector sequence is fed into a bidirectional GRU to learn a feature representation of the input sequence, yielding a new feature sequence; two bidirectional GRU layers are stacked to increase learning capacity. The CRF takes the features learned by the GRU as input and the label sequence as the supervision signal, producing the part-of-speech tag of each token in the sentence segmentation result. Since, in the smart supermarket scenario, a noun-POS keyword is more likely to be a product brand or product name, the noun-POS keywords corresponding to the sentence segmentation result are selected as the filtered output for the subsequent product retrieval.
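Once a LAC-style tagger has labeled each token with a part of speech, selecting the noun-POS keywords reduces to a simple filter. The sketch below assumes a hypothetical tagger output and an illustrative noun tag set; the actual tag inventory of a real LAC model may differ.

```python
def extract_noun_keywords(tagged_tokens, noun_tags=("n", "nz", "nt", "brand")):
    """Keep only the tokens whose POS tag marks a noun; these become the
    keywords used for the subsequent product search."""
    return [word for word, tag in tagged_tokens if tag in noun_tags]

# hypothetical tagger output for "我要买XX品牌方便面"; tags here are illustrative
tagged = [("我", "r"), ("要", "v"), ("买", "v"), ("XX品牌", "brand"), ("方便面", "n")]
noun_keywords = extract_noun_keywords(tagged)
```

The pronoun and verbs ("我", "要", "买") are discarded, leaving only the brand and product-name tokens to be matched against the recommendation corpus.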
S150: Search the pre-stored recommendation corpus for corpus entries whose similarity to the noun keywords exceeds the preset similarity threshold, to obtain a retrieval result.

In this embodiment, once the noun-POS keywords are obtained, each of them is searched in the preset recommendation corpus to find the words most similar to it, which serve as the retrieval result. Specifically, the word vector corresponding to each noun keyword is obtained with a Word2Vec model (an efficient tool for representing words as real-valued vectors), and its similarity to the word vector of each entry in the pre-stored recommendation corpus is computed, where the similarity between two vectors is computed from the Euclidean distance between them. If the pre-stored recommendation corpus contains entries whose similarity to the noun keywords exceeds the preset similarity threshold, each such entry becomes one of the retrieval results; that is, all entries whose similarity to the noun keywords exceeds the preset similarity threshold together form the retrieval result.
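A minimal sketch of the similarity search described above, assuming word vectors have already been obtained (for example, from a Word2Vec model). The mapping from Euclidean distance d to a similarity score 1 / (1 + d), the 0.8 threshold, and the toy 2-d vectors are illustrative assumptions, since the text fixes only that similarity is derived from Euclidean distance.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def search(keyword_vec, corpus_vecs, threshold=0.8):
    """Return corpus entries whose similarity 1 / (1 + distance) exceeds the threshold."""
    hits = []
    for name, vec in corpus_vecs.items():
        sim = 1.0 / (1.0 + euclidean(keyword_vec, vec))
        if sim > threshold:
            hits.append((name, sim))
    return sorted(hits, key=lambda h: h[1], reverse=True)

# illustrative 2-d vectors standing in for real Word2Vec embeddings
corpus_vectors = {"instant noodles": [1.0, 0.1], "shampoo": [5.0, 5.0]}
matches = search([1.0, 0.0], corpus_vectors)
```

Only the nearby entry clears the threshold; a real deployment would use the embedding dimensionality and threshold chosen for its corpus.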
In an embodiment, after step S150, the method further includes:

acquiring shelf-position information of the consumer goods corresponding to each noun keyword in the retrieval result, and sending the shelf-position information to a display terminal for display.

In this embodiment, after the retrieval result is obtained by recognizing the speech to be recognized, the consumer goods the user wants to buy can be identified from the noun keywords in the retrieval result. The shelf-position information of each product is then obtained from the consumer-goods database (for example, a certain category of goods is placed on shelf XX, level YY, column ZZ; "shelf XX, level YY, column ZZ" is the shelf-position information). Sending the shelf-position information to the display terminal for display effectively helps the user quickly locate and purchase the goods.

In an embodiment, after the acquiring of the shelf-position information of the consumer goods corresponding to each noun keyword in the retrieval result and the sending of the shelf-position information to the display terminal for display, the method further includes:

sorting the target consumer goods obtained according to the noun keywords in the retrieval result in descending order of sales data, to obtain a ranked display list of the target consumer goods.

In this embodiment, to intuitively present the consumer goods the user may want to buy among the target goods retrieved from the user's speech, the target consumer goods can be sorted in descending order of their sales data to obtain the ranked display list; in this way the user can see at a glance which products sell best.

The method uses speech recognition technology and performs lexical analysis on the speech recognition result to obtain noun-POS keywords, so that the retrieval result can be obtained from the recommendation corpus more accurately based on the noun-POS keywords.
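The descending sort by sales data can be sketched in a few lines; the product records and field names below are illustrative, not the embodiment's schema.

```python
def rank_by_sales(products):
    """Sort the retrieved target products by sales volume, highest first."""
    return sorted(products, key=lambda p: p["sales"], reverse=True)

# illustrative retrieved products and sales counts
catalog = [
    {"name": "brand A noodles", "sales": 120},
    {"name": "brand B noodles", "sales": 560},
    {"name": "brand C noodles", "sales": 340},
]
ranked = rank_by_sales(catalog)
```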
An embodiment of this application further provides a speech retrieval apparatus, configured to perform any embodiment of the foregoing speech retrieval method. Specifically, referring to FIG. 5, FIG. 5 is a schematic block diagram of the speech retrieval apparatus provided by an embodiment of this application. The speech retrieval apparatus 100 may be configured in a server.

As shown in FIG. 5, the speech retrieval apparatus 100 includes a model training unit 110, a speech recognition unit 120, a recognition-result segmentation unit 130, a part-of-speech analysis unit 140, and a retrieval unit 150.

The model training unit 110 is configured to receive a training corpus and input the training corpus into an initial N-gram model for training to obtain an N-gram model, where the N-gram model is an N-element language model.

In this embodiment, the technical solution is described from the perspective of the server. The server can receive the training corpus and train on it to obtain the N-gram model, and then use the N-gram model to recognize the speech to be recognized that a front-end speech acquisition terminal deployed in a smart supermarket uploads to the server.

In this embodiment, the training corpus is a mixture of a general corpus and a consumer-goods corpus. The consumer-goods corpus is a corpus containing a large number of product names (such as product brands and product titles); the general corpus differs from the consumer-goods corpus in that its vocabulary is not biased toward any specific domain. By inputting the training corpus into the initial N-gram model for training, the N-gram model used for speech recognition is obtained.

In an embodiment, as shown in FIG. 6, the model training unit 110 includes:

a first training unit 111, configured to acquire a consumer-goods corpus and input the consumer-goods corpus into a first initial N-gram model for training to obtain a first N-gram model;

a second training unit 112, configured to acquire a general corpus and input the general corpus into a second initial N-gram model for training to obtain a second N-gram model;

a model fusion unit 113, configured to fuse the first N-gram model and the second N-gram model according to a set model fusion ratio to obtain the N-gram model.

In this embodiment, the consumer-goods corpus is a corpus containing a large number of product names; the general corpus differs from it in that its vocabulary is not biased toward any specific domain.

The N-gram model is a language model (LM). A language model is a probability-based model whose input is a sentence (an ordered sequence of words) and whose output is the probability of that sentence, i.e. the joint probability of its words.
Assume sentence T consists of the word sequence w_1, w_2, w_3, ..., w_n. The N-gram language model is then expressed as:

P(T) = P(w_1 w_2 ... w_n)

= p(w_1) * p(w_2|w_1) * p(w_3|w_1 w_2) * ... * p(w_n|w_1 w_2 ... w_{n-1})

The most commonly used N-gram models are the Bi-gram and the Tri-gram, expressed respectively as:

Bi-gram:

P(T) = p(w_1|begin) * p(w_2|w_1) * p(w_3|w_2) * ... * p(w_n|w_{n-1})

Tri-gram:

P(T) = p(w_1|begin_1, begin_2) * p(w_2|w_1, begin_1) * p(w_3|w_2 w_1) * ... * p(w_n|w_{n-1}, w_{n-2});

It can be seen that the conditional probability of each word in sentence T can be obtained by counting occurrences in the corpus. The n-gram model is thus:

p(w_i|w_{i-n+1}, ..., w_{i-1}) = C(w_{i-n+1}, ..., w_i) / C(w_{i-n+1}, ..., w_{i-1});

where C(w_{i-n+1}, ..., w_i) is the number of occurrences of the word string w_{i-n+1}, ..., w_i in the corpus.

According to the set model fusion ratio (for example, with the ratio of consumer-goods corpus to general corpus set to 2:8, the fusion ratio of the first N-gram model to the second N-gram model is also 2:8), the first N-gram model and the second N-gram model are fused to obtain the final N-gram model used for speech recognition. Because the ratio of consumer-goods corpus to general corpus is set up front, the fused N-gram model effectively improves speech recognition accuracy in the smart supermarket scenario.
In an embodiment, as shown in FIG. 7, the first training unit 111 includes:

a segmentation unit 1111, configured to segment the consumer-goods corpus with a probabilistic-statistics segmentation model to obtain a first segmentation result corresponding to the consumer-goods corpus;

a segmentation training unit 1112, configured to input the first segmentation result into the first initial N-gram model for training to obtain the first N-gram model.

In this embodiment, each sentence in the consumer-goods corpus is segmented by the probabilistic-statistics segmentation model as follows:

For example, let C = C1C2...Cm be the Chinese character string to be segmented, let W = W1W2...Wn be a segmentation result, and let Wa, Wb, ..., Wk be all possible segmentations of C. The probabilistic-statistics segmentation model is then the model that finds the target word string W satisfying P(W|C) = MAX(P(Wa|C), P(Wb|C), ..., P(Wk|C)); that is, the word string W it produces is the one with the maximum estimated probability. Specifically:

For a substring S to be segmented, take out all candidate words w1, w2, ..., wi, ..., wn from left to right; look up the probability P(wi) of each candidate word in the dictionary and record all of its left-neighbor words; compute the cumulative probability of each candidate word and, by comparison, determine its best left neighbor; if the current word wn is the tail word of string S and its cumulative probability P(wn) is maximal, then wn is the terminal word of S; starting from wn, output the best left neighbor of each word in right-to-left order, which yields the segmentation result of S. After the first segmentation result corresponding to the consumer-goods corpus is obtained, it is input into the first initial N-gram model for training to obtain the first N-gram model, which achieves high sentence recognition accuracy in the smart supermarket scenario.

Similarly, the general corpus is segmented by the probabilistic-statistics segmentation model to obtain a second segmentation result corresponding to the general corpus; the second segmentation result is input into the second initial N-gram model for training to obtain the second N-gram model, which achieves high sentence recognition accuracy for ordinary everyday scenarios (i.e., it is not biased toward high recognition accuracy for any particular scenario).
The speech recognition unit 120 is configured to receive speech to be recognized and recognize the speech to be recognized through the N-gram model to obtain a recognition result.

When the speech to be recognized is recognized through the N-gram model, the result is a whole sentence, for example "I want to buy XX-brand instant noodles". The N-gram model recognizes the speech effectively and takes the sentence with the highest recognition probability as the recognition result.

The recognition-result segmentation unit 130 is configured to perform word segmentation on the recognition result to obtain a sentence segmentation result corresponding to the recognition result.

In an embodiment, the recognition-result segmentation unit 130 is further configured to:

segment the recognition result with the probabilistic-statistics segmentation model to obtain the sentence segmentation result corresponding to the recognition result.

In this embodiment, the recognition-result segmentation unit 130 also uses the probabilistic-statistics segmentation model to segment the recognition result; for the specific process, refer to the segmentation unit 1111. Once the recognition result has been segmented, part-of-speech analysis can proceed.

The part-of-speech analysis unit 140 is configured to perform lexical analysis on the sentence segmentation result to obtain noun-POS keywords corresponding to the sentence segmentation result.

In an embodiment, the part-of-speech analysis unit 140 is further configured to:

feed the sentence segmentation result into a pre-trained joint lexical analysis model to obtain the noun-POS keywords in the sentence segmentation result.

In this embodiment, lexical analysis with the joint lexical analysis model proceeds as follows:

The input of the lexical analysis task is a character string (hereinafter referred to as a "sentence"), and the output is the word boundaries, parts of speech, and entity categories in the sentence. Sequence labeling is the classic way of modeling lexical analysis. The joint lexical analysis model (i.e., the LAC model) uses a GRU-based (gated recurrent unit) network structure to learn features, and feeds the learned features into a CRF (conditional random field) decoding layer to complete the sequence labeling. The CRF decoding layer essentially replaces the linear model of a traditional CRF with a nonlinear neural network and works on sentence-level likelihood, so it better resolves the label bias problem.

The input of the joint lexical analysis model is represented in one-hot form: each character is represented by an id, and the one-hot sequence is converted via a character table into a sequence of character vectors represented as real-valued vectors. The character-vector sequence is fed into a bidirectional GRU to learn a feature representation of the input sequence, yielding a new feature sequence; two bidirectional GRU layers are stacked to increase learning capacity. The CRF takes the features learned by the GRU as input and the label sequence as the supervision signal, producing the part-of-speech tag of each token in the sentence segmentation result. Since, in the smart supermarket scenario, a noun-POS keyword is more likely to be a product brand or product name, the noun-POS keywords corresponding to the sentence segmentation result are selected as the filtered output for the subsequent product retrieval.

The retrieval unit 150 is configured to search the pre-stored recommendation corpus for corpus entries whose similarity to the noun keywords exceeds the preset similarity threshold, to obtain a retrieval result.

In this embodiment, once the noun-POS keywords are obtained, each of them is searched in the preset recommendation corpus to find the words most similar to it, which serve as the retrieval result. Specifically, the word vector corresponding to each noun keyword is obtained with a Word2Vec model (an efficient tool for representing words as real-valued vectors), and its similarity to the word vector of each entry in the pre-stored recommendation corpus is computed, where the similarity between two vectors is computed from the Euclidean distance between them. If the pre-stored recommendation corpus contains entries whose similarity to the noun keywords exceeds the preset similarity threshold, each such entry becomes one of the retrieval results; that is, all entries whose similarity to the noun keywords exceeds the preset similarity threshold together form the retrieval result.

The apparatus uses speech recognition technology and performs lexical analysis on the speech recognition result to obtain noun-POS keywords, so that the retrieval result can be obtained from the recommendation corpus more accurately based on the noun-POS keywords.
The speech retrieval apparatus described above may be implemented in the form of a computer program, which can run on a computer device as shown in FIG. 8.

Referring to FIG. 8, FIG. 8 is a schematic block diagram of the computer device provided by an embodiment of this application. The computer device 500 is a server; the server may be a standalone server or a server cluster composed of multiple servers.

Referring to FIG. 8, the computer device 500 includes a processor 502, a memory, and a network interface 505 connected through a system bus 501, where the memory may include a non-volatile storage medium 503 and an internal memory 504.

The non-volatile storage medium 503 can store an operating system 5031 and a computer program 5032. When executed, the computer program 5032 causes the processor 502 to perform the speech retrieval method.

The processor 502 provides the computing and control capability that supports the operation of the entire computer device 500.

The internal memory 504 provides an environment for running the computer program 5032 stored in the non-volatile storage medium 503; when executed by the processor 502, the computer program 5032 causes the processor 502 to perform the speech retrieval method.

The network interface 505 is used for network communication, such as transmitting data information. A person skilled in the art will understand that the structure shown in FIG. 8 is only a block diagram of part of the structure related to the solution of this application and does not constitute a limitation on the computer device 500 to which the solution is applied; a specific computer device 500 may include more or fewer components than shown, combine certain components, or have a different arrangement of components.

The processor 502 is configured to run the computer program 5032 stored in the memory to implement the speech retrieval method of the embodiments of this application.

A person skilled in the art will understand that the embodiment of the computer device shown in FIG. 8 does not limit the specific constitution of the computer device; in other embodiments, the computer device may include more or fewer components than shown, combine certain components, or arrange components differently. For example, in some embodiments the computer device may include only the memory and the processor; in such embodiments the structures and functions of the memory and the processor are consistent with the embodiment shown in FIG. 8 and are not repeated here.

It should be understood that, in the embodiments of this application, the processor 502 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or any conventional processor.

Another embodiment of this application provides a computer-readable storage medium. The computer-readable storage medium may be a non-volatile computer-readable storage medium. It stores a computer program which, when executed by a processor, implements the speech retrieval method of the embodiments of this application.

The storage medium is a physical, non-transitory storage medium, for example a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, an optical disc, or any other physical storage medium that can store program code.

A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the devices, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.

The above are only specific implementations of this application, but the protection scope of this application is not limited thereto. Any equivalent modification or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (20)

  1. A speech retrieval method, comprising:
    receiving a training corpus, and inputting the training corpus into an initial N-gram model for training to obtain an N-gram model, wherein the N-gram model is an N-element language model;
    receiving speech to be recognized, and recognizing the speech to be recognized through the N-gram model to obtain a recognition result;
    performing word segmentation on the recognition result to obtain a sentence segmentation result corresponding to the recognition result;
    performing lexical analysis on the sentence segmentation result to obtain noun-POS keywords corresponding to the sentence segmentation result; and
    searching a pre-stored recommendation corpus for corpus entries whose similarity to the noun keywords exceeds a preset similarity threshold, to obtain a retrieval result, wherein the recommendation corpus comprises a plurality of corpus entries, each corpus entry comprising one or more noun-POS keywords.
  2. The speech retrieval method according to claim 1, wherein the receiving a training corpus and inputting the training corpus into an initial N-gram model for training to obtain an N-gram model comprises:
    acquiring a consumer-goods corpus, and inputting the consumer-goods corpus into a first initial N-gram model for training to obtain a first N-gram model;
    acquiring a general corpus, and inputting the general corpus into a second initial N-gram model for training to obtain a second N-gram model; and
    fusing the first N-gram model and the second N-gram model according to a set model fusion ratio to obtain the N-gram model.
  3. The speech retrieval method according to claim 2, wherein the inputting the consumer-goods corpus into a first initial N-gram model for training to obtain a first N-gram model comprises:
    segmenting the consumer-goods corpus with a probabilistic-statistics segmentation model to obtain a first segmentation result corresponding to the consumer-goods corpus; and
    inputting the first segmentation result into the first initial N-gram model for training to obtain the first N-gram model.
  4. The speech retrieval method according to claim 1, wherein the performing word segmentation on the recognition result to obtain a sentence segmentation result corresponding to the recognition result comprises:
    segmenting the recognition result with a probabilistic-statistics segmentation model to obtain the sentence segmentation result corresponding to the recognition result.
  5. The speech retrieval method according to claim 1, wherein the performing lexical analysis on the sentence segmentation result to obtain noun-POS keywords corresponding to the sentence segmentation result comprises:
    feeding the sentence segmentation result into a pre-trained joint lexical analysis model to obtain the noun-POS keywords in the sentence segmentation result.
  6. The speech retrieval method according to claim 1, wherein, after the searching of the pre-stored recommendation corpus for corpus entries whose similarity to the noun keywords exceeds the preset similarity threshold to obtain the retrieval result, the method further comprises:
    acquiring shelf-position information of the consumer goods corresponding to each noun keyword in the retrieval result, and sending the shelf-position information to a display terminal for display.
  7. The speech retrieval method according to claim 6, wherein, after the acquiring of the shelf-position information of the consumer goods corresponding to each noun keyword in the retrieval result and the sending of the shelf-position information to the display terminal for display, the method further comprises:
    sorting the target consumer goods obtained according to the noun keywords in the retrieval result in descending order of sales data, to obtain a ranked display list of the target consumer goods.
  8. A speech retrieval apparatus, comprising:
    a model training unit, configured to receive a training corpus and input the training corpus into an initial N-gram model for training to obtain an N-gram model, wherein the N-gram model is an N-element language model;
    a speech recognition unit, configured to receive speech to be recognized and recognize the speech to be recognized through the N-gram model to obtain a recognition result;
    a recognition-result segmentation unit, configured to perform word segmentation on the recognition result to obtain a sentence segmentation result corresponding to the recognition result;
    a part-of-speech analysis unit, configured to perform lexical analysis on the sentence segmentation result to obtain noun-POS keywords corresponding to the sentence segmentation result; and
    a retrieval unit, configured to search a pre-stored recommendation corpus for corpus entries whose similarity to the noun keywords exceeds a preset similarity threshold, to obtain a retrieval result.
  9. The speech retrieval apparatus according to claim 8, wherein the model training unit comprises:
    a first training unit, configured to acquire a consumer-goods corpus and input the consumer-goods corpus into a first initial N-gram model for training to obtain a first N-gram model;
    a second training unit, configured to acquire a general corpus and input the general corpus into a second initial N-gram model for training to obtain a second N-gram model; and
    a model fusion unit, configured to fuse the first N-gram model and the second N-gram model according to a set model fusion ratio to obtain the N-gram model.
  10. The speech retrieval apparatus according to claim 9, wherein the first training unit comprises:
    a segmentation unit, configured to segment the consumer-goods corpus with a probabilistic-statistics segmentation model to obtain a first segmentation result corresponding to the consumer-goods corpus; and
    a segmentation training unit, configured to input the first segmentation result into the first initial N-gram model for training to obtain the first N-gram model.
  11. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the following steps:
    receiving a training corpus, and inputting the training corpus into an initial N-gram model for training to obtain an N-gram model, wherein the N-gram model is an N-element language model;
    receiving speech to be recognized, and recognizing the speech to be recognized through the N-gram model to obtain a recognition result;
    performing word segmentation on the recognition result to obtain a sentence segmentation result corresponding to the recognition result;
    performing lexical analysis on the sentence segmentation result to obtain noun-POS keywords corresponding to the sentence segmentation result; and
    searching a pre-stored recommendation corpus for corpus entries whose similarity to the noun keywords exceeds a preset similarity threshold, to obtain a retrieval result, wherein the recommendation corpus comprises a plurality of corpus entries, each corpus entry comprising one or more noun-POS keywords.
  12. The computer device according to claim 11, wherein the receiving a training corpus and inputting the training corpus into an initial N-gram model for training to obtain an N-gram model comprises:
    acquiring a consumer-goods corpus, and inputting the consumer-goods corpus into a first initial N-gram model for training to obtain a first N-gram model;
    acquiring a general corpus, and inputting the general corpus into a second initial N-gram model for training to obtain a second N-gram model; and
    fusing the first N-gram model and the second N-gram model according to a set model fusion ratio to obtain the N-gram model.
  13. The computer device according to claim 12, wherein the inputting the consumer-goods corpus into a first initial N-gram model for training to obtain a first N-gram model comprises:
    segmenting the consumer-goods corpus with a probabilistic-statistics segmentation model to obtain a first segmentation result corresponding to the consumer-goods corpus; and
    inputting the first segmentation result into the first initial N-gram model for training to obtain the first N-gram model.
  14. The computer device according to claim 11, wherein the performing word segmentation on the recognition result to obtain a sentence segmentation result corresponding to the recognition result comprises:
    segmenting the recognition result with a probabilistic-statistics segmentation model to obtain the sentence segmentation result corresponding to the recognition result.
  15. The computer device according to claim 11, wherein the performing lexical analysis on the sentence segmentation result to obtain noun-POS keywords corresponding to the sentence segmentation result comprises:
    feeding the sentence segmentation result into a pre-trained joint lexical analysis model to obtain the noun-POS keywords in the sentence segmentation result.
  16. The computer device according to claim 11, wherein, after the searching of the pre-stored recommendation corpus for corpus entries whose similarity to the noun keywords exceeds the preset similarity threshold to obtain the retrieval result, the steps further comprise:
    acquiring shelf-position information of the consumer goods corresponding to each noun keyword in the retrieval result, and sending the shelf-position information to a display terminal for display.
  17. The computer device according to claim 16, wherein, after the acquiring of the shelf-position information of the consumer goods corresponding to each noun keyword in the retrieval result and the sending of the shelf-position information to the display terminal for display, the steps further comprise:
    sorting the target consumer goods obtained according to the noun keywords in the retrieval result in descending order of sales data, to obtain a ranked display list of the target consumer goods.
  18. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following operations:
    receiving a training corpus, and inputting the training corpus into an initial N-gram model for training to obtain an N-gram model, wherein the N-gram model is an N-element language model;
    receiving speech to be recognized, and recognizing the speech to be recognized through the N-gram model to obtain a recognition result;
    performing word segmentation on the recognition result to obtain a sentence segmentation result corresponding to the recognition result;
    performing lexical analysis on the sentence segmentation result to obtain noun-POS keywords corresponding to the sentence segmentation result; and
    searching a pre-stored recommendation corpus for corpus entries whose similarity to the noun keywords exceeds a preset similarity threshold, to obtain a retrieval result, wherein the recommendation corpus comprises a plurality of corpus entries, each corpus entry comprising one or more noun-POS keywords.
  19. The computer-readable storage medium according to claim 18, wherein the receiving a training corpus and inputting the training corpus into an initial N-gram model for training to obtain an N-gram model comprises:
    acquiring a consumer-goods corpus, and inputting the consumer-goods corpus into a first initial N-gram model for training to obtain a first N-gram model;
    acquiring a general corpus, and inputting the general corpus into a second initial N-gram model for training to obtain a second N-gram model; and
    fusing the first N-gram model and the second N-gram model according to a set model fusion ratio to obtain the N-gram model.
  20. The computer-readable storage medium according to claim 18, wherein the inputting the consumer-goods corpus into a first initial N-gram model for training to obtain a first N-gram model comprises:
    segmenting the consumer-goods corpus with a probabilistic-statistics segmentation model to obtain a first segmentation result corresponding to the consumer-goods corpus; and
    inputting the first segmentation result into the first initial N-gram model for training to obtain the first N-gram model.
PCT/CN2019/117872 2019-06-06 2019-11-13 Speech retrieval method and apparatus, computer device, and storage medium WO2020244150A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910492599.8A CN110349568A (zh) 2019-06-06 2019-06-06 Speech retrieval method and apparatus, computer device, and storage medium
CN201910492599.8 2019-06-06

Publications (1)

Publication Number Publication Date
WO2020244150A1 true WO2020244150A1 (zh) 2020-12-10

Family

ID=68181598

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117872 WO2020244150A1 (zh) 2019-06-06 2019-11-13 语音检索方法、装置、计算机设备及存储介质

Country Status (2)

Country Link
CN (1) CN110349568A (zh)
WO (1) WO2020244150A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115563394A (zh) * 2022-11-24 2023-01-03 腾讯科技(深圳)有限公司 搜索召回方法、召回模型训练方法、装置和计算机设备

Families Citing this family (12)

Publication number Priority date Publication date Assignee Title
CN110349568A (zh) * 2019-06-06 2019-10-18 平安科技(深圳)有限公司 语音检索方法、装置、计算机设备及存储介质
CN110825844A (zh) * 2019-10-21 2020-02-21 拉扎斯网络科技(上海)有限公司 语音检索方法、装置、可读存储介质和电子设备
CN111291195B (zh) * 2020-01-21 2021-08-10 腾讯科技(深圳)有限公司 一种数据处理方法、装置、终端及可读存储介质
CN111460257B (zh) * 2020-03-27 2023-10-31 北京百度网讯科技有限公司 专题生成方法、装置、电子设备和存储介质
CN113569128A (zh) * 2020-04-29 2021-10-29 北京金山云网络技术有限公司 数据检索方法、装置及电子设备
CN111862970A (zh) * 2020-06-05 2020-10-30 珠海高凌信息科技股份有限公司 一种基于智能语音机器人的虚假宣传治理应用方法及装置
CN111783424B (zh) * 2020-06-17 2024-02-13 泰康保险集团股份有限公司 一种文本分句方法和装置
CN112381038B (zh) * 2020-11-26 2024-04-19 中国船舶工业系统工程研究院 一种基于图像的文本识别方法、系统和介质
CN112905869A (zh) * 2021-03-26 2021-06-04 北京儒博科技有限公司 语言模型的自适应训练方法、装置、存储介质及设备
CN113256379A (zh) * 2021-05-24 2021-08-13 北京小米移动软件有限公司 一种为商品关联购物需求的方法
CN113256378A (zh) * 2021-05-24 2021-08-13 北京小米移动软件有限公司 一种确定用户购物需求的方法
CN114329225B (zh) * 2022-01-24 2024-04-23 平安国际智慧城市科技股份有限公司 基于搜索语句的搜索方法、装置、设备及存储介质

Citations (10)

Publication number Priority date Publication date Assignee Title
CN105139239A (zh) * 2014-05-27 2015-12-09 无锡韩光电器有限公司 一种具有语音查询功能的超市购物系统
JP2017003812A (ja) * 2015-06-11 2017-01-05 日本電信電話株式会社 言語モデル適応装置、言語モデル適応方法、プログラム
CN106875941A (zh) * 2017-04-01 2017-06-20 彭楚奥 一种服务机器人的语音语义识别方法
CN107154260A (zh) * 2017-04-11 2017-09-12 北京智能管家科技有限公司 一种领域自适应语音识别方法和装置
CN107204184A (zh) * 2017-05-10 2017-09-26 平安科技(深圳)有限公司 语音识别方法及系统
CN107247759A (zh) * 2017-05-31 2017-10-13 深圳正品创想科技有限公司 一种商品推荐方法及装置
CN109344830A (zh) * 2018-08-17 2019-02-15 平安科技(深圳)有限公司 语句输出、模型训练方法、装置、计算机设备及存储介质
CN109817217A (zh) * 2019-01-17 2019-05-28 深圳壹账通智能科技有限公司 基于语音识别的自助贩卖方法、装置、设备及介质
CN109840323A (zh) * 2018-12-14 2019-06-04 深圳壹账通智能科技有限公司 保险产品的语音识别处理方法及服务器
CN110349568A (zh) * 2019-06-06 2019-10-18 平安科技(深圳)有限公司 语音检索方法、装置、计算机设备及存储介质

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN108538286A (zh) * 2017-03-02 2018-09-14 腾讯科技(深圳)有限公司 一种语音识别的方法以及计算机
CN109388743B (zh) * 2017-08-11 2021-11-23 阿里巴巴集团控股有限公司 语言模型的确定方法和装置
CN108804414A (zh) * 2018-05-04 2018-11-13 科沃斯商用机器人有限公司 文本修正方法、装置、智能设备及可读存储介质

Also Published As

Publication number Publication date
CN110349568A (zh) 2019-10-18

Similar Documents

Publication Publication Date Title
WO2020244150A1 (zh) 语音检索方法、装置、计算机设备及存储介质
TWI732271B (zh) 人機對話方法、裝置、電子設備及電腦可讀媒體
CN109885660B (zh) 一种知识图谱赋能的基于信息检索的问答系统和方法
CN108829757B (zh) 一种聊天机器人的智能服务方法、服务器及存储介质
CN109635273B (zh) 文本关键词提取方法、装置、设备及存储介质
CN107480143B (zh) 基于上下文相关性的对话话题分割方法和系统
CN109101479B (zh) 一种用于中文语句的聚类方法及装置
WO2020244073A1 (zh) 基于语音的用户分类方法、装置、计算机设备及存储介质
US10089364B2 (en) Item recommendation device, item recommendation method, and computer program product
US9846748B2 (en) Searching for information based on generic attributes of the query
WO2019153737A1 (zh) 用于对评论进行评估的方法、装置、设备和存储介质
CN109670163B (zh) 信息识别方法、信息推荐方法、模板构建方法及计算设备
WO2018157789A1 (zh) 一种语音识别的方法、计算机、存储介质以及电子装置
Kamper et al. Semantic speech retrieval with a visually grounded model of untranscribed speech
CN112069298A (zh) 基于语义网和意图识别的人机交互方法、设备及介质
US9798776B2 (en) Systems and methods for parsing search queries
WO2021139266A1 (zh) 融合外部知识的bert模型的微调方法、装置及计算机设备
CN110147494B (zh) 信息搜索方法、装置,存储介质及电子设备
CN110096572B (zh) 一种样本生成方法、装置及计算机可读介质
CN110990533A (zh) 确定查询文本所对应标准文本的方法及装置
CN113704507B (zh) 数据处理方法、计算机设备以及可读存储介质
KR101545050B1 (ko) 정답 유형 자동 분류 방법 및 장치, 이를 이용한 질의 응답 시스템
CN113326702A (zh) 语义识别方法、装置、电子设备及存储介质
US20140365494A1 (en) Search term clustering
CN114004236B (zh) 融入事件实体知识的汉越跨语言新闻事件检索方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19931573

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19931573

Country of ref document: EP

Kind code of ref document: A1