CN110491383A - A kind of voice interactive method, device, system, storage medium and processor - Google Patents


Info

Publication number
CN110491383A
CN110491383A (application CN201910910484.6A)
Authority
CN
China
Prior art keywords
target
voice
result
speech recognition
recognition result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910910484.6A
Other languages
Chinese (zh)
Other versions
CN110491383B (en)
Inventor
陈孝良
丁玉江
李智勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sound Intelligence Technology Co Ltd
Beijing SoundAI Technology Co Ltd
Original Assignee
Beijing Sound Intelligence Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sound Intelligence Technology Co Ltd filed Critical Beijing Sound Intelligence Technology Co Ltd
Priority to CN201910910484.6A priority Critical patent/CN110491383B/en
Publication of CN110491383A publication Critical patent/CN110491383A/en
Application granted granted Critical
Publication of CN110491383B publication Critical patent/CN110491383B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/04 Details of speech synthesis systems, e.g. synthesiser structure or memory management
    • G10L13/047 Architecture of speech synthesisers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/32 Multiple recognisers used in sequence or in parallel; Score combination systems therefor, e.g. voting systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a voice interaction method, device, system, storage medium and processor. The method comprises: obtaining an input voice stream, distributing the input voice stream to each speech recognition engine for speech recognition, and selecting a target speech recognition result from the speech recognition results obtained; distributing the target speech recognition result to each natural language processing engine, and selecting a target semantic processing result from the semantic processing results obtained; and replying to the input voice stream according to the target semantic processing result. In the above method, the target speech recognition result is filtered out of the speech recognition results and distributed to multiple natural language processing engines, and the target semantic processing result is chosen from the semantic processing results obtained. This avoids the problem that a voice interaction flow using a single ASR, NLP and TTS is quite limited, and that inaccurate ASR and/or NLP recognition affects the voice interaction.

Description

Voice interaction method, device, system, storage medium and processor
Technical field
The present invention relates to the field of human-computer interaction technology, and more particularly to a voice interaction method, device, system, storage medium and processor.
Background technique
During voice interaction, a smart speaker collects the input voice data; automatic speech recognition (ASR) converts it to text, which is sent to natural language processing (NLP); after semantic understanding, a reply is synthesized with text-to-speech (TTS) technology and returned to the device side for playback.
The existing voice interaction flow processes the input voice stream with a single ASR, a single NLP and a single TTS, which is quite limiting: if the early ASR recognition is inaccurate, it affects the NLP understanding, and if either the ASR recognition or the NLP understanding is poor, the entire voice interaction process suffers.
Summary of the invention
In view of this, the present invention provides a voice interaction method and device to solve the problem that the existing voice interaction flow mostly uses a single ASR, NLP and TTS and is therefore quite limited: inaccurate early ASR recognition affects the NLP understanding, and insufficient NLP understanding likewise affects the entire voice interaction process. The specific scheme is as follows:
A voice interaction method, comprising:
obtaining an input voice stream, distributing the input voice stream to each target speech recognition engine for speech recognition, and obtaining each speech recognition result;
selecting a target speech recognition result from the speech recognition results;
distributing the target speech recognition result to each target natural language processing engine to obtain each semantic processing result;
selecting a target semantic processing result from the semantic processing results;
replying to the input voice stream according to the target semantic processing result.
In the above method, optionally, selecting a target speech recognition result from the speech recognition results comprises:
obtaining the recognition rate of each speech recognition result;
taking the recognition result with the highest recognition rate as the target recognition result.
In the above method, optionally, selecting a target semantic processing result from the semantic processing results comprises:
obtaining the confidence of each semantic processing result;
taking the semantic processing result with the highest confidence as the target semantic processing result.
In the above method, optionally, replying to the input voice stream according to the target semantic processing result comprises:
obtaining a target reply matching the target semantic processing result and determining the user group that produced the input voice stream;
determining a target speech synthesis engine according to the user group;
converting the target reply into an output voice stream by the target speech synthesis engine.
In the above method, optionally, determining the user group that produced the input voice stream comprises:
obtaining the type of the target speech recognition engine that produced the target speech recognition result and/or a face and voice recognition result;
determining the user group according to the type and/or the face and voice recognition result.
A voice interaction device, comprising:
an acquisition and recognition module, configured to obtain an input voice stream, distribute the input voice stream to each target speech recognition engine for speech recognition, and obtain each speech recognition result;
a speech recognition result selection module, configured to select a target speech recognition result from the speech recognition results;
a processing module, configured to distribute the target speech recognition result to each target natural language processing engine to obtain each semantic processing result;
a processing result selection module, configured to select a target semantic processing result from the semantic processing results;
a reply module, configured to reply to the input voice stream according to the target semantic processing result.
In the above device, optionally, the reply module comprises:
an acquisition and determination unit, configured to obtain a target reply matching the target semantic processing result and determine the user group that produced the input voice stream;
a determination unit, configured to determine a target speech synthesis engine according to the user group;
a conversion unit, configured to convert the target reply into an output voice stream by the target speech synthesis engine.
A voice interaction system, comprising: a cloud server, a speech recognition module, a semantic processing module, a skill module, a speech synthesis module and a smart voice terminal, wherein
the cloud server is configured to obtain the input voice stream collected by the smart voice terminal and distribute the input voice stream to the speech recognition module for speech recognition to obtain a target speech recognition result;
the speech recognition module sends the target speech recognition result to the cloud server, and the cloud server sends the target speech recognition result to the semantic processing module to obtain a target semantic processing result;
the semantic processing module sends the target semantic processing result to the cloud server, and the cloud server sends the target semantic processing result to the skill module to obtain a target reply;
the skill module sends the target reply to the cloud server, and the cloud server sends the target reply to the speech synthesis module to obtain an output voice stream;
the speech synthesis module sends the output voice stream to the cloud server, and the cloud server sends the output voice stream to the smart voice terminal for playback.
A storage medium, comprising a stored program, wherein the program executes the above voice interaction method.
A processor, configured to run a program, wherein the program, when run, executes the above voice interaction method.
Compared with the prior art, the present invention has the following advantages:
The invention discloses a voice interaction method, device, system, storage medium and processor. The method comprises: obtaining an input voice stream, distributing the input voice stream to each speech recognition engine for speech recognition, and selecting a target speech recognition result from the speech recognition results obtained; distributing the target speech recognition result to each natural language processing engine, and selecting a target semantic processing result from the semantic processing results obtained; and replying to the input voice stream according to the target semantic processing result. In the above method, the target speech recognition result is filtered out of the speech recognition results and distributed to multiple natural language processing engines, and the target semantic processing result is chosen from the semantic processing results obtained, avoiding the problem that a voice interaction flow using a single ASR, NLP and TTS is quite limited and that inaccurate ASR and/or NLP recognition affects the voice interaction.
Of course, any product implementing the present invention does not necessarily need to achieve all of the advantages described above at the same time.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flow chart of a voice interaction method disclosed in an embodiment of the present application;
Fig. 2 is another flow chart of a voice interaction method disclosed in an embodiment of the present application;
Fig. 3 is a structural block diagram of a voice interaction system disclosed in an embodiment of the present application;
Fig. 4 is a structural block diagram of a voice interaction device disclosed in an embodiment of the present application.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The foregoing description of the disclosed embodiments enables those skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The invention discloses a voice interaction method and device, applied in the voice interaction process. In the existing voice interaction process, the input voice stream is handled by a single ASR, NLP and TTS; if the ASR speech recognition result and/or the NLP natural language processing result deviates significantly from the actual result, irrelevant answers may occur, affecting the voice interaction process. To solve the above problem, the present invention provides a voice interaction method whose execution flow is shown in Fig. 1 and comprises the steps of:
S101, obtaining an input voice stream, distributing the input voice stream to each target speech recognition engine for speech recognition, and obtaining each speech recognition result;
In the embodiment of the present invention, the input voice stream is obtained from a smart voice device, which may be a smart speaker, a smart voice robot, a smartphone, etc. The smart voice device collects the voice uttered by the user and converts it into an input voice stream, and the input voice stream is distributed to each target speech recognition engine for recognition to obtain each speech recognition result.
The distribution process is illustrated as follows. Suppose the system includes 10 speech recognition engines; the number of target speech recognition engines may then be less than or equal to 10. For example, all 10 speech recognition engines may be taken as target speech recognition engines, i.e. the number of target speech recognition engines equals the number of speech recognition engines, and the input voice stream is distributed to all 10 target speech recognition engines for speech recognition. However, this approach is demanding on the processor; when the processor configuration cannot meet the requirement, speech recognition becomes slow, which in turn affects the voice interaction process and degrades the user experience. Therefore, to improve the speed of speech recognition, the type of the input voice stream may be obtained before distribution, and the 10 speech recognition engines are screened according to the type to obtain no fewer than two target speech recognition engines; the number of target speech recognition engines is then less than or equal to 10. The type may be divided according to the actual scene or a vertical sub-field. For example, classification may be by language, by professional domain, or by other scenes. Classification by language may distinguish Chinese and foreign languages; Chinese may be subdivided into Mandarin and dialects, with dialects further subdivided as the situation requires, and the foreign language may be English, Japanese, Korean, etc. Classification by professional domain may distinguish, for example, the computer field, the communications field or the machinery field, each of which may be further subdivided as the situation requires; details are not repeated here. Other partition formats may of course also be included, and in the embodiment of the present invention the concrete form of the type is not limited.
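The screening described above can be sketched as a filter over an engine registry keyed by type. Everything below (the `EngineInfo` record, `select_engines`, the example registry) is an illustrative assumption, not part of the patent; it only shows the shape of the step: match engines to the input stream's type and guarantee at least two targets.

```python
# A minimal sketch of the engine-screening step described above.
# The registry, dataclass and function names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class EngineInfo:
    name: str
    types: set = field(default_factory=set)  # e.g. {"mandarin", "english"}

def select_engines(registry, stream_type, minimum=2):
    """Screen the engine registry by the input stream's type.

    Returns at least `minimum` engines: every engine whose declared types
    include the stream type, topped up with other engines if needed.
    """
    matched = [e for e in registry if stream_type in e.types]
    if len(matched) < minimum:
        # Fall back to remaining engines to guarantee no fewer than two targets.
        rest = [e for e in registry if e not in matched]
        matched += rest[: minimum - len(matched)]
    return matched

registry = [
    EngineInfo("asr_mandarin_a", {"mandarin"}),
    EngineInfo("asr_cantonese", {"cantonese"}),
    EngineInfo("asr_english", {"english"}),
    EngineInfo("asr_general", {"mandarin", "cantonese", "english"}),
]

targets = select_engines(registry, "cantonese")
print([e.name for e in targets])  # → ['asr_cantonese', 'asr_general']
```

The fallback branch reflects the "no fewer than two target engines" requirement in the text: even when the type matches only one (or no) specialized engine, at least two engines still receive the stream.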
S102, selecting a target speech recognition result from the speech recognition results;
In the embodiment of the present invention, each target speech recognition engine outputs, together with the recognition result corresponding to the input voice stream, the recognition rate of that result. The recognition rate may vary with factors such as the signal-to-noise ratio and online/offline recognition; therefore, after obtaining the signal-to-noise ratio of the input voice stream, whether the target speech recognition engine is online, and other factors affecting the recognition rate, the recognition rate of the input voice stream under the corresponding target speech recognition engine is determined.
In practice, the usual direct indicator of the recognition rate is the word error rate (WER), defined as follows: to make the recognized word sequence consistent with the reference word sequence, certain words need to be substituted, deleted or inserted; the total number of these substituted, deleted or inserted words, divided by the total number of words in the reference word sequence and expressed as a percentage, is the WER.
The formulas are as follows:

WER = (S + D + I) / N × 100% (1)

Accuracy = 100% - WER (2)

where: S is the number of substituted words;
D is the number of deleted words;
I is the number of inserted words;
N is the total number of words;
WER is the word error rate;
Accuracy is the recognition rate.
Note that WER can be broken down by gender, speaking rate, accent, digits/English/Chinese, and so on. Because of inserted words, WER can in theory exceed 100%; in practice, however, particularly with large sample sizes, this does not happen, as a system that poor could not be commercialized.
Further, the sentence error rate (SER) can be used, i.e. the number of incorrectly recognized sentences divided by the total number of sentences. In actual operation, however, the sentence error rate is generally 2 to 3 times the word error rate, so it is not usually used to measure the recognition process.
In the embodiment of the present invention the recognition rate is used as the reference: the recognition rate of each speech recognition result is first calculated, and the speech recognition result with the highest recognition rate is taken as the target speech recognition result.
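Formula (1) above is the standard Levenshtein-based word error rate. The sketch below assumes a reference transcript is available, as in offline evaluation (at run time the engines report their own rates); it computes WER by edit-distance alignment and keeps the hypothesis with the highest recognition rate, i.e. the lowest WER. Function names are illustrative.

```python
# Word error rate via Levenshtein alignment (substitutions, deletions,
# insertions), matching WER = (S + D + I) / N x 100%, and selection of
# the result with the highest Accuracy = 100 - WER.

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edit operations turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)

def pick_target_result(reference, hypotheses):
    # Highest recognition rate is equivalent to lowest WER.
    return min(hypotheses, key=lambda h: wer(reference, h))

ref = "turn on the living room air conditioner"
hyps = ["turn on the living room air conditioner",
        "turn on the living room conditioner",
        "turn off the living room air conditioner"]
print(pick_target_result(ref, hyps))  # → turn on the living room air conditioner
```

Because insertions are counted, `wer` can exceed 100 for a hypothesis much longer than the reference, which matches the remark in the text.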
S103, distributing the target speech recognition result to each target natural language processing engine to obtain each semantic processing result;
In the embodiment of the present invention, the target speech recognition result is distributed to each target natural language processing engine. The distribution process is illustrated as follows. Suppose the system includes 10 natural language processing engines; the number of target natural language processing engines is then less than or equal to 10. For example, all 10 natural language processing engines may be taken as target natural language processing engines, i.e. the number of target natural language processing engines equals the number of natural language processing engines. However, this approach is demanding on the processor; when the processor configuration cannot meet the requirement, processing becomes slow, which in turn affects the voice interaction process and degrades the user experience. Therefore, to improve the speed of voice interaction, the class of the target recognition result may be determined before the target recognition result is distributed to each target natural language processing engine. The class may be determined according to the actual scene or a vertical sub-field: classification may be by language, by professional domain, or by other scenes. Classification by language may distinguish Chinese and foreign languages; Chinese may be subdivided into Mandarin and dialects, with dialects further subdivided as the situation requires, and the foreign language may be English, Japanese, Korean, etc. Classification by professional domain may distinguish, for example, the computer field, the communications field or the machinery field, each of which may be further subdivided as the situation requires; details are not repeated here. Other partition formats may of course also be included, and in the embodiment of the present invention the concrete form of the class is not limited. Preferably, there is a correspondence between the classes of the target speech recognition engines and those of the target natural language processing engines. For example, if the target speech recognition result was obtained by a target speech recognition engine for a dialect, it can be distributed directly to the target natural language processing engine for that dialect.
S104, selecting a target semantic processing result from the semantic processing results;
In the embodiment of the present invention, each target natural language processing engine outputs, together with the target semantic processing result corresponding to the target speech recognition result, the confidence of that result. Taking Baidu's NLP semantic computing framework as an example of a target natural language processing engine, it consists mainly of three parts: the bottom layer relies on big data, web data and user behavior data together with high-performance computing clusters (GPU, CPU and FPGA), on which a target natural language processing engine based on DNNs and probabilistic graphical models is built. By feeding the target speech recognition result into the target natural language processing engine, a target semantic processing result can be obtained, where the target semantic processing result is the reply text for the input voice stream. Semantic-level computation is then performed on the semantic processing result, including semantic matching, semantic retrieval, text classification, sequence generation and sequence labeling, so as to determine the confidence of the semantic processing result. Because different target natural language processing engines determine confidence differently, the confidences may not be directly comparable; the confidences are therefore normalized or otherwise processed before comparison, and the semantic processing result with the highest confidence is taken as the target semantic processing result.
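Since each engine scores confidence on its own scale, the comparison in S104 only makes sense after normalization. A minimal sketch under the assumption that each engine's score range is known from calibration; the ranges, names and example results here are invented for illustration.

```python
# Min-max normalization of per-engine confidences onto [0, 1] before
# comparison, then argmax. The engine score ranges are assumed
# calibration data, invented for illustration.

def normalize(score, lo, hi):
    return (score - lo) / (hi - lo) if hi > lo else 0.0

def pick_target_semantic_result(candidates):
    """candidates: list of (engine_name, result_text, raw_confidence,
    (engine_min, engine_max)) tuples; returns the winning result text."""
    best = max(candidates, key=lambda c: normalize(c[2], *c[3]))
    return best[1]

candidates = [
    # engine A reports confidence on a 0..100 scale
    ("nlp_a", "open living-room AC", 80.0, (0.0, 100.0)),
    # engine B reports confidence on a 0..1 scale
    ("nlp_b", "open bedroom AC", 0.70, (0.0, 1.0)),
]
print(pick_target_semantic_result(candidates))  # → open living-room AC
```

Without normalization the raw comparison 80.0 > 0.70 would be meaningless; after mapping both onto [0, 1] (0.8 vs 0.7) the comparison reflects each engine's own scale, which is the point of the normalization step in the text.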
S105, replying to the input voice stream according to the target semantic processing result.
In the embodiment of the present invention, text-to-speech (TTS) technology is used to convert the textual target semantic processing result into an output voice stream, which is read aloud by the smart voice device, analogous to a human mouth. For example, the sound heard from voice assistants such as Siri is generated by TTS.
The invention discloses a voice interaction method, comprising: obtaining an input voice stream, distributing the input voice stream to each speech recognition engine for speech recognition, and selecting a target speech recognition result from the speech recognition results obtained; distributing the target speech recognition result to each natural language processing engine, and selecting a target semantic processing result from the semantic processing results obtained; and replying to the input voice stream according to the target semantic processing result. In the above method, the target speech recognition result is filtered out of the speech recognition results and distributed to multiple natural language processing engines, and the target semantic processing result is chosen from the semantic processing results obtained, avoiding the problem that a voice interaction flow using a single ASR, NLP and TTS is quite limited and that inaccurate ASR and/or NLP recognition affects the voice interaction.
In the embodiment of the present invention, the process of replying to the input voice stream according to the target semantic processing result is shown in Fig. 2 and comprises the steps of:
S201, obtaining a target reply matching the target semantic processing result and determining the user group that produced the input voice stream;
In the embodiment of the present invention, keywords in the target semantic processing result are obtained, the skill unit corresponding to the target semantic processing result is determined according to the keywords, and the target reply fed back by the skill unit after handling the target speech recognition result is received. The type of the target speech recognition engine that produced the target speech recognition result and/or a face and voice recognition result is obtained, and the user group that produced the input voice stream is determined according to the type and/or the face and voice recognition result. The user group may be male or female, old or young, a family member, or a speaker of a certain dialect or language, etc.
S202, determining a target speech synthesis engine according to the user group;
In the embodiment of the present invention, the selection of the speech synthesis engine may also be divided in combination with the actual scene or a vertical sub-field, and the target speech synthesis engine is determined according to the user group. For example, the target speech synthesis engines may be classified by language into Chinese and foreign languages; Chinese may be subdivided into Mandarin and dialects, with dialects further subdivided as the situation requires, and the foreign language may be English, Japanese, Korean, etc. In the embodiment of the present invention, the concrete form of classification is not limited. For example, if the user group is speakers of a dialect, the target speech recognition engine may be one corresponding to that dialect type, and the speech synthesis engine for that dialect type can then be selected directly as the target speech synthesis engine.
S203, converting the target reply into an output voice stream by the target speech synthesis engine.
In the embodiment of the present invention, the target reply is converted into an output voice stream by the target speech synthesis engine; different types of target speech synthesis engine reply in different ways. The target speech synthesis engine may also rely on face recognition to identify a user profile. For example, the smart voice terminal recognizes by face recognition that the received input voice stream is a mother's words, and learns from the history or from configured reply rules that the mother most wants to hear her son's voice; the target speech synthesis engine can then send the target reply to the smart voice terminal in the son's voice. Of course, depending on the situation, the target reply may also be sent to the smart voice terminal in English, in a dialect, or in other manners.
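The choice of synthesis engine by user group can be sketched as a lookup table. The mapping, the engine names and the "son's voice" rule below are illustrative assumptions mirroring the example in the text, not part of the patent.

```python
# Sketch of choosing a TTS engine/voice from the inferred user group.
# The mapping table and engine names are illustrative assumptions.

VOICE_BY_GROUP = {
    "cantonese_speaker": "tts_cantonese",
    "english_speaker": "tts_english",
    "mother": "tts_son_voice",   # family-member rule learned from history
}
DEFAULT_VOICE = "tts_mandarin"   # fallback when no rule matches

def choose_tts_engine(user_group):
    return VOICE_BY_GROUP.get(user_group, DEFAULT_VOICE)

def synthesize(target_reply, user_group):
    engine = choose_tts_engine(user_group)
    # A real engine would return audio; here we return a tagged string.
    return f"[{engine}] {target_reply}"

print(synthesize("OK, the living-room air conditioner is now on", "mother"))
# → [tts_son_voice] OK, the living-room air conditioner is now on
```

The fallback default reflects that the text leaves the classification open-ended: an unrecognized user group still receives a reply, just in the standard voice.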
Based on the above voice interaction method, an embodiment of the present invention provides a voice interaction system, whose structural block diagram is shown in Fig. 3, comprising: a cloud server 301, a speech recognition module 302, a semantic processing module 303, a skill module 304, a speech synthesis module 305 and a smart voice terminal 306, wherein
the cloud server 301 is configured to obtain the input voice stream collected by the smart voice terminal 306 and distribute the input voice stream to the speech recognition module 302 for speech recognition to obtain a target speech recognition result;
In the embodiment of the present invention, the speech recognition module 302 includes multiple speech recognition engines. Preferably, to improve recognition efficiency, the multiple speech recognition engines can first be screened during speech recognition to obtain multiple target speech recognition engines; speech recognition is performed by the multiple target speech recognition engines, and the speech recognition result with the highest recognition rate is selected from the speech recognition results obtained as the target speech recognition result.
the speech recognition module 302 sends the target speech recognition result to the cloud server 301, and the cloud server 301 sends the target speech recognition result to the semantic processing module 303 to obtain a target semantic processing result;
In the embodiment of the present invention, the speech recognition module 303 includes multiple natural language processing engines, it is preferred that is Example improves treatment effeciency, can screen, obtain more to multiple natural language processing engines during natural language processing The target voice recognition result is sent to multiple target natural language processing processing and drawn by a target natural language processing engine It holds up, the highest semantic processes result of confidence level is chosen in obtained multiple semantic processes results as target semantic processes knot Fruit.
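The confidence-based selection over several NLU engines mirrors the recognition step. In this sketch the parse dictionary shape (`domain`, `intent`, `confidence`) and the stand-in engines are assumed for illustration:

```python
# Sketch: distribute one recognition result to several natural language
# processing engines and keep the parse with the highest confidence.
def best_semantic_result(text: str, nlu_engines) -> dict:
    parses = [engine(text) for engine in nlu_engines]
    return max(parses, key=lambda p: p["confidence"])

# Stand-in NLU engines returning fixed parses for demonstration.
nlu_a = lambda t: {"domain": "aircon", "intent": "turn_on", "confidence": 0.92}
nlu_b = lambda t: {"domain": "music", "intent": "play", "confidence": 0.31}

target = best_semantic_result("open the living room air conditioner",
                              [nlu_a, nlu_b])
print(target["domain"])
```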
The semantic processing module 303 sends the target semantic processing result to the cloud server 301, and the cloud server 301 sends the target semantic processing result to the skill module 304, obtaining a target reply.
In the embodiment of the present invention, the skill module 304 processes the target semantic processing result according to the specific situation: if a reply to the intelligent voice terminal 306 is required, the returned result is the target reply; if it is a control instruction, processing continues within the skill module 304. This embodiment of the invention illustrates the case where the returned result is the target reply. For example, the user says "open the air conditioner in the living room"; the target speech recognition result is "open the air conditioner in the living room", which after natural language understanding is translated into "the domain is air conditioning, the instruction is to open, and the specific location is the living room". According to the domain, the cloud server 301 distributes the result to the skill corresponding to air conditioning in the skill module 304; the air-conditioning skill then turns on the living-room air conditioner through control according to the instruction and the location, and returns the target reply on success, for example "OK, the living-room air conditioner is now on".
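The domain-based routing in the air-conditioner example can be sketched as a small dispatch table. The skill names, slot names, and reply wording here are illustrative assumptions, not the patent's actual implementation:

```python
# Illustrative skill dispatch: route the parsed semantic result to a skill
# by domain; the air-conditioner skill acts on the slots and returns a reply.
def aircon_skill(intent: str, location: str) -> str:
    if intent == "turn_on":
        # A real skill would issue a control command to the device here.
        return f"OK, the {location} air conditioner is now on."
    return "Sorry, I can't do that yet."

SKILLS = {"aircon": aircon_skill}

def dispatch(semantic_result: dict) -> str:
    """Look up the skill for the parsed domain and invoke it with the slots."""
    skill = SKILLS[semantic_result["domain"]]
    return skill(semantic_result["intent"], semantic_result["location"])

reply = dispatch({"domain": "aircon", "intent": "turn_on",
                  "location": "living room"})
print(reply)
```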
The skill module 304 sends the target reply to the cloud server 301, and the cloud server 301 sends the target reply to the speech synthesis module 305, obtaining an output voice stream;
The speech synthesis module 305 sends the output voice stream to the cloud server 301, and the cloud server 301 sends the output voice stream to the intelligent voice terminal 306 to be played.
Based on the above voice interaction method, an embodiment of the present invention provides a voice interaction device. The structural block diagram of the interaction device is shown in Figure 4 and comprises:
an acquisition and identification module 401, a speech recognition result choosing module 402, a processing module 403, a processing result choosing module 404 and a reply module 405.
Wherein,
The acquisition and identification module 401 is configured to obtain an input voice stream and distribute the input voice stream to each target speech recognition engine for speech recognition, obtaining each speech recognition result;
the speech recognition result choosing module 402 is configured to choose a target speech recognition result among the speech recognition results;
the processing module 403 is configured to distribute the target speech recognition result to each target natural language processing engine, obtaining each semantic processing result;
the processing result choosing module 404 is configured to choose a target semantic processing result among the semantic processing results;
the reply module 405 is configured to reply to the input voice stream according to the target semantic processing result.
The invention discloses a voice interaction device, which: obtains an input voice stream and distributes it to each speech recognition engine for speech recognition, choosing a target speech recognition result among the obtained speech recognition results; distributes the target speech recognition result to each natural language processing engine, choosing a target semantic processing result among the obtained semantic processing results; and replies to the input voice stream according to the target semantic processing result. In the above device, the target speech recognition result is filtered out of the speech recognition results and distributed to multiple natural language processing engines, and the target semantic processing result is chosen among the obtained semantic processing results. This avoids the problem that a voice interaction process handled by a single ASR, NLP and TTS is more limited and, if the ASR and/or NLP recognition is inaccurate, the voice interaction is affected.
In the embodiment of the present invention, the reply module 405 includes:
an acquisition and determination unit 406, a determination unit 407 and a converting unit 408.
Wherein,
The acquisition and determination unit 406 is configured to obtain the target reply matched with the target semantic processing result and determine the user group that produced the input voice stream;
the determination unit 407 is configured to determine a target speech synthesis engine according to the user group;
the converting unit 408 is configured to convert the target reply into an output voice stream through the target speech synthesis engine.
The voice interaction device includes a processor and a memory. The above acquisition and identification module, speech recognition result choosing module, processing module, processing result choosing module, reply module and so on are all stored in the memory as program units, and the processor executes the program units stored in the memory to realize the corresponding functions.
The processor contains a kernel, and the kernel fetches the corresponding program unit from the memory. One or more kernels may be provided. The target speech recognition result is filtered out of the speech recognition results and distributed to multiple natural language processing engines, and the target semantic processing result is chosen among the semantic processing results, which avoids the problem that a voice interaction process using a single ASR, NLP and TTS is more limited and, if the ASR and/or NLP recognition is inaccurate, the whole voice interaction process is affected.
The memory may include non-volatile memory in a computer-readable medium, random access memory (RAM) and/or other forms of non-volatile storage, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one storage chip.
An embodiment of the invention provides a storage medium on which a program is stored; the program implements the voice interaction method when executed by a processor.
An embodiment of the invention provides a processor configured to run a program, wherein the voice interaction method is executed when the program runs.
An embodiment of the invention provides a device including a processor, a memory, and a program stored in the memory and runnable on the processor; when executing the program, the processor performs the following steps:
obtaining an input voice stream, distributing the input voice stream to each target speech recognition engine for speech recognition, and obtaining each speech recognition result;
choosing a target speech recognition result among the speech recognition results;
distributing the target speech recognition result to each target natural language processing engine and obtaining each semantic processing result;
choosing a target semantic processing result among the semantic processing results;
replying to the input voice stream according to the target semantic processing result.
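The five steps above compose naturally into a single pipeline. This is a compact sketch under assumed engine and result shapes (the `rate` and `confidence` fields and the stand-in engines are illustrative):

```python
# Sketch of the five claimed steps as one pipeline: recognize with several
# engines, pick the best text, parse with several NLU engines, pick the best
# semantics, then reply. All engines here are stand-ins.
def interact(audio, asr_engines, nlu_engines, reply_fn):
    asr_results = [e(audio) for e in asr_engines]               # step 1: recognize
    text = max(asr_results, key=lambda r: r["rate"])["text"]    # step 2: choose text
    parses = [e(text) for e in nlu_engines]                     # step 3: parse
    semantics = max(parses, key=lambda p: p["confidence"])      # step 4: choose parse
    return reply_fn(semantics)                                  # step 5: reply

asr = [lambda a: {"text": "hello", "rate": 0.9},
       lambda a: {"text": "hallo", "rate": 0.6}]
nlu = [lambda t: {"intent": "greet", "confidence": 0.95}]

answer = interact(b"...", asr, nlu, lambda s: f"intent={s['intent']}")
print(answer)
```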
Optionally, in the above method, choosing the target speech recognition result among the speech recognition results comprises:
obtaining the recognition rate of each speech recognition result;
taking the recognition result with the highest recognition rate among the recognition rates as the target recognition result.
Optionally, in the above method, choosing the target semantic processing result among the semantic processing results comprises:
obtaining the confidence of each semantic processing result;
taking the semantic processing result with the highest confidence among the confidences as the target semantic processing result.
Optionally, in the above method, replying to the input voice stream according to the target semantic processing result comprises:
obtaining the target reply matched with the target semantic processing result and determining the user group that produced the input voice stream;
determining a target speech synthesis engine according to the user group;
converting the target reply into an output voice stream through the target speech synthesis engine.
Optionally, in the above method, determining the user group that produced the input voice stream comprises:
obtaining the type of the target speech recognition engine that identified the target speech recognition result and/or a face recognition result;
determining the user group according to the type and/or the face recognition result.
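A minimal sketch of how these two signals might be combined; the priority order (face result over engine type) and the engine-type-to-group mapping are assumptions made for illustration:

```python
# Sketch: determine the user group from the type of the winning recognition
# engine and/or a face-recognition result. The mapping is an assumed example.
from typing import Optional

def determine_user_group(engine_type: str, face_result: Optional[str]) -> str:
    if face_result is not None:  # assume face recognition takes priority
        return face_result
    # Fall back on the engine type, e.g. a child-speech model implies a child.
    return {"child_model": "child", "elderly_model": "elderly"}.get(
        engine_type, "general")

print(determine_user_group("adult_model", "mother"))
print(determine_user_group("child_model", None))
```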
The device herein may be a server, a PC, a PAD, a mobile phone, etc.
The present invention also provides a computer program product which, when executed on a data processing device, is adapted to execute a program performing the following method steps:
obtaining an input voice stream, distributing the input voice stream to each target speech recognition engine for speech recognition, and obtaining each speech recognition result;
choosing a target speech recognition result among the speech recognition results;
distributing the target speech recognition result to each target natural language processing engine and obtaining each semantic processing result;
choosing a target semantic processing result among the semantic processing results;
replying to the input voice stream according to the target semantic processing result.
Optionally, in the above method, choosing the target speech recognition result among the speech recognition results comprises:
obtaining the recognition rate of each speech recognition result;
taking the recognition result with the highest recognition rate among the recognition rates as the target recognition result.
Optionally, in the above method, choosing the target semantic processing result among the semantic processing results comprises:
obtaining the confidence of each semantic processing result;
taking the semantic processing result with the highest confidence among the confidences as the target semantic processing result.
Optionally, in the above method, replying to the input voice stream according to the target semantic processing result comprises:
obtaining the target reply matched with the target semantic processing result and determining the user group that produced the input voice stream;
determining a target speech synthesis engine according to the user group;
converting the target reply into an output voice stream through the target speech synthesis engine.
Optionally, in the above method, determining the user group that produced the input voice stream comprises:
obtaining the type of the target speech recognition engine that identified the target speech recognition result and/or a face recognition result;
determining the user group according to the type and/or the face recognition result.
It should be noted that all the embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may refer to each other. Since the device-class embodiments are basically similar to the method embodiments, their description is relatively simple, and for relevant details reference may be made to the description of the method embodiments.
Finally, it should be noted that, herein, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
For convenience of description, the above apparatus is described as being divided into various units by function. Of course, when implementing the present invention, the functions of the units may be realized in one or more pieces of software and/or hardware.
As can be seen from the above description of the embodiments, those skilled in the art will clearly understand that the present invention can be realized by means of software plus a necessary general hardware platform. Based on this understanding, the essence of the technical solution of the present invention, or the part contributing to the prior art, can be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as a ROM/RAM, magnetic disk or optical disc, and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments, or in certain parts of the embodiments, of the present invention.
The voice interaction method, device, system, storage medium and processor provided by the present invention have been described in detail above. Specific examples are used herein to illustrate the principle and implementation of the invention, and the above description of the embodiments is only intended to help understand the method of the invention and its core idea. Meanwhile, for those skilled in the art, there will be changes in the specific implementation and application scope according to the idea of the invention. In conclusion, the content of this specification should not be construed as limiting the invention.

Claims (10)

1. A voice interaction method, characterized by comprising:
obtaining an input voice stream, distributing the input voice stream to each target speech recognition engine for speech recognition, and obtaining each speech recognition result;
choosing a target speech recognition result among the speech recognition results;
distributing the target speech recognition result to each target natural language processing engine and obtaining each semantic processing result;
choosing a target semantic processing result among the semantic processing results;
replying to the input voice stream according to the target semantic processing result.
2. The method according to claim 1, characterized in that choosing the target speech recognition result among the speech recognition results comprises:
obtaining the recognition rate of each speech recognition result;
taking the recognition result with the highest recognition rate among the recognition rates as the target recognition result.
3. The method according to claim 1, characterized in that choosing the target semantic processing result among the semantic processing results comprises:
obtaining the confidence of each semantic processing result;
taking the semantic processing result with the highest confidence among the confidences as the target semantic processing result.
4. The method according to claim 1, characterized in that replying to the input voice stream according to the target semantic processing result comprises:
obtaining the target reply matched with the target semantic processing result and determining the user group that produced the input voice stream;
determining a target speech synthesis engine according to the user group;
converting the target reply into an output voice stream through the target speech synthesis engine.
5. The method according to claim 4, characterized in that determining the user group that produced the input voice stream comprises:
obtaining the type of the target speech recognition engine that identified the target speech recognition result and/or a face recognition result;
determining the user group according to the type and/or the face recognition result.
6. A voice interaction device, characterized by comprising:
an acquisition and identification module, configured to obtain an input voice stream and distribute the input voice stream to each target speech recognition engine for speech recognition, obtaining each speech recognition result;
a speech recognition result choosing module, configured to choose a target speech recognition result among the speech recognition results;
a processing module, configured to distribute the target speech recognition result to each target natural language processing engine, obtaining each semantic processing result;
a processing result choosing module, configured to choose a target semantic processing result among the semantic processing results;
a reply module, configured to reply to the input voice stream according to the target semantic processing result.
7. The device according to claim 6, characterized in that the reply module comprises:
an acquisition and determination unit, configured to obtain the target reply matched with the target semantic processing result and determine the user group that produced the input voice stream;
a determination unit, configured to determine a target speech synthesis engine according to the user group;
a converting unit, configured to convert the target reply into an output voice stream through the target speech synthesis engine.
8. A voice interaction system, characterized by comprising: a cloud server, a speech recognition module, a semantic processing module, a skill module, a speech synthesis module and an intelligent voice terminal, wherein
the cloud server is configured to obtain the input voice stream collected by the intelligent voice terminal and distribute the input voice stream to the speech recognition module for speech recognition, obtaining a target speech recognition result;
the speech recognition module sends the target speech recognition result to the cloud server, and the cloud server sends the target speech recognition result to the semantic processing module, obtaining a target semantic processing result;
the semantic processing module sends the target semantic processing result to the cloud server, and the cloud server sends the target semantic processing result to the skill module, obtaining a target reply;
the skill module sends the target reply to the cloud server, and the cloud server sends the target reply to the speech synthesis module, obtaining an output voice stream;
the speech synthesis module sends the output voice stream to the cloud server, and the cloud server sends the output voice stream to the intelligent voice terminal to be played.
9. A storage medium, characterized in that the storage medium includes a stored program, wherein the program executes the voice interaction method according to any one of claims 1 to 5.
10. A processor, characterized in that the processor is configured to run a program, wherein the voice interaction method according to any one of claims 1 to 5 is executed when the program runs.
CN201910910484.6A 2019-09-25 2019-09-25 Voice interaction method, device and system, storage medium and processor Active CN110491383B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910910484.6A CN110491383B (en) 2019-09-25 2019-09-25 Voice interaction method, device and system, storage medium and processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910910484.6A CN110491383B (en) 2019-09-25 2019-09-25 Voice interaction method, device and system, storage medium and processor

Publications (2)

Publication Number Publication Date
CN110491383A true CN110491383A (en) 2019-11-22
CN110491383B CN110491383B (en) 2022-02-18

Family

ID=68544152

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910910484.6A Active CN110491383B (en) 2019-09-25 2019-09-25 Voice interaction method, device and system, storage medium and processor

Country Status (1)

Country Link
CN (1) CN110491383B (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111798848A (en) * 2020-06-30 2020-10-20 联想(北京)有限公司 Voice synchronous output method and device and electronic equipment
CN111862949A (en) * 2020-07-30 2020-10-30 北京小米松果电子有限公司 Natural language processing method and device, electronic equipment and storage medium
CN111883122A (en) * 2020-07-22 2020-11-03 海尔优家智能科技(北京)有限公司 Voice recognition method and device, storage medium and electronic equipment
CN112003991A (en) * 2020-09-02 2020-11-27 深圳壹账通智能科技有限公司 Outbound method and related equipment
CN112509565A (en) * 2020-11-13 2021-03-16 中信银行股份有限公司 Voice recognition method and device, electronic equipment and readable storage medium
CN112614490A (en) * 2020-12-09 2021-04-06 北京罗克维尔斯科技有限公司 Method, device, medium, equipment, system and vehicle for generating voice instruction
CN112820295A (en) * 2020-12-29 2021-05-18 华人运通(上海)云计算科技有限公司 Voice processing device and system, cloud server and vehicle
CN112861542A (en) * 2020-12-31 2021-05-28 思必驰科技股份有限公司 Method and device for limiting scene voice interaction
CN112992151A (en) * 2021-03-15 2021-06-18 中国平安财产保险股份有限公司 Speech recognition method, system, device and readable storage medium
CN113077793A (en) * 2021-03-24 2021-07-06 北京儒博科技有限公司 Voice recognition method, device, equipment and storage medium
WO2021135548A1 (en) * 2020-06-05 2021-07-08 平安科技(深圳)有限公司 Voice intent recognition method and device, computer equipment and storage medium
CN113506565A (en) * 2021-07-12 2021-10-15 北京捷通华声科技股份有限公司 Speech recognition method, speech recognition device, computer-readable storage medium and processor
CN114446279A (en) * 2022-02-18 2022-05-06 青岛海尔科技有限公司 Voice recognition method, voice recognition device, storage medium and electronic equipment
CN114464179A (en) * 2022-01-28 2022-05-10 达闼机器人股份有限公司 Voice interaction method, system, device, equipment and storage medium
WO2022262542A1 (en) * 2021-06-15 2022-12-22 南京硅基智能科技有限公司 Text output method and system, storage medium, and electronic device

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5991719A (en) * 1998-04-27 1999-11-23 Fujistu Limited Semantic recognition system
CN101354886A (en) * 2007-07-27 2009-01-28 陈修志 Apparatus for recognizing speech
US20090258333A1 (en) * 2008-03-17 2009-10-15 Kai Yu Spoken language learning systems
CN101923854A (en) * 2010-08-31 2010-12-22 中国科学院计算技术研究所 Interactive speech recognition system and method
CN105096953A (en) * 2015-08-11 2015-11-25 东莞市凡豆信息科技有限公司 Voice recognition method capable of realizing multi-language mixed use
CN106373569A (en) * 2016-09-06 2017-02-01 北京地平线机器人技术研发有限公司 Voice interaction apparatus and method
CN106648082A (en) * 2016-12-09 2017-05-10 厦门快商通科技股份有限公司 Intelligent service device capable of simulating human interactions and method
CN107093425A (en) * 2017-03-30 2017-08-25 安徽继远软件有限公司 Speech guide system, audio recognition method and the voice interactive method of power system
CN107170446A (en) * 2017-05-19 2017-09-15 深圳市优必选科技有限公司 Semantic processes server and the method for semantic processes
US10049656B1 (en) * 2013-09-20 2018-08-14 Amazon Technologies, Inc. Generation of predictive natural language processing models
CN208284230U (en) * 2018-04-20 2018-12-25 贵州小爱机器人科技有限公司 A kind of speech recognition equipment, speech recognition system and smart machine
CN109545197A (en) * 2019-01-02 2019-03-29 珠海格力电器股份有限公司 Voice instruction identification method and device and intelligent terminal
US20190102378A1 (en) * 2017-09-29 2019-04-04 Apple Inc. Rule-based natural language processing
CN109727597A (en) * 2019-01-08 2019-05-07 未来电视有限公司 The interaction householder method and device of voice messaging
CN109791767A (en) * 2016-09-30 2019-05-21 罗伯特·博世有限公司 System and method for speech recognition


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
P VANAJAKSHI ET AL.: "A Detailed Survey on Large Vocabulary Continuous Speech Recognition Techniques", ICCCI 2017 *
LIU Yue et al.: "Application and Development of Speech Recognition Technology in the Vehicle Field", Control and Information Technology *

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021135548A1 (en) * 2020-06-05 2021-07-08 平安科技(深圳)有限公司 Voice intent recognition method and device, computer equipment and storage medium
CN111798848B (en) * 2020-06-30 2024-05-31 联想(北京)有限公司 Voice synchronous output method and device and electronic equipment
CN111798848A (en) * 2020-06-30 2020-10-20 联想(北京)有限公司 Voice synchronous output method and device and electronic equipment
CN111883122A (en) * 2020-07-22 2020-11-03 海尔优家智能科技(北京)有限公司 Voice recognition method and device, storage medium and electronic equipment
CN111883122B (en) * 2020-07-22 2023-10-27 海尔优家智能科技(北京)有限公司 Speech recognition method and device, storage medium and electronic equipment
CN111862949A (en) * 2020-07-30 2020-10-30 北京小米松果电子有限公司 Natural language processing method and device, electronic equipment and storage medium
CN111862949B (en) * 2020-07-30 2024-04-02 北京小米松果电子有限公司 Natural language processing method and device, electronic equipment and storage medium
CN112003991A (en) * 2020-09-02 2020-11-27 深圳壹账通智能科技有限公司 Outbound method and related equipment
CN112509565A (en) * 2020-11-13 2021-03-16 中信银行股份有限公司 Voice recognition method and device, electronic equipment and readable storage medium
CN112614490B (en) * 2020-12-09 2024-04-16 北京罗克维尔斯科技有限公司 Method, device, medium, equipment, system and vehicle for generating voice instruction
CN112614490A (en) * 2020-12-09 2021-04-06 北京罗克维尔斯科技有限公司 Method, device, medium, equipment, system and vehicle for generating voice instruction
CN112820295A (en) * 2020-12-29 2021-05-18 华人运通(上海)云计算科技有限公司 Voice processing device and system, cloud server and vehicle
CN112820295B (en) * 2020-12-29 2022-12-23 华人运通(上海)云计算科技有限公司 Voice processing device and system, cloud server and vehicle
CN112861542A (en) * 2020-12-31 2021-05-28 思必驰科技股份有限公司 Method and device for limiting scene voice interaction
CN112861542B (en) * 2020-12-31 2023-05-26 思必驰科技股份有限公司 Method and device for voice interaction in limited scene
CN112992151A (en) * 2021-03-15 2021-06-18 中国平安财产保险股份有限公司 Speech recognition method, system, device and readable storage medium
CN112992151B (en) * 2021-03-15 2023-11-07 中国平安财产保险股份有限公司 Speech recognition method, system, device and readable storage medium
CN113077793B (en) * 2021-03-24 2023-06-13 北京如布科技有限公司 Voice recognition method, device, equipment and storage medium
CN113077793A (en) * 2021-03-24 2021-07-06 北京儒博科技有限公司 Voice recognition method, device, equipment and storage medium
US11651139B2 (en) 2021-06-15 2023-05-16 Nanjing Silicon Intelligence Technology Co., Ltd. Text output method and system, storage medium, and electronic device
WO2022262542A1 (en) * 2021-06-15 2022-12-22 南京硅基智能科技有限公司 Text output method and system, storage medium, and electronic device
CN113506565A (en) * 2021-07-12 2021-10-15 北京捷通华声科技股份有限公司 Speech recognition method, speech recognition device, computer-readable storage medium and processor
CN113506565B (en) * 2021-07-12 2024-06-04 北京捷通华声科技股份有限公司 Speech recognition method, device, computer readable storage medium and processor
WO2023143439A1 (en) * 2022-01-28 2023-08-03 达闼机器人股份有限公司 Speech interaction method, system and apparatus, and device and storage medium
CN114464179A (en) * 2022-01-28 2022-05-10 达闼机器人股份有限公司 Voice interaction method, system, device, equipment and storage medium
CN114464179B (en) * 2022-01-28 2024-03-19 达闼机器人股份有限公司 Voice interaction method, system, device, equipment and storage medium
CN114446279A (en) * 2022-02-18 2022-05-06 青岛海尔科技有限公司 Voice recognition method, voice recognition device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN110491383B (en) 2022-02-18

Similar Documents

Publication Publication Date Title
CN110491383A (en) A kind of voice interactive method, device, system, storage medium and processor
CN106776936B (en) Intelligent interaction method and system
CN103345467B (en) Speech translation system
JP2021018797A (en) Conversation interaction method, apparatus, computer readable storage medium, and program
CN103456314B (en) A kind of emotion identification method and device
CN110148416A (en) Audio recognition method, device, equipment and storage medium
WO2019084810A1 (en) Information processing method and terminal, and computer storage medium
CN110459222A (en) Sound control method, phonetic controller and terminal device
WO2021114841A1 (en) User report generating method and terminal device
CN107591155A (en) Audio recognition method and device, terminal and computer-readable recording medium
US10108707B1 (en) Data ingestion pipeline
CN108932945A (en) A kind of processing method and processing device of phonetic order
CN108804609A (en) Song recommendation method and device
CN108735201A (en) continuous speech recognition method, device, equipment and storage medium
CN113051362B (en) Data query method, device and server
US20200265843A1 (en) Speech broadcast method, device and terminal
CN110162780A (en) The recognition methods and device that user is intended to
CN105893351B (en) Audio recognition method and device
CN109741735A (en) The acquisition methods and device of a kind of modeling method, acoustic model
US20220261545A1 (en) Systems and methods for producing a semantic representation of a document
CN110297893A (en) Natural language question-answering method, device, computer installation and storage medium
CN108804525A (en) A kind of intelligent Answering method and device
CN108763202A (en) Method, apparatus, equipment and the readable storage medium storing program for executing of the sensitive text of identification
CN109410934A (en) A kind of more voice sound separation methods, system and intelligent terminal based on vocal print feature
CN109739968A (en) A kind of data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant