CN116564290A - Multi-mode voice pause judging method and device - Google Patents
- Publication number
- CN116564290A (application number CN202310543706.1A)
- Authority
- CN
- China
- Prior art keywords
- information
- audio
- judged
- judging
- pause
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Abstract
The application discloses a multi-modal voice pause judging method and device. The multi-modal voice pause judging method comprises the following steps: acquiring audio to be judged; acquiring text information according to the audio to be judged; acquiring mood information and speech rate information according to the audio to be judged; obtaining a trained pause judging model; extracting and fusing features of the mood information, the speech rate information and the text information, thereby obtaining a fusion feature; and inputting the fusion feature into the pause judging model so as to obtain a pause judging result. The multi-modal voice pause judging method provided by the application comprehensively judges, through the text information, the mood information and the speech rate information, whether a piece of speech is a meaningful, complete and correctly understood sentence, and solves the problem in the prior art that the user's meaning is not fully understood, resulting in invalid interaction and a relatively poor user experience.
Description
Technical Field
The present disclosure relates to the field of speech pause recognition technologies, and in particular, to a multi-modal speech pause judging method and a multi-modal speech pause judging device.
Background
Voice interaction in the intelligent cabin is the entry point of intelligent voice, and the completeness and fluency of semantic interaction are of central importance. In practice, recognition either lags after the user has finished speaking, or the sentence is cut off early when the user pauses, so the returned semantic understanding is incomplete or wrong. For example, a user may say: "play + (long silence) + Liu Dehua's song", or "navigate me to + (silence) + Peking University". Natural pauses occur in user speech; if voice endpointing splits "play" off as one semantic unit and sends it for recognition, the understanding is incomplete and the system does not know what to play, and likewise "navigate me to" is split off on its own. If the silence threshold is instead set long enough to avoid this, the interaction delay becomes too long and the response is sluggish. Statistics show that in 5%-10% of cases the user has not finished speaking, yet a reply is given once the premature semantic understanding completes; the user's meaning is not fully understood, the interaction is invalid, and the user experience is relatively poor.
The traditional scheme uses VAD to decide the endpoint from the tail-silence duration: the silence duration determines when the sentence is considered ended, and a fixed threshold is generally set for the tail silence, typically 300 ms-500 ms. If the silence exceeds the threshold, the user is considered to have finished. But if the user's natural pause exceeds the threshold, the endpoint is decided prematurely and the semantics are incomplete; and if a larger threshold is set, the interaction is delayed and sluggish, because the system must wait for a sufficiently long tail silence before deciding.
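The fixed tail-silence threshold scheme above can be sketched in a few lines. This is an illustrative simplification, not the patent's VAD: the per-frame energies, the 400 ms threshold, and the energy floor are made-up values.

```python
def tail_silence_endpoint(frames, threshold_ms=400, frame_ms=10, energy_floor=0.01):
    """Declare end-of-utterance once trailing silence exceeds threshold_ms.

    frames: list of per-frame energies (floats). Simplified hypothetical VAD.
    Returns the frame index where the endpoint fires, or None if still waiting.
    """
    needed = threshold_ms // frame_ms   # number of consecutive silent frames
    silent = 0
    for i, energy in enumerate(frames):
        if energy < energy_floor:
            silent += 1
            if silent >= needed:
                return i                # endpoint decided at this frame
        else:
            silent = 0                  # speech resumed; reset the counter
    return None

# A natural 600 ms thinking pause triggers a premature endpoint:
speech = [0.5] * 30                     # "play..."
pause = [0.0] * 60                      # 600 ms mid-sentence pause
rest = [0.5] * 50                       # "...Liu Dehua's song"
print(tail_silence_endpoint(speech + pause + rest))  # → 69 (inside the pause)
```

Lowering `threshold_ms` makes this premature cut more likely; raising it adds latency to every turn, which is exactly the trade-off the background describes.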
It is therefore desirable to have a solution that solves or at least alleviates the above-mentioned drawbacks of the prior art.
Disclosure of Invention
The present invention is directed to a multi-mode voice pause judging method, which solves at least one of the above-mentioned problems.
The invention provides the following scheme:
according to one aspect of the present invention, there is provided a multi-modal speech pause judging method, the multi-modal speech pause judging method including:
acquiring audio to be judged;
acquiring text information according to the audio to be judged;
acquiring mood information and speech rate information according to the audio to be judged;
obtaining a trained pause judging model;
extracting and fusing features of the mood information, the speech rate information and the text information, thereby obtaining a fusion feature;
and inputting the fusion feature into the pause judging model so as to obtain a pause judging result.
Optionally, the obtaining the mood information according to the audio to be judged includes:
obtaining a trained mood classification model;
extracting the mood characteristics of the audio to be judged;
and inputting the mood characteristics of the audio to be judged into the mood classification model so as to acquire mood information.
Optionally, the acquiring the speech rate information according to the audio to be judged includes:
acquiring a trained speech rate classification model;
extracting the speech rate characteristics of the audio to be judged;
and inputting the speech rate characteristics of the audio to be judged into the speech rate classification model so as to acquire the speech rate information.
Optionally, the acquiring text information according to the audio to be judged includes:
and identifying text information of the audio to be judged through ASR.
Optionally, before the obtaining the trained pause judging model, the multi-modal speech pause judging method further includes:
acquiring a pause judging rule database, wherein the pause judging rule database comprises at least one pause judging rule;
judging whether the acquired text information accords with one of the pause judging rules in the database; and if not,
obtaining the trained pause judging model.
Optionally, the mood information includes thinking mood information, normal mood information, and mood information that cannot be judged.
Optionally, the speech rate information includes fast speech rate information, medium speed speech rate information, and slow speech rate information.
The application also provides a multi-mode voice pause judging device, which comprises:
the audio acquisition module to be judged is used for acquiring the audio to be judged;
the text information acquisition module is used for acquiring text information according to the audio to be judged;
the mood information acquisition module is used for acquiring mood information according to the audio to be judged;
the speech rate information acquisition module is used for acquiring speech rate information according to the audio to be judged;
the pause judgment model acquisition module is used for acquiring a trained pause judgment model;
the fusion module is used for extracting and fusing features of the mood information, the speech rate information and the text information so as to acquire a fusion feature;
and the result acquisition module is used for inputting the fusion feature into the pause judging model so as to acquire a pause judging result.
The application also provides an electronic device, which comprises: the device comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus; the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the multi-modal speech pause determination method as described above.
The present application also provides a computer readable storage medium storing a computer program executable by an electronic device, which when run on the electronic device causes the electronic device to perform the steps of the multi-modal speech pause judging method as described above.
The multi-modal voice pause judging method provided by the application comprehensively judges, through the text information, the mood information and the speech rate information, whether a piece of speech is a meaningful, complete and correctly understood sentence, and solves the problem in the prior art that the user's meaning is not fully understood, resulting in invalid interaction and a relatively poor user experience.
Drawings
FIG. 1 is a flow chart of a multi-modal speech pause determination method provided by one or more embodiments of the present invention.
Fig. 2 is a block diagram of an electronic device according to a multi-modal voice pause judging method according to one or more embodiments of the present invention.
Detailed Description
The following description of the embodiments of the present invention is made clearly and completely with reference to the accompanying drawings, in which some, but not all, embodiments of the invention are shown. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without making any inventive effort fall within the scope of the invention.
FIG. 1 is a flow chart of a multi-modal speech pause determination method provided by one or more embodiments of the present invention.
The multi-mode voice pause judging method shown in fig. 1 comprises the following steps:
step 1: acquiring audio to be judged;
step 2: acquiring text information according to the audio to be judged;
step 3: acquiring mood information and speech rate information according to the audio to be judged;
step 4: obtaining a trained pause judging model;
step 5: extracting and fusing features of the mood information, the speech rate information and the text information, thereby obtaining a fusion feature;
step 6: and inputting the fusion feature into the pause judging model so as to obtain a pause judging result.
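Steps 1-6 can be sketched as a small pipeline. This is a stand-in under stated assumptions: the three extractors return fixed hypothetical outputs, fusion is plain concatenation, and the final classifier is a placeholder threshold; the patent's trained models would replace all of them.

```python
def mood_features(audio):
    # Stand-in for the trained mood classifier's last-layer output.
    return [0.1, 0.8, 0.1]          # e.g. P(thinking), P(normal), P(undecidable)

def rate_features(audio):
    # Stand-in for the trained speech rate classifier's last-layer output.
    return [0.2, 0.7, 0.1]          # e.g. P(fast), P(medium), P(slow)

def text_features(text):
    # Stand-in for a text encoder (e.g. BERT); here, crude end-word cues.
    return [float(text.endswith(w)) for w in ("song", "university", "hello")]

def fuse(audio, text):
    # Step 5: feature extraction plus fusion by concatenation.
    return mood_features(audio) + rate_features(audio) + text_features(text)

def pause_model(fused):
    # Step 6 stand-in: a trained classifier would score the fused vector.
    return "paused" if sum(fused) / len(fused) > 0.3 else "not paused"

print(pause_model(fuse(None, "play a song")))   # complete command
print(pause_model(fuse(None, "navigate to")))   # mid-utterance head word
```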
For example, the pause judging model architecture of the present application may be as follows:
the reference model is bert (12 layer transducer), ALBERT tiny (4 layer transducer) was distilled for end use, or lstm model could be distilled.
The multi-modal voice pause judging method provided by the application comprehensively judges, through the text information, the mood information and the speech rate information, whether a piece of speech is a meaningful, complete and correctly understood sentence, and solves the problem in the prior art that the user's meaning is not fully understood, resulting in invalid interaction and a relatively poor user experience.
In this embodiment, the pause judging model adopts a BERT pre-training model.
In this embodiment, the obtaining the mood information according to the audio to be determined includes:
obtaining a trained mood classification model;
extracting the mood characteristics of the audio to be judged;
and inputting the mood characteristics of the audio to be judged into the mood classification model so as to acquire mood information.
Specifically, a mood model is trained on the audio to judge whether the speaker's mood is hesitant thinking or a normal mood, and this judgment is provided as a feature.
The mood model judges whether the speaker's mood is hesitant or normal: if the mood is thinking, the speech has not stopped and the system needs to wait; if it is not hesitant, the audio can be sent to the subsequent pause judging model, and the last layer of the mood model is extracted as a feature.
The mood model can use a deep-learning classification model. The existing vehicle-mounted corpus is labeled into three classes according to the speaker's mood (thinking mood information, normal mood information, and mood information that cannot be judged), and classification training is performed. The last layer serves as the feature extraction layer, and its output is used as the mood feature value, representing thinking mood, normal mood, or mood that cannot be judged.
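The "last layer as feature extraction layer" idea can be illustrated with a toy two-layer classifier whose hidden activations double as the mood feature vector. The weights and the 2-dimensional input are made up for illustration; a real model would be trained on labeled in-vehicle audio.

```python
import math

def mood_classifier(x, return_features=False):
    # Hypothetical weights; a trained model would learn these from data.
    W1 = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
    # Hidden (penultimate) layer with ReLU: this is the feature extraction layer.
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]
    if return_features:
        return hidden                       # mood feature vector used for fusion
    W2 = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0], [-1.0, 0.0, 1.0]]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in W2]
    exps = [math.exp(v) for v in logits]    # softmax over the three mood classes
    probs = [e / sum(exps) for e in exps]
    labels = ["thinking", "normal", "undecidable"]
    return labels[probs.index(max(probs))]
```

The same vector serves two purposes: the softmax head yields the mood class, while `return_features=True` exposes the penultimate activations for the fusion step.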
In this embodiment, the obtaining the speech rate information according to the audio to be judged includes:
acquiring a trained speech rate classification model;
extracting the speech rate characteristics of the audio to be judged;
and inputting the speech rate characteristics of the audio to be judged into the speech rate classification model so as to acquire the speech rate information.
Specifically, a speech rate model is trained on the audio to judge whether the speech rate of the utterance is fast, medium or slow; this determines how long to wait within the specified time before deciding, for subsequent recognition, whether the speech has stopped, and its features are extracted.
The speech rate model can obtain the speech rate in a traditional way or with deep learning. If a piece of audio belongs to fast speech but the amount of recognized text in that time is small, it can be judged as not stopped; if enough text has been recognized and the speech rate is slow, the speech is judged to have stopped.
For the speech rate model, audio corpora at different rates and existing vehicle-mounted data can be collected and, based on how many characters of the sentence ASR recognizes per minute, labeled into the three speed classes. Three-class training is then performed, and the last layer serves as the feature extraction layer for extracting feature values.
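The rate-from-recognized-text idea reduces to characters per minute with two cut-offs. A minimal sketch follows; the thresholds are illustrative assumptions, not values from the patent.

```python
def speech_rate_class(char_count, duration_s, fast_cpm=300, slow_cpm=120):
    """Classify speech rate from the amount of ASR text in a time window.

    Thresholds (characters per minute) are hypothetical example values.
    """
    cpm = char_count / duration_s * 60.0
    if cpm >= fast_cpm:
        return "fast"
    if cpm <= slow_cpm:
        return "slow"
    return "medium"

# A fast speaker who has produced little text so far is likely mid-utterance:
print(speech_rate_class(12, 2.0))   # 360 cpm → fast
print(speech_rate_class(4, 2.0))    # 120 cpm → slow
```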
In this embodiment, the obtaining text information according to the audio to be determined includes:
and identifying text information of the audio to be judged through ASR.
In this embodiment, before the obtaining the trained pause judging model, the multi-modal speech pause judging method further includes:
acquiring a pause judging rule database, wherein the pause judging rule database comprises at least one pause judging rule;
judging whether the acquired text information accords with one of the pause judging rules in the database; and if not,
obtaining the trained pause judging model.
In this embodiment, the audio is sent to ASR to recognize the text, and a dedicated pause-rule database is set up, according to the characteristics of the rule system and of intelligent-cabin interaction, to perform rule-based pause judgment. If a rule yields a result and the result is that the speech has stopped, the result can be sent directly to the subsequent NLU interaction.
If no pause rule is matched, the judgment is made by the pause judging model. When the rule system cannot decide, model-based recognition is carried out: the speech rate features, mood features, acoustic features and text features are extracted, and the model makes a semantic judgment (if the speech rate is fast, the utterance is not considered finished). If the model judges that the speech has stopped, the result enters the subsequent NLU interaction; if not, the system continues to wait.
In this embodiment, rule matching is a prior-art rule matching method. For example, in the first step of semantic understanding, some templates serve as a first-pass match: since an unlimited number of templates cannot be enumerated, and too many templates make matching too slow, only commonly used templates are matched directly, and the remainder are handled by a second-pass semantic understanding. The pause judgment works the same way, so the common templates can be used directly for rule matching.
For example, head-word queries use exact text matching, such as: (1) I want to listen; (2) navigate to; (3) play.
Queries with specific patterns use regular-expression matching, such as: (call|dial) .+ (phone|WeChat).
The application also provides a multi-modal voice pause judging device, which comprises an audio acquisition module to be judged, a text information acquisition module, a mood information acquisition module, a speech rate information acquisition module, a pause judging model acquisition module, a fusion module and a result acquisition module,
the audio acquisition module to be judged is used for acquiring audio to be judged;
the text information acquisition module is used for acquiring text information according to the audio to be judged;
the mood information acquisition module is used for acquiring mood information according to the audio to be judged;
the speech rate information acquisition module is used for acquiring speech rate information according to the audio to be judged;
the pause judging model acquisition module is used for acquiring a trained pause judging model;
the fusion module is used for extracting and fusing features of the mood information, the speech rate information and the text information so as to acquire a fusion feature;
and the result acquisition module is used for inputting the fusion feature into the pause judging model so as to acquire a pause judging result.
Fig. 2 is a block diagram of an electronic device according to one or more embodiments of the present invention.
As shown in fig. 2, the present application further discloses an electronic device, including: the device comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus; the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the multi-modal speech pause determination method.
The present application also provides a computer readable storage medium storing a computer program executable by an electronic device, which when run on the electronic device causes the electronic device to perform the steps of a multi-modal speech pause determination method.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be classified as an address bus, a data bus, a control bus, and so on. For ease of illustration, only one bold line is shown in the figures, but this does not mean that there is only one bus or one type of bus.
The electronic device includes a hardware layer, an operating system layer running on top of the hardware layer, and an application layer running on top of the operating system. The hardware layer includes hardware such as a central processing unit (CPU), a memory management unit (MMU) and a memory. The operating system may be any one or more computer operating systems that implement electronic device control via processes, such as a Linux, Unix, Android, iOS, or Windows operating system. In addition, in the embodiment of the present invention, the electronic device may be a handheld device such as a smart phone or a tablet computer, or an electronic device such as a desktop computer or a portable computer, which is not particularly limited in the embodiment of the present invention.
The execution body controlled by the electronic device in the embodiment of the invention can be the electronic device or a functional module in the electronic device, which can call a program and execute the program. The electronic device may obtain firmware corresponding to the storage medium, where the firmware corresponding to the storage medium is provided by the vendor, and the firmware corresponding to different storage media may be the same or different, which is not limited herein. After the electronic device obtains the firmware corresponding to the storage medium, the firmware corresponding to the storage medium can be written into the storage medium, specifically, the firmware corresponding to the storage medium is burned into the storage medium. The process of burning the firmware into the storage medium may be implemented by using the prior art, and will not be described in detail in the embodiment of the present invention.
The electronic device may further obtain a reset command corresponding to the storage medium, where the reset command corresponding to the storage medium is provided by the provider, and the reset commands corresponding to different storage media may be the same or different, which is not limited herein.
At this time, the storage medium of the electronic device is a storage medium in which the corresponding firmware is written, and the electronic device may respond to a reset command corresponding to the storage medium in which the corresponding firmware is written, so that the electronic device resets the storage medium in which the corresponding firmware is written according to the reset command corresponding to the storage medium. The process of resetting the storage medium according to the reset command may be implemented in the prior art, and will not be described in detail in the embodiments of the present invention.
For convenience of description, the above devices are described as being functionally divided into various units and modules. Of course, the functions of each unit, module, etc. may be implemented in one or more pieces of software and/or hardware when implementing the present application.
It will be understood by those skilled in the art that all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs unless defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
For simplicity of description, the methods are shown and described as a series of acts; however, those skilled in the art will understand and appreciate that the methods are not limited by the order of the acts, as some acts may occur in other orders or concurrently. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and the acts involved are not necessarily required by every embodiment of the invention.
From the above description of embodiments, it will be apparent to those skilled in the art that the present application may be implemented in software plus a necessary general purpose hardware platform. Based on such understanding, the technical solutions of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions to cause a computer device (which may be a personal computer, a server or a network device, etc.) to perform the methods described in the embodiments or some parts of the embodiments of the present application.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.
Claims (10)
1. The multi-modal voice pause judging method is characterized by comprising the following steps of:
acquiring audio to be judged;
acquiring text information according to the audio to be judged;
acquiring mood information and speech rate information according to the audio to be judged;
obtaining a trained pause judging model;
extracting and fusing features of the mood information, the speech rate information and the text information, thereby obtaining a fusion feature;
and inputting the fusion feature into the pause judging model so as to obtain a pause judging result.
2. The method for multi-modal speech pause determination as in claim 1, wherein said obtaining mood information from said audio to be determined comprises:
obtaining a trained mood classification model;
extracting the mood characteristics of the audio to be judged;
and inputting the mood characteristics of the audio to be judged into the mood classification model so as to acquire mood information.
3. The multi-modal speech pause judging method according to claim 2, wherein the acquiring speech rate information according to the audio to be judged includes:
acquiring a trained speech rate classification model;
extracting the speech rate characteristics of the audio to be judged;
and inputting the speech rate characteristics of the audio to be judged into the speech rate classification model so as to acquire the speech rate information.
4. The multi-modal speech pause judging method according to claim 3, wherein the acquiring text information according to the audio to be judged includes:
and identifying text information of the audio to be judged through ASR.
5. The multi-modal speech pause judging method of claim 4, wherein prior to said obtaining a trained pause judging model, said multi-modal speech pause judging method further comprises:
acquiring a pause judging rule database, wherein the pause judging rule database comprises at least one pause judging rule;
judging whether the acquired text information accords with one of the pause judging rules in the database; and if not,
obtaining the trained pause judging model.
6. The method of claim 5, wherein the mood information includes thinking mood information, normal mood information, and mood information that cannot be judged.
7. The multi-modal speech pause judging method as claimed in claim 6, wherein the speech rate information includes fast speech rate information, medium speech rate information and slow speech rate information.
8. A multi-modal speech pause judging device, characterized in that the multi-modal speech pause judging device comprises:
the audio acquisition module to be judged is used for acquiring the audio to be judged;
the text information acquisition module is used for acquiring text information according to the audio to be judged;
the mood information acquisition module is used for acquiring mood information according to the audio to be judged;
the speech rate information acquisition module is used for acquiring speech rate information according to the audio to be judged;
the pause judgment model acquisition module is used for acquiring a trained pause judgment model;
the fusion module is used for extracting and fusing features of the mood information, the speech rate information and the text information so as to acquire a fusion feature;
and the result acquisition module is used for inputting the fusion feature into the pause judging model so as to acquire a pause judging result.
9. An electronic device, the electronic device comprising: the device comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus; a computer program is stored in a memory, which when executed by a processor causes the processor to perform the steps of the multi-modal speech pause determination method according to any one of claims 1 to 7.
10. A computer readable storage medium storing a computer program executable by an electronic device, which when run on the electronic device causes the electronic device to perform the steps of the multi-modal speech pause determination method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310543706.1A CN116564290A (en) | 2023-05-15 | 2023-05-15 | Multi-mode voice pause judging method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116564290A true CN116564290A (en) | 2023-08-08 |
Family
ID=87501507
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310543706.1A Pending CN116564290A (en) | 2023-05-15 | 2023-05-15 | Multi-mode voice pause judging method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116564290A (en) |
- 2023-05-15: application CN202310543706.1A filed in China (CN); status: Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||