CN110070866B - Voice recognition method and device - Google Patents

Voice recognition method and device

Info

Publication number
CN110070866B
CN110070866B (application CN201910281318.4A)
Authority
CN
China
Prior art keywords
car machine
voice
machine
vehicle
playing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910281318.4A
Other languages
Chinese (zh)
Other versions
CN110070866A (en)
Inventor
周星杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Apollo Intelligent Connectivity Beijing Technology Co Ltd
Original Assignee
Apollo Intelligent Connectivity Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Apollo Intelligent Connectivity Beijing Technology Co Ltd filed Critical Apollo Intelligent Connectivity Beijing Technology Co Ltd
Priority to CN201910281318.4A priority Critical patent/CN110070866B/en
Priority to CN202111117307.6A priority patent/CN113990309A/en
Publication of CN110070866A publication Critical patent/CN110070866A/en
Application granted granted Critical
Publication of CN110070866B publication Critical patent/CN110070866B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 - Execution procedure of a spoken command
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 - Feedback of the input speech

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Navigation (AREA)

Abstract

The invention provides a voice recognition method and a voice recognition device, wherein the method comprises the following steps: when a voice recognition triggering instruction sent by a car machine is received, judging whether the car machine is in a voice playing state; if the car machine is in a voice playing state, sending a pause playback instruction to the car machine; judging whether a response message, returned by the car machine after playback is paused, has been received; if the response message is received, sending a recording instruction to the car machine to obtain the user voice collected by the car machine; and performing voice recognition on the user voice. In this way, when the terminal device is interconnected with the car machine and controls it to record the user's voice, the recorded user voice does not contain audio played by the car machine, so the recording is kept as free of such noise as possible, the accuracy of voice recognition is ensured, and the user experience is improved.

Description

Voice recognition method and device
Technical Field
The present invention relates to the field of speech processing technologies, and in particular, to a speech recognition method and apparatus.
Background
With the development of internet technology and terminal device technology, interconnection between different terminals is increasingly common. For example, a mobile phone may be interconnected with the car machine in a vehicle: the phone can control the car machine to play audio such as music or navigation announcements by sending it a voice playing instruction, and it can also send the car machine a recording instruction so that the car machine records the user's voice and returns it to the phone, which then performs voice recognition on that voice. Obviously, if the user voice recorded by the car machine is mixed with a large amount of audio that the car machine itself is playing, voice recognition will produce errors.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
To this end, a first object of the present invention is to propose a speech recognition method.
A second object of the present invention is to provide a speech recognition apparatus.
A third object of the present invention is to propose another speech recognition apparatus.
A fourth object of the invention is to propose a non-transitory computer-readable storage medium.
A fifth object of the invention is to propose a computer program product.
In order to achieve the above object, an embodiment of a first aspect of the present invention provides a speech recognition method, including:
when a voice recognition triggering instruction sent by a car machine is received, judging whether the car machine is in a voice playing state;
if the car machine is in a voice playing state, sending a pause playback instruction to the car machine;
judging whether a response message, returned by the car machine after playback is paused, has been received;
if the response message is received, sending a recording instruction to the car machine to obtain the user voice collected by the car machine;
and performing voice recognition on the user voice.
Further, the method further comprises:
and if the car machine is not in a voice playing state, sending a recording instruction to the car machine to acquire the user voice collected by the car machine.
Further, before sending the recording instruction to the car machine and obtaining the user voice collected by the car machine, the method further includes:
carrying out environment detection on the car machine, and judging whether the current environment of the car machine meets the recording condition;
Correspondingly, sending the recording instruction to the car machine and obtaining the user voice collected by the car machine includes:
and when the current environment of the car machine meets the recording condition, sending a recording instruction to the car machine to acquire the user voice collected by the car machine.
Further, performing environment detection on the car machine and judging whether the current environment of the car machine meets the recording condition includes:
carrying out environment detection on the car machine to obtain environment sounds;
judging whether the frequency information of the environmental sound is consistent with preset voice playing frequency information or not;
if the frequency information of the environmental sound is consistent with the preset voice playing frequency information, determining that the current environment of the car machine does not meet the recording condition;
and if the frequency information of the environmental sound is inconsistent with the preset voice playing frequency information, determining that the current environment of the car machine meets the recording condition.
Further, performing environment detection on the car machine and judging whether the current environment of the car machine meets the recording condition further includes:
and if no environmental sound is obtained within a preset time period, determining that the current environment of the car machine meets the recording condition.
Further, the method further comprises:
and when it is determined that the recording instruction is to be sent to the car machine, displaying prompt information on a preset interface so that the user can speak according to the prompt information and the voice can be collected.
Further, the voice playing state comprises: music play status, and/or navigation broadcast status.
According to the voice recognition method of the embodiment of the invention, when a voice recognition triggering instruction sent by a car machine is received, whether the car machine is in a voice playing state is judged; if the car machine is in a voice playing state, a pause playback instruction is sent to the car machine; whether a response message returned by the car machine after playback is paused has been received is judged; if the response message is received, a recording instruction is sent to the car machine to obtain the user voice collected by the car machine; and voice recognition is performed on the user voice. In this way, when the terminal device is interconnected with the car machine and controls it to record the user's voice, the recorded user voice does not contain audio played by the car machine, so the recording is kept as free of such noise as possible, the accuracy of voice recognition is ensured, and the user experience is improved.
In order to achieve the above object, a second embodiment of the present invention provides a speech recognition apparatus, including:
a judging module, used for judging whether a car machine is in a voice playing state when a voice recognition triggering instruction sent by the car machine is received;
a sending module, used for sending a pause playback instruction to the car machine when the car machine is in a voice playing state;
the judging module is further used for judging whether a response message, returned by the car machine after playback is paused, has been received;
the sending module is further configured to send a recording instruction to the car machine when receiving the response message, and obtain the user voice collected by the car machine;
and the voice recognition module is used for carrying out voice recognition on the user voice.
Further, the sending module is further configured to send a recording instruction to the car machine when the car machine is not in a voice playing state, so as to obtain the user voice collected by the car machine.
Further, the apparatus further comprises: the detection module is used for carrying out environment detection on the car machine and judging whether the current environment of the car machine meets the recording condition;
the sending module is specifically used for sending a recording instruction to the car machine when the current environment of the car machine meets the recording condition, and obtaining the user voice collected by the car machine.
Further, the detection module is specifically configured to,
carrying out environment detection on the car machine to obtain environment sounds;
judging whether the frequency information of the environmental sound is consistent with preset voice playing frequency information or not;
if the frequency information of the environmental sound is consistent with the preset voice playing frequency information, determining that the current environment of the car machine does not meet the recording condition;
and if the frequency information of the environmental sound is inconsistent with the preset voice playing frequency information, determining that the current environment of the car machine meets the recording condition.
Further, the detection module is specifically further configured to,
and if no environmental sound is obtained within a preset time period, determining that the current environment of the car machine meets the recording condition.
Further, the apparatus further comprises:
and the display module is used for displaying prompt information on a preset interface when the detection module determines that the recording instruction is to be sent to the car machine, so that the user can speak according to the prompt information and the voice can be collected.
Further, the voice playing state comprises: music play status, and/or navigation broadcast status.
The voice recognition device of the embodiment of the invention, upon receiving a voice recognition triggering instruction sent by the car machine, judges whether the car machine is in a voice playing state; if the car machine is in a voice playing state, sends a pause playback instruction to the car machine; judges whether a response message returned by the car machine after playback is paused has been received; if the response message is received, sends a recording instruction to the car machine to obtain the user voice collected by the car machine; and performs voice recognition on the user voice. In this way, when the terminal device is interconnected with the car machine and controls it to record the user's voice, the recorded user voice does not contain audio played by the car machine, so the recording is kept as free of such noise as possible, the accuracy of voice recognition is ensured, and the user experience is improved.
In order to achieve the above object, a third embodiment of the present invention provides another speech recognition apparatus, including: memory, processor and computer program stored on the memory and executable on the processor, characterized in that the processor implements the speech recognition method as described above when executing the program.
In order to achieve the above object, a fourth aspect of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the speech recognition method as described above.
In order to achieve the above object, a fifth embodiment of the present invention provides a computer program product which, when the instructions in the computer program product are executed by a processor, implements the speech recognition method as described above.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic flow chart of a speech recognition method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating another speech recognition method according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a speech recognition apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of another speech recognition apparatus according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
A speech recognition method and apparatus according to an embodiment of the present invention will be described with reference to the drawings.
Fig. 1 is a flowchart illustrating a speech recognition method according to an embodiment of the present invention. As shown in fig. 1, the speech recognition method includes the steps of:
s101, when receiving a voice recognition triggering instruction sent by a vehicle machine, judging whether the vehicle machine is in a voice playing state.
The execution subject of the voice recognition method provided by the invention is a voice recognition device, and the voice recognition device can be hardware equipment such as terminal equipment and a server, or software installed on the hardware equipment. For convenience of understanding, the present embodiment takes the voice recognition device as an example of a terminal device interconnected with the car machine, where the terminal device is, for example, a mobile phone, a tablet computer, a wearable device, and the like.
In this embodiment, the car machine may be understood as a device installed in an automobile to provide entertainment and information. The car machine has functions such as voice broadcasting, recording, telephone dialing, sending and receiving short messages, and online video, and it enables interaction between the vehicle and the outside world, such as vehicle-to-person and vehicle-to-vehicle interaction.
In this embodiment, the terminal device interconnected with the car machine can control the car machine to play and pause audio information, video information, and the like, and can control the car machine to record the user's voice; the car machine can send the recorded voice to the terminal device, which performs voice recognition on it. Of course, the interconnection between the terminal device and the car machine is not limited to playing audio, pausing playback, and recording.
In this embodiment, a user may trigger the car machine to send a voice recognition triggering instruction to the terminal device through a key in the vehicle with a voice recognition triggering function, for example a hard key on the steering wheel, which is convenient for the user to operate. The car machine may also receive a voice recognition triggering instruction entered by the user through interaction with the car machine and forward it to the terminal device; for example, the user speaks "start the voice recognition function" to the car machine, or a button with the voice recognition triggering function is provided on the car machine's human-computer interaction interface and the user touches it with a tap or a swipe.
In this embodiment, after receiving the voice recognition triggering instruction sent by the car machine, the terminal device determines the current state of the car machine. Specifically, since the terminal device is interconnected with the car machine, it can obtain the instruction it last sent to the car machine by querying its history record: if the last instruction sent to the car machine was a voice playing instruction, the current state of the car machine is determined to be the voice playing state; if the last instruction sent was a pause playback instruction, the current state of the car machine is determined to be the paused state.
In this embodiment, the voice playing state includes a music playing state and/or a navigation broadcasting state, but is not limited thereto. When the car machine is in the music playing state, it plays music; when it is in the navigation broadcasting state, it plays, for example, map navigation announcements or news broadcasts.
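As a concrete illustration of how the terminal device might decide the car machine's current state from the instruction it last sent, as described above, the following Python sketch keeps a simple command history on the terminal side. All names here (Command, CommandLog, is_voice_playing) are illustrative assumptions, not part of the patented method itself.

```python
from enum import Enum, auto

class Command(Enum):
    PLAY = auto()    # voice playing instruction (music / navigation broadcast)
    PAUSE = auto()   # pause playback instruction
    RECORD = auto()  # recording instruction

class CommandLog:
    """History of instructions the terminal has sent to the car machine."""
    def __init__(self):
        self._history = []

    def append(self, cmd: Command):
        self._history.append(cmd)

    def last_playback_command(self):
        # Only PLAY / PAUSE affect the playing state; RECORD is ignored.
        for cmd in reversed(self._history):
            if cmd in (Command.PLAY, Command.PAUSE):
                return cmd
        return None

def is_voice_playing(log: CommandLog) -> bool:
    """The car machine is assumed to be playing iff the most recent
    playback-related instruction the terminal sent was PLAY."""
    return log.last_playback_command() is Command.PLAY
```

For example, after the terminal sends PLAY and nothing else, is_voice_playing returns True; after it later sends PAUSE, it returns False.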
And S102, if the car machine is in a voice playing state, sending a pause playback instruction to the car machine.
S103, judging whether a response message, returned by the car machine after playback is paused, has been received.
In this embodiment, in order to ensure that the user voice recorded by the car machine contains as little noise as possible, and thus to ensure the accuracy of voice recognition, the car machine must not be in a voice playing state. A car machine that is not in a voice playing state does not play audio information, so no audio played by the car machine, such as music or navigation broadcasts, is mixed into the environment where the car machine is located.
In this embodiment, if the car machine is in a voice playing state, it is currently playing audio information; before the car machine is controlled to record the user's voice, a pause playback instruction is sent to it so that it pauses playing the audio information.
Specifically, when detecting that the car machine is in a playing state, the terminal device sends a pause playback instruction to the car machine. After receiving the pause playback instruction, the car machine pauses playing the audio information and then returns to the terminal device a response message indicating that playback has been paused. After receiving the response message, the terminal device determines that the car machine has switched from the voice playing state to the paused state.
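The pause/acknowledge handshake just described can be sketched as follows. CarMachineLink is a stand-in for whatever channel actually carries messages between the terminal and the car machine; here it is mocked so the snippet runs on its own, and the message strings are illustrative assumptions.

```python
import queue

class CarMachineLink:
    """Mock link; a real implementation would talk to the car machine."""
    def __init__(self):
        self._from_car = queue.Queue()

    def send(self, message: str):
        # In this mock the car machine acknowledges a pause immediately.
        if message == "PAUSE_PLAYBACK":
            self._from_car.put("PLAYBACK_PAUSED")

    def receive(self, timeout: float):
        try:
            return self._from_car.get(timeout=timeout)
        except queue.Empty:
            return None  # no response within the timeout

def pause_and_wait_for_ack(link: CarMachineLink, timeout: float = 2.0) -> bool:
    """Send the pause playback instruction and wait for the car machine's
    response message indicating playback has actually been paused."""
    link.send("PAUSE_PLAYBACK")
    return link.receive(timeout) == "PLAYBACK_PAUSED"
```

Only after pause_and_wait_for_ack returns True would the terminal go on to request a recording.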
And S104, if the response message is received, sending a recording instruction to the car machine to obtain the user voice collected by the car machine.
And S105, performing voice recognition on the user voice.
In this embodiment, if the terminal device receives the response message returned by the car machine indicating that playback has been paused, the terminal device determines that the car machine has switched from the voice playing state to the paused state. At this point, the terminal device sends a recording instruction to the car machine to control it to collect the user's voice; the terminal device then receives the user voice sent back by the car machine and performs voice recognition on it.
Further, to facilitate interaction with the user, the method further comprises: when it is determined that the recording instruction is to be sent to the car machine, displaying prompt information on a preset interface so that the user can speak according to the prompt information and the voice can be collected.
In this embodiment, the preset interface is a human-computer interaction interface in the terminal device, the displayed prompt information is, for example, "voice input can be started," and the user determines that voice input operation can be performed according to the prompt information.
Further, after step S101, the method further comprises the steps of:
and S106, if the car machine is not in a voice playing state, sending a recording instruction to the car machine to acquire the user voice collected by the car machine.
In this embodiment, when the car machine is not in a voice playing state, there is no audio played by the car machine in its environment, so the recording instruction is sent to the car machine directly. This speeds up the terminal device's acquisition of the user voice from the car machine and therefore speeds up voice recognition.
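Putting steps S101 to S106 together, the overall flow on the terminal side might look like the sketch below, reusing the helpers sketched earlier (is_voice_playing, pause_and_wait_for_ack, CarMachineLink). The recognizer callback and the message strings are assumptions for illustration only.

```python
def request_recording(link: "CarMachineLink"):
    """Ask the car machine to record and return the captured user voice.
    With the mock link above this returns None; a real link would return
    the recorded audio data."""
    link.send("START_RECORDING")
    return link.receive(timeout=5.0)

def handle_voice_trigger(link, log, recognizer):
    """Terminal-side flow of Fig. 1 (S101-S106), as a sketch."""
    if is_voice_playing(log):                    # S101
        if not pause_and_wait_for_ack(link):     # S102 / S103
            return None                          # no ack: do not record
        log.append(Command.PAUSE)
    audio = request_recording(link)              # S104 (or S106 directly)
    if audio is None:
        return None
    return recognizer(audio)                     # S105
```

Here recognizer stands for whatever speech recognition engine the terminal uses on the returned audio.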
According to the voice recognition method of the embodiment of the invention, when a voice recognition triggering instruction sent by a car machine is received, whether the car machine is in a voice playing state is judged; if the car machine is in a voice playing state, a pause playback instruction is sent to the car machine; whether a response message returned by the car machine after playback is paused has been received is judged; if the response message is received, a recording instruction is sent to the car machine to obtain the user voice collected by the car machine; and voice recognition is performed on the user voice. In this way, when the terminal device is interconnected with the car machine and controls it to record the user's voice, the recorded user voice does not contain audio played by the car machine, so the recording is kept as free of such noise as possible, the accuracy of voice recognition is ensured, and the user experience is improved.
In practical application, after receiving a pause instruction sent by terminal equipment, the car machine pauses the currently played audio information; and then, the car machine receives the recording instruction sent by the terminal equipment, and the car machine starts to record the voice of the user. However, the car machine has a certain delay, that is, the time when the car machine stops playing the audio information is later than the time when the car machine starts recording, so that the user voice collected by the car machine contains part of the audio information, and the accuracy of voice recognition is affected. Therefore, before a recording instruction is sent to the car machine and user voice collected by the car machine is acquired, environment detection is carried out on the car machine, and whether the current environment of the car machine meets a recording condition or not is judged; and when the current environment of the car machine meets the recording condition, sending a recording instruction to the car machine to acquire the user voice collected by the car machine. This situation is further explained below in conjunction with fig. 2.
Fig. 2 is a flowchart illustrating another speech recognition method according to an embodiment of the present invention. With reference to fig. 2 in combination, on the basis of the embodiment shown in fig. 1, the speech recognition method includes the following steps:
s201, when a voice recognition triggering instruction sent by a vehicle machine is received, judging whether the vehicle machine is in a voice playing state.
S202, if the car machine is in a voice playing state, sending a pause playback instruction to the car machine.
S203, judging whether a response message, returned by the car machine after playback is paused, has been received.
The implementation manners of steps S201, S202, and S203 in the embodiment of the present invention are the same as the implementation manners of steps S101, S102, and S103 in the embodiment shown in fig. 1, and are not described again here.
And S204, if the response message is received, carrying out environment detection on the car machine, and judging whether the current environment of the car machine meets the recording condition.
In this embodiment, if the terminal device receives the response message returned by the car machine indicating that playback has been paused, the terminal device determines that the car machine has switched from the voice playing state to the paused state.
Because the car machine has a certain time delay, when the response message returned by the car machine is received, the terminal does not immediately send the recording instruction. Instead, it performs environment detection on the car machine's current environment and sends the recording instruction only after judging that the current environment of the car machine meets the recording condition, so as to ensure, as far as possible, that the audio played by the car machine is not mixed into the user voice it collects.
In this embodiment, environment detection is performed on the car machine. If environmental sound is obtained, whether the current environment of the car machine meets the recording condition is judged according to the frequency information of the environmental sound and the preset voice playing frequency information; if no environmental sound is obtained within a preset time period, the current environment of the car machine is determined to meet the recording condition. The preset time period is set according to the actual situation, for example 1 second. When no environmental sound is obtained within the preset time period, the car machine has completely stopped playing audio information, so no audio played by the car machine can be detected in the environment where the car machine is located, and the current environment of the car machine is therefore determined to meet the recording condition.
In one possible implementation manner, "judging whether the current environment of the car machine meets the recording condition according to the frequency information of the environmental sound and the preset voice playing frequency information" is specifically implemented as follows:
and S1, judging whether the frequency information of the environmental sound is consistent with the preset voice playing frequency information.
And S2, if the frequency information of the environment sound is consistent with the preset voice playing frequency information, determining that the current environment of the vehicle machine does not meet the recording condition.
And S3, if the frequency information of the environment sound is inconsistent with the preset voice playing frequency information, determining that the current environment of the vehicle machine meets the recording condition.
In this embodiment, the preset voice playing frequency information may be understood as frequency information of audio information played by the car machine, for example, frequency information of music or navigation sound played by the car machine. The preset voice playing frequency information is set according to the actual situation.
Specifically, the terminal device has a sound frequency analysis function, and after the terminal device determines that the car machine is in a pause state, the terminal device collects the environmental sound through a sound collection device such as a microphone, and simultaneously starts the sound frequency analysis function to perform frequency analysis on the collected environmental sound, so as to identify the frequency information of the environmental sound.
If the frequency information of the environmental sound identified by the terminal device is consistent with the preset voice playing frequency information, this indicates that, although the car machine has been instructed to pause, it is still playing audio information because of the delay; if the car machine were controlled to collect the user voice at this moment, a large amount of the audio played by the car machine would still end up in it. Therefore, when the frequency information of the environmental sound is consistent with the preset voice playing frequency information, the current environment of the car machine is determined not to meet the recording condition. Conversely, if the frequency information of the environmental sound is inconsistent with the preset voice playing frequency information, the car machine has completely paused playing audio information, no audio played by the car machine can be detected in its environment, and the current environment of the car machine is determined to meet the recording condition.
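A minimal sketch of this environment check is given below: sample the ambient sound, compare its dominant frequency content with the preset voice playing frequency information, and treat a quiet preset period as satisfying the recording condition. The sampling callback, the tolerance, and the timeout values are illustrative assumptions rather than values taken from the patent.

```python
import time
import numpy as np

def dominant_frequency(samples: np.ndarray, sample_rate: int) -> float:
    """Return the strongest frequency component (in Hz) of an audio buffer."""
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    return float(freqs[np.argmax(spectrum)])

def environment_allows_recording(capture_ambient,
                                 playback_freqs_hz,
                                 sample_rate: int = 16000,
                                 tolerance_hz: float = 50.0,
                                 quiet_timeout_s: float = 1.0) -> bool:
    """capture_ambient() is assumed to return a short numpy buffer of
    microphone samples, or None if nothing was picked up.

    - No ambient sound for quiet_timeout_s seconds -> condition met.
    - Ambient sound whose dominant frequency matches a preset playback
      frequency -> the car machine is still playing; condition not met.
    - Ambient sound that does not match -> condition met.
    """
    deadline = time.monotonic() + quiet_timeout_s
    while time.monotonic() < deadline:
        samples = capture_ambient()
        if samples is None:
            continue  # nothing heard yet, keep waiting until the timeout
        f = dominant_frequency(samples, sample_rate)
        return not any(abs(f - p) <= tolerance_hz for p in playback_freqs_hz)
    return True  # quiet for the whole preset period

# Example: a 440 Hz tone still in the air is treated as leftover playback.
tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
print(environment_allows_recording(lambda: tone, playback_freqs_hz=[440.0]))  # False
```

In practice the preset voice playing frequency information could be richer than a single dominant frequency (for example a spectral signature of the playing track), but the comparison logic would follow the same pattern.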
S205, when the current environment of the car machine meets the recording condition, sending a recording instruction to the car machine to acquire the user voice collected by the car machine.
S206, carrying out voice recognition on the user voice.
In this embodiment, when the terminal device determines that the car machine has switched from the voice playing state to the pause playing state and determines that the current environment of the car machine meets the recording condition, the terminal device sends a recording instruction to the car machine to control the car machine to collect the user voice; and the terminal equipment receives the user voice sent by the vehicle machine and carries out voice recognition on the user voice.
Further, after step S201, the method further comprises the steps of:
s207, if the car machine is not in a voice playing state, carrying out environment detection on the car machine, and judging whether the current environment of the car machine meets a recording condition.
In this embodiment, even when the car machine is not in a voice playing state, environment detection is performed on the car machine, and the recording instruction is sent once the current environment of the car machine is confirmed to meet the recording condition. This further ensures that audio played by the car machine is not mixed into the user voice it collects.
According to the voice recognition method of this embodiment, when a voice recognition triggering instruction sent by the car machine is received, whether the car machine is in a voice playing state is judged; if the car machine is in a voice playing state, a pause playback instruction is sent to the car machine; whether a response message returned by the car machine after playback is paused has been received is judged; if the response message is received, environment detection is performed on the car machine and whether the current environment of the car machine meets the recording condition is judged; when the current environment of the car machine meets the recording condition, a recording instruction is sent to the car machine to obtain the user voice collected by the car machine; and voice recognition is performed on the user voice. In this way, when the terminal device is interconnected with the car machine, the recording instruction is sent only after the terminal device determines that the car machine has switched from the voice playing state to the paused state and that the current environment of the car machine meets the recording condition, so the recorded user voice does not contain audio information played by the car machine, noise is kept out of the recording as far as possible, the accuracy of voice recognition is further improved, and the user experience is improved.
Fig. 3 is a schematic structural diagram of a speech recognition apparatus according to an embodiment of the present invention. As shown in fig. 3, the apparatus includes: a judging module 11, a sending module 12 and a voice recognition module 13.
The judging module 11 is used for judging whether the car machine is in a voice playing state when a voice recognition triggering instruction sent by the car machine is received;
a sending module 12, configured to send a pause playback instruction to the car machine when the car machine is in a voice playing state;
the judging module 11 is further configured to judge whether a response message, returned by the car machine after playback is paused, has been received;
the sending module 12 is further configured to send a recording instruction to the car machine when receiving the response message, and obtain the user voice collected by the car machine;
and the voice recognition module 13 is configured to perform voice recognition on the user voice.
Further, the sending module 12 is further configured to send a recording instruction to the car machine when the car machine is not in a voice playing state, so as to obtain the user voice collected by the car machine.
Further, the apparatus further comprises: the detection module is used for carrying out environment detection on the car machine and judging whether the current environment of the car machine meets the recording condition;
the sending module is specifically used for sending a recording instruction to the car machine when the current environment of the car machine meets the recording condition, and obtaining the user voice collected by the car machine.
Further, the detection module is specifically configured to:
carrying out environment detection on the car machine to obtain environment sounds;
judging whether the frequency information of the environmental sound is consistent with preset voice playing frequency information or not;
if the frequency information of the environmental sound is consistent with the preset voice playing frequency information, determining that the current environment of the car machine does not meet the recording condition;
and if the frequency information of the environmental sound is inconsistent with the preset voice playing frequency information, determining that the current environment of the car machine meets the recording condition.
Further, the detection module is specifically further configured to,
and if no environmental sound is obtained within a preset time period, determining that the current environment of the car machine meets the recording condition.
Further, the apparatus further comprises:
and the display module is used for displaying prompt information on a preset interface when the detection module determines that the recording instruction is to be sent to the car machine, so that the user can speak according to the prompt information and the voice can be collected.
Further, the voice playing state comprises: music play status, and/or navigation broadcast status.
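For orientation, the modules of Fig. 3 can be collected into a single terminal-side class along the lines below, reusing the helpers sketched in the method section; the class and method names are illustrative only.

```python
class SpeechRecognitionDevice:
    """Rough mapping of the judging, sending, detection and recognition
    modules onto one object held by the terminal device."""
    def __init__(self, link, log, recognizer, detector=None):
        self.link = link            # channel to the car machine
        self.log = log              # CommandLog of sent instructions
        self.recognizer = recognizer
        self.detector = detector    # optional environment detection callback

    # judging module
    def car_machine_is_playing(self) -> bool:
        return is_voice_playing(self.log)

    # sending module
    def pause_playback(self) -> bool:
        return pause_and_wait_for_ack(self.link)

    def record_user_voice(self):
        return request_recording(self.link)

    # detection module (optional, per the second embodiment)
    def environment_ready(self) -> bool:
        return True if self.detector is None else self.detector()

    # voice recognition module
    def recognize(self, audio):
        return self.recognizer(audio)
```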
It should be noted that the foregoing explanation of the embodiment of the speech recognition method is also applicable to the speech recognition apparatus of the embodiment, and is not repeated herein.
The voice recognition device of the embodiment of the invention, upon receiving a voice recognition triggering instruction sent by the car machine, judges whether the car machine is in a voice playing state; if the car machine is in a voice playing state, sends a pause playback instruction to the car machine; judges whether a response message returned by the car machine after playback is paused has been received; if the response message is received, sends a recording instruction to the car machine to obtain the user voice collected by the car machine; and performs voice recognition on the user voice. In this way, when the terminal device is interconnected with the car machine and controls it to record the user's voice, the recorded user voice does not contain audio played by the car machine, so the recording is kept as free of such noise as possible, the accuracy of voice recognition is ensured, and the user experience is improved.
Fig. 4 is a schematic structural diagram of another speech recognition apparatus according to an embodiment of the present invention. The speech recognition apparatus includes:
memory 1001, processor 1002, and computer programs stored on memory 1001 and executable on processor 1002.
The processor 1002, when executing the program, implements the speech recognition method provided in the above-described embodiments.
Further, the speech recognition apparatus further includes:
a communication interface 1003 for communicating between the memory 1001 and the processor 1002.
A memory 1001 for storing computer programs that may be run on the processor 1002.
The memory 1001 may include high-speed RAM and may also include non-volatile memory, such as at least one magnetic disk memory.
The processor 1002 is configured to implement the speech recognition method according to the foregoing embodiment when executing the program.
If the memory 1001, the processor 1002, and the communication interface 1003 are implemented independently, the communication interface 1003, the memory 1001, and the processor 1002 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 4, but this does not indicate only one bus or one type of bus.
Optionally, in a specific implementation, if the memory 1001, the processor 1002, and the communication interface 1003 are integrated on one chip, the memory 1001, the processor 1002, and the communication interface 1003 may complete communication with each other through an internal interface.
The processor 1002 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement embodiments of the present invention.
The invention also provides a non-transitory computer-readable storage medium on which a computer program is stored which, when executed by a processor, implements a speech recognition method as described above.
The invention also provides a computer program product which, when the instructions in the computer program product are executed by a processor, implements the speech recognition method as described above.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (15)

1. A voice recognition method, applied to a terminal device interconnected with a car machine, characterized by comprising the following steps:
when a voice recognition triggering instruction sent by the car machine is received, judging whether the car machine is in a voice playing state;
if the car machine is in a voice playing state, sending a pause playback instruction to the car machine;
judging whether a response message, returned by the car machine after playback is paused, has been received;
if the response message is received, sending a recording instruction to the car machine to obtain the user voice collected by the car machine;
performing voice recognition on the user voice; wherein, before sending the recording instruction to the car machine and obtaining the user voice collected by the car machine, the method further comprises:
carrying out environment detection on the car machine, and judging whether the current environment of the car machine meets the recording condition;
correspondingly, sending the recording instruction to the car machine and obtaining the user voice collected by the car machine comprises:
and when the current environment of the car machine meets the recording condition, sending a recording instruction to the car machine to acquire the user voice collected by the car machine.
2. The method of claim 1, further comprising:
and if the car machine is not in a voice playing state, sending a recording instruction to the car machine to acquire the user voice collected by the car machine.
3. The method according to claim 1, wherein the performing environment detection on the car machine and determining whether the current environment of the car machine meets a recording condition comprises:
carrying out environment detection on the car machine to obtain environment sounds;
judging whether the frequency information of the environmental sound is consistent with preset voice playing frequency information or not;
if the frequency information of the environmental sound is consistent with the preset voice playing frequency information, determining that the current environment of the car machine does not meet the recording condition;
and if the frequency information of the environmental sound is inconsistent with the preset voice playing frequency information, determining that the current environment of the car machine meets the recording condition.
4. The method according to claim 3, wherein the performing environment detection on the car machine and determining whether the current environment of the car machine meets a recording condition further comprises:
and if no environmental sound is obtained within a preset time period, determining that the current environment of the car machine meets the recording condition.
5. The method of claim 1, further comprising:
and when it is determined that the recording instruction is to be sent to the car machine, displaying prompt information on a preset interface so that the user can speak according to the prompt information and the voice can be collected.
6. The method of claim 1, wherein the voice play state comprises: music play status, and/or navigation broadcast status.
7. A speech recognition device, applied to a terminal device interconnected with a car machine, characterized by comprising:
a judging module, used for judging whether the car machine is in a voice playing state when a voice recognition triggering instruction sent by the car machine is received;
a sending module, used for sending a pause playback instruction to the car machine when the car machine is in a voice playing state;
the judging module is further used for judging whether a response message, returned by the car machine after playback is paused, has been received;
the sending module is further configured to send a recording instruction to the car machine when receiving the response message, and obtain the user voice collected by the car machine;
the voice recognition module is used for carrying out voice recognition on the user voice;
further comprising: the detection module is used for carrying out environment detection on the car machine and judging whether the current environment of the car machine meets the recording condition;
the sending module is specifically used for sending a recording instruction to the car machine when the current environment of the car machine meets the recording condition, and obtaining the user voice collected by the car machine.
8. The device according to claim 7, wherein the sending module is further configured to send a recording instruction to the car machine when the car machine is not in a voice playing state, so as to obtain the user voice collected by the car machine.
9. The apparatus of claim 7, wherein the detection module is specifically configured to,
carrying out environment detection on the car machine to obtain environment sounds;
judging whether the frequency information of the environmental sound is consistent with preset voice playing frequency information or not;
if the frequency information of the environmental sound is consistent with the preset voice playing frequency information, determining that the current environment of the car machine does not meet the recording condition;
and if the frequency information of the environmental sound is inconsistent with the preset voice playing frequency information, determining that the current environment of the car machine meets the recording condition.
10. The device according to claim 9, characterized in that the detection module is further configured to,
and if no environmental sound is obtained within a preset time period, determining that the current environment of the car machine meets the recording condition.
11. The apparatus of claim 7, further comprising:
and the display module is used for displaying prompt information on a preset interface when the detection module determines that the recording instruction is to be sent to the car machine, so that the user can speak according to the prompt information and the voice can be collected.
12. The apparatus of claim 7, wherein the voice play state comprises: music play status, and/or navigation broadcast status.
13. A speech recognition apparatus, comprising:
memory, processor and computer program stored on the memory and executable on the processor, characterized in that the processor implements the speech recognition method according to any of claims 1-6 when executing the program.
14. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the speech recognition method according to any one of claims 1 to 6.
15. A computer program product comprising a computer program which, when executed by a processor, implements a speech recognition method according to any one of claims 1-6.
CN201910281318.4A 2019-04-09 2019-04-09 Voice recognition method and device Active CN110070866B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910281318.4A CN110070866B (en) 2019-04-09 2019-04-09 Voice recognition method and device
CN202111117307.6A CN113990309A (en) 2019-04-09 2019-04-09 Voice recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910281318.4A CN110070866B (en) 2019-04-09 2019-04-09 Voice recognition method and device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202111117307.6A Division CN113990309A (en) 2019-04-09 2019-04-09 Voice recognition method and device

Publications (2)

Publication Number Publication Date
CN110070866A CN110070866A (en) 2019-07-30
CN110070866B true CN110070866B (en) 2021-12-24

Family

ID=67367236

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202111117307.6A Pending CN113990309A (en) 2019-04-09 2019-04-09 Voice recognition method and device
CN201910281318.4A Active CN110070866B (en) 2019-04-09 2019-04-09 Voice recognition method and device

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202111117307.6A Pending CN113990309A (en) 2019-04-09 2019-04-09 Voice recognition method and device

Country Status (1)

Country Link
CN (2) CN113990309A (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113990309A (en) * 2019-04-09 2022-01-28 百度国际科技(深圳)有限公司 Voice recognition method and device
CN112306221A (en) * 2019-08-02 2021-02-02 上海擎感智能科技有限公司 Intelligent vehicle-mounted machine interaction method and device, storage medium and terminal
CN112435690B (en) * 2019-08-08 2024-06-04 百度在线网络技术(北京)有限公司 Duplex Bluetooth translation processing method, duplex Bluetooth translation processing device, computer equipment and storage medium
CN111369989B (en) * 2019-11-29 2022-07-05 添可智能科技有限公司 Voice interaction method of cleaning equipment and cleaning equipment
CN113129902B (en) * 2019-12-30 2023-10-24 北京猎户星空科技有限公司 Voice processing method and device, electronic equipment and storage medium
CN111210820B (en) * 2020-01-21 2022-11-18 达闼机器人股份有限公司 Robot control method, robot control device, electronic device, and storage medium
CN116055775A (en) * 2022-12-27 2023-05-02 深圳创维-Rgb电子有限公司 Control method, device, equipment and storage medium of television equipment
CN118588068A (en) * 2023-03-02 2024-09-03 蔚来移动科技有限公司 Voice control method, device, medium and system for vehicle-machine cooperative electronic equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7848927B2 (en) * 2004-11-30 2010-12-07 Panasonic Corporation Speech recognition device and method of recognizing speech using a language model
CN105719656A (en) * 2014-12-03 2016-06-29 广州汽车集团股份有限公司 Vehicle-mounted voice recognition system
CN107264447B (en) * 2017-06-06 2019-12-10 安克创新科技股份有限公司 Method, system and device for controlling speech recognition in vehicle
CN108711426A (en) * 2018-05-04 2018-10-26 四川斐讯信息技术有限公司 Wireless extension device configuration method and system based on voice control
CN113990309A (en) * 2019-04-09 2022-01-28 百度国际科技(深圳)有限公司 Voice recognition method and device

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103796125A (en) * 2013-11-21 2014-05-14 广州视源电子科技股份有限公司 Sound adjusting method based on earphone playing
US10194026B1 (en) * 2014-03-26 2019-01-29 Open Invention Network, Llc IVR engagements and upfront background noise
CN204316540U (en) * 2014-11-15 2015-05-06 深圳市掌翼星通科技有限公司 Vehicle-mounted voice-controlled telephony system
US10045110B2 (en) * 2016-07-06 2018-08-07 Bragi GmbH Selective sound field environment processing system and method
US20180060031A1 (en) * 2016-08-26 2018-03-01 Bragi GmbH Voice assistant for wireless earpieces
CN106767884A (en) * 2016-12-19 2017-05-31 东风汽车公司 Automobile instrument navigation method based on mobile phone interconnection
CN106910500A (en) * 2016-12-23 2017-06-30 北京第九实验室科技有限公司 Method and apparatus for voice control of a device with a microphone array
CN109243438A (en) * 2018-08-24 2019-01-18 上海擎感智能科技有限公司 Vehicle owner emotion adjustment method, system and storage medium
CN109493865A (en) * 2018-10-17 2019-03-19 北京车和家信息技术有限公司 Signal processing method, terminal and vehicle
CN109360567A (en) * 2018-12-12 2019-02-19 苏州思必驰信息科技有限公司 Customizable wake-up method and apparatus

Also Published As

Publication number Publication date
CN110070866A (en) 2019-07-30
CN113990309A (en) 2022-01-28

Similar Documents

Publication Publication Date Title
CN110070866B (en) Voice recognition method and device
US10600415B2 (en) Method, apparatus, device, and storage medium for voice interaction
CN111107421B (en) Video processing method and device, terminal equipment and storage medium
CN106998494B (en) Video recording method and related device
US10068390B2 (en) Method for obtaining product feedback from drivers in a non-distracting manner
CN112231021B (en) Method and device for guiding new functions of software
US11200899B2 (en) Voice processing method, apparatus and device
RU2656693C2 (en) Event prompting method and device
CN109657091B (en) State presentation method, device and equipment of voice interaction equipment and storage medium
CN107273086A (en) Audio-frequency processing method and device based on navigation
CN107147795A (en) A kind of reminding method and mobile terminal
CN110069227B (en) Data interaction display method and device
CN109725869B (en) Continuous interaction control method and device
CN106156036B (en) Vehicle-mounted audio processing method and vehicle-mounted equipment
CN104092809A (en) Communication sound recording method and recorded communication sound playing method and device
CN109246742B (en) Automatic answering method for incoming call paging and mobile terminal
CN109040912B (en) Plugging hole treatment method and related product
CN107454265B (en) Method and device for recording call information based on call mode change
CN109195072B (en) Audio playing control system and method based on automobile
CN111063349B (en) Key query method and device based on artificial intelligence voice
CN109147783B (en) Voice recognition method, medium and system based on Karaoke system
CN104899058A (en) Pre-downloading method and apparatus
CN105072243A (en) Incoming call prompting method and apparatus
CN111556406B (en) Audio processing method, audio processing device and earphone
CN112735481A (en) POP sound detection method and device, terminal equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211013

Address after: 100176 101, floor 1, building 1, yard 7, Ruihe West 2nd Road, Beijing Economic and Technological Development Zone, Daxing District, Beijing

Applicant after: Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd.

Address before: Unit D, Unit 3, 301, Productivity Building No. 5, High-tech Secondary Road, Nanshan District, Shenzhen City, Guangdong Province

Applicant before: BAIDU INTERNATIONAL TECHNOLOGY (SHENZHEN) Co.,Ltd.

GR01 Patent grant