WO2020042077A1 - Speech recognition method, device, photographing system, and computer-readable storage medium - Google Patents

Speech recognition method, device, photographing system, and computer-readable storage medium

Info

Publication number
WO2020042077A1
Authority
WO
WIPO (PCT)
Prior art keywords
keyword
probability information
standard
keywords
target
Prior art date
Application number
PCT/CN2018/103219
Other languages
English (en)
French (fr)
Inventor
赵文泉
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 filed Critical 深圳市大疆创新科技有限公司
Priority to PCT/CN2018/103219 priority Critical patent/WO2020042077A1/zh
Priority to CN201880032452.4A priority patent/CN110770820A/zh
Publication of WO2020042077A1 publication Critical patent/WO2020042077A1/zh
Priority to US17/187,156 priority patent/US20210183388A1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Definitions

  • The present invention relates to the technical field of speech processing, and in particular to a speech recognition method, device, photographing system, and computer-readable storage medium.
  • The invention provides a speech recognition method, device, photographing system, and computer-readable storage medium that address a problem in the prior art: a single speech recognition model is adopted globally, which increases the complexity of training the speech recognition model, reduces the speech recognition efficiency of terminal equipment, and hinders the intelligent development of terminal equipment.
  • a first aspect of the present invention is to provide a speech recognition method, including:
  • a second aspect of the present invention is to provide a voice recognition device, including:
  • a processor for running a computer program stored in the memory to implement: acquiring a voice instruction input by a user, the voice instruction being used for voice control of a terminal device; determining a working state of the terminal device; determining, according to the working state, a target voice recognition model corresponding to the working state; and using the target voice recognition model to recognize the voice instruction.
  • a third aspect of the present invention is to provide a voice recognition device, including:
  • An acquisition module configured to acquire a voice instruction input by a user, where the voice instruction is used to perform voice control on a terminal device;
  • a determining module configured to determine a working state of the terminal device;
  • a processing module configured to determine a target speech recognition model corresponding to the working state according to the working state;
  • a recognition module configured to recognize the voice instruction by using the target voice recognition model.
  • a fourth aspect of the present invention is to provide a photographing system, including:
  • a voice recognition device communicatively connected with the photographing device, the voice recognition device including:
  • a processor for running a computer program stored in the memory to implement: acquiring a voice instruction input by a user, the voice instruction being used for voice control of a terminal device; determining a working state of the terminal device; determining, according to the working state, a target voice recognition model corresponding to the working state; and using the target voice recognition model to recognize the voice instruction.
  • a fifth aspect of the present invention is to provide a computer-readable storage medium, where the computer-readable storage medium stores program instructions, and the program instructions are used to implement the speech recognition method according to the first aspect.
  • The speech recognition method, device, photographing system, and computer-readable storage medium provided by the present invention determine the working state of the terminal device, determine a target speech recognition model corresponding to the working state, and use the target speech recognition model to recognize voice instructions. Different speech recognition models can therefore be used for different working states of the terminal device. This reduces the complexity of training the speech recognition models, improves the efficiency and accuracy of speech recognition on the terminal device, supports the intelligent development of terminal equipment, and improves the practicability of the method.
  • FIG. 1 is a schematic flowchart of a speech recognition method according to an embodiment of the present invention
  • FIG. 2 is a schematic flowchart of identifying the voice instruction by using the target voice recognition model according to an embodiment of the present invention
  • FIG. 3 is a schematic flowchart of identifying the keywords by using the target speech recognition model according to an embodiment of the present invention
  • FIG. 4 is a schematic flowchart of a process of identifying a keyword according to probability information of each standard keyword relative to the keyword according to an embodiment of the present invention
  • FIG. 5 is a schematic flowchart of determining a target keyword corresponding to the keyword among a plurality of standard keywords according to the probability information and relative probability information according to an embodiment of the present invention
  • FIG. 6 is a schematic flowchart of determining a working state of the terminal device according to an embodiment of the present invention.
  • FIG. 7 is a schematic flowchart of a speech recognition method according to an embodiment of the present invention.
  • FIG. 8 is a first schematic structural diagram of a speech recognition device according to an embodiment of the present invention.
  • FIG. 9 is a second schematic structural diagram of a voice recognition device according to an embodiment of the present invention.
  • Referring to FIG. 1 and FIG. 6, this embodiment provides a speech recognition method that can recognize speech quickly and accurately. Specifically, the method includes:
  • S101 Acquire a voice instruction input by a user, and the voice instruction is used for voice control of the terminal device;
  • the terminal device may be one or more of a smart phone, a vehicle-mounted terminal, a photographing device, and a wearable device (watch or bracelet); the acquired voice instruction may perform voice control on the terminal device.
  • the terminal device may have different working states in specific applications.
  • the working state of the smart phone may include at least one of the following: standby state, call state, etc.
  • the working state of the in-vehicle terminal may include at least one of the following: music playback state, navigation state, image display state, etc.
  • the working state of the shooting device may include at least one of the following: standby state, photographing working state, video recording working state, etc.
  • different working states may have different scene modes. For example, when the terminal device is a smart phone in a standby state, it may have a Bluetooth connection mode, a WiFi connection mode, a flight mode, etc.; when the terminal device is a photographing device in a photographing working state, it may have a time-lapse photography mode, a slow-motion photography mode, a normal photography mode, and the like.
  • the working state of the terminal device can be determined.
  • the specific method for determining the working state is not limited in this embodiment, and those skilled in the art can set it according to specific design requirements. For example, different working states of each terminal device can correspond to different ranges of voltage information and current information; the voltage information and/or current information of the terminal device is then obtained, and the working state of the terminal device is determined based on it.
  • those skilled in the art can also use other methods to determine the working state of the terminal device, as long as the accuracy of the determination can be ensured; details are not described herein again.
  • a state machine may be provided in the terminal device, and the state machine stores state data corresponding to the working state of the terminal device; in this case, determining the working state of the terminal device can include:
  • a query instruction may be sent to the state machine, so that the state machine determines the state data corresponding to the current working state according to the query instruction, and the state data can thus be obtained.
  • S1022 Determine the working status of the terminal device according to the status data.
  • the state data stored in the state machine corresponds to the current working state of the terminal device, and different state data corresponds to different working states. Therefore, after the state data is obtained, the working state of the terminal device can be determined according to it. For example, when the terminal device is a photographing device and the acquired state data is 01, it can be determined that the working state of the terminal device is the standby state; when the acquired state data is 02, it can be determined that the working state of the terminal device is the camera working state, and so on.
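  • The state machine query and the mapping from state data to a working state can be sketched as follows. This is only a minimal illustration: the `StateMachine` class, its `query` method, and the `STATE_DATA_TABLE` dictionary are hypothetical names, and only the two codes 01 and 02 from the example above are modelled.

```python
from enum import Enum


class WorkingState(Enum):
    STANDBY = "standby"
    CAMERA_WORKING = "camera working"


# Assumed code table, taken from the 01 / 02 example in the text.
STATE_DATA_TABLE = {
    "01": WorkingState.STANDBY,
    "02": WorkingState.CAMERA_WORKING,
}


class StateMachine:
    """Hypothetical stand-in for the state machine inside the terminal device."""

    def __init__(self, state_data: str) -> None:
        self._state_data = state_data

    def query(self) -> str:
        """Answer a query instruction with the state data of the current working state."""
        return self._state_data


def determine_working_state(machine: StateMachine) -> WorkingState:
    """S1022: map the obtained state data to a working state."""
    state_data = machine.query()
    if state_data not in STATE_DATA_TABLE:
        raise ValueError(f"unknown state data: {state_data!r}")
    return STATE_DATA_TABLE[state_data]
```

  • Raising an error for unknown state data is a design choice of this sketch; the text does not say how unrecognized state data should be handled.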
  • S103 Determine a target speech recognition model corresponding to the working state according to the working state
  • each working state of the terminal device can correspond to a speech recognition model, and each speech recognition model has a different recognition database. For example, when the terminal device is a smart phone in a call state, the recognition database in the corresponding target speech recognition model may include the following recognition keywords: hold call, hang up call, mute, hands-free, etc.; when the smart phone is in a standby state, the recognition database in the corresponding target speech recognition model may include the following recognition keywords: make a call, play music, find, search, etc.; when the terminal device is a shooting device in a video recording state, the recognition database in the corresponding target speech recognition model may include the following recognition keywords: stop recording, highlight, etc.; when the terminal device is a shooting device in a photographing state, the target voice
  • a corresponding target speech recognition model can be determined according to the working state, and the speech recognition model is used to perform speech recognition on the terminal device in the corresponding working state.
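  • Selecting the target model from the working state amounts to a lookup, sketched below under stated assumptions. The `RecognitionModel` type, the `MODELS_BY_STATE` registry, and the model names are hypothetical; the keyword lists are the smart phone examples from the text, and a real model would of course be more than a keyword set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognitionModel:
    """Toy model: a name plus the recognition keywords in its database."""
    name: str
    keywords: frozenset


# Keyword lists follow the smart phone examples in the text; names are assumed.
MODELS_BY_STATE = {
    "call": RecognitionModel(
        "call_model",
        frozenset({"hold call", "hang up call", "mute", "hands-free"}),
    ),
    "standby": RecognitionModel(
        "standby_model",
        frozenset({"make a call", "play music", "find", "search"}),
    ),
}


def select_target_model(working_state: str) -> RecognitionModel:
    """S103: return the model registered for the given working state."""
    try:
        return MODELS_BY_STATE[working_state]
    except KeyError:
        raise LookupError(f"no recognition model for state {working_state!r}") from None
```

  • Because each model only carries the keywords relevant to its state, "mute" is known to the call-state model but not to the standby model.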
  • S104 Recognize a voice instruction by using a target voice recognition model.
  • the target speech recognition model can be used to recognize the voice instructions input by the user, thereby improving the accuracy and efficiency of speech recognition.
  • the voice recognition method provided in this embodiment determines a working state of a terminal device, determines a target voice recognition model corresponding to the working state, and uses the target voice recognition model to recognize a voice instruction. Different working states of the terminal device thus use different speech recognition models, which reduces the complexity of training the speech recognition models, improves the efficiency and accuracy of speech recognition on the terminal device, and supports the intelligent development of the terminal device, thereby improving the practicality of the method.
  • FIG. 2 is a schematic flowchart of identifying a voice instruction by using a target voice recognition model according to an embodiment of the present invention. Based on the above embodiment, and referring to FIG. 2, this embodiment does not limit the specific implementation of recognizing the voice instruction with the target voice recognition model; those skilled in the art can set it according to specific design requirements. Preferably, using the target voice recognition model to recognize the voice instruction in this embodiment may include:
  • for example, if the voice command input by the user is "Please help me play a song", then after feature extraction processing is performed on the voice command, the keywords corresponding to the voice command can be "play, song". For another example, if the voice command input by the user is "Please help me to turn on the hands-free", then after feature extraction processing is performed on the voice command, the keywords corresponding to the voice command can be "open, hands-free", and so on.
  • S1042 Recognize keywords using a target speech recognition model.
  • the target voice recognition model can be used to recognize the keywords, which can improve the efficiency and accuracy of recognizing voice commands.
  • FIG. 3 is a schematic flowchart of keyword recognition using a target speech recognition model according to an embodiment of the present invention. Based on the above embodiment, and referring to FIG. 3, one achievable way of recognizing the keywords in this embodiment is as follows. Using the target speech recognition model to recognize the keywords may include:
  • the target speech recognition model may include one or more standard keywords, and the probability information of a standard keyword relative to the keyword indicates the similarity between the keyword and that standard keyword. Therefore, the probability information of each standard keyword relative to the keyword can be determined from the degree of similarity between that standard keyword and the keyword.
  • for example, the target speech recognition model includes a first standard keyword, a second standard keyword, and a third standard keyword. The keyword is analyzed and processed through the target speech recognition model: the degree of similarity of the first standard keyword with respect to the keyword is S1, and the corresponding probability information obtained from S1 is P1; the degree of similarity of the second standard keyword with respect to the keyword is S2, and the corresponding probability information obtained from S2 is P2; the degree of similarity of the third standard keyword with respect to the keyword is S3, and the corresponding probability information obtained from S3 is P3.
  • the keywords are identified according to the probability information of each standard keyword relative to the keywords.
  • identifying the keywords according to the probability information of each standard keyword relative to the keywords may include:
  • S104221 Determine a target keyword corresponding to the keyword among a plurality of standard keywords according to the probability information of each standard keyword relative to the keyword.
  • the target keywords can be determined based on the magnitude relationship of the probability information.
  • determining a target keyword corresponding to the keyword among a plurality of standard keywords according to the probability information of each standard keyword relative to the keyword may include:
  • when the probability information is greater than a preset probability threshold, the standard keyword corresponding to the probability information is determined as the target keyword.
  • the probability threshold is set in advance, and those skilled in the art can set it according to specific design requirements; for example, the probability threshold can be 90%, 95%, or 98%. After the probability information is obtained, it can be compared with the probability threshold. If the probability information of a standard keyword relative to the keyword is greater than the preset probability threshold, the similarity between that standard keyword and the keyword of the voice instruction is high, and the standard keyword corresponding to the probability information can be determined as the target keyword.
  • for example, suppose that in the target speech recognition model the probability information of the first standard keyword relative to the keyword is 0.93, that of the second standard keyword is 0.02, and that of the third standard keyword is 0.05, and that the probability threshold is 0.9. The probability information 0.93 of the first standard keyword is greater than the probability threshold 0.9, which indicates that the similarity between the first standard keyword and the keyword of the voice instruction is high, so the first standard keyword can be determined as the target keyword.
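  • The threshold comparison in this example can be sketched as below. The function name `pick_by_threshold` and the dictionary layout are assumptions made for illustration, and returning `None` when no standard keyword clears the threshold is likewise an assumption; the text does not specify that case.

```python
from typing import Optional


def pick_by_threshold(probabilities: dict, threshold: float = 0.9) -> Optional[str]:
    """Return the most probable standard keyword if it clears the threshold, else None."""
    if not probabilities:
        return None
    best = max(probabilities, key=probabilities.get)
    return best if probabilities[best] > threshold else None


# Values from the example: 0.93, 0.02, 0.05 against a threshold of 0.9.
scores = {"first": 0.93, "second": 0.02, "third": 0.05}
print(pick_by_threshold(scores))  # first
```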
  • FIG. 4 is a schematic flowchart of identifying keywords according to probability information of each standard keyword relative to keywords according to an embodiment of the present invention. Referring to FIG. 4, another achievable way of recognizing the keywords is as follows. Recognizing the keywords based on the probability information of each standard keyword relative to the keywords may include:
  • S201 Determine the relative probability information between the standard keywords and other standard keywords according to the probability information of each standard keyword relative to the keywords;
  • the relative probability information indicates how close the probability information of different standard keywords is relative to the same keyword. Specifically, the relative probability information may be the ratio of the difference between the probability information of two standard keywords to the probability information of one of them; that is, the relative probability information may be (first probability information - second probability information) / first probability information, or (first probability information - second probability information) / second probability information, with the relative probability information greater than or equal to 0. This form is only an example; the relative probability information can also be obtained in other ways, such as the difference between the two pieces of probability information or their ratio.
  • for example, if the probability information of the first, second, and third standard keywords relative to the keyword is P1, P2, and P3 respectively, then the relative probability information between the first and second standard keywords can be (P1-P2)/P1, (P2-P1)/P1, (P1-P2)/P2, or (P2-P1)/P2; the relative probability information between the first and third standard keywords can be (P1-P3)/P1, (P1-P3)/P3, (P3-P1)/P1, or (P3-P1)/P3; and the relative probability information between the second and third standard keywords can be (P2-P3)/P2, (P3-P2)/P2, (P2-P3)/P3, or (P3-P2)/P3.
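  • One of the listed variants can be sketched as below: the difference is divided by the larger of the two probabilities, which keeps the result greater than or equal to 0. Choosing this variant, and the function names `relative_probability` and `pairwise_relative`, are assumptions of the sketch; the text allows several other forms.

```python
from itertools import combinations


def relative_probability(p_a: float, p_b: float) -> float:
    """(larger - smaller) / larger, so the result is always >= 0."""
    high, low = max(p_a, p_b), min(p_a, p_b)
    if high == 0:
        return 0.0
    return (high - low) / high


def pairwise_relative(probabilities: dict) -> dict:
    """Relative probability information for every pair of standard keywords."""
    return {
        (a, b): relative_probability(probabilities[a], probabilities[b])
        for a, b in combinations(probabilities, 2)
    }
```

  • For three standard keywords this yields three pairs, (first, second), (first, third), and (second, third), matching the three cases enumerated above.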
  • S202 Determine a target keyword corresponding to the keyword among a plurality of standard keywords according to the probability information and the relative probability information.
  • the probability information and the relative probability information can be analyzed and processed to determine a target keyword corresponding to the keyword.
  • determining a target keyword corresponding to the keyword among a plurality of standard keywords according to the probability information and the relative probability information may include:
  • the probability threshold and the relative probability threshold are set in advance, and those skilled in the art may set them according to specific design requirements; for example, the probability threshold may be 0.6, 0.55, or 0.5, and the relative probability threshold may be 0.1, 0.05, 0.01, or 0.15.
  • the probability information can be compared with the preset probability threshold, and the relative probability information can be compared with the relative probability threshold. If the probability information of a standard keyword relative to the keyword is greater than the preset probability threshold, and the relative probability information is greater than or equal to the relative probability threshold, the similarity between that standard keyword and the keyword of the voice command is high. In this case, the standard keyword corresponding to the probability information and the relative probability information may be determined as the target keyword.
  • the method may further include:
  • if the analysis of the probability information and the relative probability information shows that the probability information of a standard keyword relative to the keyword is greater than the preset probability threshold, while the relative probability information is less than the relative probability threshold, then there are two standard keywords with similar probability information; these are the first standard keyword and the second standard keyword corresponding to the relative probability information.
  • for example, suppose that in the target speech recognition model the probability information of the first standard keyword relative to the keyword is 0.53, that of the second standard keyword is 0.46, and that of the third standard keyword is 0.01. From this probability information, the relative probability information between the first and second standard keywords can be determined to be 0.132, that between the first and third standard keywords 0.981, and that between the second and third standard keywords 0.978. The probability information and relative probability information are then compared with the preset probability threshold 0.5 and the relative probability threshold 0.15, respectively. The probability information 0.53 is greater than the probability threshold 0.5, and the relative probability information 0.132 is less than the relative probability threshold 0.15, so the two standard keywords corresponding to the relative probability information 0.132 are close in probability information. The first standard keyword and the second standard keyword can therefore be identified from the relative probability information 0.132.
  • S2023 Determine the first standard keyword or the second standard keyword as the target keyword according to a preset priority processing strategy.
  • the target keyword can be determined based on a preset priority processing strategy. The priority processing strategy can be set manually by the user, for example according to the needs of the application scenario or the usage requirements, to fix the relative priority of the first standard keyword and the second standard keyword. Therefore, the target keyword may be either the first standard keyword or the second standard keyword.
  • Determining the target keywords through the above methods effectively ensures the accuracy and reliability of the target keyword determination, and further improves the accuracy of the method.
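  • The decision procedure with both thresholds and the priority fallback might look like the sketch below. It is not the patented implementation: the function name `choose_target`, the representation of the priority strategy as an ordered list, the use of (best - runner-up) / best as the relative probability, and returning `None` when no keyword clears the probability threshold are all assumptions.

```python
def choose_target(probabilities, priority, prob_threshold=0.5, rel_threshold=0.15):
    """Pick a target keyword from probability and relative probability information.

    probabilities: standard keyword -> probability relative to the spoken keyword.
    priority: standard keywords ordered from highest to lowest priority (assumed
        format); it must list every keyword that can end up in a tie.
    Returns None when no standard keyword clears prob_threshold (assumed behaviour).
    """
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    if not ranked or probabilities[ranked[0]] <= prob_threshold:
        return None
    best = ranked[0]
    if len(ranked) == 1:
        return best
    runner_up = ranked[1]
    # Comparing against the runner-up is enough: every other keyword is further away.
    relative = (probabilities[best] - probabilities[runner_up]) / probabilities[best]
    if relative >= rel_threshold:
        return best
    # The two candidates are too close: fall back to the preset priority strategy.
    return min((best, runner_up), key=priority.index)
```

  • With the values from the example (0.53, 0.46, 0.01, thresholds 0.5 and 0.15), the relative probability of about 0.132 falls below 0.15, so the result depends on the priority list: if the second standard keyword is ranked first, it is chosen even though its probability is lower.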
  • the method in this embodiment further includes:
  • S301 Control the terminal device to perform a corresponding operation according to the target keyword.
  • the terminal device can be controlled to perform the operation corresponding to the target keyword.
  • the terminal device can be controlled to perform an operation in the current working state, or to switch from the current working state to another working state. For example, when the terminal device is a photographing device whose current working state is the photographing working state, and the target keyword is "continuous shooting", the photographing device can be controlled to perform a continuous shooting operation according to the target keyword; if the target keyword is "camera", the photographing device can be controlled, according to the target keyword, to switch from the photographing working state to the camera working state and perform the camera operation in that state.
  • the target speech recognition model may include one or more state keywords related to the switching state; further, controlling the terminal device to perform corresponding operations according to the target keywords may include:
  • for example, the terminal device is a photographing device whose current working state is the photographing working state. The state keywords in the target speech recognition model may include "camera", and this state keyword corresponds to the camera working state of the photographing device. Therefore, when the target keyword is "camera", the terminal device can be controlled according to the target keyword to switch from the current photographing working state to the camera working state, and can be further controlled to perform the camera operation.
  • the terminal device can be controlled to perform corresponding operations according to the target keyword, which effectively improves the convenience of the user in controlling the terminal device, and further improves the practicability of the method.
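  • The handling of state keywords can be sketched as follows. The `ShootingDevice` class, its `execute` method, and the `STATE_KEYWORDS` table are hypothetical; the sketch only records which operation was requested in which state instead of driving real hardware.

```python
class ShootingDevice:
    """Toy photographing device with two working states (names are assumed)."""

    def __init__(self) -> None:
        self.state = "photographing"
        self.log = []

    def execute(self, operation: str) -> None:
        """Record the operation together with the state it was executed in."""
        self.log.append((self.state, operation))


# State keywords map to the working state they switch the device into.
STATE_KEYWORDS = {"camera": "camera working"}


def control_device(device: ShootingDevice, target_keyword: str) -> None:
    """S301: switch state first if the keyword is a state keyword, then run the operation."""
    new_state = STATE_KEYWORDS.get(target_keyword)
    if new_state is not None and new_state != device.state:
        device.state = new_state
    device.execute(target_keyword)
```

  • An ordinary keyword such as "continuous shooting" leaves the working state unchanged, while the state keyword "camera" first switches the device and then performs the operation in the new state.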
  • this application embodiment provides a voice recognition method, taking a shooting device as the terminal device as an example.
  • the shooting device may have different states, with different voice recognition models in different states.
  • in one state, the voice commands that the voice recognition model can recognize are A1 (such as taking pictures), B1 (such as video), and C1 (such as shutdown); the voice recognition model corresponding to the voice recognition function in that state is the first voice recognition model.
  • in another state, the voice commands that the voice recognition model can recognize are A2 (such as stop), B2 (such as highlighting), etc.; the voice recognition model corresponding to the voice recognition function in that state is the second voice recognition model.
  • in different states, the voice instruction words to be recognized are different, and so are the required speech recognition models.
  • the speech recognition method includes the following steps:
  • a1 Get the voice command input by the user
  • a2 Use preset preprocessors to process voice instructions, such as smoothing, filtering, or denoising processing to reduce the interference and influence of external environmental factors on voice instructions;
  • a3 Perform feature extraction processing on the processed voice instructions to obtain keywords corresponding to the voice instructions
  • a4 Determine the working state of the shooting device. The working state here is different from the working mode of the shooting device. For example, the shooting device has a photographing mode, a video recording mode, a portrait mode, etc.; when no action has been started, these working modes can all be regarded as one working state, that is, a state in which no continuous action is performed. In this state, the shooting device can recognize voice instructions such as "photographing", "video recording", and "off". When the shooting device is in a continuous-action working state, for example recording slow-motion video, continuous shooting, or recording 4KP60 video, the voice commands that it can recognize may include only "stop", "mark a frame of video", and the like. For example, if the photographing device is in a video working state and the user issues a "photograph" instruction, the photographing device may refuse to recognize and execute it.
  • a5 Determine the target speech recognition model corresponding to the working state. For example, when the working state of the shooting device is state 1, the corresponding target speech recognition model is recognition model 1; when the working state of the shooting device is state 2, the corresponding target speech recognition model is recognition model 2, and so on. Different recognition models have different standard keywords.
  • the target speech recognition model is used to recognize the voice instruction to determine the candidate standard keywords. For example, the target speech recognition model is recognition model 1, which can recognize 4 voice commands, such as "take a picture", "video", "shut down", and "continuous shooting".
  • a7: Determine a target keyword from the candidate standard keywords, and control the photographing device to perform the corresponding operation according to the target keyword.
  • The target keyword may be the standard keyword with the highest probability. A standard keyword with another probability may also be selected, for example when the probability information is:
  • P(take a photo) = 0.5
  • P(record video) = 0.01
  • P(power off) = 0.02
  • P(continuous shooting) = 0.02
  • P(reject) = 0.45
  • Because P(take a photo) and P(reject) are close, the target keyword can be determined according to a preset priority processing strategy. For example, if the priority of "reject" is higher than that of "take a photo", the target keyword is determined to be "reject".
  • Controlling the photographing device to perform the corresponding operation according to the target keyword may include changing its working state. For example, when the photographing device is in the camera working state and the user issues a "record video" instruction, the photographing device switches from the camera working state to the video recording working state and can then record video according to that instruction.
  • The speech recognition method provided by this application embodiment uses different speech recognition models across the device's states, which can improve the recognition rate of the instructions needed in each state and allows specific words to be reinforced in certain states.
  • For example, when the camera is recording video, the device needs only instructions such as "stop recording" and "Highlight", unlike the many instructions of its standby state, such as start recording, power off, and take a photo; and when the camera is in the standby state, instructions such as "stop recording" do not need to be recognized.
  • In addition, the method logically reduces the possibility of falsely triggering unnecessary instructions: some states of the terminal device logically do not need certain instructions, and removing them outright prevents false triggering of these useless instructions. This improves recognition efficiency and accuracy, reduces the difficulty of training the speech recognition models, ensures the practicability of the method, and favors market promotion and application.
  • FIG. 8 is a first schematic structural diagram of a speech recognition device according to an embodiment of the present invention. Referring to FIG. 8, this embodiment provides a speech recognition device that can perform the speech recognition method described above.
  • the speech recognition device may include:
  • a memory 301 configured to store a computer program
  • a processor 302 configured to run the computer program stored in the memory 301 to implement: obtaining a voice instruction input by a user, the voice instruction being used for voice control of a terminal device; determining the working state of the terminal device; determining, according to the working state, a target speech recognition model corresponding to the working state; and recognizing the voice instruction with the target speech recognition model.
  • the terminal device may include a photographing device, and the operating state of the photographing device includes at least one of the following: a standby state, a camera operating state, and a video capturing operating state.
  • a state machine is set in the terminal device; when the processor 302 determines the working state of the terminal device, the processor 302 is configured to:
  • obtain the state data stored in the state machine;
  • determine the working state of the terminal device according to the state data.
  • when the processor 302 recognizes a voice instruction by using the target speech recognition model, the processor 302 is configured to:
  • perform feature extraction on the voice instruction to obtain the keywords corresponding to the voice instruction;
  • recognize the keywords by using the target speech recognition model.
  • the target speech recognition model includes one or more standard keywords; further, when the processor 302 recognizes the keywords by using the target speech recognition model, the processor 302 is configured to:
  • obtain probability information of each standard keyword in the target speech recognition model relative to the keywords;
  • recognize the keywords according to the probability information of each standard keyword relative to the keywords.
  • optionally, when the processor 302 recognizes the keywords according to the probability information of each standard keyword relative to the keywords, the processor 302 is configured to:
  • determine, among a plurality of standard keywords, a target keyword corresponding to the keywords according to the probability information of each standard keyword relative to the keywords.
  • specifically, when the processor 302 determines the target keyword among the plurality of standard keywords according to the probability information of each standard keyword relative to the keywords, the processor 302 is configured to:
  • when the probability information is greater than a preset probability threshold, determine the standard keyword corresponding to the probability information as the target keyword.
  • optionally, when the processor 302 recognizes the keywords according to the probability information of each standard keyword relative to the keywords, the processor 302 is configured to:
  • determine relative probability information between a standard keyword and the other standard keywords according to the probability information of each standard keyword relative to the keywords;
  • determine, among a plurality of standard keywords, a target keyword corresponding to the keywords according to the probability information and the relative probability information.
  • one implementable manner is: when the processor 302 determines the target keyword among the plurality of standard keywords according to the probability information and the relative probability information, the processor 302 is configured to:
  • when the probability information is greater than a preset probability threshold and the relative probability information is greater than or equal to a relative probability threshold, determine the standard keyword corresponding to the probability information and the relative probability information as the target keyword.
  • another implementable manner is: when the processor 302 determines the target keyword among the plurality of standard keywords according to the probability information and the relative probability information, the processor 302 is configured to:
  • when the probability information is greater than the preset probability threshold and the relative probability information is less than the relative probability threshold, obtain a first standard keyword and a second standard keyword corresponding to the relative probability information;
  • determine the first standard keyword or the second standard keyword as the target keyword according to a preset priority processing strategy.
  • the processor 302 is further configured to:
  • after the voice instruction is recognized by using the target speech recognition model, control the terminal device to perform a corresponding operation according to the target keyword.
  • the target speech recognition model includes one or more state keywords related to state switching; further, when the processor 302 controls the terminal device to perform a corresponding operation according to the target keyword, the processor 302 is configured to:
  • when the target keyword is a state keyword in the target speech recognition model, control the terminal device, according to the voice instruction, to switch from the current working state to the working state corresponding to the state keyword.
  • the speech recognition apparatus provided in this embodiment can be used to execute the methods corresponding to the embodiments in FIG. 1 to FIG. 7, and the specific implementation manners and beneficial effects thereof are similar, and details are not described herein again.
  • FIG. 9 is a second schematic structural diagram of a voice recognition device according to an embodiment of the present invention. As can be seen with reference to FIG. 9, this embodiment provides another voice recognition device, including:
  • the obtaining module 101 is configured to obtain a voice instruction input by a user, and the voice instruction is used to perform voice control on a terminal device;
  • a determining module 102 configured to determine a working state of a terminal device
  • a processing module 103 configured to determine a target speech recognition model corresponding to the working state according to the working state;
  • the recognition module 104 is configured to recognize a voice instruction by using a target voice recognition model.
  • the obtaining module 101, the determining module 102, the processing module 103, and the recognition module 104 in the speech recognition device provided in this embodiment can be used to execute the methods corresponding to the embodiments in FIG. 1 to FIG. 7; the specific implementation manners and beneficial effects are similar and are not repeated here.
  • Another aspect of this embodiment provides a photographing system, including:
  • a photographing device; and
  • a speech recognition device communicatively connected with the photographing device, the speech recognition device including:
  • a memory for storing a computer program;
  • a processor for running the computer program stored in the memory to implement: obtaining a voice instruction input by a user, the voice instruction being used for voice control of a terminal device; determining the working state of the terminal device; determining, according to the working state, a target speech recognition model corresponding to the working state; and recognizing the voice instruction with the target speech recognition model.
  • The specific implementation principles and effects of the speech recognition device in the photographing system provided by this embodiment are the same as those of the speech recognition device in the embodiment corresponding to FIG. 8; refer to the description above for details.
  • Another aspect of this embodiment provides a computer-readable storage medium.
  • the computer-readable storage medium stores program instructions, and the program instructions are used to implement the speech recognition method in the embodiments corresponding to FIG. 1 to FIG. 7.
  • the related remote control device and method disclosed may be implemented in other ways.
  • the embodiments of the remote control device described above are only schematic.
  • the division of the module or unit is only a logical function division.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of the remote control device or unit, and may be electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above integrated unit may be implemented in the form of hardware or in the form of software functional unit.
  • the integrated unit When the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer processor 101 (processor) to perform all or part of the steps of the methods described in the embodiments of the present invention.
  • the foregoing storage media include: U disk, mobile hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), magnetic disks or optical disks and other media that can store program codes.

Abstract

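The present invention discloses a speech recognition method and device, a photographing system, and a computer-readable storage medium. The method includes: obtaining a voice instruction input by a user, the voice instruction being used for voice control of a terminal device; determining the working state of the terminal device; determining, according to the working state, a target speech recognition model corresponding to the working state; and recognizing the voice instruction with the target speech recognition model. By determining the working state of the terminal device, determining the target speech recognition model corresponding to that working state, and recognizing the voice instruction with that model, the disclosed technical solution allows different speech recognition models to be used for different working states of the terminal device. This not only reduces the complexity of training the speech recognition models but also improves the speech recognition efficiency and accuracy of the terminal device, which helps make terminal devices more intelligent.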

Description

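Speech recognition method and device, photographing system, and computer-readable storage medium
Technical Field
The present invention relates to the field of speech processing technology, and in particular to a speech recognition method and device, a photographing system, and a computer-readable storage medium.
Background Art
Speech is the most natural way for people to interact with one another, and it is equally suitable for human-computer interaction. At present, most terminal devices for human-computer interaction on the market have a graphical user interface, which requires the user to watch the interface closely and operate it by hand. This increases the operating complexity for the user. To make operation easier, terminal devices capable of recognizing speech have emerged: such a device can recognize a voice instruction issued by the user and perform the corresponding action according to the recognized instruction.
In the prior art, most terminal devices use a single speech recognition model globally, which not only increases the complexity of training the speech recognition model but also reduces the speech recognition efficiency of the terminal device, and is not conducive to making terminal devices more intelligent.
Summary of the Invention
The present invention provides a speech recognition method and device, a photographing system, and a computer-readable storage medium, to solve the problem in the prior art that using a single speech recognition model globally not only increases the complexity of training the speech recognition model but also reduces the speech recognition efficiency of the terminal device and is not conducive to making terminal devices more intelligent.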
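A first aspect of the present invention provides a speech recognition method, including:
obtaining a voice instruction input by a user, the voice instruction being used for voice control of a terminal device;
determining the working state of the terminal device;
determining, according to the working state, a target speech recognition model corresponding to the working state;
recognizing the voice instruction with the target speech recognition model.
A second aspect of the present invention provides a speech recognition device, including:
a memory for storing a computer program;
a processor for running the computer program stored in the memory to implement: obtaining a voice instruction input by a user, the voice instruction being used for voice control of a terminal device; determining the working state of the terminal device; determining, according to the working state, a target speech recognition model corresponding to the working state; and recognizing the voice instruction with the target speech recognition model.
A third aspect of the present invention provides a speech recognition device, including:
an obtaining module for obtaining a voice instruction input by a user, the voice instruction being used for voice control of a terminal device;
a determining module for determining the working state of the terminal device;
a processing module for determining, according to the working state, a target speech recognition model corresponding to the working state;
a recognition module for recognizing the voice instruction with the target speech recognition model.
A fourth aspect of the present invention provides a photographing system, including:
a photographing device;
a speech recognition device communicatively connected with the photographing device, the speech recognition device including:
a memory for storing a computer program;
a processor for running the computer program stored in the memory to implement: obtaining a voice instruction input by a user, the voice instruction being used for voice control of a terminal device; determining the working state of the terminal device; determining, according to the working state, a target speech recognition model corresponding to the working state; and recognizing the voice instruction with the target speech recognition model.
A fifth aspect of the present invention provides a computer-readable storage medium storing program instructions, the program instructions being used to implement the speech recognition method of the first aspect.
With the speech recognition method and device, photographing system, and computer-readable storage medium provided by the present invention, the working state of the terminal device is determined, a target speech recognition model corresponding to the working state is determined, and the voice instruction is recognized with that model. Different speech recognition models can thus be used for different working states of the terminal device, which not only reduces the complexity of training the speech recognition models but also improves the speech recognition efficiency and accuracy of the terminal device, helps make terminal devices more intelligent, and thereby improves the practicability of the method.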
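Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a speech recognition method according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of recognizing the voice instruction with the target speech recognition model according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of recognizing the keywords with the target speech recognition model according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of recognizing the keywords according to the probability information of each standard keyword relative to the keywords according to an embodiment of the present invention;
FIG. 5 is a schematic flowchart of determining, among multiple standard keywords, a target keyword corresponding to the keywords according to the probability information and the relative probability information according to an embodiment of the present invention;
FIG. 6 is a schematic flowchart of determining the working state of the terminal device according to an embodiment of the present invention;
FIG. 7 is a schematic flowchart of a speech recognition method according to an application embodiment of the present invention;
FIG. 8 is a first schematic structural diagram of a speech recognition device according to an embodiment of the present invention;
FIG. 9 is a second schematic structural diagram of a speech recognition device according to an embodiment of the present invention.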
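Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Unless otherwise defined, all technical and scientific terms used herein have the meanings commonly understood by those skilled in the technical field of the present invention. The terms used in this specification are only for describing specific embodiments and are not intended to limit the present invention.
Some embodiments of the present invention are described in detail below with reference to the accompanying drawings. Where no conflict arises between embodiments, the following embodiments and the features in them may be combined with one another.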
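FIG. 1 is a schematic flowchart of a speech recognition method according to an embodiment of the present invention, and FIG. 6 is a schematic flowchart of determining the working state of the terminal device according to an embodiment of the present invention. Referring to FIG. 1 and FIG. 6, this embodiment provides a speech recognition method that can recognize speech quickly and accurately. Specifically, the method includes:
S101: Obtain a voice instruction input by a user, the voice instruction being used for voice control of a terminal device;
The terminal device may be one or more of a smartphone, an in-vehicle terminal, a photographing device, and a wearable device (a watch or a wristband); the obtained voice instruction can be used for voice control of such a terminal device.
S102: Determine the working state of the terminal device;
In a specific application the terminal device may have different working states. For example, when the terminal device is a smartphone, its working state may include at least one of the following: a standby state, a call state, and so on. When the terminal device is an in-vehicle terminal, its working state may include at least one of the following: a music playing state, a navigation state, an image display state, and so on. When the terminal device is a photographing device, its working state includes at least one of the following: a standby state, a camera working state, a video recording working state, and so on. Further, different scenario modes may exist in different working states. For example, when the terminal device is a smartphone in the standby state, it may have a Bluetooth connection mode, a WiFi connection mode, an airplane mode, and so on; when the terminal device is a photographing device in the video recording working state, it may have a time-lapse shooting mode, a slow-motion shooting mode, a normal shooting mode, and so on.
Therefore, to improve the accuracy and recognition efficiency of voice control of the terminal device, the working state of the terminal device can be determined. This embodiment does not limit the specific way the working state is determined; those skilled in the art may set it according to specific design requirements. For example, each working state of a terminal device may correspond to a different range of voltage information and current information, so the voltage information and/or current information of the terminal device can be obtained and the working state determined from it. Of course, those skilled in the art may also determine the working state in other ways, as long as the determination is accurate; details are not repeated here.
To improve the efficiency and accuracy of determining the working state, a state machine may preferably be provided in the terminal device, the state machine storing state data corresponding to the working state of the terminal device. In this case, determining the working state of the terminal device may include:
S1021: Obtain the state data stored in the state machine;
Specifically, a query indication may be sent to the state machine so that the state machine determines, according to the query indication, the state data corresponding to the current working state, whereby the state data can be obtained.
S1022: Determine the working state of the terminal device according to the state data.
The state data stored in the state machine corresponds to the current working state of the terminal device, and different state data corresponds to different working states. Therefore, after the state data is obtained, the working state of the terminal device can be determined from it. For example, when the terminal device is a photographing device and the obtained state data is 01, it can be determined from state data 01 that the working state is the standby state; when the obtained state data is 02, it can be determined from state data 02 that the working state is the camera working state, and so on.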
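Steps S1021 and S1022 can be illustrated with a minimal Python sketch. The `StateMachine` class, its `query` method, and the code "03" are hypothetical names and values introduced only for illustration; the text gives only the codes 01 (standby) and 02 (camera working state).

```python
from enum import Enum


class WorkingState(Enum):
    STANDBY = "standby state"
    CAMERA = "camera working state"
    VIDEO = "video recording working state"


# Codes "01" and "02" follow the example in the text; "03" is an assumed extension.
STATE_DATA_TO_STATE = {
    "01": WorkingState.STANDBY,
    "02": WorkingState.CAMERA,
    "03": WorkingState.VIDEO,
}


class StateMachine:
    """Hypothetical stand-in for the state machine inside the terminal device."""

    def __init__(self, state_data="01"):
        self._state_data = state_data

    def query(self):
        # S1021: answer a query indication with the state data of the current state.
        return self._state_data


def determine_working_state(state_machine):
    # S1022: map the state data to a working state.
    state_data = state_machine.query()
    try:
        return STATE_DATA_TO_STATE[state_data]
    except KeyError:
        raise ValueError(f"unknown state data: {state_data!r}")
```

Keeping the mapping in a single table means that adding a working state only requires a new table entry, which is one possible reason to prefer a state machine over inferring the state from voltage or current ranges.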
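S103: Determine, according to the working state, a target speech recognition model corresponding to the working state;
Each working state of the terminal device may correspond to one speech recognition model, and each speech recognition model has a different recognition database. For example, when the terminal device is a smartphone in the call state, the recognition database of the corresponding target speech recognition model may include the following recognition keywords: hold call, hang up, mute, hands-free, and so on. When the smartphone is in the standby state, the recognition database of the corresponding target speech recognition model may include: make a call, play music, find, search, and so on. When the terminal device is a photographing device in the video recording state, the recognition database of the corresponding target speech recognition model may include: stop recording, highlight, and so on. When the terminal device is a photographing device in the photographing state, the recognition database of the corresponding target speech recognition model may include: start recording, power off, take a photo, and so on.
Therefore, after the working state is determined, the corresponding target speech recognition model can be determined according to it; this model is used for speech recognition on the terminal device in that working state.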
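As a rough illustration of S103, the per-state models can be kept in a lookup table keyed by device and working state. The `RecognitionModel` class and the key strings are assumptions made for this sketch; the keyword sets are the examples listed above, and a real model would of course contain trained parameters rather than only a keyword list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognitionModel:
    """One speech recognition model, reduced here to its recognition keywords."""
    name: str
    keywords: frozenset


# (device, working state) -> model; keyword sets follow the examples in the text.
MODELS = {
    ("smartphone", "call"): RecognitionModel(
        "call model", frozenset({"hold call", "hang up", "mute", "hands-free"})),
    ("smartphone", "standby"): RecognitionModel(
        "standby model", frozenset({"make a call", "play music", "find", "search"})),
    ("photographing device", "video"): RecognitionModel(
        "video model", frozenset({"stop recording", "highlight"})),
    ("photographing device", "photo"): RecognitionModel(
        "photo model", frozenset({"start recording", "power off", "take a photo"})),
}


def select_target_model(device, working_state):
    """S103: look up the target model for the current working state."""
    try:
        return MODELS[(device, working_state)]
    except KeyError:
        raise LookupError(
            f"no model registered for {device!r} in state {working_state!r}")
```

Because "stop recording" is simply absent from the photo model, it can never be triggered by mistake in that state.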
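S104: Recognize the voice instruction with the target speech recognition model.
After the target speech recognition model is determined, it can be used to recognize the voice instruction input by the user, which improves the accuracy and efficiency of speech recognition.
In the speech recognition method provided by this embodiment, the working state of the terminal device is determined, a target speech recognition model corresponding to the working state is determined, and the voice instruction is recognized with that model. Different speech recognition models can thus be used for different working states of the terminal device, which not only reduces the complexity of training the speech recognition models but also improves the speech recognition efficiency and accuracy of the terminal device, helps make terminal devices more intelligent, and thereby improves the practicability of the method.
FIG. 2 is a schematic flowchart of recognizing the voice instruction with the target speech recognition model according to an embodiment of the present invention. On the basis of the above embodiment and with continued reference to FIG. 2, this embodiment does not limit the specific way the voice instruction is recognized with the target speech recognition model; those skilled in the art may set it according to specific design requirements. Preferably, recognizing the voice instruction with the target speech recognition model in this embodiment may include:
S1041: Perform feature extraction on the voice instruction to obtain the keywords corresponding to the voice instruction;
For example, if the voice instruction input by the user is "please play a song for me", feature extraction on the voice instruction may yield the keywords "play, song"; as another example, if the voice instruction input by the user is "please turn on hands-free for me", feature extraction may yield the keywords "turn on, hands-free", and so on.
S1042: Recognize the keywords with the target speech recognition model.
Because the keywords corresponding to the voice instruction fully express its meaning and intent, after they are obtained they can be recognized with the target speech recognition model, which improves the efficiency and accuracy of recognizing the voice instruction.
FIG. 3 is a schematic flowchart of recognizing the keywords with the target speech recognition model according to an embodiment of the present invention. On the basis of the above embodiment and with continued reference to FIG. 3, one implementable way of recognizing the keywords with the target speech recognition model may include:
S10421: Obtain the probability information of each standard keyword in the target speech recognition model relative to the keywords;
The target speech recognition model may include one or more standard keywords, and the probability information of a standard keyword relative to the keywords identifies the degree of similarity between the keywords and that standard keyword. The probability information of each standard keyword relative to the keywords can therefore be determined from the degree of similarity between each standard keyword and the keywords.
For example, the target speech recognition model includes a first standard keyword, a second standard keyword, and a third standard keyword. By analyzing the keywords with the target speech recognition model, the degree of similarity of the first standard keyword relative to the keywords is obtained as S1, from which the corresponding probability information P1 is obtained; the degree of similarity of the second standard keyword relative to the keywords is S2, from which the corresponding probability information P2 is obtained; and the degree of similarity of the third standard keyword relative to the keywords is S3, from which the corresponding probability information P3 is obtained, where P1+P2+P3=1 and P1>P2>P3. This indicates that the first standard keyword has a relatively high degree of similarity to the keywords and the third standard keyword has a relatively low degree of similarity.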
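The text does not say how the similarities S1, S2, S3 are turned into probabilities P1, P2, P3 that sum to 1. One simple possibility, shown here purely as an assumption, is to divide each similarity by the sum of all similarities; other mappings (for example a softmax) would satisfy the same constraint.

```python
def similarities_to_probabilities(similarities):
    """Normalize non-negative similarity scores so the probabilities sum to 1.

    This normalization is an illustrative assumption, not the method of the text.
    """
    if any(s < 0 for s in similarities.values()):
        raise ValueError("similarities must be non-negative")
    total = sum(similarities.values())
    if total <= 0:
        raise ValueError("at least one similarity must be positive")
    return {keyword: s / total for keyword, s in similarities.items()}


# Hypothetical similarity scores with S1 > S2 > S3.
probabilities = similarities_to_probabilities(
    {"first": 6.0, "second": 3.0, "third": 1.0})
```

The ordering of the similarities is preserved, so S1 > S2 > S3 implies P1 > P2 > P3 as in the example above.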
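S10422: Recognize the keywords according to the probability information of each standard keyword relative to the keywords.
One implementable way of recognizing the keywords according to the probability information of each standard keyword relative to the keywords may include:
S104221: Determine, among multiple standard keywords, a target keyword corresponding to the keywords according to the probability information of each standard keyword relative to the keywords.
Continuing the example above, the probability information of the first standard keyword relative to the keywords is P1, that of the second standard keyword is P2, and that of the third standard keyword is P3, with P1>P2>P3; the target keyword can then be determined from the relative magnitudes of the probability information.
Preferably, determining the target keyword among multiple standard keywords according to the probability information of each standard keyword relative to the keywords may include:
when the probability information is greater than a preset probability threshold, determining the standard keyword corresponding to that probability information as the target keyword.
The probability threshold is preset, and those skilled in the art may set it according to specific design requirements; for example, it may be 90%, 95%, or 98%. After the probability information of each standard keyword relative to the keywords is obtained, it can be compared with the probability threshold. When the probability information of some standard keyword relative to the keywords is greater than the preset probability threshold, that standard keyword is highly similar to the keywords of the voice instruction, and it can be determined as the target keyword.
Continuing the example above, the probability information of the first standard keyword relative to the keywords is 0.93, that of the second standard keyword is 0.02, and that of the third standard keyword is 0.05. Comparing these three values with the probability threshold shows that the probability information of the first standard keyword, 0.93, is greater than the probability threshold 0.9, so the first standard keyword is highly similar to the keywords of the voice instruction and can be determined as the target keyword.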
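The threshold rule can be sketched as follows. The function name and the behavior of returning `None` when no keyword exceeds the threshold are assumptions; the text only describes the case in which the threshold is exceeded.

```python
def pick_by_threshold(probabilities, threshold=0.9):
    """Return the standard keyword whose probability exceeds the threshold.

    Returns None when no keyword qualifies (an assumed fallback).
    """
    if not probabilities:
        return None
    best = max(probabilities, key=probabilities.get)
    return best if probabilities[best] > threshold else None


# Values from the example in the text.
target = pick_by_threshold({"first": 0.93, "second": 0.02, "third": 0.05})
```

Checking only the most probable keyword is enough here: when the probabilities sum to 1 and the threshold is above 0.5, at most one keyword can exceed it.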
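FIG. 4 is a schematic flowchart of recognizing the keywords according to the probability information of each standard keyword relative to the keywords according to an embodiment of the present invention. On the basis of the above embodiment and with continued reference to FIG. 4, another implementable way of recognizing the keywords according to the probability information of each standard keyword relative to the keywords may include:
S201: Determine relative probability information between a standard keyword and the other standard keywords according to the probability information of each standard keyword relative to the keywords;
The relative probability information identifies how close the probability information of different standard keywords is with respect to the same keywords. Specifically, the relative probability information may be the ratio of the difference between the two probability values of two standard keywords to the probability value of one of them; that is, it may be (first probability information - second probability information) / first probability information, or (first probability information - second probability information) / second probability information, and the relative probability information is greater than or equal to 0. This definition is only an example; the relative probability information may be obtained in other ways, for example as the difference or the ratio of the two probability values.
Continuing the example above, the probability information of the first standard keyword relative to the keywords is P1, that of the second standard keyword is P2, and that of the third standard keyword is P3. The relative probability information between the first and second standard keywords may then be (P1-P2)/P1, (P2-P1)/P1, (P1-P2)/P2, or (P2-P1)/P2; between the first and third standard keywords it may be (P1-P3)/P1, (P1-P3)/P3, (P3-P1)/P1, or (P3-P1)/P3; and between the second and third standard keywords it may be (P2-P3)/P2, (P3-P2)/P2, (P2-P3)/P3, or (P3-P2)/P3.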
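A small sketch of the relative probability computation follows. Taking the absolute value of the difference is an assumption made here so that the result is never negative, as the text requires; the `relative_to` parameter is a hypothetical name for choosing which probability is the denominator.

```python
def relative_probability(p_a, p_b, relative_to="first"):
    """Relative probability between two keywords: |p_a - p_b| / denominator.

    relative_to="first" divides by p_a, relative_to="second" divides by p_b.
    """
    if relative_to not in ("first", "second"):
        raise ValueError("relative_to must be 'first' or 'second'")
    denominator = p_a if relative_to == "first" else p_b
    if denominator <= 0:
        raise ValueError("the denominator probability must be positive")
    return abs(p_a - p_b) / denominator


# With P1 = 0.53 and P2 = 0.46, (P1 - P2) / P1 is about 0.132.
r12 = relative_probability(0.53, 0.46)
```

A small value means the two keywords are almost equally likely, which is the situation the later steps resolve with a priority strategy.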
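S202: Determine, among multiple standard keywords, a target keyword corresponding to the keywords according to the probability information and the relative probability information.
After the probability information and the relative probability information are obtained, they can be analyzed to determine a target keyword corresponding to the keywords. Specifically, one implementable way of doing so may include:
S2021: When the probability information is greater than a preset probability threshold and the relative probability information is greater than or equal to a relative probability threshold, determine the standard keyword corresponding to the probability information and the relative probability information as the target keyword.
The probability threshold and the relative probability threshold are preset, and those skilled in the art may set them according to specific design requirements. For example, the probability threshold may be 0.6, 0.55, or 0.5, and correspondingly the relative probability threshold may be 0.1, 0.05, 0.01, or 0.15. After the probability information and the relative probability information are obtained, the probability information is compared with the preset probability threshold and the relative probability information is compared with the relative probability threshold. When the probability information of some standard keyword relative to the keywords is greater than the preset probability threshold and the relative probability information is greater than or equal to the relative probability threshold, that standard keyword is highly similar to the keywords of the voice instruction, and it can be determined as the target keyword.
With continued reference to FIG. 5, it can be understood that, when determining the target keyword among multiple standard keywords according to the probability information and the relative probability information, the method may further include:
S2022: When the probability information is greater than the preset probability threshold and the relative probability information is less than the relative probability threshold, obtain the first standard keyword and the second standard keyword corresponding to the relative probability information;
When the analysis shows that the probability information of some standard keyword relative to the keywords is greater than the preset probability threshold and the relative probability information is less than the relative probability threshold, there are two standard keywords with similar probability information; these are the first standard keyword and the second standard keyword corresponding to the relative probability information.
For example, the probability information of the first standard keyword relative to the keywords is 0.53, that of the second standard keyword is 0.46, and that of the third standard keyword is 0.01. From these values, the relative probability information between the first and second standard keywords can be determined as 0.132, that between the first and third standard keywords as 0.981, and that between the second and third standard keywords as 0.978. The probability information and the relative probability information are then compared with the preset probability threshold 0.5 and the relative probability threshold 0.15, respectively. For the first standard keyword, the probability information 0.53 is greater than the probability threshold 0.5, and the relative probability information 0.132 is less than the relative probability threshold 0.15. This indicates that the two standard keywords corresponding to the relative probability information 0.132 (the first and the second standard keyword) have similar probability information, so the first and second standard keywords can be identified from the relative probability information 0.132.
S2023: Determine the first standard keyword or the second standard keyword as the target keyword according to a preset priority processing strategy.
After the first standard keyword and the second standard keyword are obtained, the target keyword can be determined based on a preset priority processing strategy. The priority processing strategy may be set by the user; specifically, the priorities of the first and second standard keywords may be determined from the requirements of the application scenario or of use. The target keyword may therefore be either the first standard keyword or the second standard keyword.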
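Steps S2021 to S2023 can be combined into one decision function, sketched below under stated assumptions: only the two most probable keywords are compared (the runner-up always gives the smallest relative probability to the top keyword), the relative probability is taken as (P1 - P2) / P1, the priority strategy is a list ordered from highest to lowest priority, and `None` is returned when the top probability does not exceed the probability threshold, a case the text does not cover.

```python
def determine_target_keyword(probabilities, priority,
                             prob_threshold=0.5, rel_threshold=0.15):
    """Pick a target keyword from probabilities using both thresholds.

    priority lists keywords from highest to lowest priority and must contain
    every keyword that can end up in a near tie.
    """
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return None
    first_kw, p1 = ranked[0]
    if p1 <= prob_threshold:
        return None  # assumed fallback: no keyword is probable enough
    if len(ranked) == 1:
        return first_kw
    second_kw, p2 = ranked[1]
    relative = (p1 - p2) / p1
    if relative >= rel_threshold:
        return first_kw  # S2021: clear winner
    # S2022/S2023: near tie, resolved by the preset priority strategy
    return min((first_kw, second_kw), key=priority.index)


# Values from the example in the text: relative probability is about 0.132 < 0.15.
example = {"first": 0.53, "second": 0.46, "third": 0.01}
```

With this example, a strategy that ranks the second keyword above the first returns the second keyword, and the opposite ranking returns the first.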
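Determining the target keyword in this way ensures that the determination is accurate and reliable, which in turn improves the precision of the method.
Further, after the voice instruction is recognized with the target speech recognition model, the method in this embodiment further includes:
S301: Control the terminal device to perform a corresponding operation according to the target keyword.
After the target keyword is obtained, the terminal device can be controlled according to the operation corresponding to the target keyword: the terminal device may be controlled to perform an operation in its current working state, or to switch from the current working state to another working state. For example, when the terminal device is a photographing device whose current working state is the photographing working state, if the target keyword is "continuous shooting", the photographing device can be controlled to perform continuous shooting according to that keyword; if the target keyword is "video recording", the photographing device can be controlled to switch from the photographing working state to the video recording working state and to record video in that state.
Specifically, the target speech recognition model may include one or more state keywords related to state switching; controlling the terminal device to perform a corresponding operation according to the target keyword may then include:
S3011: When the target keyword is a state keyword in the target speech recognition model, control the terminal device, according to the voice instruction, to switch from the current working state to the working state corresponding to the state keyword.
For example, the terminal device is a photographing device whose current working state is the photographing working state. The state keywords in the target speech recognition model may then include "video recording", which corresponds to the video recording working state of the photographing device. Therefore, when the target keyword is "video recording", the terminal device can be controlled to switch from the current photographing working state to the video recording working state, and can further be controlled to record video according to the target keyword.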
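The handling of state keywords in S3011 can be sketched with a small class. The class name, the state labels "photo" and "video", and the `log` list are hypothetical; the log only stands in for the real operations of the device.

```python
# Per working state: state keyword -> working state to switch to.
STATE_KEYWORDS = {
    "photo": {"video recording": "video"},
}


class PhotographingDevice:
    """Hypothetical device that records what it was asked to do."""

    def __init__(self, state="photo"):
        self.state = state
        self.log = []

    def handle(self, target_keyword):
        next_state = STATE_KEYWORDS.get(self.state, {}).get(target_keyword)
        if next_state is not None:
            # S3011: the target keyword is a state keyword, so switch first.
            self.log.append(f"switch {self.state} -> {next_state}")
            self.state = next_state
        # S301: perform the operation that corresponds to the target keyword.
        self.log.append(f"perform {target_keyword}")


device = PhotographingDevice()
device.handle("continuous shooting")  # stays in the photographing working state
device.handle("video recording")      # switches, then records
```

After the second call the device is in the "video" state, so a different recognition model would be selected for the next voice instruction.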
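In this embodiment, after the target keyword is determined, the terminal device can be controlled to perform the corresponding operation according to it, which makes it more convenient for the user to control the terminal device and thereby improves the practicability of the method.
In a specific application, referring to FIG. 7, this application embodiment provides a speech recognition method. For ease of description, a photographing device is used as the terminal device. The photographing device may have different working states, and different states correspond to different speech recognition models. For example, when the photographing device is in a first working state, the voice instructions that the speech recognition model can recognize are A1 (such as take a photo), B1 (such as record video), C1 (such as power off), and so on; in this case the speech recognition model corresponding to the speech recognition function of the photographing device in that state is the first speech recognition model. When the photographing device is in a second working state, the voice instructions that the speech recognition model can recognize are A2 (such as stop), B2 (such as highlight), and so on; in this case the corresponding speech recognition model is the second speech recognition model. By analogy, in different working states the voice instruction words that actually need to be recognized are different, and so are the speech recognition models required.
Specifically, the speech recognition method includes the following steps:
a1: Obtain the voice instruction input by the user;
a2: Process the voice instruction with a preset preprocessor, for example by smoothing, filtering, or denoising, to reduce the interference and influence of external environmental factors on the voice instruction;
a3: Perform feature extraction on the processed voice instruction to obtain the keywords corresponding to the voice instruction;
a4: Determine the working state of the photographing device. The working state here is different from the working mode of the photographing device. For example, the photographing device has a photo mode, a video mode, a portrait mode, and so on; before any action has started, these working modes can be regarded as a single working state, namely a state in which no continuous action is being performed. In this working state, the photographing device can recognize voice instructions such as "take a photo", "record video", and "power off". When the photographing device is in a working state that involves a continuous action, for example slow-motion video recording, continuous shooting, or 4KP60 video recording, the voice instructions that the camera can recognize may include only "stop", "mark a video frame", and the like. For example, when the photographing device is in the video recording working state and the user issues a "take a photo" instruction, the photographing device may refuse to recognize and execute it.
a5: Determine the target speech recognition model corresponding to the working state. For example, when the working state of the photographing device is state 1, the corresponding target speech recognition model is recognition model 1; when the working state is state 2, the corresponding target speech recognition model is recognition model 2, and so on. Different recognition models have different standard keywords.
a6: After the target speech recognition model is determined, use it to recognize the voice instruction and determine the candidate standard keywords. For example, the target speech recognition model is recognition model 1, which can recognize four voice instructions: "take a photo", "record video", "power off", and "continuous shooting". After recognition model 1 analyzes the voice instruction, the probability value of each standard keyword in recognition model 1 can be obtained, for example P(take a photo)=0.8, P(record video)=0.05, P(power off)=0.03, P(continuous shooting)=0.1, and P(reject)=0.02, which determines the candidate standard keywords.
a7: Determine a target keyword from the candidate standard keywords, and control the photographing device to perform the corresponding operation according to the target keyword.
Specifically, the target keyword may be the standard keyword with the highest probability, here the one with P(take a photo)=0.8, which is taken as the recognized instruction. Of course, a standard keyword with another probability may also be selected as the target keyword. For example, when the probability information of the standard keywords is P(take a photo)=0.5, P(record video)=0.01, P(power off)=0.02, P(continuous shooting)=0.02, and P(reject)=0.45, the values of P(take a photo) and P(reject) are close, so the target keyword can be determined according to a preset priority processing strategy. For example, if the priority of "reject" is higher than that of "take a photo", the target keyword is determined to be "reject".
Controlling the photographing device to perform the corresponding operation according to the target keyword may include controlling the photographing device to change its working state. For example, when the photographing device is in the camera working state and the user issues a "record video" instruction, the photographing device switches from the camera working state to the video recording working state and can then record video according to that instruction.
The speech recognition method provided by this application embodiment uses different speech recognition models across the device's states, which can improve the recognition rate of the instructions needed in each state and allows specific words to be reinforced in certain states. For example, when the camera is recording video, the device needs only instructions such as "stop recording" and "Highlight", unlike the many instructions of its standby state, such as start recording, power off, and take a photo; and when the camera is in the standby state, instructions such as "stop recording" do not need to be recognized. In addition, the method logically reduces the possibility of falsely triggering unnecessary instructions: some states of the terminal device logically do not need certain instructions, and removing them outright prevents false triggering of these useless instructions. This improves recognition efficiency and accuracy, reduces the difficulty of training the speech recognition models, ensures the practicability of the method, and favors market promotion and application.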
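Steps a5 to a7, including the "reject" class, can be sketched end to end as follows. This is an illustration only: `score_fn` is a hypothetical stand-in for the trained per-state model, the state labels "idle" and "recording" are invented, and the closeness margin of 0.15 is an assumed value, since the text only says that the two probabilities are "close".

```python
REJECT = "reject"

# Working state -> instructions its model can recognize (examples from the text).
MODEL_KEYWORDS = {
    "idle": ["take a photo", "record video", "power off", "continuous shooting"],
    "recording": ["stop", "highlight"],
}


def choose_target(probabilities, closeness=0.15):
    """a7: take the most probable keyword unless REJECT is close to it."""
    best = max(probabilities, key=probabilities.get)
    p_best = probabilities[best]
    p_reject = probabilities.get(REJECT, 0.0)
    if best != REJECT and p_best > 0 and (p_best - p_reject) / p_best < closeness:
        return REJECT  # preset priority: REJECT outranks ordinary instructions
    return best


def recognize(working_state, score_fn, audio):
    """a5-a7 for one utterance."""
    keywords = MODEL_KEYWORDS[working_state] + [REJECT]  # a5: state-specific model
    probabilities = score_fn(audio, keywords)             # a6: candidate keywords
    return choose_target(probabilities)                   # a7: target keyword


def fake_scores(audio, keywords):
    # Hypothetical scorer returning the second set of values from the text.
    values = [0.5, 0.01, 0.02, 0.02, 0.45]
    return dict(zip(keywords, values))


result = recognize("idle", fake_scores, audio=b"")
```

With the values 0.5 and 0.45 the relative gap is 0.1, below the assumed margin, so the instruction is rejected; with the first set of values from the text (0.8 against 0.02) the same function returns "take a photo".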
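FIG. 8 is a first schematic structural diagram of a speech recognition device according to an embodiment of the present invention. Referring to FIG. 8, this embodiment provides a speech recognition device that can perform the speech recognition method described above. Specifically, the speech recognition device may include:
a memory 301 for storing a computer program;
a processor 302 for running the computer program stored in the memory 301 to implement: obtaining a voice instruction input by a user, the voice instruction being used for voice control of a terminal device; determining the working state of the terminal device; determining, according to the working state, a target speech recognition model corresponding to the working state; and recognizing the voice instruction with the target speech recognition model.
The terminal device may include a photographing device, and the working state of the photographing device includes at least one of the following: a standby state, a camera working state, and a video recording working state.
In addition, a state machine is provided in the terminal device; when the processor 302 determines the working state of the terminal device, the processor 302 is configured to:
obtain the state data stored in the state machine;
determine the working state of the terminal device according to the state data.
Further, when the processor 302 recognizes the voice instruction with the target speech recognition model, the processor 302 is configured to:
perform feature extraction on the voice instruction to obtain the keywords corresponding to the voice instruction;
recognize the keywords with the target speech recognition model.
The target speech recognition model includes one or more standard keywords; when the processor 302 recognizes the keywords with the target speech recognition model, the processor 302 is configured to:
obtain the probability information of each standard keyword in the target speech recognition model relative to the keywords;
recognize the keywords according to the probability information of each standard keyword relative to the keywords.
Optionally, when the processor 302 recognizes the keywords according to the probability information of each standard keyword relative to the keywords, the processor 302 is configured to:
determine, among multiple standard keywords, a target keyword corresponding to the keywords according to the probability information of each standard keyword relative to the keywords.
Specifically, when the processor 302 determines the target keyword among multiple standard keywords according to the probability information of each standard keyword relative to the keywords, the processor 302 is configured to:
when the probability information is greater than a preset probability threshold, determine the standard keyword corresponding to the probability information as the target keyword.
Optionally, when the processor 302 recognizes the keywords according to the probability information of each standard keyword relative to the keywords, the processor 302 is configured to:
determine relative probability information between a standard keyword and the other standard keywords according to the probability information of each standard keyword relative to the keywords;
determine, among multiple standard keywords, a target keyword corresponding to the keywords according to the probability information and the relative probability information.
One implementable manner is: when the processor 302 determines the target keyword among multiple standard keywords according to the probability information and the relative probability information, the processor 302 is configured to:
when the probability information is greater than a preset probability threshold and the relative probability information is greater than or equal to a relative probability threshold, determine the standard keyword corresponding to the probability information and the relative probability information as the target keyword.
Another implementable manner is: when the processor 302 determines the target keyword among multiple standard keywords according to the probability information and the relative probability information, the processor 302 is configured to:
when the probability information is greater than the preset probability threshold and the relative probability information is less than the relative probability threshold, obtain the first standard keyword and the second standard keyword corresponding to the relative probability information;
determine the first standard keyword or the second standard keyword as the target keyword according to a preset priority processing strategy.
Further, the processor 302 is also configured to:
after the voice instruction is recognized with the target speech recognition model, control the terminal device to perform a corresponding operation according to the target keyword.
The target speech recognition model includes one or more state keywords related to state switching; when the processor 302 controls the terminal device to perform a corresponding operation according to the target keyword, the processor 302 is configured to:
when the target keyword is a state keyword in the target speech recognition model, control the terminal device, according to the voice instruction, to switch from the current working state to the working state corresponding to the state keyword.
The speech recognition device provided in this embodiment can be used to execute the methods corresponding to the embodiments of FIG. 1 to FIG. 7; the specific implementation and beneficial effects are similar and are not repeated here.
FIG. 9 is a second schematic structural diagram of a speech recognition device according to an embodiment of the present invention. Referring to FIG. 9, this embodiment provides another speech recognition device, including:
an obtaining module 101 for obtaining a voice instruction input by a user, the voice instruction being used for voice control of a terminal device;
a determining module 102 for determining the working state of the terminal device;
a processing module 103 for determining, according to the working state, a target speech recognition model corresponding to the working state;
a recognition module 104 for recognizing the voice instruction with the target speech recognition model.
The obtaining module 101, the determining module 102, the processing module 103, and the recognition module 104 in the speech recognition device provided in this embodiment can be used to execute the methods corresponding to the embodiments of FIG. 1 to FIG. 7; the specific implementation and beneficial effects are similar and are not repeated here.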
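The four-module structure of FIG. 9 can be sketched as a class that composes four callables. The class name, the `run` method, and the toy wiring are assumptions for illustration; the text only states what each module is responsible for.

```python
class SpeechRecognitionDevice:
    """Composes the four modules of FIG. 9; each module is a plain callable here."""

    def __init__(self, obtain, determine_state, select_model, recognize):
        self.obtain = obtain                    # obtaining module 101
        self.determine_state = determine_state  # determining module 102
        self.select_model = select_model        # processing module 103
        self.recognize = recognize              # recognition module 104

    def run(self):
        instruction = self.obtain()
        state = self.determine_state()
        model = self.select_model(state)
        return self.recognize(model, instruction)


# Toy wiring: a "model" is just the set of instructions allowed in a state.
models = {"recording": {"stop", "highlight"}, "idle": {"take a photo"}}
device = SpeechRecognitionDevice(
    obtain=lambda: "stop",
    determine_state=lambda: "recording",
    select_model=lambda state: models[state],
    recognize=lambda model, instruction: instruction if instruction in model else None,
)
result = device.run()
```

Keeping the modules separate means that, for example, the way the working state is determined can be replaced without touching the recognition logic.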
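Another aspect of this embodiment provides a photographing system, including:
a photographing device; and
a speech recognition device communicatively connected with the photographing device, the speech recognition device including:
a memory for storing a computer program;
a processor for running the computer program stored in the memory to implement: obtaining a voice instruction input by a user, the voice instruction being used for voice control of a terminal device; determining the working state of the terminal device; determining, according to the working state, a target speech recognition model corresponding to the working state; and recognizing the voice instruction with the target speech recognition model.
The specific implementation principles and effects of the speech recognition device in the photographing system provided by this embodiment are the same as those of the speech recognition device in the embodiment corresponding to FIG. 8; refer to the description above for details, which are not repeated here.
Yet another aspect of this embodiment provides a computer-readable storage medium storing program instructions, the program instructions being used to implement the speech recognition method in the embodiments corresponding to FIG. 1 to FIG. 7.
The technical solutions and technical features in the above embodiments may be used alone or in combination where no conflict arises, and as long as they do not go beyond the knowledge of those skilled in the art, they are equivalent embodiments within the protection scope of this application.
In the several embodiments provided by the present invention, it should be understood that the disclosed remote control device and method may be implemented in other ways. For example, the remote control device embodiments described above are only illustrative. For example, the division into modules or units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, remote control devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer processor 101 (processor) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
The above are only embodiments of the present invention and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included in the patent protection scope of the present invention.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some or all of the technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.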

Claims (37)

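  1. A speech recognition method, characterized by comprising:
    obtaining a voice instruction input by a user, the voice instruction being used for voice control of a terminal device;
    determining the working state of the terminal device;
    determining, according to the working state, a target speech recognition model corresponding to the working state;
    recognizing the voice instruction with the target speech recognition model.
  2. The method according to claim 1, characterized in that recognizing the voice instruction with the target speech recognition model comprises:
    performing feature extraction on the voice instruction to obtain keywords corresponding to the voice instruction;
    recognizing the keywords with the target speech recognition model.
  3. The method according to claim 2, characterized in that the target speech recognition model includes one or more standard keywords, and recognizing the keywords with the target speech recognition model comprises:
    obtaining probability information of each standard keyword in the target speech recognition model relative to the keywords;
    recognizing the keywords according to the probability information of each standard keyword relative to the keywords.
  4. The method according to claim 3, characterized in that recognizing the keywords according to the probability information of each standard keyword relative to the keywords comprises:
    determining, among multiple standard keywords, a target keyword corresponding to the keywords according to the probability information of each standard keyword relative to the keywords.
  5. The method according to claim 4, characterized in that determining, among multiple standard keywords, a target keyword corresponding to the keywords according to the probability information of each standard keyword relative to the keywords comprises:
    when the probability information is greater than a preset probability threshold, determining the standard keyword corresponding to the probability information as the target keyword.
  6. The method according to claim 3, characterized in that recognizing the keywords according to the probability information of each standard keyword relative to the keywords comprises:
    determining relative probability information between the standard keyword and the other standard keywords according to the probability information of each standard keyword relative to the keywords;
    determining, among multiple standard keywords, a target keyword corresponding to the keywords according to the probability information and the relative probability information.
  7. The method according to claim 6, characterized in that determining, among multiple standard keywords, a target keyword corresponding to the keywords according to the probability information and the relative probability information comprises:
    when the probability information is greater than a preset probability threshold and the relative probability information is greater than or equal to a relative probability threshold, determining the standard keyword corresponding to the probability information and the relative probability information as the target keyword.
  8. The method according to claim 6, characterized in that determining, among multiple standard keywords, a target keyword corresponding to the keywords according to the probability information and the relative probability information comprises:
    when the probability information is greater than a preset probability threshold and the relative probability information is less than a relative probability threshold, obtaining a first standard keyword and a second standard keyword corresponding to the relative probability information;
    determining the first standard keyword or the second standard keyword as the target keyword according to a preset priority processing strategy.
  9. The method according to any one of claims 4-8, characterized in that, after recognizing the voice instruction with the target speech recognition model, the method further comprises:
    controlling the terminal device to perform a corresponding operation according to the target keyword.
  10. The method according to claim 9, characterized in that the target speech recognition model includes one or more state keywords related to state switching, and controlling the terminal device to perform a corresponding operation according to the target keyword comprises:
    when the target keyword is a state keyword in the target speech recognition model, controlling the terminal device, according to the voice instruction, to switch from the current working state to the working state corresponding to the state keyword.
  11. The method according to any one of claims 1-8, characterized in that a state machine is provided in the terminal device, and determining the working state of the terminal device comprises:
    obtaining state data stored in the state machine;
    determining the working state of the terminal device according to the state data.
  12. The method according to any one of claims 1-8, characterized in that the terminal device includes a photographing device, and the working state of the photographing device includes at least one of the following: a standby state, a camera working state, and a video recording working state.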
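  13. A speech recognition device, characterized by comprising:
    a memory for storing a computer program;
    a processor for running the computer program stored in the memory to implement: obtaining a voice instruction input by a user, the voice instruction being used for voice control of a terminal device; determining the working state of the terminal device; determining, according to the working state, a target speech recognition model corresponding to the working state; and recognizing the voice instruction with the target speech recognition model.
  14. The device according to claim 13, characterized in that, when the processor recognizes the voice instruction with the target speech recognition model, the processor is configured to:
    perform feature extraction on the voice instruction to obtain keywords corresponding to the voice instruction;
    recognize the keywords with the target speech recognition model.
  15. The device according to claim 14, characterized in that the target speech recognition model includes one or more standard keywords, and when the processor recognizes the keywords with the target speech recognition model, the processor is configured to:
    obtain probability information of each standard keyword in the target speech recognition model relative to the keywords;
    recognize the keywords according to the probability information of each standard keyword relative to the keywords.
  16. The device according to claim 15, characterized in that, when the processor recognizes the keywords according to the probability information of each standard keyword relative to the keywords, the processor is configured to:
    determine, among multiple standard keywords, a target keyword corresponding to the keywords according to the probability information of each standard keyword relative to the keywords.
  17. The device according to claim 16, characterized in that, when the processor determines, among multiple standard keywords, a target keyword corresponding to the keywords according to the probability information of each standard keyword relative to the keywords, the processor is configured to:
    when the probability information is greater than a preset probability threshold, determine the standard keyword corresponding to the probability information as the target keyword.
  18. The device according to claim 15, characterized in that, when the processor recognizes the keywords according to the probability information of each standard keyword relative to the keywords, the processor is configured to:
    determine relative probability information between the standard keyword and the other standard keywords according to the probability information of each standard keyword relative to the keywords;
    determine, among multiple standard keywords, a target keyword corresponding to the keywords according to the probability information and the relative probability information.
  19. The device according to claim 18, characterized in that, when the processor determines, among multiple standard keywords, a target keyword corresponding to the keywords according to the probability information and the relative probability information, the processor is configured to:
    when the probability information is greater than a preset probability threshold and the relative probability information is greater than or equal to a relative probability threshold, determine the standard keyword corresponding to the probability information and the relative probability information as the target keyword.
  20. The device according to claim 18, characterized in that, when the processor determines, among multiple standard keywords, a target keyword corresponding to the keywords according to the probability information and the relative probability information, the processor is configured to:
    when the probability information is greater than a preset probability threshold and the relative probability information is less than a relative probability threshold, obtain a first standard keyword and a second standard keyword corresponding to the relative probability information;
    determine the first standard keyword or the second standard keyword as the target keyword according to a preset priority processing strategy.
  21. The device according to any one of claims 16-20, characterized in that the processor is further configured to:
    after recognizing the voice instruction with the target speech recognition model, control the terminal device to perform a corresponding operation according to the target keyword.
  22. The device according to claim 21, characterized in that the target speech recognition model includes one or more state keywords related to state switching, and when the processor controls the terminal device to perform a corresponding operation according to the target keyword, the processor is configured to:
    when the target keyword is a state keyword in the target speech recognition model, control the terminal device, according to the voice instruction, to switch from the current working state to the working state corresponding to the state keyword.
  23. The device according to any one of claims 13-20, characterized in that a state machine is provided in the terminal device, and when the processor determines the working state of the terminal device, the processor is configured to:
    obtain state data stored in the state machine;
    determine the working state of the terminal device according to the state data.
  24. The device according to any one of claims 13-20, characterized in that the terminal device includes a photographing device, and the working state of the photographing device includes at least one of the following: a standby state, a camera working state, and a video recording working state.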
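  25. A photographing system, characterized by comprising:
    a photographing device;
    a speech recognition device communicatively connected with the photographing device, the speech recognition device comprising:
    a memory for storing a computer program;
    a processor for running the computer program stored in the memory to implement: obtaining a voice instruction input by a user, the voice instruction being used for voice control of a terminal device; determining the working state of the terminal device; determining, according to the working state, a target speech recognition model corresponding to the working state; and recognizing the voice instruction with the target speech recognition model.
  26. The system according to claim 25, characterized in that, when the processor recognizes the voice instruction with the target speech recognition model, the processor is configured to:
    perform feature extraction on the voice instruction to obtain keywords corresponding to the voice instruction;
    recognize the keywords with the target speech recognition model.
  27. The system according to claim 26, characterized in that the target speech recognition model includes one or more standard keywords, and when the processor recognizes the keywords with the target speech recognition model, the processor is configured to:
    obtain probability information of each standard keyword in the target speech recognition model relative to the keywords;
    recognize the keywords according to the probability information of each standard keyword relative to the keywords.
  28. The system according to claim 27, characterized in that, when the processor recognizes the keywords according to the probability information of each standard keyword relative to the keywords, the processor is configured to:
    determine, among multiple standard keywords, a target keyword corresponding to the keywords according to the probability information of each standard keyword relative to the keywords.
  29. The system according to claim 28, characterized in that, when the processor determines, among multiple standard keywords, a target keyword corresponding to the keywords according to the probability information of each standard keyword relative to the keywords, the processor is configured to:
    when the probability information is greater than a preset probability threshold, determine the standard keyword corresponding to the probability information as the target keyword.
  30. The system according to claim 27, characterized in that, when the processor recognizes the keywords according to the probability information of each standard keyword relative to the keywords, the processor is configured to:
    determine relative probability information between the standard keyword and the other standard keywords according to the probability information of each standard keyword relative to the keywords;
    determine, among multiple standard keywords, a target keyword corresponding to the keywords according to the probability information and the relative probability information.
  31. The system according to claim 30, characterized in that, when the processor determines, among multiple standard keywords, a target keyword corresponding to the keywords according to the probability information and the relative probability information, the processor is configured to:
    when the probability information is greater than a preset probability threshold and the relative probability information is greater than or equal to a relative probability threshold, determine the standard keyword corresponding to the probability information and the relative probability information as the target keyword.
  32. The system according to claim 30, characterized in that, when the processor determines, among multiple standard keywords, a target keyword corresponding to the keywords according to the probability information and the relative probability information, the processor is configured to:
    when the probability information is greater than a preset probability threshold and the relative probability information is less than a relative probability threshold, obtain a first standard keyword and a second standard keyword corresponding to the relative probability information;
    determine the first standard keyword or the second standard keyword as the target keyword according to a preset priority processing strategy.
  33. The system according to any one of claims 28-32, characterized in that the processor is further configured to:
    after recognizing the voice instruction with the target speech recognition model, control the terminal device to perform a corresponding operation according to the target keyword.
  34. The system according to claim 33, characterized in that the target speech recognition model includes one or more state keywords related to state switching, and when the processor controls the terminal device to perform a corresponding operation according to the target keyword, the processor is configured to:
    when the target keyword is a state keyword in the target speech recognition model, control the terminal device, according to the voice instruction, to switch from the current working state to the working state corresponding to the state keyword.
  35. The system according to any one of claims 25-32, characterized in that a state machine is provided in the terminal device, and when the processor determines the working state of the terminal device, the processor is configured to:
    obtain state data stored in the state machine;
    determine the working state of the terminal device according to the state data.
  36. The system according to any one of claims 25-32, characterized in that the terminal device includes a photographing device, and the working state of the photographing device includes at least one of the following: a standby state, a camera working state, and a video recording working state.
  37. A computer-readable storage medium, characterized in that the computer-readable storage medium stores program instructions, the program instructions being used to implement the speech recognition method according to any one of claims 1-12.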
PCT/CN2018/103219 2018-08-30 2018-08-30 语音识别方法、装置、拍摄系统和计算机可读存储介质 WO2020042077A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
PCT/CN2018/103219 WO2020042077A1 (zh) 2018-08-30 2018-08-30 语音识别方法、装置、拍摄系统和计算机可读存储介质
CN201880032452.4A CN110770820A (zh) 2018-08-30 2018-08-30 语音识别方法、装置、拍摄系统和计算机可读存储介质
US17/187,156 US20210183388A1 (en) 2018-08-30 2021-02-26 Voice recognition method and device, photographing system, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/103219 WO2020042077A1 (zh) 2018-08-30 2018-08-30 语音识别方法、装置、拍摄系统和计算机可读存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/187,156 Continuation US20210183388A1 (en) 2018-08-30 2021-02-26 Voice recognition method and device, photographing system, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2020042077A1 (zh) 2020-03-05

Family

ID=69329000

Family Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/103219 WO2020042077A1 (zh) 2018-08-30 2018-08-30 Voice recognition method and device, photographing system, and computer-readable storage medium

Country Status (3)

Country Link
US (1) US20210183388A1 (zh)
CN (1) CN110770820A (zh)
WO (1) WO2020042077A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
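CN113611294A (zh) * 2021-06-30 2021-11-05 展讯通信(上海)有限公司 Voice wake-up method, apparatus, device, and medium
CN115314630A (zh) * 2022-01-24 2022-11-08 李宁 Intelligent regulation and management system for wedding photography and videography based on image recognition and analysis technology
CN114449252B (zh) * 2022-02-12 2023-08-01 北京蜂巢世纪科技有限公司 Method, apparatus, device, system, and medium for dynamically adjusting live video based on commentary audio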

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
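CN102915729A (zh) * 2011-08-01 2013-02-06 佳能株式会社 Speech keyword spotting system, and system and method for creating a dictionary for it
CN104902070A (zh) * 2015-04-13 2015-09-09 青岛海信移动通信技术股份有限公司 Voice control method for a mobile terminal, and mobile terminal
CN105869635A (zh) * 2016-03-14 2016-08-17 江苏时间环三维科技有限公司 Voice recognition method and system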

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103871409B (zh) * 2012-12-17 2018-01-23 联想(北京)有限公司 Voice recognition method, information processing method, and electronic device
US9959865B2 (en) * 2012-11-13 2018-05-01 Beijing Lenovo Software Ltd. Information processing method with voice recognition
CN104143328B (zh) * 2013-08-15 2015-11-25 腾讯科技(深圳)有限公司 Keyword detection method and apparatus
CN103594088A (zh) * 2013-11-11 2014-02-19 联想(北京)有限公司 Information processing method and electronic device
CN104535071B (zh) * 2014-12-05 2018-12-14 百度在线网络技术(北京)有限公司 Voice navigation method and apparatus
CN108447471B (zh) * 2017-02-15 2021-09-10 腾讯科技(深圳)有限公司 Voice recognition method and voice recognition apparatus

Also Published As

Publication number Publication date
US20210183388A1 (en) 2021-06-17
CN110770820A (zh) 2020-02-07

Legal Events

Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 18931471; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 18931471; Country of ref document: EP; Kind code of ref document: A1)