CN110992954A - Method, device, equipment and storage medium for voice recognition - Google Patents

Method, device, equipment and storage medium for voice recognition

Info

Publication number
CN110992954A
CN110992954A (application CN201911358974.6A)
Authority
CN
China
Prior art keywords
command data
voice command
target
application program
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911358974.6A
Other languages
Chinese (zh)
Inventor
廖安华
黄少鹏
王永亮
贺刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Wutong Chelian Technology Co Ltd
Original Assignee
Beijing Wutong Chelian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Wutong Chelian Technology Co Ltd filed Critical Beijing Wutong Chelian Technology Co Ltd
Priority to CN201911358974.6A priority Critical patent/CN110992954A/en
Publication of CN110992954A publication Critical patent/CN110992954A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/26 Speech to text systems
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)

Abstract

The application discloses a method, apparatus, device, and storage medium for voice recognition, belonging to the field of voice recognition. The method comprises the following steps: acquiring user voice data; determining, among the voice command data of the application programs currently in the running state, target voice command data matching the user voice data; and executing a target operation instruction corresponding to the target voice command data. With this method and apparatus, the error rate in recognizing the user's voice can be effectively reduced, and the user's experience of the voice interaction function is improved.

Description

Method, device, equipment and storage medium for voice recognition
Technical Field
The present application relates to the field of speech recognition, and in particular, to a method, an apparatus, a device, and a storage medium for speech recognition.
Background
With the development of artificial intelligence and voice recognition technology, voice interaction functions are applied to various scenes, such as a voice control system in a vehicle, a voice assistant on a mobile phone, and the like.
In the prior art, technicians store various preset voice command data and the corresponding operation instructions in a terminal. After the terminal receives a user's voice, it recognizes the voice to generate corresponding data information and compares that data information with the voice command data stored in the terminal. If the user's data information matches stored voice command data, the terminal executes the operation corresponding to that voice command data.
In the process of implementing the present application, the inventor finds that the prior art has at least the following problems:
in order to improve the user's experience with voice interaction, technicians may store a large amount of voice command data in the terminal. As the stored voice command data grows, command words with similar pronunciations may exist whose corresponding voice command data are also similar, so the terminal's error rate in recognizing the user's voice rises. For example, a user saying a command such as "navigate to" a destination may be recognized as "play" that destination, which affects the accuracy of voice control of the terminal.
Disclosure of Invention
The embodiment of the application provides a method, a device, equipment and a storage medium for voice recognition, which can effectively reduce the recognition error rate of user voice, and the technical scheme is as follows:
In one aspect, a method for performing speech recognition is provided, applied to a speech recognition application program. The method includes:
acquiring user voice data;
determining target voice command data matched with the user voice data in the voice command data of the application program in the current running state;
and executing a target operation instruction corresponding to the target voice command data.
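The three-step flow above can be sketched as follows. This is a minimal illustrative sketch only: the patent does not specify an implementation, all function and variable names here are hypothetical, and text similarity via `SequenceMatcher` merely stands in for whatever acoustic matching the real system would use.

```python
# Hypothetical sketch of the claimed flow: match the user's voice data only
# against the command data of applications currently in the running state,
# then execute the matched command's operation instruction.
from difflib import SequenceMatcher  # text-similarity stand-in for acoustic matching


def recognize_and_execute(user_voice_data, running_apps, command_pool, executor):
    """command_pool maps app name -> {command text: operation instruction}."""
    best, best_score = None, 0.0
    for app in running_apps:                        # only running apps' commands
        for cmd, instruction in command_pool.get(app, {}).items():
            score = SequenceMatcher(None, user_voice_data, cmd).ratio()
            if score > best_score:
                best, best_score = (app, instruction), score
    if best is None:
        return None                                 # no matching command found
    app, instruction = best
    return executor(app, instruction)               # execute via the owning app
```

For instance, with a map application and a music application installed but only the map application running, the user's voice data is never compared against the music application's commands.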
Optionally, the method further includes:
when a target application program is started, receiving voice command data and a corresponding operation instruction of the target application program, which are sent by the target application program, and loading the voice command data and the corresponding operation instruction of the target application program in the voice recognition application program;
the determining, in the voice command data of the application program currently in the running state, target voice command data that matches the user voice data includes:
and determining target voice command data matched with the user voice data in the currently loaded voice command data.
Optionally, the method further includes:
and when the target application program is closed, deleting the voice command data and the corresponding operation instruction of the target application program in the voice recognition application program.
Optionally, the method further includes:
when a target application program is started, changing the state of the voice command data corresponding to the target application program to available;
the determining, in the voice command data of the application program currently in the running state, target voice command data that matches the user voice data includes:
and determining target voice command data matched with the user voice data in the voice command data with the current state being available.
Optionally, the method further includes:
changing a state of voice command data corresponding to the target application to unavailable when the target application is closed.
Optionally, the executing the target operation instruction corresponding to the target voice command data includes:
and sending the target operation instruction to a second application program to which the target voice command data belongs so as to execute the target operation instruction through the second application program.
In another aspect, an apparatus for performing speech recognition is provided, the apparatus being applied to a speech recognition application, the apparatus comprising:
an acquisition module configured to acquire user voice data;
the determining module is configured to determine target voice command data matched with the user voice data in the voice command data of the application program currently in the running state;
and the execution module is configured to execute the target operation instruction corresponding to the target voice command data.
Optionally, the apparatus further includes a loading module configured to:
when a target application program is started, receiving voice command data and a corresponding operation instruction of the target application program, which are sent by the target application program, and loading the voice command data and the corresponding operation instruction of the target application program in the voice recognition application program;
the determining, in the voice command data of the application program currently in the running state, target voice command data that matches the user voice data includes:
and determining target voice command data matched with the user voice data in the currently loaded voice command data.
Optionally, the apparatus further includes a first deletion module configured to:
and when the target application program is closed, deleting the voice command data and the corresponding operation instruction of the target application program in the voice recognition application program.
Optionally, the apparatus further comprises a modification module configured to:
when a target application program is started, changing the state of the voice command data corresponding to the target application program to available;
the determining, in the voice command data of the application program currently in the running state, target voice command data that matches the user voice data includes:
and determining target voice command data matched with the user voice data in the voice command data with the current state being available.
Optionally, the apparatus further includes a second deletion module configured to:
changing a state of voice command data corresponding to the target application to unavailable when the target application is closed.
Optionally, the execution module is configured to:
and sending the target operation instruction to a second application program to which the target voice command data belongs so as to execute the target operation instruction through the second application program.
In yet another aspect, a computer device is provided, which includes a processor and a memory, where at least one instruction is stored, and loaded and executed by the processor to implement the operations performed by the voice recognition method as described above.
In yet another aspect, a computer-readable storage medium having at least one instruction stored therein is provided, which is loaded and executed by a processor to implement the operations performed by the speech recognition method as described above.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
The user's voice data is compared with the voice command data of the applications currently in the running state to determine the matched voice command data, and the operation corresponding to that voice command data is then executed. Because the user's voice data is compared only with the voice command data of currently running applications, the amount of voice command data compared against is reduced, the error rate in recognizing the user's voice can be effectively reduced, and the user's experience of the voice interaction function is improved.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description are only some embodiments of the present application; other drawings can be obtained by those of ordinary skill in the art based on these drawings without creative effort.
Fig. 1 is a flowchart of a speech recognition method provided in an embodiment of the present application;
FIG. 2 is a flow chart of a speech recognition method provided by an embodiment of the present application;
FIG. 3 is a flow chart of a speech recognition method provided by an embodiment of the present application;
fig. 4 is a schematic diagram of a structure of a speech recognition apparatus according to an embodiment of the present application;
fig. 5 is a block diagram of a terminal structure provided in an embodiment of the present application;
FIG. 6 is a diagram illustrating a speech recognition method according to an embodiment of the present application;
fig. 7 is a schematic diagram of a speech recognition method according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The voice recognition method can be implemented by a terminal. The terminal can run an application program with a voice recognition function, such as the voice recognition application program described below. The terminal may be provided with a microphone, a loudspeaker, and other components, has a communication function, and can be connected to the Internet. The terminal may be a mobile phone, a tablet computer, an intelligent wearable device, a desktop computer, a notebook computer, a vehicle-mounted terminal, a smart speaker, and the like.
The voice recognition function is commonly used in daily life and can be applied to mobile phones, smart homes, and vehicle-mounted terminals. A terminal with a voice recognition function stores voice command data corresponding to command words; for example, "open map", "play", "navigate", and "weather of today" each correspond to voice command data. A command word may be a keyword or a sentence, and the voice command data may be the audio information corresponding to the command word or voice feature information derived from that audio information. The user can control the terminal by voice: after the terminal receives the user's voice, it generates corresponding voice command data from the voice, compares it with the voice command data stored in the terminal, and, according to the comparison result, sends a control instruction to the application program corresponding to the matched voice command data so that that application program executes the corresponding operation.
The voice recognition method provided by the embodiment of the application can determine the voice command data which needs to be compared with the voice command data corresponding to the voice of the user according to the currently running application program in the terminal, thereby reducing the calculation amount of the terminal and improving the comparison accuracy. In the embodiment of the present application, a speech recognition application having a speech recognition function is taken as an example, and the detailed description is given to the scheme, and other cases are similar and will not be repeated.
Fig. 1 is a flowchart of a speech recognition method according to an embodiment of the present application. Referring to fig. 1, the embodiment includes:
step 101, obtaining user voice data.
In implementation, a voice recognition application program can run in the terminal, using a sound receiving component such as the terminal's microphone to acquire the user's voice audio in real time and generate corresponding voice data from it.
Step 102, in the voice command data of the application program in the current running state, determining target voice command data matched with the user voice data.
In implementation, besides the voice recognition application, a plurality of applications may be running in the terminal, for example, a map application and a music playing application. The voice recognition application may compare the obtained voice data of the user with the voice command data corresponding to the application currently in the running state, and determine the target voice command data matching the voice data of the user.
And 103, executing a target operation instruction corresponding to the target voice command data.
In implementation, different application programs correspond to different voice command data, the different voice command data also correspond to different operation instructions, when a target operation instruction corresponding to the target voice command data is determined, the target operation instruction can be sent to the corresponding target application program, and the operation corresponding to the target operation instruction is completed through the target application program.
In this embodiment, the user's voice data is compared with the voice command data of the applications currently in the running state to determine the matched voice command data. This reduces the amount of voice command data compared against the user's voice, effectively reducing the recognition error rate and improving the user's experience of the voice interaction function.
Fig. 2 is a flowchart of a speech recognition method according to an embodiment of the present application. Referring to fig. 2, the embodiment includes:
step 201, acquiring user voice data.
In an implementation, the voice recognition application may run continuously in the terminal, using the terminal's microphone to capture the user's voice audio and converting it into voice data in the same data form as the voice command data. For example, if the voice command data is the audio feature information of the corresponding command word, the user's audio is converted into corresponding audio feature information. The voice recognition application is provided with a storage unit for storing voice command data and the operation instructions corresponding to the voice command data, where the operation instructions corresponding to different voice command data control different application programs.
The voice command data stored in the voice recognition program and the operation instruction corresponding to the voice command data can be sent to the voice recognition program by the corresponding application program. The corresponding processing may be as follows: and when the target application program is started, receiving the voice command data and the corresponding operation instruction of the target application program, which are sent by the target application program, and loading the voice command data and the corresponding operation instruction of the target application program in the voice recognition application program.
In implementation, each application may store in advance its own voice command data and the operation instructions corresponding to that voice command data. As shown in fig. 6, when the target application is started, it may call a registration interface preset in the speech recognition application to register its voice command data and corresponding operation instructions into a command pool in the speech recognition application, that is, to send them to the speech recognition application. The speech recognition application can then load the target application's voice command data and corresponding operation instructions from the command pool and update the storage unit that stores voice command data and corresponding operation instructions.
Correspondingly, when the target application program is closed, the voice command data and the corresponding operation instruction of the target application program can be deleted in the voice recognition application program.
In implementation, after the target application receives a closing instruction and before it closes, it may call a preset deregistration (anti-registration) interface in the speech recognition application, through which the target application's voice command data and corresponding operation instructions are deleted from the storage unit.
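The register-on-start/deregister-on-close lifecycle described above can be sketched as a command pool with two interfaces. The class and method names are hypothetical; the patent only describes the behavior, not an API.

```python
# Hypothetical sketch of the command pool with registration and deregistration
# interfaces: a target application registers its command data when it starts
# and removes it again before it closes.
class CommandPool:
    def __init__(self):
        self._commands = {}                       # app id -> {command: instruction}

    def register(self, app_id, commands):
        """Registration interface: called by a target application at start-up."""
        self._commands[app_id] = dict(commands)

    def unregister(self, app_id):
        """Deregistration interface: called before the target application closes."""
        self._commands.pop(app_id, None)

    def loaded_commands(self):
        """All currently loaded command data, tagged with the owning application."""
        return {cmd: (app_id, instr)
                for app_id, cmds in self._commands.items()
                for cmd, instr in cmds.items()}
```

After `unregister`, the application's commands no longer participate in matching, which is what keeps the comparison set small.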
Step 202, in the currently loaded voice command data, determining target voice command data matched with the user voice data.
In implementation, after the user's audio information is converted into corresponding voice data, that voice data can be compared with the voice command data registered through the registration interface to determine the target voice command data matching the user's voice data. For example, the similarity between the user's voice data and each item of voice command data may be computed, and the voice command data with the highest similarity to the user's voice data determined as the target voice command data.
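The highest-similarity selection can be sketched as follows. The patent does not specify the similarity measure; `SequenceMatcher` on text is only a stand-in for whatever acoustic or feature similarity the real system computes, and the function name is hypothetical.

```python
# Hypothetical sketch: pick the registered command most similar to the
# user's voice data (text similarity stands in for acoustic similarity).
from difflib import SequenceMatcher


def match_command(user_voice_data, registered_commands):
    """Return the registered command with the highest similarity, or None."""
    def similarity(cmd):
        return SequenceMatcher(None, user_voice_data, cmd).ratio()
    return max(registered_commands, key=similarity, default=None)
```

A production system would likely also apply a minimum-similarity threshold so that unrelated speech is rejected rather than matched to the least-bad command.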
It should be noted that, besides the voice command data registered by other applications, the voice recognition application may include basic voice command data that does not need to be registered, for example, the voice command data for opening the various applications. The user's voice data can then be compared with both the basic voice command data and the registered voice command data to determine the target voice command data matching the user's voice data.
Step 203, sending the target operation instruction to the second application program to which the target voice command data belongs, so as to execute the target operation instruction through the second application program.
The second application program is the application program to which the target voice command data belongs, that is, the application program that registered the target voice command data with the speech recognition application by calling its registration interface.
In an implementation, after determining the target voice command data matching the user voice data, the second application corresponding to the target voice command data may be determined, for example, an identifier of the corresponding application may be added to the voice command data, and when determining the voice command data, the corresponding application of the voice command data may be determined according to the corresponding identifier of the voice command data. After the second application program corresponding to the target voice command data is determined, the target operation instruction corresponding to the target voice command data can be sent to the second application program, the second application program receives the target operation instruction, and the corresponding function is completed according to the target operation instruction.
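The identifier-based dispatch described above can be sketched as follows, assuming (hypothetically) that each command entry carries the identifier of its owning application and that applications expose a handler for operation instructions.

```python
# Hypothetical sketch of the dispatch step: each command carries the identifier
# of the application it belongs to, so the matched operation instruction is
# routed to that ("second") application, which executes it itself.
def dispatch(target_command, command_table, app_registry):
    """command_table: command -> (app_id, instruction);
    app_registry: app_id -> handler callable."""
    app_id, instruction = command_table[target_command]
    handler = app_registry[app_id]          # look up the second application
    return handler(instruction)             # the application executes the instruction
```

The design choice here mirrors the patent's division of labor: the speech recognition application only matches and routes; the owning application performs the actual function.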
In this embodiment, the user's voice data is compared with the voice command data of the applications currently in the running state to determine the matched voice command data. This reduces the amount of voice command data compared against the user's voice, effectively reducing the recognition error rate and improving the user's experience of the voice interaction function.
Fig. 3 is a flowchart of a speech recognition method according to an embodiment of the present application. Referring to fig. 3, the embodiment includes:
step 301, obtaining user voice data.
In an implementation, the voice recognition application may run continuously in the terminal, acquiring the user's audio through the terminal's microphone and converting it into voice data in the same data form as the voice command data; for example, if the voice command data is the audio feature information of the corresponding command word, the user's audio is converted into corresponding audio feature information. The voice recognition application may be provided with a storage unit for storing voice command data and the corresponding operation instructions, where the operation instructions corresponding to different voice command data control different applications. In addition, each item of voice command data may carry a corresponding application program identifier, used to identify the application program controlled by the control instruction corresponding to that voice command data.
The voice command data stored by the voice recognition application may carry status information, which can be changed according to the open or closed state of the corresponding application program in the terminal. The corresponding processing may be as follows: when the target application is started, the state of the voice command data corresponding to the target application is changed to available.
In implementation, the voice recognition application may be provided with a monitoring program that monitors the opening and closing of each application in the terminal. As shown in fig. 7, when the monitor detects that the target application has started, it may acquire the application identifier corresponding to the target application, and the voice recognition application may, according to that identifier and the start information, change the voice command data corresponding to the target application's identifier to the available state.
Correspondingly, when the target application is closed, the state of the voice command data corresponding to the target application may be changed to unavailable.
In implementation, when the monitor detects that the target application is closed, it may acquire the application identifier corresponding to the target application, and the voice recognition application may, according to that identifier and the closing information, change the voice command data corresponding to the target application's identifier to the unavailable state.
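This second approach, flipping command-data state rather than loading and deleting it, can be sketched as a monitor with start/close callbacks. All names are hypothetical; the patent describes only the monitored state transitions.

```python
# Hypothetical sketch of the monitoring approach: a monitor flips each
# application's command data between "available" and "unavailable" as the
# application starts and closes, instead of loading/deleting the data.
class CommandStateMonitor:
    def __init__(self, commands_by_app):
        # app id -> {command: available?}; every command starts unavailable
        self.state = {app: {cmd: False for cmd in cmds}
                      for app, cmds in commands_by_app.items()}

    def on_app_started(self, app_id):
        for cmd in self.state.get(app_id, {}):
            self.state[app_id][cmd] = True      # mark available

    def on_app_closed(self, app_id):
        for cmd in self.state.get(app_id, {}):
            self.state[app_id][cmd] = False     # mark unavailable

    def available_commands(self):
        """Only these commands participate in matching (step 302)."""
        return [cmd for cmds in self.state.values()
                for cmd, ok in cmds.items() if ok]
```

Compared with the register/deregister scheme of the previous embodiment, this keeps all command data resident and pays only the cost of a state flag per command.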
Step 302, in the voice command data whose current state is available, target voice command data matched with the user voice data is determined.
In implementation, after the user's audio information is converted into corresponding voice data, that voice data may be compared with the voice command data whose status information is available, and the voice command data with the highest similarity determined as the target voice command data matching the user's voice data.
It should be noted that voice command data whose status information is always available may be set in the voice recognition application, for example, the voice command data for opening the various applications.
Step 303, sending the target operation instruction to the second application program to which the target voice command data belongs, so as to execute the target operation instruction through the second application program.
And the application program identifier corresponding to the target voice command is the application program identifier of the second application program.
In an implementation, after the target voice command data matching the user voice data is determined, the second application may be determined according to the application identification corresponding to the target voice command data. After the second application program is determined, the target operation instruction corresponding to the target voice command data can be sent to the second application program, and the second application program receives the target operation instruction and completes the corresponding function according to the target operation instruction.
In this embodiment, the user's voice data is compared with the voice command data of the applications currently in the running state to determine the matched voice command data. This reduces the amount of voice command data compared against the user's voice, effectively reducing the recognition error rate and improving the user's experience of the voice interaction function.
All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
Fig. 4 is a schematic structural diagram of an apparatus for performing speech recognition according to an embodiment of the present application, where the apparatus may be a terminal in the foregoing embodiment, and the apparatus includes:
an obtaining module 410 configured to obtain user voice data;
a determining module 420 configured to determine target voice command data matching the user voice data among voice command data of the application program currently in a running state;
and the execution module 430 is configured to execute a target operation instruction corresponding to the target voice command data.
Optionally, the apparatus further includes a loading module configured to:
when a target application program is started, receiving voice command data and a corresponding operation instruction of the target application program, which are sent by the target application program, and loading the voice command data and the corresponding operation instruction of the target application program in the voice recognition application program;
the determining module 420 configured to:
and determining target voice command data matched with the user voice data in the currently loaded voice command data.
Optionally, the apparatus further includes a first deletion module configured to:
and when the target application program is closed, deleting the voice command data and the corresponding operation instruction of the target application program in the voice recognition application program.
Optionally, the apparatus further comprises a modification module configured to:
when a target application program is started, changing the state of the voice command data corresponding to the target application program to available;
the determining module 420 configured to:
and determining target voice command data matched with the user voice data in the voice command data with the current state being available.
Optionally, the apparatus further includes a second deletion module configured to:
changing a state of voice command data corresponding to the target application to unavailable when the target application is closed.
Optionally, the executing module 430 is configured to:
and sending the target operation instruction to a second application program to which the target voice command data belongs so as to execute the target operation instruction through the second application program.
It should be noted that: in the speech recognition apparatus provided in the above embodiment, only the division of the functional modules is illustrated when performing speech recognition, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above. In addition, the speech recognition apparatus provided in the above embodiments and the speech recognition method embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments and are not described herein again.
Fig. 5 shows a block diagram of a terminal 500 according to an exemplary embodiment of the present application. The terminal 500 may be a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. Terminal 500 may also be referred to by other names, such as user equipment, portable terminal, laptop terminal, or desktop terminal.
In general, the terminal 500 includes: a processor 501 and a memory 502.
The processor 501 may include one or more processing cores, for example a 4-core or 8-core processor. The processor 501 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array). The processor 501 may also include a main processor and a coprocessor: the main processor, also called the CPU (Central Processing Unit), processes data in the awake state, while the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 501 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 501 may also include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.
Memory 502 may include one or more computer-readable storage media, which may be non-transitory. Memory 502 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 502 is used to store at least one instruction for execution by processor 501 to implement the speech recognition methods provided by method embodiments herein.
In some embodiments, the terminal 500 may further optionally include: a peripheral interface 503 and at least one peripheral. The processor 501, memory 502 and peripheral interface 503 may be connected by a bus or signal lines. Each peripheral may be connected to the peripheral interface 503 by a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 504, touch screen display 505, camera 506, audio circuitry 507, positioning components 508, and power supply 509.
The peripheral interface 503 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 501 and the memory 502. In some embodiments, the processor 501, memory 502, and peripheral interface 503 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 501, the memory 502, and the peripheral interface 503 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 504 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 504 communicates with communication networks and other communication devices via electromagnetic signals, converting an electrical signal into an electromagnetic signal for transmission, or converting a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 504 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on. The radio frequency circuit 504 may communicate with other terminals via at least one wireless communication protocol, including but not limited to: metropolitan area networks, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or Wi-Fi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 504 may further include NFC (Near Field Communication) related circuits, which is not limited in this application.
The display screen 505 is used to display a UI (User Interface), which may include graphics, text, icons, video, and any combination thereof. When the display screen 505 is a touch display screen, it can also capture touch signals on or over its surface; such a touch signal may be input to the processor 501 as a control signal for processing. At this point, the display screen 505 may also provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 505, disposed on the front panel of the terminal 500; in other embodiments, there may be at least two display screens 505, disposed on different surfaces of the terminal 500 or in a folded design; in still other embodiments, the display screen 505 may be a flexible display disposed on a curved or folded surface of the terminal 500. The display screen 505 may even be arranged in a non-rectangular irregular shape, i.e., an irregularly shaped screen. The display screen 505 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The camera assembly 506 is used to capture images or video. Optionally, the camera assembly 506 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fusion shooting functions. In some embodiments, the camera assembly 506 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash; a dual-color-temperature flash combines a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
The audio circuit 507 may include a microphone and a speaker. The microphone collects sound waves from the user and the environment, converts them into electrical signals, and inputs them to the processor 501 for processing, or to the radio frequency circuit 504 to realize voice communication. For stereo sound collection or noise reduction, a plurality of microphones may be provided at different portions of the terminal 500; the microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker converts electrical signals from the processor 501 or the radio frequency circuit 504 into sound waves. The speaker may be a traditional thin-film speaker or a piezoelectric ceramic speaker; a piezoelectric ceramic speaker can convert an electrical signal into sound waves audible to humans, or into sound waves inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuit 507 may also include a headphone jack.
The positioning component 508 is used to locate the current geographic position of the terminal 500 for navigation or LBS (Location Based Service). The positioning component 508 may be based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 509 is used to power the various components in the terminal 500. The power supply 509 may be an alternating current source, a direct current source, a disposable battery, or a rechargeable battery. When the power supply 509 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging, and may also support fast-charge technology.
In some embodiments, terminal 500 also includes one or more sensors 510. The one or more sensors 510 include, but are not limited to: acceleration sensor 511, gyro sensor 512, pressure sensor 513, fingerprint sensor 514, optical sensor 515, and proximity sensor 516.
The acceleration sensor 511 may detect the magnitude of acceleration on three coordinate axes of the coordinate system established with the terminal 500. For example, the acceleration sensor 511 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 501 may control the touch screen 505 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 511. The acceleration sensor 511 may also be used for acquisition of motion data of a game or a user.
The gyro sensor 512 may detect a body direction and a rotation angle of the terminal 500, and the gyro sensor 512 may cooperate with the acceleration sensor 511 to acquire a 3D motion of the user on the terminal 500. The processor 501 may implement the following functions according to the data collected by the gyro sensor 512: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
The pressure sensor 513 may be disposed on a side bezel of the terminal 500 and/or an underlying layer of the touch display screen 505. When the pressure sensor 513 is disposed on the side frame of the terminal 500, a user's holding signal of the terminal 500 may be detected, and the processor 501 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 513. When the pressure sensor 513 is disposed at the lower layer of the touch display screen 505, the processor 501 controls the operability control on the UI interface according to the pressure operation of the user on the touch display screen 505. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 514 is used to collect the user's fingerprint; the processor 501 identifies the user according to the fingerprint collected by the fingerprint sensor 514, or the fingerprint sensor 514 itself identifies the user from the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 501 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, paying, changing settings, and the like. The fingerprint sensor 514 may be provided on the front, back, or side of the terminal 500; when a physical button or a manufacturer's logo is provided on the terminal 500, the fingerprint sensor 514 may be integrated with the physical button or logo.
The optical sensor 515 is used to collect the ambient light intensity. In one embodiment, the processor 501 may control the display brightness of the touch display screen 505 based on the ambient light intensity collected by the optical sensor 515. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 505 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 505 is decreased. In another embodiment, the processor 501 may also dynamically adjust the shooting parameters of the camera assembly 506 based on the ambient light intensity collected by the optical sensor 515.
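The brightness-control behavior described above can be sketched as a simple clamped mapping from ambient light intensity to a display brightness level. The function name, the 1000-lux full-brightness reference, and the clamping range are illustrative assumptions, not values taken from the patent.

```python
def adjust_display_brightness(ambient_lux, min_brightness=0.1, max_brightness=1.0):
    """Map ambient light intensity (lux) to a display brightness level.

    Brighter surroundings raise the brightness and dimmer surroundings
    lower it, clamped to [min_brightness, max_brightness]. 1000 lux is
    an assumed reference point for full brightness.
    """
    level = ambient_lux / 1000.0
    return max(min_brightness, min(max_brightness, level))


print(adjust_display_brightness(0))     # dark room: floor brightness
print(adjust_display_brightness(500))   # indoor light: mid brightness
print(adjust_display_brightness(5000))  # sunlight: full brightness
```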
The proximity sensor 516, also referred to as a distance sensor, is typically disposed on the front panel of the terminal 500 and is used to collect the distance between the user and the front surface of the terminal 500. In one embodiment, when the proximity sensor 516 detects that this distance gradually decreases, the processor 501 controls the touch display screen 505 to switch from the screen-on state to the screen-off state; when the proximity sensor 516 detects that the distance gradually increases, the processor 501 controls the touch display screen 505 to switch from the screen-off state to the screen-on state.
Those skilled in the art will appreciate that the configuration shown in fig. 5 is not intended to be limiting of terminal 500 and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
In an exemplary embodiment, a computer-readable storage medium, such as a memory, is also provided that includes instructions executable by a processor in a terminal to perform the method of speech recognition in the above-described embodiments. The computer readable storage medium may be non-transitory. For example, the computer-readable storage medium may be a ROM (Read-Only Memory), a RAM (Random Access Memory), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (14)

1. A method for performing speech recognition, the method being applied to a speech recognition application, the method comprising:
acquiring user voice data;
determining, in the voice command data of an application program currently in the running state, target voice command data that matches the user voice data;
and executing a target operation instruction corresponding to the target voice command data.
2. The method of claim 1, further comprising:
when a target application program is started, receiving the voice command data of the target application program and the corresponding operation instruction sent by the target application program, and loading the voice command data of the target application program and the corresponding operation instruction in the voice recognition application program;
the determining, in the voice command data of the application program currently in the running state, target voice command data that matches the user voice data includes:
determining target voice command data that matches the user voice data in the currently loaded voice command data.
3. The method of claim 2, further comprising:
when the target application program is closed, deleting the voice command data and the corresponding operation instruction of the target application program from the voice recognition application program.
4. The method of claim 1, further comprising:
when a target application program is started, changing the state of the voice command data corresponding to the target application program to available;
the determining, in the voice command data of the application program currently in the running state, target voice command data that matches the user voice data includes:
determining target voice command data that matches the user voice data in the voice command data whose current state is available.
5. The method of claim 4, further comprising:
when the target application program is closed, changing the state of the voice command data corresponding to the target application program to unavailable.
6. The method according to any one of claims 1-5, wherein executing the target operation instruction corresponding to the target voice command data comprises:
sending the target operation instruction to a second application program to which the target voice command data belongs, so as to execute the target operation instruction through the second application program.
7. An apparatus for performing speech recognition, for use in a speech recognition application, the apparatus comprising:
an acquisition module configured to acquire user voice data;
the determining module is configured to determine, in the voice command data of an application program currently in the running state, target voice command data that matches the user voice data;
and the execution module is configured to execute the target operation instruction corresponding to the target voice command data.
8. The apparatus of claim 7, further comprising a loading module configured to:
when a target application program is started, receive the voice command data of the target application program and the corresponding operation instruction sent by the target application program, and load the voice command data of the target application program and the corresponding operation instruction in the voice recognition application program;
the determining, in the voice command data of the application program currently in the running state, target voice command data that matches the user voice data includes:
determining target voice command data that matches the user voice data in the currently loaded voice command data.
9. The apparatus of claim 8, further comprising a first deletion module configured to:
when the target application program is closed, delete the voice command data and the corresponding operation instruction of the target application program from the voice recognition application program.
10. The apparatus of claim 7, further comprising an alteration module configured to:
when a target application program is started, change the state of the voice command data corresponding to the target application program to available;
the determining, in the voice command data of the application program currently in the running state, target voice command data that matches the user voice data includes:
determining target voice command data that matches the user voice data in the voice command data whose current state is available.
11. The apparatus of claim 10, further comprising a second deletion module configured to:
change the state of the voice command data corresponding to the target application program to unavailable when the target application program is closed.
12. The apparatus according to any one of claims 7-10, wherein the execution module is configured to:
send the target operation instruction to a second application program to which the target voice command data belongs, so as to execute the target operation instruction through the second application program.
13. A computer device comprising a processor and a memory, the memory having stored therein at least one instruction that is loaded and executed by the processor to perform operations performed by the speech recognition method of any of claims 1 to 6.
14. A computer-readable storage medium having stored therein at least one instruction which is loaded and executed by a processor to perform operations performed by the speech recognition method of any one of claims 1 to 6.
CN201911358974.6A 2019-12-25 2019-12-25 Method, device, equipment and storage medium for voice recognition Pending CN110992954A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911358974.6A CN110992954A (en) 2019-12-25 2019-12-25 Method, device, equipment and storage medium for voice recognition


Publications (1)

Publication Number Publication Date
CN110992954A true CN110992954A (en) 2020-04-10

Family

ID=70075584

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911358974.6A Pending CN110992954A (en) 2019-12-25 2019-12-25 Method, device, equipment and storage medium for voice recognition

Country Status (1)

Country Link
CN (1) CN110992954A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113282355A (en) * 2021-05-18 2021-08-20 Oppo广东移动通信有限公司 Instruction execution method and device based on state machine, terminal and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103885783A (en) * 2014-04-03 2014-06-25 深圳市三脚蛙科技有限公司 Voice control method and device of application program
CN104916287A (en) * 2015-06-10 2015-09-16 青岛海信移动通信技术股份有限公司 Voice control method and device and mobile device
CN108305626A (en) * 2018-01-31 2018-07-20 百度在线网络技术(北京)有限公司 The sound control method and device of application program
CN109767762A (en) * 2018-12-14 2019-05-17 深圳壹账通智能科技有限公司 Application control method and terminal device based on speech recognition
CN110148405A (en) * 2019-04-10 2019-08-20 北京梧桐车联科技有限责任公司 Phonetic order processing method and processing device, electronic equipment and storage medium


Similar Documents

Publication Publication Date Title
CN110602321B (en) Application program switching method and device, electronic device and storage medium
CN110308956B (en) Application interface display method and device and mobile terminal
CN112907725B (en) Image generation, training of image processing model and image processing method and device
CN109068008B (en) Ringtone setting method, device, terminal and storage medium
CN110288689B (en) Method and device for rendering electronic map
CN110797042B (en) Audio processing method, device and storage medium
CN109102811B (en) Audio fingerprint generation method and device and storage medium
CN111613213B (en) Audio classification method, device, equipment and storage medium
CN110705614A (en) Model training method and device, electronic equipment and storage medium
CN110677713B (en) Video image processing method and device and storage medium
CN109783176B (en) Page switching method and device
CN107943484B (en) Method and device for executing business function
CN111611414A (en) Vehicle retrieval method, device and storage medium
CN110992954A (en) Method, device, equipment and storage medium for voice recognition
CN114595019A (en) Theme setting method, device and equipment of application program and storage medium
CN114594885A (en) Application icon management method, device and equipment and computer readable storage medium
CN111063372B (en) Method, device and equipment for determining pitch characteristics and storage medium
CN113843814A (en) Control system, method, device and storage medium for mechanical arm equipment
CN113408989A (en) Automobile data comparison method and device and computer storage medium
CN108519913B (en) Application program running state management method and device, storage medium and terminal
CN109275015B (en) Method, device and storage medium for displaying virtual article
CN108881715B (en) Starting method and device of shooting mode, terminal and storage medium
CN108347672B (en) Method, device and storage medium for playing audio
CN111708581A (en) Application starting method, device, equipment and computer storage medium
CN110941458A (en) Method, device and equipment for starting application program and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20200410