CN112735424A - Speech recognition method and device, storage medium and electronic device - Google Patents

Speech recognition method and device, storage medium and electronic device

Publication number
CN112735424A
Authority
CN
China
Prior art keywords
voice recognition
voice
target page
configuration parameters
module
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011541666.XA
Other languages
Chinese (zh)
Other versions
CN112735424B (en
Inventor
刘兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Haier Technology Co Ltd
Haier Smart Home Co Ltd
Original Assignee
Qingdao Haier Technology Co Ltd
Haier Smart Home Co Ltd
Application filed by Qingdao Haier Technology Co Ltd and Haier Smart Home Co Ltd
Priority to CN202011541666.XA
Publication of CN112735424A
Application granted
Publication of CN112735424B
Legal status: Active

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/26: Speech to text systems

Abstract

The invention discloses a voice recognition method and device, a storage medium and an electronic device. The method comprises: acquiring configuration parameters of a voice recognition module on a target page, wherein the configuration parameters are used for determining the recognition requirements of the voice recognition module; creating a voice recognition object according to the acquired configuration parameters and instructing the target page to subscribe to the voice recognition object, wherein the voice recognition object is used for recognizing the voice collected on the target page; and, when the target page collects voice to be recognized, performing voice recognition on that voice according to the voice recognition object. This solves problems in the related art such as the front end being unable to perform voice recognition, makes it much easier for the front-end H5 page to implement voice recognition, and improves the efficiency of interaction between the front-end page and the user.

Description

Speech recognition method and device, storage medium and electronic device
Technical Field
The present invention relates to the field of speech recognition, and in particular, to a speech recognition method and apparatus, a storage medium, and an electronic apparatus.
Background
Automatic Speech Recognition (ASR) is the most fundamental AI technique in speech interaction; it is the process of converting speech into text. Common examples include Siri and smart speakers.
Most device-control detail pages in current smart Apps are built with front-end H5 technology, and these front-end pages have functional requirements for speech recognition. For example, when the refrigerator H5 detail page of a smart App uses voice to recognize food materials, the user must provide voice input, and the App recognizes the speech and converts it into text for the front-end H5 page to use. In this process, the front end cannot itself invoke system recording or perform the speech recognition and conversion; it depends on support from native capabilities and the SDK.
No effective solution has yet been proposed for problems in the related art such as the front end being unable to perform voice recognition.
Disclosure of Invention
The embodiment of the invention provides a voice recognition method and device, a storage medium and an electronic device, which are used for at least solving the problems that the front end cannot perform voice recognition and the like in the related technology.
According to an aspect of an embodiment of the present invention, there is provided a speech recognition method, including: acquiring configuration parameters of a voice recognition module on a target page, wherein the configuration parameters are used for determining the recognition requirements of the voice recognition module; creating a voice recognition object according to the acquired configuration parameters, and indicating the target page to subscribe the voice recognition object, wherein the voice recognition object is used for recognizing the voice collected on the target page; and under the condition that the target page acquires the voice to be recognized, performing voice recognition on the voice to be recognized according to the voice recognition object.
In an exemplary embodiment, after obtaining the configuration instruction of the speech recognition module on the target page to determine the configuration parameters of the speech recognition module, the method further includes: initializing a software tool kit corresponding to the voice recognition module; and after the software toolkit is initialized, selecting a target voice recognition model from the software toolkit according to the configuration parameters so that the voice recognition object can perform voice recognition according to the voice recognition model.
In one exemplary embodiment, creating a speech recognition object according to the acquired configuration parameters includes: determining a voice recognition mode of the voice recognition module according to the configuration parameters, wherein the voice recognition mode is used for indicating that target type voice recognition is carried out on voice to be recognized, and the configuration parameters and the voice recognition mode are stored in a preset position in a one-to-one correspondence manner; and creating a voice recognition object corresponding to the voice recognition mode.
In an exemplary embodiment, when the target page collects a speech to be recognized, performing speech recognition on the speech to be recognized according to the speech recognition object, including: obtaining a subscription result of the target page for a voice recognition object; and under the condition that the subscription result indicates that the target page has successfully subscribed the voice recognition object, calling the voice recognition object to recognize the voice to be recognized collected on the target page.
In an exemplary embodiment, obtaining configuration parameters of a speech recognition module on a target page includes: acquiring a configuration instruction triggered by a target object on the target page, and analyzing the configuration instruction through a target container capacity interface library to obtain an analysis result; and determining the configuration parameters corresponding to the configuration instructions according to the analysis result.
In an exemplary embodiment, performing speech recognition on the speech to be recognized according to the speech recognition object includes: and under the condition that the voice silence time of the voice to be recognized which is currently collected is detected to exceed a preset threshold value, the voice collection is suspended, and the voice to be recognized which is currently collected is subjected to voice recognition according to the voice recognition object.
According to another aspect of the embodiments of the present invention, there is also provided a speech recognition apparatus including: the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring configuration parameters of a voice recognition module on a target page, and the configuration parameters are used for determining the recognition requirements of the voice recognition module; the first processing module is used for creating a voice recognition object according to the acquired configuration parameters and indicating the target page to subscribe the voice recognition object, wherein the voice recognition object is used for recognizing the voice acquired on the target page; and the second processing module is used for carrying out voice recognition on the voice to be recognized according to the voice recognition object under the condition that the voice to be recognized is collected by the target page.
In one exemplary embodiment, further comprising: the initialization module is used for initializing the software toolkit corresponding to the voice recognition module; and the model selection module is used for selecting a target voice recognition model from the software toolkit according to the configuration parameters after the software toolkit is initialized so that the voice recognition object can perform voice recognition according to the voice recognition model.
According to a further aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to execute the above-mentioned speech recognition method when running.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the voice recognition method through the computer program.
In the embodiment of the invention, the configuration parameters of a voice recognition module on a target page are obtained, wherein the configuration parameters are used for determining the recognition requirements of the voice recognition module; a voice recognition object is created according to the acquired configuration parameters, and the target page is instructed to subscribe to the voice recognition object, wherein the voice recognition object is used for recognizing the voice collected on the target page; and, when the target page collects voice to be recognized, voice recognition is performed on that voice according to the voice recognition object. This solves problems in the related art such as the front end being unable to perform voice recognition, makes it much easier for the front-end H5 page to implement voice recognition, and improves the efficiency of interaction between the front-end page and the user.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a block diagram of the hardware structure of a computer terminal for a speech recognition method according to an embodiment of the present invention;
FIG. 2 is a flow diagram of a speech recognition method according to an embodiment of the present invention;
FIG. 3 is a schematic illustration of a speech recognition process according to an alternative embodiment of the present invention;
FIG. 4 is a block diagram of the structure of a voice recognition apparatus according to an embodiment of the present invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings in conjunction with the embodiments.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
The method provided by the embodiment of the application can be executed in a computer terminal or a similar operation device. Taking the example of being operated on a computer terminal, fig. 1 is a hardware structure block diagram of a computer terminal of a speech recognition method according to an embodiment of the present invention. As shown in fig. 1, the computer terminal may include one or more (only one shown in fig. 1) processors 102 (the processors 102 may include, but are not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA) and a memory 104 for storing data, and in an exemplary embodiment, may also include a transmission device 106 for communication functions and an input-output device 108. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the computer terminal. For example, the computer terminal may also include more or fewer components than shown in FIG. 1, or have a different configuration with equivalent functionality to that shown in FIG. 1 or with more functionality than that shown in FIG. 1.
The memory 104 may be used to store computer programs, for example, software programs and modules of application software, such as a computer program corresponding to the speech recognition method in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the computer programs stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to a computer terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal. In one example, the transmission device 106 includes a Network Interface Controller (NIC), which can be connected to other network devices through a base station so as to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
In the embodiment, a speech recognition method is provided, and fig. 2 is a flowchart of a speech recognition method according to an embodiment of the present invention, where the flowchart includes the following steps:
step S202, obtaining configuration parameters of a voice recognition module on a target page, wherein the configuration parameters are used for determining the recognition requirements of the voice recognition module;
step S204, a voice recognition object is created according to the acquired configuration parameters, and the target page is instructed to subscribe the voice recognition object, wherein the voice recognition object is used for recognizing the voice collected on the target page;
and step S206, under the condition that the target page collects the voice to be recognized, performing voice recognition on the voice to be recognized according to the voice recognition object.
Through the above steps, the configuration parameters of the voice recognition module on the target page are obtained, wherein the configuration parameters are used for determining the recognition requirements of the voice recognition module; a voice recognition object is created according to the acquired configuration parameters, and the target page is instructed to subscribe to the voice recognition object, wherein the voice recognition object is used for recognizing the voice collected on the target page; and, when the target page collects voice to be recognized, voice recognition is performed on that voice according to the voice recognition object. This solves problems in the related art such as the front end being unable to perform voice recognition, makes it much easier for the front-end H5 page to implement voice recognition, and improves the efficiency of interaction between the front-end page and the user.
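The three steps S202 to S206 can be sketched as follows. This is only a minimal illustration, not the patented implementation: the function names, the `speechModuleConfig` field, and the stand-in recognizer (which merely echoes a text hint instead of decoding audio) are all assumptions introduced here.

```javascript
// Hypothetical sketch of steps S202-S206; every name here is illustrative.

// S202: read the configuration parameters of the page's speech recognition module.
function getConfigParams(page) {
  return { ...page.speechModuleConfig };
}

// S204: create a recognition object from the parameters; the page can subscribe to it.
function createRecognitionObject(params) {
  const subscribers = [];
  return {
    params,
    subscribe(listener) {
      subscribers.push(listener);
      return true; // subscription succeeded
    },
    // Stand-in recognizer: a real one would decode audio; this one echoes a hint.
    recognize(speech) {
      const text = `[${params.language}] ${speech.transcriptHint}`;
      subscribers.forEach((listener) => listener(text));
      return text;
    },
  };
}

// S206: once the page has collected speech, recognize it with the object.
function runFlow(page, speech) {
  const params = getConfigParams(page);
  const recognizer = createRecognitionObject(params);
  recognizer.subscribe((text) => page.results.push(text));
  return recognizer.recognize(speech);
}

const page = { speechModuleConfig: { language: "zh-CN" }, results: [] };
const text = runFlow(page, { transcriptHint: "milk, eggs" });
console.log(text, page.results);
```

The page receives the result through its subscription rather than by polling, which mirrors the subscribe-then-recognize ordering of the steps.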
In an optional embodiment, after obtaining the configuration instruction of the speech recognition module on the target page to determine the configuration parameters of the speech recognition module, the method further includes: initializing a software tool kit corresponding to the voice recognition module; and after the software toolkit is initialized, selecting a target voice recognition model from the software toolkit according to the configuration parameters so that the voice recognition object can perform voice recognition according to the voice recognition model.
That is, after the configuration parameters of the speech recognition module are determined, the corresponding software toolkit needs to be initialized, then the speech recognition model is selected from the toolkit according to the configuration parameters, and speech recognition is performed according to the target speech recognition model.
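The ordering constraint described here (initialize the toolkit first, then select a model from it) might look like the sketch below. The `SpeechToolkit` class, the model names, and the `scene` parameter are hypothetical and are not taken from the source.

```javascript
// Hypothetical software toolkit; model names and the "scene" key are assumptions.
class SpeechToolkit {
  constructor() {
    this.initialized = false;
    this.models = new Map();
  }

  // Initialization registers the models the toolkit can offer.
  init() {
    this.models.set("food", { name: "food-material-model" });
    this.models.set("search", { name: "voice-search-model" });
    this.initialized = true;
    return { success: true };
  }

  // Model selection is only valid after initialization has completed.
  selectModel(params) {
    if (!this.initialized) {
      throw new Error("toolkit must be initialized before selecting a model");
    }
    const model = this.models.get(params.scene);
    if (!model) {
      throw new Error(`no model registered for scene "${params.scene}"`);
    }
    return model;
  }
}

const toolkit = new SpeechToolkit();
toolkit.init();
const model = toolkit.selectModel({ scene: "food" });
console.log(model.name);
```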
In one exemplary embodiment, creating a speech recognition object according to the acquired configuration parameters includes: determining a voice recognition mode of the voice recognition module according to the configuration parameters, wherein the voice recognition mode is used for indicating that target type voice recognition is carried out on voice to be recognized, and the configuration parameters and the voice recognition mode are stored in a preset position in a one-to-one correspondence manner; and creating a voice recognition object corresponding to the voice recognition mode.
That is, the configuration parameters are acquired to create the speech recognition object, and a speech recognition mode for instructing target type speech recognition on the speech to be recognized needs to be determined according to the configuration parameters.
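One way to picture the one-to-one correspondence between configuration parameters and recognition modes is a lookup table consulted when the recognition object is created. The table contents, the `type` key, and the mode names below are assumptions made for illustration only.

```javascript
// Hypothetical preset table: each configuration value maps to exactly one mode.
const MODE_BY_CONFIG = new Map([
  ["food", "FOOD_MATERIAL_RECOGNITION"],
  ["search", "VOICE_SEARCH"],
]);

// Look up the recognition mode that corresponds to the configuration parameters.
function resolveMode(params) {
  const mode = MODE_BY_CONFIG.get(params.type);
  if (mode === undefined) {
    throw new Error(`unknown configuration type: ${params.type}`);
  }
  return mode;
}

// Create a recognition object bound to the resolved mode.
function createRecognitionObject(params) {
  const mode = resolveMode(params);
  return { mode, describe: () => `recognizer in ${mode} mode` };
}

const recognizer = createRecognitionObject({ type: "search" });
console.log(recognizer.describe());
```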
In an exemplary embodiment, when the target page collects a speech to be recognized, performing speech recognition on the speech to be recognized according to the speech recognition object, including: obtaining a subscription result of the target page for a voice recognition object; and under the condition that the subscription result indicates that the target page has successfully subscribed the voice recognition object, calling the voice recognition object to recognize the voice to be recognized collected on the target page.
That is, to perform voice recognition on the voice to be recognized, the subscription result of the target page for the voice recognition object is obtained first; once the subscription result confirms that the target page has successfully subscribed to the voice recognition object, the voice recognition object is called to recognize the voice collected on the target page.
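The subscription check can be read as a simple guard in front of the recognition call. In the sketch below, the `success` field of the subscription result and the stub recognizer are assumptions; the source does not specify the shape of the result.

```javascript
// Only call the recognition object when the subscription result reports success.
// The { success: boolean } shape is a hypothetical stand-in for the real result.
function recognizeIfSubscribed(subscriptionResult, recognizer, speech) {
  if (!subscriptionResult || subscriptionResult.success !== true) {
    return { recognized: false, text: null };
  }
  return { recognized: true, text: recognizer.recognize(speech) };
}

// Stub recognizer used only to make the sketch runnable.
const stubRecognizer = { recognize: (speech) => speech.label.toUpperCase() };

const ok = recognizeIfSubscribed({ success: true }, stubRecognizer, { label: "tomato" });
const skipped = recognizeIfSubscribed({ success: false }, stubRecognizer, { label: "tomato" });
console.log(ok, skipped);
```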
In an exemplary embodiment, obtaining configuration parameters of a speech recognition module on a target page includes: acquiring a configuration instruction triggered by a target object on the target page, and analyzing the configuration instruction through a target container capacity interface library to obtain an analysis result; and determining the configuration parameters corresponding to the configuration instructions according to the analysis result.
That is, a configuration instruction triggered by the target object on the target page needs to be acquired first, then the configuration instruction is analyzed, and the configuration parameters are determined according to the analysis result, so that the configuration parameters of the voice recognition module on the target page are acquired.
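As a rough illustration of the parse-then-derive sequence, the sketch below treats the configuration instruction as an object carrying a JSON payload. The instruction shape, the action name, and the default values are all assumptions; the source does not describe how the interface library represents an instruction.

```javascript
// Hypothetical instruction shape: { action, payload }, where payload is JSON text.
function parseConfigInstruction(instruction) {
  if (instruction.action !== "configureSpeechRecognition") {
    return { valid: false, reason: "unsupported action" };
  }
  try {
    const data = JSON.parse(instruction.payload);
    if (data === null || typeof data !== "object") {
      return { valid: false, reason: "payload must be a JSON object" };
    }
    return { valid: true, data };
  } catch (err) {
    return { valid: false, reason: "payload is not valid JSON" };
  }
}

// Derive the configuration parameters from the analysis result.
function toConfigParams(parseResult) {
  if (!parseResult.valid) {
    throw new Error(`cannot derive configuration: ${parseResult.reason}`);
  }
  // Default values are illustrative assumptions.
  const { language = "zh-CN", scene = "search" } = parseResult.data;
  return { language, scene };
}

const parsed = parseConfigInstruction({
  action: "configureSpeechRecognition",
  payload: '{"scene":"food"}',
});
const configParams = toConfigParams(parsed);
console.log(configParams);
```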
In an exemplary embodiment, performing speech recognition on the speech to be recognized according to the speech recognition object includes: and under the condition that the voice silence time of the voice to be recognized which is currently collected is detected to exceed a preset threshold value, the voice collection is suspended, and the voice to be recognized which is currently collected is subjected to voice recognition according to the voice recognition object.
That is, when recognizing the voice according to the voice recognition object, the silence duration of the voice currently being collected is monitored first; when the silence duration exceeds a preset threshold, collection is suspended and the voice collected so far is recognized.
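The silence check could be sketched as below, assuming the audio arrives as frames that each carry an energy level and a duration. The frame shape and the energy floor are assumptions; the 2000 ms default reflects the 2 s pause mentioned later in the description.

```javascript
// Collect frames until continuous silence exceeds the threshold, then suspend.
// Frame shape { energy, durationMs } and energyFloor are hypothetical.
function collectUntilSilence(frames, options = {}) {
  const { silenceThresholdMs = 2000, energyFloor = 0.01 } = options;
  const collected = [];
  let silentMs = 0;
  for (const frame of frames) {
    collected.push(frame);
    // Accumulate silence; any audible frame resets the counter.
    silentMs = frame.energy < energyFloor ? silentMs + frame.durationMs : 0;
    if (silentMs > silenceThresholdMs) {
      return { suspended: true, frames: collected };
    }
  }
  return { suspended: false, frames: collected };
}

const loud = { energy: 0.5, durationMs: 500 };
const quiet = { energy: 0.0, durationMs: 500 };
const input = [loud, loud, quiet, quiet, quiet, quiet, quiet, loud];
const capture = collectUntilSilence(input);
console.log(capture.suspended, capture.frames.length);
```

The frames returned when `suspended` is true are the ones that would then be handed to the recognition object.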
Fig. 3 is a schematic diagram of a speech recognition process according to an alternative embodiment of the present invention, as shown in fig. 3, including the following steps:
step S1, SDK initialization (SDKInit(config)). During App development, after the SDK has been downloaded and integrated, it must be initialized before its methods can be used. The front-end H5 page calls the initialization method mainly because the front end needs to pass in parameters; if H5 does not need to determine the parameters, initialization can be done natively (that is, directly in Android or iOS code).
Optionally, the parameters (Params) passed in during SDK initialization may be as shown in Table 1:
TABLE 1
[Table 1 image (BDA0002854845360000071): SDK initialization parameters]
Step S2, SDK initialization (SDKInit(config));
step S3, initialization (init(config));
step S4, returning an initialization result;
step S5, returning an initialization result;
step S6, returning an initialization result;
step S7, creating a speech recognition object (createAsrRecorder);
step S8, creating a speech recognition object (createAsrRecorder);
step S9, creating a speech recognition object (createAsrRecorder);
Optionally, this method must be called before voice recording starts in order to subscribe to voice recognition; otherwise, no voice recognition result can be obtained. Its parameters and the most important results are shown in Table 2:
TABLE 2
[Table 2 image (BDA0002854845360000081): method parameters and key results]
Step S10, returning the ASR object;
step S11, subscribing to the voice recognition result (attach(iAsrRecorderCallback));
step S12, returning the ASR subscription result;
step S13, returning the ASR subscription result;
step S14, returning the ASR subscription result;
step S15, subscribing to voice listening (attachAsrRecorder(listener));
step S16, subscribing to voice listening (attachAsrRecorder(listener));
step S17, returning the ASR subscription result;
step S18, returning the ASR subscription result;
Optionally, the voice listening subscription returns a ret Date description corresponding to the subscription result, as shown in Table 3:
TABLE 3
[Table 3 image (BDA0002854845360000091): description of the subscription result]
Step S19, starting voice listening (startAsrRecorder(config)); during recording, if a pause in speech exceeds 2 s, recording stops automatically and voice recognition is performed;
step S20, starting voice listening (startAsrRecorder(config));
step S21, start(config);
step S22, returning the ASR start result;
step S23, returning the ASR start result;
step S24, returning the ASR start result.
Specifically: the SDK is started (SDKInit) by the H5 page, which passes in the startup config parameter; a speech recognition object is created (createAsrRecorder); the speech recognition result is subscribed to (attachAsrRecorder); and voice recording is started (startAsrRecorder), with H5 passing in a config parameter. The voice recognition result is returned to the H5 page through the subscription.
Optionally, the calling method is as follows:
// Register the listeners before recording starts; results arrive through them.
uplusapi.upSpeechRecognitionModule.attachAsrRecorder({
  "AsrErrorListener": (errMessage) => console.log(errMessage),
  "AsrRsultListener": (resultMessage) => console.log(resultMessage),
  "AsrEventListener": (eventMessage) => console.log(eventMessage),
  "AsrVolumeListener": (volumeMessage) => console.log(volumeMessage)
}).then((result) => console.log('result', result));
In other words: (1) the H5 page initializes the SDK by calling the uplusapi SDKInit(config) interface; (2) the H5 page creates a speech recognition object by calling the uplusapi createAsrRecorder() interface; (3) the H5 page subscribes to speech recognition by calling the uplusapi attachAsrRecorder(listener) interface, and the speech recognition result is returned through the listener; (4) the H5 page starts recording by calling the uplusapi startAsrRecorder(config) interface, and the voice recognition result is returned in real time through the listener set in step 3.
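The four calls can be exercised end to end against a stand-in module, as in the sketch below. The method names and listener keys follow the text; the mock itself, the `{ success: true }` return values, the empty config objects, and the simulated result string are assumptions, since the real module is only available inside the App container.

```javascript
// Stand-in for the real module so the sequence can run outside the App container.
// Method names and listener keys follow the text; return shapes are assumptions.
function createMockSpeechModule() {
  let listeners = null;
  const calls = [];
  return {
    calls,
    SDKInit(config) {
      calls.push("SDKInit");
      return Promise.resolve({ success: true });
    },
    createAsrRecorder() {
      calls.push("createAsrRecorder");
      return Promise.resolve({ success: true });
    },
    attachAsrRecorder(listenerMap) {
      calls.push("attachAsrRecorder");
      listeners = listenerMap;
      return Promise.resolve({ success: true });
    },
    startAsrRecorder(config) {
      calls.push("startAsrRecorder");
      // Without a prior subscription there is nowhere to deliver results.
      if (listeners) listeners.AsrRsultListener("simulated recognition text");
      return Promise.resolve({ success: true });
    },
  };
}

// Run the four steps in order: initialize, create, subscribe, start recording.
async function runRecognition(speechModule) {
  const results = [];
  await speechModule.SDKInit({ /* hypothetical init parameters */ });
  await speechModule.createAsrRecorder();
  await speechModule.attachAsrRecorder({
    AsrErrorListener: (errMessage) => console.log(errMessage),
    // Key spelled as in the source listing.
    AsrRsultListener: (resultMessage) => results.push(resultMessage),
    AsrEventListener: (eventMessage) => console.log(eventMessage),
    AsrVolumeListener: (volumeMessage) => console.log(volumeMessage),
  });
  await speechModule.startAsrRecorder({ /* hypothetical recording parameters */ });
  return results;
}

const mockModule = createMockSpeechModule();
const done = runRecognition(mockModule).then((results) => {
  console.log(mockModule.calls.join(" -> "), results);
  return results;
});
```

Subscribing before starting the recorder matters in this sketch for the same reason the description gives: a result produced before any listener is attached has no way to reach the page.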
In an alternative implementation, the front end records voice input into a voice file and calls a background interface to upload the file; the background converts the voice file into text and returns the text to the front-end page for processing. This approach has the following disadvantages: 1. it is not general enough and can only be used within each individual page; 2. typically only one voice SDK is called; 3. the front end carries a heavy workload: it must handle acquiring voice files and calling background interfaces, and must also register listeners to achieve automatic monitoring.
As an alternative embodiment, the native capability of the smart App is encapsulated by the speech recognition module in uplusapi. The native side uses an artificial intelligence (AI) SDK, which calls different voice SDKs (for example, Baidu voice recognition) through the smart voice server. The H5 page can implement voice recognition simply by importing UPLUSAPI and calling the methods in upSpeechModule, which greatly simplifies the steps the front end needs to implement voice recognition. The H5 page determines which voice SDK is finally invoked simply by passing the corresponding parameters to the initialization method. Having different voice SDKs available as safeguards improves the stability and accuracy of voice recognition. The H5 page receives the speech recognition result in real time by means of subscription.
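The source does not say how several voice SDKs are combined. One possible reading is that the parameter chosen by the H5 page names a preferred engine and the others act as fallbacks. The sketch below illustrates that reading only; the engine registry, the engine names, and the fallback policy are all assumptions.

```javascript
// Hypothetical engine registry; engine names and the fallback policy are assumptions.
function recognizeWithFallback(engines, preferredName, audio) {
  // Try the preferred engine first, then the remaining ones in their given order.
  const ordered = [
    ...engines.filter((e) => e.name === preferredName),
    ...engines.filter((e) => e.name !== preferredName),
  ];
  const errors = [];
  for (const engine of ordered) {
    try {
      return { engine: engine.name, text: engine.recognize(audio) };
    } catch (err) {
      errors.push(`${engine.name}: ${err.message}`);
    }
  }
  throw new Error(`all engines failed (${errors.join("; ")})`);
}

const engines = [
  { name: "engineA", recognize: () => { throw new Error("service unavailable"); } },
  { name: "engineB", recognize: (audio) => `text for ${audio.id}` },
];
const outcome = recognizeWithFallback(engines, "engineA", { id: "clip-1" });
console.log(outcome);
```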
Through this optional embodiment, the voice SDK, the native capability, and the various interactions between them are encapsulated, and only the simplest core library is finally exposed for the front-end H5 page to use. Whether the goal is recognizing food materials by voice or performing a voice search, the front-end H5 page can call the methods of the core library directly, without implementing file acquisition, interface calls, listener registration, and the like. This greatly simplifies voice recognition and improves the efficiency of interaction between the front-end page and the user.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
In this embodiment, a speech recognition apparatus is further provided, and the speech recognition apparatus is used to implement the foregoing embodiments and preferred embodiments, which have already been described and are not described again. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 4 is a block diagram of a voice recognition apparatus according to an embodiment of the present invention, as shown in fig. 4, the apparatus including:
an obtaining module 42, configured to obtain configuration parameters of a speech recognition module on a target page, where the configuration parameters are used to determine a recognition requirement of the speech recognition module;
the first processing module 44 is configured to create a voice recognition object according to the acquired configuration parameters, and instruct the target page to subscribe to the voice recognition object, where the voice recognition object is used to recognize voice collected on the target page;
and a second processing module 46, configured to perform speech recognition on the speech to be recognized according to the speech recognition object under the condition that the speech to be recognized is acquired by the target page.
Through the above apparatus, the configuration parameters of the voice recognition module on the target page are obtained, wherein the configuration parameters are used for determining the recognition requirements of the voice recognition module; a voice recognition object is created according to the acquired configuration parameters, and the target page is instructed to subscribe to the voice recognition object, wherein the voice recognition object is used for recognizing the voice collected on the target page; and, when the target page collects voice to be recognized, voice recognition is performed on that voice according to the voice recognition object. This solves problems in the related art such as the front end being unable to perform voice recognition, makes it much easier for the front-end H5 page to implement voice recognition, and improves the efficiency of interaction between the front-end page and the user.
In an optional embodiment, the initialization module is configured to initialize a software toolkit corresponding to the speech recognition module; and the model selection module is used for selecting a target voice recognition model from the software toolkit according to the configuration parameters after the software toolkit is initialized so that the voice recognition object can perform voice recognition according to the voice recognition model.
That is, after the configuration parameters of the speech recognition module are determined, the corresponding software toolkit needs to be initialized, then the speech recognition model is selected from the toolkit according to the configuration parameters, and speech recognition is performed according to the target speech recognition model.
In an exemplary embodiment, the obtaining module is further configured to: determining a voice recognition mode of the voice recognition module according to the configuration parameters, wherein the voice recognition mode is used for indicating that target type voice recognition is carried out on voice to be recognized, and the configuration parameters and the voice recognition mode are stored in a preset position in a one-to-one correspondence manner; and creating a voice recognition object corresponding to the voice recognition mode.
That is, to create the speech recognition object from the acquired configuration parameters, a speech recognition mode must first be determined according to the configuration parameters; the mode indicates which target type of speech recognition is to be performed on the speech to be recognized.
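As a minimal sketch of this lookup, assuming the preset position is an in-memory table, the mapping from configuration parameters to a recognition mode and the creation of the corresponding object could look like the following. The table contents, the parameter names `language` and `capture`, and the `RecognitionObject` class are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical table standing in for the "preset position" in which each
# configuration corresponds to exactly one recognition mode.
MODE_BY_CONFIG = {
    ("zh-CN", "single"): "one-shot",
    ("zh-CN", "stream"): "continuous",
}
# One-to-one correspondence: no two configurations share a mode.
assert len(set(MODE_BY_CONFIG.values())) == len(MODE_BY_CONFIG)


@dataclass(frozen=True)
class RecognitionObject:
    mode: str


def create_recognition_object(language: str, capture: str) -> RecognitionObject:
    # Determine the mode from the configuration, then build the object for it.
    mode = MODE_BY_CONFIG[(language, capture)]
    return RecognitionObject(mode=mode)
```

An unknown configuration raises `KeyError` in this sketch; how unsupported configurations are handled is not specified in the text.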
In an exemplary embodiment, the second processing module is further configured to: obtain a subscription result of the target page for the voice recognition object; and, when the subscription result indicates that the target page has successfully subscribed to the voice recognition object, call the voice recognition object to recognize the voice to be recognized collected on the target page.
That is, to perform voice recognition on the voice to be recognized according to the voice recognition object, the subscription result of the target page for the voice recognition object is obtained first; once the subscription result confirms that the target page has successfully subscribed to the voice recognition object, the voice recognition object is called to recognize the voice to be recognized collected on the target page.
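The subscription check can be illustrated with a short sketch. It assumes the subscription result is a boolean returned by a hypothetical `subscribe` method; the class name, the page identifiers and the placeholder transcript are illustrative only.

```python
from typing import Optional


class RecognitionObject:
    """Hypothetical recognition object that pages subscribe to."""

    def __init__(self) -> None:
        self._subscribers = set()

    def subscribe(self, page_id: str) -> bool:
        # The returned flag plays the role of the "subscription result".
        if not page_id:
            return False
        self._subscribers.add(page_id)
        return True

    def recognize(self, audio: bytes) -> str:
        # Placeholder output; a real recognizer would return a transcript.
        return f"recognized {len(audio)} bytes"


def recognize_for_page(recognizer: RecognitionObject, page_id: str,
                       audio: bytes) -> Optional[str]:
    subscription_ok = recognizer.subscribe(page_id)
    # The recognition object is only called after a successful subscription.
    if not subscription_ok:
        return None
    return recognizer.recognize(audio)
```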
In an exemplary embodiment, the obtaining module is further configured to: acquire a configuration instruction triggered by a target object on the target page, and analyze the configuration instruction through a target container capacity interface library to obtain an analysis result; and determine the configuration parameters corresponding to the configuration instruction according to the analysis result.
That is, a configuration instruction triggered by the target object on the target page needs to be acquired first, then the configuration instruction is analyzed, and the configuration parameters are determined according to the analysis result, so that the configuration parameters of the voice recognition module on the target page are acquired.
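The text does not specify the format of the configuration instruction or the API of the interface library. Purely as an assumption, the sketch below treats the instruction as a JSON string and uses the standard `json` module as a stand-in for the parsing step; the field names and default values are hypothetical.

```python
import json

# Hypothetical defaults applied when the instruction omits a parameter.
DEFAULTS = {"language": "zh-CN", "silence_ms": 800}


def parse_instruction(raw: str) -> dict:
    # Stand-in for the interface library: decode the raw instruction.
    return json.loads(raw)


def config_from_instruction(raw: str) -> dict:
    parsed = parse_instruction(raw)
    # Derive the configuration parameters from the analysis result.
    params = parsed.get("params", {})
    return {**DEFAULTS, **params}


config = config_from_instruction('{"action": "configure", "params": {"language": "en-US"}}')
```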
In an exemplary embodiment, the first processing module is further configured to: when it is detected that the silence duration of the voice to be recognized that is currently being collected exceeds a preset threshold, suspend voice collection and perform voice recognition on the currently collected voice to be recognized according to the voice recognition object.
That is, to recognize the speech to be recognized according to the voice recognition object, the silence duration of the speech currently being collected is detected first; when the silence duration exceeds a preset threshold, collection is suspended and the speech collected so far is recognized.
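One way to picture the silence check is a loop over fixed-length audio frames that counts consecutive quiet frames. The sketch below assumes each frame is summarized by a single energy value and that a frame counts as silent when its energy falls below a floor; the function name, frame length and thresholds are assumptions rather than values from the disclosure.

```python
def collect_until_silence(frame_energies, frame_ms, silence_threshold_ms, energy_floor):
    """Collect frames until continuous silence exceeds the threshold.

    Returns the frames collected so far and whether collection was suspended.
    """
    collected = []
    silent_ms = 0
    for energy in frame_energies:
        collected.append(energy)
        if energy < energy_floor:
            silent_ms += frame_ms  # extend the current run of silence
        else:
            silent_ms = 0  # speech resumed, so reset the run
        if silent_ms > silence_threshold_ms:
            return collected, True  # suspend collection and hand off for recognition
    return collected, False


frames, suspended = collect_until_silence(
    [0.9, 0.8, 0.0, 0.0, 0.0, 0.7], frame_ms=100,
    silence_threshold_ms=200, energy_floor=0.1,
)
```

In this example the third consecutive silent frame brings the silence run to 300 ms, which exceeds the 200 ms threshold, so the last frame is never collected.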
It should be noted that the above modules may be implemented in software or hardware. In the latter case, this may be achieved in, but is not limited to, the following ways: the modules are all located in the same processor; or the modules are located in different processors in any combination.
Embodiments of the present invention also provide a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following steps:
S1, acquiring configuration parameters of a voice recognition module on a target page, wherein the configuration parameters are used for determining the recognition requirements of the voice recognition module;
S2, creating a voice recognition object according to the acquired configuration parameters, and instructing the target page to subscribe to the voice recognition object, wherein the voice recognition object is used for recognizing the voice collected on the target page;
S3, when the target page collects voice to be recognized, performing voice recognition on the voice to be recognized according to the voice recognition object.
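Steps S1 to S3 can be read as a single short pipeline. The following sketch is illustrative only: the `Page` and `Recognizer` classes, the `language` parameter and the placeholder result string are hypothetical and are not taken from the disclosure.

```python
from typing import Optional


class Recognizer:
    """Hypothetical voice recognition object built from configuration parameters."""

    def __init__(self, config: dict) -> None:
        self.language = config["language"]

    def recognize(self, audio: bytes) -> str:
        # Placeholder output; a real recognizer would return a transcript.
        return f"[{self.language}] {len(audio)} bytes recognized"


class Page:
    """Hypothetical target page that carries a voice recognition module config."""

    def __init__(self, config: dict) -> None:
        self.config = config
        self.recognizer: Optional[Recognizer] = None

    def subscribe(self, recognizer: Recognizer) -> None:
        self.recognizer = recognizer


def run_steps(page: Page, audio: bytes) -> Optional[str]:
    config = page.config             # S1: acquire the configuration parameters
    recognizer = Recognizer(config)  # S2: create the recognition object ...
    page.subscribe(recognizer)       # ... and have the page subscribe to it
    if audio:                        # S3: recognize only if voice was collected
        return page.recognizer.recognize(audio)
    return None


result = run_steps(Page({"language": "zh-CN"}), b"\x00\x01\x02")
```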
Embodiments of the present invention also provide a computer-readable storage medium having a computer program stored thereon, wherein the computer program is arranged to perform the steps of any of the above-mentioned method embodiments when executed.
In an exemplary embodiment, the computer-readable storage medium may include, but is not limited to, various media capable of storing computer programs, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
In an exemplary embodiment, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
In an exemplary embodiment, the processor may be configured to execute the following steps by a computer program:
S1, acquiring configuration parameters of a voice recognition module on a target page, wherein the configuration parameters are used for determining the recognition requirements of the voice recognition module;
S2, creating a voice recognition object according to the acquired configuration parameters, and instructing the target page to subscribe to the voice recognition object, wherein the voice recognition object is used for recognizing the voice collected on the target page;
S3, when the target page collects voice to be recognized, performing voice recognition on the voice to be recognized according to the voice recognition object.
It will be apparent to those skilled in the art that the modules or steps of the invention described above may be implemented on general-purpose computing devices. They may be centralized on a single computing device or distributed across a network of multiple computing devices. They may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that described herein. Alternatively, they may each be fabricated as a separate integrated circuit module, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A speech recognition method, comprising:
acquiring configuration parameters of a voice recognition module on a target page, wherein the configuration parameters are used for determining the recognition requirements of the voice recognition module;
creating a voice recognition object according to the acquired configuration parameters, and instructing the target page to subscribe to the voice recognition object, wherein the voice recognition object is used for recognizing the voice collected on the target page;
and, in a case where the target page collects voice to be recognized, performing voice recognition on the voice to be recognized according to the voice recognition object.
2. The method of claim 1, wherein after obtaining the configuration instruction of the speech recognition module on the target page to determine the configuration parameters of the speech recognition module, the method further comprises:
initializing a software toolkit corresponding to the voice recognition module;
and after the software toolkit is initialized, selecting a target voice recognition model from the software toolkit according to the configuration parameters, so that the voice recognition object can perform voice recognition according to the target voice recognition model.
3. The method of claim 1, wherein creating a speech recognition object according to the obtained configuration parameters comprises:
determining a voice recognition mode of the voice recognition module according to the configuration parameters, wherein the voice recognition mode is used for indicating that target type voice recognition is carried out on voice to be recognized, and the configuration parameters and the voice recognition mode are stored in a preset position in a one-to-one correspondence manner;
and creating a voice recognition object corresponding to the voice recognition mode.
4. The method according to claim 1, wherein in a case where the target page collects a speech to be recognized, performing speech recognition on the speech to be recognized according to the speech recognition object includes:
obtaining a subscription result of the target page for a voice recognition object;
and, in a case where the subscription result indicates that the target page has successfully subscribed to the voice recognition object, calling the voice recognition object to recognize the voice to be recognized collected on the target page.
5. The method of claim 1, wherein obtaining configuration parameters of a speech recognition module on a target page comprises:
acquiring a configuration instruction triggered by a target object on the target page, and analyzing the configuration instruction through a target container capacity interface library to obtain an analysis result;
and determining the configuration parameters corresponding to the configuration instructions according to the analysis result.
6. The method according to claim 1, wherein performing speech recognition on the speech to be recognized according to the speech recognition object comprises:
in a case where it is detected that the silence duration of the voice to be recognized that is currently being collected exceeds a preset threshold, suspending voice collection and performing voice recognition on the currently collected voice to be recognized according to the voice recognition object.
7. A speech processing apparatus, comprising:
an acquisition module, configured to acquire configuration parameters of a voice recognition module on a target page, wherein the configuration parameters are used for determining the recognition requirements of the voice recognition module;
a first processing module, configured to create a voice recognition object according to the acquired configuration parameters and to instruct the target page to subscribe to the voice recognition object, wherein the voice recognition object is used for recognizing the voice collected on the target page;
and a second processing module, configured to perform voice recognition on the voice to be recognized according to the voice recognition object in a case where the target page collects the voice to be recognized.
8. The apparatus of claim 7, further comprising:
an initialization module, configured to initialize a software toolkit corresponding to the voice recognition module;
and a model selection module, configured to select a target voice recognition model from the software toolkit according to the configuration parameters after the software toolkit is initialized, so that the voice recognition object can perform voice recognition according to the target voice recognition model.
9. A computer-readable storage medium, comprising a stored program, wherein the program is operable to perform the method of any one of claims 1 to 6.
10. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 6 by means of the computer program.
CN202011541666.XA 2020-12-23 2020-12-23 Speech recognition method and device, storage medium and electronic device Active CN112735424B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011541666.XA CN112735424B (en) 2020-12-23 2020-12-23 Speech recognition method and device, storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011541666.XA CN112735424B (en) 2020-12-23 2020-12-23 Speech recognition method and device, storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN112735424A true CN112735424A (en) 2021-04-30
CN112735424B CN112735424B (en) 2023-03-28

Family

ID=75604594

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011541666.XA Active CN112735424B (en) 2020-12-23 2020-12-23 Speech recognition method and device, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN112735424B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113436627A (en) * 2021-08-27 2021-09-24 广州小鹏汽车科技有限公司 Voice interaction method, device, system, vehicle and medium
CN113450801A (en) * 2021-08-27 2021-09-28 广州小鹏汽车科技有限公司 Voice interaction method, device, system, vehicle and medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN201194109Y (en) * 2008-02-04 2009-02-11 夏晓冰 Speech interactive full 3D tridimensional network construction based on webpage
US20100324910A1 (en) * 2009-06-19 2010-12-23 Microsoft Corporation Techniques to provide a standard interface to a speech recognition platform
US20170337177A1 (en) * 2016-05-19 2017-11-23 Palo Alto Research Center Incorporated Natural language web browser
CN108090156A (en) * 2017-12-12 2018-05-29 广东广业开元科技有限公司 A kind of method that book keeping operation artificial intelligence accounting system can be cooperateed with based on HTML5 establishments
CN108364645A (en) * 2018-02-08 2018-08-03 北京奇安信科技有限公司 A kind of method and device for realizing page interaction based on phonetic order
CN109410932A (en) * 2018-10-17 2019-03-01 百度在线网络技术(北京)有限公司 Voice operating method and apparatus based on HTML5 webpage
US20190174008A1 (en) * 2016-07-18 2019-06-06 Hangzhou Hikvision Digital Technology Co., Ltd. Method and Apparatus for Sending and Receiving Voice of Browser, and Voice Intercom System
CN109857392A (en) * 2019-01-03 2019-06-07 深圳壹账通智能科技有限公司 A kind of intelligent developed method, apparatus and electronic equipment of HTML5 component
CN110597508A (en) * 2019-08-14 2019-12-20 平安国际智慧城市科技股份有限公司 Interface dynamic configuration method, device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
我的执着: "HTML5 Web Speech API combined with Ext to implement browser speech recognition and input", 《HTTPS://BLOG.CSDN.NET/LEECHO571/ARTICLE/DETAILS/9316799》 *

Also Published As

Publication number Publication date
CN112735424B (en) 2023-03-28

Similar Documents

Publication Publication Date Title
CN112735424B (en) Speech recognition method and device, storage medium and electronic device
CN109981910B (en) Service recommendation method and device
CN109548045B (en) Equipment debugging method, device, system and storage medium
CN109561002B (en) Voice control method and device for household electrical appliance
CN109068346B (en) Method and device for configuring WiFi parameters
CN112632420A (en) Interface skipping method and device, storage medium and electronic device
CN112908321A (en) Device control method, device, storage medium, and electronic apparatus
CN105338204A (en) Interactive voice response method and device
CN111736938A (en) Information display method and device, storage medium and electronic device
CN110209619B (en) Method for automatically matching multi-model drivers and related device
US9552813B2 (en) Self-adaptive intelligent voice device and method
CN111090770A (en) Data information acquisition method, device and system and household appliance
CN112735406A (en) Device control method and apparatus, storage medium, and electronic apparatus
CN111382259A (en) Analysis method and device for APP crash logs
CN111376255B (en) Robot data acquisition method and device and terminal equipment
CN111723785A (en) Animal estrus determination method and device
CN105577459B (en) Flow monitoring method and system, cloud server and client
CN109243437A (en) Reminding method, device, storage medium and the electronic device of information
CN112150590B (en) Animation file output method and device
CN114090074A (en) Method and device for configuring operating environment, storage medium and electronic device
CN114265866A (en) Streaming data processing method, rule plug-in, streaming data processing module and system
CN109410933B (en) Device control method and apparatus, storage medium, and electronic apparatus
CN112698948A (en) Method and device for acquiring product resources, storage medium and electronic device
CN110288982B (en) Voice prompt broadcasting method, device, storage medium and device for charging station
CN109524002A (en) Intelligent voice recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant