CN112735424A - Speech recognition method and device, storage medium and electronic device - Google Patents
- Publication number
- CN112735424A (application CN202011541666.XA)
- Authority
- CN
- China
- Prior art keywords
- voice recognition
- voice
- target page
- configuration parameters
- module
- Prior art date
- Legal status
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Abstract
The invention discloses a voice recognition method and apparatus, a storage medium, and an electronic apparatus. The method includes: acquiring configuration parameters of a voice recognition module on a target page, where the configuration parameters are used to determine the recognition requirements of the voice recognition module; creating a voice recognition object according to the acquired configuration parameters and instructing the target page to subscribe to the voice recognition object, where the voice recognition object is used to recognize the voice collected on the target page; and, when the target page collects speech to be recognized, performing voice recognition on that speech according to the voice recognition object. This solves the problem in the related art that the front end cannot perform voice recognition, greatly facilitates voice recognition on the front-end H5 page, and improves the interaction efficiency between the front-end page and the user.
Description
Technical Field
The present invention relates to the field of speech recognition, and in particular, to a speech recognition method and apparatus, a storage medium, and an electronic apparatus.
Background
Automatic Speech Recognition (ASR) is the most fundamental AI technology in voice interaction: the process of converting speech into text. Common examples include Siri and smart speakers.
Most device-control detail pages of the smart App are currently built with front-end H5 technology, and these front-end pages have voice recognition requirements. For example, when the refrigerator H5 detail page of the smart App uses voice to record food materials, the user's voice input is needed, and the App must recognize the voice content and convert it into text for the front-end H5 page to use. In this process the front end cannot itself call the system recorder or perform the voice recognition conversion; it depends on native capabilities and SDK support.
No effective solution has yet been proposed for the problem in the related art that the front end cannot perform voice recognition.
Disclosure of Invention
The embodiment of the invention provides a voice recognition method and device, a storage medium and an electronic device, which are used for at least solving the problems that the front end cannot perform voice recognition and the like in the related technology.
According to an aspect of an embodiment of the present invention, there is provided a speech recognition method, including: acquiring configuration parameters of a voice recognition module on a target page, wherein the configuration parameters are used for determining the recognition requirements of the voice recognition module; creating a voice recognition object according to the acquired configuration parameters, and indicating the target page to subscribe the voice recognition object, wherein the voice recognition object is used for recognizing the voice collected on the target page; and under the condition that the target page acquires the voice to be recognized, performing voice recognition on the voice to be recognized according to the voice recognition object.
In an exemplary embodiment, after obtaining the configuration instruction of the speech recognition module on the target page to determine the configuration parameters of the speech recognition module, the method further includes: initializing a software toolkit corresponding to the voice recognition module; and, after the software toolkit is initialized, selecting a target voice recognition model from the software toolkit according to the configuration parameters, so that the voice recognition object can perform voice recognition according to the target voice recognition model.
In one exemplary embodiment, creating a speech recognition object according to the acquired configuration parameters includes: determining a voice recognition mode of the voice recognition module according to the configuration parameters, wherein the voice recognition mode is used for indicating that target type voice recognition is carried out on voice to be recognized, and the configuration parameters and the voice recognition mode are stored in a preset position in a one-to-one correspondence manner; and creating a voice recognition object corresponding to the voice recognition mode.
In an exemplary embodiment, when the target page collects a speech to be recognized, performing speech recognition on the speech to be recognized according to the speech recognition object, including: obtaining a subscription result of the target page for a voice recognition object; and under the condition that the subscription result indicates that the target page has successfully subscribed the voice recognition object, calling the voice recognition object to recognize the voice to be recognized collected on the target page.
In an exemplary embodiment, obtaining configuration parameters of a speech recognition module on a target page includes: acquiring a configuration instruction triggered by a target object on the target page, and analyzing the configuration instruction through a target container capacity interface library to obtain an analysis result; and determining the configuration parameters corresponding to the configuration instructions according to the analysis result.
In an exemplary embodiment, performing speech recognition on the speech to be recognized according to the speech recognition object includes: and under the condition that the voice silence time of the voice to be recognized which is currently collected is detected to exceed a preset threshold value, the voice collection is suspended, and the voice to be recognized which is currently collected is subjected to voice recognition according to the voice recognition object.
According to another aspect of the embodiments of the present invention, there is also provided a speech recognition apparatus including: an acquisition module, configured to acquire configuration parameters of a voice recognition module on a target page, where the configuration parameters are used to determine the recognition requirements of the voice recognition module; a first processing module, configured to create a voice recognition object according to the acquired configuration parameters and instruct the target page to subscribe to the voice recognition object, where the voice recognition object is used to recognize the voice collected on the target page; and a second processing module, configured to perform voice recognition on the speech to be recognized according to the voice recognition object when the target page collects the speech to be recognized.
In one exemplary embodiment, the apparatus further includes: an initialization module, configured to initialize the software toolkit corresponding to the voice recognition module; and a model selection module, configured to select, after the software toolkit is initialized, a target voice recognition model from the software toolkit according to the configuration parameters, so that the voice recognition object can perform voice recognition according to the target voice recognition model.
According to a further aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to execute the above-mentioned speech recognition method when running.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the voice recognition method through the computer program.
In the embodiment of the invention, the configuration parameters of a voice recognition module on a target page are obtained, where the configuration parameters are used to determine the recognition requirements of the voice recognition module; a voice recognition object is created according to the acquired configuration parameters, and the target page is instructed to subscribe to the voice recognition object, where the voice recognition object is used to recognize the voice collected on the target page; and, when the target page collects speech to be recognized, voice recognition is performed on it according to the voice recognition object. This solves the problem in the related art that the front end cannot perform voice recognition, greatly facilitates voice recognition on the front-end H5 page, and improves the interaction efficiency between the front-end page and the user.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
fig. 1 is a block diagram of a hardware configuration of a computer terminal of a speech recognition method according to an embodiment of the present invention;
FIG. 2 is a flow diagram of a speech recognition method according to an embodiment of the present invention;
FIG. 3 is a schematic illustration of a speech recognition process according to an alternative embodiment of the present invention;
fig. 4 is a block diagram of a structure of a voice recognition apparatus according to an embodiment of the present invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings in conjunction with the embodiments.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
The method provided by the embodiment of the application can be executed in a computer terminal or a similar operation device. Taking the example of being operated on a computer terminal, fig. 1 is a hardware structure block diagram of a computer terminal of a speech recognition method according to an embodiment of the present invention. As shown in fig. 1, the computer terminal may include one or more (only one shown in fig. 1) processors 102 (the processors 102 may include, but are not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA) and a memory 104 for storing data, and in an exemplary embodiment, may also include a transmission device 106 for communication functions and an input-output device 108. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the computer terminal. For example, the computer terminal may also include more or fewer components than shown in FIG. 1, or have a different configuration with equivalent functionality to that shown in FIG. 1 or with more functionality than that shown in FIG. 1.
The memory 104 may be used to store computer programs, for example, software programs and modules of application software, such as a computer program corresponding to the speech recognition method in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the computer programs stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to a computer terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal. In one example, the transmission device 106 includes a Network adapter (NIC), which can be connected to other Network devices through a base station so as to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
In this embodiment, a speech recognition method is provided. Fig. 2 is a flowchart of a speech recognition method according to an embodiment of the present invention; the flow includes the following steps:
step S202, obtaining configuration parameters of a voice recognition module on a target page, wherein the configuration parameters are used for determining the recognition requirements of the voice recognition module;
step S204, a voice recognition object is created according to the acquired configuration parameters, and the target page is instructed to subscribe the voice recognition object, wherein the voice recognition object is used for recognizing the voice collected on the target page;
and step S206, under the condition that the target page collects the voice to be recognized, performing voice recognition on the voice to be recognized according to the voice recognition object.
Through the above steps, the configuration parameters of the voice recognition module on the target page are obtained, a voice recognition object is created according to the acquired configuration parameters, the target page is instructed to subscribe to the voice recognition object, and, when the target page collects speech to be recognized, voice recognition is performed on it according to the voice recognition object. This solves the problem in the related art that the front end cannot perform voice recognition, greatly facilitates voice recognition on the front-end H5 page, and improves the interaction efficiency between the front-end page and the user.
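The three steps above can be sketched as follows. This is an illustrative JavaScript sketch only: every function and field name in it is hypothetical, not the actual API of the patented SDK.

```javascript
// S202: read the recognition requirements configured on the page (illustrative fields)
function getConfigParams(page) {
  return { engine: page.engine, mode: page.mode };
}

// S204: create a voice recognition object from the parameters; the page
// subscribes to it via subscribe() to receive recognition results
function createRecognizer(params) {
  return {
    params,
    subscribers: [],
    subscribe(cb) { this.subscribers.push(cb); return true; },
    recognize(audio) {
      // placeholder "recognition": just reports how much audio was captured
      const text = `recognized ${audio.length} samples`;
      this.subscribers.forEach((cb) => cb(text));
      return text;
    },
  };
}

// S206: run recognition once the page has collected speech
function onSpeechCollected(recognizer, audio) {
  return recognizer.recognize(audio);
}
```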
In an optional embodiment, after obtaining the configuration instruction of the speech recognition module on the target page to determine the configuration parameters of the speech recognition module, the method further includes: initializing the software toolkit corresponding to the voice recognition module; and, after the software toolkit is initialized, selecting a target voice recognition model from the software toolkit according to the configuration parameters, so that the voice recognition object can perform voice recognition according to the target voice recognition model.
That is, after the configuration parameters of the speech recognition module are determined, the corresponding software toolkit needs to be initialized, then the speech recognition model is selected from the toolkit according to the configuration parameters, and speech recognition is performed according to the target speech recognition model.
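This initialize-then-select flow can be illustrated with a minimal sketch. The SDK object, the model names, and the `domain` parameter below are all assumptions made for illustration, not the real interface of the toolkit.

```javascript
// Hypothetical toolkit: must be initialized before a model can be selected
const sdk = {
  initialized: false,
  models: { general: "general-asr", foodMaterial: "food-asr" },
  init(config) {
    this.config = config;
    this.initialized = true;
    return true;
  },
};

// Select the target model from the toolkit according to the configuration
// parameters; fall back to a general model for unknown domains
function selectModel(toolkit, params) {
  if (!toolkit.initialized) throw new Error("SDK must be initialized first");
  return toolkit.models[params.domain] ?? toolkit.models.general;
}
```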
In one exemplary embodiment, creating a speech recognition object according to the acquired configuration parameters includes: determining a voice recognition mode of the voice recognition module according to the configuration parameters, wherein the voice recognition mode is used for indicating that target type voice recognition is carried out on voice to be recognized, and the configuration parameters and the voice recognition mode are stored in a preset position in a one-to-one correspondence manner; and creating a voice recognition object corresponding to the voice recognition mode.
That is, the configuration parameters are acquired to create the speech recognition object, and a speech recognition mode for instructing target type speech recognition on the speech to be recognized needs to be determined according to the configuration parameters.
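The one-to-one storage of configuration parameters against recognition modes can be sketched as a lookup table; the mode names and fields below are illustrative assumptions, since the embodiment does not specify them.

```javascript
// "Preset position" modeled as a lookup table: one configuration value
// maps to exactly one recognition mode (illustrative entries)
const MODE_TABLE = new Map([
  ["shortSpeech", { type: "one-shot", maxSeconds: 60 }],
  ["longSpeech", { type: "streaming", maxSeconds: 3600 }],
]);

// Determine the mode from the configuration parameters, then create a
// recognition object bound to that mode
function createRecognitionObject(params) {
  const mode = MODE_TABLE.get(params.speechKind);
  if (!mode) throw new Error(`no recognition mode for ${params.speechKind}`);
  return { mode, recognize: (audio) => `[${mode.type}] ${audio}` };
}
```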
In an exemplary embodiment, when the target page collects a speech to be recognized, performing speech recognition on the speech to be recognized according to the speech recognition object, including: obtaining a subscription result of the target page for a voice recognition object; and under the condition that the subscription result indicates that the target page has successfully subscribed the voice recognition object, calling the voice recognition object to recognize the voice to be recognized collected on the target page.
That is, to perform voice recognition on the speech to be recognized according to the voice recognition object, the subscription result of the target page for the voice recognition object is first obtained; when the subscription result confirms that the voice recognition object has been successfully subscribed, the voice recognition object is called to recognize the speech to be recognized that was collected on the target page.
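The subscription check described above can be sketched as a simple gate; the shape of the subscription result object is an assumption for illustration.

```javascript
// Illustrative recognition object
const recognizer = { recognize: (audio) => `text for ${audio}` };

// Recognition is only invoked when the page's subscription succeeded;
// without a subscription no result can be delivered, so return null
function recognizeIfSubscribed(subscriptionResult, audio) {
  if (subscriptionResult && subscriptionResult.subscribed) {
    return recognizer.recognize(audio);
  }
  return null;
}
```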
In an exemplary embodiment, obtaining configuration parameters of a speech recognition module on a target page includes: acquiring a configuration instruction triggered by a target object on the target page, and analyzing the configuration instruction through a target container capacity interface library to obtain an analysis result; and determining the configuration parameters corresponding to the configuration instructions according to the analysis result.
That is, a configuration instruction triggered by the target object on the target page needs to be acquired first, then the configuration instruction is analyzed, and the configuration parameters are determined according to the analysis result, so that the configuration parameters of the voice recognition module on the target page are acquired.
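The parse step can be illustrated as follows. The "target container capability interface library" is modeled here as a plain JSON parser, and the parameter names and defaults are assumptions, not the library's real behavior.

```javascript
// Parse a page-triggered configuration instruction into configuration
// parameters (JSON.parse stands in for the container interface library)
function parseConfigInstruction(instruction) {
  const parsed = JSON.parse(instruction);
  return {
    engine: parsed.engine ?? "default",   // illustrative parameter
    language: parsed.language ?? "zh-CN", // illustrative default
  };
}
```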
In an exemplary embodiment, performing speech recognition on the speech to be recognized according to the speech recognition object includes: and under the condition that the voice silence time of the voice to be recognized which is currently collected is detected to exceed a preset threshold value, the voice collection is suspended, and the voice to be recognized which is currently collected is subjected to voice recognition according to the voice recognition object.
That is, to recognize the speech to be recognized according to the voice recognition object, the voice silence time of the speech currently being collected is detected first; collection is suspended when the silence time exceeds a preset threshold, and the speech collected so far is then recognized according to the voice recognition object.
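The silence rule can be sketched as below. The 2-second threshold comes from the optional embodiment described later in this document; the recorder object itself is an illustrative stand-in.

```javascript
const SILENCE_THRESHOLD_MS = 2000; // preset threshold (2 s in the embodiment)

// Illustrative recorder holding the speech collected so far
const recorder = {
  capturing: true,
  buffer: "collected-audio",
  recognize: (audio) => `result(${audio})`,
};

// If measured silence exceeds the threshold, suspend collection and
// recognize what was collected; otherwise keep capturing
function onSilenceMeasured(rec, silenceMs) {
  if (silenceMs > SILENCE_THRESHOLD_MS) {
    rec.capturing = false;
    return rec.recognize(rec.buffer);
  }
  return null;
}
```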
Fig. 3 is a schematic diagram of a speech recognition process according to an alternative embodiment of the present invention, as shown in fig. 3, including the following steps:
Step S1, SDK initialization (SDKInit(config)). When the App is developed, after the SDK is downloaded and integrated, the SDK must be initialized before its methods can be used. The front end H5 calls the initialization method mainly because parameters must be passed in from the front end; if H5 does not need to determine the incoming parameters, initialization can instead be done directly in native development (i.e., Android or iOS).
Optionally, the parameters (Params) passed in during SDK initialization may be as shown in table 1:
TABLE 1
Step S2, SDK initialization (SDKInit(config));
step S3, initialize (init (config));
step S4, returning an initialization result;
step S5, returning an initialization result;
step S6, returning an initialization result;
step S7, creating a speech recognition object (createAsrRecorder);
step S8, creating a speech recognition object (createAsrRecorder);
step S9, creating a speech recognition object (createAsrRecorder);
optionally, this method is called to subscribe to voice recognition before voice recording is started; otherwise no voice recognition result can be obtained. The parameters and the most important return results are shown in table 2:
TABLE 2
Step S10, return to asr object;
step S11, subscribing to the voice recognition result (attach(iAsrRecorderCallback));
step S12, returning asr subscription result;
step S13, returning asr subscription result;
step S14, returning asr subscription result;
step S15, subscribing to voice listening (attachAsrRecorder(listener...));
step S16, subscribing to voice listening (attachAsrRecorder(listener...));
step S17, returning asr subscription result;
step S18, returning asr subscription result;
optionally, subscribing to voice listening returns a retData description corresponding to the subscription result, as shown in table 3:
TABLE 3
Step S19, starting voice listening (startAsrRecorder(config)); when recording voice, if the voice pauses for more than 2 s, voice recording stops automatically and voice recognition is performed;
step S20, starting voice listening (startAsrRecorder(config));
step S21, start (config);
step S22, returning a starting asr result;
step S23, returning a starting asr result;
step S24, return to start asr result.
Specifically: the SDK is initialized (SDKInit), with H5 starting the SDK and passing in a config parameter; a speech recognition object is created (createAsrRecorder); the speech recognition result is subscribed to (attachAsrRecorder); voice recording is started (startAsrRecorder), with H5 again passing a config parameter; and the voice recognition result is returned to the H5 page through the subscription.
Optionally, the calling method is as follows:
uplusapi.upSpeechRecognitionModule.attachAsrRecorder({
"AsrErrorListener": (errMessage) => console.log(errMessage),
"AsrResultListener": (resultMessage) => console.log(resultMessage),
"AsrEventListener": (eventMessage) => console.log(eventMessage),
"AsrVolumeListener": (volumeMessage) => console.log(volumeMessage)
}).then((result) => console.log('result', result));
In other words: the H5 page initializes the SDK by calling the uplusapi SDKInit(config) interface; it creates a speech recognition object by calling the uplusapi createAsrRecorder() interface; it subscribes to the speech recognition interface by calling the uplusapi attachAsrRecorder(listener...) interface, with the speech recognition result returned through the listener; and it starts recording by calling the uplusapi startAsrRecorder(config) interface, with the voice recognition result returned in real time through the listening set up in the previous step.
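The four-call sequence above can be exercised end to end against a mocked bridge. The mock below is purely illustrative — only the call order mirrors the embodiment; the real uplusapi methods are asynchronous and take the parameters described in the tables above.

```javascript
// Mocked uplusapi-style bridge (illustrative, synchronous for simplicity)
const uplusapi = {
  SDKInit(config) { this._config = config; return { retCode: "000000" }; },
  createAsrRecorder() { return { retCode: "000000" }; },
  attachAsrRecorder(listeners) { this._listeners = listeners; return { retCode: "000000" }; },
  startAsrRecorder(config) {
    // deliver a recognition result through the subscribed listener
    this._listeners.AsrResultListener("hello world");
    return { retCode: "000000" };
  },
};

// H5-side call sequence: init -> create -> subscribe -> start
function recognizeOnce() {
  const results = [];
  uplusapi.SDKInit({ engine: "default" });
  uplusapi.createAsrRecorder();
  uplusapi.attachAsrRecorder({
    AsrResultListener: (text) => results.push(text),
    AsrErrorListener: (err) => results.push(`error:${err}`),
  });
  uplusapi.startAsrRecorder({});
  return results;
}
```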
As an optional implementation, a voice file is generated from front-end voice input, a background interface is called, the file is transmitted to the background, the background converts the voice file into text, and the text is returned to the front-end page for processing. This approach has the following disadvantages: 1. it is not universal enough and can only be used within its own pages; 2. typically only one voice SDK can be invoked; 3. the front-end workload is large, involving acquisition of voice files and calls to background interfaces, while registration and listening must also be implemented to achieve automatic monitoring.
As an alternative embodiment, the native capability of the smart App is encapsulated by the speech recognition module in uplusapi; the native side uses an artificial-intelligence (AI) SDK that calls different voice SDKs through the smart voice server, for example Baidu voice recognition. The H5 page can implement voice recognition merely by introducing uplusapi and calling the methods in upSpeechModule, which greatly simplifies the steps for the front end to implement voice recognition; the H5 page determines the voice SDK finally invoked simply by passing the corresponding parameters to the initialization method. The different voice SDKs serve as guarantees for one another, improving the stability and accuracy of voice recognition; and the H5 page receives the speech recognition result in real time by means of subscription.
Through this optional embodiment, the voice SDK, the native capability, and the various interactions between them are encapsulated, and only the simplest core library is finally exposed, which is convenient for the front end H5 to use. Whether implementing voice recognition of food materials or voice search, the front end H5 can directly call the methods of the core library without itself implementing file acquisition, interface calls, or registration and listening. This greatly facilitates voice recognition and improves the interaction efficiency between the front-end page and the user.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
In this embodiment, a speech recognition apparatus is further provided, and the speech recognition apparatus is used to implement the foregoing embodiments and preferred embodiments, which have already been described and are not described again. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 4 is a block diagram of a voice recognition apparatus according to an embodiment of the present invention, as shown in fig. 4, the apparatus including:
an obtaining module 42, configured to obtain configuration parameters of a speech recognition module on a target page, where the configuration parameters are used to determine a recognition requirement of the speech recognition module;
the first processing module 44 is configured to create a voice recognition object according to the acquired configuration parameters, and instruct the target page to subscribe to the voice recognition object, where the voice recognition object is used to recognize voice collected on the target page;
and a second processing module 46, configured to perform speech recognition on the speech to be recognized according to the speech recognition object under the condition that the speech to be recognized is acquired by the target page.
The apparatus obtains the configuration parameters of the voice recognition module on the target page, where the configuration parameters are used to determine the recognition requirements of the voice recognition module; creates a voice recognition object according to the acquired configuration parameters and instructs the target page to subscribe to the voice recognition object, where the voice recognition object is used to recognize the voice collected on the target page; and, when the target page collects speech to be recognized, performs voice recognition on it according to the voice recognition object. This solves the problem in the related art that the front end cannot perform voice recognition, greatly facilitates voice recognition on the front-end H5 page, and improves the interaction efficiency between the front-end page and the user.
In an optional embodiment, the initialization module is configured to initialize a software toolkit corresponding to the speech recognition module; and the model selection module is used for selecting a target voice recognition model from the software toolkit according to the configuration parameters after the software toolkit is initialized so that the voice recognition object can perform voice recognition according to the voice recognition model.
That is, after the configuration parameters of the speech recognition module are determined, the corresponding software toolkit needs to be initialized, then the speech recognition model is selected from the toolkit according to the configuration parameters, and speech recognition is performed according to the target speech recognition model.
In an exemplary embodiment, the obtaining module is further configured to: determining a voice recognition mode of the voice recognition module according to the configuration parameters, wherein the voice recognition mode is used for indicating that target type voice recognition is carried out on voice to be recognized, and the configuration parameters and the voice recognition mode are stored in a preset position in a one-to-one correspondence manner; and creating a voice recognition object corresponding to the voice recognition mode.
That is, the configuration parameters are acquired to create the speech recognition object, and a speech recognition mode for instructing target type speech recognition on the speech to be recognized needs to be determined according to the configuration parameters.
In an exemplary embodiment, the second processing module is further configured to: obtaining a subscription result of the target page for a voice recognition object; and under the condition that the subscription result indicates that the target page has successfully subscribed the voice recognition object, calling the voice recognition object to recognize the voice to be recognized collected on the target page.
That is, to perform voice recognition on the speech to be recognized according to the voice recognition object, the subscription result of the target page for the voice recognition object is first obtained; when the subscription result confirms that the voice recognition object has been successfully subscribed, the voice recognition object is called to recognize the speech to be recognized that was collected on the target page.
In an exemplary embodiment, the obtaining module is further configured to: acquiring a configuration instruction triggered by a target object on the target page, and analyzing the configuration instruction through a target container capacity interface library to obtain an analysis result; and determining the configuration parameters corresponding to the configuration instructions according to the analysis result.
That is, a configuration instruction triggered by the target object on the target page needs to be acquired first, then the configuration instruction is analyzed, and the configuration parameters are determined according to the analysis result, so that the configuration parameters of the voice recognition module on the target page are acquired.
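As a hedged sketch of this step, suppose the configuration instruction reaches the module as a JSON string (a common bridge format, though the patent does not specify one); parsing it and applying defaults yields the configuration parameters. Field names and defaults are assumptions.

```javascript
// Parse a configuration instruction (assumed JSON) into configuration
// parameters; JSON.parse stands in for the container interface library's
// parser. Field names and default values are hypothetical.
function parseConfigInstruction(instruction) {
  const parsed = JSON.parse(instruction);
  return {
    model: parsed.model ?? 'general',                    // recognition model key
    silenceThresholdMs: parsed.silenceThresholdMs ?? 2000, // silence cutoff
  };
}

const params = parseConfigInstruction('{"model":"freeTalk"}');
```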
In an exemplary embodiment, the first processing module is further configured to: and under the condition that the voice silence time of the voice to be recognized which is currently collected is detected to exceed a preset threshold value, the voice collection is suspended, and the voice to be recognized which is currently collected is subjected to voice recognition according to the voice recognition object.
That is, to recognize the speech with the speech recognition object, the silence duration of the speech currently being collected is monitored; when that duration exceeds the preset threshold, collection is suspended and the speech collected so far is passed to the speech recognition object for recognition.
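The silence-timeout rule can be sketched as follows; the threshold value, capture structure, and function names are illustrative assumptions, not the patent's implementation.

```javascript
// If the measured silence of the current capture exceeds the preset threshold,
// suspend collection and hand the buffered audio to the recognition object.
// All names and the 2000 ms threshold below are hypothetical.
function checkSilence(capture, silenceMs, thresholdMs, recognizer) {
  if (silenceMs <= thresholdMs) {
    return { suspended: false, text: null };     // still listening
  }
  capture.active = false;                        // suspend voice collection
  return { suspended: true, text: recognizer.recognize(capture.audio) };
}

const capture = { active: true, audio: 'buffered-frames' };
const result = checkSilence(capture, 2500, 2000, { recognize: (a) => `text:${a}` });
```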
It should be noted that the above modules may be implemented in software or hardware; in the latter case they may be implemented, for example but without limitation, with all modules located in the same processor, or with the modules distributed in any combination across different processors.
Embodiments of the present invention also provide a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Optionally, in this embodiment, the storage medium may be configured to store a computer program for executing the following steps:
S1, acquiring configuration parameters of a voice recognition module on a target page, wherein the configuration parameters are used for determining the recognition requirements of the voice recognition module;
S2, creating a voice recognition object according to the acquired configuration parameters, and indicating the target page to subscribe the voice recognition object, wherein the voice recognition object is used for recognizing the voice collected on the target page;
S3, in a case where the target page collects the voice to be recognized, performing voice recognition on the voice to be recognized according to the voice recognition object.
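Steps S1-S3 above can be tied together in one end-to-end sketch. Every structure here (the page object, configuration field, subscription record) is a hypothetical stand-in used only to show the order of the three steps.

```javascript
// End-to-end sketch of S1-S3: read the module's configuration from the page,
// create and subscribe a recognition object, then recognize captured speech.
function runFlow(page, audio) {
  const config = page.moduleConfig;                       // S1: get config params
  const recognizer = { recognize: (a) => `[${config.model}] ${a}` };
  page.subscription = { ok: true, recognizer };           // S2: create + subscribe
  if (audio == null) return null;                         // nothing collected yet
  return page.subscription.recognizer.recognize(audio);   // S3: recognize speech
}

const flowResult = runFlow({ moduleConfig: { model: 'general' } }, 'hello');
```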
Embodiments of the present invention also provide a computer-readable storage medium having a computer program stored thereon, wherein the computer program is arranged to perform the steps of any of the above-mentioned method embodiments when executed.
In an exemplary embodiment, the computer-readable storage medium may include, but is not limited to, various media capable of storing a computer program, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
In an exemplary embodiment, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
In an exemplary embodiment, the processor may be configured to execute, by means of the computer program, the following steps:
S1, acquiring configuration parameters of a voice recognition module on a target page, wherein the configuration parameters are used for determining the recognition requirements of the voice recognition module;
S2, creating a voice recognition object according to the acquired configuration parameters, and indicating the target page to subscribe the voice recognition object, wherein the voice recognition object is used for recognizing the voice collected on the target page;
S3, in a case where the target page collects the voice to be recognized, performing voice recognition on the voice to be recognized according to the voice recognition object.
It will be apparent to those skilled in the art that the modules or steps of the invention described above may be implemented with a general-purpose computing device; they may be centralized on a single computing device or distributed across a network of computing devices, and may be implemented in program code executable by such devices, so that they can be stored in a storage device and executed by a computing device. In some cases, the steps shown or described may be performed in an order different from that described herein, or the modules or steps may be fabricated as individual integrated-circuit modules, or several of them may be fabricated as a single integrated-circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A speech recognition method, comprising:
acquiring configuration parameters of a voice recognition module on a target page, wherein the configuration parameters are used for determining the recognition requirements of the voice recognition module;
creating a voice recognition object according to the acquired configuration parameters, and indicating the target page to subscribe the voice recognition object, wherein the voice recognition object is used for recognizing the voice collected on the target page;
and under the condition that the target page acquires the voice to be recognized, performing voice recognition on the voice to be recognized according to the voice recognition object.
2. The method of claim 1, wherein after obtaining the configuration instruction of the speech recognition module on the target page to determine the configuration parameters of the speech recognition module, the method further comprises:
initializing a software tool kit corresponding to the voice recognition module;
and after the software toolkit is initialized, selecting a target voice recognition model from the software toolkit according to the configuration parameters so that the voice recognition object can perform voice recognition according to the voice recognition model.
3. The method of claim 1, wherein creating a speech recognition object according to the obtained configuration parameters comprises:
determining a voice recognition mode of the voice recognition module according to the configuration parameters, wherein the voice recognition mode is used for indicating that target type voice recognition is carried out on voice to be recognized, and the configuration parameters and the voice recognition mode are stored in a preset position in a one-to-one correspondence manner;
and creating a voice recognition object corresponding to the voice recognition mode.
4. The method according to claim 1, wherein in a case where the target page collects a speech to be recognized, performing speech recognition on the speech to be recognized according to the speech recognition object includes:
obtaining a subscription result of the target page for a voice recognition object;
and under the condition that the subscription result indicates that the target page has successfully subscribed the voice recognition object, calling the voice recognition object to recognize the voice to be recognized collected on the target page.
5. The method of claim 1, wherein obtaining configuration parameters of a speech recognition module on a target page comprises:
acquiring a configuration instruction triggered by a target object on the target page, and parsing the configuration instruction through a target container capability interface library to obtain a parsing result;
and determining the configuration parameters corresponding to the configuration instruction according to the parsing result.
6. The method according to claim 1, wherein performing speech recognition on the speech to be recognized according to the speech recognition object comprises:
and under the condition that the voice silence time of the voice to be recognized which is currently collected is detected to exceed a preset threshold value, the voice collection is suspended, and the voice to be recognized which is currently collected is subjected to voice recognition according to the voice recognition object.
7. A speech recognition apparatus, comprising:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring configuration parameters of a voice recognition module on a target page, and the configuration parameters are used for determining the recognition requirements of the voice recognition module;
the first processing module is used for creating a voice recognition object according to the acquired configuration parameters and indicating the target page to subscribe the voice recognition object, wherein the voice recognition object is used for recognizing the voice acquired on the target page;
and the second processing module is used for carrying out voice recognition on the voice to be recognized according to the voice recognition object under the condition that the voice to be recognized is collected by the target page.
8. The apparatus of claim 7, further comprising:
the initialization module is used for initializing the software toolkit corresponding to the voice recognition module;
and the model selection module is used for selecting a target voice recognition model from the software toolkit according to the configuration parameters after the software toolkit is initialized so that the voice recognition object can perform voice recognition according to the voice recognition model.
9. A computer-readable storage medium, comprising a stored program, wherein the program is operable to perform the method of any one of claims 1 to 6.
10. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 6 by means of the computer program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011541666.XA CN112735424B (en) | 2020-12-23 | 2020-12-23 | Speech recognition method and device, storage medium and electronic device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112735424A true CN112735424A (en) | 2021-04-30 |
CN112735424B CN112735424B (en) | 2023-03-28 |
Family
ID=75604594
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011541666.XA Active CN112735424B (en) | 2020-12-23 | 2020-12-23 | Speech recognition method and device, storage medium and electronic device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112735424B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113436627A (en) * | 2021-08-27 | 2021-09-24 | 广州小鹏汽车科技有限公司 | Voice interaction method, device, system, vehicle and medium |
CN113450801A (en) * | 2021-08-27 | 2021-09-28 | 广州小鹏汽车科技有限公司 | Voice interaction method, device, system, vehicle and medium |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN201194109Y (en) * | 2008-02-04 | 2009-02-11 | 夏晓冰 | Speech interactive full 3D tridimensional network construction based on webpage |
US20100324910A1 (en) * | 2009-06-19 | 2010-12-23 | Microsoft Corporation | Techniques to provide a standard interface to a speech recognition platform |
US20170337177A1 (en) * | 2016-05-19 | 2017-11-23 | Palo Alto Research Center Incorporated | Natural language web browser |
CN108090156A (en) * | 2017-12-12 | 2018-05-29 | 广东广业开元科技有限公司 | A kind of method that book keeping operation artificial intelligence accounting system can be cooperateed with based on HTML5 establishments |
CN108364645A (en) * | 2018-02-08 | 2018-08-03 | 北京奇安信科技有限公司 | A kind of method and device for realizing page interaction based on phonetic order |
CN109410932A (en) * | 2018-10-17 | 2019-03-01 | 百度在线网络技术(北京)有限公司 | Voice operating method and apparatus based on HTML5 webpage |
US20190174008A1 (en) * | 2016-07-18 | 2019-06-06 | Hangzhou Hikvision Digital Technology Co., Ltd. | Method and Apparatus for Sending and Receiving Voice of Browser, and Voice Intercom System |
CN109857392A (en) * | 2019-01-03 | 2019-06-07 | 深圳壹账通智能科技有限公司 | A kind of intelligent developed method, apparatus and electronic equipment of HTML5 component |
CN110597508A (en) * | 2019-08-14 | 2019-12-20 | 平安国际智慧城市科技股份有限公司 | Interface dynamic configuration method, device and storage medium |
Non-Patent Citations (1)
Title |
---|
我的执着: "HTML5 Web Speech API 结合Ext实现浏览器语音识别以及输入", 《HTTPS://BLOG.CSDN.NET/LEECHO571/ARTICLE/DETAILS/9316799》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112735424B (en) | Speech recognition method and device, storage medium and electronic device | |
CN109981910B (en) | Service recommendation method and device | |
CN109548045B (en) | Equipment debugging method, device, system and storage medium | |
CN109561002B (en) | Voice control method and device for household electrical appliance | |
CN109068346B (en) | Method and device for configuring WiFi parameters | |
CN112632420A (en) | Interface skipping method and device, storage medium and electronic device | |
CN112908321A (en) | Device control method, device, storage medium, and electronic apparatus | |
CN105338204A (en) | Interactive voice response method and device | |
CN111736938A (en) | Information display method and device, storage medium and electronic device | |
CN110209619B (en) | Method for automatically matching multi-model drivers and related device | |
US9552813B2 (en) | Self-adaptive intelligent voice device and method | |
CN111090770A (en) | Data information acquisition method, device and system and household appliance | |
CN112735406A (en) | Device control method and apparatus, storage medium, and electronic apparatus | |
CN111382259A (en) | Analysis method and device for APP crash logs | |
CN111376255B (en) | Robot data acquisition method and device and terminal equipment | |
CN111723785A (en) | Animal estrus determination method and device | |
CN105577459B (en) | Flow monitoring method and system, cloud server and client | |
CN109243437A (en) | Reminding method, device, storage medium and the electronic device of information | |
CN112150590B (en) | Animation file output method and device | |
CN114090074A (en) | Method and device for configuring operating environment, storage medium and electronic device | |
CN114265866A (en) | Streaming data processing method, rule plug-in, streaming data processing module and system | |
CN109410933B (en) | Device control method and apparatus, storage medium, and electronic apparatus | |
CN112698948A (en) | Method and device for acquiring product resources, storage medium and electronic device | |
CN110288982B (en) | Voice prompt broadcasting method, device, storage medium and device for charging station | |
CN109524002A (en) | Intelligent voice recognition method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||