CN109036431A - Speech recognition system and method
- Publication number: CN109036431A
- Application number: CN201810758940.5A
- Authority: CN (China)
- Prior art keywords: processing module, engine, module, targeting, preposition
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/34—Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
Abstract
An embodiment of the invention discloses a speech recognition system and method. The system comprises an engine resource scheduling module and at least two engine processing modules. The engine resource scheduling module is configured to select a target engine processing module according to the states of the at least two engine processing modules, and the target engine processing module among them is configured to perform speech recognition on the received voice data. Because the engine resource scheduling module schedules the engine processing modules, an engine processing module that is in an idle state can be assigned to perform the recognition. This avoids the resource contention, and the resulting loss of recognition efficiency, that arise when a single engine processing module handles multiple streams of voice data at the same time. The recognition rate and utilization efficiency of the speech recognition engines under high concurrency are thereby improved, and the complex design otherwise needed for an engine processing module to recognize multi-channel speech is eliminated.
Description
Technical field
The present invention relates to the field of Internet technology, and in particular to a speech recognition system and method.
Background
Speech recognition technology, also known as automatic speech recognition (ASR, Automatic Speech Recognition), converts the lexical content of human speech into computer-readable input. Speech recognition has become one of the more commonly used technologies in the field of artificial intelligence.
A speech recognition system in the prior art is built as a single whole that incorporates all of the application logic: it must not only recognize voice data but also handle concurrent access by multiple data channels, the various engine parameters, the dynamic configuration of models, and so on. The benefit is that such a system is easy to deploy and structurally simple. However, because the system is so large, it is inconvenient to maintain and upgrade. Moreover, a speech recognition engine can recognize only a single channel of voice data at a time. Using multithreading to make a recognition engine process multiple audio channels simultaneously is extremely complex, difficult to implement and prone to bugs, so both the reliability and the efficiency of speech processing are low.
Summary of the invention
The present invention provides a speech recognition system and method that can improve the recognition rate and utilization efficiency of speech recognition engines under high concurrency.
In a first aspect, an embodiment of the invention provides a speech recognition system, the system comprising: an engine resource scheduling module and at least two engine processing modules;
Wherein, the engine resource scheduling module is configured to select a target engine processing module according to the states of the at least two engine processing modules;
The target engine processing module among the at least two engine processing modules is configured to perform speech recognition on the received voice data.
Optionally, the system further comprises a front-end speech processing module configured to receive voice data sent by a user, preprocess the voice data, and send the preprocessed voice data to the target engine processing module.
Optionally, the engine resource scheduling module is further configured to:
After selecting the target engine processing module, send the address of the target engine processing module to the front-end speech processing module;
Correspondingly, the front-end speech processing module is specifically configured to: send the preprocessed voice data to the target engine processing module according to the address of the target engine processing module.
Optionally, the system further comprises an account verification module configured to store user information and engine model parameter information related to user accounts.
Optionally, the engine resource scheduling module is further configured to: obtain, from the account verification module, target engine model parameter information related to the current user account, and send it to the front-end speech processing module;
Correspondingly, the front-end speech processing module is further configured to send the target engine model parameter information to the target engine processing module according to the address of the target engine processing module;
Correspondingly, the target engine processing module is further configured to perform speech recognition on the received voice data according to the target engine model parameter information.
Optionally, the account verification module is further configured to verify the user account that sends the voice data.
Optionally, the system further comprises a system monitoring module configured to monitor the operating status of the engine resource scheduling module and of the at least two engine processing modules.
Optionally, the system further comprises a proxy server module configured to forward the voice data sent by the user to the front-end speech processing module.
In a second aspect, an embodiment of the invention further provides a speech recognition method, the method comprising:
An engine resource scheduling module selects a target engine processing module according to the states of at least two engine processing modules;
The target engine processing module receives voice data to be recognized and performs speech recognition.
Optionally, before the target engine processing module receives the voice data to be recognized, the method further comprises:
The engine resource scheduling module sends the address of the target engine processing module to a front-end speech processing module;
The front-end speech processing module sends the voice data to be recognized to the target engine processing module according to the address of the target engine processing module.
Optionally, before the engine resource scheduling module selects the target engine processing module according to the states of the at least two engine processing modules, the method further comprises:
The front-end speech processing module sends an engine acquisition request to the engine resource scheduling module, so that the engine resource scheduling module, in response to the engine acquisition request, selects the target engine processing module according to the states of the at least two engine processing modules.
Optionally, before the target engine processing module performs speech recognition, the method further comprises:
The engine resource scheduling module obtains, from an account verification module, target engine model parameter information related to the current user account and sends it to the front-end speech processing module, so that the front-end speech processing module sends the target engine model parameter information to the target engine processing module according to the address of the target engine processing module;
Correspondingly, the target engine processing module receiving the voice data to be recognized and performing speech recognition comprises:
The target engine processing module receives the voice data to be recognized and performs speech recognition according to the target engine model parameter information;
Wherein, the account verification module is configured to store user information and engine model parameter information related to user accounts.
Optionally, the front-end speech processing module sending the voice data to be recognized to the target engine processing module according to the address of the target engine processing module comprises:
The front-end speech processing module receives the voice data to be recognized sent by the user and preprocesses it;
The front-end speech processing module sends the preprocessed voice data to the target engine processing module according to the address of the target engine processing module.
Optionally, before the front-end speech receiving module sends the engine acquisition request to the engine resource scheduling module, the method further comprises:
The account verification module verifies the user account that sends the voice data to be processed.
Optionally, the method further comprises:
The system monitoring module monitors the operating status of the engine resource scheduling module and of the at least two engine processing modules.
Optionally, the front-end speech processing module receiving the voice data to be recognized sent by the user and preprocessing it comprises:
The front-end speech processing module receives, through a proxy server, the voice data to be recognized sent by the user and preprocesses it.
The speech recognition system disclosed in the embodiments of the invention comprises an engine resource scheduling module and at least two engine processing modules. The engine resource scheduling module is configured to select a target engine processing module according to the states of the at least two engine processing modules, and the target engine processing module among them is configured to perform speech recognition on the received voice data. Because the engine resource scheduling module can schedule the engine processing modules, an engine processing module that is in an idle state can be assigned to the current recognition task. This avoids the resource contention, and the resulting loss of recognition efficiency, that arise when multiple streams of voice data are processed by one engine processing module at the same time. The recognition rate and utilization efficiency of the speech recognition engines under high concurrency are thereby improved, and the complex design otherwise needed for an engine processing module to recognize multi-channel speech is eliminated.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of a speech recognition system in Embodiment one of the present invention.
Fig. 2 is a schematic structural diagram of a speech recognition system in Embodiment two of the present invention.
Fig. 3 is a schematic structural diagram of a speech recognition system in Embodiment three of the present invention.
Fig. 4 is a schematic structural diagram of another speech recognition system in Embodiment three of the present invention.
Fig. 5 is a flowchart of a speech recognition method in Embodiment four of the present invention.
Fig. 6 is a flowchart of another speech recognition method in Embodiment four of the present invention.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention rather than the entire structure.
Embodiment one
Fig. 1 is a schematic structural diagram of a speech recognition system provided by Embodiment one of the present invention. This embodiment is applicable to speech recognition scenarios. As shown in Fig. 1, the speech recognition system comprises an engine resource scheduling module (Scheduling Server) 110 and at least two engine processing modules (Engine Server) 120. The engine resource scheduling module 110 is configured to select a target engine processing module according to the states of the at least two engine processing modules 120. The state of an engine processing module may include its operating status and, preferably, whether it is available: for example, an engine processing module that is idle or not working can be regarded as available and can be selected as the target engine processing module. The target engine processing module among the at least two engine processing modules 120 is configured to perform speech recognition on the received voice data.
The engine processing modules 120 may provide automatic speech recognition functionality in order to perform speech recognition on voice data, for example recognizing voice information and converting it into text. Automatic speech recognition (ASR, Automatic Speech Recognition) is a technology that converts human speech into text. Because speech signals are diverse and complex, accurate recognition of voice data places high demands on the speech recognition system. The speech recognition system in the embodiments of the invention may contain multiple engine processing modules 120, and their number can be designed according to business needs: in general, the greater the business volume, the more engine processing modules 120 can be provided. The engine processing modules 120 can be deployed independently of one another, each running within the overall system cluster, which also makes them easy to manage and maintain. Specifically, the engine processing modules 120 may be multiple network nodes distributed in the speech system, each able to perform speech recognition on received voice data. Illustratively, 200 network nodes may be set up as 200 independent engine processing modules.
The engine resource scheduling module 110 can adjust, measure, analyze and use the resources of the engine processing modules 120 reasonably and effectively. Illustratively, if the first 100 of 200 engine processing modules are in a working state (for example, performing speech recognition), the engine resource scheduling module can schedule one or more of the remaining 100 idle, non-working engine processing modules to recognize the current voice data. The target engine processing module is the engine processing module that the engine resource scheduling module 110 selects from the engine processing modules 120 to recognize the current voice data. Specifically, the engine resource scheduling module 110 can decide whether to select an engine processing module 120 as the target according to whether that module is idle. The selected target engine processing module is therefore idle and can promptly recognize the current voice data to be recognized. This technical solution avoids the complexity of processing multiple channels of voice data at the same time, as well as the decline in recognition accuracy and efficiency caused by resource contention, and it improves both speech recognition efficiency and the utilization of engine processing module resources.
Specifically, the engine processing modules 120 can be divided by computing capability into high-level, mid-level and low-level engine processing modules. Illustratively, voice data larger than 1G can be processed by a high-level engine processing module, voice data larger than 500M and smaller than 1G by a mid-level engine processing module, and voice data smaller than 500M by a low-level engine processing module. Of course, engine processing modules of different levels can be set up according to business needs.
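Illustratively, the size-based grading above could be expressed as a simple threshold function. The sketch assumes that "1G" and "500M" denote binary units of bytes, and it assigns the boundary values (exactly 1G and exactly 500M), which the text leaves open, to the lower level; the function name `tier_for_size` is hypothetical.

```python
GIB = 1024 ** 3   # assumption: "1G" means 1 GiB
MIB = 1024 ** 2   # assumption: "500M" means 500 MiB


def tier_for_size(num_bytes: int) -> str:
    """Return the engine level assumed to handle voice data of this size."""
    if num_bytes > GIB:
        return "high"
    if num_bytes > 500 * MIB:
        return "mid"
    # Boundary values fall through to the lower level (an assumption).
    return "low"
```

The thresholds are plain constants so that they can be changed according to business needs.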
In addition, the engine processing modules 120 can also be classified according to the data formats that each is best at processing, for example by their ability to process audio formats such as WMA, MP3, WAV, RA and MIDI. The engine resource scheduling module can then select a suitable engine processing module according to the voice data to be recognized, so that the corresponding audio data is processed by a well-matched engine. This improves processing capacity, uses engine resources effectively and reasonably, and improves both the accuracy of speech recognition and the utilization of engine resources.
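Illustratively, format-based classification can be modeled as a lookup from each engine to the set of formats it handles well. The engine names and the capability table below are hypothetical and serve only to show the matching step.

```python
from typing import Dict, List, Set

# Hypothetical capability table: engine name -> formats it is suited to.
CAPABILITIES: Dict[str, Set[str]] = {
    "engine-a": {"wav", "mp3"},
    "engine-b": {"wma", "ra"},
    "engine-c": {"midi"},
}


def engines_for_format(audio_format: str,
                       capabilities: Dict[str, Set[str]]) -> List[str]:
    """Return the names of engines classified as suited to this format."""
    wanted = audio_format.lower()
    return sorted(name for name, formats in capabilities.items()
                  if wanted in formats)
```

A scheduler could intersect this candidate list with the set of idle engines before choosing a target.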
Optionally, the system further comprises a system monitoring module (Monitor Server) configured to monitor the operating status of the engine resource scheduling module and of the at least two engine processing modules. It is responsible for health checks on the running state of the entire speech recognition system: for example, when an engine processing module goes offline or stops working, the abnormal behavior can be detected in time so that maintenance and correction can be carried out, ensuring the normal operation of the speech recognition system.
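Illustratively, one common way to detect a module that has gone offline is a heartbeat timeout. The embodiment does not specify the monitoring mechanism, so the following is only a sketch of that assumed approach; the class `Monitor`, its methods and the timeout value are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
from typing import Dict, List


class Monitor:
    """Hypothetical stand-in for the system monitoring module."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, module_name: str, now: float) -> None:
        # Each monitored module reports in periodically.
        self.last_seen[module_name] = now

    def unhealthy(self, now: float) -> List[str]:
        # A module is flagged when its last heartbeat is older than the timeout.
        return sorted(name for name, seen in self.last_seen.items()
                      if now - seen > self.timeout_s)


monitor = Monitor(timeout_s=5.0)
monitor.heartbeat("engine-1", now=0.0)
monitor.heartbeat("engine-2", now=8.0)
monitor.heartbeat("scheduler", now=9.0)
flagged = monitor.unhealthy(now=10.0)
```

Modules returned by `unhealthy` would be candidates for the maintenance and correction mentioned above.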
In the technical solution of this embodiment, because the engine resource scheduling module schedules the engine processing modules, an engine processing module that is in an idle state can be assigned to perform speech recognition. This avoids the resource contention, and the resulting loss of recognition efficiency, that arise when multiple streams of voice data are processed by one engine processing module at the same time. The recognition rate and utilization efficiency of the speech recognition engines under high concurrency are thereby improved, and the complex design otherwise needed for an engine processing module to recognize multi-channel speech is eliminated.
Embodiment two
Fig. 2 shows a speech recognition system provided in Embodiment two of the present invention. On the basis of the above embodiment, the speech recognition system optionally further comprises a front-end speech processing module (Business Server) 130. As shown in Fig. 2, the front-end speech processing module 130 is configured to receive voice data sent by a user, preprocess the voice data, and send the preprocessed voice data to the target engine processing module. Because the engine processing modules 120 carry the work of speech recognition and computation, receiving a large amount of data in a short time may overload their interfaces and cause a server crash or network congestion. The front-end speech processing module 130 provided in this embodiment can therefore act as a data acquisition server: it preprocesses the received voice data before speech recognition and shares the computing load of the engine processing modules 120. Here, preprocessing means the processing applied to the received voice data before the target engine processing module performs speech recognition, for example identifying the format and size of the audio, filtering the audio, and encoding and decoding. This also reduces the computation required of the engine processing modules and further improves recognition efficiency.
Optionally, the engine resource scheduling module 110 is further configured to: after selecting the target engine processing module, send the address of the target engine processing module to the front-end speech processing module 130;
Correspondingly, the front-end speech processing module 130 is specifically configured to: send the preprocessed voice data to the target engine processing module according to the address of the target engine processing module.
Illustratively, upon receiving voice data sent by a user, the front-end speech processing module 130 can send a corresponding engine acquisition request to the engine resource scheduling module 110. The engine acquisition request can carry information such as the size and format of the audio data to be recognized, so that the engine resource scheduling module 110 can select the target engine processing module according to the capabilities and availability of the different engine processing modules 120. Specifically, the engine resource scheduling module 110 can send the address of the target engine processing module to the front-end speech processing module 130, and the front-end speech processing module 130 can send the preprocessed audio data to the target engine processing module according to that address. In this way engine resources are allocated and used more reasonably, improving speech recognition efficiency and the utilization of the engine processing modules.
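Illustratively, the request-and-address exchange between the front-end speech processing module and the engine resource scheduling module might look like the sketch below. All names (`EngineRequest`, `EngineInfo`, `acquire_engine`, `front_end_handle`), the addresses, and the trivial byte-stripping used as a stand-in for preprocessing are hypothetical; in a real deployment the calls would cross the network rather than run in one process.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional


@dataclass
class EngineRequest:
    size_bytes: int        # size of the audio to be recognized
    audio_format: str      # format of the audio to be recognized


@dataclass
class EngineInfo:
    address: str
    formats: FrozenSet[str]
    idle: bool = True


def acquire_engine(request: EngineRequest,
                   engines: List[EngineInfo]) -> Optional[str]:
    """Scheduler side: return the address of an idle, capable engine."""
    for engine in engines:
        if engine.idle and request.audio_format in engine.formats:
            engine.idle = False
            return engine.address
    return None


def front_end_handle(audio: bytes, audio_format: str,
                     engines: List[EngineInfo],
                     send: Callable[[str, bytes], None]) -> str:
    """Front-end side: preprocess, request an engine, forward the audio."""
    preprocessed = audio.strip(b"\x00")   # stand-in for real preprocessing
    request = EngineRequest(len(preprocessed), audio_format)
    address = acquire_engine(request, engines)
    if address is None:
        raise RuntimeError("no available engine processing module")
    send(address, preprocessed)
    return address


sent = []
pool = [
    EngineInfo("10.0.0.1:9000", frozenset({"wav"}), idle=False),
    EngineInfo("10.0.0.2:9000", frozenset({"wav", "mp3"})),
]
chosen = front_end_handle(b"\x00abc\x00", "wav", pool,
                          lambda addr, data: sent.append((addr, data)))
```

Passing `send` as a callable keeps the transport out of the sketch, so the same flow can be exercised without any networking.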
Optionally, the system further comprises a proxy server module configured to forward the voice data sent by the user to the front-end speech processing module. A proxy server (Proxy Server) can be used to connect the Internet and a local area network (Local Area Network). It is a special kind of network service that allows one network terminal (generally a client) to connect indirectly, through the proxy server module, to another network terminal (generally a server, such as the speech recognition system in the embodiments of the invention). Illustratively, the user's client first establishes a connection with the proxy server and then, according to the proxy protocol used by the proxy server, requests a connection to the destination server or requests a specified resource from it. With a proxy server, user authentication and accounting functions can be set up, the permissions of different users can be configured and managed, and a cache can be added to improve access speed.
Optionally, the speech system may further comprise a license server module (License Server). A license server follows the licensing standards proposed by the vendors of certain applications or servers, which define how an application and the license server work together to track software usage and licenses. In this way, the license server can grant the engine processing modules the ability to run on behalf of users.
In the technical solution of this embodiment, deploying the front-end speech processing module of the speech recognition system makes it possible to preprocess the voice data sent by users. This lightens the computing load of the target engine processing module, further increases the efficiency of speech recognition and the utilization of the engine modules, makes the fullest use of cluster resources, and saves hardware cost. The proxy server module makes it easier for users to connect to the speech recognition system and also helps developers manage user accounts. Meanwhile, the complex design otherwise needed for an engine processing module to recognize multi-channel speech is eliminated, and the distributed deployment reduces difficulty and complexity during later maintenance, tuning and optimization of the speech system, and online testing.
Embodiment three
Fig. 3 is a schematic structural diagram of a speech recognition system provided by Embodiment three of the present invention. On the basis of the above embodiments, the system optionally further comprises an account verification module (Profile Server) 140 configured to verify the user account that sends the voice data. An account is the identity under which a user performs speech recognition with the speech recognition system of the embodiments of the invention. Account verification (Auth, authenticate) means confirming the user's identity by some means in order to determine whether the user may perform speech recognition, for example by confirming whether the user is a registered user or member of the relevant website. Illustratively, the account can be verified by a verification code or a password, and the subsequent speech recognition process proceeds only after verification succeeds. Of course, this embodiment places no restriction on the order of the account verification step and the step in which the front-end speech processing module preprocesses the voice data sent by the user.
Optionally, the account verification module 140 is further configured to store user information and engine model parameter information related to user accounts. The user information may include the user's account, login times, the number of speech recognitions performed, other historical data, and personal information related to the user's use of the speech recognition system. The engine model parameter information related to a user account may be personalized parameter information, recorded against that account, that is relevant to recognizing the user's speech. Illustratively, a user's voice may be unusual, for example with a speaking frequency that differs from that of most people, so that an ordinary engine processing module may fail to recognize it. In that case a dedicated configuration file can be set up for that user's voice data, and the relevant engine model parameter information can be recorded in the configuration file to help the engine processing module recognize the speech accurately. The relevant engine model parameter information may also refer specifically to parameter information of an engine processing module associated with the user. Illustratively, the accounts of certain users can be associated with particular engine processing modules, so that those users perform speech recognition with specific engine processing modules; for example, user A may specifically use the 100th engine processing module.
Optionally, the engine resource scheduling module 110 is further configured to obtain, from the account verification module 140, target engine model parameter information related to the current user account and send it to the front-end speech processing module 130. Correspondingly, the front-end speech processing module 130 is further configured to send the target engine model parameter information to the target engine processing module according to the address of the target engine processing module. Correspondingly, the target engine processing module is further configured to perform speech recognition on the received voice data according to the target engine model parameter information.
After the engine resource scheduling module 110 obtains the target engine model parameter information related to the current user account from the account verification module 140, it can send that information to the front-end speech processing module 130. Correspondingly, the front-end speech processing module 130 can send the target engine model parameter information related to the current user account to the target engine processing module, so that the target engine processing module can recognize the user's voice data more accurately according to the target engine model parameter information.
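Illustratively, the per-account parameter lookup and its forwarding to the target engine could be sketched as below. The profile store, the parameter key `acoustic_model`, the default values, and the helper names `lookup_params` and `build_engine_message` are hypothetical; the embodiment does not specify what the engine model parameter information contains.

```python
from typing import Dict

# Hypothetical defaults used when an account has no personalized parameters.
DEFAULT_PARAMS: Dict[str, str] = {"acoustic_model": "general"}

# Hypothetical contents of the account verification module's store.
PROFILES: Dict[str, Dict[str, str]] = {
    "user-a": {"acoustic_model": "user-a-custom"},
}


def lookup_params(account: str,
                  profiles: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Scheduler side: fetch the parameters stored for this account."""
    params = dict(DEFAULT_PARAMS)
    params.update(profiles.get(account, {}))
    return params


def build_engine_message(address: str, audio: bytes,
                         params: Dict[str, str]) -> dict:
    """Front-end side: bundle audio and parameters for the target engine."""
    return {"address": address, "params": params, "audio": audio}


message = build_engine_message("engine-100", b"audio-bytes",
                               lookup_params("user-a", PROFILES))
fallback = lookup_params("user-b", PROFILES)
```

Sending the parameters together with the audio lets the target engine apply them to exactly the request they belong to.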
Illustratively, Fig. 4 is a schematic structural diagram of another speech system provided by an embodiment of the present invention. As shown in Fig. 4, the system comprises a proxy server module, a front-end speech processing module, engine processing modules, an engine resource scheduling module, an account verification module and a system monitoring module. Specifically, the voice data sent by a user can be forwarded to the front-end speech processing module by the proxy server module. After preprocessing the voice data, the front-end speech processing module can have the account verified by the account verification module. After verification succeeds, it sends an engine acquisition request to the engine resource scheduling module. The engine resource scheduling module can select a target engine processing module according to the availability of the engine processing modules, so that the front-end speech processing module can send the voice data to the target engine processing module for speech recognition. Optionally, the engine resource scheduling module can at the same time obtain the engine model parameter information related to the user account from the account verification module and return it to the front-end speech processing module. The front-end speech processing module can then send it to the target engine processing module together with the voice data to achieve more accurate speech recognition. Throughout this process, the system monitoring module can monitor the running state of every module at any time, ensuring timely maintenance and the normal operation of the speech recognition system.
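Illustratively, the order of steps described for Fig. 4 can be traced with a single function that records each stage. This is only a sketch of the sequence, assuming password-based verification and a first-idle scheduling policy; the function `handle_request`, its arguments and the trace labels are hypothetical, and recognition itself is replaced by a placeholder string.

```python
from typing import Dict, List, Optional


def handle_request(account: str, password: str, audio: bytes, *,
                   accounts: Dict[str, str],
                   engines: Dict[str, bool],
                   trace: List[str]) -> Optional[str]:
    """Walk one request through the stages described for Fig. 4."""
    trace.append("proxy")                 # proxy forwards data to the front end
    data = audio.strip()                  # stand-in for preprocessing
    trace.append("preprocess")
    if accounts.get(account) != password:
        trace.append("verify:failed")     # recognition does not proceed
        return None
    trace.append("verify:ok")
    idle = [name for name, busy in engines.items() if not busy]
    if not idle:
        trace.append("schedule:none")
        return None
    target = idle[0]
    engines[target] = True                # engine is busy while recognizing
    trace.append(f"schedule:{target}")
    text = f"recognized {len(data)} bytes on {target}"  # placeholder result
    engines[target] = False               # engine becomes idle again
    trace.append("recognize")
    return text


steps: List[str] = []
result = handle_request("alice", "secret", b" hello ",
                        accounts={"alice": "secret"},
                        engines={"engine-1": True, "engine-2": False},
                        trace=steps)
```

A failed verification ends the trace early, which mirrors the statement that recognition proceeds only after the account is verified.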
In the technical solution of this embodiment, deploying the account verification module of the speech recognition system makes it possible to verify the account of the user who sends the voice data. In addition, the engine model parameter information related to user accounts that is stored in the account verification module can help the engine processing modules recognize the voice data of individual users accurately. This improves speech recognition efficiency and the utilization of the engine modules, makes the fullest use of cluster resources, and saves hardware cost. Meanwhile, the complex design otherwise needed for an engine processing module to recognize multi-channel speech is eliminated, which reduces difficulty and complexity during later maintenance, tuning and optimization of the speech system, and online testing.
Embodiment four
Fig. 5 is a flowchart of a speech recognition method provided by Embodiment four of the present invention. The method is applicable to speech recognition scenarios and can be performed by a speech recognition system. As shown in Fig. 5, the method comprises:
S510, engine resource scheduler module handle mould according to the state selection target engine of at least two engine processing modules
Block.
Here, the state of an engine processing module may include its operating state, and specifically whether it is available. For example, when an engine processing module is idle or not running, it can be regarded as available and may be selected as the target engine processing module. Illustratively, whether an engine processing module is in the operating state can be determined by whether it is currently performing recognition processing on voice data. If it is not in the operating state, it is considered available and can be selected as the target engine processing module to process the current voice data to be recognized; otherwise, it is considered unavailable.
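The state-based selection in S510 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `EngineState`, `EngineProcessingModule`, and `select_target_engine`, the example addresses, and the first-idle-wins policy are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class EngineState(Enum):
    IDLE = "idle"        # not recognizing voice data, so considered available
    RUNNING = "running"  # currently recognizing voice data, so unavailable


@dataclass
class EngineProcessingModule:
    address: str  # hypothetical network address of the engine
    state: EngineState = EngineState.IDLE


def select_target_engine(
    engines: Iterable[EngineProcessingModule],
) -> Optional[EngineProcessingModule]:
    """Return the first available (idle) engine, or None if all are busy."""
    for engine in engines:
        if engine.state is EngineState.IDLE:
            return engine
    return None


engines = [
    EngineProcessingModule("10.0.0.1:9000", EngineState.RUNNING),
    EngineProcessingModule("10.0.0.2:9000", EngineState.IDLE),
]
target = select_target_engine(engines)
```

Here the busy engine is skipped and the idle one becomes the target. What should happen when no engine is idle (queueing, rejection, waiting) is not stated in the text, so the sketch simply returns `None`.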
S520, the target engine processing module receives the voice data to be recognized and performs speech recognition.
Optionally, before the target engine processing module receives the voice data to be recognized, the method further includes:
The engine resource scheduling module sends the address of the target engine processing module to the front-end speech processing module;
The front-end speech processing module sends the voice data to be recognized to the target engine processing module according to the address of the target engine processing module.
Optionally, before the engine resource scheduling module selects the target engine processing module according to the states of the at least two engine processing modules, the method further includes:
The front-end speech processing module sends an engine acquisition request to the engine resource scheduling module, so that the engine resource scheduling module, in response to the engine acquisition request, selects the target engine processing module according to the states of the at least two engine processing modules.
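The request-and-address exchange between the front-end speech processing module and the engine resource scheduling module can be sketched as below. This is a sketch under stated assumptions: the class names, the `send` callable standing in for the network transport, and the `release` step that marks an engine idle again are hypothetical and are not described in the text.

```python
from typing import Callable, Dict, Iterable, Optional


class EngineResourceScheduler:
    """Tracks which engine addresses are busy and hands out an idle one."""

    def __init__(self, addresses: Iterable[str]) -> None:
        self._busy: Dict[str, bool] = {addr: False for addr in addresses}

    def handle_engine_request(self) -> Optional[str]:
        # Answer an engine acquisition request with an idle engine's address.
        for addr, busy in self._busy.items():
            if not busy:
                self._busy[addr] = True  # treated as running until released
                return addr
        return None

    def release(self, addr: str) -> None:
        # Assumption: the engine becomes available again after recognition.
        self._busy[addr] = False


class FrontEndSpeechProcessor:
    def __init__(
        self,
        scheduler: EngineResourceScheduler,
        send: Callable[[str, bytes], str],
    ) -> None:
        self._scheduler = scheduler
        self._send = send  # hypothetical transport: (address, audio) -> text

    def recognize(self, audio: bytes) -> Optional[str]:
        addr = self._scheduler.handle_engine_request()
        if addr is None:
            return None  # no idle engine; the text does not specify a policy
        try:
            return self._send(addr, audio)  # forward audio by engine address
        finally:
            self._scheduler.release(addr)


def fake_send(addr: str, audio: bytes) -> str:
    return f"{addr} recognized {len(audio)} bytes"


scheduler = EngineResourceScheduler(["engine-a", "engine-b"])
front_end = FrontEndSpeechProcessor(scheduler, fake_send)
result = front_end.recognize(b"\x00\x01")
```

Because the front end only ever learns one engine address per request, each engine handles a single voice stream at a time, which is the property the description relies on to avoid resource contention.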
Optionally, before the target engine processing module performs speech recognition, the method further includes:
The engine resource scheduling module obtains target engine model parameter information related to the current user account from the configuration file setting module and sends it to the front-end speech processing module, so that the front-end speech processing module sends the target engine model parameter information to the target engine processing module according to the address of the target engine processing module. Correspondingly, the target engine processing module receiving the voice data to be recognized and performing speech recognition includes:
The target engine processing module receives the voice data to be recognized and performs speech recognition according to the target engine model parameter information;
Here, the account verification module is used to store user information and engine model parameter information related to user accounts.
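How per-account engine model parameters might travel with the engine address can be sketched as follows. The parameter key `acoustic_model`, the account name, and the functions `build_engine_reply` and `recognize` are illustrative assumptions; the text does not say what the engine model parameter information contains.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class AccountVerificationModule:
    # account -> engine model parameters (keys are illustrative only)
    profiles: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def get_profile(self, account: str) -> Optional[Dict[str, str]]:
        return self.profiles.get(account)


def build_engine_reply(
    account: str,
    engine_address: str,
    accounts: AccountVerificationModule,
) -> Tuple[str, Dict[str, str]]:
    """What the scheduler hands to the front end: address plus parameters."""
    params = accounts.get_profile(account) or {}
    return engine_address, params


def recognize(audio: bytes, params: Dict[str, str]) -> str:
    # Stand-in for the engine; it only shows that the parameters are used.
    model = params.get("acoustic_model", "default")
    return f"<{len(audio)} bytes decoded with {model}>"


accounts = AccountVerificationModule({"alice": {"acoustic_model": "alice-adapted"}})
address, params = build_engine_reply("alice", "engine-a", accounts)
text = recognize(b"abc", params)
```

An account without stored parameters falls back to an empty dictionary here, so the stand-in engine uses its default model; that fallback is a choice made for the sketch.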
Optionally, the front-end speech processing module sending the voice data to be recognized to the target engine processing module according to the address of the target engine processing module includes:
The front-end speech processing module receives the voice data to be recognized sent by the user and preprocesses it;
The front-end speech processing module sends the preprocessed voice data to the target engine processing module according to the address of the target engine processing module.
Optionally, before the front-end speech receiving module sends the engine acquisition request to the engine resource scheduling module, the method further includes: the account verification module verifies the user account that sent the voice data to be processed. Account verification and the front-end speech processing module's preprocessing of the voice data may be performed in either order: the voice data may be preprocessed first, or the user's account may be verified first and the voice data preprocessed after verification succeeds.
Optionally, the method further includes: the system monitoring module monitors the operating states of the engine resource scheduling module and the at least two engine processing modules.
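One way such monitoring could look is a periodic pass over per-module health probes. The sketch below is only an assumption about how the operating state might be checked; the function `check_modules`, the probe callables, and the module names are hypothetical.

```python
from typing import Callable, Dict, List


def check_modules(probes: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run each module's health probe and return the names that look unhealthy."""
    unhealthy = []
    for name, probe in probes.items():
        try:
            ok = probe()
        except Exception:
            ok = False  # a probe that raises is treated as a failed module
        if not ok:
            unhealthy.append(name)
    return unhealthy


def broken_probe() -> bool:
    raise ConnectionError("engine-b unreachable")


probes = {
    "scheduler": lambda: True,
    "engine-a": lambda: True,
    "engine-b": broken_probe,
}
down = check_modules(probes)
```

The returned list could then drive alerts or maintenance; how the results are acted on is outside what the text describes.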
Optionally, the front-end speech processing module receiving the voice data to be recognized sent by the user and preprocessing it includes:
The front-end speech processing module receives, through a proxy server, the voice data to be recognized sent by the user and preprocesses it. In this embodiment, for the functions, workflows, and related explanations of the modules, refer to the description of the speech recognition system in any embodiment of the present invention; details are not repeated here.
Illustratively, Fig. 6 is a flowchart of another speech recognition method provided by Embodiment 4 of the present invention. The speech recognition method of the present invention is illustrated using the process from step (1) to step (10) in Fig. 6. Step (1) is executed first: the user sends voice data (send audio) to the front-end speech processing module (Business Server). Step (2) follows: the front-end speech processing module sends an account verification request (auth) to the account verification module (Profile Server). In step (3), if verification succeeds, the account verification module sends a verification-passed message (ok) to the front-end speech processing module. Step (4) is then executed: the front-end speech processing module sends an engine acquisition request (get engine address) to the engine resource scheduling module (Scheduling Server). The engine resource scheduling module can select the target engine processing module according to the availability of the engine processing modules. Optionally, step (5) may be executed at the same time: the engine resource scheduling module sends the account verification module a request for the engine model parameter information related to the user account (get profile). In step (6), the account verification module returns the relevant engine model parameter information to the engine resource scheduling module (return profile). Step (7) can then be executed: the engine resource scheduling module sends the address of the target engine processing module and the account-related engine model parameter information to the front-end speech processing module (return engine address and profile info). In step (8), the front-end speech processing module sends the voice data to be recognized to the target engine processing module (Asr-Engine) according to the address of the target engine processing module. Optionally, this voice data may be voice data that the front-end speech processing module has already preprocessed. After the target engine processing module finishes recognizing the voice data, step (9) can be executed: the speech recognition result is returned to the front-end speech processing module (return asr result). Finally, in step (10), the front-end speech processing module sends the recognized text (asr text) to the user.
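The ten steps of Fig. 6 can be traced in a single-process sketch. The function names mirror the server labels in the figure (Business Server, Profile Server, Scheduling Server, Asr-Engine), but the in-memory dictionaries, the parameter key `acoustic_model`, and the stand-in recognition result are assumptions made purely for illustration; a real deployment would place each role on its own server.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory state standing in for the separate servers.
ACCOUNTS: Dict[str, Dict[str, str]] = {"alice": {"acoustic_model": "alice-adapted"}}
ENGINE_BUSY: Dict[str, bool] = {"engine-a": True, "engine-b": False}


def profile_server_auth(account: str) -> bool:
    # Steps (2)-(3): account verification request and "ok" reply.
    return account in ACCOUNTS


def scheduling_server_get_engine(account: str) -> Tuple[Optional[str], Dict[str, str]]:
    # Step (4): pick an available engine by state.
    address = next((a for a, busy in ENGINE_BUSY.items() if not busy), None)
    # Steps (5)-(6): fetch the account's engine model parameters.
    profile = ACCOUNTS.get(account, {})
    # Step (7): return the engine address together with the profile info.
    return address, profile


def asr_engine_recognize(address: str, audio: bytes, profile: Dict[str, str]) -> str:
    # Steps (8)-(9): stand-in for recognition on the chosen engine.
    model = profile.get("acoustic_model", "default")
    return f"[{address}/{model}] {len(audio)} bytes"


def business_server_handle(account: str, audio: bytes) -> Optional[str]:
    # Step (1): audio arrives from the user; step (10): text goes back.
    if not profile_server_auth(account):
        return None
    address, profile = scheduling_server_get_engine(account)
    if address is None:
        return None
    return asr_engine_recognize(address, audio, profile)


text = business_server_handle("alice", b"\x00" * 4)
```

In this run `engine-a` is marked busy, so the request is routed to `engine-b` with the parameters stored for the account `alice`; an unknown account stops at the verification step.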
The technical solution of this embodiment, by deploying the modules of the speech recognition system, allows the engine processing modules to concentrate as far as possible on the speech recognition rate, with engineering concerns removed from the engine processing modules. By making full use of the other modules in the system, speech recognition efficiency and the utilization of the engine modules can be improved, cluster resources can be used to the fullest, and hardware cost can be saved. At the same time, difficulty and complexity are reduced during later maintenance, tuning and optimization of the speech system, and online testing.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments, and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to those embodiments and may include other equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the scope of the appended claims.
Claims (12)
1. A speech recognition system, characterized in that the system comprises: an engine resource scheduling module and at least two engine processing modules;
wherein the engine resource scheduling module is configured to select a target engine processing module according to the states of the at least two engine processing modules;
the target engine processing module among the at least two engine processing modules is configured to perform speech recognition on received voice data.
2. The system according to claim 1, characterized in that the system further comprises: a front-end speech processing module, configured to receive voice data sent by a user, preprocess the voice data, and send the preprocessed voice data to the target engine processing module.
3. The system according to claim 2, characterized in that the engine resource scheduling module is further configured to:
after selecting the target engine processing module, send the address of the target engine processing module to the front-end speech processing module;
correspondingly, the front-end speech processing module is specifically configured to: send the preprocessed voice data to the target engine processing module according to the address of the target engine processing module.
4. The system according to claim 3, characterized in that the system further comprises: an account verification module, configured to store user information and engine model parameter information related to user accounts.
5. The system according to claim 4, characterized in that the engine resource scheduling module is further configured to: obtain target engine model parameter information related to the current user account from the account verification module, and send it to the front-end speech processing module;
correspondingly, the front-end speech processing module is further configured to send the target engine model parameter information to the target engine processing module according to the address of the target engine processing module;
correspondingly, the target engine processing module is further configured to perform speech recognition on the received voice data according to the target engine model parameter information.
6. The system according to claim 1, characterized in that the account verification module is further configured to verify the account of the user who sends the voice data.
7. A speech recognition method, characterized in that the method comprises:
an engine resource scheduling module selecting a target engine processing module according to the states of at least two engine processing modules;
the target engine processing module receiving voice data to be recognized and performing speech recognition.
8. The method according to claim 7, characterized in that, before the target engine processing module receives the voice data to be recognized, the method further comprises:
the engine resource scheduling module sending the address of the target engine processing module to a front-end speech processing module;
the front-end speech processing module sending the voice data to be recognized to the target engine processing module according to the address of the target engine processing module.
9. The method according to claim 7, characterized in that, before the engine resource scheduling module selects the target engine processing module according to the states of the at least two engine processing modules, the method further comprises:
a front-end speech processing module sending an engine acquisition request to the engine resource scheduling module, so that the engine resource scheduling module, in response to the engine acquisition request, selects the target engine processing module according to the states of the at least two engine processing modules.
10. The method according to claim 8, characterized in that, before the target engine processing module performs speech recognition, the method further comprises:
the engine resource scheduling module obtaining target engine model parameter information related to the current user account from an account verification module and sending it to the front-end speech processing module, so that the front-end speech processing module sends the target engine model parameter information to the target engine processing module according to the address of the target engine processing module;
correspondingly, the target engine processing module receiving the voice data to be recognized and performing speech recognition comprises:
the target engine processing module receiving the voice data to be recognized and performing speech recognition according to the target engine model parameter information;
wherein the account verification module is configured to store user information and engine model parameter information related to user accounts.
11. The method according to claim 8, characterized in that the front-end speech processing module sending the voice data to be recognized to the target engine processing module according to the address of the target engine processing module comprises:
the front-end speech processing module receiving the voice data to be recognized sent by the user and preprocessing it;
the front-end speech processing module sending the preprocessed voice data to the target engine processing module according to the address of the target engine processing module.
12. The method according to claim 9, characterized in that, before the front-end speech receiving module sends the engine acquisition request to the engine resource scheduling module, the method further comprises:
an account verification module verifying the user account that sent the voice data to be processed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810758940.5A CN109036431A (en) | 2018-07-11 | 2018-07-11 | A kind of speech recognition system and method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109036431A | 2018-12-18 |
Family
ID=64641126
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810758940.5A Pending CN109036431A (en) | 2018-07-11 | 2018-07-11 | A kind of speech recognition system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109036431A (en) |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101903946A (en) * | 2007-12-21 | 2010-12-01 | Nvoq股份有限公司 | Distributed dictation/transcription system |
CN101677329A (en) * | 2008-09-18 | 2010-03-24 | 中兴通讯股份有限公司 | Comprehensive voice resource platform proxy server and its data processing method |
JP2014010458A (en) * | 2012-06-27 | 2014-01-20 | Naver Corp | Music relevant information providing device and method by music recognition in television system, and computer readable recording medium |
CN104380258A (en) * | 2012-07-18 | 2015-02-25 | 英特尔公司 | Performing scheduling operations for graphics hardware |
CN103870411A (en) * | 2012-12-11 | 2014-06-18 | 三星电子株式会社 | Memory controller and memory system including the same |
CN103458056A (en) * | 2013-09-24 | 2013-12-18 | 贵阳世纪恒通科技有限公司 | Speech intention judging method based on automatic classification technology for automatic outbound system |
CN104009991A (en) * | 2014-05-28 | 2014-08-27 | 广州华多网络科技有限公司 | Audio communication system and method |
CN105245607A (en) * | 2015-10-23 | 2016-01-13 | 中国联合网络通信集团有限公司 | Proxy server dynamic automatic selection method and system |
CN105528737A (en) * | 2015-12-15 | 2016-04-27 | 国网北京市电力公司 | Swap station data processing system, method and device |
CN105654954A (en) * | 2016-04-06 | 2016-06-08 | 普强信息技术(北京)有限公司 | Cloud voice recognition system and method |
CN107068145A (en) * | 2016-12-30 | 2017-08-18 | 中南大学 | Speech evaluating method and system |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110289016A (en) * | 2019-06-20 | 2019-09-27 | 深圳追一科技有限公司 | A kind of voice quality detecting method, device and electronic equipment based on actual conversation |
CN111460093A (en) * | 2020-03-16 | 2020-07-28 | 云知声智能科技股份有限公司 | Method, device and system for configuring multiple engines based on single voice input |
CN111862972A (en) * | 2020-07-08 | 2020-10-30 | 北京梧桐车联科技有限责任公司 | Voice interaction service method, device, equipment and storage medium |
CN111862972B (en) * | 2020-07-08 | 2023-11-14 | 北京梧桐车联科技有限责任公司 | Voice interaction service method, device, equipment and storage medium |
CN113093596A (en) * | 2021-03-29 | 2021-07-09 | 北京金山云网络技术有限公司 | Control instruction processing method and device |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181218 |