CN109036431A - A kind of speech recognition system and method - Google Patents


Info

Publication number
CN109036431A
Authority
CN
China
Prior art keywords
processing module
engine
module
targeting
preposition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810758940.5A
Other languages
Chinese (zh)
Inventor
余启洪
柳青
宋征轩
张海龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Intelligent Housekeeper Technology Co Ltd
Original Assignee
Beijing Intelligent Housekeeper Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Intelligent Housekeeper Technology Co Ltd
Priority to CN201810758940.5A
Publication of CN109036431A
Legal status: Pending


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/34 Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services

Abstract

An embodiment of the invention discloses a speech recognition system and method. The system comprises an engine resource scheduling module and at least two engine processing modules. The engine resource scheduling module selects a target engine processing module according to the states of the at least two engine processing modules, and the target engine processing module among them performs speech recognition on the received voice data. Because the engine resource scheduling module schedules the engine processing modules, an engine processing module that is idle can be assigned to perform speech recognition. This avoids the resource contention, and the resulting drop in recognition efficiency, that arise when a single engine processing module handles multiple streams of voice data at the same time. It improves the recognition rate and utilization of the speech recognition engines under high concurrency, and removes the need for a complex design in which one engine processing module recognizes multiple voice streams.

Description

Speech recognition system and method
Technical field
The present invention relates to the field of Internet technology, and in particular to a speech recognition system and method.
Background
Speech recognition technology, also known as automatic speech recognition (ASR, Automatic Speech Recognition), converts the lexical content of human speech into computer-readable input. Speech recognition is now one of the more widely used technologies in the field of artificial intelligence.
In the prior art, a speech recognition system is built as a single whole that incorporates all application logic: it must not only recognize voice data but also handle concurrent access by multiple data streams and the dynamic configuration of various engine parameters and models. The benefit is that such a system is easy to deploy and structurally simple. However, because the system is so large, it is inconvenient to maintain and upgrade. Moreover, a speech recognition engine can recognize only one stream of voice data at a time. Using multithreading to make a recognition engine process multiple audio streams simultaneously is very complex, error-prone, and difficult to implement, so both the reliability and the efficiency of speech processing are low.
Summary of the invention
The present invention provides a speech recognition system and method that can improve the recognition rate and utilization of speech recognition engines under high concurrency.
In a first aspect, an embodiment of the invention provides a speech recognition system, the system comprising: an engine resource scheduling module and at least two engine processing modules;
Wherein the engine resource scheduling module is configured to select a target engine processing module according to the states of the at least two engine processing modules;
The target engine processing module among the at least two engine processing modules is configured to perform speech recognition on the received voice data.
Optionally, the system further comprises a front-end speech processing module, configured to receive the voice data sent by a user, preprocess the voice data, and send the preprocessed voice data to the target engine processing module.
Optionally, the engine resource scheduling module is further configured to:
After selecting the target engine processing module, send the address of the target engine processing module to the front-end speech processing module;
Correspondingly, the front-end speech processing module is specifically configured to send the preprocessed voice data to the target engine processing module according to the address of the target engine processing module.
Optionally, the system further comprises an account verification module, configured to store user information and engine model parameter information related to user accounts.
Optionally, the engine resource scheduling module is further configured to obtain, from the account verification module, target engine model parameter information related to the current user account, and to send it to the front-end speech processing module;
Correspondingly, the front-end speech processing module is further configured to send the target engine model parameter information to the target engine processing module according to the address of the target engine processing module;
Correspondingly, the target engine processing module is further configured to perform speech recognition on the received voice data according to the target engine model parameter information.
Optionally, the account verification module is further configured to verify the user account that sends the voice data.
Optionally, the system further comprises a system monitoring module, configured to monitor the running state of the engine resource scheduling module and of the at least two engine processing modules.
Optionally, the system further comprises a proxy server module, configured to forward the voice data sent by the user to the front-end speech processing module.
In a second aspect, an embodiment of the invention further provides a speech recognition method, the method comprising:
An engine resource scheduling module selects a target engine processing module according to the states of at least two engine processing modules;
The target engine processing module receives the voice data to be recognized and performs speech recognition.
Optionally, before the target engine processing module receives the voice data to be recognized, the method further comprises:
The engine resource scheduling module sends the address of the target engine processing module to a front-end speech processing module;
The front-end speech processing module sends the voice data to be recognized to the target engine processing module according to the address of the target engine processing module.
Optionally, before the engine resource scheduling module selects the target engine processing module according to the states of the at least two engine processing modules, the method further comprises:
The front-end speech processing module sends an engine acquisition request to the engine resource scheduling module, so that the engine resource scheduling module, in response to the engine acquisition request, selects the target engine processing module according to the states of the at least two engine processing modules.
Optionally, before the target engine processing module performs speech recognition, the method further comprises:
The engine resource scheduling module obtains, from an account verification module, target engine model parameter information related to the current user account and sends it to the front-end speech processing module, so that the front-end speech processing module sends the target engine model parameter information to the target engine processing module according to the address of the target engine processing module;
Correspondingly, the step in which the target engine processing module receives the voice data to be recognized and performs speech recognition comprises:
The target engine processing module receives the voice data to be recognized and performs speech recognition according to the target engine model parameter information;
Wherein the account verification module is configured to store user information and engine model parameter information related to user accounts.
Optionally, the step in which the front-end speech processing module sends the voice data to be recognized to the target engine processing module according to the address of the target engine processing module comprises:
The front-end speech processing module receives the voice data to be recognized sent by the user and preprocesses it;
The front-end speech processing module sends the preprocessed voice data to the target engine processing module according to the address of the target engine processing module.
Optionally, before the front-end speech processing module sends the engine acquisition request to the engine resource scheduling module, the method further comprises:
The account verification module verifies the user account that sends the voice data to be processed.
Optionally, the method further comprises:
A system monitoring module monitors the running state of the engine resource scheduling module and of the at least two engine processing modules.
Optionally, the step in which the front-end speech processing module receives the voice data to be recognized sent by the user and preprocesses it comprises:
The front-end speech processing module receives, through a proxy server, the voice data to be recognized sent by the user and preprocesses it.
A speech recognition system disclosed by the embodiments of the present invention comprises an engine resource scheduling module and at least two engine processing modules. The engine resource scheduling module selects a target engine processing module according to the states of the at least two engine processing modules, and the target engine processing module among them performs speech recognition on the received voice data. Because the engine resource scheduling module can schedule the engine processing modules, an idle engine processing module can be assigned to the current speech recognition task. This avoids the resource contention, and the resulting drop in recognition efficiency, that arise when a single engine processing module handles multiple streams of voice data at the same time. It improves the recognition rate and utilization of the speech recognition engines under high concurrency, and removes the need for a complex design in which one engine processing module recognizes multiple voice streams.
Brief description of the drawings
Fig. 1 is a schematic structural diagram of a speech recognition system in Embodiment one of the present invention.
Fig. 2 is a schematic structural diagram of a speech recognition system in Embodiment two of the present invention.
Fig. 3 is a schematic structural diagram of a speech recognition system in Embodiment three of the present invention.
Fig. 4 is a schematic structural diagram of another speech recognition system in Embodiment three of the present invention.
Fig. 5 is a flow chart of a speech recognition method in Embodiment four of the present invention.
Fig. 6 is a flow chart of another speech recognition method in Embodiment four of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention rather than the entire structure.
Embodiment one
Fig. 1 is a schematic structural diagram of a speech recognition system provided by Embodiment one of the present invention. This embodiment is applicable to speech recognition scenarios. As shown in Fig. 1, the speech recognition system comprises an engine resource scheduling module (Scheduling Server) 110 and at least two engine processing modules (Engine Server) 120. The engine resource scheduling module 110 is configured to select a target engine processing module according to the states of the at least two engine processing modules 120. The state of an engine processing module may include its running state and, preferably, its availability: for example, when an engine processing module is idle or not working, it can be regarded as available and may be selected as the target engine processing module. The target engine processing module among the at least two engine processing modules 120 is configured to perform speech recognition on the received voice data.
The engine processing modules 120 provide automatic speech recognition so that they can recognize voice data, for example by recognizing voice information and converting it into text. Automatic speech recognition (ASR, Automatic Speech Recognition) is a technology that converts human speech into text. Because voice signals are diverse and complex, recognizing voice data accurately places high demands on the speech recognition system. The speech recognition system in the embodiments of the present invention may contain multiple engine processing modules 120, and their number can be chosen according to business needs. In general, the larger the business volume, the more engine processing modules 120 can be provisioned. The engine processing modules 120 can be deployed independently of one another, each running within the overall system cluster, which also makes them easy to manage and maintain. Specifically, the engine processing modules 120 may be multiple network nodes distributed across the speech system, each able to perform speech recognition on the voice data it receives. For example, 200 network nodes can be set up as 200 independent engine processing modules.
The engine resource scheduling module 110 can reasonably and effectively regulate, measure, analyze, and allocate the resources of the engine processing modules 120. For example, if the first 100 of 200 engine processing modules are working (for example, performing speech recognition), the engine resource scheduling module can dispatch one or more of the remaining 100 idle engine processing modules to recognize the current voice data. The target engine processing module is the engine processing module that the engine resource scheduling module 110 selects from the engine processing modules 120 to recognize the current voice data. Specifically, the engine resource scheduling module 110 decides whether to select an engine processing module 120 as the target according to whether that module is idle. The selected target engine processing module is therefore idle and can promptly recognize the current voice data to be recognized. This technical solution avoids the complexity of processing multiple streams of voice data at the same time, as well as the drop in recognition accuracy and efficiency caused by resource contention, and it improves both speech recognition efficiency and the utilization of engine processing module resources.
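The idle-engine selection described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation disclosed in the patent: the names `EngineState`, `Engine`, `Scheduler`, `select_target`, and `release` are hypothetical, and the first-idle-wins policy is an assumption, since the text only requires that the selected engine be idle.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class EngineState(Enum):
    IDLE = "idle"
    BUSY = "busy"


@dataclass
class Engine:
    """Hypothetical record for one engine processing module."""
    address: str
    state: EngineState = EngineState.IDLE


class Scheduler:
    """Hypothetical engine resource scheduling module."""

    def __init__(self, engines: List[Engine]) -> None:
        # The described system has at least two engine processing modules.
        if len(engines) < 2:
            raise ValueError("at least two engine processing modules are required")
        self.engines = engines

    def select_target(self) -> Optional[Engine]:
        """Return the first idle engine, or None if every engine is busy."""
        for engine in self.engines:
            if engine.state is EngineState.IDLE:
                # Assumption: the scheduler marks the engine busy on selection,
                # so that one engine never serves two voice streams at once.
                engine.state = EngineState.BUSY
                return engine
        return None

    def release(self, engine: Engine) -> None:
        """Mark an engine idle again once recognition has finished."""
        engine.state = EngineState.IDLE
```

Marking the engine busy inside `select_target` is one way to keep each engine on a single voice stream; a real deployment would also need locking or an atomic update if several requests reach the scheduler concurrently.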
Specifically, the engine processing modules 120 can be divided, according to their computing capability, into high-level, intermediate-level, and low-level engine processing modules. For example, voice data larger than 1G can be handled by a high-level engine processing module, voice data larger than 500M but smaller than 1G by an intermediate-level engine processing module, and voice data smaller than 500M by a low-level engine processing module. Of course, engine processing modules of other levels can be set up according to business needs.
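The size thresholds in this example map directly onto a small lookup function. The sketch below is illustrative only: the function name `tier_for_size` is hypothetical, 1G and 500M are interpreted as binary units (an assumption), and the handling of sizes that fall exactly on a threshold is also an assumption, because the text does not say which level handles them.

```python
MIB = 1024 ** 2  # assumption: "M" is read as mebibytes
GIB = 1024 ** 3  # assumption: "G" is read as gibibytes


def tier_for_size(size_bytes: int) -> str:
    """Map the size of the voice data to an engine level.

    Follows the example thresholds in the text. Sizes exactly equal to a
    threshold are assigned to the lower level, which is an assumption.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes > GIB:
        return "high"
    if size_bytes > 500 * MIB:
        return "intermediate"
    return "low"
```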
In addition, the engine processing modules 120 can be classified according to the data formats each of them handles best, for example according to their ability to process audio formats such as WMA, MP3, WAV, RA, and MIDI. The engine resource scheduling module can then select a suitable engine processing module, based on the voice data to be recognized, to process the corresponding audio data. This increases processing capacity, uses engine resources effectively and reasonably, and improves both the accuracy of speech recognition and the utilization of engine resources.
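Format-based selection can be sketched as a filter over the engines that are both idle and able to handle the incoming format. The names `EngineInfo` and `pick_engine` are hypothetical, and the per-engine format sets are illustrative values rather than anything specified in the text.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class EngineInfo:
    """Hypothetical scheduler-side view of one engine processing module."""
    address: str
    formats: FrozenSet[str]  # audio formats this engine handles well
    idle: bool


def pick_engine(engines: Iterable[EngineInfo], audio_format: str) -> Optional[EngineInfo]:
    """Return the first idle engine that supports the format, else None."""
    wanted = audio_format.lower()
    for engine in engines:
        if engine.idle and wanted in engine.formats:
            return engine
    return None
```

For example, with one busy engine that supports MP3 and one idle engine that supports only WAV and MIDI, a request for `"mp3"` returns `None`, while a request for `"WAV"` returns the second engine.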
Optionally, the system further comprises a system monitoring module (Monitor Server), configured to monitor the running state of the engine resource scheduling module and of the at least two engine processing modules, and thus to carry out health checks on the running state of the entire speech recognition system. For example, if an engine processing module goes offline or stops working, the abnormal behavior can be detected promptly so that maintenance and corrective action can be taken, which keeps the speech recognition system working normally.
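One common way to detect a module that has gone offline is a heartbeat timeout. The text does not say how the monitoring is done, so the heartbeat mechanism, the function name `find_unhealthy`, and the 10-second default in the sketch below are all assumptions made for illustration.

```python
from typing import Dict, List


def find_unhealthy(last_heartbeat: Dict[str, float], now: float,
                   timeout_s: float = 10.0) -> List[str]:
    """Return addresses whose last heartbeat is older than timeout_s.

    last_heartbeat maps a module address to the timestamp (in seconds) of
    the most recent heartbeat received from that module.
    """
    return sorted(
        address
        for address, seen_at in last_heartbeat.items()
        if now - seen_at > timeout_s
    )
```

A monitoring loop could call this function periodically with `time.monotonic()` as `now` and raise an alert for every address it returns.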
In the technical solution of this embodiment, because the engine resource scheduling module schedules the engine processing modules, an idle engine processing module can be assigned to perform speech recognition. This avoids the resource contention, and the resulting drop in recognition efficiency, that arise when a single engine processing module handles multiple streams of voice data at the same time. It improves the recognition rate and utilization of the speech recognition engines under high concurrency, and removes the need for a complex design in which one engine processing module recognizes multiple voice streams.
Embodiment two
Fig. 2 shows a speech recognition system provided in Embodiment two of the present invention. On the basis of the above embodiment, the speech recognition system optionally further comprises a front-end speech processing module (Business Server) 130. As shown in Fig. 2, the front-end speech processing module 130 is configured to receive the voice data sent by a user, preprocess the voice data, and send the preprocessed voice data to the target engine processing module. Because the engine processing modules 120 carry out speech recognition and computation, receiving a large amount of data within a short time may overload their interfaces and cause a server crash or network congestion. The front-end speech processing module 130 provided in this embodiment can therefore act as a data collection server: it preprocesses the received voice data before speech recognition and so shares the computing load of the engine processing modules 120. Preprocessing is the processing applied before the target engine processing module performs speech recognition on the received voice data. It includes, for example, identifying the format and size of the audio, filtering the audio, and encoding or decoding it. This also reduces the amount of computation the engine processing modules must perform and further improves recognition efficiency.
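The format and size identification mentioned as part of preprocessing can be illustrated by inspecting the leading bytes of the audio data. This is a simplified sketch, not the preprocessing disclosed in the patent: `AudioInfo` and `inspect_audio` are hypothetical names, only three container signatures are checked, and MP3 data without a leading ID3 tag is reported as unknown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioInfo:
    """Hypothetical result of the identification step."""
    audio_format: str
    size_bytes: int


def inspect_audio(data: bytes) -> AudioInfo:
    """Identify the audio format from well-known leading bytes.

    Simplification: MP3 is detected only when the data starts with an
    ID3 tag, and any other format is reported as "unknown".
    """
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        audio_format = "wav"   # RIFF container carrying WAVE data
    elif data[:4] == b"MThd":
        audio_format = "midi"  # Standard MIDI File header chunk
    elif data[:3] == b"ID3":
        audio_format = "mp3"   # ID3v2 tag at the start of the file
    else:
        audio_format = "unknown"
    return AudioInfo(audio_format, len(data))
```

The resulting format and size are exactly the kind of information a front-end module could pass along when asking the scheduler for an engine.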
Optionally, the engine resource scheduling module 110 is further configured to: after selecting the target engine processing module, send the address of the target engine processing module to the front-end speech processing module 130;
Correspondingly, the front-end speech processing module 130 is specifically configured to send the preprocessed voice data to the target engine processing module according to the address of the target engine processing module.
For example, on receiving voice data sent by a user, the front-end speech processing module 130 can send a corresponding engine acquisition request to the engine resource scheduling module 110. The engine acquisition request can carry information such as the size and format of the audio data to be recognized, so that the engine resource scheduling module 110 can select the target engine processing module according to the capabilities and availability of the different engine processing modules 120. Specifically, the engine resource scheduling module 110 sends the address of the target engine processing module to the front-end speech processing module 130, and the front-end speech processing module 130 sends the preprocessed audio data to the target engine processing module at that address. In this way engine resources are allocated and used more reasonably, which improves speech recognition efficiency and the utilization of the engine processing modules.
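The request-and-dispatch exchange described above can be sketched as follows. The names `EngineRequest` and `dispatch` are hypothetical, and the two callables stand in for the network calls to the scheduler and to the engine, whose actual protocol the text does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class EngineRequest:
    """Hypothetical engine acquisition request carrying size and format."""
    size_bytes: int
    audio_format: str


def dispatch(audio: bytes, audio_format: str,
             acquire: Callable[[EngineRequest], Optional[str]],
             send: Callable[[str, bytes], str]) -> str:
    """Ask the scheduler for an engine address, then send the audio there.

    acquire stands in for the call to the engine resource scheduling module
    and returns the target engine address, or None if no engine is available.
    send stands in for the transfer to the engine and returns the text.
    """
    request = EngineRequest(size_bytes=len(audio), audio_format=audio_format)
    address = acquire(request)
    if address is None:
        raise RuntimeError("no engine processing module is currently available")
    return send(address, audio)
```

Passing the two calls in as parameters keeps the sketch independent of any transport; in a deployment they would typically be RPC or HTTP clients.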
Optionally, the system further comprises a proxy server module, configured to forward the voice data sent by the user to the front-end speech processing module. A proxy server (Proxy Server) can be used to connect the Internet and a local area network (Local Area Network). It is a special kind of network service that allows one network endpoint (typically a client) to connect indirectly, through the proxy, to another network endpoint (typically a server, such as the speech recognition system in the embodiments of the present invention). For example, the user's client first establishes a connection with the proxy server and then, following the proxy protocol the proxy server uses, requests that a connection to the target server be created or that a specified resource of the target server be obtained. A proxy server also makes it possible to add user authentication and accounting, to configure and manage the permissions of different users, and to add a cache that improves access speed.
Optionally, the speech system may further comprise a license server module (License Server). A license server follows a licensing standard, proposed by vendors of applications or servers, that defines how applications and the license server work together to track software usage and licenses. In this way the license server can authorize the engine processing modules to operate on behalf of users.
In the technical solution of this embodiment, deploying the front-end speech processing module of the speech recognition system makes it possible to preprocess the voice data received from users. This reduces the amount of computation required of the target engine processing module, further increases the efficiency of speech recognition and the utilization of the engine modules, makes maximum use of cluster resources, and saves hardware cost. The proxy server module makes it easier for users to connect to the speech recognition system and helps developers manage user accounts. At the same time, the complex design in which one engine processing module recognizes multiple voice streams is no longer needed, and the distributed deployment reduces the difficulty and complexity of later maintenance, tuning, and optimization of the speech system and of online testing.
Embodiment three
Fig. 3 is a schematic structural diagram of a speech recognition system provided by Embodiment three of the present invention. On the basis of the above embodiments, the system optionally further comprises an account verification module (Profile Server) 140, configured to verify the user account that sends the voice data. An account is the identity a user uses when performing speech recognition with the speech recognition system of the embodiments of the present invention. Account verification (Auth, authenticate) means confirming the user's identity by some means in order to determine whether the user may perform speech recognition, for example by confirming whether the user is already a registered user or member of a related website. For example, the account can be verified by a verification code or a password, and the subsequent speech recognition process proceeds only after verification succeeds. Of course, this embodiment does not limit the order between the account verification step and the step in which the front-end speech processing module preprocesses the voice data sent by the user.
Optionally, the account verification module 140 is further configured to store user information and engine model parameter information related to user accounts. User information may include the user's account, login times, the number of recognition requests, other historical data, and the user's personal information relevant to the speech recognition system. The engine model parameter information related to a user account may be personalized parameter information, recorded against that account, that is relevant to recognizing the user's speech. For example, a user's voice may be unusual, with a frequency that differs from that of most people's voices, so that an ordinary engine processing module may fail to recognize it. In that case a dedicated voice profile can be set up for the user's voice data, and the profile can record the relevant engine model parameter information to help the engine processing module recognize the speech accurately. The engine model parameter information may also refer specifically to parameter information of an engine processing module associated with the user. For example, a user's account can be associated with particular engine processing modules so that the user's speech is recognized by those modules, such as user A always using the 100th engine processing module.
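A per-account profile of this kind can be modeled as a small store that overlays a user's personalized parameters on shared defaults and optionally records a dedicated engine. All names below (`Profile`, `ProfileStore`, `params_for`, `pinned_engine_for`) and the example parameter keys are hypothetical; the text does not define what the engine model parameters contain.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Profile:
    """Hypothetical per-account record kept by the account verification module."""
    account: str
    model_params: Dict[str, str] = field(default_factory=dict)
    pinned_engine: Optional[str] = None  # address of a dedicated engine, if any


class ProfileStore:
    """Hypothetical in-memory store of profiles."""

    def __init__(self, defaults: Dict[str, str]) -> None:
        self._defaults = dict(defaults)
        self._profiles: Dict[str, Profile] = {}

    def save(self, profile: Profile) -> None:
        self._profiles[profile.account] = profile

    def params_for(self, account: str) -> Dict[str, str]:
        """Return default parameters overlaid with the account's own values."""
        merged = dict(self._defaults)
        profile = self._profiles.get(account)
        if profile is not None:
            merged.update(profile.model_params)
        return merged

    def pinned_engine_for(self, account: str) -> Optional[str]:
        """Return the engine associated with the account, or None."""
        profile = self._profiles.get(account)
        return profile.pinned_engine if profile is not None else None
```

Falling back to defaults for accounts without a profile is a design choice of this sketch: it lets ordinary users be served by any engine while users with unusual voices receive their personalized parameters.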
Optionally, the engine resource scheduling module 110 is further configured to obtain, from the account verification module 140, target engine model parameter information related to the current user account and to send it to the front-end speech processing module 130. Correspondingly, the front-end speech processing module 130 is further configured to send the target engine model parameter information to the target engine processing module according to the address of the target engine processing module. Correspondingly, the target engine processing module is further configured to perform speech recognition on the received voice data according to the target engine model parameter information.
After the engine resource scheduling module 110 obtains the target engine model parameter information related to the current user account from the account verification module 140, it can send that information to the front-end speech processing module 130. The front-end speech processing module 130 can in turn send the target engine model parameter information to the target engine processing module, so that the target engine processing module can recognize the user's voice data more accurately according to that information.
For example, Fig. 4 is a schematic structural diagram of another speech system provided by an embodiment of the present invention. As shown in Fig. 4, the system comprises a proxy server module, a front-end speech processing module, engine processing modules, an engine resource scheduling module, an account verification module, and a system monitoring module. Specifically, the voice data sent by a user can be forwarded by the proxy server module to the front-end speech processing module. After preprocessing the voice data, the front-end speech processing module can have the account verified by the account verification module. Once verification succeeds, it sends an engine acquisition request to the engine resource scheduling module. The engine resource scheduling module can select the target engine processing module according to the availability of the engine processing modules, so that the front-end speech processing module can send the voice data to the target engine processing module for speech recognition. Optionally, the engine resource scheduling module can at the same time obtain the engine model parameter information related to the user account from the account verification module and return it to the front-end speech processing module. The front-end speech processing module can then send that information together with the voice data to the target engine processing module to achieve more accurate speech recognition. Throughout this process, the system monitoring module can monitor the running state of each module at any time, which ensures timely maintenance and keeps the speech recognition system working normally.
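The end-to-end flow of Fig. 4 can be summarized as one orchestration function on the front-end side. This is a sketch under stated assumptions: the function `recognize` and its parameters are hypothetical, each callable stands in for a call to another module, and the scheduler is assumed to return the engine address and the account's model parameters together in a single response.

```python
from typing import Callable, Dict, Optional, Tuple

# Assumed scheduler response: (target engine address, engine model parameters).
Assignment = Tuple[str, Dict[str, str]]


def recognize(account: str, audio: bytes, *,
              preprocess: Callable[[bytes], bytes],
              verify: Callable[[str], bool],
              acquire_engine: Callable[[str], Optional[Assignment]],
              send: Callable[[str, bytes, Dict[str, str]], str]) -> str:
    """Run the front-end steps in the order described for Fig. 4."""
    # 1. Preprocess the voice data received through the proxy.
    cleaned = preprocess(audio)
    # 2. Have the account verification module check the account.
    if not verify(account):
        raise PermissionError(f"account verification failed for {account!r}")
    # 3. Send the engine acquisition request to the scheduler.
    assignment = acquire_engine(account)
    if assignment is None:
        raise RuntimeError("no engine processing module is currently available")
    address, params = assignment
    # 4. Send the voice data and parameters to the target engine.
    return send(address, cleaned, params)
```

Rejecting the request before any engine is acquired means a failed verification never occupies an engine, which matches the order given in the description.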
In the technical solution of this embodiment, deploying the account verification module of the speech recognition system makes it possible to verify the account of the user who sends the voice data. In addition, the account verification module stores engine model parameter information related to user accounts, which helps the engine processing modules recognize the voice data of individual users accurately, improves speech recognition efficiency and the utilization of the engine modules, makes maximum use of cluster resources, and saves hardware cost. At the same time, the complex design in which one engine processing module recognizes multiple voice streams is no longer needed, which reduces the difficulty and complexity of later maintenance, tuning, and optimization of the speech system and of online testing.
Embodiment four
Fig. 5 is a flowchart of a speech recognition method provided by Embodiment 4 of the present invention. The method is applicable to speech recognition scenarios and can be executed by a speech recognition system. As shown in Fig. 5, the method includes:
S510: The engine resource scheduling module selects a target engine processing module according to the states of at least two engine processing modules.
The state of an engine processing module may include its running state, and specifically whether it is available. For example, when an engine processing module is idle or not running, it can be regarded as available and may be selected as the target engine processing module. Illustratively, whether an engine processing module is in the running state can be determined by whether it is currently recognizing voice data: if it is not, it is considered available and can be selected as the target engine processing module to process the current voice data to be recognized; otherwise, it is considered unavailable.
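The selection rule of S510 can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `EngineState`, `EngineInfo`, and `select_target_engine`, the example addresses, and the "first idle engine wins" policy are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class EngineState(Enum):
    IDLE = "idle"  # not recognizing anything, so treated as available
    BUSY = "busy"  # currently recognizing voice data, so unavailable


@dataclass
class EngineInfo:
    address: str  # hypothetical "host:port" of an engine processing module
    state: EngineState


def select_target_engine(engines: Sequence[EngineInfo]) -> Optional[EngineInfo]:
    """Return the first available (idle) engine, or None if every engine is busy."""
    for engine in engines:
        if engine.state is EngineState.IDLE:
            return engine
    return None


engines = [
    EngineInfo("10.0.0.1:9000", EngineState.BUSY),
    EngineInfo("10.0.0.2:9000", EngineState.IDLE),
]
target = select_target_engine(engines)
print(target.address if target else "no engine available")
```

A real scheduler could just as well pick the least-loaded or most recently idle engine; the text only requires that the choice depends on engine state.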
S520: The target engine processing module receives the voice data to be recognized and performs speech recognition.
Optionally, before the target engine processing module receives the voice data to be recognized, the method further includes:
The engine resource scheduling module sends the address of the target engine processing module to the front-end speech processing module;
The front-end speech processing module sends the voice data to be recognized to the target engine processing module according to the address of the target engine processing module.
Optionally, before the engine resource scheduling module selects the target engine processing module according to the states of the at least two engine processing modules, the method further includes:
The front-end speech processing module sends an engine acquisition request to the engine resource scheduling module, so that the engine resource scheduling module, in response to the engine acquisition request, selects the target engine processing module according to the states of the at least two engine processing modules.
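The request, address, and forwarding exchange described above might look like the following in-process sketch. Everything here is assumed for illustration: the `Scheduler` class, the `front_end_dispatch` function, the engine names, and in particular the reserve-then-release bookkeeping, which the text does not specify.

```python
from typing import Callable, Dict, Optional


class Scheduler:
    """Hypothetical stand-in for the engine resource scheduling module."""

    def __init__(self, engine_busy: Dict[str, bool]) -> None:
        # address -> True while that engine is recognizing voice data
        self._busy = dict(engine_busy)

    def handle_engine_request(self) -> Optional[str]:
        """Answer an engine acquisition request with a target engine address."""
        for address, busy in self._busy.items():
            if not busy:
                self._busy[address] = True  # assumption: reserve it for the caller
                return address
        return None

    def release(self, address: str) -> None:
        self._busy[address] = False


def front_end_dispatch(
    audio: bytes,
    scheduler: Scheduler,
    engines: Dict[str, Callable[[bytes], str]],
) -> str:
    """Ask the scheduler for an address, then send the audio to that engine."""
    address = scheduler.handle_engine_request()
    if address is None:
        raise RuntimeError("no engine processing module is available")
    try:
        return engines[address](audio)
    finally:
        scheduler.release(address)


engines = {
    "engine-a": lambda audio: f"engine-a recognized {len(audio)} bytes",
    "engine-b": lambda audio: f"engine-b recognized {len(audio)} bytes",
}
scheduler = Scheduler({"engine-a": True, "engine-b": False})
result = front_end_dispatch(b"\x01\x02\x03", scheduler, engines)
print(result)
```

In a deployed system the address would be a network endpoint and the call to the engine would be a remote request; the dictionary of callables only stands in for that transport.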
Optionally, before the target engine processing module performs speech recognition, the method further includes:
The engine resource scheduling module obtains, from the account verification module, target engine model parameter information associated with the current user account and sends it to the front-end speech processing module, so that the front-end speech processing module sends the target engine model parameter information to the target engine processing module according to the address of the target engine processing module. Correspondingly, the target engine processing module receiving the voice data to be recognized and performing speech recognition includes:
The target engine processing module receives the voice data to be recognized and performs speech recognition according to the target engine model parameter information;
Wherein the account verification module is used to store user information and engine model parameter information associated with user accounts.
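As a rough sketch of how account-related engine model parameters could travel with a request, consider the code below. The profile fields (`acoustic_model`, `hotwords`), the `PROFILES` table, and all function names are hypothetical; the source does not say what the parameter information contains.

```python
from typing import Dict, Tuple

# Hypothetical per-account engine model parameters held by the account verification module.
PROFILES: Dict[str, Dict[str, str]] = {
    "alice": {"acoustic_model": "far-field", "hotwords": "thermostat,curtain"},
}


def get_profile(account: str) -> Dict[str, str]:
    """Return a copy of the stored parameters, or an empty dict for unknown accounts."""
    return dict(PROFILES.get(account, {}))


def schedule(account: str, idle_engine_address: str) -> Tuple[str, Dict[str, str]]:
    """Scheduler reply: the target engine address plus the account's parameters."""
    return idle_engine_address, get_profile(account)


def recognize(audio: bytes, profile: Dict[str, str]) -> str:
    """Placeholder engine: only reports which model it would decode with."""
    model = profile.get("acoustic_model", "default")
    return f"decoded {len(audio)} bytes with model {model}"


address, profile = schedule("alice", "engine-b")
text = recognize(b"\x00" * 4, profile)
print(address, text)
```

Falling back to a default model for accounts without stored parameters is a design choice of this sketch, not something the text prescribes.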
Optionally, the front-end speech processing module sending the voice data to be recognized to the target engine processing module according to the address of the target engine processing module includes:
The front-end speech processing module receives the voice data to be recognized sent by the user and preprocesses it;
The front-end speech processing module sends the preprocessed voice data to the target engine processing module according to the address of the target engine processing module.
Optionally, before the front-end speech receiving module sends the engine acquisition request to the engine resource scheduling module, the method further includes: the account verification module verifies the user account that sent the voice data to be processed. Account verification and the front-end speech processing module's preprocessing of the voice data may be performed in either order: the voice data may be preprocessed first, or the user account may be verified first and the voice data preprocessed only after verification passes.
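The point that verification and preprocessing may run in either order can be illustrated with a small sketch. The account set, the `verify_first` flag, and the placeholder preprocessing (stripping zero bytes as a stand-in for real audio cleanup) are assumptions of the example.

```python
from typing import Set

REGISTERED_ACCOUNTS: Set[str] = {"alice"}  # hypothetical account list


def verify_account(account: str) -> bool:
    return account in REGISTERED_ACCOUNTS


def preprocess(audio: bytes) -> bytes:
    """Placeholder preprocessing: strip leading and trailing zero bytes."""
    return audio.strip(b"\x00")


def prepare_request(account: str, audio: bytes, verify_first: bool = True) -> bytes:
    """Verify and preprocess in either order; both orders give the same output."""
    if verify_first:
        if not verify_account(account):
            raise PermissionError(f"account {account!r} failed verification")
        return preprocess(audio)
    cleaned = preprocess(audio)
    if not verify_account(account):
        raise PermissionError(f"account {account!r} failed verification")
    return cleaned


raw = b"\x00\x00\x01\x02\x00"
a = prepare_request("alice", raw, verify_first=True)
b = prepare_request("alice", raw, verify_first=False)
print(a == b)
```

Verifying first avoids spending preprocessing work on requests that will be rejected, which is one practical reason to prefer that order.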
Optionally, the method further includes: the system monitoring module monitors the running states of the engine resource scheduling module and the at least two engine processing modules.
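One way to picture the monitoring step is a single polling pass over per-module health probes, as sketched below. The probe interface (a callable returning `True` when healthy) and the module names are assumptions; the text does not describe how running states are collected.

```python
from typing import Callable, Dict, List


def check_modules(probes: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every health probe once and return the names of unhealthy modules."""
    unhealthy = []
    for name, probe in probes.items():
        try:
            healthy = probe()
        except Exception:
            healthy = False  # a probe that fails to answer counts as unhealthy
        if not healthy:
            unhealthy.append(name)
    return unhealthy


def broken_probe() -> bool:
    raise ConnectionError("engine-2 did not answer")


probes = {
    "scheduler": lambda: True,
    "engine-1": lambda: True,
    "engine-2": broken_probe,
}
alerts = check_modules(probes)
print(alerts)
```

A monitoring module would typically repeat such a pass on a timer and raise an alert for each name returned.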
Optionally, the front-end speech processing module receiving and preprocessing the voice data to be recognized sent by the user includes:
The front-end speech processing module receives, through a proxy server, the voice data to be recognized sent by the user and preprocesses it. In this embodiment, for the functions, workflows, and related explanations of the modules, refer to the description of the speech recognition system in any embodiment of the present invention; details are not repeated here.
Illustratively, Fig. 6 is a flowchart of another speech recognition method provided by Embodiment 4 of the present invention. Steps (1) to (10) in Fig. 6 illustrate the speech recognition method of the present invention. In step (1), the user sends voice data (send audio) to the front-end speech processing module (Business Server). In step (2), the front-end speech processing module sends an account verification request (auth) to the account verification module (Profile Server). In step (3), if the account passes verification, the account verification module sends a verification-passed message (ok) to the front-end speech processing module. In step (4), the front-end speech processing module sends an engine acquisition request (get engine address) to the engine resource scheduling module (Scheduling Server), which can select a target engine processing module according to the availability of the engine processing modules. Optionally, the following may be performed at the same time: in step (5), the engine resource scheduling module sends the account verification module a request for the engine model parameter information associated with the user account (get profile); and in step (6), the account verification module returns the relevant engine model parameter information to the engine resource scheduling module (return profile). In step (7), the engine resource scheduling module sends the address of the target engine processing module and the account-related engine model parameter information to the front-end speech processing module (return engine address and profile info). In step (8), the front-end speech processing module sends the voice data to be recognized to the target engine processing module (Asr-Engine) according to the address of the target engine processing module. Optionally, this voice data may be voice data that has been preprocessed by the front-end speech processing module. In step (9), after the target engine processing module finishes recognizing the voice data, it returns the speech recognition result to the front-end speech processing module (return asr result). In step (10), the front-end speech processing module sends the recognized text (asr text) to the user.
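The ten-step exchange of Fig. 6 can be traced with a small in-process simulation. The class names reuse the labels from the figure (Business Server, Profile Server, Scheduling Server, Asr-Engine), but their methods, the profile contents, and the sequential handling of steps (5) and (6) are assumptions of this sketch; the text allows those two steps to run concurrently with engine selection.

```python
from typing import Dict, List, Optional, Tuple

trace: List[str] = []  # messages in the order they are exchanged


class ProfileServer:
    def __init__(self, profiles: Dict[str, Dict[str, str]]) -> None:
        self.profiles = profiles

    def auth(self, account: str) -> bool:
        trace.append("(2) auth")
        ok = account in self.profiles
        if ok:
            trace.append("(3) ok")
        return ok

    def get_profile(self, account: str) -> Dict[str, str]:
        trace.append("(5) get profile")
        trace.append("(6) return profile")
        return self.profiles[account]


class AsrEngine:
    def __init__(self, busy: bool = False) -> None:
        self.busy = busy

    def recognize(self, audio: bytes, profile: Dict[str, str]) -> str:
        trace.append("(8) send audio to Asr-Engine")
        trace.append("(9) return asr result")
        return f"text for {len(audio)} bytes [{profile['model']}]"


class SchedulingServer:
    def __init__(self, engines: Dict[str, AsrEngine], profile_server: ProfileServer) -> None:
        self.engines = engines
        self.profile_server = profile_server

    def get_engine(self, account: str) -> Tuple[str, Dict[str, str]]:
        trace.append("(4) get engine address")
        address = next((a for a, e in self.engines.items() if not e.busy), None)
        if address is None:
            raise RuntimeError("no engine processing module is available")
        profile = self.profile_server.get_profile(account)
        trace.append("(7) return engine address and profile info")
        return address, profile


class BusinessServer:
    def __init__(self, profile_server: ProfileServer, scheduler: SchedulingServer,
                 engines: Dict[str, AsrEngine]) -> None:
        self.profile_server = profile_server
        self.scheduler = scheduler
        self.engines = engines

    def handle(self, account: str, audio: bytes) -> Optional[str]:
        trace.append("(1) send audio")
        if not self.profile_server.auth(account):
            return None  # verification failed: stop before scheduling
        address, profile = self.scheduler.get_engine(account)
        text = self.engines[address].recognize(audio, profile)
        trace.append("(10) asr text")
        return text


profile_server = ProfileServer({"alice": {"model": "default"}})
engines = {"engine-a": AsrEngine(busy=True), "engine-b": AsrEngine()}
scheduler = SchedulingServer(engines, profile_server)
business = BusinessServer(profile_server, scheduler, engines)
text = business.handle("alice", b"\x01\x02")
print(text)
print("\n".join(trace))
```

Printing `trace` lists the messages in step order, which makes it easy to compare the simulated flow with the figure.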
In the technical solution of this embodiment, deploying the individual modules of the speech recognition system lets the engine processing module concentrate as far as possible on the speech recognition rate, with engineering concerns moved out of the engine processing module and handled by the other modules of the system. This can improve speech recognition efficiency and the utilization of the engine modules, and makes the fullest use of cluster resources while saving hardware cost. It also reduces the difficulty and complexity of later maintenance, tuning and optimization of the speech system, and online testing.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments, and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to them; it may include other equivalent embodiments without departing from the inventive concept, and its scope is determined by the scope of the appended claims.

Claims (12)

1. A speech recognition system, characterized in that the system comprises: an engine resource scheduling module and at least two engine processing modules;
Wherein the engine resource scheduling module is configured to select a target engine processing module according to the states of the at least two engine processing modules;
The target engine processing module among the at least two engine processing modules is configured to perform speech recognition on the received voice data.
2. The system according to claim 1, characterized in that the system further comprises: a front-end speech processing module, configured to receive voice data sent by a user, preprocess the voice data, and send the preprocessed voice data to the target engine processing module.
3. The system according to claim 2, characterized in that the engine resource scheduling module is further configured to:
After selecting the target engine processing module, send the address of the target engine processing module to the front-end speech processing module;
Correspondingly, the front-end speech processing module is specifically configured to send the preprocessed voice data to the target engine processing module according to the address of the target engine processing module.
4. The system according to claim 3, characterized in that the system further comprises: an account verification module, configured to store user information and engine model parameter information associated with user accounts.
5. The system according to claim 4, characterized in that the engine resource scheduling module is further configured to: obtain, from the account verification module, target engine model parameter information associated with the current user account, and send it to the front-end speech processing module;
Correspondingly, the front-end speech processing module is further configured to send the target engine model parameter information to the target engine processing module according to the address of the target engine processing module;
Correspondingly, the target engine processing module is further configured to perform speech recognition on the received voice data according to the target engine model parameter information.
6. The system according to claim 1, characterized in that the account verification module is further configured to verify the account of the user who sends the voice data.
7. A speech recognition method, characterized in that the method comprises:
An engine resource scheduling module selects a target engine processing module according to the states of at least two engine processing modules;
The target engine processing module receives voice data to be recognized and performs speech recognition.
8. The method according to claim 7, characterized in that before the target engine processing module receives the voice data to be recognized, the method further comprises:
The engine resource scheduling module sends the address of the target engine processing module to a front-end speech processing module;
The front-end speech processing module sends the voice data to be recognized to the target engine processing module according to the address of the target engine processing module.
9. The method according to claim 7, characterized in that before the engine resource scheduling module selects the target engine processing module according to the states of the at least two engine processing modules, the method further comprises:
A front-end speech processing module sends an engine acquisition request to the engine resource scheduling module, so that the engine resource scheduling module, in response to the engine acquisition request, selects the target engine processing module according to the states of the at least two engine processing modules.
10. The method according to claim 8, characterized in that before the target engine processing module performs speech recognition, the method further comprises:
The engine resource scheduling module obtains, from an account verification module, target engine model parameter information associated with the current user account and sends it to the front-end speech processing module, so that the front-end speech processing module sends the target engine model parameter information to the target engine processing module according to the address of the target engine processing module;
Correspondingly, the target engine processing module receiving the voice data to be recognized and performing speech recognition comprises:
The target engine processing module receives the voice data to be recognized and performs speech recognition according to the target engine model parameter information;
Wherein the account verification module is configured to store user information and engine model parameter information associated with user accounts.
11. The method according to claim 8, characterized in that the front-end speech processing module sending the voice data to be recognized to the target engine processing module according to the address of the target engine processing module comprises:
The front-end speech processing module receives the voice data to be recognized sent by the user and preprocesses it;
The front-end speech processing module sends the preprocessed voice data to the target engine processing module according to the address of the target engine processing module.
12. The method according to claim 9, characterized in that before the front-end speech receiving module sends the engine acquisition request to the engine resource scheduling module, the method further comprises:
An account verification module verifies the user account that sent the voice data to be processed.
CN201810758940.5A 2018-07-11 2018-07-11 A kind of speech recognition system and method Pending CN109036431A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810758940.5A CN109036431A (en) 2018-07-11 2018-07-11 A kind of speech recognition system and method

Publications (1)

Publication Number Publication Date
CN109036431A true CN109036431A (en) 2018-12-18

Family

ID=64641126

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810758940.5A Pending CN109036431A (en) 2018-07-11 2018-07-11 A kind of speech recognition system and method

Country Status (1)

Country Link
CN (1) CN109036431A (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101903946A (en) * 2007-12-21 2010-12-01 Nvoq股份有限公司 Distributed dictation/transcription system
CN101677329A (en) * 2008-09-18 2010-03-24 中兴通讯股份有限公司 Comprehensive voice resource platform proxy server and its data processing method
JP2014010458A (en) * 2012-06-27 2014-01-20 Naver Corp Music relevant information providing device and method by music recognition in television system, and computer readable recording medium
CN104380258A (en) * 2012-07-18 2015-02-25 英特尔公司 Performing scheduling operations for graphics hardware
CN103870411A (en) * 2012-12-11 2014-06-18 三星电子株式会社 Memory controller and memory system including the same
CN103458056A (en) * 2013-09-24 2013-12-18 贵阳世纪恒通科技有限公司 Speech intention judging method based on automatic classification technology for automatic outbound system
CN104009991A (en) * 2014-05-28 2014-08-27 广州华多网络科技有限公司 Audio communication system and method
CN105245607A (en) * 2015-10-23 2016-01-13 中国联合网络通信集团有限公司 Proxy server dynamic automatic selection method and system
CN105528737A (en) * 2015-12-15 2016-04-27 国网北京市电力公司 Swap station data processing system, method and device
CN105654954A (en) * 2016-04-06 2016-06-08 普强信息技术(北京)有限公司 Cloud voice recognition system and method
CN107068145A (en) * 2016-12-30 2017-08-18 中南大学 Speech evaluating method and system

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110289016A (en) * 2019-06-20 2019-09-27 深圳追一科技有限公司 A kind of voice quality detecting method, device and electronic equipment based on actual conversation
CN111460093A (en) * 2020-03-16 2020-07-28 云知声智能科技股份有限公司 Method, device and system for configuring multiple engines based on single voice input
CN111862972A (en) * 2020-07-08 2020-10-30 北京梧桐车联科技有限责任公司 Voice interaction service method, device, equipment and storage medium
CN111862972B (en) * 2020-07-08 2023-11-14 北京梧桐车联科技有限责任公司 Voice interaction service method, device, equipment and storage medium
CN113093596A (en) * 2021-03-29 2021-07-09 北京金山云网络技术有限公司 Control instruction processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181218
