CN101452703A - System for providing voice identification engine by utilizing network and method thereof

Info

Publication number: CN101452703A
Application number: CNA2007100774996A
Authority: CN
Inventors: 王瑞璋
Original assignee: Individual
Current assignee: Individual
Priority date: 2007-11-30
Filing date: 2007-11-30
Publication date: 2009-06-10

Abstract

The invention provides a system and a method for providing a speech recognition engine over a network. The system comprises a user login unit, a storage unit, a speech recognition engine generation unit and an engine download unit. The method comprises the following steps: a. the user records his or her speech on the device in use, and the recordings are transmitted over the network and stored in the storage unit; and b. the speech recognition engine generation unit produces a speaker-dependent speech recognition engine suited to the device, according to the speech recorded by the user and the characteristics of the device. The system and the method save the user's time and make it convenient for the user to use a speaker-dependent speech recognition engine on different devices.

Description

A system and method for providing a speech recognition engine over a network

Technical field

The present invention relates to a system and method for providing a speech recognition engine, and in particular to a system and method that use a network to provide an individual user with speaker-dependent speech recognition engines for use on multiple devices.

Background technology

Speech recognition technology lets users operate various electronic devices more conveniently, for example desktop computers, notebook computers, mobile phones and personal digital assistants. Using only a speech input device such as a microphone, the user can have spoken words converted into text or further into commands, and can thereby enter data into or operate such devices more conveniently. For example, with speech recognition the user can write by dictation, or dial by voice when using a mobile phone. Speech recognition is a great convenience for ordinary users and is even more valuable for users with special needs, such as those with physical disabilities or muscular atrophy.

Generally speaking, speech recognition engines fall into two types: speaker-dependent speech recognition engines and speaker-independent speech recognition engines.

A speaker-independent speech recognition engine is pre-loaded with a large number of speech samples from many different speakers, so the user can use it directly without first training it. Although the training step is omitted, everyone pronounces words differently, and because the engine's reference is speech that is not the user's own, its recognition accuracy is far lower than that of a speaker-dependent speech recognition engine.

A speaker-dependent speech recognition engine must first be trained or adapted by the user; that is, the user must supply samples of his or her own voice before the engine can be used. Taking the voice dialing function of a mobile phone as an example, the user must first record his or her own voice, for example the names of the people to be called, before using the function. Recognition accuracy is higher, but the engine is very inconvenient to use. After the user has laboriously trained the speaker-dependent engine in one electronic device, replacing that device with a new one means the engine in the new device must be trained all over again. With a mobile phone, for example, after changing to a new phone the user must record all the voice data again on the new phone to train its speaker-dependent engine before it can be used.

As electronic devices become widespread, a user may own several kinds at the same time. As described above, to use speaker-dependent speech recognition engines the user must repeat the training process on each kind of device, which not only wastes the user's valuable time but also erodes the user's interest in speech recognition. Conversely, if this inconvenience could be effectively removed, speaker-dependent engines, whose recognition is more accurate, might be used more widely, which would in turn promote the development of the speech technology industry.

Summary of the invention

The technical problem to be solved by the present invention is to provide a system and method for providing a speech recognition engine over a network that overcome the above defects of the prior art. The invention retains the high accuracy of a speaker-dependent speech recognition engine without requiring lengthy training beforehand, saves the user's time, and makes it convenient for the user to use speaker-dependent speech recognition engines on different devices.

The present invention also uses the network to collect, over the long term, the same user's speech samples from different devices, and uses them to produce a speaker-dependent speech recognition engine that the user can use on a new device without prior training. The speaker-dependent engines the user uses on each device are thereby continuously refined so that they better match the user's own needs.

The above technical problem is solved by the following technical solution: a system for providing a speech recognition engine over a network, comprising:

a storage unit, for storing the speech that the user records on any recording device; and

a speech recognition engine generation unit, which, according to the speech recorded by the user and the characteristics of each recording device, produces a speaker-dependent speech recognition engine suited to use on each recording device.

The system provided by the present invention further comprises a user login unit, through which different users log in to the system over the network from the recording devices on which they use speech recognition.

The system provided by the present invention further comprises an engine download unit, which downloads each speaker-dependent speech recognition engine to its corresponding recording device so that the user can use the speaker-dependent speech recognition function.

In the system provided by the present invention, the recording device is a mobile phone, a desktop computer, a notebook computer or a personal digital assistant.

In the system provided by the present invention, the network is the Internet, a mobile telephone communication network or a fixed-line telephone communication network.

In the system provided by the present invention, the speech recognition engine generation unit uses a model training technique or a model adaptation technique to produce, according to the characteristics of the user's voice and of the recording device, a speaker-dependent speech recognition engine suited to that recording device.

A method for providing a speech recognition engine over a network comprises the following steps:

a. the user records his or her speech on the recording device in use, and all recordings are transmitted over the network and stored in the storage unit of a platform system provided on the network; and

b. the speech recognition engine generation unit provided on the network produces, according to the speech recorded by the user and the characteristics of the recording device in use, the speaker-dependent speech recognition engine required by that recording device.

In the method provided by the present invention, step a is preceded by step a1: the user uses the recording device to log in to the system via the user login unit provided on the network.

In the method provided by the present invention, step b is followed by step c: using the engine download unit provided on the network, the user downloads the generated speaker-dependent speech recognition engine over the network and installs it on the recording device for use.

In the method provided by the present invention, the recording device is a mobile phone, a desktop computer, a notebook computer or a personal digital assistant.

In the method provided by the present invention, the network is the Internet, a mobile telephone communication network or a fixed-line telephone communication network.

In the method provided by the present invention, the speaker-dependent speech recognition engine is produced by a model training technique or a model adaptation technique, according to the characteristics of the user's voice and of the recording device.

Compared with the prior art, the present invention has the following advantages:

1. The user can store his or her voice data on the network. When the user uses a different device, a speaker-dependent speech recognition engine suited to that device is produced directly from the stored voice data and the characteristics of the device, sparing the user the training otherwise required for each new device.

2. The user's speech samples can be collected over the long term, so that the speaker-dependent speech recognition engines the user uses on each device are continuously refined and better match the individual user's needs.

3. The speech recognition system of the present invention can be hosted on a large portal website on the network. This makes it convenient to collect and use the user's audio data over the long term and so obtain better speaker-dependent speech recognition engines, and it also gives the portal website a stable long-term visitor base, to the benefit of both.

Description of drawings

Fig. 1 is a schematic diagram of the system for providing a speech recognition engine over a network according to the first embodiment of the invention;

Fig. 2 is a schematic diagram of the system for providing a speech recognition engine over a network according to the second embodiment of the invention;

Fig. 3 is another schematic diagram of the system according to the second embodiment of the invention;

Fig. 4 is a further schematic block diagram of the system according to the second embodiment of the invention;

Fig. 5 is a flow chart of the method of the invention for providing a speech recognition engine over a network.

Embodiment

The present invention is described in further detail below with reference to the embodiments and the accompanying drawings, but the embodiments of the invention are not limited thereto.

Embodiment one

As shown in Fig. 1, a system of the present invention for providing a speech recognition engine over a network is hosted on a platform 1 set up on the network. The system comprises a storage unit 20 and a speech recognition engine generation unit 30. The storage unit 20 stores the speech that the user records with a mobile phone 2. The speech recognition engine generation unit 30 produces, according to the speech recorded by the user and the characteristics of the mobile phone 2, a speaker-dependent speech recognition engine for use on the mobile phone 2.

The speech recognition engine generation unit 30 uses a model training technique or a model adaptation technique to produce the speaker-dependent speech recognition engine from the user's voice. The engine produced comprises an element that extracts feature parameters from sound, trained comparison samples, and a search-and-match element for model recognition. In addition, so that the generated speech recognition engine suits the device on which it is used, the hardware environment of that device must be taken into account.
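The three parts of the engine named above (a feature extraction element, trained comparison samples, and a search-and-match element) can be illustrated with a deliberately small sketch. The features used here (mean absolute amplitude and zero-crossing rate), the nearest-template matching, the synthetic test tones, and all names such as `ToyEngine` are assumptions made for illustration; the specification does not state which features or matching method the engine uses.

```python
import math
from dataclasses import dataclass, field


def extract_features(samples):
    """Feature extraction element: reduce raw samples to a small vector.

    Assumed toy features: mean absolute amplitude and zero-crossing rate.
    """
    n = len(samples)
    energy = sum(abs(s) for s in samples) / n
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    return (energy, crossings / (n - 1))


@dataclass
class ToyEngine:
    # Trained comparison samples: label -> features of the user's own voice.
    templates: dict = field(default_factory=dict)

    def enroll(self, label, samples):
        self.templates[label] = extract_features(samples)

    def recognize(self, samples):
        # Search-and-match element: return the enrolled label whose
        # features lie nearest to those of the input.
        feats = extract_features(samples)
        return min(
            self.templates,
            key=lambda label: math.dist(feats, self.templates[label]),
        )


def tone(cycles, n=200, gain=1.0):
    """Synthetic stand-in for a recording (hypothetical test signal)."""
    return [gain * math.sin(2 * math.pi * cycles * i / n + 0.1) for i in range(n)]


engine = ToyEngine()
engine.enroll("call home", tone(3))
engine.enroll("call office", tone(30))
result = engine.recognize(tone(3, gain=0.8))  # quieter repeat of the first entry
```

A practical engine would use far richer features and statistical models; the sketch only shows how the three elements relate to one another.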

Embodiment two

As shown in Fig. 2, a system of the present invention for providing a speech recognition engine over a network is hosted on a platform 1 set up on the network. The system comprises a user login unit 10, a storage unit 20, a speech recognition engine generation unit 30 and an engine download unit 40.

The user login unit 10 lets different users log in to the system over the network when using different recording devices. The storage unit 20 stores the speech that users record on any recording device; recordings are transmitted over the network to the storage unit and can be stored separately, classified by individual user and by recording device. The speech recognition engine generation unit 30 produces, according to the speech recorded by the user and the characteristics of each recording device, a speaker-dependent speech recognition engine suited to use on each recording device. The engine download unit 40 lets the user download, to each recording device, the speaker-dependent speech recognition engine produced for that device.
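The classified storage described above, with recordings filed per user and per recording device, can be sketched as a nested mapping. This is a minimal in-memory illustration under assumed names (`RecordingStore`, `user-a`, `phone-2`); the specification does not describe how storage unit 20 is actually implemented.

```python
from collections import defaultdict


class RecordingStore:
    """Hypothetical stand-in for storage unit 20: user -> device -> recordings."""

    def __init__(self):
        self._data = defaultdict(lambda: defaultdict(list))

    def add(self, user_id, device_id, recording):
        self._data[user_id][device_id].append(recording)

    def by_device(self, user_id, device_id):
        # Recordings that one user made on one particular device.
        return list(self._data.get(user_id, {}).get(device_id, []))

    def by_user(self, user_id):
        # All of a user's recordings, pooled across every device used.
        devices = self._data.get(user_id, {})
        return [rec for recs in devices.values() for rec in recs]


store = RecordingStore()
store.add("user-a", "phone-2", b"rec1")
store.add("user-a", "phone-2'", b"rec2")
store.add("user-b", "phone-2", b"rec3")
```

Keeping the device as a second key lets the generation unit either pool a user's speech from all devices or look at what was recorded on one device only.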

When a mobile phone 2 with a speech recognition function is used as the recording device, the user can log in to the speech recognition engine generation system of the present invention over the network via the login unit 10, record his or her voice with the phone's own sound pickup device, and upload the recordings over the network for storage in the storage unit 20. Then, according to the speech the user has recorded and the characteristics of the mobile phone 2, the speech recognition engine generation unit 30 produces a speaker-dependent speech recognition engine, which is downloaded over the network to the user's mobile phone 2 via the download unit 40.

As shown in Fig. 3, the user has previously stored his or her speech in the storage unit 20 of the speech recognition system via the mobile phone 2 used earlier. When the user wishes to use another mobile phone 2', the user registers the information of the new phone 2' with the speech recognition system of the present invention over the network via the login unit 10. The speech recognition engine generation unit 30 can then produce, according to the user's previously recorded speech and the characteristics of the new phone 2', a speaker-dependent speech recognition engine suited to the new phone 2'. Finally, the engine is downloaded over the network to the user's phone 2' via the download unit 40. The user can thus use the speech recognition function of the new phone 2' directly, without first putting it through lengthy training. Afterwards, speech recorded on the new phone 2' can continue to be stored in the storage unit 20 over the network, increasing the amount of the user's stored voice data. The engines produced by the speech recognition engine generation unit 30 are thereby continuously refined, which markedly increases the accuracy of speech recognition on the new phone 2' and also helps refine the speech recognition engines used on other devices. In addition, the device the user used previously and the other devices used later may be of different types.
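The scenario above, in which an engine for a new phone is built from speech recorded earlier on another phone, can be sketched as follows. The device characteristics assumed here (a sample rate and a limit on the number of stored templates), the crude nearest-sample rate conversion, and the names `DeviceProfile`, `Recording` and `generate_engine` are all hypothetical; the specification says only that the characteristics of the device are taken into account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceProfile:
    # Assumed device characteristics; the specification does not enumerate them.
    name: str
    sample_rate_hz: int
    max_templates: int


@dataclass(frozen=True)
class Recording:
    label: str
    sample_rate_hz: int
    samples: tuple


def resample(samples, src_hz, dst_hz):
    """Nearest-sample rate conversion; crude, but enough for a sketch."""
    if src_hz == dst_hz:
        return tuple(samples)
    n_out = max(1, round(len(samples) * dst_hz / src_hz))
    last = len(samples) - 1
    return tuple(samples[min(last, int(i * src_hz / dst_hz))] for i in range(n_out))


def generate_engine(recordings, device):
    """Build a per-device engine description from previously stored speech."""
    templates = {}
    for rec in recordings:
        if rec.label in templates:
            continue  # keep the first recording of each label
        if len(templates) >= device.max_templates:
            break  # respect the assumed capacity of the target device
        templates[rec.label] = resample(
            rec.samples, rec.sample_rate_hz, device.sample_rate_hz
        )
    return {
        "device": device.name,
        "sample_rate_hz": device.sample_rate_hz,
        "templates": templates,
    }


# Speech recorded earlier on phone 2, reused for the new phone 2'.
stored = [
    Recording("call home", 16000, (0, 1, 2, 3, 4, 5, 6, 7)),
    Recording("call office", 16000, (7, 6, 5, 4, 3, 2, 1, 0)),
    Recording("call mum", 16000, (1, 1, 1, 1, 1, 1, 1, 1)),
]
new_phone = DeviceProfile("phone-2'", sample_rate_hz=8000, max_templates=2)
engine = generate_engine(stored, new_phone)
```

The point of the sketch is that no new recording is needed for the new phone: only the stored speech and the new device's profile enter `generate_engine`.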

As shown in Fig. 4, the user has set up the relevant data over the network while using mobile phone 2 and mobile phone 2', and the recorded speech corpus has been uploaded and stored in the storage unit 20. When the user then wishes to use the speech recognition function on a notebook computer 3, the user registers the information of the notebook computer over the network via the login unit 10. The speech recognition engine generation unit 30 can then produce, according to the speech the user previously recorded on mobile phone 2 and mobile phone 2' and the characteristics of the notebook computer 3, a speaker-dependent speech recognition engine suited to the notebook computer 3. Finally, the engine is downloaded over the network to the user's notebook computer 3 via the download unit 40. The user can thus use the speech recognition function of the notebook computer 3 directly, without first putting it through lengthy training. Afterwards, speech recorded on the notebook computer can continue to be stored in the storage unit 20 over the network, increasing the amount of the user's stored voice data. The speaker-dependent engines produced by the speech recognition engine generation unit 30 are thereby continuously refined, which markedly increases the accuracy of speech recognition on the notebook computer 3 and also helps refine the speech recognition engines used on other devices.
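The continuous refinement described above, in which each newly stored recording improves the user's model, can be illustrated with an incremental mean over feature vectors. This is only one simple form of model adaptation, chosen here as an assumption; the specification does not say which training or adaptation technique is used, and the feature values below are made up.

```python
class SpeakerModel:
    """Running mean of feature vectors; a simple stand-in for model adaptation."""

    def __init__(self, dim):
        self.mean = [0.0] * dim
        self.count = 0

    def update(self, features):
        # Incremental mean: m_n = m_(n-1) + (x_n - m_(n-1)) / n
        self.count += 1
        self.mean = [
            m + (x - m) / self.count for m, x in zip(self.mean, features)
        ]


model = SpeakerModel(dim=2)
# Hypothetical feature vectors from speech recorded on phone 2 and phone 2'.
for features in [(1.0, 4.0), (3.0, 8.0)]:
    model.update(features)
# Later, one more recording arrives from notebook computer 3.
model.update((5.0, 0.0))
```

Because the update needs only the previous mean and the count, the platform could refine the model as recordings arrive without reprocessing all earlier data.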

As described above, the present invention uses a network to provide a speaker-dependent speech recognition system that can be used across devices. The system is hosted on the platform 1 on the network, and the platform can be set up on a large portal website, for example Google, Yahoo, Apple or MSN (Microsoft Network). The user can thus conveniently use the network storage space these portal websites provide to keep storing voice data over the long term and to keep improving the performance of the speaker-dependent speech recognition engines, so as to reach the best possible usage state.

The present invention also provides a method that uses a network to provide an individual user with speaker-dependent speech recognition engines for use on multiple devices. As shown in Fig. 5, the method comprises the following steps:

Step a1: the user uses any recording device to log in, via a user login unit provided on the network, to the system hosted on a platform on the network.

Step a: the user records his or her speech on whichever recording device is in use, and all recordings are transmitted over the network and stored in the storage unit of the system on the network platform.

Step b: the speech recognition engine generation unit provided on the network produces, according to the speech recorded by the user and the characteristics of the device in use, the speaker-dependent speech recognition engine required by that device.

Step c: using the engine download unit provided on the network, the user downloads the generated speaker-dependent speech recognition engine over the network and installs it on the device for use.
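Steps a1 to c above can be sketched as four calls on a single platform object. The class below is a hypothetical in-memory illustration: the method names, the dictionary used as the "engine", and the rule that uploading requires a prior login are assumptions, not details taken from the specification.

```python
class Platform:
    """Hypothetical network platform bundling the four units of the method."""

    def __init__(self):
        self.sessions = set()    # user login unit: (user, device) pairs
        self.recordings = {}     # storage unit: user -> [(device, audio), ...]
        self.engines = {}        # generated engines: (user, device) -> engine

    def login(self, user, device):            # step a1
        self.sessions.add((user, device))

    def upload(self, user, device, audio):    # step a
        if (user, device) not in self.sessions:
            raise PermissionError("log in first")
        self.recordings.setdefault(user, []).append((device, audio))

    def generate(self, user, device):         # step b
        audio = [a for _, a in self.recordings.get(user, [])]
        if not audio:
            raise LookupError("no recordings stored for this user")
        # Placeholder engine: a real one would be trained or adapted on `audio`
        # and tailored to the characteristics of `device`.
        self.engines[(user, device)] = {
            "user": user,
            "device": device,
            "trained_on": len(audio),
        }

    def download(self, user, device):         # step c
        return self.engines[(user, device)]


platform = Platform()
platform.login("user-a", "phone-2")
platform.upload("user-a", "phone-2", b"sample-1")
# A new device needs only login, generation and download; no new recording.
platform.login("user-a", "phone-2'")
platform.generate("user-a", "phone-2'")
engine = platform.download("user-a", "phone-2'")
```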

The steps of transmitting recordings over the network for storage in the storage unit 20, and of producing a speaker-dependent speech recognition engine with the generation unit 30 according to the characteristics of the device in use, can be repeated on the same device or on different devices.

In addition, in the system and method of the present invention, the device the user uses can be an electronic device such as a mobile phone, a desktop computer, a notebook computer or a personal digital assistant, all of which fall within the applicable scope of the invention. The network used includes computer networks, mobile telephone communication networks, fixed-line telephone communication networks and the like.

The embodiments described above are preferred embodiments of the present invention, but the embodiments of the invention are not limited to them. Any other change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention is an equivalent replacement and falls within the scope of protection of the present invention.

Claims

1. A system for providing a speech recognition engine over a network, characterized in that it comprises:

a storage unit, for storing the speech that a user records on a recording device; and

a speech recognition engine generation unit, for producing, according to the speech recorded by the user and the characteristics of each recording device, a speaker-dependent speech recognition engine suited to use on each recording device.

2. The system for providing a speech recognition engine over a network as claimed in claim 1, characterized in that it further comprises a user login unit, by which the recording device logs in to the system over the network.

3. The system for providing a speech recognition engine over a network as claimed in claim 1 or 2, characterized in that it further comprises an engine download unit, for downloading each speaker-dependent speech recognition engine to its corresponding recording device, so that the user can use the speaker-dependent speech recognition function.

4. The system for providing a speech recognition engine over a network as claimed in claim 1, characterized in that the recording device is a mobile phone, a desktop computer, a notebook computer or a personal digital assistant.

5. The system for providing a speech recognition engine over a network as claimed in claim 2, characterized in that the network is the Internet, a mobile telephone communication network or a fixed-line telephone communication network.

6. The system for providing a speech recognition engine over a network as claimed in claim 1, characterized in that the speech recognition engine generation unit uses a model training technique or a model adaptation technique to produce, according to the characteristics of the user's voice and of the recording device, a speaker-dependent speech recognition engine suited to that recording device.

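7. A method for providing a speech recognition engine over a network, characterized in that it comprises the following steps:

a. the user records his or her speech on the recording device in use, and all recordings are transmitted over the network and stored in the storage unit of a network platform system; and

b. the speech recognition engine generation unit provided on the network produces, according to the speech recorded by the user and the characteristics of the recording device in use, the speaker-dependent speech recognition engine required by that recording device.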

8. The method for providing a speech recognition engine over a network as claimed in claim 7, characterized in that step a is preceded by step a1: the user uses the recording device to log in to the system via a user login unit provided on the network.

9. The method for providing a speech recognition engine over a network as claimed in claim 7 or 8, characterized in that step b is followed by step c: using an engine download unit provided on the network, the user downloads the generated speaker-dependent speech recognition engine over the network and installs the engine on the recording device for use.

10. The method for providing a speech recognition engine over a network as claimed in claim 7, characterized in that the recording device is a mobile phone, a desktop computer, a notebook computer or a personal digital assistant.

11. The method for providing a speech recognition engine over a network as claimed in claim 7, characterized in that the network is the Internet, a mobile telephone communication network or a fixed-line telephone communication network.

12. The method for providing a speech recognition engine over a network as claimed in claim 7, characterized in that the speaker-dependent speech recognition engine is produced by a model training technique or a model adaptation technique, according to the characteristics of the user's voice and of the recording device.