CN105376429A

CN105376429A - Cloud computing based voice ability service open system

Info

Publication number: CN105376429A
Application number: CN201510815457.2A
Authority: CN
Inventors: 兰玉杰
Original assignee: SUZHOU INDUSTRIAL PARK YUNSHI INFORMATION TECHNOLOGY Co Ltd
Current assignee: SUZHOU INDUSTRIAL PARK YUNSHI INFORMATION TECHNOLOGY Co Ltd
Priority date: 2015-11-23
Filing date: 2015-11-23
Publication date: 2016-03-02
Anticipated expiration: 2035-11-23
Also published as: CN105376429B

Abstract

The invention discloses a cloud-computing-based voice capability service open system. The system is deployed on a cloud computing platform and comprises, in order: an application database layer, which provides the application database resources required by voice services; a data service layer, which handles data access, synchronization, verification and logic conversion; a service implementation layer, which implements all functional components related to voice service capabilities; a service layer, which packages all published voice service capabilities; an enterprise service bus layer, which provides service access to external callers; a service process layer, which arranges multiple services according to the service process and provides composite services and process services by combining or orchestrating services; and a user experience layer, which supports access by all types of clients. The system achieves unified management and output of voice capabilities. A user only needs to apply for access to the voice capability open platform, which lowers the usage threshold, usage cost and development period of voice capabilities.

Description

Cloud-computing-based speech capability service open system

Technical field

The invention belongs to the field of intelligent voice services and specifically relates to a cloud-computing-based speech capability service open system.

Background art

As intelligent voice technology matures, applications of speech synthesis, voice navigation and voiceprint recognition continue to grow. Voice technology has a wide scope of application: it suits any scenario that needs IVR (interactive voice response, i.e. self-service voice interaction), such as the telecommunications, banking and insurance industries.

Speech synthesis performs text-to-speech conversion, producing artificial speech by dedicated methods. Technically, any text (including characters, letters, digits, etc.) can be converted in real time into standard human speech and read aloud.

Voice navigation belongs to the category of online speech recognition. The user's valid voice data is fed into the recognition engine in real time for decoding, and the system returns the recognition result within a short time after the user finishes speaking.

Voiceprint recognition determines a user's identity by extracting from the speaker's voice certain speech parameters that characterize the speaker.

As shown in Fig. 1, existing speech capabilities are each developed and deployed independently, and each capability belongs to a different capability platform. For capabilities to be used together, new interfaces must be developed so that they can communicate.

Under existing conditions, an enterprise that wants a given speech capability must build an independent system to obtain it. The threshold is high, the construction cycle is long, and the cost is substantial.

Existing voice platform implementations have the following defects:

1. In terms of capability construction, the prior art implements each capability by building an independent capability platform, so the capabilities are isolated from one another. When multiple speech capabilities need to be used together, each platform must develop interfaces to communicate.

2. Existing voice platforms are all deployed on traditional minicomputers or PC servers, so the investment cost is high and resource utilization is low.

3. In terms of capability use, an enterprise or individual must build an independent system to obtain these speech capabilities. The threshold is therefore high, the construction cycle is long, and the cost is substantial. Ordinary individual users, such as individual application developers, have no way of using these speech capability services at all.

4. In terms of scope of application, speech capabilities are currently used in a fairly narrow range, mostly telecommunications and banking; many other industries that need speech capabilities are unable to use them.

Summary of the invention

To overcome the singleness and limitations of the prior art in implementing speech capabilities, and the development model in which every capability requires its own application system, the object of the invention is to provide a cloud-computing-based speech capability service open system that achieves unified management and output of speech capabilities. In addition, a user who wants to use speech capabilities only needs to apply for access to the speech capability open platform, which lowers the threshold for using speech capabilities and greatly reduces the user's cost and construction cycle.

The technical solution of the invention is:

A cloud-computing-based speech capability service open system, characterized in that the system is deployed on a cloud computing platform and comprises, from top to bottom in order, an application database layer, a data service layer, a service implementation layer, a service layer, an enterprise service bus (ESB) layer, a service process layer and a user experience layer;

the application database layer provides the application database resources required by voice services;

the data service layer handles data access, synchronization, verification and conversion logic;

the service implementation layer implements all functional components related to voice service capabilities, including a speech synthesis service component, a voice navigation service component and a voiceprint recognition service component;

the service layer encapsulates all externally published voice service capabilities, including a speech synthesis service, a voice navigation service and a voiceprint recognition service;

the ESB layer provides secure, reliable, high-performance service access to external callers;

the service process layer orchestrates multiple services according to the business process and provides composite services and process services by combining or orchestrating services;

the user experience layer supports access by all types of clients, including WEB mode and ad hoc mode.

Preferably, the service implementation layer comprises a speech synthesis service component, a voice navigation service component and a voiceprint recognition service component. The speech synthesis service component performs the "text -> speech" conversion, turning any text into standard, fluent speech read aloud in real time; the voice navigation component decodes the user's valid voice data in real time and returns the recognition result within a very short time; the voiceprint recognition component determines the user's identity by extracting from the speaker's voice the speech parameters that characterize the speaker.

Preferably, the service layer comprises a speech synthesis service, a voice navigation service and a voiceprint recognition service. The speech synthesis service encapsulates the functions of the speech synthesis component and provides external systems with a standard online text-to-speech service; the voice navigation service encapsulates the functions of the voice navigation component and provides external systems with a standard online speech recognition service; the voiceprint recognition service encapsulates the functions of the voiceprint recognition component and provides external systems with a standard voice parameter recognition service.

Preferably, the system encapsulates the voiceprint recognition capability, the voice navigation capability and the speech synthesis capability as APIs and publishes them externally. The voiceprint recognition capability comprises voiceprint registration, voiceprint verification and voiceprint deregistration: voiceprint registration registers the voiceprint of a specific user in the system; voiceprint verification determines, from an input voiceprint sample, whether the voice is that of a specific user; voiceprint deregistration cancels the voiceprint of a specific user. The voice navigation capability comprises starting, pausing, resuming and stopping online speech recognition. The speech synthesis capability performs TTS playback of the input text.

The invention also discloses a method by which a user applies to use the capability APIs of the above cloud-computing-based speech capability service open system, characterized by comprising the following steps:

(1) the user submits an application development request through the Portal page provided by the speech capability service open system and registers user information;

(2) the administrator of the speech capability service open system reviews the application that the user has applied to create and, once the review passes, creates the application certificate information for the user;

(3) using the certificate provided by the speech capability service open system, the application developer looks up the required capability APIs in the system, develops the corresponding application, and runs joint debugging tests in the test environment that the system provides;

(4) after the joint debugging tests between the application and the speech capability service open system pass, the application developer submits an application access review; the administrator reviews and assesses the application's access security and performance indicators, and once the review passes, the application enters trial operation;

(5) the administrator of the speech capability service open system decides, based on the trial operation results, whether the application can go live.

Preferably, the user information in step (1) includes at least: user name, contact details, application name and application type.
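As a minimal sketch of the registration check implied by step (1), the snippet below reports which of the four minimum fields are missing from a submitted form. The field keys (`user_name`, `contact`, `app_name`, `app_type`) and the function name `missing_fields` are assumptions made for illustration; the source only names the fields and does not say how they are stored or validated.

```python
from typing import Dict, List

# Hypothetical keys for the four minimum fields named in step (1).
REQUIRED_FIELDS = ("user_name", "contact", "app_name", "app_type")


def missing_fields(form: Dict[str, str]) -> List[str]:
    """Return the required fields that are absent or blank, in declaration order."""
    return [
        name for name in REQUIRED_FIELDS
        if not str(form.get(name) or "").strip()
    ]
```

A Portal page built along these lines could refuse to submit the request until `missing_fields` returns an empty list.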

Compared with the prior art, the advantages of the invention are:

1. Intelligent voice technologies are turned into capabilities and output externally, opened to enterprise and individual users through standard API interfaces. This greatly lowers the threshold for using these capabilities and reduces the construction cycle and cost for enterprises and individuals. The scheme also uses cloud computing technology and deploys the system on a cloud platform, which greatly reduces investment cost and improves the utilization of existing computing capacity.

2. Unified management and output of speech capabilities is achieved. A user who wants to use speech capabilities only needs to apply for access to the speech capability open platform, which lowers the threshold for using speech capabilities and greatly reduces the user's cost and construction cycle.

3. Through integrated output of capabilities, the scope of application of voice technology can be greatly expanded: beyond traditional telecommunications and banking, it can serve any industry that needs identity recognition or voice interaction services.

Brief description of the drawings

The invention is further described below with reference to the drawings and embodiments:

Fig. 1 is a block diagram of the existing way of providing speech capabilities;

Fig. 2 is the architecture diagram of the cloud-computing-based speech capability service open platform of the invention;

Fig. 3 is the flow chart by which a user applies to use the capability APIs according to the invention.

Detailed description

To make the objects, technical solutions and advantages of the invention clearer, the invention is described in more detail below with reference to specific embodiments and the accompanying drawings. It should be understood that these descriptions are only exemplary and are not intended to limit the scope of the invention. In addition, descriptions of well-known structures and techniques are omitted below to avoid unnecessarily obscuring the concepts of the invention.

Embodiment:

Fig. 2 is the architecture diagram of the cloud-computing-based speech capability service open system. The system is deployed on a cloud computing platform and is divided into seven layers, which are, from top to bottom, the application database layer, data service layer, service implementation layer, service layer, ESB (enterprise service bus) layer, service process layer and user experience layer.

1. Application database layer

Provides the application database resources required by voice services.

2. Data service layer

The data service is a special kind of business service that encapsulates all business data and handles the logic needed for data access, synchronization, verification and conversion. The data service layer effectively creates an abstraction layer that shields business functions from the details of data operations.
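The abstraction this layer provides can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: `VoiceprintTemplate`, `VoiceprintDataService` and the comma-separated storage format are all hypothetical, an in-memory dict stands in for the application database, and synchronization is left out entirely.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class VoiceprintTemplate:
    """Business-level object; the fields are illustrative assumptions."""
    user_id: str
    features: List[float]


class VoiceprintDataService:
    """Hypothetical data service: callers never see the storage format."""

    def __init__(self) -> None:
        # A dict of comma-separated strings stands in for a database table.
        self._rows: Dict[str, str] = {}

    def save(self, template: VoiceprintTemplate) -> None:
        self._verify(template)
        # Conversion: business object -> storage row.
        self._rows[template.user_id] = ",".join(repr(x) for x in template.features)

    def load(self, user_id: str) -> Optional[VoiceprintTemplate]:
        row = self._rows.get(user_id)
        if row is None:
            return None
        # Conversion: storage row -> business object.
        return VoiceprintTemplate(user_id, [float(x) for x in row.split(",")])

    @staticmethod
    def _verify(template: VoiceprintTemplate) -> None:
        # Verification: reject records the business layer should never store.
        if not template.user_id:
            raise ValueError("user_id must not be empty")
        if not template.features:
            raise ValueError("features must not be empty")
```

Business code only calls `save` and `load`, so the storage format could change without touching the layers above.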

3. Service implementation layer

The component services it contains are published externally as interfaces. The layer implements all functional components related to voice service capabilities, including the speech synthesis service component, the voice navigation service component and the voiceprint recognition service component.

4. Service layer

Encapsulates all externally published voice service capabilities, including the speech synthesis service, the voice navigation service and the voiceprint recognition service.

5. ESB (enterprise service bus) layer

The enterprise service bus (ESB) is the backbone of a service-oriented architecture (SOA). Beyond service access and the communication and interaction between services, it also guarantees security, reliability and high performance. With an SOA architecture, enterprise information integration is carried out over the ESB, and application systems interact through the bus. This reduces the coupling between application systems, components and related technologies, removes the bottleneck of point-to-point integration, lowers integration development difficulty, improves reuse, raises development and operating efficiency, and makes business systems easier to restructure so that they can adapt quickly to changes in business and process.
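The decoupling described here, where callers address a service by name on the bus rather than holding a point-to-point reference, can be illustrated with a small in-process sketch. `ServiceBus` and its methods are hypothetical names; a real ESB also handles transport, security and reliability, all of which are omitted.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


class ServiceBus:
    """Minimal in-process stand-in for an enterprise service bus (illustrative only)."""

    def __init__(self) -> None:
        self._services: Dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        # Each service name may be bound to exactly one handler.
        if name in self._services:
            raise ValueError(f"service {name!r} is already registered")
        self._services[name] = handler

    def call(self, name: str, request: dict) -> dict:
        # Callers know only the service name, never the implementation.
        handler = self._services.get(name)
        if handler is None:
            raise LookupError(f"no service registered as {name!r}")
        return handler(request)
```

With N applications and M services, each application integrates once with the bus instead of once per service, which is the point-to-point bottleneck the paragraph refers to.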

6. Service process layer

Orchestrates multiple services according to the business process. By combining or orchestrating services, it implements the various complex composite services and process services that are needed.
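One way to picture service orchestration is a function that chains individual services into a composite one. The sketch below is an assumption-laden illustration: `orchestrate`, `verify_speaker`, `choose_prompt` and the context keys are invented for the example, and the stand-in services do no real voice processing.

```python
from typing import Callable, List

Service = Callable[[dict], dict]


def orchestrate(steps: List[Service]) -> Service:
    """Chain services so each step sees the context updated by earlier steps."""
    def composite(context: dict) -> dict:
        result = dict(context)  # copy so the caller's dict is not mutated
        for step in steps:
            result.update(step(result))
        return result
    return composite


# Hypothetical stand-ins for a voiceprint service and a prompt-selection step.
def verify_speaker(ctx: dict) -> dict:
    return {"verified": ctx["sample"] == ctx["enrolled"]}


def choose_prompt(ctx: dict) -> dict:
    text = "Welcome back" if ctx["verified"] else "Verification failed"
    return {"prompt_text": text}


login_flow = orchestrate([verify_speaker, choose_prompt])
```

Calling `login_flow({"sample": "a", "enrolled": "a"})` runs both steps in order and returns the merged context, which a speech synthesis step could then read aloud.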

7. User experience layer

The user experience layer supports access by all types of clients, including WEB mode, ad hoc mode and other modes.
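The seven layers described above can be written down as an ordered enumeration. This is only a sketch: the numbering follows the order in which the description lists the layers, and `request_path` assumes strict layering (each layer calls only the next one toward the database), which the source does not state explicitly.

```python
from enum import IntEnum
from typing import List


class Layer(IntEnum):
    """The seven layers, numbered in the order the description lists them."""
    APPLICATION_DATABASE = 1
    DATA_SERVICE = 2
    SERVICE_IMPLEMENTATION = 3
    SERVICE = 4
    ENTERPRISE_SERVICE_BUS = 5
    SERVICE_PROCESS = 6
    USER_EXPERIENCE = 7


def request_path(entry: Layer) -> List[Layer]:
    """Layers visited from `entry` to the application database layer.

    Assumes strict layering, which is an assumption of this sketch.
    """
    return [Layer(n) for n in range(int(entry), 0, -1)]
```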

The speech capability service open system provides functions such as capability publishing, capability management, capability access, security management, and performance and exception monitoring.
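Two of these functions, security management and performance and exception monitoring, can be sketched as a wrapper around a capability handler. Everything here is hypothetical: the source does not describe application keys, the `guarded` function, or the shape of the statistics, so treat this as one possible reading rather than the system's design.

```python
import time
from typing import Callable, Dict, Set

Handler = Callable[[dict], dict]


def guarded(name: str, handler: Handler, valid_keys: Set[str],
            stats: Dict[str, Dict[str, float]]) -> Callable[[str, dict], dict]:
    """Wrap a capability handler with a key check and basic call statistics."""
    record = stats.setdefault(name, {"calls": 0, "errors": 0, "seconds": 0.0})

    def invoke(app_key: str, request: dict) -> dict:
        # Security management: reject callers without a known application key.
        if app_key not in valid_keys:
            raise PermissionError("unknown application key")
        started = time.perf_counter()
        try:
            return handler(request)
        except Exception:
            # Exception monitoring: count the failure, then re-raise it.
            record["errors"] += 1
            raise
        finally:
            # Performance monitoring: count the call and add its duration.
            record["calls"] += 1
            record["seconds"] += time.perf_counter() - started

    return invoke
```

Rejected callers are not counted as calls in this sketch, because the key check runs before timing starts.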

The capability open system provides the following capability APIs for external callers; users (enterprises or individuals) can build the business applications they need on these APIs.

The voiceprint recognition capability API comprises:

voiceprint registration, which registers the voiceprint of a specific user in the system;

voiceprint verification, which determines from an input voiceprint sample whether the voice is that of a specific user;

voiceprint deregistration, which cancels the voiceprint of a specific user.
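The three voiceprint operations above can be sketched as a small class. The source does not say how voiceprints are represented or compared, so the feature vectors, the cosine-similarity comparison, the 0.8 threshold and all names below are assumptions chosen purely for illustration.

```python
import math
from typing import Dict, List


def _cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two feature vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class VoiceprintService:
    """Hypothetical register / verify / deregister API over feature vectors."""

    def __init__(self, threshold: float = 0.8) -> None:
        self._enrolled: Dict[str, List[float]] = {}
        self._threshold = threshold  # assumed similarity cut-off

    def register(self, user_id: str, features: List[float]) -> None:
        self._enrolled[user_id] = list(features)

    def verify(self, user_id: str, features: List[float]) -> bool:
        enrolled = self._enrolled.get(user_id)
        if enrolled is None:
            return False  # unknown or deregistered user
        return _cosine(enrolled, features) >= self._threshold

    def deregister(self, user_id: str) -> bool:
        """Remove the user's voiceprint; True if one was registered."""
        return self._enrolled.pop(user_id, None) is not None
```

After `deregister`, `verify` returns `False` for that user regardless of the sample, which matches the intent of cancelling a voiceprint.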

The voice navigation capability API comprises:

start online speech recognition, which turns on the voice navigation function;

pause online speech recognition, which pauses the voice navigation function;

resume online speech recognition, which resumes the voice navigation function;

stop online speech recognition, which stops the voice navigation function.
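The four operations above imply a simple session life cycle, which can be sketched as a state machine. The state names, the `RecognitionSession` class and the choice to reject invalid operations with `RuntimeError` are assumptions; the source lists the operations but not their allowed ordering.

```python
from enum import Enum


class RecognitionState(Enum):
    STOPPED = "stopped"
    RUNNING = "running"
    PAUSED = "paused"


# (operation, current state) -> next state; any other pair is rejected.
_TRANSITIONS = {
    ("start", RecognitionState.STOPPED): RecognitionState.RUNNING,
    ("pause", RecognitionState.RUNNING): RecognitionState.PAUSED,
    ("resume", RecognitionState.PAUSED): RecognitionState.RUNNING,
    ("stop", RecognitionState.RUNNING): RecognitionState.STOPPED,
    ("stop", RecognitionState.PAUSED): RecognitionState.STOPPED,
}


class RecognitionSession:
    """Hypothetical session object tracking the online recognition state."""

    def __init__(self) -> None:
        self.state = RecognitionState.STOPPED

    def _apply(self, operation: str) -> None:
        key = (operation, self.state)
        if key not in _TRANSITIONS:
            raise RuntimeError(f"cannot {operation} while {self.state.value}")
        self.state = _TRANSITIONS[key]

    def start(self) -> None:
        self._apply("start")

    def pause(self) -> None:
        self._apply("pause")

    def resume(self) -> None:
        self._apply("resume")

    def stop(self) -> None:
        self._apply("stop")
```

Keeping the allowed transitions in one table makes it easy to see, for example, that a paused session can be stopped directly but a stopped session cannot be paused.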

The speech synthesis capability API comprises:

speech synthesis playback, which performs TTS playback of the given input text.

The workflow by which a user applies to use the capability APIs is described in detail below with reference to Fig. 3:

1. An enterprise or individual developer submits an application development request through the Portal page provided by the speech capability service open system. The developer must register user information in the system and fill in the required fields (such as user name, contact details, application name and application type).

2. The administrator of the speech capability service open system reviews the application that the user has applied to create and, once the review passes, creates the application certificate information for the user.

3. Using the certificate provided by the system, the application developer looks up the required capability APIs in the system, develops an application that meets their own needs, and runs joint debugging tests in the test environment provided by the speech capability service open system.

4. After the joint debugging tests between the application and the speech capability service open system pass, the application developer can submit an application access review. The administrator of the speech capability service open system reviews and assesses the application's access security and performance indicators. Once the review passes, the application enters trial operation.

5. The administrator of the speech capability service open system decides, based on the trial operation results, whether the application can go live. If all indicators meet the relevant requirements during trial operation, the application is formally released; if the indicators do not meet the requirements during trial operation, the application developer is notified to modify and optimize the application.
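The five steps above can be summarized as a status machine for one application. This is a sketch under assumptions: the status names and `advance` function are invented, and only the trial-failure path (back to the developer for modification) is stated in the source. Sending a failed access review back to development and treating a failed initial review as a final rejection are guesses made so the example is complete.

```python
from enum import Enum, auto


class AppStatus(Enum):
    SUBMITTED = auto()      # step 1: development request filed through the Portal
    DEVELOPING = auto()     # step 2 passed: certificate issued, step 3 under way
    ACCESS_REVIEW = auto()  # step 4: joint debugging passed, review requested
    TRIAL = auto()          # step 4 passed: trial operation
    RELEASED = auto()       # step 5 passed: application is live
    REJECTED = auto()       # assumption: a failed initial review ends the request


# status -> (next status if the check passes, next status if it fails)
_NEXT = {
    AppStatus.SUBMITTED: (AppStatus.DEVELOPING, AppStatus.REJECTED),
    AppStatus.DEVELOPING: (AppStatus.ACCESS_REVIEW, AppStatus.DEVELOPING),
    AppStatus.ACCESS_REVIEW: (AppStatus.TRIAL, AppStatus.DEVELOPING),
    AppStatus.TRIAL: (AppStatus.RELEASED, AppStatus.DEVELOPING),
}


def advance(status: AppStatus, passed: bool) -> AppStatus:
    """Return the next status given the outcome of the current review or test."""
    if status not in _NEXT:
        raise ValueError(f"{status.name} is a final status")
    on_pass, on_fail = _NEXT[status]
    return on_pass if passed else on_fail
```

An application that passes every check moves SUBMITTED, DEVELOPING, ACCESS_REVIEW, TRIAL, RELEASED; a failed trial returns it to DEVELOPING, mirroring the "modify and optimize" branch of step 5.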

The system thus achieves unified management and output of speech capabilities. A user who wants to use speech capabilities only needs to apply for access to the speech capability open platform, which lowers the threshold for using speech capabilities and greatly reduces the user's cost and construction cycle.

It should be understood that the above embodiments of the invention are only used to illustrate or explain the principles of the invention and do not limit it. Therefore, any modification, equivalent substitution or improvement made without departing from the spirit and scope of the invention shall fall within its scope of protection. Furthermore, the appended claims are intended to cover all changes and modifications that fall within the scope and boundaries of the claims, or equivalents of such scope and boundaries.

Claims

1. A cloud-computing-based speech capability service open system, characterized in that the system is deployed on a cloud computing platform and comprises, from top to bottom in order, an application database layer, a data service layer, a service implementation layer, a service layer, an enterprise service bus (ESB) layer, a service process layer and a user experience layer;

the application database layer provides the application database resources required by voice services;

the data service layer handles data access, synchronization, verification and conversion logic;

the ESB layer provides service access to external callers;

the service process layer orchestrates multiple services according to the business process and provides composite services and process services by combining or orchestrating services;

the user experience layer supports access by all types of clients, including WEB mode and ad hoc mode.

2. The cloud-computing-based speech capability service open system according to claim 1, characterized in that the speech synthesis service component converts text into speech in real time; the voice navigation component decodes the user's valid voice data in real time and returns the speech recognition result; and the voiceprint recognition component determines the user's identity by extracting speech parameter features from the voice.

3. The cloud-computing-based speech capability service open system according to claim 1, characterized in that the speech synthesis service encapsulates the functions of the speech synthesis component and provides external systems with a standard online text-to-speech service; the voice navigation service encapsulates the functions of the voice navigation component and provides external systems with a standard online speech recognition service; and the voiceprint recognition service encapsulates the functions of the voiceprint recognition component and provides external systems with a standard voice parameter recognition service.

4. The cloud-computing-based speech capability service open system according to claim 1, characterized in that the system encapsulates the voiceprint recognition capability, the voice navigation capability and the speech synthesis capability as APIs and publishes them externally; the voiceprint recognition capability comprises voiceprint registration, voiceprint verification and voiceprint deregistration, wherein voiceprint registration registers the voiceprint of a specific user in the system, voiceprint verification determines from an input voiceprint sample whether the voice is that of a specific user, and voiceprint deregistration cancels the voiceprint of a specific user; the voice navigation capability comprises starting, pausing, resuming and stopping online speech recognition; and the speech synthesis capability performs TTS playback of the input text.

5. A method by which a user applies to use the capability APIs of the cloud-computing-based speech capability service open system according to claim 4, characterized by comprising the following steps:

(1) the user submits an application development request through the Portal page provided by the speech capability service open system and registers user information;

(2) the administrator of the speech capability service open system reviews the application that the user has applied to create and, once the review passes, creates the application certificate information for the user;

(5) the administrator of the speech capability service open system decides, based on the trial operation results, whether the application can go live.

6. The method by which a user applies to use the capability APIs according to claim 5, characterized in that the user information in step (1) includes at least: user name, contact details, application name and application type.