CN105376429A - Cloud computing based voice ability service open system - Google Patents

Cloud computing based voice ability service open system

Info

Publication number
CN105376429A
Authority
CN
China
Prior art keywords
service
voice
application
ability
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510815457.2A
Other languages
Chinese (zh)
Other versions
CN105376429B (en)
Inventor
兰玉杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SUZHOU INDUSTRIAL PARK YUNSHI INFORMATION TECHNOLOGY Co Ltd
Original Assignee
SUZHOU INDUSTRIAL PARK YUNSHI INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SUZHOU INDUSTRIAL PARK YUNSHI INFORMATION TECHNOLOGY Co Ltd
Priority to CN201510815457.2A
Publication of CN105376429A
Application granted
Publication of CN105376429B
Active
Anticipated expiration

Links

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/42204 Arrangements at the exchange for service or number selection by voice

Abstract

The invention discloses a cloud computing based voice capability service open system. The system is deployed on a cloud computing platform and comprises, in sequence: an application database layer, which provides the application database resources required by voice services; a data service layer, which handles access, synchronization, verification and logic conversion of data; a business implementation layer, which implements all functional components related to voice service capabilities; a service layer, which encapsulates all published voice service capabilities; an enterprise service bus layer, which provides service access to the outside; a business process layer, which orchestrates multiple services according to business processes and provides composite services and process services by combining or orchestrating services; and a user experience layer, which supports access by all types of clients. The system achieves unified management and output of voice capabilities. Users only need to apply for access to the voice capability open platform, which lowers the threshold, cost and development cycle of using voice capabilities.

Description

Cloud computing based voice ability service open system
Technical field
The invention belongs to the field of intelligent voice services, and specifically relates to a cloud computing based voice capability service open system.
Background
As intelligent voice technology matures, applications of speech synthesis, voice navigation and voiceprint recognition keep increasing. Voice technology is broadly applicable: it suits any scenario that needs IVR (interactive voice response, i.e. self-service voice interaction), such as the telecommunications, banking and insurance industries.
Speech synthesis converts text to speech, producing artificial speech by dedicated methods. Technically, any text (including words, letters, digits, etc.) can be converted in real time into standard human speech and read aloud.
Voice navigation is a form of online speech recognition. The user's valid speech data is fed in real time to a recognition engine for decoding, and the system returns the recognition result shortly after the user finishes speaking.
Voiceprint recognition determines a user's identity by extracting from the speaker's voice certain speech parameters that characterize the speaker.
As shown in Fig. 1, existing voice capabilities are each developed and deployed independently, and each belongs to a different capability platform. To use capabilities together, new interfaces must be developed so that the platforms can communicate.
Under these conditions, an enterprise that wants a voice capability has to build a separate system to obtain it. The entry threshold is high, the development cycle is long, and the cost is substantial.
Existing voice platform implementations have the following defects:
1. In terms of capability construction, existing implementations realize each capability by building an independent capability platform, so the capabilities are isolated from one another. When several voice capabilities must be used together, interfaces have to be developed on each platform for them to communicate.
2. Existing voice platforms are deployed on traditional minicomputers or PC servers, so investment cost is high and resource utilization is low.
3. In terms of capability use, an enterprise or individual that wants these voice capabilities has to build a separate system. The entry threshold is high, the development cycle is long, and the cost is substantial. Ordinary individual users, such as individual application developers, have no way to use these voice capability services at all.
4. In terms of scope of application, voice capabilities are currently used in a narrow range of industries, mostly telecommunications and banking; many other industries that need voice capabilities cannot use them.
Summary of the invention
In the prior art, voice capabilities are implemented singly and in isolation, following a development model in which every capability requires its own application system. To solve this, the object of the invention is to provide a cloud computing based voice capability service open system that achieves unified management and output of voice capabilities. A user who wants to use a voice capability only needs to apply for access to the voice capability open platform, which lowers the threshold for using voice capabilities and greatly reduces the user's cost and development cycle.
The technical solution of the invention is:
A cloud computing based voice capability service open system, characterized in that the system is deployed on a cloud computing platform and comprises, in order from top to bottom, an application database layer, a data service layer, a business implementation layer, a service layer, an enterprise service bus (ESB) layer, a business process layer and a user experience layer;
the application database layer provides the application database resources required by voice services;
the data service layer handles data access, synchronization, verification and conversion logic;
the business implementation layer implements all functional components related to voice service capabilities, including a speech synthesis service component, a voice navigation service component and a voiceprint recognition service component;
the service layer encapsulates all externally published voice service capabilities, including a speech synthesis service, a voice navigation service and a voiceprint recognition service;
the ESB layer provides secure, reliable, high-performance service access to the outside;
the business process layer orchestrates multiple services according to business processes and, by combining or orchestrating services, provides composite services and process services;
the user experience layer supports access by all types of clients, including web mode and ad hoc mode.
Preferably, the business implementation layer comprises a speech synthesis service component, a voice navigation service component and a voiceprint recognition service component. The speech synthesis service component performs the "text -> speech" conversion, turning any text in real time into standard, fluent speech read aloud. The voice navigation component decodes the user's valid speech data in real time and returns the recognition result within a very short time. The voiceprint recognition component determines a user's identity by extracting from the speaker's voice the speech parameters that characterize the speaker.
Preferably, the service layer comprises a speech synthesis service, a voice navigation service and a voiceprint recognition service. The speech synthesis service encapsulates the functions of the speech synthesis component and provides external systems with a standard online text-to-speech service. The voice navigation service encapsulates the functions of the voice navigation component and provides external systems with a standard online speech recognition service. The voiceprint recognition service encapsulates the functions of the voiceprint recognition component and provides external systems with a standard speech parameter recognition service.
Preferably, the system encapsulates the voiceprint recognition, voice navigation and speech synthesis capabilities as APIs and publishes them externally. The voiceprint recognition capability comprises voiceprint registration, voiceprint verification and voiceprint cancellation: registration enrolls a specific user's voiceprint in the system, verification determines from an input voiceprint sample whether the voice belongs to a specific user, and cancellation removes a specific user's voiceprint. The voice navigation capability comprises starting, pausing, resuming and stopping online speech recognition. The speech synthesis capability performs TTS playback of the input text.
The invention also discloses a method by which a user applies to use the capability APIs of the above cloud computing based voice capability service open system, characterized by comprising the following steps:
(1) the user submits an application development request through the Portal page provided by the voice capability service open system and registers user information;
(2) an administrator of the voice capability service open system reviews the application the user has requested to create and, once the review passes, creates the application's certificate information for the user;
(3) the application developer uses the certificate provided by the voice capability service open system to look up the required capability APIs in the system, develops the corresponding application, and carries out integration testing in the test environment provided by the system;
(4) after the application passes integration testing with the voice capability service open system, the application developer submits an application access review; the administrator assesses the application's access security and performance metrics, and once the review passes the application enters a trial run;
(5) the administrator of the voice capability service open system decides, based on the trial run results, whether the application can go live.
Preferably, the user information in step (1) comprises at least: user name, contact information, application name and application type.
Compared with the prior art, the advantages of the invention are:
1. Intelligent voice technologies are turned into capabilities and output externally. They are opened to enterprise and individual users through standard API interfaces, which greatly lowers the threshold for using these capabilities and reduces the development cycle and cost for enterprises and individuals. In addition, the solution uses cloud computing technology and deploys the system on a cloud platform, which greatly reduces investment cost and improves the utilization of existing computing resources.
2. Unified management and output of voice capabilities is achieved. A user who wants to use a voice capability only needs to apply for access to the voice capability open platform, which lowers the threshold for using voice capabilities and greatly reduces the user's cost and development cycle.
3. By integrating and outputting capabilities, the scope of application of voice technology can be greatly expanded: beyond traditional telecommunications and banking, it can serve any industry that needs identity verification or voice interaction services.
Description of the drawings
The invention is further described below with reference to the drawings and an embodiment:
Fig. 1 is a block diagram of how existing voice capabilities are provided;
Fig. 2 is the architecture diagram of the cloud computing based voice capability service open platform of the invention;
Fig. 3 is a flow chart of a user applying to use capability APIs according to the invention.
Detailed description
To make the object, technical solution and advantages of the invention clearer, the invention is described in more detail below with reference to an embodiment and the drawings. It should be understood that these descriptions are only exemplary and are not intended to limit the scope of the invention. In addition, descriptions of well-known structures and techniques are omitted below to avoid unnecessarily obscuring the concepts of the invention.
Embodiment:
Fig. 2 is the architecture diagram of the cloud computing based voice capability service open system. The system is deployed on a cloud computing platform and is divided into seven layers, which are, from top to bottom, the application database layer, data service layer, business implementation layer, service layer, ESB (enterprise service bus) layer, business process layer and user experience layer.
1. Application database layer
Provides the application database resources required by voice services.
2. Data service layer
Data services are a special kind of business service: they encapsulate all business data and are responsible for the logic required for data access, synchronization, verification and conversion. The data service layer effectively creates an abstraction layer that shields business functions from the details of data operations.
3. Business implementation layer
Implements all functional components related to voice service capabilities, including the speech synthesis service component, the voice navigation service component and the voiceprint recognition service component. The component services it contains are published externally in the form of interfaces.
4. Service layer
Encapsulates all externally published voice service capabilities, including the speech synthesis service, the voice navigation service and the voiceprint recognition service.
5. ESB (enterprise service bus) layer
The enterprise service bus (ESB) is the backbone of a service-oriented architecture (SOA). Beyond handling service access and the communication and interaction between services, it also guarantees security, reliability and high performance. With an SOA architecture, enterprise information is integrated over the ESB and application systems interact through the bus. This reduces the coupling between application systems, components and related technologies, eliminates the bottleneck of point-to-point integration, lowers the difficulty of integration development, improves reuse, and raises development and runtime efficiency. It also makes business systems easier to restructure flexibly so that they adapt quickly to changes in business and process requirements.
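The decoupling role of the bus described above can be sketched as follows. This is a minimal in-process illustration in Python, not the patented implementation: the `ServiceBus` class, its `register` and `call` methods, and the service name `speech_synthesis` are all hypothetical, and security, reliability and transport concerns are left out.

```python
from typing import Any, Callable, Dict


class ServiceBus:
    """Toy in-process service bus (hypothetical; for illustration only)."""

    def __init__(self) -> None:
        self._services: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, handler: Callable[..., Any]) -> None:
        # Services register under a logical name instead of exposing
        # a point-to-point interface to every caller.
        if name in self._services:
            raise ValueError(f"service already registered: {name}")
        self._services[name] = handler

    def call(self, name: str, **params: Any) -> Any:
        # Callers address a service by name only; the bus does the routing.
        try:
            handler = self._services[name]
        except KeyError:
            raise LookupError(f"unknown service: {name}") from None
        return handler(**params)


bus = ServiceBus()
# Stand-in handler: a real speech synthesis service would return audio data.
bus.register("speech_synthesis", lambda text: f"<audio for {text!r}>")
result = bus.call("speech_synthesis", text="hello")
```

Because callers depend only on the bus and a service name, a service can be replaced or moved without changing any caller, which is the reduction in coupling that the paragraph above attributes to the ESB.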
6. Business process layer
Orchestrates multiple services according to business processes. By combining or orchestrating services, it implements the complex composite services and process services that are required.
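As a rough illustration of service orchestration, the sketch below chains two stand-in services into one composite flow. Everything here is an assumption made for the example: the `compose` helper, the context-dictionary convention, and the two toy services are hypothetical and do not come from the patent.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
Step = Callable[[Context], Context]


def compose(steps: List[Step]) -> Step:
    """Chain services into one composite service.

    Each step receives the context produced by the previous step.
    """

    def composite(context: Context) -> Context:
        for step in steps:
            context = step(context)
        return context

    return composite


# Stand-in services (hypothetical): real ones would call the speech engines.
def verify_voiceprint(ctx: Context) -> Context:
    return {**ctx, "verified": ctx["sample"] == ctx["enrolled"]}


def synthesize_greeting(ctx: Context) -> Context:
    text = "Welcome back" if ctx["verified"] else "Verification failed"
    return {**ctx, "prompt_text": text}


# A composite "voice login" flow built from the two services above.
login_flow = compose([verify_voiceprint, synthesize_greeting])
ok = login_flow({"sample": "abc", "enrolled": "abc"})
bad = login_flow({"sample": "xyz", "enrolled": "abc"})
```

The point of the design is that a new business flow is assembled from existing services rather than written as a new system.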
7. User experience layer
The user experience layer supports access by all types of clients, including web mode, ad hoc mode and other modes.
The voice capability service open system provides functions such as capability publishing, capability management, capability access, security management, and performance and exception monitoring.
The capability open system provides the following capability APIs for external calls. Users (enterprises or individuals) can use these APIs to build the business applications they need.
The voiceprint recognition capability API comprises:
voiceprint registration, which registers a specific user's voiceprint in the system;
voiceprint verification, which determines from an input voiceprint sample whether the voice belongs to a specific user;
voiceprint cancellation, which removes a specific user's voiceprint.
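The three voiceprint operations could be modeled roughly as below. This is only a sketch under stated assumptions: the patent does not specify signatures, feature formats or a matching method, so the class and method names, the use of plain feature vectors, the cosine-similarity comparison and the 0.8 threshold are all hypothetical stand-ins.

```python
import math
from typing import Dict, List


class VoiceprintService:
    """In-memory stand-in for register / verify / cancel (hypothetical names)."""

    def __init__(self, threshold: float = 0.8) -> None:
        self._prints: Dict[str, List[float]] = {}
        self._threshold = threshold  # assumed similarity cut-off

    def register(self, user_id: str, features: List[float]) -> None:
        # Enroll (or overwrite) the voiceprint of a specific user.
        self._prints[user_id] = list(features)

    def verify(self, user_id: str, features: List[float]) -> bool:
        # Decide whether the input sample belongs to the given user.
        enrolled = self._prints.get(user_id)
        if enrolled is None:
            return False
        return _cosine(enrolled, features) >= self._threshold

    def cancel(self, user_id: str) -> bool:
        # Remove the user's voiceprint; report whether one existed.
        return self._prints.pop(user_id, None) is not None


def _cosine(a: List[float], b: List[float]) -> float:
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


svc = VoiceprintService()
svc.register("alice", [1.0, 0.0])
same_speaker = svc.verify("alice", [0.9, 0.1])
other_speaker = svc.verify("alice", [0.0, 1.0])
```

A production system would extract speaker features from audio with a trained model; the vectors here only stand in for that output.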
The voice navigation capability API comprises:
start online speech recognition, which turns on the voice navigation function;
pause online speech recognition, which pauses the voice navigation function;
resume online speech recognition, which resumes the voice navigation function;
stop online speech recognition, which stops the voice navigation function.
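The four operations above suggest a small session state machine, sketched here in Python. The patent names the operations but does not say which transitions are legal, so the transition table, the `State` enum and the `RecognitionSession` class are assumptions made for illustration.

```python
from enum import Enum


class State(Enum):
    STOPPED = "stopped"
    RUNNING = "running"
    PAUSED = "paused"


# Assumed transition table: operation -> {current state: next state}.
_TRANSITIONS = {
    "start": {State.STOPPED: State.RUNNING},
    "pause": {State.RUNNING: State.PAUSED},
    "resume": {State.PAUSED: State.RUNNING},
    "stop": {State.RUNNING: State.STOPPED, State.PAUSED: State.STOPPED},
}


class RecognitionSession:
    """Tracks the state of one online speech recognition session."""

    def __init__(self) -> None:
        self.state = State.STOPPED

    def apply(self, operation: str) -> State:
        # Reject unknown operations and operations that are not allowed
        # in the current state; the state is left unchanged on failure.
        try:
            self.state = _TRANSITIONS[operation][self.state]
        except KeyError:
            raise ValueError(
                f"cannot {operation} while {self.state.value}"
            ) from None
        return self.state


session = RecognitionSession()
history = [session.apply(op) for op in ("start", "pause", "resume", "stop")]
```

Keeping the rules in one table makes it easy to change which transitions are allowed without touching the session logic.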
The speech synthesis capability API comprises:
speech synthesis playback, which performs TTS playback of the specified input text.
The workflow by which a user applies to use capability APIs is described in detail below with reference to Fig. 3:
1. An enterprise or individual developer submits an application development request through the Portal page provided by the voice capability service open system. The developer must register user information in the system and fill in the required fields (such as user name, contact information, application name and application type).
2. An administrator of the voice capability service open system reviews the application the user has requested to create. Once the review passes, the administrator creates the application's certificate information for the user.
3. Using the certificate provided by the system, the application developer looks up the required capability APIs in the system, develops an application that meets the developer's own needs, and carries out integration testing in the test environment provided by the voice capability service open system.
4. After the application passes integration testing with the voice capability service open system, the application developer can submit it for access review. The administrator of the voice capability service open system assesses the application's access security and performance metrics. Once the review passes, the application enters a trial run.
5. The administrator of the voice capability service open system decides, based on the trial run results, whether the application can go live. If all metrics meet the relevant requirements during the trial run, the application is formally released. If the metrics do not meet the requirements, the application developer is notified to revise and optimize the application.
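The five steps above can be read as a linear approval pipeline, sketched below. The stage names, the `advance` and `missing_fields` helpers and the field keys are hypothetical. The source only describes what happens on failure for the trial run (the developer is told to revise), so keeping the stage unchanged after any failed check is an assumption.

```python
from enum import Enum, auto
from typing import Dict, List


class Stage(Enum):
    REGISTERED = auto()  # step 1: request submitted, user info registered
    APPROVED = auto()    # step 2: request reviewed, certificate issued
    TESTED = auto()      # step 3: integration testing passed
    TRIAL = auto()       # step 4: access review passed, trial run started
    RELEASED = auto()    # step 5: trial metrics met, application live


_ORDER = list(Stage)

# Minimum registration fields named in the text (keys are hypothetical).
REQUIRED_FIELDS = ("user_name", "contact", "app_name", "app_type")


def missing_fields(form: Dict[str, str]) -> List[str]:
    """Return the required registration fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not form.get(name)]


def advance(stage: Stage, passed: bool) -> Stage:
    """Move to the next stage when the current check passes.

    Assumption: a failed check leaves the application where it is,
    so the developer can revise and resubmit.
    """
    if not passed or stage is Stage.RELEASED:
        return stage
    return _ORDER[_ORDER.index(stage) + 1]


stage = Stage.REGISTERED
for check_passed in (True, True, True, False, True):
    stage = advance(stage, check_passed)
```

In the loop above the fourth check fails once (the trial run), so the application stays in `TRIAL` until the retry passes.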
The system thus achieves unified management and output of voice capabilities. A user who wants to use a voice capability only needs to apply for access to the voice capability open platform, which lowers the threshold for using voice capabilities and greatly reduces the user's cost and development cycle.
It should be understood that the above embodiment is only intended to illustrate or explain the principles of the invention and does not limit the invention. Therefore, any modification, equivalent replacement or improvement made without departing from the spirit and scope of the invention shall fall within its scope of protection. Furthermore, the appended claims are intended to cover all changes and modifications that fall within the scope and boundaries of the claims, or equivalents of such scope and boundaries.

Claims (6)

1. A cloud computing based voice capability service open system, characterized in that the system is deployed on a cloud computing platform and comprises, in order from top to bottom, an application database layer, a data service layer, a business implementation layer, a service layer, an enterprise service bus (ESB) layer, a business process layer and a user experience layer;
the application database layer provides the application database resources required by voice services;
the data service layer handles data access, synchronization, verification and conversion logic;
the business implementation layer implements all functional components related to voice service capabilities, including a speech synthesis service component, a voice navigation service component and a voiceprint recognition service component;
the service layer encapsulates all externally published voice service capabilities, including a speech synthesis service, a voice navigation service and a voiceprint recognition service;
the ESB layer provides service access to the outside;
the business process layer orchestrates multiple services according to business processes and, by combining or orchestrating services, provides composite services and process services;
the user experience layer supports access by all types of clients, including web mode and ad hoc mode.
2. The cloud computing based voice capability service open system according to claim 1, characterized in that the speech synthesis service component converts text into speech in real time; the voice navigation component decodes the user's valid speech data in real time and returns the recognition result; and the voiceprint recognition component determines the user's identity by extracting speech parameter features from the voice.
3. The cloud computing based voice capability service open system according to claim 1, characterized in that the speech synthesis service encapsulates the functions of the speech synthesis component and provides external systems with a standard online text-to-speech service; the voice navigation service encapsulates the functions of the voice navigation component and provides external systems with a standard online speech recognition service; and the voiceprint recognition service encapsulates the functions of the voiceprint recognition component and provides external systems with a standard speech parameter recognition service.
4. The cloud computing based voice capability service open system according to claim 1, characterized in that the system encapsulates the voiceprint recognition capability, the voice navigation capability and the speech synthesis capability as APIs and publishes them externally; the voiceprint recognition capability comprises voiceprint registration, voiceprint verification and voiceprint cancellation, wherein voiceprint registration registers a specific user's voiceprint in the system, voiceprint verification determines from an input voiceprint sample whether the voice belongs to a specific user, and voiceprint cancellation removes a specific user's voiceprint; the voice navigation capability comprises starting online speech recognition, pausing online speech recognition, resuming online speech recognition and stopping online speech recognition; and the speech synthesis capability performs TTS playback of the input text.
5. A method by which a user applies to use capability APIs of the cloud computing based voice capability service open system according to claim 4, characterized by comprising the following steps:
(1) the user submits an application development request through the Portal page provided by the voice capability service open system and registers user information;
(2) an administrator of the voice capability service open system reviews the application the user has requested to create and, once the review passes, creates the application's certificate information for the user;
(3) the application developer uses the certificate provided by the voice capability service open system to look up the required capability APIs in the system, develops the corresponding application, and carries out integration testing in the test environment provided by the system;
(4) after the application passes integration testing with the voice capability service open system, the application developer submits an application access review; the administrator assesses the application's access security and performance metrics, and once the review passes the application enters a trial run;
(5) the administrator of the voice capability service open system decides, based on the trial run results, whether the application can go live.
6. The method by which a user applies to use capability APIs according to claim 5, characterized in that the user information in step (1) comprises at least: user name, contact information, application name and application type.
CN201510815457.2A 2015-11-23 2015-11-23 Cloud computing based voice ability service open system Active CN105376429B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510815457.2A CN105376429B (en) 2015-11-23 2015-11-23 Cloud computing based voice ability service open system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510815457.2A CN105376429B (en) 2015-11-23 2015-11-23 Cloud computing based voice ability service open system

Publications (2)

Publication Number Publication Date
CN105376429A (en) 2016-03-02
CN105376429B (en) 2018-08-31

Family

ID=55378217

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510815457.2A Active CN105376429B (en) 2015-11-23 2015-11-23 Cloud computing based voice ability service open system

Country Status (1)

Country Link
CN (1) CN105376429B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100050150A1 (en) * 2002-06-14 2010-02-25 Apptera, Inc. Method and System for Developing Speech Applications
CN1630419A (en) * 2003-12-19 2005-06-22 国际商业机器公司 Method and system for service management
CN102917000A (en) * 2012-07-17 2013-02-06 上海语联信息技术有限公司 Intelligent cloud voice application service technology platform

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108133701A (en) * 2017-12-25 2018-06-08 江苏木盟智能科技有限公司 A kind of System and method for of robot voice interaction
CN108133701B (en) * 2017-12-25 2021-11-12 江苏木盟智能科技有限公司 System and method for robot voice interaction
CN109410922A (en) * 2018-10-09 2019-03-01 苏州思必驰信息科技有限公司 Resource preprocess method and system for voice dialogue platform
CN111355699A (en) * 2018-12-24 2020-06-30 中移(杭州)信息技术有限公司 Voice capability implementation system
CN111355699B (en) * 2018-12-24 2022-08-05 中移(杭州)信息技术有限公司 Voice capability implementation system

Also Published As

Publication number Publication date
CN105376429B (en) 2018-08-31

Similar Documents

Publication Publication Date Title
KR102279121B1 (en) System for securing a personal digital assistant with stacked data structures
US10157609B2 (en) Local and remote aggregation of feedback data for speech recognition
US11200886B2 (en) System and method for training a virtual agent to identify a user's intent from a conversation
KR102421668B1 (en) Authentication of packetized audio signals
US10535350B2 (en) Conflict resolution enhancement system
CN110428825B (en) Method and system for ignoring trigger words in streaming media content
US10268690B2 (en) Identifying correlated content associated with an individual
CN105244042B (en) A kind of speech emotional interactive device and method based on finite-state automata
CN112863529B (en) Speaker voice conversion method based on countermeasure learning and related equipment
CN105376429A (en) Cloud computing based voice ability service open system
US20230056680A1 (en) Integrating dialog history into end-to-end spoken language understanding systems
EP4143715A1 (en) Speaker identity and content de-identification
CN109637542A (en) A kind of outer paging system of voice
CN102917000A (en) Intelligent cloud voice application service technology platform
US10621990B2 (en) Cognitive print speaker modeler
CN112035630A (en) Dialogue interaction method, device, equipment and storage medium combining RPA and AI
US7853451B1 (en) System and method of exploiting human-human data for spoken language understanding systems
CN114726635B (en) Authority verification method and device, electronic equipment and medium
WO2021259073A1 (en) System for voice-to-text tagging for rich transcription of human speech
US20210366467A1 (en) Slang identification and explanation
CN112289314A (en) Voice processing method and device
CN107302581A (en) The method that end-to-end Internet of Things equipment wisdom contact is built based on virtual Internet of Things middleware
JP6995966B2 (en) Digital assistant processing of stacked data structures
Qi et al. Application research of a statistical regression algorithm in the IVR system
CN111988460A (en) Method and system for converting voice of dispatching telephone into text

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant