MXPA05010163A - Distributed speech service. - Google Patents

Distributed speech service.

Info

Publication number
MXPA05010163A
MXPA05010163A
Authority
MX
Mexico
Prior art keywords
language
server
protocol
information
csta
Prior art date
Application number
MXPA05010163A
Other languages
Spanish (es)
Inventor
Kuansan Wang
Original Assignee
Microsoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Corp
Publication of MXPA05010163A


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/487 Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493 Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • H04M3/4938 Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals comprising a voice browser which renders and interprets, e.g. VoiceXML
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/10 Architectures or entities
    • H04L65/102 Gateways
    • H04L65/1043 Gateway controllers, e.g. media gateway control protocol [MGCP] controllers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/1066 Session management
    • H04L65/1101 Session protocols
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/1066 Session management
    • H04L65/1101 Session protocols
    • H04L65/1104 Session initiation protocol [SIP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40 Support for services or applications
    • H04L65/401 Support for services or applications wherein the services involve a main real-time session and one or more additional parallel real-time or time sensitive sessions, e.g. white board sharing or spawning of a subconference
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/72445 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality for supporting Internet browser applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Telephonic Communication Services (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Computer And Data Communications (AREA)
  • Communication Control (AREA)
  • Multi Processors (AREA)

Abstract

The present invention relates to establishing a media channel and a signaling channel between a client and a server. The media channel uses a chosen codec and protocol for communication. Through the media channel and signaling channel, an application on the client can utilize speech services on the server.

Description

DISTRIBUTED SPEECH SERVICE CROSS-REFERENCE TO RELATED APPLICATIONS The present application claims the benefit of U.S. provisional patent application Serial No. 60/621,303, filed on October 22, 2004, the contents of which are incorporated herein by reference in their entirety.
BACKGROUND OF THE INVENTION The present invention relates to methods and systems for defining and handling computer interactions. In particular, the present invention relates to methods and systems for establishing communication protocols between devices in a system, such as a telecommunication system. Computer Supported Telecommunication Applications (CSTA) is a widely adopted standard suitable for global and enterprise communications. In particular, CSTA is a standard that specifies programmatic access to, and control of, the telecommunication infrastructure. Software can be developed for a wide variety of tasks, ranging from initiating and receiving calls on simple telephones to managing large-scale, multi-site collaborations via voice and video.
CSTA is standardized in a number of ECMA/ISO standards (ECMA International, Rue du Rhône 114, CH-1204 Geneva, www.ecma-international.org). The core operation model and the semantics of CSTA objects, services and events are defined in ECMA-269. These CSTA features are defined in an abstract and platform-independent way so that they can be adapted to various programming platforms. In addition, CSTA is accompanied by several standardized programming or protocol syntaxes, including ECMA-323, which defines the Extensible Markup Language (XML) binding to CSTA, commonly known as CSTA-XML, and ECMA-348, the Web Services Description Language (WSDL) binding. These language bindings, considered part of the CSTA standard suite, ensure maximum interoperability, making CSTA features available to computers running different operating systems through any of the standard transport protocols, including Transmission Control Protocol (TCP), Session Initiation Protocol (SIP), or Simple Object Access Protocol (SOAP). Recently, CSTA has seen strong adoption in the area of interactive voice services. This adoption has been advanced by enhanced voice services based on Speech Application Language Tags (SALT), which is described in the SALT 1.0 specification found at www.saltforum.org. Using SALT, call centers can be further automated to include various speech-related features. However, differences between call control and speech control in applications create difficulties in facilitating distributed speech services. Thus, there is a need to establish protocols to facilitate speech services.
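For illustration of the ECMA-323 XML binding mentioned above, a minimal sketch of a CSTA MakeCall request is shown below. The namespace follows the published third-edition ECMA-323 schema, while the telephone numbers are invented placeholders:

    <MakeCall xmlns="http://www.ecma-international.org/standards/ecma-323/csta/ed3">
      <callingDevice>tel:+15555550100</callingDevice>
      <calledDirectoryNumber>tel:+15555550199</calledDirectoryNumber>
    </MakeCall>

Because the binding is plain XML, the same request can be carried over TCP, SIP or SOAP without change, which is the interoperability point made above.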
SUMMARY OF THE INVENTION The present invention relates to establishing a media channel and a signaling channel between a client and a server. The media channel uses a chosen codec and protocol for communication. Through the media channel and the signaling channel, an application on the client can use speech services on the server.
BRIEF DESCRIPTION OF THE DRAWINGS Figures 1-4 illustrate illustrative computing devices for use with the present invention. Figure 5 illustrates an illustrative architecture for distributed speech services. Figure 6 illustrates an illustrative system for implementing distributed speech services. Figure 7 illustrates an illustrative method for establishing channels in a SIP environment.
Figure 8 illustrates an illustrative method for establishing channels in a web service environment.
DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS Before describing an architecture for distributed speech services and methods for implementing the same, it may be useful to describe generally computing devices that can function in the architecture. Referring now to Figure 1, an illustrative form of a data management device (PIM, PDA or the like) is illustrated at 30. However, it is contemplated that the present invention can also be practiced using other computing devices discussed below, and in particular, those computing devices having limited surface areas for input buttons or the like. For example, telephones and/or data management devices will also benefit from the present invention. Such devices will have enhanced utility compared to existing portable personal information management devices and other portable electronic devices, and the functions and compact size of such devices will more likely encourage the user to carry the device at all times. Accordingly, it is not intended that the scope of the architecture described herein be limited by the disclosure of an illustrative data management or PIM device, telephone or computer illustrated herein. An illustrative form of a mobile data management device 30 is illustrated in Figure 1. The mobile device 30 includes a housing 32 and has a user interface including a display 34, which uses a contact-sensitive display screen in conjunction with a stylus 33. The stylus 33 is used to press or contact the display 34 at designated coordinates to select a field, to selectively move a starting position of a cursor, or to otherwise provide command information, such as through gestures or handwriting. Alternatively, or in addition, one or more buttons 35 can be included on the device 30 for navigation. In addition, other input mechanisms such as rotatable wheels, rollers or the like can also be provided; however, it should be noted that the invention is not intended to be limited by these forms of input mechanisms. For example, another form of input can include a visual input, such as through computer vision. Referring now to Figure 2, a block diagram illustrates the functional components comprising the mobile device 30. A central processing unit (CPU) 50 implements the software control functions. The CPU 50 is coupled to the display 34 so that text and graphic icons generated in accordance with the controlling software appear on the display 34. A speaker 43 can be coupled to the CPU 50, typically with a digital-to-analog converter 59, to provide an audible output. Data that is downloaded or entered by the user into the mobile device 30 is stored in a non-volatile read/write random access memory store 54 bidirectionally coupled to the CPU 50. The random access memory (RAM) 54 provides volatile storage for instructions that are executed by the CPU 50, and storage for temporary data, such as register values. Default values for configuration options and other variables are stored in a read-only memory (ROM) 58. The ROM 58 can also be used to store the operating system software for the device that controls the basic functionality of the mobile device 30 and other operating system kernel functions (for example, the registration of software components in RAM 54). The RAM 54 also serves as storage for code in a manner analogous to the function of a hard drive on a PC that is used to store application programs.
It should be noted that although non-volatile memory is used to store the code, it can alternatively be stored in volatile memory that is not used for code execution. Wireless signals can be transmitted/received by the mobile device through a wireless transceiver 52, which is coupled to the CPU 50. An optional communication interface 60 can also be provided for downloading data directly from a computer (e.g., a desktop computer), or from a wireless network, if desired. Accordingly, the interface 60 can comprise various forms of communication devices, for example, an infrared link, modem, a network card, or the like. The mobile device 30 includes a microphone 29, an analog-to-digital (A/D) converter 37, and an optional recognition program (speech, DTMF, handwriting, gesture, or computer vision) stored in store 54. By way of example, in response to audible information, instructions or commands from a user of the device 30, the microphone 29 provides speech signals, which are digitized by the A/D converter 37. The speech recognition program can perform normalization and/or feature-extraction functions on the digitized speech signals to obtain intermediate speech recognition results. Using the wireless transceiver 52 or the communication interface 60, the speech data is transmitted to a remote speech server 204, discussed below and illustrated in the architecture of Figure 5. Recognition results are then returned to the mobile device 30 for rendering (for example, visual and/or audible) thereon, and eventual transmission to a web server 202 (Figure 5), wherein the web server 202 and the mobile device 30 operate in a client/server relationship. Similar processing can be used for other forms of input. For example, handwriting input can be digitized with or without pre-processing on the device 30. Like the speech data, this form of input can be transmitted to the speech server 204 for recognition, with the recognition results returned to at least one of the device 30 and/or the web server 202. Likewise, DTMF data, gesture data and visual data can be processed similarly. Depending on the form of input, the device 30 (and the other forms of clients discussed below) would include the necessary hardware, such as a camera for visual input. Figure 3 is a plan view of an illustrative embodiment of a portable telephone 80. The telephone 80 includes a display 82 and a keypad 84. Generally, the block diagram of Figure 2 applies to the telephone of Figure 3, although additional circuitry necessary to perform other functions may be required. For example, a transceiver necessary to operate as a telephone will be required for the embodiment of Figure 2; however, such circuitry is not pertinent to the present invention. In addition to the portable or mobile computing devices described above, it should also be understood that the present invention can be used with numerous other computing devices, such as a general desktop computer. For example, the present invention will allow a user with limited physical abilities to input or enter text into a computer or other computing device when other conventional input devices, such as a full alphanumeric keyboard, are too difficult to operate. The invention is also operational with numerous other general-purpose or special-purpose computing systems, environments or configurations.
Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with the invention include, but are not limited to, regular telephones (without any display), personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, radio frequency identification (RFID) devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The following is a brief description of a general-purpose computer 120 illustrated in Figure 4. However, the computer 120 is again only one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of the invention. Neither should the computer 120 be interpreted as having any dependency or requirement relating to any one or combination of the components illustrated therein. The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including memory storage devices. Tasks performed by the programs and modules are described below with the aid of the figures. Those skilled in the art can implement the description and the figures as processor-executable instructions, which can be written on any form of computer-readable medium. With reference to Figure 4, the components of the computer 120 may include, but are not limited to, a processing unit 140, a system memory 150, and a system bus 141 that couples various system components, including the system memory, to the processing unit 140. The system bus 141 may be any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include an Industry Standard Architecture (ISA) bus, a Universal Serial Bus (USB), a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus, also known as a Mezzanine bus. The computer 120 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by the computer 120 and includes both volatile and non-volatile media, and removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data.
Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by the computer 120. Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media. The system memory 150 includes computer storage media in the form of volatile and/or non-volatile memory, such as read-only memory (ROM) 151 and random access memory (RAM) 152. A basic input/output system 153 (BIOS), containing the basic routines that help to transfer information between elements within the computer 120, such as during start-up, is typically stored in the ROM 151. The RAM 152 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by the processing unit 140. By way of example, and not limitation, Figure 4 illustrates the operating system 154, application programs 155, other program modules 156, and program data 157. The computer 120 may also include other removable/non-removable, volatile/non-volatile computer storage media. By way of example only, Figure 4 illustrates a hard disk drive 161 that reads from or writes to non-removable, non-volatile magnetic media, a magnetic disk drive 171 that reads from or writes to a removable, non-volatile magnetic disk 172, and an optical disk drive 175 that reads from or writes to a removable, non-volatile optical disk 176 such as a CD-ROM or other optical media. Other removable/non-removable, volatile/non-volatile computer storage media that can be used in the illustrative operating environment include, but are not limited to, magnetic tape cassettes, flash memory cards, digital versatile disks, digital video tape, solid state RAM, solid state ROM, and the like. The hard disk drive 161 is typically connected to the system bus 141 through a non-removable memory interface such as interface 160, and the magnetic disk drive 171 and optical disk drive 175 are typically connected to the system bus 141 by a removable memory interface, such as interface 170.
The drives and their associated computer storage media, discussed above and illustrated in Figure 4, provide storage of computer-readable instructions, data structures, program modules and other data for the computer 120. In Figure 4, for example, the hard disk drive 161 is illustrated as storing operating system 164, application programs 165, other program modules 166, and program data 167. Note that these components can either be the same as or different from the operating system 154, application programs 155, other program modules 156, and program data 157. The operating system 164, application programs 165, other program modules 166, and program data 167 are given different numbers here to illustrate that, at a minimum, they are different copies. A user may enter commands and information into the computer 120 through input devices such as a keyboard 182, a microphone 183, and a pointing device 181, commonly referred to as a mouse, trackball or touch pad. Other input devices (not shown) may include a joystick, game pad, satellite dish, scanner, or the like. These and other input devices are often connected to the processing unit 140 through a user input interface 180 that is coupled to the system bus, but may be connected by other interface and bus structures, such as a parallel port, game port or a universal serial bus (USB). A monitor 184 or other type of display device is also connected to the system bus 141 via an interface, such as a video adapter 185. In addition to the monitor, computers may also include other peripheral output devices such as speakers 187 and a printer 186, which may be connected through a peripheral output interface 188. The computer 120 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 194. The remote computer 194 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 120. The logical connections depicted in Figure 4 include a local area network (LAN) 191 and a wide area network (WAN) 193, but may also include other networks. Such networking environments are commonplace in offices, enterprise-wide computer networks, intranets and the Internet. When used in a LAN networking environment, the computer 120 is connected to the LAN 191 through a network interface or adapter 190. When used in a WAN networking environment, the computer 120 typically includes a modem 192 or other means for establishing communications over the WAN 193, such as the Internet. The modem 192, which may be internal or external, may be connected to the system bus 141 via the user input interface 180, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 120, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation, Figure 4 illustrates remote application programs 195 as residing on the remote computer 194. It will be appreciated that the network connections shown are illustrative, and other means of establishing a communications link between the computers may be used. Figure 5 illustrates an architecture 200 for distributed speech services as can be embodied in the present invention.
Generally, information stored on a web server 202 can be accessed through the mobile device 30 (which here also represents other forms of computing devices having a display screen, a microphone, a camera, a touch-sensitive panel, etc., as required based on the form of input), or through the telephone 80, wherein information is requested audibly or through tones generated by the telephone 80 in response to keys depressed, and wherein information from the web server 202 is provided only audibly back to the user. More importantly though, the architecture 200 is unified in that, whether information is obtained through the device 30 or the telephone 80 using speech recognition, a single speech server 204 can support either mode of operation. In addition, the architecture 200 operates using an extension of well-known markup languages (e.g., HTML, XHTML, cHTML, XML, WML, and the like). Thus, information stored on the web server 202 can also be accessed using well-known GUI methods found in these markup languages. By using an extension of well-known markup languages, authoring on the web server 202 is easier, and currently existing legacy applications can also be easily modified to include voice recognition. Generally, the device 30 executes HTML+ scripts, or the like, provided by the web server 202. When speech recognition is required, by way of example, speech data, which can be digitized audio signals or speech features wherein the audio signals have been pre-processed by the device 30 as discussed above, are provided to the speech server 204 with an indication of a grammar or language model to use during speech recognition. The implementation of the speech server 204 can take many forms, one of which is illustrated, but generally includes a recognizer 211. The recognition results are provided back to the device 30 for local rendering if desired or appropriate.
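As a hedged sketch of the markup-extension approach described above, SALT-style tags can be embedded in an ordinary HTML page; the grammar file, element names and bind target below are invented for illustration:

    <input name="txtBoxCity" type="text" />
    <salt:listen id="listenCity">
      <salt:grammar src="./city.grxml" />
      <salt:bind targetelement="txtBoxCity" value="//city" />
    </salt:listen>

In such a page, the <salt:listen> element would route captured audio to a recognizer such as recognizer 211, and the <salt:bind> element would copy the recognized city name into the text field.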
Having compiled information through recognition and any graphical user interface, if used, the device 30 sends the information to the web server 202 for further processing and receipt of further HTML scripts, if necessary. As illustrated in Figure 5, the device 30, the web server 202 and the speech server 204 are commonly connected, and separately addressable, through a network 205, here a wide area network such as the Internet. It is therefore not necessary that any of these devices be physically located adjacent to one another. In particular, it is not necessary that the web server 202 include the speech server 204. In this manner, authoring on the web server 202 can be focused on the application to which it is intended, without the authors needing to know the intricacies of the speech server 204. Rather, the speech server 204 can be independently designed and connected to the network 205, and can thereby be updated and improved without further changes required on the web server 202. In another embodiment, the client 30 can communicate directly with the speech server 204, without the need for the web server 202. It will further be appreciated that the web server 202, the speech server 204 and the client 30 may be combined depending on the capabilities of the implementing machines. For instance, if the client comprises a general-purpose computer, for example a personal computer, the client may include the speech server 204. Likewise, if desired, the web server 202 and the speech server 204 can be incorporated into a single machine. Access to the web server 202 through the telephone 80 includes connection of the telephone 80 to a wired or wireless telephone network 208, which in turn connects the telephone 80 to a third-party gateway 210. The gateway 210 connects the telephone 80 to a telephony voice browser 212. The telephony voice browser 212 includes a media server 214 that provides a telephony interface, and a voice browser 216. Like the device 30, the telephony voice browser 212 receives HTML scripts or the like from the web server 202. More importantly though, the HTML scripts are of a form similar to the HTML scripts provided to the device 30. In this manner, the web server 202 need not support the device 30 and the telephone 80 separately, or even support standard GUI clients separately. Rather, a common markup language can be used. In addition, like the device 30, voice recognition of audible signals transmitted by the telephone 80 is provided from the voice browser 216 to the speech server 204, either through the network 205 or through a dedicated line 207, for example using TCP/IP. The web server 202, the speech server 204 and the telephony voice browser 212 can be embodied in any suitable computing environment, such as the general-purpose desktop computer illustrated in Figure 4.
However, it should be noted that if DTMF recognition is employed, this form of recognition would generally be performed at the media server 214, rather than at the speech server 204. In other words, the DTMF grammar would be used by the media server 214. Given the devices and architecture described above, the present invention will further be described based on a simple client/server environment. As illustrated in Figure 6, the present invention pertains to a system 300 comprising a server 302 that provides media services (e.g., speech recognition or text-to-speech synthesis) and a client 304 that executes application-specific code. Communication between the server 302 and the client 304 is based on a service model in which the information exchanged is tagged or otherwise includes identified portions, such as, but not limited to, XML (Extensible Markup Language) documents. The server 302 and/or the client 304 can collect and transmit audio in addition to other information. In one embodiment, the server 302 can comprise the Microsoft Speech Server developed by Microsoft Corporation of Redmond, Washington, while the client 304 can take any number of forms as discussed above, including, but not limited to, desktop PCs, mobile devices, etc. At this point it should be noted that, although the server 302 and the client 304 communicate with each other based on a service model, the application that invokes aspects of the present invention need not be written exclusively on a service model, in that declarative and/or procedural-based applications can be used as long as the communication between the server 302 and the client 304 conforms to the service model. In one embodiment, the client application can be written in C++, Java, C# or other imperative programming languages that do not require a browser, as in the case of the HTML-based applications described in Figure 5. An important aspect of edition 6 of CSTA (ECMA-269) is the enhanced voice services based on Speech Application Language Tags (SALT). The most recently added features include automatic speech recognition, speech verification, speaker identification, speaker verification and text-to-speech synthesis, which can be implemented in the system 300. Some or all of these features are provided in automated call centers. Aspects of the present invention provide a subset of CSTA services to facilitate network-based speech services. In particular, some aspects of the present invention illustrate how ECMA-348 and uaCSTA (ECMA TR/87) can be applied to provide distributed speech services in a web service environment and a SIP (Session Initiation Protocol) based VoIP (Voice over Internet Protocol) environment, respectively.
Services for Computer Supported Telecommunications Applications (CSTA) are defined in ECMA-269, and their XML and web service protocols are defined by ECMA-323 and ECMA-348, respectively. Recently, ECMA TR/87 (uaCSTA) has also described a set of SIP conventions for using ECMA-323 in a VoIP environment. All of these protocols address the entire CSTA suite from the outset, and hence are applicable to specific voice services. In the sixth edition of ECMA-269, the voice services portion of CSTA has been augmented based on technology derived from SALT. In addition to the existing voice services, new additions include key features that are essential for call center automation and mobile applications, including automatic speech recognition, speech verification, speaker identification, speaker verification, text-to-speech synthesis, etc. Although well-integrated CSTA implementations of call control and voice scenarios are desirable for application developers, the core competencies of call control vendors and speech vendors are not necessarily the same. For current deployment and the foreseeable future, CSTA application developers may need to involve multiple vendors to meet their respective needs in these areas. Fortunately, the CSTA modeling concept, as illustrated in ECMA-269, allows a single application to consume services from multiple CSTA service providers. It is therefore a valid scenario in which a CSTA application will simultaneously use two implementations of CSTA, one for call control and one for voice services. However, CSTA profiles for speech services have not been refined as they have in the call control area. Aspects of the present invention describe a CSTA profile for providing speech services in a platform-independent manner using XML. Although the CSTA profile is transport-agnostic in nature, two common applications of the speech service profile are exemplified here to better promote interoperability: the SIP environment based on uaCSTA, and the web service environment based on ECMA-348. The description provided here gives examples of how subsets of CSTA voice services can be included to facilitate server-based speech processing. The following ECMA standards are incorporated herein by reference in their entirety: ECMA-269, Services for Computer Supported Telecommunications Applications (CSTA) Phase III; ECMA-323, XML Protocol for Computer Supported Telecommunications Applications (CSTA) Phase III; and ECMA-348, Web Services Description Language (WSDL) for CSTA. In addition, this application describes how CSTA speech services can be implemented in a SIP-based VoIP environment using the uaCSTA proposal. ECMA TR/87 should also be used as a reference for uaCSTA, a copy of which is incorporated herein by reference. The client-server based speech processing described here is capable of handling asymmetric media types in a request/response cycle. For example, in providing a speech recognition service, a client transmits audio data to a server. The server converts the audio data to text data and transmits the converted data back to the client. In the case of speech synthesis, the client transmits text data and the server responds with the converted audio data. The transmitted data can be sent according to a specified protocol, such as one based on CSTA. As a result, SIP and web service environments can be extended to include text-in/audio-out and audio-in/text-out interactions.
ECMA TR/87 establishes a "signaling channel" transport 308 as illustrated in Figure 6. The signaling channel 308 is used by the server 302 and the client 304 to exchange information on what each should do as it pertains to call controls. When the server 302 comprises a telephone switch, use of the signaling channel 308 is sufficient. However, if the server 302 is a speech server and the client 304 is requesting a speech service, the server 302 must also know where to receive and transmit speech information. For example, the server 302 must know where to obtain the speech information to be recognized and where to send the synthesized speech. Therefore, in addition to establishing the signaling channel 308, a "media channel" 310 must also be established. For example, the media channel 310 is used to transport the speech data (audio data) collected by the client 304 to the server 302. Likewise, in a text-to-speech operation, the client 304 can send the text data through the signaling channel 308, while the synthesized speech data is provided back to the client 304 from the server 302 through the media channel 310. With respect to the architecture of Figure 5, the signaling channel 308 and the media channel 310 are established for any communication with the speech server 204. However, it should be noted that use of the web application server 202 is optional and that the application can reside on the client 30 as illustrated in Figure 5. One aspect of the present invention concerns what steps are taken to implement the media channel 310. In one illustrative embodiment, establishment of the media channel 310 for CSTA in a SIP environment is discussed. In another illustrative embodiment, the steps taken to implement the media channel 310 for CSTA in a web service based environment are discussed. It is important to note that semantic information can be transferred between the server 302 and the client 304, for example by using the Speech Application Description Language (SADL), which can specify the XML schema for results returned by the listener resource, that is, results returned by the speech-recognition-enabled service 302.
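To make the asymmetry of the text-to-speech operation described above concrete, a sketch follows. The element names here are purely illustrative (they are not drawn from ECMA-269 or ECMA-323); the point is only the direction of flow, with text traveling over the signaling channel 308 and synthesized audio returning over the media channel 310:

    <!-- sent from client 304 to server 302 over the signaling channel 308;
         illustrative element names, not the actual CSTA schema -->
    <SpeakRequest>
      <text>Your flight departs at six thirty.</text>
    </SpeakRequest>
    <!-- the synthesized audio (e.g., G.711 over RTP) then flows from
         server 302 back to client 304 over the media channel 310 -->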
ESTABLISHING CHANNELS IN A SIP ENVIRONMENT SIP is a protocol that is designed to be "chatty," in that the server 302 and the client 304 exchange small pieces of information frequently. In the SIP environment, establishment of the media channel 310 is performed through the Session Description Protocol (SDP). An illustrative method 400 for performing this task is illustrated in Figure 7. At step 402, the client 304 initiates a session with the server 302 using a SIP INVITE. An SDP description is also sent that declares an IP (Internet Protocol) address to be used and a port at that IP address to be used for audio. In addition, at step 404, the SDP description indicates which type of codec is to be used for the media stream and a communication protocol, such as the transmission control protocol (TCP) or the real-time transport protocol (RTP). Upon receipt at the server, the server can decide whether to accept the SDP description proposed by the client 304, at step 406. If the protocol and the codec are accepted, the server 302 responds with a SIP acceptance and with its own SDP description listing its IP address and audio port. The method 400 then proceeds to step 408, where a signaling channel is established. In the alternative, if the server 302 does not support the proposed codec or protocol, the server 302 can begin to negotiate with the client 304 as to which codec and/or protocol will be used. In other words, the server 302 will respond to the initial SDP description of the client 304 with a counter-offer proposing a different codec and/or protocol. Before making a proposal, the method 400 proceeds to step 410, where a determination is made as to whether negotiation should continue. For example, at step 412, after a specified number of counter-offers have been proposed, communication will stop. Additional counter-offers can be made between the client 304 and the server 302 at step 414 until an agreement is reached or until it is clear that no agreement will be reached. SIP/SDP is a standard approved by the Internet Engineering Task Force (IETF) that is used to set up the audio channel for voice over IP. However, SIP/SDP does not describe a method for establishing a signaling channel implementing CSTA. At step 408, the signaling channel 308 is established per ECMA TR/87. After establishment of the signaling channel, the application association is considered complete. As a result, distributed speech services can be implemented in the system 300.
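For concreteness, a minimal SDP offer of the kind exchanged at steps 402 and 404 might look as follows; the host name, session identifiers and port are invented for illustration, and payload type 0 (PCMU) declares G.711 µ-law audio over RTP:

    v=0
    o=client 2890844526 2890844526 IN IP4 client.example.com
    s=CSTA speech session
    c=IN IP4 client.example.com
    t=0 0
    m=audio 49170 RTP/AVP 0
    a=rtpmap:0 PCMU/8000

If the server does not support the offered codec, its counter-offer at step 414 would simply list a different payload type on the m= line.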
ESTABLISHING CHANNELS IN A WEB SERVICE ENVIRONMENT In contrast to the "chatty" nature of SIP described above, web services are designed for, and frequently used in, "terse" communications, such that fewer dialog exchanges between the server 302 and the client 304 are required. As a result, features that are negotiated over multiple dialog exchanges in SIP are usually described and discovered through service descriptions published in public directories for the web services, or obtained dynamically through a web service metadata exchange. A web service environment includes the standard UDDI (Universal Description, Discovery and Integration) protocol. Web service providers publish relevant information that developers can discover and inspect in order to choose the appropriate service provider, which allows application developers to dynamically integrate the web service into an application. For example, ECMA-348 specifies the Web Services Description Language (WSDL) for CSTA so that web services offering CSTA functionality can be uniformly described, discovered and integrated using standard web service protocols. Establishment of the media channel is an extension to ECMA-348. Figure 8 illustrates an illustrative method 420 for establishing channels in a web service environment. In the present invention, web service providers list, as service metadata, all of the codecs and protocols that are supported by the web service, at step 422. An application developer can use the web service directory providers to obtain or discover which web service has a codec or protocol that can be used, at step 424. This step can be performed by searching through the metadata of each published web service in order to find the desired codec and protocol required. The directory provides a URL (universal resource locator) address for each web service. The client 304 then makes a connection to the web service and uses an application with the desired codec and protocol to communicate with the server 302. After a connection is made, the media channel 310 and the signaling channel 308 are established in a single step. The invention, in the web service environment, addresses how to establish connections at all levels (application and transport) in one exchange, through a media description extension to WSDL. In one embodiment, the invention can be applied in conjunction with ECMA-348, which already has a mechanism for establishing CSTA and its signaling transport protocol. By adding the media encoding and transport protocol extension to ECMA-348, CSTA is thereby enhanced to establish the signaling and media channels in a single step. In another embodiment, the media description is conveyed using the web service addressing extensibility, or WS-Addressing, protocol as a step preceding the CSTA application association. WS-Addressing (WSA) is a specification that provides transport-neutral mechanisms to address web service endpoints and messages. Both CSTA switching functions and CSTA applications are web service endpoints. WS-Addressing introduces a new construct, called the endpoint reference, which supports dynamic usage of services not appropriately covered by the <wsdl:service> and <wsdl:port> elements in WSDL. WS-Addressing defines an XML document type (wsa:EndpointReferenceType) to represent an endpoint reference. An XML element, wsa:EndpointReference, is also specified to be of that type. Both reside in the XML namespace http://schemas.xmlsoap.org/ws/2004/03/addressing.
A WSA endpoint reference type can include the following: [address]: a URI (Uniform Resource Identifier) that identifies the endpoint. [reference properties]: <xs:any/> (0..unbounded), specific properties, one for each entity or resource being conveyed. [selected port type]: QName (0..1), the name of the primary port type, as defined in WSDL, of the endpoint. [service and port]: (QName, NCName (0..1)) (0..1), the service, and port, as defined in WSDL, corresponding to the endpoint. [policy]: optional WS-Policy elements that describe the behavior, requirements and capabilities of the endpoint. As in the case of SIP, establishing an audio channel is necessary for CSTA speech services. Just as an audio channel can be negotiated in SIP through SDP, the WSA endpoint reference can be used by speech service providers to declare the media endpoint. Media transport protocols and encoding mechanisms are among the critical items that need to be specified in order to provide speech services. These items are declared as reference properties. To improve robustness, the media channel in a web service environment is modeled as a lease from the server (the CSTA voice resource provider) to the client (the CSTA application), and the lease expires over time. The server can also designate a lease manager with which the client can cancel or renew the lease. A CSTA media endpoint reference type, with an XML schema, includes one or multiple WSA endpoint references. For example, a CSTA speech service provider using the G.711 codec over the Real-time Transport Protocol (RTP) on port 6060 can describe the media endpoint as follows:

    <csta:MediaEndpointReference
        xmlns:csta="http://www.ecma-international.org/TR/xx"
        xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing">
      <wsa:Address>rtp://server.acme.com:6060</wsa:Address>
      <wsa:ReferenceProperties>
        <csta:Codec>G.711</csta:Codec>
        <csta:SubscriptionID>12345</csta:SubscriptionID>
        <csta:Expires>2004-10-21T21:07:00.000-08:00</csta:Expires>
      </wsa:ReferenceProperties>
    </csta:MediaEndpointReference>

The CSTA media endpoint reference properties include a codec declaration, a subscription identifier, and an optional lease expiration declaration. As in the case of uaCSTA, where the media channel is established together with the signaling channel, the above media endpoint reference must be included before the CSTA application association process under web service environments. Taking advantage of the extensibility of the WS protocols, a speech session can be established using <wsa:Action>. The media endpoint reference itself can be a reference property in the endpoint reference of the CSTA web service provider. A Simple Object Access Protocol (SOAP) message is composed by attaching the media endpoint reference immediately after <wsa:To>, as shown below:

    <soap:Envelope
        xmlns:soap="http://www.w3.org/2003/05/soap-envelope"
        xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/03/addressing"
        xmlns:csta="http://www.ecma-international.org/TR/xx">
      <soap:Header>
        <wsa:ReplyTo>
          <wsa:Address>http://client.example.com</wsa:Address>
        </wsa:ReplyTo>
        <wsa:To>http://server.acme.com</wsa:To>
        <csta:MediaEndpointReference>
          ...
        </csta:MediaEndpointReference>
        <wsa:Action>
          http://www.ecma-international.org/TR/xx/CreateSession
        </wsa:Action>
        <wsa:MessageID>
          ...
        </wsa:MessageID>
      </soap:Header>
      <soap:Body>
        ...
      </soap:Body>
    </soap:Envelope>

Services are described by such metadata as WS-Policy and WSDL. While WS-Policy describes the general capabilities, requirements and characteristics of a service, WSDL describes the abstract message operations and the specific network protocols and addresses used to reach the web service. Web Services Metadata Exchange, WS-MetadataExchange (WS-MEX or WSX), is a specification aimed at the retrieval of metadata. A client can send a WS-MEX request to an endpoint to obtain its metadata. A normative outline for the request using SOAP is as follows:

    <soap:Envelope ...>
      <soap:Header>
        <wsa:Action>
          http://schemas.xmlsoap.org/ws/2004/09/mex/GetMetadata/Request
        </wsa:Action>
        <wsa:MessageID><xs:anyURI/></wsa:MessageID>
        <wsa:ReplyTo>WS-Addressing endpoint reference</wsa:ReplyTo>
        <wsa:To><xs:anyURI/></wsa:To>
      </soap:Header>
      <soap:Body>
        <wsx:GetMetadata ...>
          [<wsx:Dialect [Identifier='<xs:anyURI/>']?>
            <xs:anyURI/>
          </wsx:Dialect>]*
        </wsx:GetMetadata>
      </soap:Body>
    </soap:Envelope>

As shown in the SOAP header, WS-MEX uses WS-Addressing to specify the request to retrieve metadata. The target service is specified as a URI in <wsa:To>, and the reply endpoint is specified using a WS-Addressing endpoint reference in the content of <wsa:ReplyTo>. The types of metadata to be retrieved are specified in the content of <wsx:GetMetadata> in the SOAP body. If an endpoint accepts a metadata request, it must respond with a metadata response message. The normative outline of the response in SOAP is as follows:

    <soap:Envelope ...>
      <soap:Header ...>
        <wsa:Action>
          http://schemas.xmlsoap.org/ws/2004/09/mex/GetMetadata/Response
        </wsa:Action>
        <wsa:RelatesTo>previous message id</wsa:RelatesTo>
        <wsa:To><xs:anyURI/></wsa:To>
      </soap:Header>
      <soap:Body>
        <wsx:Metadata ...>
          [<wsx:MetadataSection Dialect="dialect URI" [Identifier='previous identifier']?>
            <xs:any/> <!-- service-specific metadata -->
          |
            <wsx:MetadataReference>
              WS-Addressing endpoint reference
            </wsx:MetadataReference>
          |
            <wsx:Location><xs:anyURI/></wsx:Location>
          </wsx:MetadataSection>]*
        </wsx:Metadata>
      </soap:Body>
    </soap:Envelope>

Conveyed in the SOAP body, the metadata can be returned inline as the content of the <wsx:Metadata> element, or by reference using a WS-Addressing endpoint reference or simply a URI. The above SOAP messages can have WSDL bindings as follows:

    <wsdl:message name="GetMetadataMessage">
      <wsdl:part name="body" element="tns:GetMetadata"/>
    </wsdl:message>
    <wsdl:message name="GetMetadataResponseMessage">
      <wsdl:part name="body" element="tns:Metadata"/>
    </wsdl:message>
    <wsdl:portType name="MetadataExchange">
      <wsdl:operation name="GetMetadata">
        <wsdl:input message="tns:GetMetadataMessage"
            wsa:Action="http://schemas.xmlsoap.org/ws/2004/09/mex/GetMetadata/Request"/>
        <wsdl:output message="tns:GetMetadataResponseMessage"
            wsa:Action="http://schemas.xmlsoap.org/ws/2004/09/mex/GetMetadata/Response"/>
      </wsdl:operation>
    </wsdl:portType>

The CSTA media description is a type of metadata that CSTA applications must obtain from the voice service provider, and WS-MEX is particularly suitable here. Following is a sample SOAP message to retrieve the media endpoint reference:

    <soap:Envelope
        xmlns:soap="http://www.w3.org/2003/05/soap-envelope"
        xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing"
        xmlns:wsx="http://schemas.xmlsoap.org/ws/2004/09/mex"
        xmlns:csta="http://www.ecma-international.org/TR/xx">
      <soap:Header>
        <wsa:Action>
          http://schemas.xmlsoap.org/ws/2004/09/mex/GetMetadata/Request
        </wsa:Action>
        <wsa:MessageID>
          uuid:12345edf-53c1-4923-ba23-23459cee433e
        </wsa:MessageID>
        <wsa:ReplyTo>
          <wsa:Address>http://client.example.com/MyEndpoint</wsa:Address>
        </wsa:ReplyTo>
        <wsa:To>http://server.acme.org</wsa:To>
      </soap:Header>
      <soap:Body>
        <wsx:GetMetadata>
          <wsx:Dialect>
            http://www.ecma-international.org/TR/xx/MediaEndpoint
          </wsx:Dialect>
        </wsx:GetMetadata>
      </soap:Body>
    </soap:Envelope>

The example demonstrates a client application, located at client.example.com, requesting the media endpoint reference from a CSTA speech service provider at server.acme.org. Because a specific dialect is given, the server must respond only with metadata of the desired type. A SOAP response message might be:

    <soap:Envelope ...>
      <soap:Header>
        <wsa:Action>
          http://schemas.xmlsoap.org/ws/2004/09/mex/GetMetadata/Response
        </wsa:Action>
        <wsa:RelatesTo>
          uuid:12345edf-53c1-4923-ba23-23459cee433e
        </wsa:RelatesTo>
        <wsa:To>http://client.example.com/MyEndpoint</wsa:To>
      </soap:Header>
      <soap:Body>
        <wsx:Metadata>
          <wsx:MetadataSection
              Dialect="http://www.ecma-international.org/TR/xx/MediaEndpoint">
            <csta:MediaEndpointReference>
              <wsa:Address>rtp://server.acme.org:6060</wsa:Address>
              <wsa:ReferenceProperties>
                <csta:Codec>G.711</csta:Codec>
                <csta:SubscriptionID>12345</csta:SubscriptionID>
                <csta:Expires>2004-10-21T21:00:00.0-08:00</csta:Expires>
              </wsa:ReferenceProperties>
            </csta:MediaEndpointReference>
          </wsx:MetadataSection>
        </wsx:Metadata>
      </soap:Body>
    </soap:Envelope>

The speech application description is another type of metadata that a speech service can provide. Multiple types of metadata can be obtained at the same time by populating <wsx:GetMetadata> with the respective URIs through <wsx:Dialect>. The following is an example of the SOAP body for obtaining both the media endpoint reference and the speech application description:

    <wsx:GetMetadata>
      <wsx:Dialect>
        http://www.ecma-international.org/TR/xx/MediaEndpoint
      </wsx:Dialect>
      <wsx:Dialect>
        http://www.ecma-international.org/TR/xx/SpeechApplicationDescription
      </wsx:Dialect>
    </wsx:GetMetadata>

The corresponding response in the SOAP body:

    <wsx:Metadata>
      <wsx:MetadataSection
          Dialect="http://www.ecma-international.org/TR/xx/MediaEndpoint">
        ...
      </wsx:MetadataSection>
      <wsx:MetadataSection
          Dialect="http://www.ecma-international.org/TR/xx/SpeechApplicationDescription">
        <csta:resource id="US_AddressRecognition">
          <csta:type>listener</csta:type>
          <csta:grammar uri="urn:acme.com/address/street_number.grxml"
              schema="urn:acme.com/address/street_number.xsd"/>
          <csta:grammar uri="urn:acme.com/address/city.grxml">
            <csta:rule id="zipcode" schema="urn:acme.com/address/zip.xsd"/>
            <csta:rule id="city_state" schema="urn:acme.com/address/city.xsd"/>
          </csta:grammar>
        </csta:resource>
      </wsx:MetadataSection>
    </wsx:Metadata>

While web services operate in a one-way, request-and-response model, web services often wish to receive messages when events occur in other services or applications. Web Services Eventing, or WS-Eventing (WSE), is a specification to facilitate event notification. WS-Eventing defines how a web service can subscribe to events on behalf of another service or application, and allows applications to specify how event messages are delivered. It supports a wide variety of event topologies, allowing the event source and the final event sink to be decoupled. These properties are suitable for a wide range of CSTA applications, ranging from call centers to mobile computing. The use of WS-Eventing is provided for because CSTA voice services need event notification to function. Although the present invention has been described with reference to particular embodiments, workers skilled in the art will recognize that changes can be made in form and detail without departing from the spirit and scope of the invention.
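As a hedged illustration of the WS-Eventing mechanism described above, the sketch below shows how a CSTA application might subscribe to speech events. Element names follow the 2004/08 WS-Eventing schema, while the delivery address, target service URI and expiration are invented for illustration:

    <soap:Envelope
        xmlns:soap="http://www.w3.org/2003/05/soap-envelope"
        xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing"
        xmlns:wse="http://schemas.xmlsoap.org/ws/2004/08/eventing">
      <soap:Header>
        <wsa:Action>
          http://schemas.xmlsoap.org/ws/2004/08/eventing/Subscribe
        </wsa:Action>
        <wsa:To>http://server.acme.org</wsa:To>
      </soap:Header>
      <soap:Body>
        <wse:Subscribe>
          <wse:Delivery>
            <wse:NotifyTo>
              <wsa:Address>http://client.example.com/MyEndpoint</wsa:Address>
            </wse:NotifyTo>
          </wse:Delivery>
          <wse:Expires>2004-10-22T00:00:00Z</wse:Expires>
        </wse:Subscribe>
      </soap:Body>
    </soap:Envelope>

Event notifications (for example, recognition-completed events from a listener resource) would then be delivered to the NotifyTo address until the subscription expires or is cancelled.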

Claims (20)

1. A method of communication between a client and a server, comprising: establishing a media channel; establishing a signaling channel; and exchanging information between the client and the server through at least one of the media channel and the signaling channel.
2. The method according to claim 1, wherein establishing the media channel further comprises establishing a codec and a protocol.
3. The method according to claim 1, wherein the exchange of information is performed in a Session Initiation Protocol (SIP) environment.
4. The method according to claim 1, wherein the exchange of information is performed in a web services environment.
5. The method according to claim 1, wherein establishing the media channel includes proposing a codec and a protocol to be used for the media channel.
6. The method according to claim 1, wherein establishing the media channel includes declaring an Internet protocol address and a port associated with the Internet protocol address.
7. The method according to claim 1, and further comprising providing a list of at least one codec and at least one protocol used to establish the media channel.
8. The method according to claim 7, and further comprising referring to the list to establish the media channel.
9. The method according to claim 1, wherein the exchange of information includes transmitting speech data through the media channel.
10. A computer-readable medium having instructions to provide speech services, the instructions comprising: receiving signaling information through a signaling channel according to an established signaling protocol; receiving speech information through a media channel according to an established codec and protocol; and processing the signaling information and the speech information.
11. The computer-readable medium according to claim 10, wherein the instructions further comprise performing speech recognition on the speech information.
12. The computer-readable medium according to claim 10, wherein the instructions further comprise establishing a session in a Session Initiation Protocol (SIP) environment.
13. The computer-readable medium according to claim 10, wherein the processing of the signaling information and the speech information is performed in a web services environment.
14. The computer-readable medium according to claim 10, wherein the instructions further comprise providing a Computer-Supported Telecommunications Applications (CSTA) interface.
15. The computer-readable medium according to claim 10, wherein the instructions further comprise interpreting a Simple Object Access Protocol (SOAP) message.
16. The computer-readable medium according to claim 10, wherein the instructions further comprise processing the speech information to identify semantic information contained therein.
17. The computer-readable medium according to claim 10, wherein the instructions further comprise transmitting information to a specific port associated with an Internet Protocol (IP) address.
18. The computer-readable medium according to claim 10, wherein the instructions further comprise transmitting a Simple Object Access Protocol (SOAP) message.
19. A method for processing information in a computer network, comprising: establishing a relationship between a client and a server in one of a SIP environment and a web services environment; transmitting data from the client to the server according to a specified protocol, the data comprising audio data or text data; converting the audio data to text data if the data is audio data, and the text data to audio data if the data is text data; and transmitting the converted data from the server to the client in accordance with the specified protocol.
20. The method according to claim 19, wherein the specified protocol is based on CSTA (Computer-Supported Telecommunications Applications).
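As an illustrative sketch of claims 5 and 6 above (not language from the specification): in the web services environment, a client could propose a codec and declare the IP address and port at which it will receive media by sending a media endpoint reference of the same shape the service returns in the description; the rtp address, port, and codec value here are hypothetical.

    <csta:MediaEndpointReference
        xmlns:csta="http://www.ecma-international.org/TR/xx"
        xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing">
      <!-- where the client will receive media: host (IP address) and port -->
      <wsa:Address>rtp://client.example.com:49170</wsa:Address>
      <wsa:ReferenceProperties>
        <!-- codec proposed for the media channel -->
        <csta:Codec>G.711</csta:Codec>
      </wsa:ReferenceProperties>
    </csta:MediaEndpointReference>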
MXPA05010163A 2004-10-22 2005-09-22 Distributed speech service. MXPA05010163A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US62130304P 2004-10-22 2004-10-22
US11/058,892 US8396973B2 (en) 2004-10-22 2005-02-16 Distributed speech service

Publications (1)

Publication Number Publication Date
MXPA05010163A true MXPA05010163A (en) 2006-04-26

Family

ID=35695963

Family Applications (1)

Application Number Title Priority Date Filing Date
MXPA05010163A MXPA05010163A (en) 2004-10-22 2005-09-22 Distributed speech service.

Country Status (11)

Country Link
US (1) US8396973B2 (en)
EP (1) EP1650925A3 (en)
JP (1) JP4993656B2 (en)
KR (1) KR101265808B1 (en)
AU (1) AU2005211611B2 (en)
BR (1) BRPI0504081A (en)
CA (1) CA2518978C (en)
MX (1) MXPA05010163A (en)
MY (1) MY151285A (en)
RU (1) RU2455783C2 (en)
TW (1) TWI368425B (en)

Families Citing this family (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8396973B2 (en) * 2004-10-22 2013-03-12 Microsoft Corporation Distributed speech service
US8725514B2 (en) * 2005-02-22 2014-05-13 Nuance Communications, Inc. Verifying a user using speaker verification and a multimodal web-based interface
US8224975B1 (en) * 2006-05-24 2012-07-17 Avaya Inc. Web service initiation protocol for multimedia and voice communication over internet protocol
US20080045149A1 (en) * 2006-05-26 2008-02-21 Dinesh Dharmaraju Wireless architecture for a traditional wire-based protocol
US9198084B2 (en) * 2006-05-26 2015-11-24 Qualcomm Incorporated Wireless architecture for a traditional wire-based protocol
DE102006031080B4 (en) * 2006-07-05 2008-04-30 Siemens Ag Method and communication terminal for providing VoIP
FR2909822B1 (en) * 2006-12-06 2010-04-30 Radiotelephone Sfr METHOD AND SYSTEM FOR CONTROLLING THE ESTABLISHMENT OF COMMUNICATION CHANNELS TO ENABLE THE TRANSMISSION OF MULTIMEDIA INFORMATION.
US8528058B2 (en) * 2007-05-31 2013-09-03 Microsoft Corporation Native use of web service protocols and claims in server authentication
US8667144B2 (en) * 2007-07-25 2014-03-04 Qualcomm Incorporated Wireless architecture for traditional wire based protocol
ATE552687T1 (en) * 2007-09-13 2012-04-15 Huawei Tech Co Ltd METHOD AND SYSTEM FOR ROUTE SELECTION IN THE IP MULTIMEDIA SUBSYSTEM
US20090193392A1 (en) * 2008-01-29 2009-07-30 Michael David Downen Dynamic intermediate language modification and replacement
US8811294B2 (en) * 2008-04-04 2014-08-19 Qualcomm Incorporated Apparatus and methods for establishing client-host associations within a wireless network
US8467306B2 (en) 2008-12-04 2013-06-18 At&T Intellectual Property I, L. P. Blending telephony services in an internet protocol multimedia subsystem
US9398089B2 (en) * 2008-12-11 2016-07-19 Qualcomm Incorporated Dynamic resource sharing among multiple wireless devices
FR2940732B1 (en) * 2008-12-31 2011-06-03 Cy Play METHOD OF EXCHANGING DATA BETWEEN AN APPLICATION EXECUTING A REMOTE SERVER AND A MOBILE TERMINAL
US8452903B2 (en) * 2009-03-16 2013-05-28 Apple Inc. Mobile computing device capabilities for accessories
US8909803B2 (en) * 2009-03-16 2014-12-09 Apple Inc. Accessory identification for mobile computing devices
US9264248B2 (en) 2009-07-02 2016-02-16 Qualcomm Incorporated System and method for avoiding and resolving conflicts in a wireless mobile display digital interface multicast environment
US9582238B2 (en) * 2009-12-14 2017-02-28 Qualcomm Incorporated Decomposed multi-stream (DMS) techniques for video display systems
US20120002718A1 (en) * 2010-07-01 2012-01-05 Samsung Electronics Co., Ltd. Method and apparatus for selecting video codec to be used between stations
RU2552176C2 (en) 2010-08-10 2015-06-10 Телефонактиеболагет Лм Эрикссон (Пабл) Communication session management for media streaming
US9785482B2 (en) * 2010-09-17 2017-10-10 Oracle International Corporation System and method for extending a web service environment to support scalable asynchronous clients
US9065876B2 (en) 2011-01-21 2015-06-23 Qualcomm Incorporated User input back channel from a wireless sink device to a wireless source device for multi-touch gesture wireless displays
US8964783B2 (en) 2011-01-21 2015-02-24 Qualcomm Incorporated User input back channel for wireless displays
US10135900B2 (en) 2011-01-21 2018-11-20 Qualcomm Incorporated User input back channel for wireless displays
US9413803B2 (en) 2011-01-21 2016-08-09 Qualcomm Incorporated User input back channel for wireless displays
US9787725B2 (en) 2011-01-21 2017-10-10 Qualcomm Incorporated User input back channel for wireless displays
US9582239B2 (en) 2011-01-21 2017-02-28 Qualcomm Incorporated User input back channel for wireless displays
US8674957B2 (en) 2011-02-04 2014-03-18 Qualcomm Incorporated User input device for wireless back channel
US9503771B2 (en) 2011-02-04 2016-11-22 Qualcomm Incorporated Low latency wireless display for graphics
US10108386B2 (en) 2011-02-04 2018-10-23 Qualcomm Incorporated Content provisioning for wireless back channel
US9525998B2 (en) 2012-01-06 2016-12-20 Qualcomm Incorporated Wireless display with multiscreen service
US9306879B2 (en) 2012-06-08 2016-04-05 Apple Inc. Message-based identification of an electronic device
US9787749B2 (en) 2013-03-15 2017-10-10 Avaya Inc. Method, apparatus, and system for providing and using multi-protocol eventing
US9953646B2 (en) 2014-09-02 2018-04-24 Belleau Technologies Method and system for dynamic speech recognition and tracking of prewritten script
US9749422B2 (en) * 2014-12-05 2017-08-29 Unify Gmbh & Co. Kg Method and system for telecommunication device monitoring
DE102014019240A1 (en) * 2014-12-19 2016-07-07 Unify Gmbh & Co. Kg Telecommunication system and method for flexible control of the telecommunication system by a issued by an application to a platform switching order
US9672831B2 (en) * 2015-02-25 2017-06-06 International Business Machines Corporation Quality of experience for communication sessions
CN113037751B (en) * 2021-03-09 2023-10-31 北京字节跳动网络技术有限公司 Method and system for creating audio/video receiving stream
CN114710471A (en) * 2022-03-21 2022-07-05 京东科技信息技术有限公司 Network-based customer service voice communication method, device, electronic equipment and medium

Family Cites Families (41)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0529864B1 (en) 1991-08-22 2001-10-31 Sun Microsystems, Inc. Network video server apparatus and method
WO1997022201A2 (en) * 1995-12-12 1997-06-19 The Board Of Trustees Of The University Of Illinois Method and system for transmitting real-time video
GB9621524D0 (en) * 1996-10-16 1996-12-04 British Telecomm Multimedia call centre
US5960399A (en) 1996-12-24 1999-09-28 Gte Internetworking Incorporated Client/server speech processor/recognizer
US6934277B1 (en) 1998-02-26 2005-08-23 Rockwell Electronic Commerce Technologies, Llc Internet web site with audio interconnect and automatic call distributor
US6385586B1 (en) 1999-01-28 2002-05-07 International Business Machines Corporation Speech recognition text-based language conversion and text-to-speech in a client-server configuration to enable language translation devices
US6597702B1 (en) * 1999-05-06 2003-07-22 Cisco Technology, Inc. Fast connect option for enforcing symmetric codec capabilities
US6885658B1 (en) * 1999-06-07 2005-04-26 Nortel Networks Limited Method and apparatus for interworking between internet protocol (IP) telephony protocols
US6404746B1 (en) * 1999-07-13 2002-06-11 Intervoice Limited Partnership System and method for packet network media redirection
US6832088B1 (en) * 1999-07-19 2004-12-14 Telefonaktiebolaget Lm Ericsson Implementation of basic call setup transporting layer address and logical point in backward direction in cellular networks with separation of call control and bearer control
US6757732B1 (en) * 2000-03-16 2004-06-29 Nortel Networks Limited Text-based communications over a data network
US6977911B1 (en) * 2000-07-31 2005-12-20 Cisco Technology, Inc. Scalable voice over IP system configured for dynamically switching codecs during a call
US7035248B2 (en) * 2000-08-10 2006-04-25 Alcatel Switch with emulation client
US6970935B1 (en) * 2000-11-01 2005-11-29 International Business Machines Corporation Conversational networking via transport, coding and control conversational protocols
US6934756B2 (en) * 2000-11-01 2005-08-23 International Business Machines Corporation Conversational networking via transport, coding and control conversational protocols
WO2002052825A1 (en) * 2000-12-22 2002-07-04 Nokia Corporation Method and system for establishing a multimedia connection by negotiating capability in an outband control channel
NO20010069L (en) * 2001-01-05 2002-07-08 Ericsson Telefon Ab L M Multi-user applications in multimedia networks
JP2002215670A (en) 2001-01-15 2002-08-02 Omron Corp Audio response device, audio response method, audio response program, recording medium stored with audio response program, and reservation system
US7319979B2 (en) * 2001-03-29 2008-01-15 Intel Corporation Dynamically interacting with an internet service using a client-specified communication proxy and protocol
JP2003006106A (en) 2001-06-18 2003-01-10 Hitachi Software Eng Co Ltd Method, device and system for preparing contents for portable terminal at call center
US6801604B2 (en) * 2001-06-25 2004-10-05 International Business Machines Corporation Universal IP-based and scalable architectures across conversational applications using web services for speech and audio processing resources
US20030023730A1 (en) * 2001-07-27 2003-01-30 Michael Wengrovitz Multiple host arrangement for multimedia sessions using session initiation protocol (SIP) communication
US20030121002A1 (en) 2001-12-20 2003-06-26 Stuart Goose Method and system for exchanging information through speech via a packet-oriented network
WO2003061242A1 (en) * 2002-01-15 2003-07-24 Avaya Technology Corp. Communication application server for converged communication services
US6704396B2 (en) * 2002-02-27 2004-03-09 Sbc Technology Resources, Inc. Multi-modal communications method
US7136480B2 (en) 2002-06-26 2006-11-14 Siemens Communications, Inc. Methods and apparatus for processing a call
JP2004032579A (en) 2002-06-28 2004-01-29 Fujitsu Ltd Booking service system via telephone network and booking service receiving process method
ATE281734T1 (en) * 2002-08-08 2004-11-15 Cit Alcatel LEGAL INTERCEPTION FOR VOIP CALLS IN AN IP COMMUNICATION NETWORK
JP3999078B2 (en) 2002-09-03 2007-10-31 沖電気工業株式会社 Voice data distribution device and client terminal
US7340508B1 (en) 2002-09-18 2008-03-04 Open Invention Network, Llc Exposing process flows and choreography controllers as web services
GB2395631B (en) 2002-11-22 2006-05-31 Hutchison Whampoa Three G Ip Reproducing speech files in mobile telecommunications devices
US7103156B2 (en) 2002-12-04 2006-09-05 International Business Machines Corporation Telephony voice server
US7644433B2 (en) 2002-12-23 2010-01-05 Authernative, Inc. Authentication system and method based upon random partial pattern recognition
US7474741B2 (en) * 2003-01-20 2009-01-06 Avaya Inc. Messaging advise in presence-aware networks
JP2004289803A (en) 2003-03-04 2004-10-14 Omron Corp Interactive system, dialogue control method, and interactive control program
JP2004304612A (en) 2003-03-31 2004-10-28 Omron Corp Information exchange system
RU32655U1 (en) * 2003-06-03 2003-09-20 Кучерявый Андрей Евгеньевич Switching system
US7042871B2 (en) * 2003-07-23 2006-05-09 Mci, Llc Method and system for suppressing early media in a communications network
US8799478B2 (en) * 2004-03-01 2014-08-05 Avaya Inc. Web services and session initiation protocol endpoint for converged communication over internet protocol networks
US7561673B2 (en) * 2004-09-30 2009-07-14 Microsoft Corporation Integration of speech services with telecommunications
US8396973B2 (en) 2004-10-22 2013-03-12 Microsoft Corporation Distributed speech service

Also Published As

Publication number Publication date
EP1650925A2 (en) 2006-04-26
BRPI0504081A (en) 2006-07-18
CA2518978A1 (en) 2006-04-22
RU2005129428A (en) 2007-04-10
KR101265808B1 (en) 2013-05-20
MY151285A (en) 2014-04-30
AU2005211611A1 (en) 2006-05-11
JP4993656B2 (en) 2012-08-08
CA2518978C (en) 2014-04-08
JP2006121673A (en) 2006-05-11
US8396973B2 (en) 2013-03-12
AU2005211611B2 (en) 2010-06-24
US20060101146A1 (en) 2006-05-11
EP1650925A3 (en) 2006-06-07
KR20060091695A (en) 2006-08-21
RU2455783C2 (en) 2012-07-10
TW200614762A (en) 2006-05-01
TWI368425B (en) 2012-07-11

Similar Documents

Publication Publication Date Title
US8396973B2 (en) Distributed speech service
EP1143679B1 (en) A conversational portal for providing conversational browsing and multimedia broadcast on demand
US7751535B2 (en) Voice browser implemented as a distributable component
JP4750139B2 (en) Dynamically extensible and lightweight access to web services for pervasive devices
US7797450B2 (en) Techniques for managing interaction of web services and applications
US20050066335A1 (en) System and method for exposing local clipboard functionality towards external applications
EP1431875A1 (en) Task Computing
US20050097087A1 (en) System and method for providing a unified framework for service discovery
US20040078424A1 (en) Web services via instant messaging
Colgrave et al. External matching in UDDI
US7904111B2 (en) Mobile exchange infrastructure
JP2008146639A (en) Method for providing web service to client, computer program and system
US7739389B2 (en) Providing web services from a service environment with a gateway
US20070282851A1 (en) Mapping and communicating data from a user interface to an application program
Chen et al. Service discovery in the future electronic market
CN1764190B (en) Distributed speech service
EP1892634A1 (en) Method and system for retrieving data from a web service provider
KR20010069793A (en) Method for providing interactive voice response(IVR) service by converting wireless application protocol(WAP)-based contents for wireless internet into voice extensible markup language(VXML)-based contents and system therefor
Liscano et al. Projecting Web services using presence communication protocols for pervasive computing
He Software architecture for pervasive computing
Kim et al. An approach to modeling context-adaptable services
Park et al. An Automatic Conversion HTML/XML to WSDL for Ubiquitous Mobile Services

Legal Events

Date Code Title Description
FG Grant or registration