WO2020198799A1 - Instant messaging/chat system with translation capability


Info

Publication number: WO2020198799A1 (PCT/AU2020/050328)
Authority: WO (WIPO (PCT))
Prior art keywords: content, instant messaging, chat, translation, language
Application number: PCT/AU2020/050328
Other languages: French (fr)
Inventors: Danny Stephen MAY, Muhammad Zubair
Original assignees: Lingmo International Pty Ltd, Hangzhou Lingwosheng Intelligent Tech Co Ltd
Application filed by Lingmo International Pty Ltd and Hangzhou Lingwosheng Intelligent Tech Co Ltd
Publication of WO2020198799A1

Classifications

    • G06F16/3337 Translation of the query language, e.g. Chinese to English
    • G06F16/3343 Query execution using phonetics
    • G06F16/683 Retrieval characterised by using metadata automatically derived from the content
    • G06F16/685 Retrieval using automatically derived transcript of audio data, e.g. lyrics
    • G06F40/263 Language identification
    • G06F40/35 Discourse or dialogue representation
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • H04L51/04 Real-time or near real-time messaging, e.g. instant messaging [IM]
    • H04L51/046 Interoperability with other network applications or services
    • H04L51/063 Content adaptation, e.g. replacement of unsuitable content
    • H04L51/066 Format adaptation, e.g. format conversion or compression
    • H04L67/14 Session management
    • H04L67/55 Push-based network services
    • H04L67/56 Provisioning of proxy services
    • G10L15/005 Language recognition
    • G10L15/07 Adaptation to the speaker
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L21/0208 Noise filtering
    • G10L25/18 The extracted parameters being spectral information of each sub-band
    • G10L25/84 Detection of presence or absence of voice signals for discriminating voice from noise
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]


Abstract

In an instant messaging/chat system, a method and system of translating a message are provided utilising an intermediary translation system that is configured to: receive content; determine a language of the content of the message; if the determined language is different from a required language, translate the content to produce translated content; and post the translated content on the instant messaging/chat system within an established session. In some forms, the method and system include audio pre-processing of an audio stream associated with the instant messaging/chat system.

Description

INSTANT MESSAGING/CHAT SYSTEM WITH TRANSLATION CAPABILITY
Technical Field
Embodiments relate to real-time communication systems and in particular to instant messaging and chat systems, and to systems and methods for processing audio and/or text content. The disclosure has particular application to improvements to instant messaging/chat systems to cater for multiple languages.
Background
Instant messaging/chat systems that provide real-time communication between users with text and/or voice messages have become prevalent, particularly on mobile devices. In the context of the specification, the term "instant messaging/chat systems" includes all types of IP telephony services including VOIP services, video conferencing, instant messaging, live chat systems for websites, dedicated interactive kiosks (at point of sale or point of service) and the like. Whilst the popularity of such services has increased dramatically in recent times, such services suffer from the disadvantage that a message can only be understood to the extent that the user understands its language. This is naturally problematic when the users do not share a common language.
There is a need for improvements in instant messaging and chat services that cater for multiple languages through a translation service without unduly affecting the user experience and/or which can be integrated readily into existing systems. There is also a need for improvements in translation systems to improve accuracy, particularly in relation to translation of audio content in an instant messaging/chat session.
Summary of the Disclosure
An embodiment relates to an intermediary translation system implemented using computer processing and memory resources and configured to integrate with one or more instant messaging/chat systems and one or more remote translation systems via a communication network, the system comprising:
an instant messaging/chat system interface configured to communicate with the at least one instant messaging/chat system to obtain and send content data within an established session of the instant messaging/chat system;
a translation system interface configured to communicate with the at least one translation system to obtain and send content data; and
a content processor which, within the established session of the instant messaging/chat system, is configured to:
determine a language of the content received from the instant messaging/chat system;
determine a required language of a recipient of the content within the instant messaging/chat system and, in response to the required language being different from the determined language:
forward the content via the translation system interface to a remote translation system for translating the content to produce translated content in the required language;
receive the translated content from the remote translation system; and
forward the translated content to the recipient within the established session via the instant messaging/chat system interface.
An advantage of the above disclosed system is that content translation can occur within an established instant messaging/chat session in a seamless real-time fashion without requiring additional user input in the established session. A further advantage of the system is that the intermediary translation system is separated from both the instant messaging/chat platform and the translation service. This provides flexibility in design and application of the system. For example, the system can be deployed (through the appropriate communication interfaces) into existing messaging/chat platforms and can similarly access different translation services depending on system requirements.
In some forms, the content processor determines the language of the content by parsing the content.
In some forms, the content processor determines the language of the content by referencing information of the sender of the content within the instant messaging/chat system.
In some forms, the content processor determines the language of the content by reference to user input. In some forms, the intermediary translation system further comprises a user profile store that retains user data including the required language of the user. In some forms, the user data is derived from the instant messaging/chat system.
In some forms, the instant messaging/chat system interface is a machine to machine (M2M) interface. In some forms, the M2M interface utilises one or more application programming interfaces (APIs). In some forms, the communication is performed utilising the HTTP protocol with a push notification service.
In some forms, the translation service interface is an M2M interface. In some forms, the M2M interface utilises one or more APIs. In some forms, the communication is performed utilising the HTTP protocol with a push notification service.
In some forms, the content processor further comprises a pre-processing module that is configured to process the content prior to forwarding the content to the translation system.
The content may be processed in different ways. In one form, the pre-processing module may be configured to insert punctuation into text content to aid in establishing context to assist in translation.
In other forms, the pre-processing module may modify audio content to improve translation accuracy.
In some forms, the pre-processing module is operative to create audio data packets from an audio stream that allow for improved translation. The audio packets may equate to sentences or other parts of speech. The applicant has found that dividing audio streams into smaller sub-groups can allow for improved translation accuracy, as it enables translation without contextual bias of the translation service.
Also disclosed is a method of translating audio content from a source language into a target language, the method being executed by a computer system using computer processing and memory resources, the method comprising: separating an audio stream into audio frames of predetermined duration; detecting voice activity within individual ones of the frames; using the detected voice activity to group frames into audio data packets; and using the audio data packets to create input data packets for a translation service to allow for translation of the audio content into the target language.
In some forms, the pre-processing module may also analyse the audio packets and vary the audio data dependent on a characterisation of that data to improve subsequent translation accuracy. In one form, data in one or more data packets that is characterised as being from a sender in an instant messaging/chat session is promoted, whereas data from other audio (e.g. background, near noise or other speakers) is suppressed.
In some forms, the content processor is operative to extract features from audio content. In one form, this step extracts features that have components representative of speech which can be reliably modelled. In some forms, the feature extraction is undertaken on audio data packets that have been established from the audio data stream.
In some forms, the extracted features are compared to a stored model of target language dialects so as to identify accents or dialects of the speech which can then be associated with the audio content so that the translation accuracy can be improved.
Also disclosed is a method of translating audio content from a source language into a target language, the method being executed by a computer system using computer processing and memory resources, the method comprising: extracting features from the audio content, the features extracted being compatible with dialect models of the target language;
comparing the extracted features with the language dialect models to identify any dialect indicated by the extracted features; associating any identified dialect with the audio content; and forwarding the audio content with information on any identified dialect to a translation service for translating the audio content.
In some forms, the content processor also includes a post-processing module that is operative to combine translated content and to incorporate punctuation into the combined text. In one form, the post-processing module uses a punctuation model which takes characteristics from the pre-processing module as inputs to improve accuracy. The applicant has found that the ability to include correct punctuation significantly improves translation accuracy and contributes significantly to allowing contextual translations.
Also disclosed is a method of translating audio content from a source language into a target language, the method being executed by a computer system using computer processing and memory resources, the method comprising: providing the audio content; extracting features from the audio content; generating a translated text of the audio content in the target language; and using a punctuation model to add punctuation to the translated text, wherein the punctuation model uses the extracted features from the audio content in determining placement of punctuation in the translated text.
A further embodiment relates to an intermediary translation system implemented using computer processing and memory resources and configured to integrate with one or more instant messaging/chat systems and one or more remote translation systems via a communication network, the system comprising:
an instant messaging/chat system interface configured to communicate with the at least one instant messaging/chat system to obtain and send content data within an established session of the instant messaging/chat system;
a translation system interface configured to communicate with the at least one translation system to obtain and send content data; and
a content processor which, within the established session of the instant messaging/chat system, is configured to:
determine a language of the content received from the instant messaging/chat system;
determine a required language of a recipient of the content within the instant messaging/chat system and, in response to the required language being different from the determined language:
extract features from the audio content, the features extracted being compatible with dialect models of the target language;
compare the extracted features with the language dialect models to identify any dialect indicated by the extracted features;
associate any identified dialect with the audio content;
forward the audio content with information on any identified dialect to a translation service for translating the audio content;
receive the translated content from the remote translation system; and
forward the translated content to the recipient within the established session via the instant messaging/chat system interface.
A further embodiment relates to an intermediary translation system implemented using computer processing and memory resources and configured to integrate with one or more instant messaging/chat systems and one or more remote translation systems via a communication network, the system comprising:
an instant messaging/chat system interface configured to communicate with the at least one instant messaging/chat system to obtain and send content data within an established session of the instant messaging/chat system;
a translation system interface configured to communicate with the at least one translation system to obtain and send content data; and
a content processor which, within the established session of the instant messaging/chat system, is configured to:
determine a language of the content received from the instant messaging/chat system;
determine a required language of a recipient of the content within the instant messaging/chat system and, in response to the required language being different from the determined language:
separate an audio stream into audio frames of predetermined duration;
detect voice activity within individual ones of the frames;
use the detected voice activity to group frames into audio data packets; use the audio data packets to create input data packets;
forward the input data packets via the translation system interface to a remote translation system for translating the input data packets to produce translated content in the required language;
receive the translated content from the remote translation system; and
forward the translated content to the recipient within the established session via the instant messaging/chat system interface.
In some forms, the content processor is further configured to:
extract features from the audio content;
receive the translated content from the remote translation system in the form of text; and use a punctuation model to add punctuation to the translated text, wherein the punctuation model uses the extracted features from the audio content in determining placement of punctuation in the translated text.
A further embodiment relates to an intermediary translation system implemented using computer processing and memory resources and configured to integrate with one or more instant messaging/chat systems and one or more remote translation systems via a communication network, the system comprising:
an instant messaging/chat system interface configured to communicate with the at least one instant messaging/chat system to obtain and send content data within an established session of the instant messaging/chat system;
a translation system interface configured to communicate with the at least one translation system to obtain and send content data; and
a content processor which, within the established session of the instant messaging/chat system, is configured to:
determine a language of the content received from the instant messaging/chat system;
determine a required language of a recipient of the content within the instant messaging/chat system and, in response to the required language being different from the determined language:
extract features from the audio content; forward the content via the translation system interface to a remote translation system for translating the content to produce translated content in the required language;
receive the translated content from the remote translation system; use a punctuation model to add punctuation to the translated text, wherein the punctuation model uses the extracted features from the audio content in determining placement of punctuation in the translated text; and forward the translated content to the recipient within the established session via the instant messaging/chat system interface.
In some forms, the translation system has a plurality of sub-systems, and the content processor is configured to select a translation sub-system for the received content in response to a characteristic of the content and/or a system setting and to forward the received content to the selected translation sub-system.
In some forms, the translation sub-systems comprise one or more of text-to-text translation systems, speech-to-text translation systems, and text-to-speech translation systems.
In some forms, the received content is in the form of any one or more of text and audio information.
In some forms, the translated content is in the form of any one of text and audio information.
The system can be implemented using computer processing and memory resources in the form of one or more network connected servers and databases, these hardware resources executing software programmed to implement the functions as described above.
Alternatively, the computer processing and memory resources may be network accessible distributed "cloud based" resources, executing software to implement the system functionality as described above. Some embodiments may utilise a combination of dedicated hardware and shared resources. A variety of different system architectures are contemplated within the scope of the present disclosure.
A further embodiment relates to an instant messaging/chat system comprising: an instant messaging/chat client; an instant messaging/chat host configured for communicatively coupling to said instant messaging/chat client, said host having logic for receiving application input from the instant messaging/chat client as a content posting to an established instant messaging/chat session between instant messaging/chat clients; one or more translation systems implemented using computer processing and memory resources and configured to translate content from an instant messaging/chat session; and an intermediary translation system implemented using computer processing and memory resources and configured to integrate with the instant messaging/chat host and the one or more remote translation systems via a communication network, the intermediary translation system comprising:
an instant messaging/chat host interface configured to communicate with the instant messaging/chat host to obtain and send content data within an established session of the instant messaging/chat system;
a translation system interface configured to communicate with the at least one translation system to obtain and send content data; and a content processor which, within the established session of the instant messaging/chat system, is configured to:
determine a language of the content received from the instant messaging/chat session;
determine a required language of a recipient of the content within the instant messaging/chat session and, in response to the required language being different from the determined language:
forward the content via the translation system interface to a remote translation system for translating the content to produce translated content in the required language;
receive the translated content from the remote translation system; and
forward the translated content to the instant messaging/chat host as an output to the recipient of the posted content within the established session.
The instant messaging/chat system may further comprise features of any embodiments of the intermediary translation system (or combinations thereof) as disclosed above.
A further embodiment of the disclosure relates to a method of translating instant messaging/chat content within an established session of an instant messaging/chat system that includes a sender and a recipient, the method being executed by an intermediary translation system using computer processing and memory resources via a communication network, the method comprising the steps of: receiving content from the sender within the established session;
determining a language of the content of the message;
determining a required language of the recipient; and
if the required language is different from the determined language, forwarding the content to a remote translation system for translating the content to produce translated content in the required language;
receiving the translated content from the remote translation system; and
forwarding the translated content to the recipient within the established session.
In some forms, the method of the intermediary translation system includes the step of determining the language of the content by parsing the content. In some forms, the method of the intermediary translation system includes the step of determining the language of the content by referencing information for the sender of the message.
In some forms, the method of the intermediary translation system includes the step of determining the language of the content by reference to user input. In some forms, the user input is stored by the intermediary translation system. In some forms, the user input is derived from the instant messaging/chat system.
In some forms, the intermediary translation system communicates with the instant messaging/chat system via an M2M interface. In some forms, the M2M interface utilises one or more APIs. In some forms, the communication is performed utilising the HTTP protocol with a push notification service.
In some forms, the intermediary translation system communicates with the remote translation system via an M2M interface. In some forms, the M2M interface utilises one or more APIs. In some forms, the communication is performed utilising the HTTP protocol with a push notification service.
In some forms, the method further comprises modifying the content prior to forwarding the content to the remote translation system. The content may be modified in different ways. For example, punctuation may be added to text content to aid in establishing context to assist in translation. In other forms, noise reduction may be applied to audio content to aid in translation.
In some forms, the method may also include any of the steps disclosed in other embodiments of methods (or combinations thereof) as disclosed above.
In some forms, the translation system has a plurality of sub-systems, and the method further comprises the step of selecting a translation sub-system for the received content in response to a characteristic of the content and/or a system setting and forwarding the received content to the selected translation sub-system.
In some forms, the translation sub-systems comprise one or more of text-to-text translation systems, speech-to-text translation systems, and text-to-speech translation systems. In some forms, the received content is in the form of any one or more of text and audio information.
In some forms, the translated content is in the form of any one of text and audio information.
Description of Accompanying Drawings
Embodiments are described with reference to the accompanying drawings in which:
Fig. 1 is a schematic representation of an instant messaging/chat system configured to allow in-session translation of content;
Fig. 2 is a block diagram illustrating client-side process and content flow of the instant messaging/chat system of Fig. 1;
Fig. 3 is a block diagram illustrating components of the instant messaging/chat system of Fig. 1;
Fig. 4 is a flow chart illustrating content processing and routing between components of the system of Fig. 1;
Fig. 5 is an example of a first stage of pre-processing of audio content in the content processor of Fig. 4;
Fig. 6 is an example of a second stage of pre-processing of audio content in the content processor of Fig. 4;
Fig. 7 is an example of a third stage of pre-processing of audio content in the content processor of Fig. 4; and
Fig. 8 is a block diagram of a post-processing procedure in the content processor of Fig. 4.

Detailed Description of Specific Embodiment
An embodiment of the present disclosure is a system and method for allowing in-session translation of content in an instant messaging/chat system. In accordance with the disclosure, users in a session are able to post content (via text or speech) in one language and, within the session, the content may be processed, modified and, if required, translated to a required language so that the posted content is received in the required language. The system and method of the disclosure allow the processing and translation of the content to occur within an established instant messaging/chat session in a seamless real-time fashion without requiring additional user input in the established session. An intermediary translation system, which provides the logic to conduct the processing of content and enables communication with a translation system to provide the translation, is separated from both the messaging platform and the translation service. This provides flexibility in design and application of the system.
The system can be deployed (through the appropriate communication interfaces) into existing messaging/chat platforms and can similarly access different translation services depending on system requirements. Further, the content processing operations within the system allow for improved accuracy of translation by the translation service, allowing, amongst other benefits, audio processing to improve context and accuracy of the translation through audio filtering, dialect identification and improved punctuation. These processes are described in more detail below.
The system and method can be implemented using computer processing and memory resources in the form of one or more network connected servers and databases, these hardware resources executing software programmed to implement the functions as described above. Alternatively, the computer processing and memory resources may be network accessible distributed "cloud based" resources, executing software to implement the system functionality as described above. Some embodiments may utilise a combination of dedicated hardware and shared resources. A variety of different system architectures are contemplated within the scope of the present disclosure.
Fig. 1 is a schematic representation of an instant messaging/chat system 100 configured to allow in-session translation of content. The system 100 includes one or more instant messaging/chat session clients 110, 112 that communicate via a data network 120. An instant messaging host 130 also communicates with the clients 110, 112 to provide instant messaging/chat communications between the instant messaging/chat clients 110, 112.
Whilst not shown, an operating system of the client 110, 112 supports the instant messaging/chat process that provides a user interface 150 (Fig. 2) through which a user can both receive content of an instant messaging/chat session and also add content to the instant messaging/chat session. Typically, the client resides on a mobile device (such as a smart phone or watch) but may reside on other computing devices. As used herein, "content" can mean SMS messages, MMS messages, messages on dedicated platforms such as WhatsApp, Messenger, Instagram etc., and includes messages having only a textual content as well as those having an audio content, or a mixture of these content types.
In addition to the functionality that allows for instant messaging/chat client sessions, the system 100 also includes a capability to allow in-session translation of content.
Specifically, the system 100 includes an intermediary translation system 10 implemented using computer processing and memory resources and configured to integrate with the instant messaging/chat host 130 and one or more remote translation systems 140. The intermediary translation system 10 communicates with the instant messaging/chat host and the one or more translation systems via a machine to machine (M2M) interface. In some forms, the M2M interface utilises one or more application programming interfaces (APIs). In some forms, the communication is performed utilising the HTTP protocol with a push notification service.
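As a concrete illustration of this M2M exchange, the sketch below shows how a chat host might forward a content posting to the intermediary over HTTP. The endpoint URL, payload fields and response shape are assumptions for illustration only; the disclosure specifies just that APIs over HTTP with a push notification service are used, not this schema.

```python
# A minimal sketch of the host-to-intermediary M2M call, assuming a
# hypothetical REST endpoint and payload; not the actual interface.
import requests

INTERMEDIARY_URL = "https://translate-intermediary.example.com/api/v1"  # hypothetical

def post_session_content(session_id: str, sender_id: str,
                         content: str, content_type: str = "text") -> dict:
    """Forward a content posting from the chat host to the intermediary."""
    payload = {
        "session_id": session_id,      # established IM/chat session
        "sender_id": sender_id,        # used to look up source-language preference
        "content_type": content_type,  # "text" or "audio"
        "content": content,
    }
    resp = requests.post(f"{INTERMEDIARY_URL}/content", json=payload, timeout=10)
    resp.raise_for_status()
    return resp.json()  # e.g. {"status": "queued", "content_id": "..."}
```

In this arrangement, the translated result would be delivered back to the host asynchronously via the push notification service rather than in the HTTP response itself.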
Fig. 2 is a block diagram illustrating the client-side process of the user interface 150 and content flow of the instant messaging/chat system 100. In a typical scenario, a first user operating on client 110 is able to enter the instant messaging system 100 via a registration or login process 152. The client operating system provides a menu functionality 154 allowing user details and preferences to be inputted. In one form, this may include specific language preferences of the user. In another form, this language preference may be obtained from other information, such as location data, device setting data, or from parsing of content of the data, or as a default language. Once the language preference is established it may be stored in one or more locations (including locally on the device, at the instant messaging/chat host 130 and at the intermediary translation service 10 in memory 12).
The client 110 allows one-on-one sessions to be initiated with other clients 112, or with multiple clients in a group chat session. Once an instant messaging session is established, users are able to post content which is then managed by the instant messaging/chat host 130. In initiating a content posting from client 110, the host may instigate an initialising routine to seek the language preferences of the recipients (if not already known by the host 130). This initialising process may involve a push notification 156 to client(s) 112 regarding incoming content and requesting a language preference for the content. This initialising routine occurs before posting of the content with clients 112.
Where there is a multi-party chat session, each of clients 112 may input their own unique language preference such that the chat session may be conducted in more than two languages. It is to be understood that this initialising routine may not occur in instances where a language preference of clients 112 is known (say from previous user input), or where it is determined from other information, or where a default language is assumed.
The information on the language preferences of the clients in the session is then provided to the intermediary translation system 10 to determine whether the content requires translation. The intermediary translation system 10 includes a content processor 14 which, within the established instant messaging/chat session, is designed to pre-process and/or post-process content to aid translation accuracy (as will be described in more detail below). The content processor is configured to determine a language of the content received from the instant messaging/chat system (the 'source language'), to determine a required language of a recipient of the content within the instant messaging/chat system (the 'target language') and, in response to the required language being different from the determined language, to forward the content via the translation system interface to a remote translation system for translating the content to produce translated content in the required language.
The translation system 140 used for translating content may be a proprietary translation system of the intermediary translation system, may be a commercially available translation system, or may be a hybrid system (where the translation is conducted within a commercially available translation service using proprietary data). An example of a hybrid system is one in which a unique corpus (such as one pertaining to a specific technical field or dialect) is used. Further, the intermediary translation system may be configured to route the content for translation to one of a number of translation systems or subsystems (e.g. 140a, 140b or 140c of Fig. 3). A feature of the arrangement is that the separation of the intermediary translation system 10 from the translation service 140 allows flexibility of operation to bring new translation systems online, or to route the content to a particular translation system dependent on the language required for the translation or other factors (such as content type (text or speech), latency, user preference etc.). One exemplary translation system is the IBM Watson Translator, which can identify the language of text and translate it into different languages programmatically.
All content received by the intermediary translation system 10 is logged, and routing of the content to and from the translation system 140 and back to the instant messaging/chat host 130 for issuing to the recipient is performed using meta information applied to the content. This process is managed by the content processor of the intermediary translation system 10 and a communication system (160, Fig. 3), which acts as a message bus and is able to allow synchronous routing of content within the instant messaging/chat session and, if required, asynchronous routing of content.
Fig. 3 is a block diagram illustrating components of the instant messaging/chat architecture that provides the language translation.
As illustrated, the architecture utilising the intermediary translation system 10 is designed so that it can integrate more easily into existing instant messaging/chat platforms. These platforms may have various client device structures that may operate over multiple host platforms (although only a single instant messaging/chat host is shown). Using secure APIs, the intermediary translation system provides a secure communication channel for users to chat multilingually via an HTTP layer. Each request is logged, analysed and may be modified to improve context for translation. The intermediary translation system automatically routes the content requests to one or more related sub-systems of the translation system.
In one form, the translation sub-systems may comprise one or more of speech-to-text translation systems 140a, text-to-text translation systems 140b, and text-to-speech translation systems 140c.
The speech-to-text system 140a may accept requests routed from the intermediary translation system 10 in raw audio format and generate respective transcribed text. The system 140a may support more than 100 different dialects/accents of each language and may support 27 languages. On generation of the transcribed text, the content request may be returned to the intermediary system 10 for further processing before being forwarded to the translation sub-system 140b for translation. Alternatively, the translation system 140b may incorporate its own models to enhance contextual translations.
The speech-to-text translation systems provide automated conversion of speech to text and may use machine learning training systems that process the training data over deep learning RNN models. This allows trainees to train the system via automated routines.
The translation sub-system 140b accepts requests routed from the intermediary translation system 10 in text and generates the requested target language text. The system 140b may support 27 languages.
The translation system provides automated translation of text and may use machine learning training systems that process the training data over deep learning RNN models. This allows trainees to train the system via automated routines.
The text-to-speech sub-system 140c may accept requests routed from the intermediary translation system 10 in text and generate respective audio files. The system 140c may support audio formats such as WAV (both mono and stereo) and FLAC and may generate both male and female voices. The system 140c may support 27 languages.
The text-to-speech sub-system 140c provides automated synthesis of speech from text and may use machine learning training systems that process the training data over deep learning RNN models. This allows trainees to train the system via automated routines.
Accordingly, with the three sub-systems of the translation system 140, content may be received in text or audio form and translated content may be provided in either text or audio form.
In addition to the routing of audio and/or text content between the messaging platform 150 and the translation platform 140, the content processor 14 of the intermediary translation system 10 is designed to pre-process and post-process (i.e. after translation) the content to improve translation accuracy. These processes are described with reference to Figs. 4 to 8. In general, pre-processing helps to improve speech input work-flows, while post-processing helps to improve text results through sentence and punctuation identification.
As shown in Fig. 4, the processing of the content depends on the nature of the content from the messaging platform 150 (through communication host 130). If the content is audio, the content is passed through an audio pre-processing module 16 of the content processor 14, which is typically in the form of a digital signal processor. The processed audio is then passed to the speech-to-text module 140a, then to the translation module 140b. After translation, the content is returned to a post-processing module 18, where the translated content is assembled and punctuation is added (as will be described in more detail below).
As required, the assembled translated text content is then passed either to the Text to Speech module 140c, or passed back to the messaging platform 150 via the communication system 130.
If the content is initially text, a simpler processing route is provided, with an optional initial text processing step 20 to check for incomplete punctuation. This step may be bypassed such that the original text is passed directly to the translation module 140b. The text is then translated and returned to the post-processing module 18 for punctuation checking before again either passing directly to the messaging platform 150, or through the Text to Speech module 140c for outputting as audio.
If the language of the content is the same as that required by the recipient, then the content can be routed straight back to the instant messaging/chat host for posting with the recipient client.
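The routing logic of Fig. 4 can be summarised as follows. The module objects below are hypothetical stand-ins for components 16 (audio pre-processing), 140a (speech to text), 140b (translation), 18 (post-processing) and 140c (text to speech); the control flow follows the description above, including the same-language shortcut.

```python
# Sketch of the Fig. 4 content routing; module interfaces are assumed.
def route_content(content, content_type, source_lang, target_lang,
                  audio_preprocessor, speech_to_text, translator,
                  postprocessor, text_to_speech, want_audio_reply=False):
    # Same language: route straight back to the chat host untranslated.
    if source_lang == target_lang:
        return content

    if content_type == "audio":
        # Module 16 splits the stream into sentence-like packets (Figs. 5-7).
        packets = audio_preprocessor.process(content)
        texts = [speech_to_text.transcribe(p, source_lang) for p in packets]
        # Each packet is translated separately to avoid contextual bias.
        parts = [translator.translate(t, source_lang, target_lang) for t in texts]
        translated = postprocessor.assemble_and_punctuate(parts)
    else:
        # Text may optionally pass through the punctuation check (step 20).
        translated = translator.translate(content, source_lang, target_lang)

    # Return synthesised audio if the recipient expects speech output.
    if want_audio_reply:
        return text_to_speech.synthesise(translated, target_lang)
    return translated
```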
The audio pre-processing module 16 includes three stages, which are described with reference to Figs. 5 to 7. These stages are named Silence Detector, Speaker Identification, and Noise Purifier.
In a first stage (Silence Detector) shown in Fig. 5, the raw audio stream 260 is processed so as to be grouped into audio data packets 262 for further processing in subsequent stages. It has been found that reducing the audio stream into smaller components and then subsequently reassembling them as long text strings after translation can lead to improved results, as it provides more direct translation (each packet is translated separately) and therefore avoids contextual bias which the translation services may erroneously apply to larger audio content. To allow the translation to be contextual, the system is designed to extract characteristics of the audio content in pre-processing which it will then use in post-processing, when the translated text is reassembled and punctuated. These characteristics are selected to work with a punctuation model in the post-processing module 18, which is appropriately trained and has machine learning capability.
In the first stage, one objective is to detect silences within the audio content as a means of parsing sentences from the audio content. The audio content is initially framed 264 to be analysed. Speech has statistical properties which are not constant across time. In an exemplary embodiment, to extract the spectral features from a small window of speech, an assumption is made that the signal is stationary within the window. Frame blocks of 20 ms with 60% overlap are used. Overlapping frames are preferred, as non-overlapping samples abruptly cut the signal at frame boundaries, which may cause problems when Fourier analysis is used for voice activity detection (VAD). Typical steps in conducting VAD on the frames are based on calculation of the energy levels within the audio frames. This process, conducted through digital signal processing, may include conducting multiple linear Fourier analyses and calculating the mean and standard deviation of the first 500 ms of samples of the given utterance. The noise and silence are characterised by the calculated mean and standard deviation.
Once the noise and silence statistics are calculated, for each sample (from first to last) a determination is made whether the 1D Mahalanobis distance is greater than a threshold value. Under a Gaussian distribution, the threshold rejects up to 97% of noise frames, accepting only voiced samples. In this regard, the silence determined by this process gives an indication of language structure, in particular sentence lengths. Accordingly, the silence characteristic within the audio frames is captured for subsequent use in the post-processing stage. Other characteristics that may be captured for subsequent use include the frequency spectrum, magnitude spectrum, thresholding and power spectral density (PSD) estimation. In a third step, the voiced samples from the windowed array are collected in a new array of samples, with consecutive runs of voiced samples being combined to generate packets for the next stage of processing. These collected sample arrays are delimited by threshold lengths of silence (as determined above); an exemplary silence duration is 1 sec. These pauses in speech activity are representative of sentence boundaries, such that the newly combined data packets are representative of sentences in the audio content.
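An illustrative implementation of this Silence Detector stage follows, assuming 16 kHz mono PCM input. The frame length (20 ms), 60% overlap, 500 ms noise-estimation window and 1 sec silence gap follow the description above, while the simple normalised-energy distance stands in for the full Mahalanobis computation.

```python
# Sketch of energy-based VAD segmentation under the assumptions above.
import numpy as np

def segment_speech(samples: np.ndarray, sr: int = 16000,
                   frame_ms: float = 20.0, overlap: float = 0.6,
                   silence_gap_s: float = 1.0, threshold: float = 3.0):
    frame_len = int(sr * frame_ms / 1000)
    hop = max(1, int(frame_len * (1 - overlap)))          # 60% overlap
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len, hop)]
    energies = np.array([float(np.mean(f.astype(np.float64) ** 2)) for f in frames])

    # Characterise noise/silence from the first 500 ms of the utterance.
    n_noise = max(1, int(0.5 * sr / hop))
    mu, sigma = energies[:n_noise].mean(), energies[:n_noise].std() + 1e-12

    voiced = (energies - mu) / sigma > threshold          # 1-D distance test

    # Group consecutive voiced frames; a silence of >= silence_gap_s
    # closes the current packet (taken as a sentence boundary).
    packets, current, gap = [], [], 0
    max_gap = int(silence_gap_s * sr / hop)
    for frame, is_voiced in zip(frames, voiced):
        if is_voiced:
            current.append(frame)
            gap = 0
        elif current:
            gap += 1
            if gap >= max_gap:
                packets.append(np.concatenate(current))
                current, gap = [], 0
    if current:
        packets.append(np.concatenate(current))
    return packets  # each packet approximates one spoken sentence
```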
In the second stage of the pre-processing of the content data (as shown in Fig. 6), feature extraction 266 occurs through spectral analysis of the audio data packets. The features which are extracted are compatible with models to establish audio fingerprints to identify speakers within the voice samples, and with other models to aid translation, including identifying dialects of the target language based on established language dialect models stored in the memory 12 or retrieved on the fly. The features extracted may be based on frequency coefficients, including long-term spectral divergence (LTSD). Pitch and distortion factors may also be established. Other features which may be captured include speech rate, articulatory rate, syllables-per-minute rate and phonation-time ratios, to assist the post-processing model for punctuation.
Following from this second stage of pre-processing, audio fingerprints are established which enable speakers to be identified in the audio data packets. This identification allows for enhanced filtering in the subsequent stage of pre-processing of the audio data. The extracted features are also able to be compared with the language dialect models stored by the content processor so as to identify any specific dialect of the target language. This dialect can then be associated with the data packet and conveyed to the translation platform 140 to improve translation accuracy.
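A minimal sketch of the dialect comparison step follows. The model store (one reference feature vector per dialect label) and the nearest-neighbour distance scoring are assumptions for illustration, as the disclosure does not fix the comparison metric.

```python
# Sketch of dialect identification by nearest reference vector (assumed metric).
import numpy as np

def identify_dialect(features, dialect_models, max_distance=10.0):
    """features: 1-D feature vector extracted from a packet;
    dialect_models: {dialect_label: reference feature vector}.
    Returns the closest dialect label, or None if nothing is close enough."""
    best_label, best_dist = None, float("inf")
    for label, reference in dialect_models.items():
        dist = float(np.linalg.norm(features - reference))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= max_distance else None
```

Any label returned here would be attached to the packet's metadata and passed along with the translation request, as described above.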
In the final stage of pre-processing (Fig. 7), the audio data packets are filtered to enhance the target speaker's voice and to suppress any other noises (such as other speakers, background and near noise). Digital filters 268 are applied to boost and cut certain characteristics of the sampled signal to make it better suited to the mathematical models. Two different filters are needed for this process. Environmental noise causing microphone bias (e.g. a door being shut) is low in frequency but high in energy, and is dealt with by a high-pass filter. In order to boost the lower energy levels of the high frequencies present in human speech, a pre-emphasis filter is used. A pre-emphasis filter boosts the high frequencies while attenuating the lower ones, which flattens the spectrum.
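The two filters can be sketched as below, assuming 16 kHz input. The cutoff frequency, filter order and pre-emphasis coefficient are illustrative values, not ones specified in the disclosure; the pre-emphasis step is the standard y[n] = x[n] - a*x[n-1] form.

```python
# Sketch of the Noise Purifier filters with assumed parameter values.
import numpy as np
from scipy.signal import butter, lfilter

def purify(packet: np.ndarray, sr: int = 16000,
           highpass_hz: float = 80.0, pre_emphasis: float = 0.97) -> np.ndarray:
    # High-pass: suppress low-frequency, high-energy events (e.g. a door shutting).
    b, a = butter(4, highpass_hz / (sr / 2), btype="highpass")
    filtered = lfilter(b, a, packet.astype(np.float64))
    # Pre-emphasis: boosts highs relative to lows, flattening the spectrum.
    return np.append(filtered[0], filtered[1:] - pre_emphasis * filtered[:-1])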
Following this noise purifying step, the audio data packets are then able to be passed to the translation platform 140 as discussed above. Typically, these data packets include information to initiate the translation request, including "Source Language", "Target Language", audio information (sample size, sample rate, encoding format), and possible dialect.
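An illustrative request assembled for one packet might look like the following; the field names are hypothetical, but the contents follow the list above.

```python
# Hypothetical shape of a translation request for one audio packet.
translation_request = {
    "source_language": "en",
    "target_language": "zh",
    "audio": {
        "sample_rate": 16000,   # Hz
        "sample_size": 16,      # bits per sample
        "encoding": "FLAC",
    },
    "dialect": "en-AU",         # included only when identified in pre-processing
}
```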
Post-processing of the translated data packets is required to reassemble the text (if the content was audio and subject to the pre-processing stages described above) and to add punctuation to improve context and meaning. An exemplary post-processing stage is illustrated in Fig. 8. To effect this process, a punctuation model 60 is provided which programmatically adds punctuation to the assembled and translated text. Typically, for each audio stream event, multiple audio data packets are generated in the pre-processing stage and these are individually translated. The post-processing stage waits for each of these translated data packets to be returned. The data packets are assembled in order and the punctuation model applies punctuation to the assembled text. The model 60 is typically trained on grammar and on punctuated and unpunctuated text. To further aid the model, the characteristics obtained in the pre-processing stage are also inputted into the model to aid in the model's decision making. These characteristics are also used in training of the model, with the output of the post-processing stage being periodically checked by language experts. Further routine analysis can be performed which will analyse inputs from a prior period and calculate comparison matrices that allow for comparison under different sample sizes. This feedback can then be used to tune the post-processing models to improve accuracy.
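In outline, the reassembly and punctuation step can be sketched as follows. The `punctuation_model` object stands in for the trained model 60, and the `(sequence_no, text, features)` packet shape is an assumption for illustration.

```python
# Sketch of the post-processing reassembly step under assumed packet shape.
def assemble_and_punctuate(translated_packets, punctuation_model):
    """translated_packets: iterable of (sequence_no, text, features) tuples,
    arriving asynchronously from the translation platform."""
    ordered = sorted(translated_packets, key=lambda p: p[0])  # restore order
    joined_text = " ".join(text for _, text, _ in ordered)
    # Pre-processing characteristics (e.g. silence lengths, speech rate)
    # are fed to the model to guide punctuation placement.
    features = [f for _, _, f in ordered]
    return punctuation_model.punctuate(joined_text, features)
```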
It has been found that utilising the combination of pre-processing audio data to parse sentences from the audio and post-processing the translated text using punctuation models fed with data from the pre-processing stages realises significant improvements in translation accuracy as compared to existing translation services.
Accordingly, an intermediary translation platform is provided that is able to deliver in-session translation of audio and text content. Procedures are also disclosed to improve the accuracy of translation, particularly when based on audio data.
It will be understood by persons skilled in the art of the invention that many modifications may be made without departing from the spirit and scope of the invention. It is to be understood that, if any prior art publication is referred to herein, such reference does not constitute an admission that the publication forms a part of the common general knowledge in the art.
In the claims which follow and in the preceding description, except where the context requires otherwise due to express language or necessary implication, the word "comprise" or variations such as "comprises" or "comprising" is used in an inclusive sense, i.e. to specify the presence of the stated features but not to preclude the presence or addition of further features in various embodiments of the disclosure.

Claims

1. An intermediary translation system implemented using computer processing and
memory resources and configured to integrate with one or more instant messaging /chat systems and one or more remote translation systems via a communication network, the system comprising:
an instant messaging /chat system interface configured to communicate with the at least one instant messaging /chat system to obtain and send content data within an established session of the instant messaging /chat session;
a translation system interface configured to communicate with the at least one translation system to obtain and send content data; and
a content processor which, within the established session of the instant messaging /chat session, is configured to:
determine a language of the content received from the instant messaging /chat system;
determine a required language of a recipient of the content within the instant messaging /chat system and, in response to the required language being different from the determined language;
to forward the content via the translation system interface to a remote translation system for translating the content to produce translated content in the required language; and
to receive the translated content from the remote translation system; and to forward the translated content to the recipient within the established session via the instant messaging /chat system interface.
2. An intermediary translation system according to claim 1, wherein the content
processor determines the language of the content by parsing the content.
3. An intermediary translation system according to claim 1 or 2, wherein the content processor determines the language of the content by referencing information of the sender of the content within the instant messaging /chat system.
4. An intermediary translation system according to any preceding claim, wherein the content processor determines the language of the content by reference to user input.
5. An intermediary translation system according to any preceding claim, wherein the intermediary translation system further comprises a user profile store that retains user data including the required language of the user. In some forms, the user input is derived from the instant messaging/chat system.
6. An intermediary translation system according to any preceding claim, wherein the instant messaging /chat system interface is a machine to machine (M2M) interface.
7. An intermediary translation system according to claim 6, wherein the instant
messaging /chat system interface utilises one or more application programming interfaces (APIs).
8. An intermediary translation system according to any preceding claim, wherein the communication is performed utilising the HTTP protocol with a push notification service.
9. An intermediary translation system according to any preceding claim, wherein the translation interface is an M2M interface.
10. An intermediary translation system according to claim 9, wherein the translation
service interface utilises one or more APIs.
11. An intermediary translation system according to any preceding claim, wherein the intermediary translation system further comprises a content modifying module that is configured to modify the content prior to forwarding the content to the remote translation system.
12. An intermediary translation system according to any preceding claim, wherein the translation system has a plurality of sub-systems, and the content processor is configured to select a translation sub-system for the received content in response to a characteristic of the content and/or a system setting and to forward the received content to the selected translation sub-system.
13. An intermediary translation system according to claim 12, wherein the translation sub-systems comprise one or more of a text to text translation system, a speech to text system, and a text to speech system.
14. An intermediary translation system according to any preceding claim, wherein the received content is in the form of any one or more of text and audio information.
15. An intermediary translation system according to any preceding claim, wherein the translated content is in the form of any one of text and audio information.
16. An instant messaging/chat system comprising:
an instant messaging /chat client;
an instant messaging /chat host configured for communicatively coupling to said instant messaging /chat client, said host having logic for receiving application input from the instant messaging /chat client as a content posting to an established instant messaging /chat client session between instant messaging /chat clients;
one or more translation systems implemented using computer processing and memory resources and configured to translate content from an instant messaging/chat session; and
an intermediary translation system implemented using computer processing and memory resources and configured to integrate with the instant messaging /chat host and the one or more remote translation systems via a communication network, the intermediary translation system comprising:
an instant messaging /chat host interface configured to communicate with the instant messaging /chat host to obtain and send content data within an established session of the instant messaging /chat session;
a translation system interface configured to communicate with the at least one translation system to obtain and send content data; and
a content processor which, within the established session of the instant messaging /chat session, is configured to:
determine a language of the content received from the instant messaging /chat session;
determine a required language of a recipient of the content within the instant messaging /chat session and, in response to the required language being different from the determined language;
to forward the content via the translation system interface to a remote translation system for translating the content to produce translated content in the required language; and
to receive the translated content from the remote translation system; and to forward the translated content to the instant messaging /chat host as an output to the recipient of the posted content within the established session.
17. A method of translating audio content from a source language into a target language, the method being executed by a computer system using computer processing and memory resources, the method comprising:
separating an audio stream into audio frames of predetermined duration;
detecting voice activity within individual ones of the frames;
using the detected voice activity to group frames into audio data packets; and
using the audio data packets to create input data packets for a translation service to allow for translation of the audio content into the target language.
18. A method according to claim 17, further comprising establishing a frame energy level of the respective audio frames and wherein the voice activity is detected using the energy levels of the individual frames.
19. A method according to claim 17 or 18, wherein the grouping into the audio data
packets is based on identifying audio frames exhibiting one or more specified characteristics associated with the detected voice activity.
20. A method according to claim 19, wherein the groupings are established with a data packet containing some or all of the audio frames occurring between the identified audio frames.
21. A method according to claim 18 or 19, when dependent on claim 18, wherein the one or more specified characteristics are indicative of the presence of an interruption in detected voice activity over a specified duration.
22. A method according to claim 17 or 18, wherein the input data packets are audio packets which are converted to text before translation into the target language.
23. A method according to claim 18 or 19, wherein the input data packets are translated into the target language separately to provide translated text packets.
24. A method according to claim 23, wherein the translated text packets are combined to provide a translation of the audio content.
25. A method according to claim 24, wherein a punctuation model is used to
programmatically add punctuation to the translated text.
26. A method according to claim 25, wherein the punctuation model uses features extracted from the audio content to influence decision making on the incorporation of punctuation in the text.
27. A method of translating audio content from a source language into a target language, the method being executed by a computer system using computer processing and memory resources, the method comprising:
providing the audio content;
extracting features from the audio content;
generating a translated text of the audio content in the target language; and
using a punctuation model to add punctuation to the translated text, wherein the punctuation model uses the extracted features from the audio content in deciding on placement of punctuation in the translated text.
28. A method according to claim 27, further comprising separating the audio content into parts; translating the audio content parts separately into the target language; and assembling the translated audio content parts to generate the translated text of the audio content.
29. A method of translating audio content from a source language into a target language, the method being executed by a computer system using computer processing and memory resources, the method comprising:
extracting features from the audio content, the features extracted being compatible with dialect models of the target language;
comparing the extracted features with the language dialect models to identify any dialect indicated by the extracted features;
associating any identified dialect with the audio content; and
forwarding the audio content with information on any identified dialect to a translation service for translating the audio content.
30. A method of translating instant messaging/chat content within an established session of an instant messaging/chat system that includes a sender and a recipient, the method being executed by an intermediary translation system using computer processing and memory resources via a communication network, the method comprising the steps of:
receiving content from the sender within the established session;
determining a language of the content of the message;
determining a required language of the recipient; and
if the required language is different from the determined language:
forwarding the content to a remote translation system for translating the content to produce translated content in the required language;
receiving the translated content from the remote translation system; and
forwarding the translated content to the recipient within the established session.
31. A method of translating instant messaging/chat content according to claim 30, wherein the method of the intermediary translation system includes the step of determining the language of the content by parsing the content.
32. A method of translating instant messaging/chat content according to claim 30 or 31, wherein the method of the intermediary translation system includes the step of determining the language of the content by referencing information for the sender of the message.
33. A method of translating instant messaging/chat content according to any one of claims 30 to 32, wherein the method of the intermediary translation system includes the step of determining the language of the content by reference to user input.
34. A method of translating instant messaging/chat content according to any one of claims 30 to 33, wherein the method further comprises modifying the content prior to forwarding the content to the remote translation system.
35. A method of translating instant messaging/chat content according to any one of claims 30 to 34, wherein the translation system has a plurality of sub-systems, and the method further comprises the step of selecting a translation sub-system for the received content in response to a characteristic of the content and/or a system setting and forwarding the received content to the selected translation sub-system.
36. A method of translating instant messaging/chat content according to claim 35, wherein the translation sub-systems comprise one or more of a text to text translation system, a speech to text system, and a text to speech system.
37. A method of translating instant messaging/chat content according to any one of claims 30 to 36, wherein the received content is in the form of any one or more of text and audio information.
38. A method of translating instant messaging/chat content according to any one of claims 30 to 37, wherein the translated content is in the form of any one of text and audio information.
PCT/AU2020/050328 2019-04-02 2020-04-02 Instant messaging/chat system with translation capability WO2020198799A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910260669.7A CN110119514A (en) 2019-04-02 2019-04-02 The instant translation method of information, device and system
CN201910260669.7 2019-04-02

Publications (1)

Publication Number Publication Date
WO2020198799A1 true WO2020198799A1 (en) 2020-10-08

Family

ID=67520686

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2020/050328 WO2020198799A1 (en) 2019-04-02 2020-04-02 Instant messaging/chat system with translation capability

Country Status (2)

Country Link
CN (1) CN110119514A (en)
WO (1) WO2020198799A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113676394A (en) * 2021-08-19 2021-11-19 维沃移动通信(杭州)有限公司 Information processing method and information processing apparatus
WO2022093192A1 (en) * 2020-10-27 2022-05-05 Google Llc Method and system for text-to-speech synthesis of streaming text

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113076760B (en) * 2020-01-03 2024-01-26 阿里巴巴集团控股有限公司 Translation and commodity retrieval method and device, electronic equipment and computer storage medium
CN111261162B (en) * 2020-03-09 2023-04-18 北京达佳互联信息技术有限公司 Speech recognition method, speech recognition apparatus, and storage medium
CN114124864B (en) * 2021-09-28 2023-07-07 维沃移动通信有限公司 Message processing method and device
CN116227504B (en) * 2023-02-08 2024-01-23 广州数字未来文化科技有限公司 Communication method, system, equipment and storage medium for simultaneous translation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060133585A1 (en) * 2003-02-10 2006-06-22 Daigle Brian K Message translations
US20070168450A1 (en) * 2006-01-13 2007-07-19 Surendra Prajapat Server-initiated language translation of an instant message based on identifying language attributes of sending and receiving users
EP2131537B1 (en) * 2008-06-04 2015-12-16 Broadcom Corporation Phone based text message language translation
US20180089172A1 (en) * 2016-09-27 2018-03-29 Intel Corporation Communication system supporting blended-language messages

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101957814A (en) * 2009-07-16 2011-01-26 刘越 Instant speech translation system and method
CN104252861B (en) * 2014-09-11 2018-04-13 百度在线网络技术(北京)有限公司 Video speech conversion method, device and server
CN106598955A (en) * 2015-10-20 2017-04-26 阿里巴巴集团控股有限公司 Voice translating method and device
CN107515862A (en) * 2017-09-01 2017-12-26 北京百度网讯科技有限公司 Voice translation method, device and server

Also Published As

Publication number Publication date
CN110119514A (en) 2019-08-13

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20783289

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20783289

Country of ref document: EP

Kind code of ref document: A1