WO2022022370A1 - Live broadcast method, apparatus and electronic device - Google Patents

Live broadcast method, apparatus and electronic device

Info

Publication number
WO2022022370A1
WO2022022370A1 · PCT/CN2021/107766 · CN2021107766W
Authority
WO
WIPO (PCT)
Prior art keywords
address
live
target
live stream
server
Prior art date
Application number
PCT/CN2021/107766
Other languages
English (en)
French (fr)
Inventor
赵文倩
黄非
刘彦伊
许勇
刘福
Original Assignee
阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Priority date
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司 (Alibaba Group Holding Limited)
Publication of WO2022022370A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays
    • H04N21/2187Live feed
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/55Rule-based translation
    • G06F40/56Natural language generation
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/435Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/478Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/47815Electronic shopping
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/478Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/485End-user interface for client configuration
    • H04N21/4856End-user interface for client configuration for language selection, e.g. for the menu or subtitles
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/488Data services, e.g. news ticker
    • H04N21/4884Data services, e.g. news ticker for displaying subtitles

Definitions

  • the present application relates to the field of live broadcast technology, and in particular to a live broadcast method, apparatus and electronic device.
  • live broadcast including commodity object information service system.
  • Merchants or sellers introduce the information of commodity objects through live broadcast.
  • Buyers or consumer users can obtain more intuitive information about commodity objects through the live video and the anchor's spoken description, and enjoy a shopping experience that is closer to reality.
  • In addition, users can interact with the anchor during the live broadcast, for example by asking questions about a commodity object, which the anchor can answer online in real time.
  • Through live streaming technology, buyers or consumer users can be helped more effectively to make shopping decisions.
  • some commodity object information service systems also provide users with cross-border services, and can provide services such as commodity object sales for overseas buyers or consumer users.
  • the details of pictures and texts can be translated into multiple languages for overseas users to browse.
  • However, an anchor user usually covers only one language during a live broadcast, while the broadcast is aimed at buyer users from multiple countries, so there are language barriers between them.
  • This application provides a live broadcast method, device and electronic equipment, which can better apply the live broadcast technology in systems such as cross-border commodity object information services.
  • a method of live broadcasting including:
  • the first server receives the request for creating a multilingual live broadcast submitted by the first client;
  • a live stream processing method comprising:
  • the second server creates at least one director service according to the request submitted by the first server; the request is submitted after the first server receives the request to create a multilingual live broadcast; the at least one director service corresponds to at least one target language;
  • After the multilingual live broadcast is successfully created, the director service is started. The director service reads the source live stream from the first address and, by calling the streaming speech recognition service and the translation service, performs streaming speech recognition on the source live stream and obtains the translation result corresponding to one of the target languages; it then combines the source live stream with the translation result to generate the translated target live stream corresponding to that target language, and saves it to the second address corresponding to the target language.
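The director-service loop described above can be sketched as follows. This is only an illustration of the data flow; the function and its injected callables are hypothetical names, not part of the disclosed system:

```python
# Hypothetical sketch of the director (caster) service described above:
# read the source live stream from the first address, obtain streaming
# speech-recognition + translation results for one target language, merge
# them with the source stream, and save the result to the second address.

def run_director_service(first_address, second_address, target_language,
                         read_stream, recognize_and_translate, merge, save):
    """Each dependency is injected as a callable so the sketch stays abstract."""
    source_stream = read_stream(first_address)          # pull source live stream
    translation = recognize_and_translate(source_stream, target_language)
    target_stream = merge(source_stream, translation)   # e.g. burn in subtitles
    save(target_stream, second_address)                 # push translated stream
    return target_stream
```

One director service instance would run per target language, each writing to its own second address.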
  • a live stream processing method comprising:
  • The third server creates a streaming speech recognition service and a translation service according to a calling request of the second server, wherein the request carries target language information, a first address and a third address, and the first address is used to save the source live stream;
  • the source live stream is read from the first address and speech recognition is performed on it through the streaming speech recognition service; the speech recognition result is translated through the translation service to obtain the translation result corresponding to the target language, and the translation result is saved to the third address, so that the second server can obtain it from the third address;
  • the translation result is then combined with the source live stream into a target live stream corresponding to the target language.
  • a method of live broadcasting including:
  • the first client receives a request to create a multilingual live broadcast
  • After the live broadcast is successfully created, the first client submits the generated live stream to the first address, so that the source live stream can be obtained from the first address and the translated target live stream corresponding to at least one target language can be generated, for providing to a second client associated with a user having that target-language requirement.
  • a method for obtaining a live stream including:
  • the second client submits a request for obtaining the live stream to the first server
  • the second client receives the second address provided by the first server; the second address is determined according to the target language required by the user associated with the second client, and stores the translated target live stream corresponding to that target language;
  • the target live stream is pulled through the second address and played.
  • a live broadcast device applied to a first server, includes:
  • a request receiving unit configured to receive a request for creating a multilingual live broadcast submitted by the first client
  • a target live stream obtaining unit configured to obtain a translated target live stream corresponding to at least one target language according to the source live stream collected by the first client after the multilingual live broadcast is successfully created
  • the target live stream providing unit is configured to, after receiving the request for pulling the live stream submitted by the second client, determine the target language required by the user associated with the second client, and provide the target live stream corresponding to that target language to the second client for playback.
  • a live stream processing device applied to a second server, includes:
  • a director service creation unit configured to create at least one director service according to a request submitted by a first server; the request is submitted after the first server receives a request for creating a multilingual live broadcast; the at least one director service corresponds to at least one target language;
  • the address obtaining unit is configured to obtain the first address and at least one second address provided by the first server, wherein the first address is used to save the source live stream of the live broadcast, and the at least one second address corresponds to at least one target language;
  • a director service starting unit configured to start the director service after the multilingual live broadcast is successfully created, the director service being used to read the source live stream from the first address and, by calling the streaming speech recognition service and the translation service, obtain the translation result corresponding to a target language; the source live stream is then merged with the translation result to generate the translated target live stream corresponding to that target language, which is saved to the second address corresponding to the target language.
  • a live stream processing device applied to a third server, includes:
  • the service creation unit is used to create a streaming speech recognition service and a translation service according to the calling request of the second server, wherein the request carries target language information, a first address and a third address, and the first address is used To save the source live stream;
  • a speech recognition unit configured to read the source live stream from the first address, and perform speech recognition on the source live stream through the streaming speech recognition service;
  • a translation unit configured to translate the speech recognition result through the translation service, obtain the translation result corresponding to the target language, and save the translation result to the third address, so that the second server can obtain the translation result from the third address and combine it with the source live stream into a target live stream corresponding to the target language.
  • a live broadcast device applied to a first client, includes:
  • a request receiving unit for receiving a request for creating a multilingual live broadcast
  • a request submission unit configured to submit the request to the first server, and receive the first address returned by the first server;
  • a streaming unit configured to submit the generated live stream to the first address after the live broadcast is successfully created, so that the source live stream can be obtained from the first address and the translated target live stream corresponding to at least one target language can be obtained, for serving to a second client associated with a user having that target-language requirement.
  • a device for acquiring a live stream, applied to a second client, comprising:
  • a request submission unit configured to submit a request for obtaining the live stream to the first server
  • the address obtaining unit is configured to receive the second address provided by the first server, the second address is determined according to the target language required by the user associated with the second client, and the second address is stored with The translated target live stream corresponding to the target language;
  • a stream pulling unit configured to pull the target live stream through the second address and play it.
  • the embodiments of the present application can support the creation of multilingual live streams, and can generate at least one translated target live stream corresponding to the target language according to the source live stream.
  • When the second client initiates a request for obtaining the live stream, the target language required by the user associated with the second client can be determined, and the corresponding target live stream can be provided to the second client, so that the user can watch live content that meets their own language requirement.
  • a specific multilingual live broadcast service can be provided in the commodity object information service system.
  • training samples can be provided according to the historical live broadcast records in the commodity object information service system to realize the training of the translation model.
  • the translation result can be recorded in advance according to the proprietary vocabulary of the commodity object information service field, so as to improve the accuracy of the translation result.
  • The user data generated in the system by the user associated with the second client can also be used to judge the country/region to which the user belongs, so as to automatically determine the target language required by the user.
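As a rough illustration of this inference step, a server might map the country/region derived from the user's profile or commonly used delivery address to a default target language. The mapping table below is a made-up example, not part of the application:

```python
# Illustrative only: infer a viewer's target language from the country/region
# code found in their profile or commonly used delivery address.
COUNTRY_TO_LANGUAGE = {
    "US": "en", "GB": "en", "FR": "fr", "DE": "de", "JP": "ja",
}

def infer_target_language(country_code, fallback="en"):
    """Return the default language for the country, or a fallback language."""
    return COUNTRY_TO_LANGUAGE.get(country_code.upper(), fallback)
```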
  • FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 3-1 is a schematic diagram of an interaction sequence of a live broadcast creation process provided by an embodiment of the present application;
  • FIG. 3-2 is a schematic diagram of an interaction sequence of a streaming process provided by an embodiment of the present application;
  • FIG. 3-3 is a schematic diagram of a viewer user interface provided by an embodiment of the present application;
  • FIG. 6 is a flowchart of a fourth method provided by an embodiment of the present application.
  • FIG. 7 is a flowchart of a fifth method provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a first device provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a second device provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a third device provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of a fourth device provided by an embodiment of the present application.
  • FIG. 12 is a schematic diagram of a fifth device provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of an electronic device provided by an embodiment of the present application.
  • a cross-language live broadcast function is provided.
  • The host user (referred to as the first user in the embodiments of this application; correspondingly, the viewer user may be referred to as the second user) can choose whether to use the cross-language live broadcast service when creating a live broadcast. The service can generate target live streams corresponding to multiple target languages and provide multiple streaming addresses, each corresponding to one target language. In this way, when the second user wants to watch the live broadcast, the server can provide the client of the second user with the streaming address matching the target language that the second user requires, so that the client can pull the stream from that address.
  • the live broadcast created by the first user in one language can be translated into multiple different target languages for the second user in multiple countries/regions to watch.
  • the second user can also obtain richer and more intuitive information about commodity objects by watching the live broadcast.
  • this multilingual live broadcast method can also be used in other cross-border systems.
  • In each director service, the specific streaming speech recognition service and translation service can be called to obtain a translation result data stream, and the source live stream and the translation result data stream can then be merged to obtain the live stream corresponding to the target language.
  • This live stream can be saved to the streaming address specified by the first server, so that the first server can obtain translated target live streams corresponding to multiple different target languages.
  • the streaming speech recognition service and the translation service can also be provided by a third server.
  • each server can focus on the realization of a certain function, and then, through the mutual cooperation between multiple servers, the purpose of improving the translation accuracy is finally achieved.
  • the specific translation service may also translate the speech recognition result through a pre-established translation model.
  • the live broadcast scene is relatively simple, which also provides a basis for obtaining a good translation accuracy.
  • the historical live broadcast records in the commodity object information service system can be used as training data to train the translation model, so that the translation model becomes a dedicated model in the commodity object information service field.
  • some proper nouns and the like in the commodity object information service scenario may also be pre-recorded in advance, for example, the expressions of proper nouns in various target languages are obtained in advance, and so on. In this way, through the dedicated translation model and the pre-recorded information of the above-mentioned proper nouns, the accuracy of translation can be further improved.
  • the embodiment of the present application can mainly provide a multilingual live broadcast service in a system such as commodity object information service, it can also be based on the data generated by the second user in such a system (for example, the delivery address commonly used by the user). etc.) to automatically identify the target language required by the second user, thereby recommending or directly pushing the streaming address corresponding to the target language to the user.
  • The embodiments of the present application may involve a client and a server provided by a system such as a commodity object information service, wherein the server may correspond to the aforementioned first server, and the client may be divided into a first client oriented to the host user and a second client oriented to the viewer user.
  • a second server or even a third server may also be involved.
  • The first server can call the interface of the second server to create multiple director services, which respectively correspond to the various target languages.
  • the first address and a plurality of second addresses can be generated at the same time.
  • the specific director service can call the streaming speech recognition and translation services of the third server, and the obtained translation result can be saved to the third address specified by the director service.
  • the director service can read the source live stream from the first address, read the translation result data stream from the third address, and combine the streams into the target live stream corresponding to the target language, and save it to the second address specified by the first server.
  • After the second client submits the request for obtaining the live stream, the first server can provide the second client with the corresponding second address according to the target language required by the specific user, so that the second client can pull, from the second address, the target live stream corresponding to the target language required by the user, and play it.
  • the target languages required by users associated with different second clients are different, so the second addresses provided to different second clients may also be different.
  • For example, if the anchor user is a user in China and the source language in the source live stream is Chinese, target live streams corresponding to multiple target languages such as English, French, German and Japanese can be obtained and saved to different second addresses.
  • When user A in an English-speaking country requests to watch the live broadcast, the second address A of the target live stream corresponding to English can be provided to user A; when user B in a French-speaking country requests to watch the live broadcast, the second address B of the target live stream corresponding to French can be provided to user B, and so on.
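The routing in this example can be pictured as a simple lookup from the viewer's target language to the matching second (pull) address. The URLs below are placeholders:

```python
# Placeholder pull addresses, one per target language, as in the example above
# (Chinese source translated into English, French, German and Japanese).
SECOND_ADDRESSES = {
    "en": "rtmp://cdn.example.com/live/room1_en",
    "fr": "rtmp://cdn.example.com/live/room1_fr",
    "de": "rtmp://cdn.example.com/live/room1_de",
    "ja": "rtmp://cdn.example.com/live/room1_ja",
}

def second_address_for(target_language):
    """Return the pull address for the viewer's language, if one exists."""
    return SECOND_ADDRESSES.get(target_language)
```

A viewer whose language has no translated stream would receive no address here; a real system would fall back to the source stream or a default language.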
  • the first embodiment provides a live broadcast method from the perspective of the first server.
  • the method may specifically include:
  • the first server receives a request for creating a multilingual live broadcast submitted by a first client;
  • An operation option for creating a live broadcast can be provided in the first client associated with the first user (such as the host); when the first user clicks to create a live broadcast, the user can be asked whether a multilingual live broadcast is needed, and if so, a request for creating a multilingual live broadcast can be sent to the first server.
  • different operation options for creating a normal live broadcast and a multilingual live broadcast may be provided in the first client, and a user who needs to create a multilingual live broadcast can directly initiate a specific request through the operation option.
  • an operation option for submitting source language information used for live broadcast may also be provided through the first client.
  • the first client terminal may further provide an operation option for selecting a target language, that is, the first user may decide which target languages to translate into. If not selected by the user, the target language can be determined according to the default configuration.
  • the information of the default configuration may be configuration information common to multiple users, or may be configuration based on personalized information of users associated with the first client, for example, default configuration may be performed according to historical selection records, and so on.
  • the target language may be one or more, that is, the source live stream can be translated into target live streams corresponding to multiple different target languages, so that users in different countries/regions can understand the live broadcast content.
  • the first server may enter the process of creating a live broadcast.
  • the translated target live stream corresponding to at least one target language can be obtained according to the source live stream, so as to be provided to the second client with multiple different language requirements.
  • processing such as speech recognition and translation of the source live stream can be implemented by invoking the director service of the second server.
  • the second server may be a server of a cloud service platform that has an associated relationship with the first server, or may exist in other forms.
  • the first server may first generate a first address and at least one second address, wherein the at least one second address corresponds to at least one target language .
  • the above-mentioned first address and second address may be addresses applied for in an associated content distribution network (CDN).
  • The first address can be provided to the first client, so that after the multilingual live broadcast is successfully created, the first client can save the live stream to the first address (that is, the first client can push streams to the first address).
  • The first address and the at least one second address can also be provided to the second server, so that the second server can obtain the source live stream from the first address and, after obtaining the translated target live stream corresponding to each of the at least one target language, save it to the respective second address. In this way, the first server can obtain the translated target live streams corresponding to multiple different target languages, stored in different second addresses.
  • The second address corresponding to the target language required by the user associated with the second client may be returned to the second client, so that the second client can obtain, from the second address, the translated target live stream corresponding to the target language and play it.
  • the first server can also call the interface of the second server before generating the specific second address, so as to realize the creation of the director service.
  • the specific sequence diagram can be shown in Figure 3-1.
  • The first server can first send a "CreateCaster" request to the second server, specifically requesting the creation of multiple caster services; after the second server completes the creation of a caster service, it can return the CasterId to the first server; after that, the first server can configure the caster (SetCasterConfig), apply to add the caster's video source (AddCasterVideoResource), and then set the caster channel (SetCasterChannel).
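The call order above can be sketched as follows. The action names come from the sequence just described; the `api(action, **params)` callable is a simplified stand-in for a real, authenticated API client, and the parameters shown are not the actual SDK signatures:

```python
def create_director_services(api, target_languages):
    """Schematic call order for creating one caster (director) service per
    target language; api(action, **params) stands in for a real API client."""
    caster_ids = {}
    for lang in target_languages:
        resp = api("CreateCaster")                         # request a caster
        caster_id = resp["CasterId"]                       # id returned by server
        api("SetCasterConfig", CasterId=caster_id)         # configure the caster
        api("AddCasterVideoResource", CasterId=caster_id)  # add the video source
        api("SetCasterChannel", CasterId=caster_id)        # set the output channel
        caster_ids[lang] = caster_id
    return caster_ids
```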
  • the first server can generate multiple second addresses, that is, streaming addresses, and the multiple second addresses respectively correspond to multiple target languages.
  • the creation of the multi-language live broadcast can be completed, and the first address can be provided to the first client, and then the process of pushing the stream by the host can be entered.
  • the first client can push the collected source live stream to the first address for saving.
  • The first server can also call the interface provided by the second server (StartCaster) to start the at least one caster service that was created earlier, the at least one caster service respectively corresponding to the at least one target language.
  • parameters such as the first address and the second address can also be provided to a specific broadcaster service by updating the broadcaster configuration information (UpdateCasterSceneConfig) or the like.
  • The director service can perform streaming speech recognition on the source live stream and obtain the translation result by invoking the streaming speech recognition service and the translation service, and then combine the source live stream with the translation result to generate the translated target live stream, which is saved to the corresponding second address.
  • the broadcast director service may also perform streaming speech recognition on the source live stream and obtain the translation result by invoking the streaming speech recognition service and the translation service provided by the third server.
  • The director service can also apply for a third address first and carry the third address in the request for invoking the streaming speech recognition and translation services, so that the translation result (translated text or voice) can be saved to the third address.
  • The director service can read the translation result from the third address, and then combine the source live stream at the first address with the translation result at the third address to generate the translated target live stream, which is saved to the corresponding second address.
  • the third server can focus on providing basic services such as big data processing.
  • a specific translation service may be to translate the speech recognition result according to a pre-established translation model.
  • The multilingual live broadcast in the embodiments of the present application may refer to a live broadcast created in the commodity object information service system; in this case, the specific translation model is obtained by training with the historical live broadcast records in the commodity object information service system as training data. That is to say, the historical live broadcast records in the commodity object information service system can be provided to the third server and used as training samples to train the translation model, so that the translation model becomes a dedicated model for the commodity object information service field, improving the accuracy of translation results in this field.
  • Translation information for specialized vocabulary related to the introduction of commodity objects can also be stored in advance, and the speech recognition result can be translated according to this information to further improve translation accuracy. That is to say, during live broadcasts in the field of commodity object information services, anchor users may often use certain specialized terms; if such terms are translated without considering the field, there may be several possible translations and the result may be inaccurate.
  • the proprietary vocabulary can be translated in advance in combination with the information in this field to obtain translation results in multiple different target languages. Specifically, when translating a live stream, if you encounter such specialized vocabulary, you can use this pre-recorded result to translate, so as to improve the translation accuracy.
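A pre-recorded glossary of domain terms can be applied around a generic translator, for example by substituting known terms with their pre-recorded translations before handing the text to machine translation. The glossary entries and the `machine_translate` stub below are illustrative, not from the application:

```python
# Illustrative pre-recorded translations of domain-specific e-commerce terms.
GLOSSARY = {
    ("秒杀", "en"): "flash sale",
    ("包邮", "en"): "free shipping",
}

def translate_with_glossary(text, target_lang, machine_translate):
    """Substitute glossary terms with their pre-recorded translations first,
    then pass the remaining text to the general-purpose translator."""
    for (term, lang), fixed in GLOSSARY.items():
        if lang == target_lang:
            text = text.replace(term, fixed)
    return machine_translate(text, target_lang)
```

A production system would integrate the glossary into the translation model itself (for example via constrained decoding) rather than plain string substitution; this sketch only shows where the pre-recorded results enter the pipeline.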
  • Since the embodiments of the present application provide a multilingual live broadcast service within a single field, the narrow scope of this field makes it possible to obtain accurate multilingual translation results; that is, the translation results have higher readability rather than being merely mechanical translations, providing an effective multilingual live service.
  • the translation service can also adjust the sentence structure of the speech recognition result, for example, adjusting sentence components such as the subject, verb, object, attributive, and complement, to make the sentence structure more standard.
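  The glossary-first translation strategy described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the glossary entries, function names, and the stubbed fallback translator are all assumptions standing in for the real translation service.

```python
# Hypothetical sketch: replace pre-recorded domain vocabulary before
# handing the remaining text to a general machine-translation step
# (stubbed here as a callable).

DOMAIN_GLOSSARY = {
    # proprietary vocabulary -> pre-recorded translations per target language
    "秒杀": {"en": "flash sale", "fr": "vente flash"},
    "包邮": {"en": "free shipping", "fr": "livraison gratuite"},
}

def translate_with_glossary(text: str, target_lang: str, fallback) -> str:
    """Substitute known domain terms first, then translate the remainder."""
    for term, translations in DOMAIN_GLOSSARY.items():
        if term in text and target_lang in translations:
            text = text.replace(term, translations[target_lang])
    return fallback(text, target_lang)

# Identity fallback stands in for the real MT service in this sketch:
result = translate_with_glossary("今天包邮", "en", lambda t, _: t)
```

  In a real deployment the fallback would be the third server's translation service; the point is only that glossary hits bypass the general model, which is how pre-recorded translations can stabilize domain terms.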
  • the embodiment of the present application may involve translation from multiple languages to multiple languages.
  • the first server may further determine source language information associated with the live broadcast according to the information carried in the request for creating a multilingual live broadcast, and provide the source language information to the second server.
  • the source language may also be determined by the second server or a specific translation service by itself according to the speech recognition result in the source live stream.
  • the target live stream generated after translation may include: a live stream associated with subtitles corresponding to the target language, or a live stream associated with voice corresponding to the target language. That is, the voice in the source live stream can be converted into text and translated into text in the target language, which can then be added to the image of the source live stream in the form of subtitles, so that the viewer user can learn what the host user said by reading the subtitles. Or, in another case, after the text translation is completed, speech synthesis can be performed, and the voice stream in the source live stream can be replaced with the translated voice stream to generate the target live stream. In this way, the viewer user can directly listen to voice information in the target language while watching the live broadcast.
  • the first server can also provide the second server with parameter information related to subtitle display, including subtitle layout parameters such as the position, height, and size of the subtitle frame, the background color, the character limit, the subtitle font and size, the duration of appearance, and so on.
  • the second server adds subtitles to the source live stream according to the parameter information related to the subtitle display, so as to generate a corresponding target live stream.
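  The subtitle-display parameters passed from the first server to the second server could be modeled as a simple structure. The field names and defaults below are illustrative assumptions; the patent only enumerates the categories of parameters (layout, position, height, size, background color, character limit, font, duration).

```python
# Hypothetical sketch of the subtitle-display parameter payload the
# first server might send to the second server's director service.

from dataclasses import dataclass, asdict

@dataclass
class SubtitleParams:
    position: str = "bottom"          # where the subtitle frame sits
    frame_height_px: int = 80         # height of the subtitle frame
    font: str = "sans-serif"          # subtitle font
    font_size_px: int = 28            # subtitle font size
    background_color: str = "transparent"
    max_chars_per_line: int = 40      # character limit per subtitle line
    display_duration_s: float = 4.0   # how long each subtitle appears

params = SubtitleParams(background_color="#000000AA", font_size_px=32)
payload = asdict(params)  # e.g. serialized and sent to the second server
```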
  • a specific implementation scenario of the embodiment of the present application may be a multilingual live broadcast in a commodity object information service system
  • the anchor user in such a system is usually a salesperson of a merchant or a seller, etc., and usually only introduces commodities.
  • the live broadcast equipment used by anchor users is usually mobile terminal equipment such as mobile phones, and the equipment itself is not sufficiently professional; therefore, the quality of the live broadcast screen can be uneven. For example, because different hosts use devices with different resolutions, the clarity of the live broadcast screen may differ; in addition, hosts may choose their live broadcast space casually, so some live broadcast screen backgrounds may be messy, and so on.
  • since the subtitle information provided in the embodiment of the present application needs to be added to the live screen, the above factors may affect the effect of adding subtitles. For example, on a device with a low resolution, if the subtitle font is relatively small, the subtitles may not be displayed clearly, making them inconvenient to read; where the background of the live broadcast screen is chaotic, if the subtitle background is transparent, some subtitles may be displayed unclearly. However, if a non-transparent subtitle background color is set uniformly, then where the background of the live broadcast screen is relatively simple, the occlusion of the live broadcast screen caused by subtitles with a non-transparent background color is unnecessary, and so on.
  • parameters related to subtitle display may be determined in combination with the actual situation of the specific first client.
  • the resolution of the terminal device associated with the first client and/or the screen orientation (portrait or landscape) information required for the live broadcast process can be obtained, and specific parameters related to subtitle display can be determined according to the information.
  • the first client can obtain the relevant screen parameters locally from the terminal device on which it is located; or, the first client can provide an operation option for entering screen parameters, through which the first user can enter them.
  • the screen orientation information can be entered by the first user, or, in a specific implementation, the live broadcast scene information associated with the multilingual live broadcast can be acquired before the live broadcast starts, and suggested screen orientation information can be provided to the first client according to the live broadcast scene information. For example, if the specific live broadcast scene is to introduce clothing commodity objects, including displaying the upper-body effect of the clothing, the user may be advised to live broadcast in portrait mode, and so on.
  • the height and size of the subtitle frame, the size of the subtitle font, and the like may be determined according to specific resolution parameters.
  • the position of the subtitle frame can also be determined according to the screen orientation information. For example, if the screen is vertical, the subtitle frame can be located above the comment area, so that the subtitle text and the text in the comment area do not block each other, and so on.
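  A heuristic along the lines described above, deriving subtitle layout from the anchor device's resolution and screen orientation, might look as follows. The scaling ratio, minimum font size, and position names are illustrative assumptions rather than values from the disclosure.

```python
# Hypothetical heuristic: pick subtitle font size and frame position from
# the anchor device's resolution and orientation.

def subtitle_layout(width: int, height: int) -> dict:
    portrait = height >= width
    # Scale the font with the smaller screen dimension so that
    # low-resolution devices still get readable subtitles.
    font_size = max(24, min(width, height) // 25)
    # In portrait mode, place the subtitle frame above the comment area
    # so subtitles and comments do not block each other; in landscape,
    # keep it at the bottom edge.
    position = "above_comments" if portrait else "bottom"
    return {
        "orientation": "portrait" if portrait else "landscape",
        "font_size_px": font_size,
        "position": position,
    }

layout = subtitle_layout(720, 1280)  # a typical portrait phone stream
```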
  • the background image information of the live broadcast screen can also be obtained.
  • image collection of the live broadcast scene can be started before the live broadcast to obtain the background image of the live broadcast screen, and so on.
  • the main color of the background image can be determined, or the degree of visual clutter of the background image can be determined, so as to decide whether the subtitle background should be transparent.
  • the subtitle background color is determined according to the main color of the background image of the live broadcast screen. For example, it may be a color with a large color difference from the main color of the background image, so as to improve the legibility of the subtitles.
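  The color-selection rule above can be sketched as follows: keep the subtitle background transparent for simple scenes, and otherwise pick, from a candidate palette, the color furthest (in RGB distance) from the dominant background color. The busyness threshold, the two-color palette, and the distance metric are all illustrative assumptions.

```python
# Hypothetical sketch: choose a subtitle background color by its color
# difference from the dominant color of the live-broadcast background.

def pick_subtitle_background(dominant_rgb, busyness: float):
    """Return an opaque RGB background, or None for transparent."""
    if busyness < 0.3:
        # Simple background: a transparent subtitle background avoids
        # unnecessary occlusion of the live picture.
        return None
    candidates = [(0, 0, 0), (255, 255, 255)]

    def color_distance(c):
        # Squared Euclidean distance in RGB space.
        return sum((a - b) ** 2 for a, b in zip(c, dominant_rgb))

    # Pick the candidate with the largest color difference.
    return max(candidates, key=color_distance)

bg = pick_subtitle_background((240, 240, 240), busyness=0.8)  # light, busy scene
```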
  • the director service of the second server can, according to the above parameter information, add the translation as subtitles to the source live stream, thereby generating target live streams corresponding to multiple different target languages, which can be respectively saved to the second addresses pre-specified by the first server.
  • S203 After receiving the request for pulling the live stream submitted by the second client, determine the target language required by the user associated with the second client, and provide the target live stream corresponding to the target language to the second client for playback.
  • the specific target live streams can be provided to the second client. Specifically, after receiving the request for pulling the live stream submitted by the second client, the target language required by the user associated with the second client is determined, and the target live stream corresponding to the target language is provided to the second client for playback. It should be noted that, in a specific implementation, an operation option for enabling or disabling the multilingual live translation function may be provided in the second client. In this way, when a user sends a specific request to watch a live broadcast, the switch state can first be checked; if the live translation function is enabled, a request for obtaining the translated target live stream can be submitted to the first server. Otherwise, if the live translation function is disabled, a request for obtaining the source live stream can be submitted to the first server, so as to play the source live stream.
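  The client-side switch check described above amounts to choosing which request to send. The request field names below are assumptions made for illustration only.

```python
# Hypothetical sketch: build the pull request depending on whether the
# multilingual live translation function is switched on in the client.

from typing import Optional

def build_pull_request(translation_enabled: bool, live_id: str,
                       target_lang: Optional[str]) -> dict:
    if translation_enabled and target_lang:
        # Translation on: ask the first server for the translated target stream.
        return {"live_id": live_id, "stream": "target", "lang": target_lang}
    # Translation off (or no language known): ask for the source stream.
    return {"live_id": live_id, "stream": "source"}

req = build_pull_request(True, "live-123", "en")
```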
  • there are various ways to determine the target language required by the user associated with the second client. For example, in one way, when the user initiates a request for obtaining a live stream through the second client, the request can carry information about the target language. Or, in another way, since the multilingual live broadcast in the embodiment of the present application may specifically include a live broadcast created in the commodity object information service system, the target language required by the user associated with the second client may also be determined based on the data that user has generated in the commodity object information service system.
  • the country/region where the user associated with the second client is located may be determined according to the shipping address information corresponding to that user; then, the target language required by the user can be determined according to the country/region. Alternatively, the country/region where the user is located may be determined according to the location information associated with the user, and so on.
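  The country-to-language inference could be a simple lookup, as sketched below. The mapping table and fallback language are assumptions for illustration; a real system would need a complete, maintained table.

```python
# Hypothetical sketch: infer a default target language from the
# country/region derived from the user's shipping address.

COUNTRY_TO_LANG = {
    "US": "en", "GB": "en", "FR": "fr", "JP": "ja", "DE": "de",
}

def default_target_language(shipping_country: str, fallback: str = "en") -> str:
    """Map an ISO country code to a target language, with a fallback."""
    return COUNTRY_TO_LANG.get(shipping_country.upper(), fallback)

lang = default_target_language("fr")
```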
  • an operation option for switching to other target languages can also be provided in the second client, so that the user can switch to the target live stream corresponding to another target language for playback.
  • the target live stream corresponding to the target language can be provided to the user. Accordingly, the second client can play the target live stream.
  • the interface seen by users in English-speaking countries/regions can be as shown in (A), where English subtitles express what the host user is currently saying, for example, "Sensible beauty tips for enhancing your appearance".
  • Users in French-speaking countries can see the interface shown in (B), with French subtitles expressing what the host user is currently saying, for example, "Un bon smoking pour 4-6 Common apparence”.
  • the interface seen by users in Japanese-speaking countries/regions can be as shown in (C), with Japanese subtitles expressing what the anchor user is currently saying, and so on.
  • the first server may also collect statistics on the viewing of the multilingual live broadcast by users in the countries/regions associated with the at least one target language, according to clients' access to each second address, and provide the statistical results to the first client. For example, the number of viewers in English-speaking countries, the number of viewers in French-speaking countries, the number of viewers in Japanese-speaking countries, and so on can be counted. These data can be provided to the first client through data dashboards and the like, so that anchor users can intuitively determine the popularity of a specific live broadcast in countries/regions of different languages, further helping users adjust their marketing strategies, and so on.
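  Because each target language has its own second address, per-language viewer counts reduce to counting unique viewers per address. The log format and address-to-language table below are assumptions for illustration.

```python
# Hypothetical sketch: aggregate per-language viewer counts from access
# records of the second addresses (one address per target language).

from collections import Counter

ADDRESS_LANG = {
    "rtmp://cdn/live-123-en": "en",
    "rtmp://cdn/live-123-fr": "fr",
    "rtmp://cdn/live-123-ja": "ja",
}

def viewers_by_language(access_log):
    """access_log: iterable of (second_address, viewer_id) tuples."""
    seen = set()
    counts = Counter()
    for addr, viewer in access_log:
        if (addr, viewer) not in seen:  # count each viewer once per stream
            seen.add((addr, viewer))
            counts[ADDRESS_LANG.get(addr, "unknown")] += 1
    return dict(counts)

stats = viewers_by_language([
    ("rtmp://cdn/live-123-en", "u1"),
    ("rtmp://cdn/live-123-en", "u1"),  # duplicate pull, same viewer
    ("rtmp://cdn/live-123-fr", "u2"),
])
```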
  • this data dashboard information can also help users adjust subsequent live broadcast strategies, and so on.
  • the viewer user can also share the live broadcast address with other users, so that other users can also watch the specific live broadcast.
  • viewer users can also be supported to share with users in other countries/regions.
  • the first server can also determine the target language required by the target user of the sharing, and return the address of the target live stream corresponding to that target language to the first client, so that the first client can copy the address and share it with the target user for playback on the client associated with the target user.
  • suppose a user A shares a live broadcast with user B. In the traditional way, user A can directly copy the address of the live broadcast to user B.
  • a sharing operation option may be provided in the first client, and when a user needs to share with other users, a sharing request may be initiated through the operation option, and the required target language information may be carried.
  • the first server can perform address conversion, converting the address into the address of the target live stream corresponding to the target language required by user B, and then return it to user A; user A provides the converted address to user B, so that user B can watch the live content in the target language.
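  The address conversion on sharing is essentially a lookup keyed by live broadcast and target language, falling back to the source address when no translated stream exists. The URL scheme and table below are assumptions for illustration.

```python
# Hypothetical sketch: convert a shared live address into the second
# address holding the target stream for the share recipient's language.

SECOND_ADDRESSES = {
    ("live-123", "en"): "https://cdn.example/live-123/en.m3u8",
    ("live-123", "ja"): "https://cdn.example/live-123/ja.m3u8",
}

def convert_share_address(live_id: str, target_lang: str,
                          source_addr: str) -> str:
    # Fall back to the source stream if no translated stream exists
    # for the recipient's language.
    return SECOND_ADDRESSES.get((live_id, target_lang), source_addr)

shared = convert_share_address(
    "live-123", "ja", "https://cdn.example/live-123/src.m3u8")
```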
  • a business user can send a multilingual live broadcast request through its associated first client; the source language to be used can also be sent at the same time.
  • the desired target languages and the like can be selected, so that the request carries this information; of course, they may also be left unselected.
  • in that case, the speech recognition service automatically recognizes the source language and translates according to the default configured target languages, and so on.
  • the screen parameters of the terminal device associated with the first client, the screen orientation required for the live broadcast, and the background image information of the live broadcast screen may also be carried to the first server through the request.
  • the first server can request the second server to create multiple director services, which correspond to multiple target languages respectively.
  • some parameters can also be configured, specifically including subtitle display parameters. These parameters can be determined according to the screen parameters of the device associated with the first client, the screen orientation, the main color of the background image of the live broadcast screen, the degree of visual clutter, and so on, so as to meet the subtitle display requirements that arise in the commodity object information service scenario from non-professional equipment and anchors.
  • after receiving the information that the live broadcast has been created, the first client can save the collected source live stream to the first address; at the same time, the first server can initiate a request to the second server to start the previously created director service and provide it with the information of the first address and the second addresses.
  • the director service can read the source live stream from the first address, and obtain the speech recognition result and the translation corresponding to the target language by invoking the speech recognition service and translation service of the third server.
  • model training can be performed in advance based on the historical live broadcast records in the commodity object information service system to improve the accuracy of the translation.
  • some proper nouns in the field can also be recorded in advance, so as to further improve the accuracy of the translation.
  • after the second server obtains the translation of the speech recognition result in the live stream, it can, according to the subtitle display parameters previously configured by the first server, add the translation to the image of the source live stream to generate the target live stream in the corresponding target language, and save it to the corresponding second address.
  • the above process can be completed separately by multiple director services, so that during the live broadcast, target live streams corresponding to the various target languages, each with subtitles in its respective target language, can be generated at multiple second addresses.
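  The per-language director pipeline described above can be sketched end to end as follows. Every service call is a stubbed placeholder: the real streaming speech recognition, translation, and stream I/O belong to the second and third servers, and the segment format is an assumption.

```python
# Hypothetical sketch of one director service: read segments from the
# first address, run speech recognition and translation (both stubbed),
# attach the translation as a subtitle, and write to the second address.

def run_director(read_source, recognize, translate, write_target, target_lang):
    for segment in read_source():                # pull from the first address
        text = recognize(segment["audio"])       # streaming speech recognition
        subtitle = translate(text, target_lang)  # translation service
        segment = dict(segment, subtitle=subtitle)
        write_target(segment)                    # save to the second address

# Minimal in-memory stand-ins for the real services:
out = []
run_director(
    read_source=lambda: [{"audio": b"...", "video": b"..."}],
    recognize=lambda audio: "hello",
    translate=lambda text, lang: f"{text}:{lang}",
    write_target=out.append,
    target_lang="fr",
)
```

  Running one such loop per target language, each with its own second address, yields the multiple parallel target streams described above.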
  • a request for pulling the live stream can be initiated to the first server through the second client terminal.
  • the first server can determine the target language that the user may need according to information such as the user's commonly used delivery address in the commodity object information service system associated with the second client, and then provide the second address corresponding to that target language to the second client.
  • the second client only needs to pull the target live stream from the second address and play it, so that the user can understand what the host said in the live broadcast through the subtitles.
  • an operation option for selecting more target languages may be provided on the second client terminal, so that the user can switch to other target languages to watch the live content.
  • the embodiment of the present application can support the creation of multi-language live streams, and can generate at least one translated target live stream corresponding to the target language according to the source live stream.
  • the target language required by the user associated with the second client can be determined, and the corresponding target live stream is provided to the second client, so that the user can watch live content that meets their own language requirements.
  • a specific multilingual live broadcast service can be provided in the commodity object information service system.
  • training samples can be provided according to the historical live broadcast records in the commodity object information service system to realize the training of the translation model.
  • the translation result can be recorded in advance according to the proprietary vocabulary of the commodity object information service field, so as to improve the accuracy of the translation result.
  • the user data generated in the system by the user associated with the second client can also be used to judge the country/region to which the user belongs, so as to automatically determine the target language required by the user.
  • the second embodiment corresponds to the first embodiment. From the perspective of the second server, a method for processing live streams is provided. Referring to FIG. 4 , the method may specifically include:
  • the second server creates at least one director service according to the request submitted by the first server; the request is submitted after the first server receives the request for creating a multilingual live broadcast; the at least one director service corresponds to at least one target language;
  • S402 Obtain a first address and at least one second address provided by the first server, where the first address is used to save the source live stream of the live broadcast, and the at least one second address corresponds to the at least one target language;
  • S403 After the multilingual live broadcast is successfully created, start the director service, where the director service is used to read the source live stream from the first address and, by invoking the streaming speech recognition service and translation service, perform streaming speech recognition on the source live stream and obtain the translation result corresponding to one of the target languages; the source live stream and the translation result are then combined to generate the translated target live stream corresponding to that target language, which is saved to the second address corresponding to the target language.
  • the director service can be specifically used to invoke the streaming speech recognition service and translation service provided by the third server, generate a third address, and provide the first address and the third address to the third server.
  • the director service reads the translation result through the third address and synthesizes it with the source live stream to generate the target live stream.
  • the translation result includes the translated text stream; at this time, the director service is specifically used to add the text stream as the subtitle information of the source live stream to generate a corresponding target live stream.
  • the translation result includes the translated voice stream; in this case, the director service is specifically used to delete the original voice stream from the source live stream and synthesize the remainder with the translated voice stream to generate the target live stream.
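  For the voice-replacement case, one conventional way to mux a translated audio track over the original video is an ffmpeg invocation that keeps the video track and takes audio from the second input. The file names are illustrative, and the sketch only constructs the argument list rather than executing it.

```python
# Hypothetical sketch: build an ffmpeg command that keeps the source
# video track but replaces the original voice with the translated
# (synthesized) voice stream.

def voice_replace_cmd(source: str, translated_audio: str, target: str):
    return [
        "ffmpeg",
        "-i", source,             # source live stream (video + original voice)
        "-i", translated_audio,   # synthesized speech in the target language
        "-map", "0:v",            # keep the video track from input 0
        "-map", "1:a",            # take the audio track from input 1
        "-c:v", "copy",           # do not re-encode the video
        target,
    ]

cmd = voice_replace_cmd("src.flv", "voice_fr.aac", "target_fr.flv")
```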
  • the third embodiment also corresponds to the first embodiment. From the perspective of the third server, a method for processing live streams is provided. Referring to FIG. 5 , the method may specifically include:
  • the third server creates a streaming speech recognition service and a translation service according to a calling request of the second server, wherein the request carries target language information, a first address, and a third address, and the first address is used to save the source live stream;
  • S502 Read the source live stream from the first address, and perform speech recognition on the source live stream through the streaming speech recognition service;
  • S503 Translate the speech recognition result through the translation service, obtain a translation result corresponding to the target language, and save the translation result to the third address, so that the second server can obtain the translation result from the third address and synthesize it with the source live stream into a target live stream corresponding to the target language.
  • the live broadcast includes a live broadcast created in the commodity object information service system; in this case, the translation service may specifically translate the speech recognition result according to a pre-established translation model, and the translation model is obtained by training with the historical live broadcast records in the commodity object information service system as training data.
  • the translation service may also translate the speech recognition result according to the pre-stored translation information of the special vocabulary related to the introduction of the commodity object.
  • the fourth embodiment provides a live broadcast method from the perspective of the first client associated with the anchor user.
  • the method may specifically include:
  • the first client receives a request for creating a multilingual live broadcast
  • S602 Submit the request to a first server, and receive a first address returned by the first server;
  • S603 After the live broadcast is successfully created, submit the generated live stream to the first address, so that the source live stream is obtained from the first address and the translated target live stream corresponding to at least one target language is obtained, for serving to a second client associated with a user having a target-language requirement.
  • an operation option for selecting a source language associated with the source live broadcast may also be provided; the source language information received through the operation option is submitted to the first server.
  • statistical information provided by the first server may also be received and displayed, where the statistical information includes: the viewing of the multilingual live broadcast by users in the countries/regions associated with the at least one target language.
  • the fifth embodiment provides a method for obtaining a live stream from the perspective of a second client associated with a viewer user.
  • the method may specifically include:
  • S701 The second client submits a request for obtaining the live stream to the first server;
  • S702 Receive a second address provided by the first server, where the second address is determined according to a target language required by a user associated with the second client, and the second address stores the target language The corresponding translated target live stream;
  • S703 Pull and play the target live stream through the second address.
  • an operation option for re-selecting the target language may also be provided; the re-selected target language is submitted to the first server through the operation option, so that the first server provides the second address corresponding to the re-selected target language.
  • an operation option for enabling or disabling the multilingual live translation function can also be provided; specifically, when submitting a request for obtaining a live stream to the first server, if the live translation function is enabled, a request for obtaining the translated target live stream is submitted to the first server. Otherwise, if the live translation function is disabled, a request for obtaining the source live stream is submitted to the first server, so as to play the source live stream.
  • an operation option for sharing the live broadcast can also be provided; after a sharing request is received through the operation option, the target language required by the sharing target is determined, and the sharing request and the target language required by the sharing target are submitted to the first server; after the second address corresponding to the target language required by the sharing target is received from the first server, the second address is provided to the client associated with the sharing target.
  • the embodiments of the present application may involve the use of user data.
  • the user's data may be used in accordance with the applicable laws and regulations of the country where the user is located (for example, with the user's express consent and with the user effectively notified, etc.), and user-specific personal data may be used in the scenarios described herein to the extent permitted by applicable laws and regulations.
  • the embodiment of the present application further provides a live broadcast device, and the device is applied to the first server.
  • the device may specifically include:
  • a request receiving unit 801 configured to receive a request for creating a multilingual live broadcast submitted by a first client
  • a target live stream obtaining unit 802 configured to obtain a translated target live stream corresponding to at least one target language according to the source live stream collected by the first client after the multilingual live broadcast is successfully created;
  • the target live stream providing unit 803 is configured to, after receiving the request for pulling the live stream submitted by the second client, determine the target language required by the user associated with the second client, and provide the target live stream corresponding to the target language to the second client for playback.
  • the target live stream obtaining unit may include:
  • An address generating unit configured to generate a first address and at least one second address, the at least one second address corresponding to at least one target language
  • a first address providing unit configured to provide the first address to the first client, so that after the multilingual live broadcast is successfully created, the first client saves the generated source live stream to the first address;
  • a second address providing unit configured to provide the first address and the at least one second address to a second server, so that the second server obtains the source live stream from the first address and, after obtaining the translated target live streams corresponding to the at least one target language, respectively saves them to the second addresses;
  • the target live stream providing unit can be specifically used for:
  • the second address corresponding to the target language is returned to the second client, so that the second client obtains the translated target live stream corresponding to the target language from the second address for playback.
  • the second address providing unit can be specifically used for:
  • by calling the service creation interface provided by the second server, at least one director service is started in the second server, the at least one director service respectively corresponding to the at least one target language; the director service, by invoking the streaming speech recognition service and the translation service, performs streaming speech recognition on the source live stream and obtains the translation result, and then combines the source live stream with the translation result to generate the translated target live stream; wherein the request for invoking the director service carries the information of the first address and the second address.
  • the director service can perform streaming speech recognition on the source live stream and obtain the translation result by invoking the streaming speech recognition service and the translation service provided by the third server; the director service carries the third address in the call request, so that the translation result is saved to the third address, and the director service combines the source live stream at the first address with the translation result at the third address to generate the translated target live stream and save it to the second address.
  • the multilingual live broadcast includes the live broadcast created in the commodity object information service system; in this case, the translation service translates the speech recognition result according to a pre-established translation model, and the translation model is obtained by training with the historical live broadcast records in the commodity object information service system as training data.
  • the translation service may also translate the speech recognition result according to the pre-stored translation information of the special vocabulary related to the introduction of the commodity object.
  • the translation service may further adjust the sentence structure of the speech recognition result before translating the speech recognition result.
  • the device may further include:
  • a source language information determining unit configured to determine source language information associated with the live broadcast according to the information carried in the request for creating a multilingual live broadcast
  • a source language information providing unit configured to provide the source language information to the second server.
  • the translated target live stream includes: a live stream associated with subtitles corresponding to the target language; at this time, the device may further include:
  • the layout parameter information providing unit is used to provide page layout parameter information to the second server, so that after the director service obtains the translated text stream corresponding to the target language, it adds the text stream as the subtitle information of the source live stream according to the page layout parameters, to generate the corresponding target live stream.
  • the device may also include:
  • a statistical unit configured to collect statistics on the viewing of the multilingual live broadcast by users in the countries/regions associated with the at least one target language according to the access to the second addresses, and provide the statistical results to the first client.
  • the translated target live stream includes: a live stream associated with subtitles corresponding to the target language, or a live stream associated with voice corresponding to the target language.
  • the multilingual live broadcast includes the live broadcast created in the commodity object information service system; in this case, the target live stream providing unit can be used for: determining, based on the data generated in the commodity object information service system by the user associated with the second client, the target language required by that user.
  • the target live stream providing unit can be used for:
  • the country/region where the user associated with the second client is located is determined according to the shipping address information corresponding to that user; the target language required by the user associated with the second client is determined according to the country/region.
  • the device may also include:
  • the sharing unit is configured to determine the target language required by the shared target user when receiving a request for sharing the live broadcast from the second client, and provide the address information of the target live stream corresponding to that target language to the second client, so that the second client shares the address with the target user.
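The units above determine a viewer's target language from data the user generated in the commodity object information service system, such as a shipping address, and then pick the matching second (pull) address. A minimal sketch of that mapping logic, in which the country-to-language table and the address names are illustrative assumptions rather than values from the patent:

```python
# Illustrative country/region -> default target language table; a real system
# would derive this from its own locale data.
COUNTRY_TO_LANGUAGE = {"US": "en", "FR": "fr", "JP": "ja", "DE": "de"}

def infer_target_language(shipping_country: str, fallback: str = "en") -> str:
    """Map the country/region of the user's shipping address to a target language."""
    return COUNTRY_TO_LANGUAGE.get(shipping_country.upper(), fallback)

def select_second_address(shipping_country: str, second_addresses: dict) -> str:
    """Pick the second (pull) address holding the translated stream for the
    inferred language, falling back to the source stream's address."""
    lang = infer_target_language(shipping_country)
    return second_addresses.get(lang, second_addresses["source"])
```

The fallback to the source address mirrors the case where no translated target live stream exists for the inferred language.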
  • the embodiment of the present application further provides a live stream processing device; referring to FIG. 9, the device is applied to the second server.
  • the device may specifically include:
  • a director service creation unit 901 configured to create at least one director service according to a request submitted by a first server; the request is submitted after the first server receives a request for creating a multilingual live broadcast; the at least one director service corresponds to at least one target language;
  • the address obtaining unit 902 is configured to obtain a first address and at least one second address provided by the first server, wherein the first address is used to save the source live stream of the live broadcast, and the at least one second address corresponds to the at least one target language.
  • the director service starting unit 903 is configured to start the director service after the multilingual live broadcast is created successfully; the director service is used to read the source live stream from the first address and, by calling a streaming speech recognition service and a translation service, perform streaming speech recognition on the source live stream and obtain a translation result corresponding to one of the target languages, then merge the source live stream with the translation result to generate the translated target live stream corresponding to that target language and save it to the second address corresponding to that target language.
  • the director service is specifically used to invoke the streaming speech recognition service and the translation service provided by the third server, generate a third address, and provide the first address and the third address to the third server, so that the third server saves the translation result to the third address after obtaining it; the director service reads the translation result through the third address and synthesizes it with the source live stream to generate the target live stream.
  • when the translation result includes a translated text stream, the director service is specifically used to add the text stream as subtitle information of the source live stream to generate the corresponding target live stream;
  • when the translation result includes a translated speech stream, the director service is specifically used to delete the speech stream from the source live stream and synthesize it with the translated speech stream to generate the target live stream.
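The director service's core loop described above — read source segments from the first address, obtain a per-segment translation, merge it back, and write to the second address — can be sketched as follows. The segment shape and the injected `recognize`/`translate` callables are assumptions for illustration, not the patent's actual interfaces:

```python
def run_director_service(read_source, recognize, translate, target_lang, write_target):
    """One director-service pass for a single target language: perform streaming
    speech recognition on each source segment, translate the result, and merge
    the translation back into the segment as subtitle metadata."""
    for segment in read_source():
        text = recognize(segment["audio"])        # streaming ASR on this chunk
        subtitle = translate(text, target_lang)   # translation result for the chunk
        merged = {**segment, "subtitle": subtitle, "lang": target_lang}
        write_target(merged)                      # save to the second address
```

A real deployment would run one such loop per target language, each director service writing to its own second address.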
  • the embodiment of the present application also provides a live stream processing device, see FIG. 10 , the device is applied to a third server, including:
  • a service creation unit 1001 configured to create a streaming speech recognition service and a translation service according to a call request of the second server, wherein the request carries target language information, a first address and a third address, the first address Used to save the source live stream;
  • a speech recognition unit 1002 configured to read the source live stream from the first address, and perform speech recognition on the source live stream through the streaming speech recognition service;
  • the translation unit 1003 is configured to translate the speech recognition result through the translation service, obtain the translation result corresponding to the target language, and save the translation result to the third address, so that the second server obtains the translation result from the third address and synthesizes it with the source live stream into a target live stream corresponding to the target language.
  • the live broadcast includes the live broadcast created in the commodity object information service system
  • the translation service is to translate the speech recognition result according to a pre-established translation model, and the translation model is obtained by training the historical live broadcast records in the commodity object information service system as training data.
  • the translation service also translates the speech recognition result according to the pre-stored translation information of the special vocabulary related to the introduction of the commodity object.
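The pre-stored translation information for commodity-introduction vocabulary mentioned above can be applied before the general model runs, so that domain terms always receive their recorded translations. A sketch under assumed data shapes — the glossary entries and the `model_translate` hook are illustrative, not the patent's interfaces:

```python
# Pre-recorded (term, target language) -> fixed translation pairs for the
# commodity object information service domain; entries are illustrative.
SPECIAL_VOCABULARY = {
    ("口红", "en"): "lipstick",
    ("包邮", "en"): "free shipping",
}

def translate_with_glossary(text: str, target_lang: str, model_translate) -> str:
    """Substitute pre-recorded domain terms first, then let the general
    translation model handle the remaining text."""
    for (term, lang), fixed in SPECIAL_VOCABULARY.items():
        if lang == target_lang:
            text = text.replace(term, fixed)
    return model_translate(text, target_lang)
```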
  • this embodiment of the present application further provides a live broadcast apparatus, which is applied to the first client.
  • the apparatus may specifically include:
  • a request receiving unit 1101, configured to receive a request for creating a multilingual live broadcast
  • a request submitting unit 1102 configured to submit the request to the first server, and receive the first address returned by the first server;
  • Streaming unit 1103 configured to submit the generated live stream to the first address after the live broadcast is successfully created, so that the source live stream can be obtained from the first address and a translated target live stream corresponding to at least one target language can be obtained, for serving to a second client associated with a user with target language requirements.
  • the device may further include:
  • the operation option providing unit is used to provide operation options for selecting the source language associated with the source live broadcast
  • a source language information submission unit configured to submit the source language information received through the operation option to the first server.
  • the device may also include:
  • a statistical information receiving unit configured to receive statistical information provided by the first server, where the statistical information includes: the viewing situation of the multilingual live broadcast by users in the countries/regions associated with the at least one target language;
  • a statistical information display unit configured to display the statistical information.
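The first-client units above follow a simple sequence: submit the create request, receive the first (push) address, then push the captured stream to it once creation succeeds. A sketch with injected transport callables; the request fields and address string are illustrative assumptions:

```python
def start_multilingual_live(submit_request, capture_stream, push):
    """First-client flow: create the multilingual live broadcast, then push the
    captured source live stream, chunk by chunk, to the first address returned
    by the first server."""
    response = submit_request({"action": "create_multilingual_live",
                               "source_lang": "zh"})
    first_address = response["first_address"]
    for chunk in capture_stream():
        push(first_address, chunk)  # save the source live stream to the first address
    return first_address
```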
  • this embodiment of the present application further provides an apparatus for acquiring live streams.
  • the apparatus is applied to a second client and includes:
  • a request submitting unit 1201 configured to submit a request for obtaining a live stream to the first server
  • the address obtaining unit 1202 is configured to receive a second address provided by the first server, where the second address is determined according to the target language required by the user associated with the second client, and the second address stores a translated target live stream corresponding to the target language;
  • the stream pulling unit 1203 is configured to pull the target live stream through the second address and play it.
  • the device may further include:
  • a first operation option providing unit for providing an operation option for re-selection of the target language
  • the re-selection result submitting unit is configured to submit the target language re-selected through the operation option to the first server, so that the first server provides the second address corresponding to the re-selected target language.
  • the device may also include:
  • the second operation option providing unit is used to provide operation options for enabling or disabling the multilingual live translation function
  • the request submission unit can be specifically configured to submit, to the first server, a request for obtaining the translated target live stream;
  • the request submission unit can also be configured to submit, to the first server, a request for obtaining the source live stream, so as to play the source live stream.
  • the device may also include:
  • a third operation option providing unit for providing operation options for sharing the live broadcast
  • a target language determination unit configured to determine the target language required by the sharing object after receiving the sharing request through the operation option, and submit the sharing request and the target language required by the sharing object to the first service end;
  • the sharing unit is configured to provide the second address to the client associated with the sharing object after receiving the second address returned by the first server and corresponding to the target language required by the sharing object.
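The second-client options above reduce to two decisions: whether the translation toggle routes the request to the translated or the source stream, and which target language the pull request carries after a re-selection. A sketch with illustrative request shapes (none of the field names come from the patent):

```python
def build_pull_request(translation_enabled: bool, target_lang: str) -> dict:
    """Build the request the second client submits to the first server: the
    translated target stream when the multilingual live translation function
    is on, otherwise the untranslated source stream."""
    if translation_enabled:
        return {"action": "get_target_stream", "target_lang": target_lang}
    return {"action": "get_source_stream"}

def reselect_language(current_request: dict, new_lang: str) -> dict:
    """Re-selection option: resubmit the pull request with the newly chosen
    target language so the server returns the matching second address."""
    return {**current_request, "target_lang": new_lang}
```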
  • an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, implements the steps of the method described in any one of the foregoing method embodiments.
  • an electronic device, comprising:
  • one or more processors; and a memory for storing program instructions which, when read and executed by the one or more processors, perform the steps of the method described in any one of the foregoing method embodiments.
  • the device 1300 may be a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, an aircraft, or the like.
  • the device 1300 may include one or more of the following components: a processing component 1302, a memory 1304, a power supply component 1306, a multimedia component 1308, an audio component 1310, an input/output (I/O) interface 1312, a sensor component 1314, and communication component 1316.
  • the processing component 1302 generally controls the overall operation of the device 1300, such as operations associated with display, phone calls, data communications, camera operations, and recording operations.
  • the processing element 1302 may include one or more processors 1320 to execute instructions to complete all or part of the steps of the methods provided by the technical solutions of the present disclosure.
  • processing component 1302 may include one or more modules that facilitate interaction between processing component 1302 and other components.
  • processing component 1302 may include a multimedia module to facilitate interaction between multimedia component 1308 and processing component 1302.
  • Memory 1304 is configured to store various types of data to support operations at device 1300. Examples of such data include instructions for any application or method operating on device 1300, contact data, phonebook data, messages, pictures, videos, and the like. Memory 1304 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
  • Power supply component 1306 provides power to various components of device 1300 .
  • Power components 1306 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to device 1300 .
  • Multimedia component 1308 includes screens that provide an output interface between device 1300 and the user.
  • the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user.
  • the touch panel includes one or more touch sensors to sense touch, swipe, and gestures on the touch panel. A touch sensor can sense not only the boundaries of a touch or swipe action, but also the duration and pressure associated with the touch or swipe action.
  • the multimedia component 1308 includes a front-facing camera and/or a rear-facing camera. When the device 1300 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each of the front and rear cameras can be a fixed optical lens system or have focal length and optical zoom capability.
  • Audio component 1310 is configured to output and/or input audio signals.
  • audio component 1310 includes a microphone (MIC) that is configured to receive external audio signals when device 1300 is in operating modes, such as call mode, recording mode, and voice recognition mode. The received audio signal may be further stored in memory 1304 or transmitted via communication component 1316 .
  • audio component 1310 also includes a speaker for outputting audio signals.
  • the I/O interface 1312 provides an interface between the processing component 1302 and a peripheral interface module, which may be a keyboard, a click wheel, a button, or the like. These buttons may include, but are not limited to: home button, volume buttons, start button, and lock button.
  • Sensor assembly 1314 includes one or more sensors for providing status assessments of various aspects of device 1300 .
  • the sensor component 1314 can detect the open/closed state of the device 1300 and the relative positioning of components, such as the display and keypad of the device 1300; the sensor component 1314 can also detect a change in the position of the device 1300 or a component of the device 1300, the presence or absence of user contact with the device 1300, the orientation or acceleration/deceleration of the device 1300, and temperature changes of the device 1300.
  • Sensor assembly 1314 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact.
  • Sensor assembly 1314 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • the sensor assembly 1314 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • Communication component 1316 is configured to facilitate wired or wireless communications between device 1300 and other devices.
  • the device 1300 can access a wireless network based on a communication standard, such as WiFi, or a mobile communication network such as 2G, 3G, 4G/LTE, and 5G.
  • the communication component 1316 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 1316 also includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • device 1300 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
  • a non-transitory computer-readable storage medium including instructions is also provided, such as a memory 1304 including instructions, and the above-mentioned instructions can be executed by the processor 1320 of the device 1300 to complete the method provided by the technical solution of the present disclosure.
  • the non-transitory computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, and the like.

Abstract

Embodiments of the present application disclose a live broadcast method, apparatus, and electronic device. The method includes: a first server receives a request submitted by a first client to create a multilingual live broadcast; after the multilingual live broadcast is created successfully, a translated target live stream corresponding to at least one target language is obtained according to the source live stream collected by the first client; after receiving a request submitted by a second client to pull a live stream, the target language required by the user associated with the second client is determined, and the target live stream corresponding to that target language is provided to the second client for playback. Through the embodiments of the present application, live broadcast technology can be better applied in systems such as cross-border commodity object information services.

Description

Live broadcast method, apparatus, and electronic device
This application claims priority to Chinese patent application No. 202010733464.9, filed on July 27, 2020 and entitled "Live broadcast method, apparatus, and electronic device", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of live broadcast technology, and in particular to live broadcast methods, apparatuses, and electronic devices.
Background
With the development of live broadcast technology, live broadcasting has been introduced into more and more industries, including commodity object information service systems. Merchant or seller users introduce commodity object information through live broadcasts, and buyer or consumer users can obtain more intuitive information about commodity objects through the live video and the anchor's spoken description, enjoying an experience closer to real shopping. In addition, viewers can interact with the anchor during the live broadcast, for example by asking questions about a commodity object, which the anchor can answer online in real time. In short, introducing live broadcast technology can more effectively help buyer or consumer users make purchasing decisions.
Some commodity object information service systems also provide cross-border services, offering commodity object sales and other services to overseas buyer or consumer users. When commodity objects are described in the traditional way with images and text, the image-text details can be translated into multiple languages for overseas users to browse. However, introducing live broadcast technology into such a cross-border commodity object information service system presents certain difficulties, because an anchor user can usually cover only one language during a live broadcast, while the audience consists of buyer users from multiple countries, creating language barriers between them.
Therefore, how to better apply live broadcast technology in systems such as cross-border commodity object information services has become a technical problem to be solved by those skilled in the art.
Summary
The present application provides a live broadcast method, apparatus, and electronic device, which enable live broadcast technology to be better applied in systems such as cross-border commodity object information services.
The present application provides the following solutions:
A live broadcast method, comprising:
receiving, by a first server, a request submitted by a first client to create a multilingual live broadcast;
after the multilingual live broadcast is created successfully, obtaining a translated target live stream corresponding to at least one target language according to the source live stream collected by the first client;
after receiving a request submitted by a second client to pull a live stream, determining the target language required by the user associated with the second client, and providing the target live stream corresponding to that target language to the second client for playback.
A live stream processing method, comprising:
creating, by a second server, at least one director service according to a request submitted by a first server, the request being submitted after the first server receives a request to create a multilingual live broadcast, and the at least one director service corresponding to at least one target language;
obtaining a first address and at least one second address provided by the first server, wherein the first address is used to save the source live stream of the live broadcast, and the at least one second address corresponds to the at least one target language;
after the multilingual live broadcast is created successfully, starting the director service, the director service being used to read the source live stream from the first address and, by calling a streaming speech recognition service and a translation service, perform streaming speech recognition on the source live stream and obtain a translation result corresponding to one of the target languages, then merge the source live stream with the translation result to generate the translated target live stream corresponding to that target language and save it to the second address corresponding to that target language.
A live stream processing method, comprising:
creating, by a third server, a streaming speech recognition service and a translation service according to a call request from a second server, wherein the request carries target language information, a first address, and a third address, the first address being used to save the source live stream;
reading the source live stream from the first address, and performing speech recognition on the source live stream through the streaming speech recognition service;
translating the speech recognition result through the translation service to obtain a translation result corresponding to the target language, and saving the translation result to the third address, so that the second server obtains the translation result from the third address and synthesizes it with the source live stream into a target live stream corresponding to the target language.
A live broadcast method, comprising:
receiving, by a first client, a request to create a multilingual live broadcast;
submitting the request to a first server and receiving a first address returned by the first server;
after the live broadcast is created successfully, submitting the generated live stream to the first address, so that the source live stream can be obtained from the first address and a translated target live stream corresponding to at least one target language can be obtained, for provision to a second client associated with a user having a target language requirement.
A method for obtaining a live stream, comprising:
submitting, by a second client, a request to a first server to obtain a live stream;
receiving a second address provided by the first server, the second address being determined according to the target language required by the user associated with the second client, and the second address storing a translated target live stream corresponding to that target language;
pulling the target live stream through the second address and playing it.
A live broadcast apparatus, applied to a first server, comprising:
a request receiving unit configured to receive a request submitted by a first client to create a multilingual live broadcast;
a target live stream obtaining unit configured to, after the multilingual live broadcast is created successfully, obtain a translated target live stream corresponding to at least one target language according to the source live stream collected by the first client;
a target live stream providing unit configured to, after receiving a request submitted by a second client to pull a live stream, determine the target language required by the user associated with the second client and provide the target live stream corresponding to that target language to the second client for playback.
A live stream processing apparatus, applied to a second server, comprising:
a director service creation unit configured to create at least one director service according to a request submitted by a first server, the request being submitted after the first server receives a request to create a multilingual live broadcast, and the at least one director service corresponding to at least one target language;
an address obtaining unit configured to obtain a first address and at least one second address provided by the first server, wherein the first address is used to save the source live stream of the live broadcast, and the at least one second address corresponds to the at least one target language;
a director service starting unit configured to start the director service after the multilingual live broadcast is created successfully, the director service being used to read the source live stream from the first address and, by calling a streaming speech recognition service and a translation service, perform streaming speech recognition on the source live stream and obtain a translation result corresponding to one of the target languages, then merge the source live stream with the translation result to generate the translated target live stream corresponding to that target language and save it to the second address corresponding to that target language.
A live stream processing apparatus, applied to a third server, comprising:
a service creation unit configured to create a streaming speech recognition service and a translation service according to a call request from a second server, wherein the request carries target language information, a first address, and a third address, the first address being used to save the source live stream;
a speech recognition unit configured to read the source live stream from the first address and perform speech recognition on the source live stream through the streaming speech recognition service;
a translation unit configured to translate the speech recognition result through the translation service to obtain a translation result corresponding to the target language, and save the translation result to the third address, so that the second server obtains the translation result from the third address and synthesizes it with the source live stream into a target live stream corresponding to the target language.
A live broadcast apparatus, applied to a first client, comprising:
a request receiving unit configured to receive a request to create a multilingual live broadcast;
a request submitting unit configured to submit the request to a first server and receive a first address returned by the first server;
a streaming unit configured to, after the live broadcast is created successfully, submit the generated live stream to the first address, so that the source live stream can be obtained from the first address and a translated target live stream corresponding to at least one target language can be obtained, for provision to a second client associated with a user having a target language requirement.
An apparatus for obtaining a live stream, applied to a second client, comprising:
a request submitting unit configured to submit a request to a first server to obtain a live stream;
an address obtaining unit configured to receive a second address provided by the first server, the second address being determined according to the target language required by the user associated with the second client, and the second address storing a translated target live stream corresponding to that target language;
a stream pulling unit configured to pull the target live stream through the second address and play it.
According to the specific embodiments provided by the present application, the present application discloses the following technical effects:
Through the embodiments of the present application, the creation of a multilingual live broadcast can be supported, and a translated target live stream corresponding to at least one target language can be generated from the source live stream. After a second client initiates a request to obtain a live stream, the target language required by the user associated with the second client can be determined, and the corresponding target live stream can be provided to that second client, so that the user can watch live content that matches his or her language requirements.
In specific implementations, a concrete multilingual live broadcast service can be provided within a commodity object information service system. In that case, historical live broadcast records in the commodity object information service system can serve as training samples for training the translation model. In addition, translations of specialized vocabulary in the commodity object information service field can be recorded in advance, thereby improving the accuracy of translation results.
Furthermore, also in the case where the multilingual live broadcast service is provided within a commodity object information service system, the country/region of the user associated with the second client can be determined from user data generated by that user within the system, including commonly used shipping address information, so that the target language required by the user is determined automatically.
Of course, implementing any product of the present application does not necessarily require achieving all of the advantages described above at the same time.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of the system architecture provided by an embodiment of the present application;
FIG. 2 is a flowchart of the first method provided by an embodiment of the present application;
FIG. 3-1 is a schematic diagram of the interaction sequence of the live broadcast creation process provided by an embodiment of the present application;
FIG. 3-2 is a schematic diagram of the interaction sequence of the stream pushing process provided by an embodiment of the present application;
FIG. 3-3 is a schematic diagram of a viewer user interface provided by an embodiment of the present application;
FIG. 4 is a flowchart of the second method provided by an embodiment of the present application;
FIG. 5 is a flowchart of the third method provided by an embodiment of the present application;
FIG. 6 is a flowchart of the fourth method provided by an embodiment of the present application;
FIG. 7 is a flowchart of the fifth method provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of the first apparatus provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of the second apparatus provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of the third apparatus provided by an embodiment of the present application;
FIG. 11 is a schematic diagram of the fourth apparatus provided by an embodiment of the present application;
FIG. 12 is a schematic diagram of the fifth apparatus provided by an embodiment of the present application;
FIG. 13 is a schematic diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the present application fall within the scope of protection of the present application.
In the embodiments of the present application, in order to apply live broadcast technology in a cross-border commodity object information service system, a cross-language live broadcast function is provided. When creating a live broadcast, the anchor user (referred to as the first user in the embodiments of the present application; correspondingly, a viewer user may be referred to as a second user) can choose whether to use the cross-language live broadcast service. If so, the server can generate target live streams corresponding to multiple target languages and provide multiple pull addresses, each corresponding to one target language. In this way, when a second user wants to watch the live broadcast, the server can provide the pull address corresponding to the target language required by that second user to the second user's client, so that the client can obtain the live stream in the corresponding target language from that pull address and play it. In this way, a live broadcast created by the first user in one language can be translated into multiple different target languages for second users in multiple countries/regions to watch. Thus, in a cross-border commodity object information service system, second users can also obtain richer and more intuitive information about commodity objects by watching live broadcasts. Of course, this multilingual live broadcast method can also be used in other cross-border systems.
Generating the translated target live stream for a target language involves streaming speech recognition and translation of the source live stream, while the accuracy of recognition and translation must be kept as high as possible. Moreover, the source live stream usually needs to be translated into target live streams in multiple different target languages to meet the viewing needs of users in multiple countries/regions, which places relatively high demands on server capacity. For this reason, in the embodiments of the present application, multiple director services can be created through a dedicated server (specifically referred to as the second server; correspondingly, the server that interacts with the front-end live broadcast clients may be referred to as the first server), with each director service corresponding to one target language. Within each director service, a specific streaming speech recognition service and translation service can be called to obtain a translation result data stream; the source live stream is then merged with that translation result data stream to obtain the translated live stream for the specific target language. This live stream can be saved to a pull address designated by the first server, so that the first server obtains translated target live streams corresponding to multiple different target languages.
In a preferred implementation, the streaming speech recognition service and the translation service can also be provided by a third server. In this way, each server can focus on implementing one function, and the cooperation among multiple servers ultimately improves translation accuracy.
The specific translation service can also translate the speech recognition result through a pre-established translation model. In specific implementations, since the embodiments of the present application mainly provide a multilingual live broadcast service within systems such as commodity object information services, the live broadcast scenario is relatively uniform, which provides a basis for good translation accuracy. Specifically, historical live broadcast records in the commodity object information service system can be used as training data to train the translation model, making it a model specialized for the commodity object information service field. In addition, some proper terms in the commodity object information service scenario can be recorded in advance, for example by obtaining in advance the expressions of such proper terms in various target languages. In this way, through the specialized translation model and the pre-recorded information for proper terms, translation accuracy can be further improved.
Furthermore, also because the embodiments of the present application mainly provide a multilingual live broadcast service within systems such as commodity object information services, the target language required by a second user can be automatically identified based on data generated by the second user within such a system (for example, the user's commonly used shipping address), so that the pull address corresponding to that target language can be recommended or directly pushed to the user.
In specific implementations, as shown in FIG. 1, the embodiments of the present application may involve a client and a server provided by a system such as a commodity object information service, where the server may correspond to the aforementioned first server, and the clients may be divided into a first client facing anchor users and a second client facing viewer users. In addition, as described above, a second server and even a third server may also be involved. In one specific implementation, after the first client initiates a request to the first server to create a multilingual live broadcast, the first server can call an interface of the second server to create multiple director services, corresponding respectively to multiple different target languages, and can generate a first address and multiple second addresses. After the live broadcast is created successfully, a specific director service can call the streaming speech recognition and translation services of the third server, and the resulting translation result can be saved to a third address designated by the director service. The director service can read the source live stream from the first address, read the translation result data stream from the third address, merge them into a target live stream corresponding to the target language, and save it to the second address designated by the first server. Afterwards, when a second client submits a request to obtain a live stream, the corresponding second address can be provided to the second client according to the target language required by the specific user, so that the second client can pull the target live stream in the required target language from that second address and play it. Since the target languages required by the users associated with different second clients differ, the second addresses provided to different second clients may also differ. For example, suppose the anchor user is a user in China and the source language in the source live stream is Chinese; after translation, target live streams in multiple target languages such as English, French, German, and Japanese are obtained and saved to different second addresses. Then, when user A in an English-speaking country requests to watch the live broadcast, second address A storing the English target live stream can be provided to user A; when user B in a French-speaking country requests to watch the live broadcast, second address B storing the French target live stream can be provided to user B; and so on.
The specific implementation solutions provided by the embodiments of the present application are described in detail below.
Embodiment 1
First, Embodiment 1 provides a live broadcast method from the perspective of a first server. Referring to FIG. 2, the method may specifically include:
S201: a first server receives a request submitted by a first client to create a multilingual live broadcast.
In specific implementations, an operation option for creating a live broadcast can be provided in the first client associated with a first user such as an anchor. When the first user clicks to create a live broadcast, the user can be asked whether a multilingual live broadcast is needed; if so, a request to create a multilingual live broadcast can be sent to the first server. Alternatively, the first client can provide separate operation options for creating an ordinary live broadcast and a multilingual live broadcast, so that a user who needs to create a multilingual live broadcast can directly initiate the specific request through the corresponding option.
In specific implementations, when the user requests to create a multilingual live broadcast, the first client can also provide an operation option for submitting the source language used in the live broadcast. For example, a user in China who broadcasts in Chinese can select "Chinese" as the source language, and so on. Optionally, the first client can also provide an operation option for selecting target languages; that is, the first user can decide which target languages to translate into. If the user makes no selection, the target languages can be determined according to a default configuration. The default configuration can be configuration information common to multiple users, or it can be configured according to personalized information of the user associated with the first client, for example according to historical selection records. There may be one or more target languages; that is, the source live stream can be translated into target live streams corresponding to multiple different target languages, so that users in different countries/regions can understand the live content.
S202: after the multilingual live broadcast is created successfully, a translated target live stream corresponding to at least one target language is obtained according to the source live stream collected by the first client.
After receiving the request to create a multilingual live broadcast, the first server can enter the live broadcast creation process. After creation is complete, a translated target live stream corresponding to at least one target language can be obtained from the source live stream, to be provided to second clients with different language requirements.
In specific implementations, as described above, speech recognition, translation, and other processing of the source live stream can be implemented by calling director services of a second server. The second server can be the server of a cloud service platform associated with the first server, or it can exist in other forms. In this case, after receiving the request to create a multilingual live broadcast, the first server can first generate a first address and at least one second address, where the at least one second address corresponds to at least one target language. Specifically, the first address and the second addresses can be addresses requested from an associated content delivery network (CDN). After generating the first address and the second addresses, the first address can be provided to the first client, so that after the multilingual live broadcast is created successfully, the first client can save the generated source live stream to the first address (that is, the first client can push the stream to the first address). In addition, the first address and the at least one second address can be provided to the second server, so that the second server can obtain the source live stream from the first address and, after obtaining the translated target live streams corresponding to the at least one target language, save them to the respective second addresses. In this way, the first server can obtain translated target live streams corresponding to multiple different target languages, each saved to a different second address. Later, when a second client requests to watch the live broadcast, the second address corresponding to the target language required by the user associated with the second client can be returned to the second client, so that the second client can obtain the translated target live stream in that target language from that second address and play it.
When the target live stream is obtained in the above manner, the first server can also, before generating the specific second addresses, first call an interface of the second server to create the director services. For example, a specific sequence may be as shown in FIG. 3-1: the first server can first send a "CreateCaster" request to the second server, specifically requesting the creation of multiple director services; after the second server completes creating the director services, it can return a CasterId to the first server; after that, the first server can configure the caster (SetCasterConfig), request the second server to add a director source (AddCasterVideoResource), and then set the director channel (SetCasterChannel), add a director layout (AddCasterLayout), add director components (AddCasterComponent), and so on. After completing the above interactions, the first server can generate multiple second addresses, that is, pull addresses, each corresponding to one of the target languages.
After the above second addresses are generated, the creation of the multilingual live broadcast can be completed, the first address can be provided to the first client, and the anchor's stream pushing process can begin. For example, as shown in FIG. 3-2, the first client can push the collected source live stream to the first address for saving. Meanwhile, the first server can also, by calling an interface provided by the second server, start the previously created director services on the second server (StartCaster), the at least one director service corresponding respectively to the at least one target language. In addition, parameters such as the first address and the second addresses can be provided to the specific director services by updating the director configuration information (UpdateCasterSceneConfig). Afterwards, a director service can call the streaming speech recognition service and the translation service to perform streaming speech recognition on the source live stream and obtain a translation result, and then merge the source live stream with the translation result to generate the translated target live stream.
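The creation and start-up sequences just described (CreateCaster through AddCasterComponent, then StartCaster and UpdateCasterSceneConfig at push time) can be sketched as a loop over target languages. The operation names are those given in the interaction sequence; the request/response shapes and the `call_api` client are assumptions for illustration:

```python
# Setup operations performed on each caster after creation, in the order of
# the interaction sequence.
CASTER_SETUP_OPS = ["SetCasterConfig", "AddCasterVideoResource",
                    "SetCasterChannel", "AddCasterLayout", "AddCasterComponent"]

def create_director_services(call_api, target_langs):
    """Create one director (caster) service per target language and run the
    setup operations on it; returns a language -> CasterId mapping."""
    caster_ids = {}
    for lang in target_langs:
        caster_id = call_api("CreateCaster", {"TargetLang": lang})["CasterId"]
        for op in CASTER_SETUP_OPS:
            call_api(op, {"CasterId": caster_id})
        caster_ids[lang] = caster_id
    return caster_ids

def start_director_services(call_api, caster_ids, first_address, second_addresses):
    """At push time: start each caster and hand it the push/pull addresses via
    the scene-config update, as in the stream-pushing sequence."""
    for lang, caster_id in caster_ids.items():
        call_api("StartCaster", {"CasterId": caster_id})
        call_api("UpdateCasterSceneConfig",
                 {"CasterId": caster_id,
                  "FirstAddress": first_address,
                  "SecondAddress": second_addresses[lang]})
```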
In specific implementations, a director service can also perform streaming speech recognition on the source live stream and obtain translation results by calling a streaming speech recognition service and a translation service provided by a third server. In this case, the director service can first request a third address and carry that third address in the call request to the streaming speech recognition and translation services, so that the translation result (translated text or speech) is saved to the third address. The director service can then read the translation result from the third address, merge the source live stream at the first address with the translation result at the third address to generate the translated target live stream, and save it to the corresponding second address.
The third server can be one that focuses on providing basic services such as big data processing. The specific translation service can translate the speech recognition result according to a pre-established translation model. In specific implementations, the multilingual live broadcast in the embodiments of the present application can refer to a live broadcast created in a commodity object information service system; in that case, the specific translation model is obtained by training with historical live broadcast records in the commodity object information service system as training data. That is, historical live broadcast records in the commodity object information service system can be provided to the third server as training samples to train the translation model, so that the specific model becomes specialized for the commodity object information service field, thereby improving the accuracy of translation results in that field.
In addition, in a preferred implementation, translation information for specialized vocabulary related to commodity object introductions can be saved in advance, and the speech recognition result can be translated according to this information to further improve translation accuracy. That is, during live broadcasts in the commodity object information service field, anchor users may often use certain specialized terms which, if the domain is not considered, could be translated in multiple ways, possibly leading to inaccurate translations. In the embodiments of the present application, since it can be determined that the live broadcast takes place in the commodity object information service field, specialized terms can be translated in advance in combination with the domain information to obtain translation results in multiple target languages. When such specialized vocabulary is encountered while translating the live stream, these pre-recorded results can be used, thereby improving translation accuracy.
In other words, since the embodiments of the present application can provide a multilingual live broadcast service in a particular field, the uniformity of that field makes accurate multilingual translation results possible; that is, the translation results are highly readable rather than merely mechanical translations, thereby providing an effective multilingual live broadcast service.
Furthermore, since anchor users usually introduce commodity objects in colloquial language during live broadcasts, their expressions may often be grammatically inaccurate or casual. For example, an anchor might say "我试穿一下先", whereas the grammatically standard order would be "我先试穿一下" ("let me try it on first"), and so on. Such grammatical inaccuracy may affect the accuracy of the translation result. Therefore, to further improve the quality of translation results, the translation service can also adjust the sentence structure of the speech recognition result before translating it, for example by adjusting sentence components such as subject, predicate, object, attributive, adverbial, and complement, so that the sentence structure becomes more standard. It should be noted that adjusting sentence structure in this way may slightly affect the real-time performance of the translation; however, in practical applications, viewer users in this commodity object information service scenario usually do not have high real-time requirements, and the adjustment usually does not affect interaction between viewer users and the anchor user, so this impact on real-time performance can be ignored.
In addition, since the source language used by the anchor user may differ among source live streams, the embodiments of the present application may involve translation from multiple languages into multiple languages. To facilitate translation, the first server can also determine the source language information associated with the live broadcast according to information carried in the request to create the multilingual live broadcast, and provide the source language information to the second server. Of course, in specific implementations, the second server or the specific translation service can also determine the source language by itself from the speech recognition result of the source live stream.
In specific implementations, the generated translated target live stream can include a live stream associated with subtitles in the target language, or a live stream associated with speech in the target language. That is, the speech in the source live stream can be converted directly into text and translated into text in the target language, which can then be added to the images of the source live stream in the form of subtitles, so that viewer users can learn what the anchor user is saying by reading the subtitles. Alternatively, after the text translation is completed, speech synthesis can also be performed, and the speech stream in the source live stream can be replaced with the translated speech stream to generate the target live stream. In this way, viewer users can directly listen to speech in the target language while watching the live broadcast.
When translation information is provided in the form of subtitles, the first server can also provide the second server with parameter information related to subtitle display, including subtitle layout parameters, the position, height, and size of the subtitle box, the background color, the maximum number of characters, the subtitle font and font size, the display duration, and so on. In this way, after obtaining the translated subtitle stream corresponding to the target language, the second server adds the subtitles to the source live stream according to the subtitle display parameters, to generate the corresponding target live stream.
Specifically, since one concrete implementation scenario of the embodiments of the present application can be multilingual live broadcasts in a commodity object information service system, the anchor users are usually merchants or sellers' salespeople, who typically have professional knowledge only in introducing commodities but may not be professionals in live broadcast technology; in addition, the live broadcast devices they use are usually mobile terminal devices such as mobile phones, which are themselves not professional equipment, and so on. Therefore, the quality of live broadcast images can vary. For example, because the devices used by different anchors have different resolutions, the clarity of the live image may differ; moreover, anchors may choose their broadcasting space casually, so the backgrounds of some live images may be cluttered, and so on. Since the subtitle information provided in the embodiments of the present application needs to be added to the live image, all of these factors may affect the effect of adding subtitles. For example, on a low-resolution device, a small subtitle font may be unclear and hard to read; when the live image background is cluttered, a transparent subtitle background may make some subtitles unclear; but if a non-transparent subtitle background color is set uniformly, then for live images with simple backgrounds the occlusion caused by non-transparent subtitle backgrounds would be unnecessary, and so on.
For this reason, in the embodiments of the present application, the parameters related to subtitle display can be determined in combination with the actual situation of the specific first client. For example, the resolution of the terminal device associated with the first client and/or the screen orientation required for the live broadcast (portrait or landscape) can be obtained, and the specific subtitle display parameters can be determined from this information. The resolution information can be obtained by the first client from the local screen parameters of its terminal device, or an operation option for entering screen parameters can be provided in the first client for the first user to fill in. The screen orientation information can be entered by the first user or, in specific implementations, the live broadcast scene information associated with the multilingual live broadcast can be obtained before the broadcast starts, and suggestions about screen orientation can be provided to the first client according to the scene information. For example, if the specific live broadcast scene is introducing clothing commodity objects, including showing how garments look when worn, the user can be advised to broadcast in portrait orientation, and so on. After this information is determined, the height and size of the subtitle box and the subtitle font size can be determined according to the specific resolution parameters. In addition, the position of the subtitle box can be determined according to the screen orientation: for example, in portrait orientation the subtitle box can be placed above the comment area, and in landscape orientation it can be placed to the right of the comment area, to prevent subtitle text and comment text from occluding each other, and so on.
In addition to screen resolution and orientation, live image background information can also be obtained; for example, images of the broadcast scene can be collected before the broadcast starts, so as to obtain the live image background, and so on. From this background image information, the dominant color of the background can be determined, or the degree of clutter of the background can be determined, so as to decide whether the subtitle background should be transparent. If non-transparent, the subtitle background color can be determined from the dominant color of the live image background, for example a color with relatively large contrast against that dominant color, to improve subtitle legibility.
After the above subtitle display parameters are determined, they can be provided to the second server. In this way, after obtaining the translations in each target language, the director services of the second server can add the translations as subtitles of the source live stream according to the above parameter information, thereby generating target live streams corresponding to multiple different target languages, which can be saved to the second addresses pre-designated by the first server.
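The subtitle-display parameters discussed above can be derived mechanically from the first client's reported device information. A sketch follows, in which every threshold and position name is an illustrative assumption rather than a value from the patent:

```python
def subtitle_parameters(resolution_height: int, orientation: str,
                        background_busy: bool) -> dict:
    """Derive subtitle display parameters: the font size scales with the
    screen resolution, the box position depends on the screen orientation
    (to avoid the comment area), and a cluttered background forces an
    opaque subtitle backdrop."""
    font_size = max(14, resolution_height // 40)  # larger fonts on sharper screens
    position = "above_comments" if orientation == "portrait" else "right_of_comments"
    background = "opaque" if background_busy else "transparent"
    return {"font_size": font_size, "position": position, "background": background}
```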
S203: after receiving a request submitted by a second client to pull a live stream, the target language required by the user associated with the second client is determined, and the target live stream corresponding to that target language is provided to the second client for playback.
After translated target live streams in multiple target languages are obtained, specific target live streams can be provided to second clients. Specifically, after receiving a request submitted by a second client to pull a live stream, the target language required by the user associated with the second client can be determined, and the target live stream corresponding to that target language can be provided to the second client for playback. It should be noted that, in specific implementations, an operation option for enabling or disabling the multilingual live translation function can be provided in the second client. In this way, when the user issues a specific request to watch the live broadcast, the switch state can be checked first: if the live translation function is enabled, a request to obtain the translated target live stream can be submitted to the first server; otherwise, if the live translation function is disabled, a request to obtain the source live stream can be submitted to the first server so that the source live stream is played.
There can be multiple ways to determine the target language required by the user associated with the second client. For example, in one way, the user can submit the required target language information when initiating the request to obtain the live stream through the second client. Alternatively, since the multilingual live broadcast in the embodiments of the present application can include a live broadcast created in a commodity object information service system, the target language required by the user associated with the second client can also be determined from data generated by that user in the commodity object information service system.
For example, specifically, the country/region where the user associated with the second client is located can be determined according to the shipping address information corresponding to that user, and the target language required by the user can then be determined according to that country/region. Alternatively, the user's country/region can be determined according to positioning information associated with the user, and so on.
Of course, in specific implementations, when the target language required by a user is determined automatically in the above manner, misjudgments may occur. Therefore, an operation option for switching to another target language can also be provided in the second client, so that the user can switch to the target live stream corresponding to another target language for playback.
After the target language required by the user is determined, the target live stream corresponding to that target language can be provided to the user, and the second client can then play the target live stream. In this way, users in different countries/regions watching the same live broadcast can each obtain live content in their required target language. For example, when the target live stream is provided with subtitles, as shown in FIG. 3-3, the interface seen by users in English-speaking countries/regions can be as shown in (A), where English subtitles express what the anchor user is currently saying, for example "Sensible beauty tips for enhancing your appearance". The interface seen by users in French-speaking countries/regions can be as shown in (B), with French subtitles such as "Un bon smoking pour améliorer votre apparence". The interface seen by users in Japanese-speaking countries/regions can be as shown in (C), with Japanese subtitles such as "あなたの外観を強化するための賢明な美しさのヒント", and so on.
In addition, the first server can also collect statistics, according to clients' access to each second address, on how users in the countries/regions associated with the at least one target language watch the multilingual live broadcast, and provide the statistical results to the first client. For example, the number of viewers of a live broadcast in English-speaking countries/regions, in French-speaking countries/regions, in Japanese-speaking countries/regions, and so on can be counted. These data can be provided to the first client in forms such as a data dashboard, so that the anchor user can intuitively see information such as how popular the specific live broadcast is in countries/regions of different languages, which can further help the user adjust marketing strategies. For example, if a live broadcast has a significantly higher number of viewers in English-speaking countries/regions than elsewhere, marketing strategies can focus on English-speaking countries/regions, and so on. Such dashboard information can also help the user adjust subsequent live broadcast strategies, and so on.
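The per-country/region viewing statistics just described amount to grouping second-address access records by the language each address serves. A sketch with an assumed access-record shape (the record fields are illustrative):

```python
from collections import Counter

def viewing_stats(access_records, address_to_language):
    """Aggregate accesses to each second address into per-target-language
    viewer counts for the anchor's data dashboard; records whose address is
    not a known second address are ignored."""
    counts = Counter()
    for record in access_records:
        lang = address_to_language.get(record["address"])
        if lang is not None:
            counts[lang] += 1
    return dict(counts)
```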
Furthermore, in practical applications, a viewer user can also share the live broadcast address with other users so that they can watch the specific live broadcast. In the embodiments of the present application, sharing with users in other countries/regions can also be supported. When sharing occurs between users of different languages, after receiving a request from a second client to share the live broadcast, the first server can also determine the target language required by the shared target user and return the address of the target live stream corresponding to that target language to the sharing user's client, which can copy the address and share it with the target user, for playback in the target user's client. For example, user A shares a live broadcast with user B. In the traditional way, user A could directly copy the address at which he or she watches the broadcast to user B; however, in the embodiments of the present application, if user B requires a different target language from user A, user B cannot obtain useful live content directly from the address copied by user A. Therefore, in the embodiments of the present application, a sharing operation option can be provided in the client; when a user needs to share with another user, a sharing request can be initiated through that option, carrying the required target language information. After receiving the request, the first server can perform address conversion into the address of the target live stream corresponding to the target language required by user B, and return it to user A; user A then provides the converted address to user B, so that user B can watch live content in his or her own target language.
为便于更好的理解本申请实施例提供的具体技术方案,下面结合具体在商品对象信息服务系统中实现时的一个例子,对本申请实施例提供的一种可选的实现方案进行介绍。
假设某商家用户需要通过直播的方式向多个国家的消费者用户介绍其商品,则可以通过其关联的第一客户端发出多语言直播请求;发请求的同时还可以对其使用的源语言, 所需的目标语言等进行选择,使得请求中可以携带这些信息,当然,也可以不进行选择,由语音识别服务自动识别源语言,按照默认配置的目标语言进行翻译,等等。另外,还可以将第一客户端关联的终端设备的屏幕参数,直播所需的屏幕方向,直播画面背景图像信息等通过所述请求携带至第一服务端。
第一服务端收到创建多语言直播的请求后,可以向第二服务端请求创建多个导播台服务,分别与多个目标语言相对应。在创建导播台服务的过程中,还可以配置一些参数,具体就可以包括字幕展示方面的参数等。这种参数具体可以是根据第一客户端关联的设备的屏幕参数,屏幕方向,直播画面背景图像的主色调、色彩混乱程度等进行确定,以满足商品对象信息服务场景下由于直播过程中设备、主播的不专业性所产生的字幕展示需求。
完成与第二服务端之间的交互后,可以向关联的内容分发系统等申请一第一地址作为推流地址,以及多个第二地址作为拉流地址,以此完成多语言直播的创建,并将第一地址提供给第一客户端。
第一客户端在接收到直播创建完成的信息后,可以将采集到的源直播流保存到第一地址,同时,第一服务端可以向第二服务端发起请求,以启动之前创建的导播台服务,并向其提供第一地址以及第二地址的信息。相应的,导播台服务则可以从第一地址读取源直播流,并通过调用第三服务端的语音识别服务以及翻译服务,获得语音识别结果以及对应目标语言的译文。其中,在进行语言识别以及翻译时,可以预先基于商品对象信息服务系统中的历史直播记录进行模型训练,以提升译文的准确度。另外,还可以对该领域的一些专有名词进行提前录定,以此进一步提升译文的准确度。
第二服务端获得直播流中语言识别结果的译文后,可以按照第一服务端之前配置的字幕展示相关参数,将译文添加到源直播流的图像中,以生成对应目标语言下的目标直播流,并保存到对应的第二地址。
多个导播台服务都可以分别完成上述过程,从而使得在直播过程中,可以在多个第二地址处分别生成各种不同目标语言对应的目标直播流,分别带有各自目标语言的字幕。
当境外某国家/地区的消费者用户需要观看该直播时,可以通过第二客户端向第一服务端发起拉取直播流的请求。此时,第一服务端可以根据该第二客户端关联的用户在商品对象信息服务系统中的常用收货地址等信息,确定出该用户可能所需的目标语言,然后将该目标语言对应的第二地址提供给该第二客户端。第二客户端从该第二地址对目标直播流进行拉流并进行播放即可,从而使得用户能够通过字幕来理解直播中主播所说的内容。同时还可以在第二客户端提供用于选择更多目标语言的操作选项,使得用户可以切换到其他目标语言观看直播内容。
In summary, the embodiments of the present application support the creation of a multilingual live broadcast and can generate, from the source live stream, translated target live streams in at least one target language. After a second client initiates a request to obtain the live stream, the target language required by the user associated with the second client can be determined, and the corresponding target live stream is provided to that second client, so that the user can watch live content matching his or her language needs.
In a specific implementation, the multilingual live-broadcast service may be provided in a commodity-object information service system. In that case, training samples can be drawn from the system's historical live-broadcast records to train the translation model. Translations of terms specific to the commodity-object information service field can also be registered in advance to improve translation accuracy.
Moreover, likewise when the multilingual live-broadcast service is provided in a commodity-object information service system, the country/region of the user associated with the second client can be judged from user data generated in the system, including the user's usual shipping-address information, so that the target language the user needs is determined automatically.
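Inferring a viewer's target language from the country/region of their usual shipping address, as described above, amounts to a country-to-language mapping with a fallback. The mapping, field name, and default below are illustrative assumptions.

```python
# Hypothetical country-code -> language mapping; a real system would be
# far more complete and could consider other user data as well.
COUNTRY_TO_LANGUAGE = {
    "FR": "fr",
    "ES": "es",
    "BR": "pt",
    "US": "en",
}

def target_language_for(user_profile, default="en"):
    """Pick the viewer's likely target language from their shipping country."""
    country = user_profile.get("shipping_country")
    return COUNTRY_TO_LANGUAGE.get(country, default)

assert target_language_for({"shipping_country": "BR"}) == "pt"
assert target_language_for({}) == "en"  # no address on file: fall back to default
```

The automatic choice is only an initial guess; as the embodiment notes, the client still offers an option to switch to another target language.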
Embodiment 2
Embodiment 2 corresponds to Embodiment 1 and provides, from the perspective of the second server, a live-stream processing method. Referring to Fig. 4, the method may include:
S401: the second server creates at least one director-console service according to a request submitted by the first server; the request is submitted after the first server receives a request to create a multilingual live broadcast; the at least one director-console service corresponds to at least one target language;
S402: obtaining a first address and at least one second address provided by the first server, where the first address is used to save the source live stream of the broadcast and the at least one second address corresponds to the at least one target language;
S403: after the multilingual live broadcast is created successfully, starting the director-console service, which reads the source live stream from the first address and, by invoking a streaming speech-recognition service and a translation service, performs streaming speech recognition on the source live stream; after obtaining the translation result for one of the target languages, it merges the source live stream with the translation result to generate the translated target live stream for that target language, and saves it to the second address corresponding to that language.
In a specific implementation, the director-console service may invoke the streaming speech-recognition and translation services provided by the third server, generate a third address, and provide the first and third addresses to the third server so that the third server, after obtaining the translation result, saves it to the third address; the director-console service reads the translation result from the third address and composites it with the source live stream to generate the target live stream.
The translation result may include a translated text stream; in that case, the director-console service is specifically configured to add the text stream as subtitle information of the source live stream to generate the corresponding target live stream.
Alternatively, the translation result may include a translated voice stream; in that case, the director-console service is specifically configured to remove the voice stream from the source live stream and composite the translated voice stream with it to generate the target live stream.
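The per-language director-console pipeline above (read the source stream from the first address, run streaming recognition and translation, merge the result back, write to the second address) can be sketched schematically. The service objects below are pure-Python stand-ins for illustration, not a real streaming API.

```python
def run_director(source_frames, asr, translate, target_lang):
    """Yield (frame, subtitle) pairs: the merged target stream for one language.

    source_frames: iterable of (frame, audio_chunk) read from the first address.
    asr / translate: stand-ins for the streaming speech-recognition and
    translation services that the third server would provide.
    """
    for frame, audio in source_frames:
        text = asr(audio)                        # streaming speech recognition
        subtitle = translate(text, target_lang)  # translate to the target language
        yield (frame, subtitle)                  # merge: attach subtitle to frame

# Toy stand-in services:
def fake_asr(audio):
    return audio.upper()

def fake_translate(text, lang):
    return f"[{lang}] {text}"

out = list(run_director([("f1", "hello"), ("f2", "world")],
                        fake_asr, fake_translate, "es"))
assert out[0] == ("f1", "[es] HELLO")
```

One such loop would run per target language, each instance writing to its own second address, which is why the embodiment creates one director-console service per language.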
Embodiment 3
Embodiment 3 also corresponds to Embodiment 1 and provides, from the perspective of the third server, a live-stream processing method. Referring to Fig. 5, the method may include:
S501: the third server creates a streaming speech-recognition service and a translation service according to an invocation request from the second server, where the request carries target-language information, a first address, and a third address, the first address being used to save the source live stream;
S502: reading the source live stream from the first address and performing speech recognition on it through the streaming speech-recognition service;
S503: translating the speech-recognition result through the translation service to obtain the translation result for the target language, and saving the translation result to the third address, so that the second server can obtain the translation result from the third address and composite it with the source live stream into the target live stream for the target language.
The live broadcast may include one created in a commodity-object information service system; in that case, the translation service may translate the speech-recognition result using a pre-established translation model trained with historical live-broadcast records in the commodity-object information service system as training data.
In addition, the translation service may also translate the speech-recognition result using pre-stored translation information for special vocabulary related to commodity-object introduction.
Embodiment 4
Embodiment 4 provides, from the perspective of the first client associated with the anchor user, a live-broadcast method. Referring to Fig. 6, the method may include:
S601: the first client receives a request to create a multilingual live broadcast;
S602: submitting the request to the first server and receiving a first address returned by the first server;
S603: after the broadcast is created successfully, submitting the produced live stream to the first address, so that the source live stream can be obtained from the first address and translated target live streams in at least one target language can be obtained, for provision to second clients associated with users who need those target languages.
In a specific implementation, an operation option for selecting the source language associated with the source broadcast may also be provided, and the source-language information received through the option is submitted to the first server.
In addition, statistical information provided by the first server may be received and displayed, the statistical information including how users in the countries/regions associated with the at least one target language have respectively watched the multilingual live broadcast.
Embodiment 5
Embodiment 5 provides, from the perspective of the second client associated with a viewer user, a method for obtaining a live stream. Referring to Fig. 7, the method may include:
S701: the second client submits a request to the first server to obtain a live stream;
S702: receiving a second address provided by the first server, the second address being determined according to the target language required by the user associated with the second client, and the second address saving the translated target live stream in that target language;
S703: pulling the target live stream through the second address and playing it.
In a specific implementation, an operation option for reselecting the target language may also be provided; the target language reselected through the option is submitted to the first server so that the first server provides the second address corresponding to the reselected language.
In addition, an operation option for turning the multilingual live-translation function on or off may be provided. When the request to obtain the live stream is submitted to the first server, if the live-translation function is on, a request to obtain the translated target live stream is submitted; otherwise, if the function is off, a request to obtain the source live stream is submitted to the first server so that the source live stream is played.
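The client-side toggle just described selects which of two requests to send: with translation on, request the translated target stream in the user's language; with it off, request the source stream. The request field names below are illustrative assumptions.

```python
def build_pull_request(room_id, translation_enabled, target_lang=None):
    """Build the pull request the second client submits to the first server."""
    if translation_enabled:
        # Translation on: ask for the translated target stream in target_lang.
        return {"room": room_id, "stream": "target", "lang": target_lang}
    # Translation off: ask for the untranslated source stream.
    return {"room": room_id, "stream": "source"}

req = build_pull_request("r1", True, "fr")
assert req == {"room": "r1", "stream": "target", "lang": "fr"}
assert build_pull_request("r1", False)["stream"] == "source"
```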
In addition, an operation option for sharing the live broadcast may be provided. After a sharing request is received through the option, the target language required by the share recipient is determined, and the sharing request together with that target language is submitted to the first server; after the second address corresponding to the recipient's target language is returned by the first server, that second address is provided to the client associated with the share recipient.
For the parts of Embodiments 2 to 5 not described in detail, reference may be made to the description of Embodiment 1, which is not repeated here.
It should be noted that the embodiments of the present application may involve the use of user data. In practical applications, user-specific personal data may be used in the solutions described herein, within the scope permitted by the applicable laws and regulations of the relevant country and where the applicable legal requirements are met (for example, with the user's express consent and with effective notice to the user).
Corresponding to Embodiment 1, an embodiment of the present application further provides a live-broadcast apparatus applied to the first server. Referring to Fig. 8, the apparatus may include:
a request receiving unit 801, configured to receive a request submitted by a first client to create a multilingual live broadcast;
a target live stream obtaining unit 802, configured to obtain, after the multilingual live broadcast is created successfully, translated target live streams in at least one target language from the source live stream captured by the first client;
a target live stream providing unit 803, configured to determine, after receiving a request from a second client to pull the live stream, the target language required by the user associated with the second client, and provide the target live stream in that language to the second client for playback.
Specifically, the target live stream obtaining unit may include:
an address generating unit, configured to generate a first address and at least one second address, the at least one second address corresponding to at least one target language;
a first address providing unit, configured to provide the first address to the first client so that, after the multilingual live broadcast is created successfully, the first client saves the produced source live stream to the first address;
a second address providing unit, configured to provide the first address and the at least one second address to the second server so that the second server obtains the source live stream from the first address and, after obtaining the translated target live streams in the at least one target language, saves them to the respective second addresses.
The target live stream providing unit may specifically be configured to:
return the second address corresponding to the target language to the second client so that the second client obtains, from that second address, the translated target live stream in that language for playback.
The second address providing unit may specifically be configured to:
start, by invoking a service-creation interface provided by the second server, at least one director-console service in the second server, the at least one director-console service respectively corresponding to the at least one target language; the director-console service, by invoking a streaming speech-recognition service and a translation service, performs streaming speech recognition on the source live stream and, after obtaining the translation result, merges the source live stream with the translation result to generate the translated target live stream; the request invoking the director-console service carries the information of the first address and the second address.
Specifically, the director-console service may perform streaming speech recognition on the source live stream and obtain the translation result by invoking the streaming speech-recognition and translation services provided by the third server; the director-console service carries a third address in its invocation request so that the translation result is saved to the third address, and the director-console service merges the source live stream at the first address with the translation result at the third address to generate the translated target live stream and saves it to the second address.
The multilingual live broadcast may include a live broadcast created in a commodity-object information service system; in that case, the translation service translates the speech-recognition result using a pre-established translation model trained with the system's historical live-broadcast records as training data.
In addition, the translation service may also translate the speech-recognition result using pre-stored translation information for special vocabulary related to commodity-object introduction.
Furthermore, before translating the speech-recognition result, the translation service may also adjust the sentence structure of the speech-recognition result.
In a specific implementation, the apparatus may further include:
a source-language information determining unit, configured to determine the source-language information associated with the broadcast from the information carried in the request to create the multilingual live broadcast;
a source-language information providing unit, configured to provide the source-language information to the second server.
The translated target live stream may include a live stream associated with subtitles in the target language; in that case, the apparatus may further include:
a layout-parameter information providing unit, configured to provide page-layout parameter information to the second server so that the director-console service, after obtaining the translated text stream in the target language, adds the text stream as subtitle information of the source live stream according to the page-layout parameters, to generate the corresponding target live stream.
The apparatus may further include:
a statistics unit, configured to collect, according to the access records of the second addresses, statistics on how users in the countries/regions associated with the at least one target language have respectively watched the multilingual live broadcast, and provide the statistical results to the first client.
The translated target live stream includes: a live stream associated with subtitles in the target language, or a live stream associated with voice in the target language.
Specifically, the multilingual live broadcast includes a live broadcast created in a commodity-object information service system; in that case, the target live stream providing unit may be configured to:
determine the target language required by the user associated with the second client from data generated by that user in the commodity-object information service system.
Specifically, the target live stream providing unit may be configured to:
determine, from the shipping-address information corresponding to the user associated with the second client, the country/region where that user is located; and determine, from the country/region, the target language required by the user associated with the second client.
In addition, the apparatus may further include:
a sharing unit, configured to determine, upon receiving a request from the second client to share the broadcast, the target language required by the target user of the share, and provide the address information of the target live stream in that language to the second client so that the second client shares the address with the target user.
Corresponding to Embodiment 2, an embodiment of the present application further provides a live-stream processing apparatus applied to the second server. Referring to Fig. 9, the apparatus may include:
a director-console service creating unit 901, configured to create at least one director-console service according to a request submitted by the first server; the request is submitted after the first server receives a request to create a multilingual live broadcast; the at least one director-console service corresponds to at least one target language;
an address obtaining unit 902, configured to obtain a first address and at least one second address provided by the first server, where the first address is used to save the source live stream of the broadcast and the at least one second address corresponds to the at least one target language;
a director-console service starting unit 903, configured to start the director-console service after the multilingual live broadcast is created successfully; the director-console service reads the source live stream from the first address and, by invoking a streaming speech-recognition service and a translation service, performs streaming speech recognition on the source live stream; after obtaining the translation result for one of the target languages, it merges the source live stream with the translation result to generate the translated target live stream for that language and saves it to the second address corresponding to that language.
The director-console service is specifically configured to invoke the streaming speech-recognition and translation services provided by the third server, generate a third address, and provide the first and third addresses to the third server so that the third server, after obtaining the translation result, saves it to the third address; the director-console service reads the translation result from the third address and composites it with the source live stream to generate the target live stream.
The translation result may include a translated text stream;
in that case, the director-console service is specifically configured to add the text stream as subtitle information of the source live stream to generate the corresponding target live stream.
Alternatively, the translation result may include a translated voice stream;
in that case, the director-console service is specifically configured to remove the voice stream from the source live stream and composite the translated voice stream with it to generate the target live stream.
Corresponding to Embodiment 3, an embodiment of the present application further provides a live-stream processing apparatus applied to the third server. Referring to Fig. 10, the apparatus includes:
a service creating unit 1001, configured to create a streaming speech-recognition service and a translation service according to an invocation request from the second server, where the request carries target-language information, a first address, and a third address, the first address being used to save the source live stream;
a speech recognition unit 1002, configured to read the source live stream from the first address and perform speech recognition on it through the streaming speech-recognition service;
a translation unit 1003, configured to translate the speech-recognition result through the translation service to obtain the translation result for the target language, and save the translation result to the third address so that the second server can obtain it from the third address and composite it with the source live stream into the target live stream for the target language.
The live broadcast includes a live broadcast created in a commodity-object information service system;
the translation service translates the speech-recognition result using a pre-established translation model trained with the system's historical live-broadcast records as training data.
In addition, the translation service also translates the speech-recognition result using pre-stored translation information for special vocabulary related to commodity-object introduction.
Corresponding to Embodiment 4, an embodiment of the present application further provides a live-broadcast apparatus applied to the first client. Referring to Fig. 11, the apparatus may include:
a request receiving unit 1101, configured to receive a request to create a multilingual live broadcast;
a request submitting unit 1102, configured to submit the request to the first server and receive a first address returned by the first server;
a stream pushing unit 1103, configured to submit the produced live stream to the first address after the broadcast is created successfully, so that the source live stream can be obtained from the first address and translated target live streams in at least one target language can be obtained, for provision to second clients associated with users who need those target languages.
In a specific implementation, the apparatus may further include:
an operation option providing unit, configured to provide an operation option for selecting the source language associated with the source broadcast;
a source-language information submitting unit, configured to submit the source-language information received through the operation option to the first server.
In addition, the apparatus may further include:
a statistical information receiving unit, configured to receive statistical information provided by the first server, the statistical information including how users in the countries/regions associated with the at least one target language have respectively watched the multilingual live broadcast;
a statistical information display unit, configured to display the statistical information.
Corresponding to Embodiment 5, an embodiment of the present application further provides an apparatus for obtaining a live stream, applied to the second client. Referring to Fig. 12, the apparatus includes:
a request submitting unit 1201, configured to submit a request to the first server to obtain a live stream;
an address obtaining unit 1202, configured to receive a second address provided by the first server, the second address being determined according to the target language required by the user associated with the second client, and the second address saving the translated target live stream in that language;
a stream pulling unit 1203, configured to pull the target live stream through the second address and play it.
In a specific implementation, the apparatus may further include:
a first operation option providing unit, configured to provide an operation option for reselecting the target language;
a reselection result submitting unit, configured to submit the target language reselected through the operation option to the first server so that the first server provides the second address corresponding to the reselected language.
In addition, the apparatus may further include:
a second operation option providing unit, configured to provide an operation option for turning the multilingual live-translation function on or off;
the request submitting unit may specifically be configured to:
submit, if the live-translation function is on, a request to the first server to obtain the translated target live stream.
The request submitting unit may further be configured to:
submit, if the live-translation function is off, a request to the first server to obtain the source live stream so that the source live stream is played.
Furthermore, the apparatus may further include:
a third operation option providing unit, configured to provide an operation option for sharing the live broadcast;
a target language determining unit, configured to determine, after a sharing request is received through the operation option, the target language required by the share recipient, and submit the sharing request together with that target language to the first server;
a sharing unit, configured to provide, after the second address corresponding to the recipient's target language is returned by the first server, that second address to the client associated with the share recipient.
In addition, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the method described in any of the foregoing method embodiments.
Also provided is an electronic device, including:
one or more processors; and
a memory associated with the one or more processors, the memory being configured to store program instructions that, when read and executed by the one or more processors, perform the steps of the method described in any of the foregoing method embodiments.
Fig. 13 exemplarily shows the architecture of the electronic device. For example, the device 1300 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, an aircraft, and so on.
Referring to Fig. 13, the device 1300 may include one or more of the following components: a processing component 1302, a memory 1304, a power component 1306, a multimedia component 1308, an audio component 1310, an input/output (I/O) interface 1312, a sensor component 1314, and a communication component 1316.
The processing component 1302 generally controls the overall operations of the device 1300, such as operations associated with display, phone calls, data communication, camera operation, and recording. The processing component 1302 may include one or more processors 1320 to execute instructions so as to complete all or some of the steps of the methods provided by the technical solutions of the present disclosure. In addition, the processing component 1302 may include one or more modules that facilitate interaction between the processing component 1302 and other components; for example, it may include a multimedia module to facilitate interaction between the multimedia component 1308 and the processing component 1302.
The memory 1304 is configured to store various types of data to support operation on the device 1300. Examples of such data include instructions for any application or method operating on the device 1300, contact data, phonebook data, messages, pictures, videos, and so on. The memory 1304 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
The power component 1306 supplies power to the various components of the device 1300. It may include a power-management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 1300.
The multimedia component 1308 includes a screen that provides an output interface between the device 1300 and the user. In some embodiments, the screen may include a liquid-crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, it may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel; the touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with it. In some embodiments, the multimedia component 1308 includes a front camera and/or a rear camera. When the device 1300 is in an operating mode, such as a shooting or video mode, the front and/or rear camera can receive external multimedia data. Each of the front and rear cameras may be a fixed optical lens system or have focal-length and optical-zoom capability.
The audio component 1310 is configured to output and/or input audio signals. For example, the audio component 1310 includes a microphone (MIC), which is configured to receive external audio signals when the device 1300 is in an operating mode, such as a call mode, a recording mode, or a speech-recognition mode. The received audio signals may further be stored in the memory 1304 or sent via the communication component 1316. In some embodiments, the audio component 1310 also includes a speaker for outputting audio signals.
The I/O interface 1312 provides an interface between the processing component 1302 and peripheral interface modules such as a keyboard, a click wheel, or buttons; the buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 1314 includes one or more sensors for providing status assessments of various aspects of the device 1300. For example, the sensor component 1314 may detect the on/off state of the device 1300 and the relative positioning of components (such as the display and keypad of the device 1300), as well as a change in position of the device 1300 or one of its components, the presence or absence of user contact with the device 1300, the orientation or acceleration/deceleration of the device 1300, and temperature changes of the device 1300. The sensor component 1314 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, and may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 1314 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1316 is configured to facilitate wired or wireless communication between the device 1300 and other devices. The device 1300 can access a wireless network based on a communication standard, such as WiFi, or a mobile communication network such as 2G, 3G, 4G/LTE, or 5G. In one exemplary embodiment, the communication component 1316 receives broadcast signals or broadcast-related information from an external broadcast-management system via a broadcast channel. In one exemplary embodiment, the communication component 1316 also includes a near-field communication (NFC) module to facilitate short-range communication; for example, the NFC module may be implemented based on radio-frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 1300 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements, for performing the above methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions, such as the memory 1304 including instructions, is also provided; the instructions are executable by the processor 1320 of the device 1300 to complete the methods provided by the technical solutions of the present disclosure. For example, the non-transitory computer-readable storage medium may be a ROM, a random-access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
From the description of the above embodiments, those skilled in the art can clearly understand that the present application can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disk, and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present application or in certain parts of the embodiments.
The embodiments in this specification are described in a progressive manner; for the same or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the others. In particular, since the system and system embodiments are basically similar to the method embodiments, their description is relatively simple, and reference may be made to the corresponding parts of the method embodiments. The systems and system embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiments, which those of ordinary skill in the art can understand and implement without creative effort.
The live-broadcast method, apparatus, and electronic device provided by the present application have been introduced in detail above. Specific examples have been used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method and core ideas of the present application. Meanwhile, for those of ordinary skill in the art, changes may be made to the specific implementations and the scope of application according to the ideas of the present application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (40)

  1. A live-broadcast method, comprising:
    receiving, by a first server, a request submitted by a first client to create a multilingual live broadcast;
    after the multilingual live broadcast is created successfully, obtaining, from a source live stream captured by the first client, a translated target live stream corresponding to at least one target language;
    after receiving a request submitted by a second client to pull a live stream, determining a target language required by a user associated with the second client, and providing the target live stream corresponding to that target language to the second client for playback.
  2. The method according to claim 1, wherein
    obtaining, from the source live stream captured by the first client, the translated target live stream corresponding to the at least one target language comprises:
    generating a first address and at least one second address, the at least one second address corresponding to the at least one target language;
    providing the first address to the first client so that, after the multilingual live broadcast is created successfully, the first client saves the produced source live stream to the first address;
    providing the first address and the at least one second address to a second server so that the second server obtains the source live stream from the first address and, after obtaining the translated target live stream corresponding to the at least one target language, saves it to the respective second address;
    and providing the target live stream corresponding to the target language to the second client for playback comprises:
    returning the second address corresponding to the target language to the second client so that the second client obtains, from that second address, the translated target live stream corresponding to that target language for playback.
  3. The method according to claim 2, wherein
    providing the first address and the at least one second address to the second server comprises:
    creating and starting, by invoking an interface provided by the second server, at least one director-console service in the second server, the at least one director-console service respectively corresponding to the at least one target language; the director-console service, by invoking a streaming speech-recognition service and a translation service, performs streaming speech recognition on the source live stream and, after obtaining a translation result, merges the source live stream with the translation result to generate the translated target live stream;
    wherein the request invoking the interface carries information of the first address and the second address.
  4. The method according to claim 3, wherein
    the director-console service performs streaming speech recognition on the source live stream and obtains the translation result by invoking a streaming speech-recognition service and a translation service provided by a third server;
    wherein the director-console service carries a third address in the invocation request so that the translation result is saved to the third address, and the director-console service merges the source live stream at the first address with the translation result at the third address to generate the translated target live stream and saves it to the second address.
  5. The method according to claim 3, wherein
    the multilingual live broadcast comprises a live broadcast created in a commodity-object information service system;
    the translation service translates the speech-recognition result using a pre-established translation model trained with historical live-broadcast records in the commodity-object information service system as training data.
  6. The method according to claim 5, wherein
    the translation service further translates the speech-recognition result using pre-stored translation information for special vocabulary related to commodity-object introduction.
  7. The method according to claim 3, wherein
    the translation service further adjusts the sentence structure of the speech-recognition result before translating it.
  8. The method according to claim 2, further comprising:
    determining source-language information associated with the broadcast from information carried in the request to create the multilingual live broadcast;
    providing the source-language information to the second server.
  9. The method according to claim 3, wherein
    the translated target live stream comprises a live stream associated with subtitles in the target language;
    the method further comprises:
    providing information on subtitle-display parameters to the second server so that the director-console service, after obtaining the translated text stream in the target language, adds the text stream as subtitle information of the source live stream according to the subtitle-display parameters, to generate the corresponding target live stream.
  10. The method according to claim 9, further comprising:
    obtaining the resolution of the terminal device associated with the first client and/or the screen-orientation information required for the broadcast;
    determining the subtitle-display parameters from the resolution and/or the screen-orientation information required for the broadcast.
  11. The method according to claim 10, further comprising:
    obtaining live-scene information associated with the multilingual live broadcast, and providing suggestion information about screen orientation to the first client according to the live-scene information.
  12. The method according to claim 9, further comprising:
    obtaining background-image information of the live picture associated with the multilingual live broadcast;
    determining the subtitle-display parameters from the background-image information.
  13. The method according to claim 9, wherein
    the subtitle-display parameters comprise one or more of: subtitle layout parameters; the position, height, and size of the subtitle box; background color; character-count limit; and subtitle font, size, and display duration.
  14. The method according to claim 2, further comprising:
    collecting, according to access records of the second addresses, statistics on how users in the countries/regions associated with the at least one target language respectively watch the multilingual live broadcast, and providing the statistical results to the first client.
  15. The method according to claim 1, wherein
    the translated target live stream comprises: a live stream associated with subtitles in the target language, or a live stream associated with voice in the target language.
  16. The method according to claim 1, wherein
    the multilingual live broadcast comprises a live broadcast created in a commodity-object information service system;
    determining the target language required by the user associated with the second client comprises:
    determining the target language required by the user associated with the second client from data generated by that user in the commodity-object information service system.
  17. The method according to claim 16, wherein
    determining the target language required by the user associated with the second client comprises:
    determining, from shipping-address information corresponding to the user associated with the second client, the country/region where that user is located;
    determining, from the country/region, the target language required by the user associated with the second client.
  18. The method according to claim 1, wherein
    upon receiving a request from the second client to share the broadcast, the target language required by the target user of the share is determined, and the address information of the target live stream corresponding to that target language is provided to the second client so that the second client shares the address with the target user.
  19. A live-stream processing method, comprising:
    creating, by a second server, at least one director-console service according to a request submitted by a first server, the request being submitted after the first server receives a request to create a multilingual live broadcast, and the at least one director-console service corresponding to at least one target language;
    obtaining a first address and at least one second address provided by the first server, wherein the first address is used to save the source live stream of the broadcast and the at least one second address corresponds to the at least one target language;
    after the multilingual live broadcast is created successfully, starting the director-console service, which reads the source live stream from the first address and, by invoking a streaming speech-recognition service and a translation service, performs streaming speech recognition on the source live stream; after obtaining a translation result corresponding to one of the target languages, it merges the source live stream with the translation result to generate the translated target live stream corresponding to that target language, and saves it to the second address corresponding to that target language.
  20. The method according to claim 19, wherein
    the director-console service is specifically configured to invoke a streaming speech-recognition service and a translation service provided by a third server, generate a third address, and provide the first and third addresses to the third server so that the third server, after obtaining the translation result, saves it to the third address; the director-console service reads the translation result from the third address and composites it with the source live stream to generate the target live stream.
  21. The method according to claim 19, wherein
    the translation result comprises a translated text stream;
    the director-console service is specifically configured to add the text stream as subtitle information of the source live stream, to generate the corresponding target live stream.
  22. The method according to claim 19, wherein
    the translation result comprises a translated voice stream;
    the director-console service is specifically configured to remove the voice stream from the source live stream and composite the translated voice stream with it, to generate the target live stream.
  23. A live-stream processing method, comprising:
    creating, by a third server, a streaming speech-recognition service and a translation service according to an invocation request from a second server, wherein the request carries target-language information, a first address, and a third address, the first address being used to save a source live stream;
    reading the source live stream from the first address and performing speech recognition on it through the streaming speech-recognition service;
    translating the speech-recognition result through the translation service to obtain a translation result corresponding to the target language, and saving the translation result to the third address, so that the second server obtains the translation result from the third address and composites it with the source live stream into the target live stream corresponding to the target language.
  24. The method according to claim 23, wherein
    the live broadcast comprises a live broadcast created in a commodity-object information service system;
    the translation service translates the speech-recognition result using a pre-established translation model trained with historical live-broadcast records in the commodity-object information service system as training data.
  25. The method according to claim 23, wherein
    the translation service further translates the speech-recognition result using pre-stored translation information for special vocabulary related to commodity-object introduction.
  26. A live-broadcast method, comprising:
    receiving, by a first client, a request to create a multilingual live broadcast;
    submitting the request to a first server and receiving a first address returned by the first server;
    after the broadcast is created successfully, submitting the produced live stream to the first address, so that the source live stream can be obtained from the first address and a translated target live stream corresponding to at least one target language can be obtained, for provision to a second client associated with a user who needs that target language.
  27. The method according to claim 26, further comprising:
    providing an operation option for selecting the source language associated with the source broadcast;
    submitting the source-language information received through the operation option to the first server.
  28. The method according to claim 26, further comprising:
    receiving statistical information provided by the first server, the statistical information comprising: how users in the countries/regions associated with the at least one target language respectively watch the multilingual live broadcast;
    displaying the statistical information.
  29. A method for obtaining a live stream, comprising:
    submitting, by a second client, a request to a first server to obtain a live stream;
    receiving a second address provided by the first server, the second address being determined according to the target language required by the user associated with the second client, and the second address saving the translated target live stream corresponding to that target language;
    pulling the target live stream through the second address and playing it.
  30. The method according to claim 29, further comprising:
    providing an operation option for reselecting the target language;
    submitting the target language reselected through the operation option to the first server so that the first server provides the second address corresponding to the reselected target language.
  31. The method according to claim 29, further comprising:
    providing an operation option for turning a multilingual live-translation function on or off;
    wherein submitting the request to the first server to obtain the live stream comprises:
    submitting, if the live-translation function is on, a request to the first server to obtain the translated target live stream.
  32. The method according to claim 31, further comprising:
    submitting, if the live-translation function is off, a request to the first server to obtain the source live stream so that the source live stream is played.
  33. The method according to claim 29, further comprising:
    providing an operation option for sharing the live broadcast;
    determining, after a sharing request is received through the operation option, the target language required by the share recipient, and submitting the sharing request together with the target language required by the share recipient to the first server;
    providing, after the second address corresponding to the target language required by the share recipient is returned by the first server, that second address to the client associated with the share recipient.
  34. A live-broadcast apparatus, applied to a first server, comprising:
    a request receiving unit, configured to receive a request submitted by a first client to create a multilingual live broadcast;
    a target live stream obtaining unit, configured to obtain, after the multilingual live broadcast is created successfully, a translated target live stream corresponding to at least one target language from the source live stream captured by the first client;
    a target live stream providing unit, configured to determine, after receiving a request from a second client to pull the live stream, the target language required by the user associated with the second client, and provide the target live stream corresponding to that target language to the second client for playback.
  35. A live-stream processing apparatus, applied to a second server, comprising:
    a director-console service creating unit, configured to create at least one director-console service according to a request submitted by a first server, the request being submitted after the first server receives a request to create a multilingual live broadcast, and the at least one director-console service corresponding to at least one target language;
    an address obtaining unit, configured to obtain a first address and at least one second address provided by the first server, wherein the first address is used to save the source live stream of the broadcast and the at least one second address corresponds to the at least one target language;
    a director-console service starting unit, configured to start the director-console service after the multilingual live broadcast is created successfully, the director-console service being configured to read the source live stream from the first address and, by invoking a streaming speech-recognition service and a translation service, perform streaming speech recognition on the source live stream; after obtaining the translation result corresponding to one of the target languages, the director-console service merges the source live stream with the translation result to generate the translated target live stream corresponding to that target language, and saves it to the second address corresponding to that target language.
  36. A live-stream processing apparatus, applied to a third server, comprising:
    a service creating unit, configured to create a streaming speech-recognition service and a translation service according to an invocation request from a second server, wherein the request carries target-language information, a first address, and a third address, the first address being used to save a source live stream;
    a speech recognition unit, configured to read the source live stream from the first address and perform speech recognition on it through the streaming speech-recognition service;
    a translation unit, configured to translate the speech-recognition result through the translation service to obtain the translation result corresponding to the target language, and save the translation result to the third address, so that the second server obtains the translation result from the third address and composites it with the source live stream into the target live stream corresponding to the target language.
  37. A live-broadcast apparatus, applied to a first client, comprising:
    a request receiving unit, configured to receive a request to create a multilingual live broadcast;
    a request submitting unit, configured to submit the request to a first server and receive a first address returned by the first server;
    a stream pushing unit, configured to submit the produced live stream to the first address after the broadcast is created successfully, so that the source live stream can be obtained from the first address and a translated target live stream corresponding to at least one target language can be obtained, for provision to a second client associated with a user who needs that target language.
  38. An apparatus for obtaining a live stream, applied to a second client, comprising:
    a request submitting unit, configured to submit a request to a first server to obtain a live stream;
    an address obtaining unit, configured to receive a second address provided by the first server, the second address being determined according to the target language required by the user associated with the second client, and the second address saving the translated target live stream corresponding to that target language;
    a stream pulling unit, configured to pull the target live stream through the second address and play it.
  39. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 33.
  40. An electronic device, comprising:
    one or more processors; and
    a memory associated with the one or more processors, the memory being configured to store program instructions that, when read and executed by the one or more processors, perform the steps of the method according to any one of claims 1 to 33.
PCT/CN2021/107766 2020-07-27 2021-07-22 Live-broadcast method and apparatus, and electronic device WO2022022370A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010733464.9A CN113301357B (zh) 2020-07-27 2020-07-27 Live-broadcast method and apparatus, and electronic device
CN202010733464.9 2020-07-27

Publications (1)

Publication Number Publication Date
WO2022022370A1 true WO2022022370A1 (zh) 2022-02-03

Family

ID=77318168

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/107766 WO2022022370A1 (zh) 2020-07-27 2021-07-22 Live-broadcast method and apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN113301357B (zh)
WO (1) WO2022022370A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113452935B * 2021-08-31 2021-11-09 Chengdu Sobey Digital Technology Co., Ltd. System and method for generating landscape-screen and portrait-screen live video
CN114501042A * 2021-12-20 2022-05-13 Alibaba (China) Co., Ltd. Cross-border live-broadcast processing method and electronic device
CN114745595B * 2022-05-10 2024-02-27 Shanghai Bilibili Technology Co., Ltd. Bullet-comment display method and apparatus
CN114866822B * 2022-05-10 2024-04-09 Shanghai Bilibili Technology Co., Ltd. Live stream pushing method and apparatus, and live stream pulling method and apparatus
CN116847113B * 2023-06-20 2024-03-12 Liancheng Technology (Hebei) Co., Ltd. Video live-broadcast relay method, apparatus, device, and medium based on cloud architecture module

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106340294A * 2016-09-29 2017-01-18 Anhui Shengxun Information Technology Co., Ltd. Online production system for news live-broadcast subtitles based on simultaneous translation
CN106921873A * 2017-02-28 2017-07-04 Beijing Xiaomi Mobile Software Co., Ltd. Live-broadcast control method and apparatus
CN108566558A * 2018-04-24 2018-09-21 Tencent Technology (Shenzhen) Co., Ltd. Video stream processing method and apparatus, computer device, and storage medium
CN108737845A * 2018-05-22 2018-11-02 Beijing Baidu Netcom Science and Technology Co., Ltd. Live-broadcast processing method, apparatus, device, and storage medium
WO2019040400A1 (en) * 2017-08-21 2019-02-28 Kudo, Inc. SYSTEMS AND METHODS FOR LANGUAGE CHANGE DURING LIVE PRESENTATION
CN110636323A * 2019-10-15 2019-12-31 Bokeda (Beijing) Technology Co., Ltd. Cloud-platform-based global live-broadcast and video-on-demand system and method
CN110730952A * 2017-11-03 2020-01-24 Tencent Technology (Shenzhen) Co., Ltd. Method and system for processing audio communications over a network

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108401192B * 2018-04-25 2022-02-22 Tencent Technology (Shenzhen) Co., Ltd. Video stream processing method and apparatus, computer device, and storage medium
CN110111775B * 2019-05-17 2021-06-22 Tencent Technology (Shenzhen) Co., Ltd. Streaming speech recognition method, apparatus, device, and storage medium
CN110769265A * 2019-10-08 2020-02-07 Shenzhen Skyworth-RGB Electronics Co., Ltd. Simultaneous subtitle translation method, smart television, and storage medium
CN111191472A * 2019-12-31 2020-05-22 Hunan Shiyu Information Technology Co., Ltd. Teaching-assisted translation learning system and method


Also Published As

Publication number Publication date
CN113301357B (zh) 2022-11-29
CN113301357A (zh) 2021-08-24


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21850704

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21850704

Country of ref document: EP

Kind code of ref document: A1