WO2021103741A1 - Content processing method, apparatus, computer device, and storage medium - Google Patents

Content processing method, apparatus, computer device, and storage medium

Info

Publication number
WO2021103741A1
Authority
WO
WIPO (PCT)
Prior art keywords
content
block
content block
proxy server
single link
Prior art date
Application number
PCT/CN2020/114352
Other languages
English (en)
French (fr)
Inventor
欧阳才晟
陈祺
郑杨
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2021103741A1
Priority to US17/519,237 (US20220059073A1)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/14 Multichannel or multilink protocols
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/14 Session management
    • H04L67/141 Setup of application sessions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/563 Data redirection of data network streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/565 Conversion or adaptation of application format or content
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2212/00 Encapsulation of packets
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/30 Definitions, standards or architectural aspects of layered protocol stacks
    • H04L69/32 Architecture of open systems interconnection [OSI] 7-layer type protocol stacks, e.g. the interfaces between the data link level and the physical level
    • H04L69/322 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions
    • H04L69/329 Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions in the application layer [OSI layer 7]

Definitions

  • This application relates to the field of computer technology and artificial intelligence technology, in particular to a content processing method, device, computer equipment, and storage medium.
  • a content processing method, device, computer equipment, and storage medium are provided.
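  • A content processing method, executed by a computer device, including:
  • obtaining a first content block, where the first content block is structured data;
  • streaming the first content block, according to the first order in which the first content block is obtained, through a two-way communication single link established based on an application layer protocol;
  • receiving, through the two-way communication single link, a second content block returned by streaming, where the second content block is obtained by converting the content type of the first content block, and sending the first content block and receiving the second content block are performed asynchronously in the two-way communication single link; and
  • sequentially outputting the second content block according to the second order in which the second content block is received.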
  • A content processing apparatus, provided in a computer device, including:
  • an obtaining module, configured to obtain a first content block, where the first content block is structured data;
  • a streaming module, configured to stream the first content block, according to the first order in which the first content block is obtained, through a two-way communication single link established based on an application layer protocol, and to receive, through the two-way communication single link, a second content block returned by streaming, where the second content block is obtained by converting the content type of the first content block, and sending the first content block and receiving the second content block are performed asynchronously in the two-way communication single link; and
  • an output module, configured to sequentially output the second content block according to the second order in which the second content block is received.
  • A computer device, including a memory and one or more processors.
  • The memory stores computer-readable instructions.
  • When the computer-readable instructions are executed by the one or more processors, the one or more processors perform the steps of the content processing method in the embodiments of the present application.
  • One or more computer-readable storage media storing computer-readable instructions; when the computer-readable instructions are executed by one or more processors, the one or more processors perform the steps of the content processing method in the embodiments of the present application.
  • FIG. 1 is an application scenario diagram of a content processing method in an embodiment.
  • FIG. 2 is an application scenario diagram of a content processing method in another embodiment.
  • FIG. 3 is a schematic flowchart of a content processing method in an embodiment.
  • FIG. 8 is a schematic diagram of a content processing flow in an embodiment.
  • FIG. 9 is a schematic diagram of streaming transmission in an embodiment.
  • FIG. 10 is a block diagram of the content processing architecture in an embodiment.
  • FIG. 11 is a schematic diagram of central control forwarding in an embodiment.
  • FIG. 12 is a block diagram of a content processing device in an embodiment.
  • FIG. 13 is a block diagram of a content processing device in another embodiment.
  • FIG. 14 is a block diagram of a computer device in an embodiment.
  • Fig. 1 is an application scenario diagram of a content processing method in an embodiment.
  • this application scenario includes a terminal 110 and a server 120 connected to the network.
  • the terminal 110 may be a smart TV, a smart speaker, a desktop computer, or a mobile terminal.
  • the mobile terminal may include at least one of a mobile phone, a tablet computer, a notebook computer, a personal digital assistant, and a wearable device.
  • the server 120 may be implemented as an independent server or a server cluster composed of multiple physical servers.
  • the user can input initial content through the terminal 110.
  • the terminal 110 may perform structured processing on the input initial content to generate a first content block belonging to structured data.
  • a two-way communication single link is established between the terminal 110 and the server 120 based on an application layer protocol.
  • the terminal 110 may stream the first content block to the server 120 in the first order of acquiring the first content block through a two-way communication single link.
  • the server 120 may perform content type conversion on the first content block to obtain the second content block.
  • the server 120 may stream the second content block to the terminal 110.
  • the processing of the terminal 110 sending the first content block to the server 120 and the server 120 returning the second content block to the terminal 110 is performed in a two-way communication single link, and is performed asynchronously without interfering with each other.
  • the terminal 110 may sequentially output the second content blocks in the second order in which the second content blocks are received. For example, the terminal 110 may output the second content block in the form of display or playback.
  • the server 120 includes a proxy server 120a, an adaptation server 120b, and a decoding server 120c.
  • the terminal 110 and the proxy server 120a can establish a two-way communication single link based on an application layer protocol.
  • the terminal 110 may stream the first content block to the proxy server 120a according to the first order in which the first content block is acquired.
  • the proxy server 120a may send the first content block to the adaptation server 120b.
  • the adaptation server 120b may perform logical adaptation conversion on the first content block, and distribute the adapted and converted content block to the decoding server 120c.
  • the decoding server 120c may perform content type conversion on the first content block to obtain the second content block.
  • the decoding server 120c may stream the second content block to the terminal 110 via the adaptation server 120b and the proxy server 120a in sequence.
  • the terminal 110 may sequentially output the second content blocks in the second order in which the second content blocks are received.
  • the adaptation server 120b can be omitted.
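The forwarding chain described above (proxy server, optional adaptation server, decoding server) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the field names `seq`, `payload`, `audio`, and `text`, and the upper-casing stand-in for content type conversion, do not come from the application.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Optional

Block = Dict[str, Any]  # hypothetical block layout, not defined by the application


def adaptation_server(block: Block) -> Block:
    """Logical adaptation: rename fields to what the decoder expects (hypothetical)."""
    return {"seq": block["seq"], "audio": block["payload"]}


def decoding_server(block: Block) -> Block:
    """Placeholder content type conversion: bytes in, upper-case text out."""
    source = block.get("audio", block.get("payload"))
    return {"seq": block["seq"], "text": source.decode("ascii").upper()}


def proxy_server(
    blocks: Iterable[Block],
    adapt: Optional[Callable[[Block], Block]] = adaptation_server,
) -> Iterator[Block]:
    """Forward each first content block in arrival order and yield the result."""
    for block in blocks:
        # The adaptation step is skipped when adapt is None.
        forwarded = adapt(block) if adapt is not None else block
        yield decoding_server(forwarded)
```

Passing `adapt=None` models the variant in which the adaptation server is omitted; the blocks still leave the chain in the order in which they entered it.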
  • Artificial Intelligence (AI) uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a way similar to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Artificial intelligence technology is a comprehensive discipline, covering a wide range of fields, including both hardware-level technology and software-level technology.
  • Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • the content processing methods in the embodiments of the present application can be applied to speech processing scenarios such as speech recognition or speech synthesis processing.
  • The key speech technologies are automatic speech recognition (ASR), speech synthesis (text to speech, TTS), and voiceprint recognition. Enabling computers to listen, see, speak, and feel is the future direction of human-computer interaction, and voice is expected to become one of the most promising human-computer interaction methods.
  • Fig. 3 is a schematic flowchart of a content processing method in an embodiment.
  • the content processing method in this embodiment can be applied to computer equipment.
  • the computer equipment is mainly used as the terminal 110 in FIG. 1 for illustration.
  • the method specifically includes the following steps:
  • S302 Obtain the first content block.
  • The first content block is structured data.
  • Structured data is data obtained by encapsulating initial data in a structured form; it can be identified and used directly at the application layer without data format conversion.
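As an illustration of structured encapsulation, the sketch below wraps a chunk of initial data in a JSON message. The application does not specify a wire format, so the JSON encoding and the field names `seq`, `content_type`, and `payload` are assumptions.

```python
import base64
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass
class ContentBlock:
    seq: int           # position in the first order (hypothetical field)
    content_type: str  # e.g. "audio" or "text" (hypothetical field)
    payload: str       # base64 text so that raw bytes fit inside JSON


def pack_block(seq: int, content_type: str, raw: bytes) -> str:
    """Encapsulate a chunk of initial data as a structured content block."""
    encoded = base64.b64encode(raw).decode("ascii")
    return json.dumps(asdict(ContentBlock(seq, content_type, encoded)))


def unpack_block(message: str) -> Tuple[int, str, bytes]:
    """Parse a structured content block at the application layer."""
    fields = json.loads(message)
    return fields["seq"], fields["content_type"], base64.b64decode(fields["payload"])
```

Because every block carries its own sequence number and content type, the receiving side can route and order blocks by reading named fields rather than by interpreting an opaque byte layout.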
  • the computer device may obtain the first content block in a streaming manner.
  • streaming means continuously.
  • Obtaining the first content block by streaming means that the first content block is obtained continuously.
  • the first content block is equivalent to a part of a data stream. In the embodiments of the present application, content is not sent only after it has been obtained in full; instead, first content blocks are sent while further first content blocks are still being acquired, which makes this a real-time, streaming process.
  • For example, voice data is structured and sent in a stream while it is still being collected, so that speech recognition proceeds while the user is speaking.
  • Speech is recognized segment by segment, and text results can be returned before all of the speech has been received.
  • the computer device can directly obtain the first content block belonging to the structured data.
  • the computer device may also obtain the initial content, and perform structured packaging processing on the initial content to generate the first content block belonging to the structured data.
  • the computer device can directly obtain the initial content.
  • step S302 includes: receiving a trigger instruction; in response to the trigger instruction, acquiring the initial content; performing structural processing on the initial content to generate a first content block.
  • the trigger instruction is used to trigger the acquisition of the initial content. That is, the computer device obtains the initial content after being triggered.
  • the trigger instruction may include any one of a voice recognition instruction and a voice generation instruction.
  • the computer device may first preprocess the initial content, and perform structured packaging processing on the preprocessed content to generate the first content block.
  • preprocessing refers to the process of extracting target content from the initial content. Then, the target content extracted by the preprocessing can be structured and packaged, and packaged into the first content block.
  • Target content refers to the content whose content type is to be converted.
  • the initial content refers to the content that has not been structured.
  • Content refers to data that can convey information.
  • the content may include at least one of text content and media content.
  • Media content is the content conveyed through the media.
  • the media content may include at least one of audio content, video content, and picture content.
  • S304 Stream the first content block, according to the first order in which the first content block is obtained, through the two-way communication single link established based on the application layer protocol.
  • the two-way communication single link is a single link used for two-way streaming. That is, bidirectional streaming can be realized in one link.
  • Two-way streaming means that the two ends can send and receive data asynchronously. That is, one end can stream data to the other end, and can receive data stream from the other end. It can be understood that asynchronous means that the receiving and sending of data are independent of each other and do not interfere with each other.
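The asynchronous send and receive behaviour can be sketched with `asyncio`. This is a local simulation, not the application's implementation: the two queues stand in for the two directions of one link, `fake_server` and its upper-casing "conversion" are placeholders, and `None` as an end-of-stream marker is an assumption.

```python
import asyncio
from typing import List


async def fake_server(uplink: asyncio.Queue, downlink: asyncio.Queue) -> None:
    """Stand-in for the server: convert each first block and stream it back."""
    while True:
        block = await uplink.get()
        if block is None:                  # end-of-stream marker (assumption)
            await downlink.put(None)
            return
        await downlink.put(block.upper())  # placeholder content type conversion


async def send_blocks(blocks: List[str], uplink: asyncio.Queue) -> None:
    """Send first content blocks in the order in which they were obtained."""
    for block in blocks:
        await uplink.put(block)
        await asyncio.sleep(0)             # yield so that receiving can interleave
    await uplink.put(None)


async def receive_blocks(downlink: asyncio.Queue, output: List[str]) -> None:
    """Output second content blocks in the order in which they are received."""
    while True:
        block = await downlink.get()
        if block is None:
            return
        output.append(block)


async def run_session(blocks: List[str]) -> List[str]:
    uplink: asyncio.Queue = asyncio.Queue()
    downlink: asyncio.Queue = asyncio.Queue()
    output: List[str] = []
    # Sending and receiving run concurrently and do not wait on each other.
    await asyncio.gather(
        fake_server(uplink, downlink),
        send_blocks(blocks, uplink),
        receive_blocks(downlink, output),
    )
    return output
```

Calling `asyncio.run(run_session(["i want", "to watch"]))` returns the converted blocks in the order in which they were sent, because the sender never blocks on a reply and the receiver never blocks on the sender.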
  • the application layer protocol defines the specification for transferring messages between application processes running on different end systems. It can be understood that the application layer protocol in the embodiments of the present application is an application layer protocol that can be used to establish a two-way communication single link, and does not refer to all application layer protocols in general. This is because some application layer protocols (for example, HTTP, the HyperText Transfer Protocol) cannot establish a single link for bidirectional communication.
  • the computer device may establish a two-way communication single link based on the application layer protocol in advance.
  • the computer device may also trigger the establishment of a two-way communication single link based on the application layer protocol after obtaining the initial first content block.
  • the time at which the two-way communication single link is established is not limited, as long as it is established before the first content block is streamed.
  • the computer device can directly establish a two-way communication single link based on an existing application layer protocol.
  • the existing application layer protocol for establishing a two-way communication single link may include the WebSocket protocol.
  • The WebSocket protocol is a protocol for full-duplex communication over a single TCP (Transmission Control Protocol) connection. It was standardized by the IETF as RFC 6455 in 2011, and the specification is supplemented by RFC 7936.
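For context on how a WebSocket link is opened, the sketch below computes the `Sec-WebSocket-Accept` value that a server returns during the opening handshake, as specified in RFC 6455. This detail comes from the RFC rather than from the application; the function name `websocket_accept` is hypothetical.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Derive the Sec-WebSocket-Accept header from the client's key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

Once this handshake completes, the same TCP connection carries frames in both directions, which is the single-link, full-duplex property the method relies on.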
  • the computer device can also use other existing application layer protocols to establish a bidirectional communication single link.
  • the computer device may perform protocol encapsulation on the transmission control protocol or the multi-link application layer protocol to generate an application layer protocol for establishing a two-way communication single link.
  • The Transmission Control Protocol (TCP) is a connection-oriented, byte-stream-based transport layer communication protocol defined by IETF RFC 793.
  • the multi-link application layer protocol is an application layer protocol used to realize two-way communication by establishing at least two links. That is, the multi-link application layer protocol itself cannot establish a bidirectional communication single link.
  • the computer device may perform protocol encapsulation at the upper layer of the transmission control protocol to generate an application layer protocol for establishing a two-way communication single link.
  • the computer device may also perform protocol encapsulation for a multi-link application layer protocol, and encapsulate it into an application layer protocol for establishing a two-way communication single link.
  • the HTTP protocol is a multi-link application layer protocol.
  • The computer device can encapsulate the HTTP protocol into an application layer protocol for establishing a two-way communication single link.
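One common way to build a message-oriented application layer on top of a TCP byte stream is length-prefixed framing. The sketch below shows that technique as an illustration only; the application does not state which encapsulation it uses, and the 4-byte big-endian length header and the names `encode_frame` and `FrameDecoder` are assumptions.

```python
import struct
from typing import List


def encode_frame(payload: bytes) -> bytes:
    """Prefix the payload with its length as a 4-byte big-endian integer."""
    return struct.pack(">I", len(payload)) + payload


class FrameDecoder:
    """Reassemble frames from a byte stream that may arrive in arbitrary chunks."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, data: bytes) -> List[bytes]:
        """Add received bytes and return every frame that is now complete."""
        self._buffer.extend(data)
        frames = []
        while len(self._buffer) >= 4:
            (length,) = struct.unpack(">I", self._buffer[:4])
            if len(self._buffer) < 4 + length:
                break  # wait for the rest of this frame
            frames.append(bytes(self._buffer[4:4 + length]))
            del self._buffer[:4 + length]
        return frames
```

Each side of the link can run its own decoder over the bytes it receives, so content blocks keep their boundaries in both directions over one connection.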
  • the first order is the order in which each first content block is obtained. It can be understood that since the computer device acquires the first content block in a stream, there is a sequence among the acquired first content blocks, that is, the first order.
  • Streaming refers to continuously sending the first content block. For example, after obtaining a first content block, the computer device sends it, then obtains the next first content block and sends that one, so that first content blocks are sent continuously as a stream.
  • S306 Receive the second content block returned in a streaming manner through a two-way communication single link.
  • the second content block is obtained by converting the content type of the first content block.
  • the content type is used to characterize the presentation form of the content.
  • the sending of the first content block and the receiving of the second content block are performed asynchronously in a two-way communication single link.
  • the second content block returned by streaming refers to the second content block returned continuously.
  • the content type may include at least one of audio, video, text, and pictures.
  • the picture may include at least one of a static picture and a dynamic picture.
  • the first content block and the second content block belong to different content types.
  • the first content block is audio data, which can be processed by voice recognition to convert the content type to generate a text content block. Then, audio data and text content blocks belong to different content types.
  • the server may perform content type conversion processing on the first content block, generate a second content block, and stream the second content block to the computer device.
  • S308 Output the second content blocks in sequence according to the second order of receiving the second content blocks.
  • the second order is the order in which the second content block is received. It can be understood that since the second content block is returned in a streaming manner, the computer device continuously receives the second content block, and then there is an order among the received second content blocks, that is, the second order.
  • the computer device may sequentially output the second content block in the second order. That is, a second content block received earlier is output before a second content block received later.
  • In the above content processing method, a first content block belonging to structured data is streamed, and a second content block returned by streaming is received; the second content block is obtained by converting the content type of the first content block.
  • the second content blocks are sequentially output according to the second order in which they are received. Because sending the first content block and receiving the second content block are performed asynchronously in a two-way communication single link, structured content can be streamed bidirectionally over the same communication link. Compared with transmitting binary data, no additional data conversion processing is required, which saves system resources.
  • a communication link established based on an underlying protocol in the traditional method must be tied to a fixed IP address.
  • the solution of the present application is not limited to a fixed IP address. Under heavy traffic, it can still adapt and allocate reasonably through balanced offloading.
  • the two-way communication single link established based on the application layer protocol streams uplink and downlink data through the same link, which achieves stable two-way streaming and avoids the synchronization failures to which multiple links are prone.
  • This not only improves accuracy but also avoids the system resource consumption caused by multiple links.
  • the trigger instruction is a voice recognition instruction; in response to the trigger instruction, acquiring the initial content includes: in response to the voice recognition instruction, collecting audio data.
  • preprocessing the initial content to extract the target content from the initial content includes: extracting the target voice data from the collected audio data.
  • Performing structured processing on the target content to generate the first content block includes: performing structured processing on the target voice data to generate a voice data block as the first content block.
  • Speech recognition is also referred to as Automatic Speech Recognition (ASR).
  • Voice recognition instructions are instructions used to trigger voice recognition processing.
  • the voice recognition instruction may include an instruction that directly triggers voice recognition and an instruction that triggers voice recognition indirectly.
  • An instruction that directly triggers voice recognition is an instruction used specifically to trigger voice recognition.
  • the instruction that indirectly triggers voice recognition triggers the voice recognition process in the process of triggering the generation of the target instruction.
  • the instruction that indirectly triggers voice recognition may include a voice search instruction.
  • Voice search instructions are instructions used to search for information based on voice data. It can be understood that in the process of voice search, the need to recognize the voice will indirectly trigger the voice recognition.
  • Audio data is digitized sound data.
  • Target voice data refers to voice data that needs to be converted into text content. It can be understood that the target voice data is voice data other than the interfering voice in the audio data. Interfering voice refers to voice data that does not need to be converted into text content.
  • the interference voice may include at least one of environmental sound data and voice data of non-target objects.
  • Non-target objects refer to objects other than the target object that provides the target voice data.
  • the user can input a voice recognition instruction to the computer device, and the computer device can establish a two-way communication single link based on the application layer protocol in response to the voice recognition instruction.
  • the user can start talking, and the computer device can collect audio data.
  • the computer equipment can preprocess the audio data and extract the target voice data from it.
  • the computer device can perform structural processing on the target voice data to generate a voice data block as the first content block.
  • the computer device generates voice data blocks while receiving audio data, which is a streaming process, rather than generating voice data blocks only after a complete recording has been made.
  • a client is installed in the computer device, and a software development kit (SDK) is pre-installed in the client.
  • the client is a client with an audio collection portal. It can be understood that the client may rely on the audio collection portal for its core features, or may integrate an audio collection portal as an additional, auxiliary function.
  • the client may include at least one of a client of a content playing platform, a signal receiver of a smart home device (for example, a set-top box), and an instant messaging client.
  • the client of the content playback platform may include at least one of a video playback client and an audio playback client.
  • A smart home uses the residence as a platform and integrates facilities related to home life by means of integrated wiring technology, network communication technology, security technology, automatic control technology, and audio and video technology.
  • the smart home device includes at least one of a smart TV, a smart speaker, and a smart air conditioner.
  • the user can perform a voice recognition operation on the client to input a voice recognition instruction, and the client can call the installed software package to start the voice recognition process in response to the voice recognition instruction.
  • After the computer device initiates the speech recognition processing through the software package, it establishes a two-way communication single link with the server based on the application layer protocol.
  • the voice recognition trigger control can be displayed on the interface of the client.
  • When a voice recognition instruction generated by triggering the voice recognition trigger control is received, the client jumps to the voice search interface for audio data collection and establishes a two-way communication single link with the server based on the application layer protocol.
  • After the audio data is collected, the audio data is preprocessed to extract the target voice data.
  • the computer device can structure the target voice data through the client to generate a voice data block as the first content block. Then, the voice data block is sent to the server through the two-way communication single link for voice recognition processing.
  • the voice search interface is an interface for searching media content based on voice data.
  • the method further includes: splicing and combining the displayed second content blocks in a second order to generate a search sentence; searching for media content matching the search sentence according to the search sentence; and displaying the searched media content.
  • the media content may include at least one of audio content, video content, and picture content.
  • the computer device may splice and combine the displayed second content blocks in the order in which they are received (ie, the second order) to generate a complete search sentence.
  • the computer device can search for the media content that matches the search sentence and display the media content.
  • the computer device can display the media content in at least one of pictures and text.
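The splice-and-search step can be sketched as follows. The substring keyword match and the catalog layout are assumptions chosen to keep the example small; the application does not say how matching is performed, and the catalog entries are invented.

```python
from typing import Dict, List


def build_search_sentence(text_blocks: List[str]) -> str:
    """Splice text content blocks together in the order they were received."""
    return "".join(text_blocks)


def search_media(sentence: str, catalog: Dict[str, List[str]]) -> List[str]:
    """Return titles with at least one keyword in the sentence (hypothetical rule)."""
    lowered = sentence.lower()
    return [
        title
        for title, keywords in catalog.items()
        if any(keyword.lower() in lowered for keyword in keywords)
    ]
```

For example, splicing `["I want to watch ", "Andy Lau's ", "latest movie"]` yields the full sentence, which can then be matched against a catalog such as `{"Movie A": ["Andy Lau"], "Song B": ["jazz"]}`.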
  • the computer device can call the software package to establish a two-way communication single link between the client and the proxy server based on the application layer protocol.
  • a proxy server refers to a server used to establish a link with a client and perform traffic distribution.
  • the computer device can collect audio data.
  • the computer equipment can preprocess the audio data and extract the target voice data from it.
  • the computer device can perform structural processing on the target voice data to generate a voice data block as the first content block.
  • the computer device may perform at least one of preprocessing such as noise reduction, activity detection, and compression on the audio data to obtain target voice data therefrom.
  • the second content block is a text content block obtained by performing voice recognition on the voice data block.
  • the sequentially outputting the second content blocks according to the second order of receiving the second content blocks includes: sequentially displaying the text content blocks on the interface according to the second order of receiving the text content blocks.
  • speech recognition processing is equivalent to content type conversion processing.
  • displaying the text content blocks on the interface sequentially in the second order means that the text content blocks are displayed on the interface in sequence according to the order in which the text content blocks are received. It can be understood that the display of the text content block means that this part of the text content is displayed.
  • FIGS. 4 to 7 are schematic diagrams of interfaces for voice recognition in an embodiment.
  • In FIG. 4, there is a voice button 402 (that is, a voice recognition trigger control) near the search box of the video client.
  • the user can click the voice button to enter the voice search interface.
  • Figure 5 shows the voice search interface. On entering it, the client calls the software package to establish a two-way communication single link.
  • the user can then speak. Suppose the user says "I want to watch Andy Lau's latest movie". Audio data is collected through the voice search interface, and voice data blocks are generated.
  • the client can send the voice data block to the server through the two-way communication single link; the server performs voice recognition on it and streams the recognized text content block back.
  • the client can display the text content blocks in sequence on the interface according to the order in which they are received. It can be understood that, because voice data blocks are streamed out and text content blocks are streamed back, speech is recognized and converted into text while the user is speaking, and the corresponding text appears on the voice search interface as the user speaks. As shown in Figure 6, when the user has said only "I want to watch" rather than the whole sentence "I want to watch Andy Lau's latest movie", "I want to watch" can be sent as a voice data block, and the recognized text content block "I want to watch" is then displayed on the voice search interface. Figure 7 shows the final text content, "I want to watch Andy Lau's latest movie", obtained by displaying the streamed-back text content blocks in sequence.
  • Fig. 8 is a schematic diagram of a content processing flow in an embodiment.
  • the following description takes the WebSocket protocol as an example of the application layer protocol.
  • the user starts speech recognition through the client (app), and the client calls the software package (SDK) to start the speech recognition process.
  • when the software package starts the speech recognition process, it first establishes a two-way connection between the client and the proxy server based on the WebSocket protocol.
  • a single communication link is used to stream the upstream voice data and the downstream recognized and converted text content.
  • uplink refers to the transmission of information from the client to the network.
  • downlink refers to the transmission of information from the network to the client.
  • the software package (SDK) starts to collect audio data, and performs processing such as noise reduction, activity detection, and compression on the collected audio data to obtain the target voice data.
  • the target voice data is structured to generate voice data blocks.
  • the computer equipment can stream the voice data block to the proxy server through the two-way communication single link, and the proxy server distributes it to the decoding server.
  • the decoding server performs speech recognition and conversion to generate text content blocks.
  • the decoding server returns the text content block to the proxy server, and the proxy server streams back the text content block to the client through a two-way communication single link.
  • the client displays the text content blocks in sequence on the interface according to the second order of receiving the text content blocks.
  • Fig. 9 is a schematic diagram of streaming transmission in an embodiment.
  • a two-way communication single link is established between the client and the proxy server.
  • the voice data block sent by the client and the text content block (that is, the result) returned by the proxy server are both transmitted over the two-way communication single link, and the sending and the receiving are asynchronous.
  • the two-way communication single link established through the application layer protocol can realize the two-way streaming transmission of structured speech recognition related data, which saves system resources while ensuring transmission stability.
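The asynchronous send and receive behaviour described above can be illustrated with a small `asyncio` simulation. This is not a WebSocket implementation: the single link is modelled as a pair of in-memory queues, and `fake_server` is a hypothetical stand-in for the proxy and decoding servers that simply upper-cases each block in place of real speech recognition.

```python
import asyncio


async def fake_server(uplink, downlink):
    """Stand-in for the proxy and decoding servers: converts each voice block to text."""
    while True:
        block = await uplink.get()
        if block is None:  # end-of-stream marker (an assumption of this sketch)
            await downlink.put(None)
            break
        await asyncio.sleep(0.01)  # simulated recognition latency
        await downlink.put({"seq": block["seq"], "text": block["audio"].upper()})


async def send_blocks(uplink, blocks):
    """Stream first content blocks in the order they were obtained (the first order)."""
    for block in blocks:
        await uplink.put(block)
        await asyncio.sleep(0)  # yield so that receiving can interleave with sending
    await uplink.put(None)


async def receive_blocks(downlink, shown):
    """Output second content blocks in the order they arrive (the second order)."""
    while True:
        block = await downlink.get()
        if block is None:
            break
        shown.append(block["text"])


async def main():
    uplink, downlink = asyncio.Queue(), asyncio.Queue()
    words = ["i want", " to watch", " a movie"]
    blocks = [{"seq": i, "audio": w} for i, w in enumerate(words)]
    shown = []
    # Sending and receiving run concurrently; neither waits for the other to finish.
    await asyncio.gather(
        fake_server(uplink, downlink),
        send_blocks(uplink, blocks),
        receive_blocks(downlink, shown),
    )
    return shown


result = asyncio.run(main())
```

In a real client, both directions would share one connection instead of two queues; the point of the sketch is only that the sender never blocks on a reply before sending the next block.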
  • the trigger instruction is a speech synthesis instruction.
  • acquiring the initial content includes: acquiring the input text content in response to the speech synthesis instruction.
  • structuring the initial content to generate the first content block includes: structuring the text content to generate the text content block as the first content block.
  • speech synthesis refers to the process of converting text into corresponding speech.
  • the second content block is a voice data block obtained by performing speech synthesis on the text content block.
  • the sequentially outputting the second content blocks according to the second order of receiving the second content blocks includes: sequentially playing the voice data blocks according to the second order of receiving the voice data blocks.
  • the user can input text content in the computer device.
  • the computer device can structure the input text content to generate a text content block in the process of the user inputting the text content.
  • the computer device can stream the text content block to the server via a two-way communication single link.
  • the server may perform speech synthesis processing on the text content block to generate a speech data block corresponding to the text content block.
  • the server can stream the generated voice data block back to the computer device.
  • the computer device can play the voice data blocks in sequence according to the second order in which the voice data blocks are received.
  • outputting speech while inputting text is a streaming process, instead of inputting complete text content and then synthesizing it into speech.
  • the voice data block is obtained by performing speech synthesis that combines the text content block with a preset sound template.
  • the preset sound template is a pre-established sound template. That is, the voice data block is obtained through the voice data block generation step.
  • the voice data block generation step includes: combining the text content block with a preset sound template to synthesize a voice data block that matches the sound characteristics of the preset sound template.
  • the preset sound template is the sound template of a certain game character.
  • the voice data block is consistent with the voice feature of the game character, which is equivalent to using the game character to speak the text content.
  • the two-way communication single link established by the application layer protocol can realize the two-way streaming transmission of structured speech synthesis-related data, which saves system resources while ensuring transmission stability.
  • the method further includes: obtaining an application layer protocol; and establishing a two-way communication single link between the local end and the proxy server based on the application layer protocol, wherein the two-way communication single link is a single link used for two-way streaming transmission.
  • the application layer protocol may be an existing application layer protocol for establishing a two-way communication single link.
  • the application layer protocol can also be obtained by protocol encapsulation based on a communication protocol that cannot establish a two-way communication single link.
  • the computer device can establish a two-way communication single link between the local end and the proxy server based on an application layer protocol.
  • the local end is the local end of the computer equipment.
  • the computer device streams the first content block to the proxy server through a two-way communication single link.
  • the proxy server returns the second content block after the content type conversion processing is performed on the first content block.
  • the proxy server may distribute the first content block to a server for voice recognition.
  • the proxy server may also perform voice recognition processing on the first content block by itself.
  • a two-way communication single link is established between the local end and the proxy server, and the proxy server can perform balanced distribution processing, which improves the rationality of resource utilization.
  • the accuracy and processing efficiency of content processing can also be improved.
  • the method further includes: performing protocol encapsulation based on a transmission control protocol or a multi-link application layer protocol, and generating an application layer protocol for establishing a two-way communication single link.
  • the computer device may perform protocol encapsulation on the transmission control protocol to generate an application layer protocol for establishing a two-way communication single link.
  • the computer device may perform protocol encapsulation on the multi-link application layer protocol to generate an application layer protocol for establishing a two-way communication single link.
  • the multi-link application layer protocol is an application layer protocol used to realize two-way communication by establishing at least two links. That is, the multi-link application layer protocol itself cannot establish a bidirectional communication single link.
  • the computer device may perform protocol encapsulation on the transmission control protocol or protocol encapsulation on the multi-link application layer protocol to generate a set of application layer protocols that can realize the interaction between the client and the server.
  • the establishment of a two-way communication single link between the local end and the proxy server includes: sending an uplink request and a downlink request to the proxy server; and combining and encapsulating the uplink request and the downlink request through the application layer protocol to generate the two-way communication single link between the local end and the proxy server.
  • the local end refers to the local end of the computer equipment.
  • the uplink request is used to request the establishment of a communication link for transmitting information from the client to the network.
  • the downlink request is used to request the establishment of a communication link for the client to receive information from the network.
  • the computer device can send an uplink request and a downlink request to the proxy server, and combine and encapsulate them through the encapsulated application layer protocol to generate a two-way communication single link between the local end and the proxy server. In this way, uplink and downlink data can both be sent and received through the two-way communication single link.
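One way to picture the combination of an uplink request and a downlink request into a single link is the toy model below. The application does not disclose the encapsulation format, so the request dictionaries, the `session` matching rule, and the `SingleLink` class are all hypothetical names introduced for this sketch.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SingleLink:
    """Toy model of one link that carries both uplink and downlink traffic."""
    session: str
    uplink_id: str
    downlink_id: str
    outbound: deque = field(default_factory=deque)  # client -> proxy server
    inbound: deque = field(default_factory=deque)   # proxy server -> client

    def send(self, block):
        self.outbound.append(block)

    def deliver(self, block):
        """Called by the server side of the simulation to push a result back."""
        self.inbound.append(block)

    def receive(self):
        return self.inbound.popleft() if self.inbound else None


def establish_link(uplink_request, downlink_request):
    """Combine the two requests into one link if they belong to the same session."""
    if uplink_request["session"] != downlink_request["session"]:
        raise ValueError("uplink and downlink requests belong to different sessions")
    return SingleLink(
        session=uplink_request["session"],
        uplink_id=uplink_request["id"],
        downlink_id=downlink_request["id"],
    )


link = establish_link(
    {"id": "up-1", "session": "s1"},
    {"id": "down-1", "session": "s1"},
)
link.send({"seq": 0, "audio": "..."})
link.deliver({"seq": 0, "text": "hello"})
```

The caller sees a single object with `send` and `receive`, which mirrors the idea that both directions share one link rather than two separate ones.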
  • the computer device may also receive an application interface adaptation instruction, and in response to the application interface adaptation instruction, adapt at least one application interface at the access layer.
  • each application corresponding to the adapted interface can realize data sending and receiving processing through the two-way communication single link, achieving the purpose of generalization, thereby improving applicability.
  • an underlying protocol or an existing application layer protocol can be encapsulated to generate an application layer protocol for establishing a two-way communication single link, and the two-way communication single link is then established with it. This differs from directly establishing the link with an existing application layer protocol; it is a new, extended scheme that improves applicability.
  • streaming the first content block according to the first order of obtaining the first content block includes: streaming the first content block to the proxy server according to the first order of obtaining the first content block.
  • the first content block is used to instruct the proxy server to distribute the first content block to the decoding server.
  • receiving the second content block streamed back through the two-way communication single link includes: receiving the second content block streamed back by the proxy server through the two-way communication single link; the second content block is obtained by the decoding server performing content type conversion on the first content block.
  • the decoding server refers to a server responsible for content type conversion.
  • the proxy server may directly distribute the first content block to the decoding server.
  • the proxy server may also distribute the first content block to the adaptation server, and the adaptation server will offload the first content block to the decoding server.
  • the adaptation server is used to perform logical adaptation and conversion of data, and distribute the content after the adaptation and conversion.
  • Fig. 10 is a block diagram of the content processing architecture in an embodiment.
  • Figure 10 illustrates the application scenario of speech recognition as an example.
  • the user speaks
  • the client collects audio data, and performs pre-processing and structural processing on it, generates a voice data block, and transfers the voice data block to the installed software package.
  • a two-way communication single link is established between the software package and the proxy server.
  • the voice data block is distributed to the adaptation server through the two-way communication single link, and the adaptation server distributes it to the decoding server through a data process.
  • the decoding server performs voice recognition on it, generates a text content block, and returns the text content block to the adaptation server through a data process.
  • the adaptation server returns the structured text content block to the proxy server.
  • the proxy server returns the structured text content block to the software package through a two-way communication single link, and then transmits it back to the client. That is, the uplink and downlink transmission is realized through a two-way communication single link.
  • the client will display the text content block. It can be understood that in the process of transmitting voice data blocks, audio data is still being collected, so the voice data blocks are streamed and the text content blocks are streamed back to achieve the effect of recognizing text while the user is speaking. Moreover, sending voice data blocks and receiving text content blocks are asynchronous in a two-way communication single link.
  • content processing is performed through the collaborative division of labor of multiple servers such as a proxy server and a decoding server, which can improve processing efficiency and accuracy.
  • the proxy server includes a first proxy server and a second proxy server; the first proxy server is the proxy server provided by the first object; the second proxy server is the proxy server provided by the second object; the first content block is obtained by the client of the second object; the two-way communication single link is established between the client and the first proxy server based on the application layer protocol and the software package provided by the first object.
  • the first content block is also used to instruct the first proxy server to forward the first content block to the second proxy server, and the second proxy server distributes the first content block to the decoding server.
  • the first object is different from the second object.
  • the first object is the service provider, which provides the tools for implementing the content processing method.
  • the second object, which is equivalent to the business party, implements the content processing method in each embodiment of the present application using the software package provided by the first object.
  • the second object may be at least one of a content playing platform party, a smart home platform party, and an instant messaging platform party.
  • the computer device installs the software package provided by the first object in the client provided by the second object in advance.
  • the computer device may establish a two-way communication single link between the client and the first proxy server provided by the first object based on the application layer protocol and the installed software package.
  • the computer device can obtain the first content block through the client, and stream the first content block to the first proxy server through a two-way communication single link.
  • the first proxy server may forward the first content block to the second proxy server provided by the second object.
  • the second proxy server then distributes the first content block to the decoding server for content type conversion processing.
  • the second proxy server may directly send the first content block to the decoding server for content type conversion processing.
  • the second proxy server may also distribute the first content block to the adaptation server, and the adaptation server distributes the first content block to the decoding server for decoding processing according to the principle of load balancing.
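The load-balanced distribution performed by the adaptation server could look like the sketch below. The application only states that distribution follows "the principle of load balancing", so the least-loaded selection rule, the `AdaptationServer` class, and the decoder names are assumptions chosen for illustration.

```python
class AdaptationServer:
    """Toy dispatcher that sends each block to the decoding server with the least load."""

    def __init__(self, decoder_names):
        # Number of blocks currently being processed by each decoding server.
        self.load = {name: 0 for name in decoder_names}

    def dispatch(self, block):
        """Pick the least-loaded decoder; ties go to the one registered first."""
        target = min(self.load, key=self.load.get)
        self.load[target] += 1
        return target

    def complete(self, decoder_name):
        """Record that a decoder has finished one block."""
        self.load[decoder_name] -= 1


server = AdaptationServer(["decoder-a", "decoder-b"])  # hypothetical names
assignments = [server.dispatch({"seq": i}) for i in range(3)]
server.complete("decoder-a")
next_target = server.dispatch({"seq": 3})
```

Other policies, such as round robin or hashing by session, would fit the same interface; least-loaded is used here only because it is easy to follow.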
  • Fig. 11 is a schematic diagram of central control forwarding in an embodiment.
  • the first proxy server performs central control forwarding of the first content block to the second proxy server corresponding to the region and business, and the second proxy server distributes the first content block to the adaptation server in a balanced manner.
  • the adaptation server performs logical adaptation processing, and after the data is put into the queue, the first content block is distributed to the decoding server corresponding to the service, and the decoding server decodes and recognizes it, that is, performs content type conversion processing.
  • after the content type conversion, the decoding server returns the resulting second content block, which passes through the adaptation server and the second proxy server in turn and returns to the first proxy server.
  • the first proxy server transmits the second content block back to the client through a two-way communication single link, and the client sequentially outputs each second content block.
  • the first content block is forwarded to the proxy server of the business side (i.e., the second proxy server), which then performs the distribution processing. This is equivalent to security management and control, and it improves security.
  • letting the proxy server of the corresponding business party perform distribution processing is equivalent to considering business characteristics and improving the accuracy of content processing.
  • a content processing apparatus 1200 is provided, which is set in a computer device.
  • the computer equipment can be a terminal or a server.
  • the device 1200 includes: an acquisition module 1202, a streaming module 1204, and an output module 1206, where:
  • the obtaining module 1202 is used to obtain a first content block; the first content block is structured data.
  • the streaming module 1204 is configured to stream the first content block, according to the first order of obtaining the first content block, through a two-way communication single link established based on the application layer protocol; and to receive, through the two-way communication single link, the second content block that is streamed back; the second content block is obtained by converting the content type of the first content block; the sending of the first content block and the receiving of the second content block are performed asynchronously in the two-way communication single link.
  • the output module 1206 is configured to sequentially output the second content blocks according to the second order in which the second content blocks are received.
  • the acquisition module 1202 is further configured to receive a trigger instruction; in response to the trigger instruction, acquire the initial content; perform structural processing on the initial content to generate the first content block.
  • the trigger instruction is a voice recognition instruction
  • the acquisition module 1202 is also used to collect audio data in response to the voice recognition instruction; extract target voice data from the collected audio data; and perform structured processing on the target voice data to generate a voice data block as the first content block.
  • the second content block is a text content block obtained by performing voice recognition on the voice data block; the output module 1206 is also used to display the text content blocks on the interface in sequence according to the second order of receiving the text content blocks.
  • the output module 1206 is also used to splice and combine the displayed second content blocks in the second order to generate a search sentence; search for media content matching the search sentence; and display the media content found.
  • the trigger instruction is a speech synthesis instruction
  • the acquisition module 1202 is also used to obtain input text content in response to the speech synthesis instruction, and to structure the text content to generate a text content block as the first content block.
  • the second content block is a voice data block obtained by performing speech synthesis on the text content block; the output module 1206 is further configured to play the voice data block in sequence according to the second order in which the voice data block is received.
  • the apparatus 1200 further includes:
  • the link establishment module 1203 is used to obtain the application layer protocol and, based on the application layer protocol, establish a two-way communication single link between the local end and the proxy server, wherein the two-way communication single link is a single link used for two-way streaming transmission.
  • the link establishment module 1203 is further configured to perform protocol encapsulation based on the transmission control protocol or the multi-link application layer protocol to generate an application layer protocol for establishing a two-way communication single link; send an uplink request and a downlink request to the proxy server; and combine and encapsulate the uplink request and the downlink request through the application layer protocol to generate a two-way communication single link between the local end and the proxy server.
  • the streaming module 1204 is further configured to stream the first content block to the proxy server according to the first order in which the first content block is obtained, the first content block being used to instruct the proxy server to distribute the first content block to the decoding server; and to receive, through the two-way communication single link, the second content block streamed back by the proxy server, the second content block being obtained by the decoding server performing content type conversion on the first content block.
  • the proxy server includes a first proxy server and a second proxy server; the second proxy server is the proxy server of the business party; the first content block is obtained based on the client of the business party; the first content block is also used to instruct the first proxy server to forward the first content block to the second proxy server, and the second proxy server distributes the first content block to the decoding server.
  • the above content processing device streams a first content block, which is structured data, over a two-way communication single link established based on an application layer protocol, and receives a second content block that is streamed back; the second content block is obtained by converting the content type of the first content block.
  • the second content blocks are sequentially output according to the second order in which they are received. Since the sending of the first content block and the receiving of the second content block are performed asynchronously in the two-way communication single link, structured content can be streamed bidirectionally over the same communication link. Compared with transmitting binary data, no additional data conversion processing is required, which saves system resources.
  • each module in the above content processing device may be implemented in whole or in part by software, hardware, or a combination thereof.
  • the above modules may be embedded in, or independent of, the processor in the computer device in the form of hardware, or may be stored in the memory of the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • Figure 14 is a block diagram of a computer device in one embodiment.
  • the computer device may be the terminal 110 in FIG. 1.
  • the computer equipment includes one or more processors, a memory, a network interface, a display screen, and an input device connected through a system bus.
  • the memory includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium of the computer device can store an operating system and computer readable instructions. When the computer-readable instruction is executed, it can cause one or more processors to execute a content processing method.
  • One or more processors of the computer device are used to provide calculation and control capabilities, and support the operation of the entire computer device.
  • the internal memory may store computer-readable instructions, and when the computer-readable instructions are executed by one or more processors, the one or more processors can execute a content processing method.
  • the network interface of the computer equipment is used for network communication.
  • the display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen.
  • the input device of the computer equipment can be a touch layer covered on a display screen, a button, a trackball, or a touchpad provided on the terminal shell, or an external keyboard, a touchpad, or a mouse.
  • the computer device may be a personal computer, a smart speaker, a mobile terminal, or a vehicle-mounted device.
  • the mobile terminal includes at least one of a mobile phone, a tablet computer, a personal digital assistant, or a wearable device.
  • FIG. 14 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • the specific computer device may include more or fewer parts than shown in the figure, combine some parts, or have a different arrangement of parts.
  • the content processing apparatus provided in the present application can be implemented in the form of computer-readable instructions, which can run on the computer device shown in FIG. 14, and the non-volatile storage medium of the computer device can store the program modules constituting the content processing apparatus.
  • the computer-readable instructions composed of each program module are used to make the computer device execute the steps in the content processing method of each embodiment of the application described in this specification.
  • the computer device may obtain the first content block through the obtaining module 1202 in the content processing apparatus 1200 as shown in FIG. 12; the first content block is structured data.
  • the computer device can stream the first content block according to the first order of obtaining the first content block through the two-way communication single link established based on the application layer protocol through the streaming module 1204; through the two-way communication single link, Receive the second content block returned by streaming; the second content block is obtained by converting the content type of the first content block; sending the first content block and receiving the second content block are performed asynchronously in a two-way communication single link .
  • the computer device can output the second content blocks in sequence in the second order in which the second content blocks are received through the output module 1206.
  • a computer device is provided, including a memory and one or more processors.
  • the memory stores computer-readable instructions.
  • when the computer-readable instructions are executed by the one or more processors, the processors execute the steps of the content processing method described above.
  • the steps of the content processing method may be the steps in the content processing method of each of the foregoing embodiments.
  • one or more computer-readable storage media storing computer-readable instructions are provided; when the computer-readable instructions are executed by one or more processors, the one or more processors perform the steps of the foregoing content processing method.
  • the steps of the content processing method may be the steps in the content processing method of each of the foregoing embodiments.
  • the steps in the embodiments of the present application are not necessarily executed in the order indicated by the step numbers. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they can be executed in other orders. Moreover, at least some of the steps in each embodiment may include multiple sub-steps or stages. These sub-steps or stages are not necessarily executed at the same moment but may be executed at different times, and they are not necessarily executed sequentially but may be executed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

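A content processing method, including: obtaining first content blocks, the first content blocks being structured data; streaming the first content blocks, in a first order in which they were obtained, over a single bidirectional communication link established based on an application layer protocol; receiving, over the same link, second content blocks that are streamed back, the second content blocks being obtained by performing content type conversion on the first content blocks, where sending the first content blocks and receiving the second content blocks proceed asynchronously on the link; and outputting the second content blocks sequentially in a second order in which they were received.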

Description

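Content processing method and apparatus, computer device, and storage medium
This application claims priority to Chinese Patent Application No. 201911200739.6, entitled "Content processing method and apparatus, computer device, and storage medium" and filed with the China Patent Office on November 29, 2019, which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the fields of computer technology and artificial intelligence, and in particular to a content processing method and apparatus, a computer device, and a storage medium.
Background
With the rapid development of science and technology, processing content online has become mainstream. Many application scenarios involve it; in speech recognition, for example, speech content is recognized online.
Online content processing usually requires transmitting the content over a network. Conventional methods establish the communication link on a low-level protocol, so only binary data can be transmitted. Binary data cannot be recognized directly and must first go through relatively complex conversion, which consumes system resources.
Summary
According to various embodiments provided in this application, a content processing method and apparatus, a computer device, and a storage medium are provided.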
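According to one aspect of this application, a content processing method is provided, performed by a computer device and including:
obtaining first content blocks, the first content blocks being structured data;
streaming the first content blocks, in a first order in which the first content blocks were obtained, over a single bidirectional communication link established based on an application layer protocol;
receiving, over the single bidirectional communication link, second content blocks that are streamed back, the second content blocks being obtained by performing content type conversion on the first content blocks, where sending the first content blocks and receiving the second content blocks proceed asynchronously on the link; and
outputting the second content blocks sequentially in a second order in which the second content blocks were received.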
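According to one aspect of this application, a content processing apparatus is provided, disposed in a computer device and including:
an obtaining module, configured to obtain first content blocks, the first content blocks being structured data;
a streaming module, configured to stream the first content blocks, in a first order in which they were obtained, over a single bidirectional communication link established based on an application layer protocol, and to receive, over the same link, second content blocks that are streamed back, the second content blocks being obtained by performing content type conversion on the first content blocks, where sending the first content blocks and receiving the second content blocks proceed asynchronously on the link; and
an output module, configured to output the second content blocks sequentially in a second order in which they were received.
A computer device is provided, including a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the steps of the content processing method of the embodiments of this application.
One or more computer-readable storage media are provided, storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the content processing method of the embodiments of this application.
Details of one or more embodiments of this application are set forth in the accompanying drawings and the description below. Other features, objectives, and advantages of this application will become apparent from the specification, the drawings, and the claims.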
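Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application more clearly, the drawings needed for the description are briefly introduced below. The drawings described below are only some embodiments of this application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a diagram of an application scenario of the content processing method in one embodiment;
FIG. 2 is a diagram of an application scenario of the content processing method in another embodiment;
FIG. 3 is a schematic flowchart of the content processing method in one embodiment;
FIG. 4 to FIG. 7 are schematic diagrams of speech recognition interfaces in one embodiment;
FIG. 8 is a simplified flow diagram of content processing in one embodiment;
FIG. 9 is a schematic diagram of streaming transmission in one embodiment;
FIG. 10 is an architecture block diagram of content processing in one embodiment;
FIG. 11 is a schematic diagram of central control forwarding in one embodiment;
FIG. 12 is a block diagram of a content processing apparatus in one embodiment;
FIG. 13 is a block diagram of a content processing apparatus in another embodiment; and
FIG. 14 is a block diagram of a computer device in one embodiment.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here only explain this application and do not limit it.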
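FIG. 1 is a diagram of an application scenario of the content processing method in one embodiment. Referring to FIG. 1, the scenario includes a terminal 110 and a server 120 connected over a network. The terminal 110 may be a smart TV, a smart speaker, a desktop computer, or a mobile terminal; the mobile terminal may include at least one of a mobile phone, a tablet computer, a laptop computer, a personal digital assistant, and a wearable device. The server 120 may be implemented as an independent server or as a cluster of multiple physical servers.
A user may input initial content through the terminal 110. The terminal 110 may structure the input initial content to generate first content blocks that are structured data. A single bidirectional communication link is established between the terminal 110 and the server 120 based on an application layer protocol. Over this link, the terminal 110 may stream the first content blocks to the server 120 in the first order in which they were obtained. The server 120 may perform content type conversion on the first content blocks to obtain second content blocks and stream them back to the terminal 110. Sending the first content blocks from the terminal 110 to the server 120 and returning the second content blocks from the server 120 to the terminal 110 both take place on the single bidirectional communication link, asynchronously and without interfering with each other. The terminal 110 may output the second content blocks sequentially in the second order in which they were received, for example by displaying or playing them.
In one embodiment, as shown in FIG. 2, the server 120 includes a proxy server 120a, an adaptation server 120b, and a decoding server 120c. The terminal 110 may establish a single bidirectional communication link with the proxy server 120a based on an application layer protocol and stream the first content blocks to the proxy server 120a in the first order in which they were obtained. The proxy server 120a may send the first content blocks to the adaptation server 120b. The adaptation server 120b may perform logical adaptation conversion on the first content blocks and distribute the converted blocks to the decoding server 120c. The decoding server 120c may perform content type conversion on the first content blocks to obtain second content blocks and stream them back to the terminal 110 through the adaptation server 120b and then the proxy server 120a. The terminal 110 may output the second content blocks sequentially in the second order in which they were received.
Note that the adaptation server 120b may be omitted when the proxy server 120a and the decoding server 120c can communicate directly and no adaptation conversion is needed.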
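The content processing method in the embodiments of this application amounts to using artificial intelligence technology to convert content types and output the results automatically.
Artificial intelligence (AI) is the theory, methods, technology, and application systems that use a digital computer, or a machine controlled by one, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, AI is a comprehensive technology within computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that responds in a way similar to human intelligence. AI studies the design principles and implementation methods of intelligent machines so that machines can perceive, reason, and make decisions.
AI is an interdisciplinary field that covers a wide range of areas, at both the hardware and the software level. Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating and interaction systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning or deep learning.
The content processing method in the embodiments of this application can be applied to speech processing scenarios such as speech recognition and speech synthesis. The key speech technologies are automatic speech recognition (ASR), speech synthesis (text to speech, TTS), and voiceprint recognition. Enabling computers to listen, see, speak, and feel is the future direction of human-computer interaction, and speech is expected to become one of its most promising modes.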
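FIG. 3 is a schematic flowchart of the content processing method in one embodiment. The method in this embodiment can be applied to a computer device; the description below mainly takes the terminal 110 in FIG. 1 as the computer device. Referring to FIG. 3, the method includes the following steps:
S302: obtain first content blocks.
The first content blocks are structured data. Structured data (Struct Data) is initial data that has been encapsulated into a structure; it can be recognized and used directly at the application layer without data format conversion.
In one embodiment, the computer device may obtain the first content blocks in a streaming manner. Streaming means continuously, so obtaining the first content blocks in a streaming manner means obtaining them continuously.
A first content block is one part of a data stream. In the embodiments of this application, the content is not sent after it has been obtained in full; first content blocks are sent while they are still being obtained, in a real-time streaming process.
For example, speech data is captured as a stream, structured, and sent as it is captured, rather than being recorded as a complete audio file and then transmitted. In effect, recognition runs while the user is speaking: speech is recognized segment by segment, and the text for what has already been said can be returned without waiting for the complete utterance.
In one embodiment, the computer device may directly obtain first content blocks that are already structured data.
In one embodiment, the computer device may instead obtain initial content and perform structured encapsulation on it to generate first content blocks that are structured data.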
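The structured encapsulation described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not specify a wire format, so the JSON layout, the `ContentBlock` fields (`seq`, `content_type`, `payload`, `final`), and the helper names are all hypothetical.

```python
import base64
import json
from dataclasses import asdict, dataclass


@dataclass
class ContentBlock:
    """Hypothetical structured form of one first content block."""
    seq: int            # position in the first order (order of acquisition)
    content_type: str   # e.g. "audio" or "text"
    payload: str        # base64 text for binary data
    final: bool = False  # marks the last block of the stream


def wrap_audio(seq: int, pcm_bytes: bytes, final: bool = False) -> str:
    """Encapsulate raw audio bytes as a structured, text-based message."""
    block = ContentBlock(seq, "audio",
                         base64.b64encode(pcm_bytes).decode("ascii"), final)
    return json.dumps(asdict(block))


def unwrap(message: str) -> ContentBlock:
    """Recover the structured block on the receiving side."""
    return ContentBlock(**json.loads(message))


message = wrap_audio(0, b"\x01\x02\x03")
block = unwrap(message)
```

Because the message is self-describing text, the receiving application can read the sequence number and content type directly instead of parsing an opaque binary frame.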
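In one embodiment, the computer device may directly obtain the initial content.
In one embodiment, step S302 includes: receiving a trigger instruction; obtaining initial content in response to the trigger instruction; and structuring the initial content to generate first content blocks.
The trigger instruction triggers acquisition of the initial content. That is, the computer device obtains the initial content after being triggered.
In one embodiment, the trigger instruction may be either a speech recognition instruction or a speech generation instruction.
In one embodiment, the computer device may first preprocess the initial content and then perform structured encapsulation on the preprocessed content to generate the first content blocks. Preprocessing is the process of extracting target content from the initial content. The target content extracted by preprocessing is then encapsulated into first content blocks. Target content is the content on which content type conversion is to be performed.
Initial content is content that has not been structured. Content is data that conveys information.
In one embodiment, content may include at least one of text content and media content. Media content is content conveyed through a medium. In one embodiment, media content may include at least one of audio content, video content, and picture content.
S304: stream the first content blocks, in the first order in which they were obtained, over a single bidirectional communication link established based on an application layer protocol.
The single bidirectional communication link is a single link used for bidirectional streaming; that is, bidirectional streaming takes place within one link. Bidirectional streaming means that the two ends can stream data to each other and receive data from each other asynchronously: one end can stream data to the other end and also receive a stream of data from it. Asynchronous here means that receiving and sending are independent and do not interfere with each other.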
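To make the asynchronous send and receive behaviour concrete, here is a minimal sketch using only Python's `asyncio`. It is not the patent's implementation: two in-memory queues stand in for the two directions of the single link, and `fake_server`, `send_blocks`, `receive_blocks`, and the use of `str.upper` as a placeholder for content type conversion are all assumptions made for illustration.

```python
import asyncio


async def fake_server(uplink: asyncio.Queue, downlink: asyncio.Queue) -> None:
    """Stand-in server: converts each block and streams it straight back."""
    while True:
        block = await uplink.get()
        if block is None:              # end-of-stream marker
            await downlink.put(None)
            return
        await downlink.put(block.upper())  # placeholder for real conversion


async def send_blocks(uplink: asyncio.Queue, blocks: list) -> None:
    """Send first content blocks in the first order, one at a time."""
    for block in blocks:
        await uplink.put(block)
        await asyncio.sleep(0)         # yield so receiving can interleave
    await uplink.put(None)


async def receive_blocks(downlink: asyncio.Queue, out: list) -> None:
    """Collect second content blocks in the order they arrive."""
    while (block := await downlink.get()) is not None:
        out.append(block)


async def run_session(blocks: list) -> list:
    uplink, downlink = asyncio.Queue(), asyncio.Queue()
    received: list = []
    # Sending and receiving run concurrently; neither waits for the other.
    await asyncio.gather(
        fake_server(uplink, downlink),
        send_blocks(uplink, blocks),
        receive_blocks(downlink, received),
    )
    return received


result = asyncio.run(run_session(["i want", "to watch", "a movie"]))
```

The sender never blocks on a reply and the receiver never blocks on the next send, which is the independence the text calls asynchronous. In a real deployment the two queues would be replaced by the read and write sides of one connection.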
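An application layer protocol defines how application processes running on different end systems exchange messages. The application layer protocol in the embodiments of this application is one that can be used to establish a single bidirectional communication link; it does not refer to application layer protocols in general, because some of them (for example HTTP, the HyperText Transfer Protocol) cannot establish such a link.
The computer device may establish the single bidirectional communication link based on the application layer protocol in advance, before obtaining any first content block. It may also trigger establishment of the link after obtaining the first of the first content blocks. The timing is not limited, as long as the link is established before the first content blocks are streamed.
In one embodiment, the computer device may establish the single bidirectional communication link directly on an existing application layer protocol. In one embodiment, the existing application layer protocols that can establish such a link include the WebSocket protocol.
The WebSocket protocol provides full-duplex communication over a single TCP (Transmission Control Protocol) connection. It was standardized by the IETF as RFC 6455 in 2011 and is supplemented by RFC 7936.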
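As a small worked example of how a WebSocket link is opened on top of an HTTP upgrade request, the sketch below computes the `Sec-WebSocket-Accept` value that a server returns for a client's `Sec-WebSocket-Key`, following the rule in RFC 6455. The function name `websocket_accept` is an illustrative choice; the GUID constant and the sample key come from the RFC itself.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept header value for a client key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key from the RFC 6455 handshake example.
accept = websocket_accept("dGhlIHNhbXBsZSBub25jZQ==")
```

Once the client has verified this value, the same TCP connection carries frames in both directions, which is what makes it usable as the single link described here.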
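In other embodiments, the computer device may use other existing application layer protocols to establish the single bidirectional communication link.
In one embodiment, the computer device may perform protocol encapsulation on the Transmission Control Protocol or on a multi-link application layer protocol to generate an application layer protocol for establishing the single bidirectional communication link.
The Transmission Control Protocol (TCP) is a connection-oriented, byte-stream-based transport layer protocol defined in IETF RFC 793.
A multi-link application layer protocol is an application layer protocol that achieves bidirectional communication by establishing at least two links. That is, such a protocol cannot by itself establish a single bidirectional communication link.
In one embodiment, the computer device may perform protocol encapsulation on top of TCP to generate the application layer protocol for establishing the single bidirectional communication link.
In one embodiment, the computer device may instead encapsulate a multi-link application layer protocol into an application layer protocol for establishing the single bidirectional communication link. For example, this application treats HTTP as a multi-link application layer protocol; the computer device may encapsulate HTTP into an application layer protocol for establishing the single bidirectional communication link.
The first order is the order in which the first content blocks are obtained. Because the computer device obtains the first content blocks as a stream, the blocks are obtained one after another, and that sequence is the first order.
Streaming the first content blocks means sending them continuously. For example, after obtaining a first content block, the computer device sends it, then obtains the next first content block and sends that one, and so on.
Obtaining the first content blocks and streaming them in the first order form one continuous process: blocks are sent while further blocks are being obtained.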
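S306: receive, over the single bidirectional communication link, second content blocks that are streamed back.
The second content blocks are obtained by performing content type conversion on the first content blocks. The content type characterizes the form in which content is presented. Sending the first content blocks and receiving the second content blocks proceed asynchronously on the single bidirectional communication link. Second content blocks that are streamed back are second content blocks that are returned continuously.
In one embodiment, the content type may include at least one of audio, video, text, and picture. Pictures may include at least one of static pictures and animated pictures.
The first content blocks and the second content blocks are of different content types. For example, if a first content block is audio data, speech recognition can convert it into a text content block; the audio data and the text content block are of different content types.
Specifically, after the computer device streams the first content blocks to the server, the server may perform content type conversion on them to generate second content blocks and stream the second content blocks back to the computer device.
The single bidirectional communication link is established between the computer device and the server.
S308: output the second content blocks sequentially in the second order in which they were received.
The second order is the order in which the second content blocks are received. Because the second content blocks are streamed back, the computer device receives them continuously, one after another, and that sequence is the second order.
Specifically, the computer device may output the second content blocks sequentially in the second order: a second content block received earlier is output before one received later.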
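In the content processing method above, first content blocks that are structured data are streamed over a single bidirectional communication link established based on an application layer protocol, and second content blocks obtained by content type conversion of the first content blocks are streamed back and output sequentially in the second order in which they are received. Because sending the first content blocks and receiving the second content blocks proceed asynchronously on the same link, structured content is streamed in both directions over one communication link. Unlike binary data, it needs no additional data conversion, which saves system resources.
In addition, because conventional methods transmit over a low-level protocol, some application scenarios are not supported (for example, access from mini programs or HTML5). A single bidirectional communication link established on an application layer protocol supports scenarios that conventional methods cannot, which improves applicability. It also avoids the errors caused by unsupported scenarios, which improves the accuracy of content processing and avoids the system resources wasted on those errors.
Furthermore, a communication link established on a low-level protocol in conventional methods must be set up against a fixed IP address, so when traffic is heavy it is limited by the number of available IP addresses. The solution in this application is not bound to fixed IP addresses and can still allocate traffic sensibly through load-balanced distribution under heavy load.
Finally, a single bidirectional communication link established on an application layer protocol streams uplink and downlink data over the same link. This gives stable bidirectional streaming, avoids the synchronization failures that multiple links are prone to, improves accuracy, and avoids the system resources that multiple links consume.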
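In one embodiment, the trigger instruction is a speech recognition instruction, and obtaining initial content in response to the trigger instruction includes: capturing audio data in response to the speech recognition instruction. In this embodiment, preprocessing the initial content to extract target content includes: extracting target speech data from the captured audio data. Structuring the target content to generate first content blocks includes: structuring the target speech data to generate speech data blocks as the first content blocks.
Speech recognition (automatic speech recognition, ASR) is the process of converting speech data into text content.
A speech recognition instruction is an instruction that triggers speech recognition. In one embodiment, speech recognition instructions may include instructions that trigger speech recognition directly and instructions that trigger it indirectly.
An instruction that triggers speech recognition directly is one dedicated to triggering speech recognition.
An instruction that triggers speech recognition indirectly is a target instruction whose handling triggers speech recognition along the way. In one embodiment, such instructions may include a voice search instruction, which is an instruction to search for information based on speech data. A voice search has to recognize the speech, so it triggers speech recognition indirectly.
Audio data is digitized sound data. Target speech data is the speech data that needs to be converted into text content; it is the speech data in the audio data other than interfering speech. Interfering speech is speech data that does not need to be converted into text content.
In one embodiment, interfering speech may include at least one of ambient sound data and speech data of non-target objects. A non-target object is any object other than the target object that provides the target speech data.
Specifically, the user may input a speech recognition instruction to the computer device, and in response the computer device may establish the single bidirectional communication link based on the application layer protocol. When the user starts speaking, the computer device captures audio data, preprocesses it to extract the target speech data, and structures the target speech data to generate speech data blocks as the first content blocks.
The computer device generates speech data blocks while it is still receiving audio data. This is a streaming process; the speech data blocks are not generated after a complete audio recording has been made.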
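In one embodiment, a client is installed on the computer device, and a software package (software development kit, SDK) is pre-installed in the client.
The client is a client that has an audio capture entry. It may be a client that needs the audio capture entry for its core features, or a client that integrates an audio capture entry as an auxiliary function.
In one embodiment, the client may include at least one of a client of a content playback platform, a signal receiver of a smart home device (for example, a set-top box), and an instant messaging client.
The client of a content playback platform may include at least one of a video playback client and an audio playback client.
A smart home (home automation) uses the residence as a platform and integrates home-related facilities through structured cabling, network communication, security, automatic control, and audio and video technologies. In one embodiment, smart home devices include at least one of a smart TV, a smart speaker, and a smart air conditioner.
Specifically, the user may perform a speech recognition operation on the client to input a speech recognition instruction, and in response the client may call the installed software package to start speech recognition. When the computer device starts speech recognition through the software package, it establishes the single bidirectional communication link with the server based on the application layer protocol.
In one embodiment, the client interface may display a speech recognition trigger control. When a speech recognition instruction generated by triggering that control is received, the client jumps to a voice search interface to capture audio data and establishes the single bidirectional communication link with the server based on the application layer protocol. When audio data is captured, it is preprocessed to extract the target speech data. The computer device may structure the target speech data through the client to generate speech data blocks as the first content blocks, and then send the speech data blocks to the server over the single bidirectional communication link for speech recognition. The voice search interface is an interface for searching media content based on speech data.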
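In one embodiment, the method further includes: concatenating the displayed second content blocks in the second order to generate a search sentence; searching for media content that matches the search sentence; and displaying the media content found.
In one embodiment, the media content may include at least one of audio content, video content, and picture content.
Specifically, the computer device may concatenate the displayed second content blocks in the order in which they were received (the second order) to generate a complete search sentence. The computer device may then search for media content that matches the search sentence and display that media content.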
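The concatenation and search step can be sketched in a few lines. The matching rule below (keyword containment against a small in-memory catalog) is purely an assumption for illustration; the source does not say how the search is performed, and `build_search_sentence`, `search_media`, and the catalog entries are hypothetical.

```python
def build_search_sentence(text_blocks: list) -> str:
    """Join text content blocks in the order they were received."""
    return "".join(text_blocks)


def search_media(sentence: str, catalog: dict) -> list:
    """Return titles whose keywords all appear in the search sentence."""
    return [title for title, keywords in catalog.items()
            if all(keyword in sentence for keyword in keywords)]


# Blocks as they might arrive for "I want to watch Andy Lau's latest movie".
blocks = ["我想看", "刘德华", "最新的电影"]
sentence = build_search_sentence(blocks)

catalog = {
    "film-a": ["刘德华", "电影"],   # hypothetical catalog entries
    "song-b": ["周杰伦", "歌曲"],
}
matches = search_media(sentence, catalog)
```

Because the blocks are joined in receive order, the sentence on screen and the sentence used for the search are always the same string.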
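The computer device may display the media content as pictures, as text, or as both.
In one embodiment, the computer device may call the software package to establish, based on the application layer protocol, the single bidirectional communication link between the client and a proxy server. A proxy server is a server that establishes links with clients and distributes traffic.
When the user starts speaking, the computer device captures audio data, preprocesses it to extract the target speech data, and structures the target speech data to generate speech data blocks as the first content blocks.
In one embodiment, the computer device may apply at least one of noise reduction, activity detection, and compression to the audio data to obtain the target speech data.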
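A rough sketch of two of these preprocessing steps follows: a very simple energy-threshold activity detector and lossless compression with `zlib`. This is a stand-in, not the method used in the source, which names the steps but not the algorithms. Noise reduction is omitted, and the frame size, threshold, and function names are assumed values chosen for illustration.

```python
import array
import zlib


def frame_energy(frame: array.array) -> float:
    """Mean squared amplitude of one frame of 16-bit samples."""
    return sum(sample * sample for sample in frame) / len(frame)


def extract_speech(samples: array.array, frame_size: int = 160,
                   threshold: float = 1_000_000.0) -> array.array:
    """Keep only frames whose energy reaches the (assumed) threshold."""
    kept = array.array("h")
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        if frame and frame_energy(frame) >= threshold:
            kept.extend(frame)
    return kept


def compress(samples: array.array) -> bytes:
    """Compress the retained samples before they are packaged into blocks."""
    return zlib.compress(samples.tobytes())


silence = array.array("h", [0] * 160)
loud = array.array("h", [5000] * 160)
speech = extract_speech(silence + loud + silence)
packed = compress(speech)
```

A production pipeline would normally use a trained voice activity detector and a speech codec; the point here is only that silent frames are dropped before anything is sent.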
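In one embodiment, the second content blocks are text content blocks obtained by performing speech recognition on the speech data blocks. Outputting the second content blocks sequentially in the second order in which they were received includes: displaying the text content blocks on the interface sequentially in the second order in which they were received.
Speech recognition here serves as the content type conversion.
Displaying the text content blocks on the interface in the second order means displaying each text content block, that is, that portion of the text, in the order in which the blocks were received.
FIG. 4 to FIG. 7 are schematic diagrams of speech recognition interfaces in one embodiment, taking a video client as the example. Referring to FIG. 4, there is a voice button 402 (the speech recognition trigger control) near the search box of the video client. The user can tap the voice button to enter the voice search interface shown in FIG. 5, at which point the software package is called to establish the single bidirectional communication link. Suppose the user says "I want to watch Andy Lau's latest movie" (我想看刘德华最新的电影). Audio data is captured through the voice search interface and speech data blocks are generated. The client sends the speech data blocks to the server over the single bidirectional communication link; the server performs speech recognition on them and streams the recognized text content blocks back. The client displays the text content blocks on the interface in the order in which they are received. Because speech data blocks are streamed out and text content blocks are streamed back, speech is converted to text while the user is speaking, and the corresponding text appears on the voice search interface as the user talks. As shown in FIG. 6, when the user has only said "I want to watch" (我想看) and has not finished the whole sentence, "I want to watch" can already be sent as a speech data block, and the recognized text content block "I want to watch" is displayed on the voice search interface. FIG. 7 shows the text content blocks that were streamed back displayed in sequence, giving the final text "I want to watch Andy Lau's latest movie".
FIG. 8 is a simplified flow diagram of content processing in one embodiment, taking the WebSocket protocol as the application layer protocol. The user starts speech recognition through the client (app), which calls the software package (SDK) to start speech recognition. When the software package starts speech recognition, it first establishes a single bidirectional communication link between the client and the proxy server based on the WebSocket protocol, to stream the uplink speech data and the downlink recognized text content. Uplink means sending information from the client to the network; downlink means receiving information from the network at the client. When the user starts speaking, the SDK starts capturing audio data and applies noise reduction, activity detection, compression, and similar processing to obtain the target speech data. The target speech data is then structured to generate speech data blocks. The computer device may stream the speech data blocks to the proxy server over the single bidirectional communication link, and the proxy server distributes them to the decoding server. The decoding server performs speech recognition on them to generate text content blocks and returns the text content blocks to the proxy server, which streams them back to the client over the single bidirectional communication link. The client displays the text content blocks on the interface sequentially in the second order in which they were received.
FIG. 9 is a schematic diagram of streaming transmission in one embodiment. Referring to FIG. 9, a single bidirectional communication link is established between the client and the proxy server. The speech data blocks sent by the client and the text content blocks (the results) returned by the proxy server are both transmitted on this link, asynchronously with respect to each other.
In the foregoing embodiment, the single bidirectional communication link established through the application layer protocol allows bidirectional streaming of the structured data involved in speech recognition, saving system resources while keeping transmission stable.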
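In one embodiment, the trigger instruction is a speech synthesis instruction. Obtaining initial content in response to the trigger instruction includes: obtaining input text content in response to the speech synthesis instruction. In this embodiment, structuring the initial content to generate first content blocks includes: structuring the text content to generate text content blocks as the first content blocks.
Speech synthesis (text to speech, TTS) is the process of converting text into the corresponding speech.
In this embodiment, the second content blocks are speech data blocks obtained by performing speech synthesis on the text content blocks. Outputting the second content blocks sequentially in the second order in which they were received includes: playing the speech data blocks sequentially in the second order in which they were received.
Specifically, the user may input text content on the computer device. While the user is typing, the computer device may structure the text entered so far to generate text content blocks and stream them to the server over the single bidirectional communication link. The server may perform speech synthesis on a text content block to generate the corresponding speech data block and stream the generated speech data blocks back to the computer device. The computer device may play the speech data blocks sequentially in the second order in which they were received.
In this embodiment, speech data blocks are generated while text is still being entered. Speech is output as text is typed, in a streaming process, rather than being synthesized only after the complete text has been entered.
In one embodiment, the speech data blocks are obtained by combining the text content blocks with a preset voice template and performing speech synthesis.
A preset voice template is a voice template established in advance. That is, the speech data blocks are obtained through a speech data block generation step, which includes: combining a text content block with the preset voice template to synthesize a speech data block that matches the voice characteristics of the template.
For example, if the preset voice template is the voice template of a game character, the speech data block matches that character's voice characteristics, as if the character were speaking the text.
In the foregoing embodiment, the single bidirectional communication link established through the application layer protocol allows bidirectional streaming of the structured data involved in speech synthesis, saving system resources while keeping transmission stable.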
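In one embodiment, the method further includes: obtaining an application layer protocol; and establishing, based on the application layer protocol, a single bidirectional communication link between the local end and a proxy server, where the single bidirectional communication link is a single link used for bidirectional streaming.
The application layer protocol may be an existing application layer protocol that can establish a single bidirectional communication link. It may also be obtained by performing protocol encapsulation on a communication protocol that cannot establish such a link.
Specifically, the computer device may establish the single bidirectional communication link between the local end and the proxy server based on the application layer protocol. The local end is the computer device's own end.
The computer device streams the first content blocks to the proxy server over the single bidirectional communication link, and the proxy server returns the second content blocks obtained by content type conversion of the first content blocks.
Note that the proxy server may distribute the first content blocks to a server that performs speech recognition, or it may perform speech recognition on the first content blocks itself.
In the foregoing embodiment, establishing the single bidirectional communication link between the local end and the proxy server based on the application layer protocol allows balanced distribution through the proxy server, which makes resource use more reasonable. It can also improve the accuracy and efficiency of content processing.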
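In one embodiment, the method further includes: performing protocol encapsulation based on the Transmission Control Protocol or a multi-link application layer protocol to generate an application layer protocol for establishing the single bidirectional communication link.
In one embodiment, the computer device may perform protocol encapsulation on TCP to generate the application layer protocol for establishing the single bidirectional communication link.
In one embodiment, the computer device may perform protocol encapsulation on a multi-link application layer protocol to generate the application layer protocol for establishing the single bidirectional communication link.
A multi-link application layer protocol is an application layer protocol that achieves bidirectional communication by establishing at least two links. That is, such a protocol cannot by itself establish a single bidirectional communication link.
Specifically, the computer device may perform protocol encapsulation on TCP or on a multi-link application layer protocol to generate an application layer protocol that supports interaction between the client and the server.
In this embodiment, establishing the single bidirectional communication link between the local end and the proxy server based on the application layer protocol includes: sending an uplink request and a downlink request to the proxy server; and merging and encapsulating the uplink request and the downlink request through the application layer protocol to generate the single bidirectional communication link between the local end and the proxy server.
The local end is the local end of the computer device. The uplink request asks for a communication link over which the client sends information to the network. The downlink request asks for a communication link over which the client receives information from the network.
Specifically, the computer device may send an uplink request and a downlink request to the proxy server and, through the encapsulated application layer protocol, merge and encapsulate the two requests to generate the single bidirectional communication link between the local end and the proxy server. Uplink and downlink data can then be sent and received over that one link.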
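One way to picture the merged encapsulation is a single opening message that carries both requests, followed by frames tagged with a direction. The sketch below is hypothetical throughout: the source does not define message formats, so the field names (`session`, `params`, `channels`, `dir`, `seq`) and the functions `merge_link_requests` and `make_frame` are assumptions.

```python
import json


def merge_link_requests(uplink_request: dict, downlink_request: dict) -> str:
    """Combine an uplink and a downlink request into one link-opening message."""
    if uplink_request["session"] != downlink_request["session"]:
        raise ValueError("requests belong to different sessions")
    return json.dumps({
        "type": "open",
        "session": uplink_request["session"],
        "channels": {
            "up": uplink_request["params"],      # client -> network
            "down": downlink_request["params"],  # network -> client
        },
    })


def make_frame(session: str, direction: str, seq: int, payload: str) -> str:
    """Tag a payload with its direction so both flows can share one link."""
    if direction not in ("up", "down"):
        raise ValueError("direction must be 'up' or 'down'")
    return json.dumps({"session": session, "dir": direction,
                       "seq": seq, "payload": payload})


opened = json.loads(merge_link_requests(
    {"session": "s1", "params": {"codec": "pcm16"}},
    {"session": "s1", "params": {"format": "text"}},
))
first_frame = json.loads(make_frame("s1", "up", 0, "aGVsbG8="))
```

Tagging each frame with a direction and a sequence number lets one connection carry both flows while each side still processes its own flow in order.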
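In one embodiment, while keeping the single bidirectional communication link stable, the computer device may also receive an application interface adaptation instruction and, in response, adapt the interface of at least one application at the access layer. Every application whose interface has been adapted can then send and receive data over the single bidirectional communication link, which generalizes the solution and improves applicability.
In the foregoing embodiment, a low-level protocol or an existing application layer protocol can be encapsulated to generate an application layer protocol for establishing the single bidirectional communication link, and the link is then established with it. This is a new, extended approach distinct from establishing the link directly on an existing application layer protocol, and it improves applicability.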
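In one embodiment, streaming the first content blocks in the first order in which they were obtained includes: streaming the first content blocks to the proxy server in the first order in which they were obtained, the first content blocks instructing the proxy server to distribute them to a decoding server. In this embodiment, receiving the second content blocks that are streamed back over the single bidirectional communication link includes: receiving, over the link, the second content blocks streamed back by the proxy server, the second content blocks being obtained by the decoding server performing content type conversion on the first content blocks.
A decoding server (Decoder Server) is a server responsible for content type conversion.
In one embodiment, the proxy server may distribute the first content blocks directly to the decoding server. The proxy server may also distribute the first content blocks to an adaptation server, which then routes them to the decoding server.
The adaptation server performs logical adaptation conversion on data and distributes the converted content.
FIG. 10 is an architecture block diagram of content processing in one embodiment, using speech recognition as the application scenario. Referring to FIG. 10, the user speaks, and the client captures audio data, preprocesses and structures it to generate speech data blocks, and passes the speech data blocks to the installed software package. A single bidirectional communication link is established between the software package and the proxy server. The speech data blocks are distributed over this link to the adaptation server, which routes them to the decoding server through a data process. The decoding server performs speech recognition to generate text content blocks and returns them to the adaptation server through the data process. The adaptation server returns the structured text content blocks to the proxy server, which returns them to the software package over the single bidirectional communication link, and from there they are passed back to the client. Uplink and downlink transmission thus both go through the single bidirectional communication link. The client displays the text content blocks. Audio data is still being captured while speech data blocks are being transmitted, so speech data blocks are streamed out and text content blocks are streamed back, and text is recognized while the user is speaking. Sending speech data blocks and receiving text content blocks proceed asynchronously on the single bidirectional communication link.
In the foregoing embodiment, content is processed on the server side through the coordinated division of work among multiple servers, such as the proxy server and the decoding server, which can improve processing efficiency and accuracy.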
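In one embodiment, the proxy server includes a first proxy server and a second proxy server. The first proxy server is provided by a first object, and the second proxy server is provided by a second object. The first content blocks are obtained through a client of the second object. The single bidirectional communication link is established between that client and the first proxy server, based on the application layer protocol and a software package provided by the first object.
In this embodiment, the first content blocks also instruct the first proxy server to forward them to the second proxy server, which distributes them to the decoding server.
The first object differs from the second object. The first object is the service provider, which supplies the tools that implement the content processing method. The second object is the business party, which uses the software package provided by the first object to implement the content processing method in the embodiments of this application.
In one embodiment, the second object may be at least one of a content playback platform, a smart home platform, and an instant messaging platform.
Specifically, the software package provided by the first object is installed in advance in the client provided by the second object. After receiving a trigger instruction, the computer device may establish, based on the application layer protocol and the installed software package, a single bidirectional communication link between the client and the first proxy server provided by the first object. The computer device may obtain the first content blocks through the client and stream them over the link to the first proxy server. The first proxy server may forward the first content blocks to the second proxy server provided by the second object, and the second proxy server then routes them to the decoding server for content type conversion.
The second proxy server may send the first content blocks directly to the decoding server for content type conversion. It may also distribute them to the adaptation server, which routes them to the decoding server for decoding according to load balancing principles.
FIG. 11 is a schematic diagram of central control forwarding in one embodiment. Acting as the central control center, the first proxy server forwards the first content blocks to the second proxy server for the corresponding region and business, and the second proxy server distributes them in a balanced way to the adaptation server. The adaptation server performs logical adaptation, places the data in a queue, and then distributes the first content blocks to the decoding server for that business. The decoding server decodes and recognizes them, that is, performs content type conversion. The decoding server then returns the converted second content blocks through the adaptation server and the second proxy server to the first proxy server, which sends them back to the client over the single bidirectional communication link. The client outputs the second content blocks in sequence.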
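The forwarding chain in FIG. 11 can be sketched as a routing table keyed by region and business, with round-robin selection standing in for balanced distribution. The class names, the `(region, business)` key, and the server names below are hypothetical, and round-robin is only one possible balancing policy; the source does not name one.

```python
import itertools


class SecondProxy:
    """Business-side proxy that spreads blocks across its decoding servers."""

    def __init__(self, name: str, decoders: list):
        self.name = name
        self._next_decoder = itertools.cycle(decoders)  # round-robin order

    def dispatch(self, block: str) -> str:
        """Pick the decoding server that should handle this block."""
        return next(self._next_decoder)


class FirstProxy:
    """Central control: forwards each block to the matching second proxy."""

    def __init__(self, routes: dict):
        self._routes = routes  # (region, business) -> SecondProxy

    def forward(self, region: str, business: str, block: str) -> tuple:
        proxy = self._routes[(region, business)]
        return proxy.name, proxy.dispatch(block)


video_proxy = SecondProxy("south-video", ["decoder-a", "decoder-b"])
central = FirstProxy({("south", "video"): video_proxy})
hops = [central.forward("south", "video", f"block-{i}") for i in range(3)]
```

Keeping the routing decision in the first proxy and the balancing decision in the second proxy mirrors the split of responsibilities described above: the service provider routes by business, and the business party decides how its own decoders are used.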
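In the foregoing embodiment, the first content blocks are forwarded to the business party's proxy server (the second proxy server), which then handles distribution. On the one hand, this acts as a security control and improves security. On the other hand, letting the relevant business party's proxy server handle distribution takes the characteristics of that business into account and improves the accuracy of content processing.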
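As shown in FIG. 12, in one embodiment, a content processing apparatus 1200 is provided, disposed in a computer device. The computer device may be a terminal or a server. The apparatus 1200 includes an obtaining module 1202, a streaming module 1204, and an output module 1206, where:
The obtaining module 1202 is configured to obtain first content blocks, the first content blocks being structured data.
The streaming module 1204 is configured to stream the first content blocks, in the first order in which they were obtained, over a single bidirectional communication link established based on an application layer protocol, and to receive, over the same link, second content blocks that are streamed back. The second content blocks are obtained by performing content type conversion on the first content blocks. Sending the first content blocks and receiving the second content blocks proceed asynchronously on the link.
The output module 1206 is configured to output the second content blocks sequentially in the second order in which they were received.
In one embodiment, the obtaining module 1202 is further configured to receive a trigger instruction, obtain initial content in response to the trigger instruction, and structure the initial content to generate the first content blocks.
In one embodiment, the trigger instruction is a speech recognition instruction. The obtaining module 1202 is further configured to capture audio data in response to the speech recognition instruction, extract target speech data from the captured audio data, and structure the target speech data to generate speech data blocks as the first content blocks.
In one embodiment, the second content blocks are text content blocks obtained by performing speech recognition on the speech data blocks. The output module 1206 is further configured to display the text content blocks on the interface sequentially in the second order in which they were received.
In one embodiment, the output module 1206 is further configured to concatenate the displayed second content blocks in the second order to generate a search sentence, search for media content that matches the search sentence, and display the media content found.
In one embodiment, the trigger instruction is a speech synthesis instruction. The obtaining module 1202 is further configured to obtain input text content in response to the speech synthesis instruction and structure the text content to generate text content blocks as the first content blocks.
In one embodiment, the second content blocks are speech data blocks obtained by performing speech synthesis on the text content blocks. The output module 1206 is further configured to play the speech data blocks sequentially in the second order in which they were received.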
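As shown in FIG. 13, in one embodiment, the apparatus 1200 further includes:
a link establishment module 1203, configured to obtain an application layer protocol and establish, based on the application layer protocol, a single bidirectional communication link between the local end and a proxy server, where the single bidirectional communication link is a single link used for bidirectional streaming.
In one embodiment, the link establishment module 1203 is further configured to perform protocol encapsulation based on the Transmission Control Protocol or a multi-link application layer protocol to generate an application layer protocol for establishing the single bidirectional communication link; send an uplink request and a downlink request to the proxy server; and merge and encapsulate the uplink request and the downlink request through the application layer protocol to generate the single bidirectional communication link between the local end and the proxy server.
In one embodiment, the streaming module 1204 is further configured to stream the first content blocks to the proxy server in the first order in which they were obtained, the first content blocks instructing the proxy server to distribute them to a decoding server; and to receive, over the single bidirectional communication link, the second content blocks streamed back by the proxy server, the second content blocks being obtained by the decoding server performing content type conversion on the first content blocks.
In one embodiment, the proxy server includes a first proxy server and a second proxy server. The second proxy server is the business party's proxy server, and the first content blocks are obtained through the business party's client. The first content blocks also instruct the first proxy server to forward them to the second proxy server, which distributes them to the decoding server.
In the content processing apparatus above, first content blocks that are structured data are streamed over a single bidirectional communication link established based on an application layer protocol, and second content blocks obtained by content type conversion of the first content blocks are streamed back and output sequentially in the second order in which they are received. Because sending the first content blocks and receiving the second content blocks proceed asynchronously on the same link, structured content is streamed in both directions over one communication link. Unlike binary data, it needs no additional data conversion, which saves system resources.
For the specific limitations of the content processing apparatus, refer to the limitations of the content processing method above; they are not repeated here. Each module of the content processing apparatus may be implemented wholly or partly in software, hardware, or a combination of the two. The modules may be embedded in or independent of a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call them and perform the operations corresponding to each module.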
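FIG. 14 is a block diagram of a computer device in one embodiment. Referring to FIG. 14, the computer device may be the terminal 110 in FIG. 1. It includes one or more processors, a memory, a network interface, a display screen, and an input apparatus connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium may store an operating system and computer-readable instructions that, when executed, cause the one or more processors to perform a content processing method. The one or more processors provide computing and control capabilities and support the operation of the whole computer device. The internal memory may also store computer-readable instructions that, when executed by the one or more processors, cause them to perform a content processing method. The network interface is used for network communication. The display screen may be a liquid crystal display, an electronic ink display, or the like. The input apparatus may be a touch layer covering the display screen, a button, trackball, or touchpad on the terminal housing, or an external keyboard, touchpad, or mouse. The computer device may be a personal computer, a smart speaker, a mobile terminal, or an in-vehicle device; the mobile terminal includes at least one of a mobile phone, a tablet computer, a personal digital assistant, and a wearable device.
A person skilled in the art will understand that the structure shown in FIG. 14 is only a block diagram of the parts relevant to the solution of this application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown, combine some components, or arrange the components differently.
In one embodiment, the content processing apparatus provided in this application may be implemented in the form of computer-readable instructions that run on the computer device shown in FIG. 14. The non-volatile storage medium of the computer device may store the program modules that make up the content processing apparatus, for example the obtaining module 1202, the streaming module 1204, and the output module 1206 shown in FIG. 12. The computer-readable instructions formed by these program modules cause the computer device to perform the steps of the content processing method of the embodiments described in this specification.
For example, through the obtaining module 1202 of the content processing apparatus 1200 shown in FIG. 12, the computer device may obtain first content blocks that are structured data. Through the streaming module 1204, it may stream the first content blocks, in the first order in which they were obtained, over a single bidirectional communication link established based on an application layer protocol, and receive over the same link the second content blocks that are streamed back; the second content blocks are obtained by performing content type conversion on the first content blocks, and sending and receiving proceed asynchronously on the link. Through the output module 1206, it may output the second content blocks sequentially in the second order in which they were received.
In one embodiment, a computer device is provided, including a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the steps of the content processing method above. These steps may be the steps of the content processing method in any of the foregoing embodiments.
In one embodiment, one or more computer-readable storage media are provided, storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the content processing method above. These steps may be the steps of the content processing method in any of the foregoing embodiments.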
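Note that "first", "second", and similar terms in the embodiments of this application are used only for distinction and do not imply any limitation on size, order, or subordination. "Multiple" in the embodiments of this application means at least two.
The steps in the embodiments of this application are not necessarily executed in the order indicated by the step numbers. Unless explicitly stated herein, there is no strict limit on the order of these steps, and they may be executed in other orders. Moreover, at least some of the steps in each embodiment may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment and may be executed at different times. Nor are they necessarily performed one after another; they may be performed in turn or alternately with at least some of the other steps, or with the sub-steps or stages of other steps.
A person of ordinary skill in the art will understand that all or part of the procedures in the methods of the foregoing embodiments may be carried out by computer-readable instructions instructing the relevant hardware. The program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the procedures of the foregoing method embodiments. Any reference to memory, storage, a database, or another medium used in the embodiments of this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the foregoing embodiments may be combined in any way. For brevity, not every possible combination is described; however, any combination of these technical features that involves no contradiction should be considered within the scope of this specification.
The foregoing embodiments express only several implementations of this application. Their description is relatively specific and detailed, but it should not be understood as limiting the scope of the patent. A person of ordinary skill in the art may make several variations and improvements without departing from the concept of this application, and these all fall within its scope of protection. The scope of protection of this patent is therefore defined by the appended claims.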

Claims (20)

  1. 一种内容处理方法,由计算机设备执行,所述方法包括:
    获取第一内容块;所述第一内容块为结构化数据;
    通过基于应用层协议建立的双向通信单链路,按照获取所述第一内容块的第一顺序,对所述第一内容块进行流式发送;
    通过所述双向通信单链路,接收流式返回的第二内容块;所述第二内容块,是通过对所述第一内容块进行内容类型转化得到;发送所述第一内容块和接收所述第二内容块是在所述双向通信单链路中异步进行的;及
    按照接收所述第二内容块的第二顺序,依次输出所述第二内容块。
  2. 根据权利要求1所述的方法,其特征在于,所述获取第一内容块包括:
    接收触发指令;
    响应于所述触发指令,获取初始内容;及
    对所述初始内容进行结构化处理,生成第一内容块。
  3. 根据权利要求2所述的方法,其特征在于,所述对所述初始内容进行结构化处理,生成第一内容块,包括:
    对所述初始内容进行预处理,以从所述初始内容中提取目标内容;所述目标内容,是指待进行内容类型转化的内容;
    对所述目标内容进行结构化处理,生成第一内容块。
  4. 根据权利要求3所述的方法,其特征在于,所述触发指令,为语音识别指令;所述响应于所述触发指令,获取初始内容包括:
    响应于所述语音识别指令,采集音频数据;
    所述对所述初始内容进行预处理,以从所述初始内容中提取目标内容包括:
    从采集的所述音频数据中提取目标语音数据;及
    所述对所述目标内容进行结构化处理,生成第一内容块,包括:
    对所述目标语音数据进行结构化处理,生成语音数据块,作为第一内容块。
  5. 根据权利要求4所述的方法,其特征在于,所述方法还包括:
    展示语音识别触发控件;
    所述响应于所述语音识别指令,采集音频数据,包括:
    当接收到由对所述语音识别触发控件的触发所生成的语音识别指令时,跳转至语音搜索界面进行音频数据采集,并基于应用层协议建立所述双向通信单链路。
  6. 根据权利要求4所述的方法,其特征在于,所述第二内容块,是通过对所述语音数据块进行语音识别,得到的文本内容块;及
    所述按照接收所述第二内容块的第二顺序,依次输出所述第二内容块包括:
    按照接收所述文本内容块的第二顺序,在界面上依次展示所述文本内容块。
  7. 根据权利要求6所述的方法,其特征在于,所述方法还包括:
    将展示的所述第二内容块按照所述第二顺序进行拼接组合,生成搜索语句;
    根据所述搜索语句,搜索与所述搜索语句匹配的媒体内容;及
    展示搜索到的所述媒体内容。
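  8. The method according to claim 2, wherein the trigger instruction is a speech synthesis instruction, and the obtaining initial content in response to the trigger instruction includes:
    obtaining input text content in response to the speech synthesis instruction; and
    the structuring the initial content to generate first content blocks includes:
    structuring the text content to generate text content blocks as the first content blocks.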
  9. 根据权利要求8所述的方法,其特征在于,所述第二内容块,是通过将所述文本内容块进行语音合成,得到的语音数据块;及
    所述按照接收所述第二内容块的第二顺序,依次输出所述第二内容块包括:
    按照接收所述语音数据块的第二顺序,依次播放所述语音数据块。
  10. 根据权利要求9所述的方法,其特征在于,所述语音数据块通过语音数据块生成步骤得到,所述语音数据块生成步骤包括:
    将文本内容块与预设声音模板结合,合成与所述预设声音模板的声音特征相符的语音数据块。
  11. 根据权利要求2所述的方法,其特征在于,所述方法还包括:
    获取应用层协议;及
    基于所述应用层协议,在本端与代理服务器之间建立双向通信单链路;
    其中,所述双向通信单链路,是用于进行双向流式传输的单链路。
  12. 根据权利要求11所述的方法,其特征在于,所述方法还包括:
    基于传输控制协议和多链路应用层协议中的任意一种进行协议封装,生成用于建立双向通信单链路的应用层协议;
    所述基于所述应用层协议,在本端与代理服务器之间建立双向通信单链路包括:
    向代理服务器发送上行链路请求和下行链路请求;及
    通过所述应用层协议,将所述上行链路请求和下行链路请求进行合并封装,生成本端与代理服务器之间的双向通信单链路。
  13. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    接收应用接口适配指令;
    响应于该应用接口适配指令,在接入层适配至少一个应用的接口;所适配的接口所对应的应用,为用于通过所述双向通信单链路进行数据的发送和接收处理的应用。
  14. 根据权利要求1所述的方法,其特征在于,所述按照获取所述第一内容块的第一顺序,对所述第一内容块进行流式发送包括:
    按照获取所述第一内容块的第一顺序,向代理服务器流式发送所述第一内容块;所述第一内容块,用于指示所述代理服务器将所述第一内容块分发至解码服务器;及
    所述通过所述双向通信单链路,接收流式返回的第二内容块包括:
    通过所述双向通信单链路,接收由所述代理服务器流式返回的第二内容块;所述第二内容块,是由所述解码服务器对所述第一内容块进行内容类型转化得到。
  15. 根据权利要求14所述的方法,其特征在于,所述代理服务器包括第一代理服务器和第二代理服务器;所述第一代理服务器是第一对象提供的代理服务器;所述第二代理服务器,是第二对象提供的代理服务器;所述第一内容块,是基于所述第二对象的客户端获取得到;所述双向通信单链路,是基于所述应用层协议和所述第一对象提供的软件包,建立于所述客户端和所述第一代理服务器之间;
    所述第一内容块还用于指示所述第一代理服务器将所述第一内容块转发至第二代理服务器,由所述第二代理服务器将所述第一内容块分发至所述解码服务器。
  16. 一种内容处理装置,其特征在于,设置于计算机设备中,包括:
    获取模块,用于获取第一内容块;所述第一内容块为结构化数据;
    流式传输模块,用于通过基于应用层协议建立的双向通信单链路,按照获取所述第一内容块的第一顺序,对所述第一内容块进行流式发送;通过所述双向通信单链路,接收流式返回的第二内容块;所述第二内容块,是通过对所述第一内容块进行内容类型转化得到;发送所述第一内容块和接收所述第二内容块是在所述双向通信单链路中异步进行的;及
    输出模块,用于按照接收所述第二内容块的第二顺序,依次输出所述第二内容块。
  17. 根据权利要求16所述的装置,其特征在于,所述流式传输模块还用于按照获取所述第一内容块的第一顺序,向代理服务器流式发送所述第一内容块;所述第一内容块,用于指示所述代理服务器将所述第一内容块分发至解码服务器;通过所述双向通信单链路,接收由所述代理服务器流式返回的第二内容块;所述第二内容块,是由所述解码服务器对所述第一内容块进行 内容类型转化得到。
  18. 根据权利要求16所述的装置,其特征在于,所述装置还包括:
    链路建立模块,用于基于传输控制协议和多链路应用层协议中的任意一种进行协议封装,生成用于建立双向通信单链路的应用层协议;向代理服务器发送上行链路请求和下行链路请求;及通过所述应用层协议,将所述上行链路请求和下行链路请求进行合并封装,生成本端与代理服务器之间的双向通信单链路。
  19. 一种计算机设备,其特征在于,包括存储器和一个或多个处理器,所述存储器中存储有计算机程序,所述计算机程序被所述一个或多个处理器执行时,使得所述一个或多个处理器执行权利要求1至15中任一项所述方法的步骤。
  20. 一个或多个计算机可读存储介质,其特征在于,所述计算机可读存储介质上存储有计算机程序,所述计算机程序被一个或多个处理器执行时,使得所述一个或多个处理器执行权利要求1至15中任一项所述方法的步骤。
PCT/CN2020/114352 2019-11-29 2020-09-10 内容处理方法、装置、计算机设备及存储介质 WO2021103741A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/519,237 US20220059073A1 (en) 2019-11-29 2021-11-04 Content Processing Method and Apparatus, Computer Device, and Storage Medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911200739.6 2019-11-29
CN201911200739.6A CN110971685B (zh) 2019-11-29 2019-11-29 内容处理方法、装置、计算机设备及存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/519,237 Continuation US20220059073A1 (en) 2019-11-29 2021-11-04 Content Processing Method and Apparatus, Computer Device, and Storage Medium

Publications (1)

Publication Number Publication Date
WO2021103741A1 true WO2021103741A1 (zh) 2021-06-03

Family

ID=70032155

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/114352 WO2021103741A1 (zh) 2019-11-29 2020-09-10 内容处理方法、装置、计算机设备及存储介质

Country Status (3)

Country Link
US (1) US20220059073A1 (zh)
CN (1) CN110971685B (zh)
WO (1) WO2021103741A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929780B (zh) 2019-11-19 2023-07-11 腾讯科技(深圳)有限公司 视频分类模型构建、视频分类的方法、装置、设备及介质
CN110971685B (zh) * 2019-11-29 2021-01-01 腾讯科技(深圳)有限公司 内容处理方法、装置、计算机设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002089114A1 (en) * 2001-04-26 2002-11-07 Stenograph, L.L.C. Systems and methods for automated audio transcription translation and transfer
CN110136703A (zh) * 2019-03-25 2019-08-16 视联动力信息技术股份有限公司 一种模糊回答方法和视联网系统
CN110299152A (zh) * 2019-06-28 2019-10-01 北京猎户星空科技有限公司 人机对话的输出控制方法、装置、电子设备及存储介质
CN110491370A (zh) * 2019-07-15 2019-11-22 北京大米科技有限公司 一种语音流识别方法、装置、存储介质及服务器
CN110971685A (zh) * 2019-11-29 2020-04-07 腾讯科技(深圳)有限公司 内容处理方法、装置、计算机设备及存储介质

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080059197A1 (en) * 2006-08-29 2008-03-06 Chartlogic, Inc. System and method for providing real-time communication of high quality audio
CN102104853A (zh) * 2010-12-30 2011-06-22 重庆新媒农信科技有限公司 基于移动终端网页数据业务的服务器系统及其业务通信方法
PL401347A1 (pl) * 2012-10-25 2014-04-28 Ivona Software Spółka Z Ograniczoną Odpowiedzialnością Spójny interfejs do lokalnej i oddalonej syntezy mowy
US9131067B2 (en) * 2012-11-05 2015-09-08 Genesys Telecommunications Laboratories, Inc. System and method for out-of-band communication with contact centers
CN104765579B (zh) * 2014-01-08 2019-01-18 精工爱普生株式会社 Pos控制系统、pos控制系统的控制方法、以及打印装置
US20160048561A1 (en) * 2014-08-15 2016-02-18 Chacha Search, Inc. Method, system, and computer readable storage for podcasting and video training in an information search system
US10701037B2 (en) * 2015-05-27 2020-06-30 Ping Identity Corporation Scalable proxy clusters
CN105243155A (zh) * 2015-10-29 2016-01-13 贵州电网有限责任公司电力调度控制中心 一种大数据抽取和交换系统
CN105679319B (zh) * 2015-12-29 2019-09-03 百度在线网络技术(北京)有限公司 语音识别处理方法及装置
US10846029B2 (en) * 2017-06-13 2020-11-24 Bixolon Co., Ltd. Printing apparatus to acquire print data and transmit a request to an external apparatus to close websocket communication when predetermined time period elapses
CN107808670B (zh) * 2017-10-25 2021-05-14 百度在线网络技术(北京)有限公司 语音数据处理方法、装置、设备及存储介质
CN108173721A (zh) * 2017-12-18 2018-06-15 华南师范大学 基于iOS的语音控制智能家居系统及语音识别控制方法
CN108052681B (zh) * 2018-01-12 2020-05-26 毛彬 一种关系型数据库间结构化数据的同步方法及系统
US11288038B2 (en) * 2018-07-30 2022-03-29 John Holst, III System and method for voice recognition using a peripheral device
CN110120917B (zh) * 2019-06-28 2024-02-02 北京瑛菲网络科技有限公司 基于内容的路由方法及装置

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002089114A1 (en) * 2001-04-26 2002-11-07 Stenograph, L.L.C. Systems and methods for automated audio transcription translation and transfer
CN110136703A (zh) * 2019-03-25 2019-08-16 视联动力信息技术股份有限公司 一种模糊回答方法和视联网系统
CN110299152A (zh) * 2019-06-28 2019-10-01 北京猎户星空科技有限公司 人机对话的输出控制方法、装置、电子设备及存储介质
CN110491370A (zh) * 2019-07-15 2019-11-22 北京大米科技有限公司 一种语音流识别方法、装置、存储介质及服务器
CN110971685A (zh) * 2019-11-29 2020-04-07 腾讯科技(深圳)有限公司 内容处理方法、装置、计算机设备及存储介质

Also Published As

Publication number Publication date
CN110971685B (zh) 2021-01-01
CN110971685A (zh) 2020-04-07
US20220059073A1 (en) 2022-02-24

Similar Documents

Publication Publication Date Title
CN110730952B (zh) 处理网络上的音频通信的方法和系统
US11151765B2 (en) Method and apparatus for generating information
US11917344B2 (en) Interactive information processing method, device and medium
US9177551B2 (en) System and method of providing speech processing in user interface
US8898054B2 (en) Determining and conveying contextual information for real time text
US10824664B2 (en) Method and apparatus for providing text push information responsive to a voice query request
US11270690B2 (en) Method and apparatus for waking up device
WO2021103741A1 (zh) 内容处理方法、装置、计算机设备及存储介质
JP7448672B2 (ja) 情報処理方法、システム、装置、電子機器及び記憶媒体
CN110992955A (zh) 一种智能设备的语音操作方法、装置、设备及存储介质
CN107274882B (zh) 数据传输方法及装置
CN102299934A (zh) 一种基于云模式和语音识别的语音输入方法
CN113676741A (zh) 数据传输方法、装置、存储介质及电子设备
US11818491B2 (en) Image special effect configuration method, image recognition method, apparatus and electronic device
JP2023522092A (ja) インタラクション記録生成方法、装置、デバイス及び媒体
CN110418181B (zh) 对智能电视的业务处理方法、装置、智能设备及存储介质
JP2022050309A (ja) 情報処理方法、装置、システム、電子機器、記憶媒体およびコンピュータプログラム
US20200412773A1 (en) Method and apparatus for generating information
CN113299285A (zh) 设备控制方法、装置、电子设备及计算机可读存储介质
CN116566963B (zh) 一种音频处理方法、装置、电子设备和存储介质
CN113098931B (zh) 信息分享方法和多媒体会话终端
US11830120B2 (en) Speech image providing method and computing device for performing the same
WO2024032111A9 (zh) 在线会议的数据处理方法、装置、设备、介质及产品
US20230297324A1 (en) Audio Control Method, System, and Electronic Device
CN116246192A (zh) 一种字幕的展示方法、装置及设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20894799

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20894799

Country of ref document: EP

Kind code of ref document: A1