WO2021218981A1 - Method, apparatus, device and medium for generating interaction records - Google Patents

Method, apparatus, device and medium for generating interaction records

Info

Publication number
WO2021218981A1
WO2021218981A1 (PCT/CN2021/090395; CN2021090395W)
Authority
WO
WIPO (PCT)
Prior art keywords
data
information
interactive
behavior
user
Prior art date
Application number
PCT/CN2021/090395
Other languages
English (en)
French (fr)
Inventor
杨晶生
陈可蓉
赵立
韩晓
史寅
Original Assignee
北京字节跳动网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字节跳动网络技术有限公司 (Beijing ByteDance Network Technology Co., Ltd.)
Priority to JP2022563348A priority Critical patent/JP2023522092A/ja
Priority to EP21795627.5A priority patent/EP4124024A4/en
Publication of WO2021218981A1 publication Critical patent/WO2021218981A1/zh
Priority to US17/881,999 priority patent/US20220375460A1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/083 Recognition networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/005 Language recognition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 Television systems
    • H04N 7/14 Systems for two-way working
    • H04N 7/15 Conference systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 Sound input; Sound output
    • G06F 3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/263 Language identification
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
    • G10L 25/48 Speech or voice analysis techniques specially adapted for particular use
    • G10L 25/51 Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L 25/57 Speech or voice analysis techniques for comparison or discrimination, for processing of video signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
    • G10L 25/48 Speech or voice analysis techniques specially adapted for particular use
    • G10L 25/72 Speech or voice analysis techniques specially adapted for transmitting results of analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 12/00 Data switching networks
    • H04L 12/02 Details
    • H04L 12/16 Arrangements for providing special services to substations
    • H04L 12/18 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
    • H04L 12/1813 Arrangements for computer conferences, e.g. chat rooms
    • H04L 12/1831 Tracking arrangements for later retrieval, e.g. recording contents, participants activities or behavior, network status
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/21 Server components or server architectures
    • H04N 21/218 Source of audio or video content, e.g. local disk arrays
    • H04N 21/2187 Live feed
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N 21/4312 Generation of visual interfaces involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/433 Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/442 Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N 21/44213 Monitoring of end-user related data
    • H04N 21/44222 Analytics of user selections, e.g. selection of programs or purchase activity
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/45 Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N 21/4508 Management of client data or end-user data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/47 End-user applications
    • H04N 21/478 Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N 21/4788 Supplemental services communicating with other users, e.g. chatting
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification techniques
    • G10L 17/06 Decision making techniques; Pattern matching strategies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 12/00 Data switching networks
    • H04L 12/02 Details
    • H04L 12/16 Arrangements for providing special services to substations
    • H04L 12/18 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
    • H04L 12/1813 Arrangements for computer conferences, e.g. chat rooms
    • H04L 12/1827 Network arrangements for conference optimisation or adaptation

Definitions

  • The embodiments of the present disclosure relate to the field of computer data processing technology, and in particular to a method, apparatus, device, and medium for generating interaction records.
  • the server can receive the voice information of each speaking user, and process the voice information before playing it.
  • The embodiments of the present disclosure provide a method, apparatus, device, and medium for generating an interaction record, so as to optimize how the interaction process is recorded and thereby improve the efficiency of interactive communication.
  • an embodiment of the present disclosure provides a method for generating an interactive record, the method including:
  • an embodiment of the present disclosure also provides an interactive record generating device, which includes:
  • a behavior data collection module configured to collect user behavior data represented by the multimedia data stream from the multimedia data stream, the behavior data including voice information and/or operation information;
  • the interactive record data generating module is configured to generate interactive record data corresponding to the behavior data based on the behavior data.
  • embodiments of the present disclosure also provide an electronic device, the electronic device including:
  • one or more processors;
  • a storage device configured to store one or more programs;
  • when the one or more programs are executed by the one or more processors, the one or more processors implement the interaction record generating method according to any one of the embodiments of the present disclosure.
  • The embodiments of the present disclosure also provide a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to execute the interaction record generating method described in any of the embodiments of the present disclosure.
  • The technical solution of the embodiments of the present disclosure collects voice information and/or operation information in a multimedia data stream and generates interaction record data based on that information, so that interactive users can determine the interaction information from the interaction record data, which improves the users' interaction efficiency and, in turn, the user experience.
  • FIG. 1 is a schematic flowchart of a method for generating an interactive record according to Embodiment 1 of the present disclosure
  • FIG. 2 is a schematic flowchart of a method for generating an interactive record according to Embodiment 2 of the present disclosure
  • FIG. 3 is a schematic flowchart of a method for generating an interactive record according to Embodiment 3 of the present disclosure
  • FIG. 4 is a schematic flowchart of a method for generating an interactive record according to Embodiment 4 of the present disclosure
  • FIG. 5 is a schematic structural diagram of an interactive record generating apparatus provided by Embodiment 5 of the present disclosure.
  • FIG. 6 is a schematic structural diagram of an electronic device provided by Embodiment 6 of the present disclosure.
  • Figure 1 is a schematic flow chart of a method for generating interactive records provided by Embodiment 1 of the present disclosure.
  • This embodiment of the present disclosure is applicable to a situation where interactive record data is generated based on user interaction information in an interactive application scenario supported by the Internet.
  • the method may be executed by an interactive record generating device, which may be implemented in the form of software and/or hardware, and optionally, implemented by an electronic device, which may be a mobile terminal, a PC, or a server.
  • the interaction scenario is usually realized by the cooperation of the client and the server, and the method provided in this embodiment can be executed by the server, or executed by the cooperation of the client and the server.
  • the method of this embodiment includes:
  • S110 Collect user behavior data represented by the multimedia data stream from the multimedia data stream.
  • the multimedia data stream may be video stream data corresponding to the real-time interactive interface, or video stream data in the recorded video after the real-time interactive interface is recorded.
  • the real-time interactive interface is any interactive interface in the real-time interactive application scenario. Real-time interactive scenes can be implemented through the Internet and computer means, for example, interactive applications implemented through native programs or web programs.
  • multiple users may be allowed to interact in various forms of interactive behaviors, for example, interactive behaviors such as inputting text, voice, video, or sharing.
  • the behavior data may include various data involved in the interactive behavior, for example, the type of the interactive behavior, and the specific content involved in the interactive behavior.
  • The voice information and/or operation information of each interactive user participating in the interaction can be collected from the multimedia data stream corresponding to the interactive behavior interface, so as to generate the interaction record data corresponding to that behavior data.
  • the interactive record data corresponds to the collected behavior data.
  • the interactive record data can be the conversion of the voice information in the interactive behavior and the specific content involved in the interactive behavior into corresponding textual expressions.
  • The interaction record data may be an interaction record text corresponding to the behavior data; the interaction record text may include the textual expression corresponding to the voice information, the textual expression converted from the operation information, or both.
  • the operation of generating interactive record data can be generated by each client's own processing, or it can be generated by the server's unified processing of the behavior data of each user.
  • The server may process each piece of behavior data and obtain the textual expressions corresponding to the voice information and the operation information respectively, that is, generate the interaction record data corresponding to the behavior data.
  • The advantage of generating interaction record data is that, during a video conference or live broadcast, when the speech of other speaking users cannot be understood or is missed, the behavior data of each speaking user can be collected and the corresponding interaction record data generated from it, so that users can review the speech of other speaking users through the interaction record data, determine each speaking user's core ideas, and thereby improve interaction efficiency and user experience during the interaction.
  • If the voice information of a speaking user cannot be determined from a screen-recording video, the user has to trigger playback operations manually; for example, each trigger of the playback control may rewind the video by five seconds, or the progress bar may be dragged to reposition the recording.
  • In this embodiment, the interaction record data corresponding to the screen-recording video can be generated, and the core ideas of each speaking user can be determined intuitively from the interaction record data, which improves the convenience and efficiency of interaction.
  • the operation information can be that a certain text in the document is triggered.
  • the operation information is converted into corresponding textual expressions, and a certain piece of text content triggered by the speaking user can also be obtained, and then the textual expression and text content can be used as interactive record data.
  • The data recorded in the interaction record data may include the speaking user identification, the speaking time, and the corresponding textual expression; for example, one entry in the interaction record data can be "ID1 - 20:00 - I agree with this matter".
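  • As a minimal sketch of the record format just described (the field and class names are illustrative, not taken from the disclosure), one entry of interaction record data combining the speaking user identification, speaking time, and textual expression might be assembled as follows:

```python
from dataclasses import dataclass

@dataclass
class RecordEntry:
    user_id: str    # speaking user identification, e.g. "ID1"
    timestamp: str  # speaking time within the interaction, e.g. "20:00"
    text: str       # textual expression of the voice or operation information

    def render(self) -> str:
        # One line of interaction record data: "ID - time - text"
        return f"{self.user_id} - {self.timestamp} - {self.text}"

entry = RecordEntry("ID1", "20:00", "I agree with this matter")
```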
  • The technical solution of the embodiments of the present disclosure collects voice information and/or operation information in a multimedia data stream and generates interaction record data based on that information, so that interactive users can determine the interaction information from the interaction record data, which improves the users' interaction efficiency and, in turn, the user experience.
  • the behavior data includes operation information
  • "Generating interaction record data corresponding to the behavior data" in step S120 may include: determining the operation object and the operation behavior in the operation information, and generating the interaction record data based on the association relationship between the operation object and the operation behavior.
  • the user's operation behavior can have multiple types, and correspondingly, the operation objects corresponding to the operation behavior also include multiple types.
  • The operation object and operation behavior in the operation information can be obtained, and the operation object and operation behavior can be converted into corresponding interaction record data.
  • Typical user operation behavior data may include sharing behaviors and shared objects.
  • the sharing behaviors may be document sharing operations and/or screen sharing operations, and the shared objects may be specific shared content.
  • the operation information includes document sharing operation information
  • the operation object includes a shared document
  • the operation behavior includes a document sharing behavior.
  • Generating interaction record data corresponding to the behavior data includes: determining the sharing address and/or storage address associated with the shared document based on the shared document, and generating the interaction record data based on the sharing address and/or storage address.
  • the operation information includes screen sharing operation information
  • the operation object includes the shared screen
  • the operation behavior includes the sharing behavior of the shared screen.
  • generating interaction record data corresponding to the behavior data includes: determining identification information in the shared screen based on the shared screen, and generating interaction record data based on the identification information.
  • User operation behaviors are not limited to those listed above; for example, they may also include behaviors such as whiteboard writing.
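  • The association between operation object and operation behavior described above can be sketched as a simple dispatch; the dictionary keys and record wording are assumptions for illustration, and a real implementation would work on structured events from the client:

```python
def make_operation_record(op: dict) -> str:
    """Turn one piece of operation information into a line of interaction
    record data, based on the association between the operation object
    and the operation behavior. The keys used here are illustrative."""
    behavior = op["behavior"]
    if behavior == "document_share":
        # Operation object: the shared document; record its sharing/storage address.
        return f"{op['user']} - document share - {op['document']} - {op['address']}"
    if behavior == "screen_share":
        # Operation object: the shared screen; record its identification info (e.g. a link).
        return f"{op['user']} - screen share - {op['identification']}"
    # Other behaviors (e.g. whiteboard writing) would get their own branches.
    return f"{op['user']} - {behavior}"

record = make_operation_record({
    "user": "ID1",
    "behavior": "document_share",
    "document": "Shared document A",
    "address": "http://example.com/doc-a",
})
```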
  • FIG. 2 is a schematic flowchart of a method for generating an interactive record according to Embodiment 2 of the present disclosure.
  • the multimedia data stream may be determined based on a real-time interactive interface, or may be determined based on a recorded screen video.
  • the multimedia data streams are obtained in different ways, the user behavior data represented by the collected multimedia data streams are also different. Accordingly, there are also certain differences in generating interactive record data corresponding to the behavior data.
  • the multimedia data stream is determined based on the real-time interactive interface as an example.
  • the method of this embodiment includes:
  • S210 When receiving request information for generating an interactive record, collect behavior data of each user based on the request information.
  • The target control can be a control that generates interaction record data. If the target control is triggered, the behavior data of each speaking user is collected; otherwise, the behavior data of the speaking users is not collected.
  • If the server generates the interaction record, this can take place during real-time interaction (for example, during a video conference). If the user triggers the interaction-record generation control on the client, the client can generate interaction-record request information based on the trigger operation and send the request information to the server. After receiving the request information, the server can start collecting the voice information and/or operation information of each interactive user in real time.
  • Collecting the user behavior data represented by the multimedia data stream includes: receiving the voice information of each user collected by the client, and/or receiving the request information corresponding to a trigger operation and determining the operation information corresponding to that request information.
  • users participating in real-time interaction may be referred to as interactive users or speaking users.
  • The client corresponding to interactive user A can collect the voice data of interactive user A; and/or, if the server receives the request information corresponding to a trigger operation, it can determine, based on the request information, the operation triggered by the interactive user on the client, and then determine the operation object and operation behavior corresponding to that trigger operation, so as to generate the interaction record based on the operation object and operation behavior.
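  • The collection flow above, in which collection begins only once the server receives the request information generated by the client's trigger operation, can be sketched as follows (class, method, and key names are assumptions, not from the disclosure):

```python
class InteractionRecorder:
    """Sketch of server-side collection gated on request information."""

    def __init__(self) -> None:
        self.collecting = False
        self.behavior_data: list[tuple[str, str, str]] = []

    def on_request_info(self, request: dict) -> None:
        # The client sends request information when the user triggers
        # the interaction-record generation control.
        if request.get("action") == "generate_interaction_record":
            self.collecting = True

    def on_voice_info(self, user_id: str, text: str) -> None:
        # Voice information collected by a client and forwarded to the server.
        if self.collecting:
            self.behavior_data.append(("voice", user_id, text))

recorder = InteractionRecorder()
recorder.on_voice_info("A", "hello")  # ignored: control not yet triggered
recorder.on_request_info({"action": "generate_interaction_record"})
recorder.on_voice_info("A", "hello again")  # collected
```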
  • the behavior data includes operation information
  • Determining the operation object and operation behavior in the operation information and generating interaction record data based on their association relationship includes: when an operation that triggers document sharing is detected, first obtaining the shared document and the associated information corresponding to the shared document; then determining the operation information based on the trigger operation, the shared document, and the associated information, where the associated information includes the sharing link of the shared document and/or the storage address of the shared document; and then generating, based on the operation information, the interaction record data corresponding to the behavior data.
  • the trigger operation is a document sharing operation
  • the operation object in the operation information is a shared document
  • the operation behavior is a document sharing operation.
  • Generating the interaction record based on the operation information may proceed as follows: when it is detected that the sharing control is triggered, the shared document in the multimedia video stream can be obtained, and the sharing link corresponding to the shared document, or the storage address of the shared document, can be determined.
  • a piece of data in the interactive record data can be generated based on the trigger operation of the document sharing, the shared document, and the link or storage address corresponding to the shared document.
  • The interaction record data corresponding to the sharing trigger operation can be: "ID - sharing operation - shared document A - storage link: http://xxxxxxxxxx.com".
  • Generating interaction record data corresponding to the operation information includes: when an operation that triggers screen sharing is detected, first identifying the identification information in the shared screen; then taking the trigger operation of screen sharing, the shared screen, and the identification information as the operation information, and generating the interaction record data based on that operation information; the identification information includes a link in the shared screen.
  • the trigger operation is a screen sharing operation
  • the operation object in the operation information is a shared screen
  • the operation behavior is a screen sharing operation.
  • The interaction record generated based on the operation information can be produced as follows: when the triggering of the sharing control is detected, the shared screen in the multimedia video stream can be obtained and the identification information in the shared screen extracted. If the content displayed on the shared screen is web-page information, the extracted identification information may be a link to the webpage. Furthermore, one entry of the interaction record data can be generated based on the trigger operation of the screen sharing, the shared screen, and the identification information in the shared screen.
  • Generating the interaction record corresponding to the operation information not only records the interactive user's operation behavior, but also records the information associated with the operation object in the operation information, so that the user can retrieve the corresponding shared document or the content on the shared screen based on the operation information recorded in the interaction record data, which further improves the efficiency of real-time interaction.
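  • For the screen-sharing case, extracting a link as the identification information can be sketched as below. Here the shared screen is represented by text already recognized from the video frame; a real system would first need OCR or page metadata, which this sketch assumes away:

```python
import re

def extract_identification_info(screen_text: str) -> list[str]:
    # If the shared screen shows web-page information, the identification
    # information can be the page's link(s).
    return re.findall(r"https?://[^\s\"'<>]+", screen_text)

links = extract_identification_info("Agenda: see http://example.com/q3-plan for details")
```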
  • the collected behavior data inevitably includes the voice information of each interactive user.
  • generating interaction record data corresponding to the behavior data may include: performing voice recognition on the voice information, and generating interaction record data based on the recognition result of the voice information .
  • the recognition of voice information can include voiceprint recognition, and the identity information of each speaking user can be determined based on the result of voiceprint recognition.
  • Speech information recognition can also include language type recognition, which can determine the target language type of the speaking user to which the speech information belongs, and then translate the speech information of each speaking user into a text expression of the same target language type.
  • The advantage of this processing is that voice data in other language types is translated into interaction record data in the same target language type, which helps the user understand the speech of other users from the interaction record data and improves communication efficiency during the interaction.
  • Performing voice recognition on the voice information and generating interaction record data based on the recognition result includes: determining the target language type of the speaking user to which the voice information belongs, and processing the voice information in the behavior data based on the target language type to generate the interaction record data.
  • the user corresponding to each client can be used as the speaking user, and the corresponding client can be used as the target client.
  • the target language type may be the language type currently used by the target speaking user, or a language type preset on the client by the target speaking user.
  • the target language type of the speaking user to which each voice information belongs can be determined, and the voice data can be converted into interactive record data of the same type as the target language.
  • The target language type can be the language type used by the speaking user, or a language type set by the speaking user on the client in advance, that is, a language the speaking user is more familiar with. Therefore, converting the voice data of other speaking users into interaction record data in the target language type can improve the efficiency with which the speaking user reads the interaction record data, and help the speaking user understand other speaking users' voice information conveniently and accurately, thereby improving interaction efficiency.
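  • The conversion into the target language type can be sketched as follows; `translate` is a stand-in for whatever translation service the system uses (an assumption, since the disclosure does not name one), and the toy table below merely simulates it:

```python
def to_target_language(segments, target_lang, translate):
    """segments: (user_id, language_type, text) tuples recognized from
    the voice information; returns record lines in the target language."""
    records = []
    for user_id, lang, text in segments:
        if lang != target_lang:
            text = translate(text, lang, target_lang)
        records.append(f"{user_id}: {text}")
    return records

# A toy translation table standing in for a real translation service:
toy = {("hello", "en", "zh"): "你好"}
records = to_target_language(
    [("B", "en", "hello"), ("A", "zh", "大家好")],
    "zh",
    lambda text, src, dst: toy[(text, src, dst)],
)
```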
  • determining the target language type of the speaking user to which the voice information belongs includes: determining the target language type based on the language type of the speaking user to which the current client belongs.
  • the target language type can be determined according to the language type of the speaking user to which each client belongs.
  • The language type of the speaking user to which the current client belongs is determined in at least one of the following ways: identifying the language type of the voice information in the behavior data to determine the user's language type; obtaining the language type preset on the client; or obtaining the login address of the client and determining the corresponding language type based on the login address.
  • the first method may be: first obtain the voice information in the behavior data, and then determine the corresponding language type of the speaking user based on the voice information, and then use this language type as the target language type. For example, if the voice information of the speaking user A is collected, and after processing the voice information, it is determined that the language type of the speaking user A is Chinese, the Chinese language type may be used as the target language type.
  • the real-time interaction is a video conference.
  • the three users can be marked as user A, user B, and user C.
  • the language type used by user A is Chinese.
  • the language type used by user B is the English language type
  • the language type used by user C is the Japanese language type.
  • when user A triggers the subtitle display control, the voice information of user A, user B, and user C can be collected separately.
  • from the voice information of user A, it is determined that the language type of user A is Chinese.
  • Chinese can therefore be used as the target language type; at the same time, the voice information of user B and user C can be translated into Chinese, and the operation information can be converted into the corresponding Chinese.
  • the interactive record data is then the data obtained by converting the behavior data of each user into Chinese. Of course, if user B triggers the subtitle display control, the target language type can be determined to be English according to the voice information of user B, and accordingly the voice information of user A and user C can be translated into English. In other words, the language type of the speaking user can be used as the target language type, and the voice information of the other speaking users can be translated into the target language type as the interactive record data.
  • the second way can be: when the speaking user triggers the operation of displaying the subtitles, the language type of the subtitles is set, and the set language type is used as the target language type.
  • a language selection list may pop up for the user to select. The user can select any language type. For example, if the user triggers the Chinese language type in the language selection list and clicks the confirm button, the server or client can confirm that the speaking user has selected the Chinese language type, and use the Chinese language type as Target language type.
  • the third method can be: when it is detected that the speaking user triggers the subtitle display control, obtain the login address of the client, that is, the IP address of the client, determine the region to which the client belongs according to the login address, and then use the language type of that region as the target language type. For example, when the user triggers the subtitle display control, the login address of the client can be obtained; based on the login address it is determined that the region to which the client belongs is China, and the target language type is Chinese.
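The three methods above form a natural fallback chain. The sketch below shows one possible way to resolve the target language type in that order; `detect_language`, the client fields, and the region-to-language mapping are illustrative assumptions rather than the disclosed implementation.

```python
def detect_language(voice_info):
    # Stub for spoken-language identification over the voice information
    # (a real system would analyze the audio itself).
    return voice_info.get("detected_lang")

# Illustrative region-to-language mapping (an assumption for the sketch).
REGION_LANG = {"CN": "zh", "US": "en", "JP": "ja"}

def target_language(client):
    """Resolve the target language type via the three methods in order:
    1. recognize it from the voice information in the behavior data;
    2. fall back to the language type preset on the client;
    3. fall back to the region implied by the client's login address."""
    lang = detect_language(client.get("voice_info", {}))
    if lang:
        return lang
    if client.get("preset_lang"):
        return client["preset_lang"]
    return REGION_LANG.get(client.get("login_region"), "en")
```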
  • the target language type corresponding to each speaking user is determined, and the voice information of the other speaking users is converted into the target language type, so that the generated interactive record data conforms to the reading habits of each speaking user. The user can thus quickly understand the speech of the other speaking users, thereby achieving the technical effect of improving interaction efficiency.
  • the technical solution of the embodiment of the present disclosure collects the behavior data of each speaking user in the process of real-time interaction, and converts the behavior data into interactive recording data of the target language type, so that the speaking user can understand the voice information of other speaking users based on the interactive recording data.
  • reviewing the interaction based on the interactive record data not only improves interaction efficiency, but also achieves the technical effect of facilitating the summarization of meeting content.
  • FIG. 3 is a schematic flowchart of a method for generating an interactive record according to Embodiment 3 of the present disclosure.
  • the multimedia data stream can also be determined based on the recorded screen video. Accordingly, the collection of behavior data and the generation of interactive recording data can be specifically optimized.
  • the method includes:
  • S310 Collect voice information and operation information in the screen-recorded video.
  • a screen recording device may be used to record the interaction process to obtain a screen-recorded video.
  • the video conference is recorded, and the recorded video conference is used as a screen recording video.
  • the voice information and operation information of each speaking user can be determined.
  • the user can first trigger the interactive record data generation control, and based on the user's trigger operation, the voice information and operation information can be collected from the multimedia data stream of the screen-recorded video.
  • the processing of the voice data can be: first perform voiceprint recognition on the voice information to determine the speaking user corresponding to the voice information; at the same time, perform voice recognition on the voice information to obtain a voice recognition result; then, based on the association between the speaking user and the voice recognition result, interactive record data corresponding to the behavior data can be generated.
  • each client has a corresponding client account or client ID, so that different speaking users can be distinguished according to different client accounts.
  • in a screen-recorded video, however, the speaking users cannot be distinguished by client ID. Therefore, voiceprint recognition can be performed on each speaking user's voice information; since each speaking user's voice has a unique voiceprint, different speaking users can be distinguished on this basis.
  • the interactive record data may include the translation corresponding to the user A-voice data, and the translation corresponding to the user B-voice data.
  • each piece of voice information in the recorded video is collected, and the identities of different speaking users are determined by voiceprint recognition; through analysis and processing of the voice data, the translation data corresponding to the voice data is determined; and the speaker identity is associated with the translation data to determine the interactive record data corresponding to the behavior data.
  • the interactive record data can be: speaking user A - translated behavior data; speaking user B - translated behavior data; and so on.
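A minimal sketch of the voiceprint-based attribution described above: each audio segment is attributed to a speaking user by voiceprint and paired with its recognition result. `voiceprint_id` and `speech_to_text` are stubs standing in for real voiceprint-recognition and speech-recognition engines; they are assumptions for the sketch, not the disclosed implementation.

```python
def voiceprint_id(segment):
    # Stub: a real system would match the segment's voiceprint features
    # against known speakers to obtain a speaker identity.
    return segment["voiceprint"]

def speech_to_text(segment):
    # Stub automatic speech recognition over the segment's audio.
    return segment["audio_text"]

def attribute_segments(segments):
    """Associate each recognized utterance with the speaking user
    identified by voiceprint, yielding speaker-text record entries."""
    return [
        {"speaker": voiceprint_id(seg), "text": speech_to_text(seg)}
        for seg in segments
    ]

records = attribute_segments([
    {"voiceprint": "speaking user A", "audio_text": "translated behavior data"},
])
```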
  • the screen recording video also includes the operation information of the speaking user, and the operation information can be processed by extracting information from the operation object in the operation information to generate interactive recording data corresponding to the behavior data.
  • extracting information from the operation object in the operation information to generate interactive record data corresponding to the behavior data may include: first determining, based on image recognition, the target element in the target image corresponding to the operation information; and then generating the interactive record corresponding to the behavior data based on the target element. The target image includes the image corresponding to the shared document and/or the image corresponding to the shared screen, and the target element can be at least one piece of identification information such as a target link, a target storage address, or a movie or TV series name. Interactive record data can therefore be generated based on the above information (that is, the interactive record data includes information such as the target link, the target storage address, and the movie or TV series name).
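One way to sketch the extraction step above is to run OCR over the target image and pull out link-like identification information with a pattern match. The `ocr` stub stands in for a real image-recognition engine (an assumption); a production system would also extract storage addresses, titles, and other target elements.

```python
import re

def ocr(frame):
    # Stub OCR over a shared-screen video frame; a real implementation
    # would use an image-recognition engine (this is an assumption).
    return frame["text"]

URL_RE = re.compile(r"https?://\S+")

def extract_identifiers(frames):
    """Extract target elements (here, target links) from shared-screen
    frames so they can be written into the interactive record data."""
    found = []
    for frame in frames:
        found.extend(URL_RE.findall(ocr(frame)))
    return found
```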
  • the technical solution of the embodiment of the present disclosure collects the behavior data of each speaking user in the screen-recorded video and generates interactive record data corresponding to the behavior data, which makes it convenient for the user to browse the interactive record data and determine the core ideas of each speaking user.
  • this avoids the problem in related technologies of having to play back the recorded video to determine the core ideas of each speaking user. For example, if there is a long pause in the recorded video, the user needs to wait for a certain period of time or trigger the fast-forward button to skip it, and when the fast-forward button is triggered, the user cannot accurately locate the position to browse, which wastes time and makes real-time interactive review inefficient. By converting the screen-recorded video into the corresponding interactive record text, the text can be browsed quickly and the core ideas of each user can be understood in a timely and convenient manner, achieving a time-saving technical effect.
  • FIG. 4 is a schematic flowchart of a method for generating an interactive record according to Embodiment 4 of the present disclosure. As shown in Figure 4, the method includes:
  • S410 Collect user behavior data represented by the multimedia data stream from the multimedia data stream, where the behavior data includes voice information and/or operation information;
  • S430 Send the interactive record data to the target client, so as to display the interactive record data on the target client.
  • the client corresponding to the speaking user is taken as the target client.
  • the interaction record data may be sent to the target client terminal to display the interaction record data on the target client terminal.
  • the display area of the interactive record can be preset and the preset display area can be used as the target area.
  • the target area may be, for example, the area around the main interaction area, such as the top, bottom, or side edges.
  • for example, if the video interaction window is the main interaction area and occupies 2/3 of the screen area, the area displaying the interactive record data can be the remaining 1/3 area at the side, and correspondingly, the interactive record data can be displayed in that 1/3 area.
  • displaying the interactive recording data in the target area includes: displaying the interactive recording data in the target area in the form of a barrage; where the target area includes a blank area in the video screen.
  • the blank area may be an area that does not include any elements in the interactive interface, for example, text, avatar, and other elements.
  • the blank area may be preset, and the preset blank area used as the target area for displaying the interactive record data.
  • the blank area can also be updated in real time, specifically according to the image information displayed on the display interface.
  • the server can detect the interactive interface in real time, for example, detect the video conference interface, determine in real time, based on the elements displayed on the interactive interface, the area of the display interface without any elements, and use the determined area as the blank area.
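A simple way to realize the real-time blank-area update described above is to tile the interface into cells and keep the cells that intersect no displayed element; those cells can then host the bullet-screen (barrage) text. The cell size and the rectangle representation of interface elements are illustrative assumptions.

```python
def blank_regions(screen_w, screen_h, elements, cell=100):
    """Tile the interface into cells and return those that intersect no
    displayed element; any such cell can host bullet-screen text.
    Elements are (x, y, w, h) rectangles on the display interface."""
    def overlaps(cx, cy):
        # Axis-aligned rectangle intersection test against every element.
        return any(
            cx < x + w and cx + cell > x and cy < y + h and cy + cell > y
            for x, y, w, h in elements
        )
    return [
        (cx, cy)
        for cy in range(0, screen_h, cell)
        for cx in range(0, screen_w, cell)
        if not overlaps(cx, cy)
    ]
```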
  • the interactive recording data can also be stored in the target location.
  • the interaction record data may be stored locally; and/or the interaction record data may be stored in the cloud, and a storage link corresponding to the interaction record data may be generated to obtain the interaction record data based on the storage link.
  • the target location can be the cloud or the local.
  • storing the interaction record data to the target location may include: exporting the interaction record data to the local device; and/or storing the interaction record data in the cloud and generating a storage link corresponding to the interaction record data, so as to obtain the interaction record data based on the storage link.
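The local-export and cloud-storage options above can be sketched as follows. The in-memory `CLOUD` dictionary and the `storage.example.com` link format are hypothetical stand-ins for a real object store and its URLs, not part of the disclosure.

```python
import json
import uuid

CLOUD = {}  # in-memory stand-in for a cloud object store (illustrative)

def export_locally(record, path):
    """Export the interaction record data to a local JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, ensure_ascii=False)
    return path

def store_in_cloud(record):
    """Store the record in the 'cloud' and return a storage link through
    which the record can later be obtained (hypothetical link format)."""
    key = uuid.uuid4().hex
    CLOUD[key] = record
    return f"https://storage.example.com/records/{key}"

def fetch_by_link(link):
    """Obtain the interaction record data based on its storage link."""
    return CLOUD[link.rsplit("/", 1)[-1]]
```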
  • the technical solution of the embodiment of the present disclosure can display the interactive recording data in the target area, so that the user can browse the interactive recording data displayed on the interactive interface during the real-time interaction process, which improves the convenience of the user to read the interactive recording data.
  • FIG. 5 is a schematic structural diagram of an interactive record generating device provided by Embodiment 5 of the present disclosure. As shown in FIG. 5, the device includes: a behavior data collection module 510 and an interactive record data generation module 520.
  • the behavior data collection module 510 is configured to collect, from the multimedia data stream, user behavior data represented by the multimedia data stream, where the behavior data includes voice information and/or operation information; the interactive record data generation module 520 is configured to generate interaction record data corresponding to the behavior data based on the behavior data.
  • the technical solution of the embodiment of the present disclosure collects voice information and/or operation information in a multimedia data stream and generates interactive record data based on the voice information and operation information, so that interacting users can determine the interaction information through the interactive record data, which improves the interaction efficiency of the users and thereby achieves the technical effect of improving user experience.
  • the behavior data includes operation information
  • the interactive record data generation module is further used to: determine the operation object and the operation behavior in the operation information, and generate interactive record data based on the association relationship between the operation object and the operation behavior.
  • the operation information includes document sharing operation information
  • the operation object includes a shared document
  • the operation behavior includes a document sharing behavior
  • the interactive record data generation module is further configured to:
  • determine, based on the shared document, the document sharing address and/or storage address associated with the shared document, and generate the interaction record data based on the sharing address and/or storage address.
  • the operation information includes screen sharing operation information
  • the operation object includes a shared screen
  • the operation behavior includes a sharing behavior on the shared screen
  • the interactive record data generation module is used to: determine the identification information in the shared screen based on the shared screen, and generate the interaction record data based on the identification information.
  • the behavior data collection module is also used to collect voice information and operation information in the screen-recorded video.
  • the behavior data collection module further includes:
  • the speaking user determining unit is used to perform voiceprint recognition on the voice information to determine the speaking user corresponding to the voice information; the voice recognition unit is used to perform voice recognition on the voice information to obtain a voice recognition result; and the interactive record data generation unit is configured to generate interactive record data corresponding to the behavior data based on the association between the speaking user and the voice recognition result.
  • the behavior data collection module is further used to: generate interactive record data corresponding to the behavior data by extracting information from the operation object in the operation information.
  • the behavior data collection module is also used for:
  • the multimedia data stream is a data stream generated based on a real-time interactive interface
  • the behavior data collection module is further used for: when receiving request information for generating an interactive record, collecting the behavior data of each user in real time based on the request information.
  • the behavior data collection module is also used to: receive the voice information of each user collected by the client, and/or receive request information corresponding to a trigger operation and determine the operation information corresponding to the request information.
  • the interactive record data generation module is also used to: when an operation triggering document sharing is detected, obtain the shared document and the associated information corresponding to the shared document; determine the operation information based on the trigger operation, the shared document, and the associated information, where the associated information includes the shared link of the shared document and/or the storage address of the shared document; and generate interactive record data corresponding to the behavior data based on the operation information.
  • the interactive record data generation module is also used to: when an operation triggering screen sharing is detected, identify the identifying information in the shared screen; determine the operation information based on the identifying information, the trigger operation, and the video frame of the shared screen; and generate interactive record data corresponding to the behavior data based on the operation information, where the identifying information includes a link in the shared screen.
  • the interactive record data generation module is also used for:
  • the interactive record data generating module further includes:
  • the language type determining unit is used to determine the target language type of the speaking user to which the voice information belongs; the interactive record data generating sub-module is used to process the voice information in the behavior data based on the target language type to generate the interactive record data.
  • the language type determining unit is also used to determine the target language type based on the language type of the speaking user to which the current client belongs.
  • the language type of the speaking user to which the current client belongs is determined by at least one of the following methods: recognizing the language type of the voice information in the behavior data to determine the user's language type; obtaining the language type preset on the client; or obtaining the login address of the client and determining the language type corresponding to the user based on the login address.
  • the device further includes an interactive recording data display module, configured to send the interactive recording data to a target client, so as to display the interactive recording data on the target client.
  • the device further includes an interactive record data display module, which is also used to display the interactive record data in the target area.
  • the target area is located at the periphery of the multimedia picture, or at a blank area in the video picture.
  • the interactive recording data display module is further configured to display the interactive recording data in the target area in the form of a bullet screen; the target area includes a blank area in the video frame.
  • the blank area is updated in real time based on the image information displayed on the display interface.
  • the device further includes: an interactive recording data storage module, configured to store the interactive recording data to a target location.
  • the interactive record data storage module is also used to store the interactive record data locally; and/or store the interactive record data in the cloud and generate a storage link corresponding to the interactive record data, so as to obtain the interactive record data based on the storage link.
  • the multimedia data stream includes a video data stream based on a multimedia conference, a video data stream based on a live video broadcast, or a video data stream based on a group chat.
  • the device provided in the embodiment of the present disclosure can execute the method provided in any embodiment of the present disclosure, and has functional modules and beneficial effects corresponding to the execution method.
  • FIG. 6 shows a schematic structural diagram of an electronic device (for example, the terminal device or the server in FIG. 6) 600 suitable for implementing the embodiments of the present disclosure.
  • Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablets), PMPs (portable multimedia players), and vehicle-mounted terminals (e.g., car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
  • the electronic device shown in FIG. 6 is only an example, and should not bring any limitation to the function and scope of use of the embodiments of the present disclosure.
  • the electronic device 600 may include a processing device (such as a central processing unit or a graphics processor) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 606 into a random access memory (RAM) 603.
  • in the RAM 603, various programs and data required for the operation of the electronic device 600 are also stored.
  • the processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also connected to the bus 604.
  • the following devices can be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 607 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; storage devices 606 including, for example, a magnetic tape and a hard disk; and a communication device 609.
  • the communication device 609 may allow the electronic device 600 to perform wireless or wired communication with other devices to exchange data.
  • although FIG. 6 shows an electronic device 600 having various devices, it should be understood that it is not required to implement or have all of the illustrated devices; more or fewer devices may alternatively be implemented or provided.
  • an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a non-transitory computer readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from the network through the communication device 609, or installed from the storage device 606, or installed from the ROM 602.
  • when the computer program is executed by the processing device 601, the above-mentioned functions defined in the method of the embodiment of the present disclosure are executed.
  • the electronic device provided in the embodiments of the present disclosure and the interactive record generation method provided in the above embodiments belong to the same inventive concept.
  • for technical details not described in detail in this embodiment, please refer to the above embodiments; this embodiment has the same beneficial effects as the above embodiments.
  • the embodiments of the present disclosure provide a computer storage medium on which a computer program is stored, and when the program is executed by a processor, the interactive record generation method provided in the above-mentioned embodiments is implemented.
  • the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, and a computer-readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium.
  • the computer-readable signal medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: wire, optical cable, RF (Radio Frequency), etc., or any suitable combination of the above.
  • the client and server can communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and can interconnect with digital data communication in any form or medium (e.g., a communication network).
  • examples of communication networks include local area networks ("LAN"), wide area networks ("WAN"), internetworks (for example, the Internet), and peer-to-peer networks (for example, ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or it may exist alone without being assembled into the electronic device.
  • the aforementioned computer-readable medium carries one or more programs, and when the aforementioned one or more programs are executed by the electronic device, the electronic device:
  • the computer program code used to perform the operations of the present disclosure can be written in one or more programming languages or a combination thereof.
  • the above-mentioned programming languages include, but are not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computer, partly on the user's computer, executed as an independent software package, partly on the user's computer and partly executed on a remote computer, or entirely executed on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, connected through the Internet using an Internet service provider).
  • each block in the flowchart or block diagram may represent a module, program segment, or part of code, and the module, program segment, or part of code contains one or more executable instructions for realizing the specified logical function.
  • the functions marked in the blocks may also occur in an order different from the order marked in the drawings. For example, two blocks shown in succession can actually be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagram and/or flowchart, and the combination of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or can be realized by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure can be implemented in software or hardware.
  • the name of the unit/module does not constitute a limitation on the unit itself under certain circumstances.
  • for example, a behavior data collection module can also be described as a "collection module".
  • exemplary types of hardware logic components include: Field Programmable Gate Array (FPGA), Application Specific Integrated Circuit (ASIC), Application Specific Standard Product (ASSP), System on Chip (SOC), Complex Programmable Logical device (CPLD) and so on.
  • a machine-readable medium may be a tangible medium, which may contain or store a program for use by the instruction execution system, apparatus, or device or in combination with the instruction execution system, apparatus, or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • the machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or device, or any suitable combination of the foregoing.
  • machine-readable storage media would include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the foregoing.
  • Example 1 provides a method for generating an interactive record, and the method includes:
  • Example 2 provides a method for generating an interactive record, which further includes:
  • the behavior data includes operation information
  • the generating interaction record data corresponding to the behavior data includes: determining the operation object and the operation behavior in the operation information, and generating interaction record data based on the association relationship between the operation object and the operation behavior.
  • Example 3 provides a method for generating an interactive record, which further includes:
  • the operation information includes document sharing operation information
  • the operation object includes a shared document
  • the operation behavior includes a document sharing behavior
  • the generating interaction record data corresponding to the behavior data based on the behavior data includes: determining a document sharing address and/or storage address associated with the shared document based on the shared document, and generating the interactive record data based on the sharing address and/or storage address.
  • Example 4 provides a method for generating an interactive record, which further includes:
  • the operation information includes screen sharing operation information
  • the operation object includes a shared screen
  • the operation behavior includes a sharing behavior on the shared screen
  • the generating interaction record data corresponding to the behavior data includes:
  • the identification information in the shared screen is determined based on the shared screen, and the interaction record data is generated based on the identification information.
  • Example 5 provides a method for generating an interactive record, which further includes:
  • the interactive interface is a screen-recording video of a real-time interactive interface
  • the collection of user behavior data represented by the multimedia data stream includes: collecting the voice information and the operation information in the screen-recorded video.
  • Example 6 provides a method for generating an interactive record, which further includes:
  • the behavior data includes voice information
  • the generating interaction record data corresponding to the behavior data based on the behavior data includes: performing voiceprint recognition on the voice information to determine the speaking user corresponding to the voice information; performing speech recognition on the voice information to obtain a speech recognition result; and generating the interaction record data corresponding to the behavior data based on the association between the speaking user and the speech recognition result.
  • Example 7 provides a method for generating an interactive record, which further includes:
  • the generating interaction record data corresponding to the behavior data based on the behavior data includes: generating the interaction record data corresponding to the behavior data by extracting information from the operation object in the operation information.
  • Example 8 provides a method for generating an interactive record, which further includes:
  • the generating interaction record data corresponding to the behavior data by extracting information from the operation object in the operation information includes: determining, based on image recognition, target elements in a target image corresponding to the operation information; and generating, based on the target elements, the interaction record corresponding to the behavior data; wherein the target image includes an image corresponding to a shared document and/or an image corresponding to a shared screen.
  • Example 9 provides a method for generating an interactive record, which further includes:
  • the multimedia data stream is a data stream generated based on a real-time interactive interface
  • the collection of user behavior data represented by the multimedia data stream from the multimedia data stream includes:
  • when request information for generating an interaction record is received, the behavior data of each user is collected in real time based on the request information.
  • Example 10 provides a method for generating an interactive record, which further includes:
  • the collecting user behavior data represented by the multimedia data stream includes: receiving the voice information of each user collected by the client, and/or receiving request information corresponding to a trigger operation and determining the operation information corresponding to the request information.
  • Example 11 provides a method for generating an interactive record, which further includes:
  • the determining the operation object and the operation behavior in the operation information, and generating interaction record data based on the association relationship between the operation object and the operation behavior includes: when an operation triggering document sharing is detected, obtaining the shared document and the associated information corresponding to the shared document; determining the operation information based on the trigger operation, the shared document and the associated information, where the associated information includes the sharing link of the shared document and/or the storage address of the shared document; and generating, based on the operation information, the interaction record data corresponding to the behavior data.
  • Example 12 provides a method for generating an interactive record, which further includes:
  • the determining the operation object and the operation behavior in the operation information, and generating interaction record data based on the association relationship between the operation object and the operation behavior includes: when an operation triggering screen sharing is detected, recognizing the identifying information in the shared screen; determining the operation information based on the identifying information, the trigger operation and the video frames of the shared screen; and generating, based on the operation information, the interaction record data corresponding to the behavior data; wherein the identifying information includes a link in the shared screen.
  • Example 13 provides a method for generating an interactive record, which further includes:
  • the generating interaction record data corresponding to the behavior data based on the behavior data includes: performing speech recognition on the voice information and generating the interaction record data based on the obtained speech recognition result.
  • Example 14 provides a method for generating an interactive record, which further includes:
  • the performing voice recognition on the voice information and generating the interactive record data based on the obtained voice recognition result includes: determining the target language type of the speaking user to which the voice information belongs; and processing the voice information in the behavior data based on the target language type to generate the interactive record data.
  • Example 15 provides a method for generating an interactive record, which further includes:
  • the determining the target language type of the speaking user to which the voice information belongs includes: determining the target language type based on the language type of the speaking user to which the current client belongs.
  • Example 16 provides a method for generating an interactive record, which further includes:
  • the language type of the speaking user to which the current client belongs is determined in at least one of the following ways: performing language type recognition on the voice information in the behavior data to determine the user's language type; obtaining the language type preset on the client; or obtaining the login address of the client and determining the language type corresponding to the user based on the login address.
  • Example 17 provides a method for generating an interactive record, which further includes:
  • Example 18 provides a method for generating an interactive record, which further includes:
  • the acquiring historical interaction record data when a new user is detected, and pushing the historical interaction record data to the client of the newly added user includes:
  • when a new user is detected, the target language type of the new user is determined
  • historical interaction record data is acquired according to the target language type of the new user
  • the historical interaction record data is converted into interaction record data of the same target language type as the newly added user, and the converted interaction record data is sent to the client corresponding to the newly added user.
  • Example 19 provides a method for generating an interactive record, which further includes:
  • the interaction record data is sent to the target client terminal to display the interaction record data on the target client terminal.
  • Example 20 provides a method for generating an interactive record, which further includes:
  • the displaying the interaction record data on the target client terminal includes:
  • the interactive record data is displayed in the target area.
  • Example 21 provides a method for generating an interactive record, which further includes:
  • the target area is located on the periphery of the multimedia picture, or located in a blank area in the video picture.
  • Example 22 provides a method for generating an interactive record, which further includes:
  • the displaying the interactive record data in the target area includes:
  • the target area includes a blank area in the video frame.
  • Example 23 provides a method for generating an interactive record, which further includes:
  • the blank area is updated in real time based on image information displayed on the display interface.
  • Example 24 provides a method for generating an interactive record, which further includes:
  • the interaction record data is stored in a target location.
  • Example 25 provides a method for generating an interactive record, which further includes:
  • the storing the interaction record data to a target location includes:
  • the interaction record data is stored in the cloud, and a storage link corresponding to the interaction record data is generated, so as to obtain the interaction record data based on the storage link.
  • Example 26 provides a method for generating an interactive record, which further includes:
  • the multimedia data stream includes a video data stream generated based on a multimedia conference, a video data stream generated based on a live video broadcast, or a video data stream generated during a group chat.
  • Example 27 provides an interactive record generating device, which includes:
  • a behavior data collection module configured to collect user behavior data represented by the multimedia data stream from the multimedia data stream, the behavior data including voice information and/or operation information;
  • the interactive record data generating module is configured to generate interactive record data corresponding to the behavior data based on the behavior data.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Social Psychology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Artificial Intelligence (AREA)
  • User Interface Of Digital Computer (AREA)
  • Information Transfer Between Computers (AREA)
  • Management Or Editing Of Information On Record Carriers (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

Embodiments of the present disclosure disclose a method, apparatus, device and medium for generating an interaction record. The method includes: first collecting, from a multimedia data stream, behavior data of users represented by the multimedia data stream, where the behavior data includes voice information and/or operation information; and then generating, based on the behavior data, interaction record data corresponding to the behavior data. In the technical solution of the embodiments of the present disclosure, voice information and/or operation information is collected from the multimedia data stream and interaction record data is generated from it, so that interacting users can determine the interaction information from the interaction record data, which improves the interaction efficiency of the users and thereby also improves the user experience.

Description

Method, apparatus, device and medium for generating an interaction record
This application claims priority to Chinese patent application No. 202010366930.4, entitled "Method, apparatus, device and medium for generating an interaction record", filed on April 30, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present disclosure relate to the field of computer data processing technology, and in particular to a method, apparatus, device and medium for generating an interaction record.
Background
At present, whether in real-time interaction or in a screen-recorded video, the server can receive the voice information of each speaking user, process the voice information and then play it.
In practice, however, when the content of a speech cannot be determined from the voice information alone, the only options are to ask the speaking user to repeat what has already been said, to guess the intended meaning from what the speaking user says next, or to replay the screen recording to confirm a speaker's main point. Any of these approaches leads to the technical problems of low interaction efficiency and a poor user experience.
Summary
Embodiments of the present disclosure provide a method, apparatus, device and medium for generating an interaction record, so as to optimize the way an interaction process is recorded and thereby improve the efficiency of interactive communication.
In a first aspect, an embodiment of the present disclosure provides a method for generating an interaction record, the method including:
collecting, from a multimedia data stream, behavior data of users represented by the multimedia data stream, the behavior data including voice information and/or operation information;
generating, based on the behavior data, interaction record data corresponding to the behavior data.
In a second aspect, an embodiment of the present disclosure further provides an apparatus for generating an interaction record, the apparatus including:
a behavior data collection module configured to collect, from a multimedia data stream, behavior data of users represented by the multimedia data stream, the behavior data including voice information and/or operation information;
an interaction record data generation module configured to generate, based on the behavior data, interaction record data corresponding to the behavior data.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, including:
one or more processors;
a storage apparatus configured to store one or more programs,
which, when executed by the one or more processors, cause the one or more processors to implement the method for generating an interaction record according to any embodiment of the present disclosure.
In a fourth aspect, an embodiment of the present disclosure further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform the method for generating an interaction record according to any embodiment of the present disclosure.
In the technical solution of the embodiments of the present disclosure, voice information and/or operation information is collected from the multimedia data stream and interaction record data is generated based on it, so that interacting users can determine the interaction information from the interaction record data, which improves the interaction efficiency of the users and thereby also improves the user experience.
Brief Description of the Drawings
The above and other features, advantages and aspects of the embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, identical or similar reference numerals denote identical or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.
FIG. 1 is a schematic flowchart of a method for generating an interaction record according to Embodiment 1 of the present disclosure;
FIG. 2 is a schematic flowchart of a method for generating an interaction record according to Embodiment 2 of the present disclosure;
FIG. 3 is a schematic flowchart of a method for generating an interaction record according to Embodiment 3 of the present disclosure;
FIG. 4 is a schematic flowchart of a method for generating an interaction record according to Embodiment 4 of the present disclosure;
FIG. 5 is a schematic structural diagram of an apparatus for generating an interaction record according to Embodiment 5 of the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device according to Embodiment 6 of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of protection of the present disclosure.
It should be understood that the steps described in the method implementations of the present disclosure may be performed in different orders and/or in parallel. Furthermore, method implementations may include additional steps and/or omit performing the steps shown. The scope of the present disclosure is not limited in this respect.
As used herein, the term "include" and its variants are open-ended, i.e. "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one further embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different apparatuses, modules or units, and are not intended to limit the order of, or interdependence between, the functions performed by these apparatuses, modules or units.
It should be noted that the modifiers "a/an" and "a plurality of" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
Embodiment 1
FIG. 1 is a schematic flowchart of a method for generating an interaction record according to Embodiment 1 of the present disclosure. This embodiment of the present disclosure is applicable to the case of generating interaction record data based on users' interaction information in Internet-supported interactive application scenarios. The method may be performed by an apparatus for generating an interaction record, which may be implemented in the form of software and/or hardware, optionally by an electronic device such as a mobile terminal, a PC or a server. An interactive scenario is usually implemented by a client and a server in cooperation, and the method provided in this embodiment may be performed by the server, or by the client and the server together.
As shown in FIG. 1, the method of this embodiment includes:
S110: Collect, from a multimedia data stream, behavior data of users represented by the multimedia data stream.
The multimedia data stream may be video stream data corresponding to a real-time interactive interface, or video stream data from a screen recording of the real-time interactive interface. The real-time interactive interface is any interactive interface in a real-time interactive application scenario. A real-time interactive scenario can be implemented by means of the Internet and computers, for example through an interactive application implemented as a native program or a web program. In a real-time interactive interface, multiple users may be allowed to interact through various forms of interactive behavior, such as entering text, voice, video or sharing. The behavior data may include the various data involved in the interactive behavior, for example the type of interactive behavior and the specific content involved in it.
Thus, the voice data and/or behavior data of each participating user can be collected from the multimedia data stream corresponding to the interactive interface, so as to generate interaction record data corresponding to the behavior data.
S120: Generate, based on the behavior data, interaction record data corresponding to the behavior data.
The interaction record data corresponds to the collected behavior data. The interaction record data may be the result of converting the voice information of the interactive behavior and the specific content involved in it into a corresponding textual representation. Optionally, the interaction record data may be an interaction record text corresponding to the behavior data; the interaction record text may contain a textual representation corresponding to the voice information, or a textual representation converted from the operation information, or both a textual representation corresponding to each piece of voice information and a textual representation corresponding to the operation information. The operation of generating interaction record data may be performed by each client itself, or by the server processing the behavior data of all users uniformly.
Specifically, after obtaining the voice information and/or operation information in the behavior data, the server can process each piece of behavior data and obtain the textual representations corresponding to the voice information and the operation information respectively, i.e. generate the interaction record data corresponding to the behavior data.
The benefit of generating interaction record data is as follows. During a video conference or a live broadcast, when a user cannot understand, or misses, what another speaking user has said, the behavior data of each speaking user can be collected and the corresponding interaction record data generated from it, so that the user can look up the speech information of the other speaking users in the interaction record data and determine each speaker's main point, which achieves the technical effect of improving user interaction efficiency and user experience during the interaction. Furthermore, when a speaker's voice information cannot be determined from a screen-recorded video, the user has to trigger playback manually; for example, each press of the replay control may rewind the video by five seconds, or the progress bar has to be dragged to control the playback position. Since such methods cannot accurately locate the video frames in which a speaking user made a statement, the user has to perform many manual operations, which not only increases labor cost but also reduces interaction efficiency. The solution of the embodiments of the present disclosure, by contrast, can generate interaction record data corresponding to the screen-recorded video, so that each speaker's main point can be determined intuitively from the interaction record data, improving the convenience and efficiency of the interaction.
For example, suppose the voice information of two users and the operation information triggered during a video conference are obtained from the multimedia data stream; the operation information may be, for instance, that a certain passage of text in a document was triggered. The voice information and operation information can then be converted into the corresponding textual representations, the passage of text triggered by the speaking user can also be obtained, and the textual representations and the text content can be used as the interaction record data.
To make it easy for a speaking user to determine the voice information and operation information of the other speaking users from the interaction record data, the data recorded in the interaction record data may include the speaking user's identifier, the speaking time and the corresponding textual representation; for example, an entry in the interaction record data may be "ID1-20:00-I agree with this".
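The "user ID - time - text" entry format above can be sketched with a tiny helper. This is only an illustration: the entry layout, field names and sort key are assumptions for the example, not a format required by the disclosure.

```python
from dataclasses import dataclass


@dataclass
class RecordEntry:
    user_id: str    # speaking-user identifier, e.g. "ID1"
    timestamp: str  # offset into the interaction, e.g. "20:00"
    text: str       # transcribed speech or operation description

    def render(self) -> str:
        # One line of the interaction record, e.g. "ID1-20:00-I agree with this".
        return f"{self.user_id}-{self.timestamp}-{self.text}"


def build_record(entries):
    # Sort by timestamp so the record reads chronologically.
    return [e.render() for e in sorted(entries, key=lambda e: e.timestamp)]


entries = [
    RecordEntry("ID2", "20:05", "shared document A"),
    RecordEntry("ID1", "20:00", "I agree with this"),
]
record = build_record(entries)
```

A record built this way can hold both speech transcripts and operation descriptions, since both reduce to text in the entry.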
In the technical solution of the embodiment of the present disclosure, voice information and/or operation information is collected from the multimedia data stream and interaction record data is generated based on it, so that interacting users can determine the interaction information from the interaction record data, which improves the interaction efficiency of the users and thereby also improves the user experience.
In the embodiment of the present disclosure, the behavior data includes operation information, and "generating interaction record data corresponding to the behavior data" in step S120 may include: determining the operation object and the operation behavior in the operation information, and generating the interaction record data based on the association relationship between the operation object and the operation behavior.
There may be many kinds of user operation behavior, and correspondingly many kinds of operation objects. The operation object and the operation behavior can be obtained from the operation and converted into the corresponding interaction record data.
In this embodiment, there are many kinds of user operation behavior. Typical user operation behavior data may include a sharing behavior and a shared object; the sharing behavior may be a document sharing operation and/or a screen sharing operation, and the shared object may be the specific content shared. Optionally, if the operation information includes document sharing operation information, the operation object includes a shared document and the operation behavior includes a document sharing behavior. On this basis, generating, based on the behavior data, the interaction record data corresponding to the behavior data includes: determining, based on the shared document, a document sharing address and/or storage address associated with the shared document, and generating the interaction record data based on the sharing address and/or storage address. If the operation information includes screen sharing operation information, the operation object includes a shared screen and the operation behavior includes a sharing behavior on the shared screen. On this basis, generating, based on the behavior data, the interaction record data corresponding to the behavior data includes: determining, based on the shared screen, identification information in the shared screen, and generating the interaction record data based on the identification information. Those skilled in the art will understand that in multimedia-based interactive application scenarios, user operation behavior may be, but is not limited to, the behaviors listed above; it may also include, for example, writing on a whiteboard.
Embodiment 2
FIG. 2 is a schematic flowchart of a method for generating an interaction record according to Embodiment 2 of the present disclosure. This embodiment builds on the preceding embodiment. The multimedia data stream may be determined based on a real-time interactive interface, or based on a screen-recorded video. When the multimedia data stream is obtained in different ways, the collection of the user behavior data it represents also differs, and accordingly there are certain differences in how the corresponding interaction record data is generated. In this embodiment, the case where the multimedia data stream is determined based on a real-time interactive interface is taken as an example.
As shown in FIG. 2, the method of this embodiment includes:
S210: When request information for generating an interaction record is received, collect the behavior data of each user based on the request information.
When generating interaction record data based on a real-time interactive interface, it can be detected whether a speaking user has triggered a target control; optionally, the target control may be a control for generating interaction record data. If the target control has been triggered, the behavior data of each speaking user is collected; otherwise it is not. If the server generates the interaction record, then specifically, during the real-time interaction (for example during a video conference), if a user triggers the interaction record generation control on the client, the client can generate request information for the interaction record based on that trigger operation and send the request information to the server. After receiving the request information, the server can start collecting the voice information and/or operation information of each interacting user in real time based on the request information.
In this embodiment, collecting the user behavior data represented by the multimedia data stream includes: receiving the voice information of each user collected by the client, and/or receiving request information corresponding to a trigger operation and determining the operation information corresponding to the request information.
Specifically, the users participating in the real-time interaction may be called interacting users or speaking users. If interacting user A makes some statements by voice, the client corresponding to user A can collect user A's voice data; and/or, if the server receives request information corresponding to a trigger operation, it can determine, based on the request information, the operation triggered by the interacting user on the client, and further determine the operation object and operation behavior corresponding to the trigger operation, so that the interaction record can be generated based on them.
S220: Generate, based on the behavior data, an interaction record corresponding to the behavior data.
Optionally, if the behavior data includes operation information, determining the operation object and operation behavior in the operation information and generating the interaction record data based on their association relationship includes: when an operation triggering document sharing is detected, first obtaining the shared document and the associated information corresponding to the shared document; then determining the operation information based on the trigger operation, the shared document and the associated information, where the associated information includes the sharing link of the shared document and/or the storage address of the shared document; and then generating, based on the operation information, the interaction record data corresponding to the behavior data.
If the trigger operation is a document sharing operation, the operation object in the operation information is the shared document and the operation behavior is the document sharing operation. In this case, generating the interaction record based on the operation information may be: when the triggering of the sharing control is detected, obtaining the shared document in the multimedia video stream and determining the sharing link corresponding to the shared document, or the storage address of the shared document. An entry in the interaction record data can then be generated based on the document-sharing trigger operation, the shared document, and the link or storage address corresponding to it; for example, the entry corresponding to the sharing trigger operation may be: ID-sharing operation-shared document A-storage link http//xxxxxxxxxx.com.
Optionally, if the behavior data includes operation information and the operation information is screen sharing, generating the interaction record data corresponding to the operation information includes: when an operation triggering screen sharing is detected, first recognizing the identifying information in the shared screen; then using the identifying information, the screen-sharing trigger operation and the shared screen as the operation information, so as to generate the interaction record data based on the operation information; where the identifying information includes a link in the shared screen.
If the trigger operation is a screen sharing operation, the operation object in the operation information is the shared screen and the operation behavior is the screen sharing operation. Generating the interaction record based on the operation information may then be: when the triggering of the sharing control is detected, obtaining the shared screen in the multimedia video stream and extracting the identifying information in the shared screen; for example, if the content displayed on the shared screen is a web page, the identifying information extracted from the shared screen may be the link of that web page. An entry in the interaction record data can then be generated based on the screen-sharing trigger operation, the shared screen and the identifying information in the shared screen.
In this embodiment, generating an interaction record corresponding to the operation information not only makes it possible to determine an interacting user's operation behavior, but also records the information in the operation information associated with the operation object, so that users can retrieve the corresponding shared document or shared-screen content based on the operation information recorded in the interaction record data, further improving the efficiency of real-time interaction.
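A hedged sketch of such a share-operation entry, with the link pulled out of whatever text is visible on the shared screen or attached to the shared document. The function name, field names and regex are assumptions for illustration; real identifying information would come from OCR or the sharing service itself.

```python
import re

URL_RE = re.compile(r"https?://\S+")


def record_share_operation(user_id, op_type, obj_name, visible_text):
    """Build one record entry for a share operation.

    visible_text is any text recognized on the shared screen or attached
    to the shared document; a URL found in it is kept as the entry's
    identifying information so the content can be retrieved later.
    """
    links = URL_RE.findall(visible_text)
    entry = {"user": user_id, "operation": op_type, "object": obj_name}
    if links:
        entry["link"] = links[0]
    return entry


e = record_share_operation("ID1", "screen_share", "browser window",
                           "Now showing https://example.com/report page")
```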
Of course, during real-time interaction (for example, during a video conference), the collected behavior data inevitably includes the voice information of each interacting user. Optionally, if the behavior data includes voice information, generating the interaction record data corresponding to the behavior data based on the behavior data may include: performing speech recognition on the voice information and generating the interaction record data based on the recognition result.
Recognition of the voice information may include voiceprint recognition; based on the voiceprint recognition result, the identity of each speaking user can be determined. Recognition of the voice information may also include language type recognition, by which the target language type of the speaking user to whom the voice information belongs can be determined, so that the voice information of all speaking users can be translated into textual representations in the same target language type. The benefit of this is that voice data in other languages can be translated into interaction record data in the target language type, making it easy for users to use the interaction record data to understand what other users have said, thereby improving communication efficiency during the interaction.
Optionally, performing speech recognition on the voice information and generating the interaction record data based on the speech recognition result includes: determining the target language type of the speaking user to whom the voice information belongs, and processing the voice information in the behavior data based on the target language type to generate the interaction record data.
It should be noted that the user corresponding to each client may be regarded as a speaking user, and the corresponding client as the target client. The target language type may be the language type currently used by the target speaking user, or the language type preset by the target speaking user on the client.
Specifically, based on the voice information, the target language type of the speaking user to whom each piece of voice information belongs can be determined, and the voice data converted into interaction record data in that target language type.
Since the target language type may be the language used by the speaking user, or the language the speaking user preset on the client, i.e. a language the speaking user is familiar with, converting the voice data of the other speaking users into interaction record data in the target language type improves the speaking user's efficiency in reading the interaction record data and helps the speaking user understand the other users' voice information conveniently and accurately, achieving the technical effect of improving interaction efficiency.
In this embodiment, determining the target language type of the speaking user to whom the voice information belongs includes: determining the target language type based on the language type of the speaking user to whom the current client belongs.
That is, the target language type can be determined according to the language type of the speaking user of each client.
In this embodiment, the language type of the speaking user to whom the current client belongs is determined in at least one of the following ways: performing language type recognition on the voice information in the behavior data to determine the user's language type; obtaining the language type preset on the client; obtaining the login address of the client and determining the language type corresponding to the user based on the login address.
The first way may be: first obtaining the voice information in the behavior data, then determining the language type of the corresponding speaking user based on the voice information, and then using this language type as the target language type. For example, if the voice information of speaking user A is collected and, after processing, user A's language type is determined to be Chinese, then the Chinese language type can be used as the target language type.
Illustratively, suppose three users participate in a real-time interaction, optionally a video conference, and the three users are labeled user A, user B and user C; user A's language type is Chinese, user B's is English and user C's is Japanese. When user A triggers the subtitle display control, the voice information of users A, B and C can be collected; by processing user A's voice information, user A's language type is determined to be Chinese, which can then be used as the target language type; the voice information of users B and C can be translated into Chinese and the operation information converted into the corresponding Chinese, i.e. the interaction record data is the data obtained by converting each user's behavior data into Chinese. Of course, if user B triggers the subtitle display control, the target language type can be determined to be English based on user B's voice information, and correspondingly the voice information of users A and C can be translated into English. In other words, the speaking user's language type can be used as the target language type, and the voice information of the other speaking users translated into that target language type to form the interaction record data.
The second way may be: when a speaking user triggers the subtitle display operation, the subtitle language type is set and the set language type is used as the target language type. Illustratively, when the speaking user triggers the subtitle display control, a language selection list may pop up for the user to choose from. The user may choose any language type; for example, if the user selects the Chinese language type in the list and clicks the confirm button, the server or client can determine that the speaking user has selected Chinese and use the Chinese language type as the target language type.
The third way may be: when it is detected that a speaking user has triggered the subtitle display control, the login address of the client, i.e. the client's IP address, is obtained, the region to which the client belongs is determined from the login address, and the language type used in that region is used as the target language type. For example, when the user triggers the subtitle display control, the client's login address can be obtained; if the client is determined, based on the login address, to belong to the region China, the target language type is Chinese.
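The three determination paths read naturally as a fallback chain: recognized speech language, then the client's preset, then the login-address region. A minimal sketch follows; the region-to-language table and the `ip_to_region` lookup are placeholders standing in for a real geolocation service, not an API named by the disclosure.

```python
def determine_target_language(detected=None, preset=None,
                              login_ip=None, ip_to_region=None):
    """Pick the target language type using the three ways described above."""
    # Way 1: language type recognized from the user's own voice information.
    if detected:
        return detected
    # Way 2: language type preset on the client.
    if preset:
        return preset
    # Way 3: fall back to the region of the client's login address.
    region_lang = {"CN": "zh", "US": "en", "JP": "ja"}  # illustrative table
    if login_ip and ip_to_region:
        region = ip_to_region(login_ip)
        return region_lang.get(region, "en")
    return "en"  # arbitrary default for the sketch
```

Usage mirrors the example in the text: a client logging in from China with no recognized or preset language would get Chinese as the target language type.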
In this embodiment, determining the target language type corresponding to each speaking user and converting the voice information of the other speaking users into that target language type makes the generated interaction record data conform to each speaking user's reading habits, so that users can quickly understand what the other speaking users have said, achieving the technical effect of improving interaction efficiency.
It should be noted that, in the process of collecting the behavior data of each speaking user and generating the interaction record, when it is detected that the user has triggered the control for pausing collection, the user's behavior data can stop being collected and the corresponding interaction record data stop being generated.
In the technical solution of the embodiment of the present disclosure, the behavior data of each speaking user is collected during the real-time interaction and converted into interaction record data in the target language type, making it easy for a speaking user to understand the voice information of other speaking users based on the interaction record data and to review the interaction based on it afterwards, which not only improves interaction efficiency but also improves the summarizing of meeting content.
On the basis of the above technical solutions, when it is detected that a new interacting user, i.e. a new speaking user, has been added to the interactive interface, the historical interaction record data in the current interactive interface is obtained and sent to the new user's client.
That is, during real-time interaction (such as a video conference), if a speaking user is added, then when the target language type of the new speaking user is determined, the historical interaction record data can be obtained, converted into interaction record data in that target language type, and sent to the client corresponding to the new user for the new user to read. The benefit of this arrangement is that when a new user joins the interaction, they can promptly learn the historical speech information of each speaking user and determine the views or statements each speaker holds, which facilitates effective communication with the other speaking users.
Embodiment 3
FIG. 3 is a schematic flowchart of a method for generating an interaction record according to Embodiment 3 of the present disclosure. On the basis of Embodiment 1, the multimedia data stream may also be determined based on a screen-recorded video; accordingly, the collection of behavior data and the generation of interaction record data can be specifically optimized.
As shown in FIG. 3, the method includes:
S310: Collect the voice information and operation information in the screen-recorded video.
During real-time interaction, a screen recording device can be used to record the interaction process to obtain a screen-recorded video. For example, during a video conference, the conference is recorded and the recording is used as the screen-recorded video. Based on the screen-recorded video, the voice information and operation information of each speaking user can be determined.
Specifically, when interaction record data needs to be generated from a screen-recorded video, the user may first trigger the interaction record data generation control, and based on the user's trigger operation the voice information and operation information can be collected from the multimedia data stream of the screen-recorded video.
S320: Generate, based on the behavior data, interaction record data corresponding to the behavior data.
Optionally, if the collected screen-recorded video includes voice data, processing the voice data may be: first performing voiceprint recognition on the voice information to determine the speaking user corresponding to the voice information; at the same time performing speech recognition on the voice information to obtain a speech recognition result; and then generating, based on the association between the speaking user and the speech recognition result, the interaction record data corresponding to the behavior data.
Usually a client has a corresponding client account or client ID, so that different speaking users can be distinguished by their client accounts. From a screen-recorded video, however, speaking users cannot be distinguished by client ID. Voiceprint recognition can therefore be performed on each speaking user's voice information; since each speaking user's voice has a unique voiceprint, different speaking users can be distinguished on that basis. The interaction record data may then include the translation corresponding to user A's voice data and the translation corresponding to user B's voice data.
Specifically, each piece of voice information in the screen-recorded video is collected; the identities of the different speaking users are determined by voiceprint recognition of the voices; the translation data corresponding to the voice data is determined by analyzing and processing the voice data; and the speaking user's identity is associated with the translation data to determine the interaction record data corresponding to the behavior data. The interaction record data may take the form: speaking user A - translated behavior data; speaking user B - translated behavior data; and so on.
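The association step above might be sketched as follows. Here `identify_speaker` and `transcribe` are injected stand-ins for real voiceprint and speech-recognition services, which the disclosure does not name; the toy segments below simulate their outputs.

```python
def associate_transcripts(segments, identify_speaker, transcribe):
    """Pair each audio segment's speaker identity with its transcript.

    segments: iterable of audio segments in playback order.
    identify_speaker(seg) -> speaker label (voiceprint recognition).
    transcribe(seg) -> text (speech recognition).
    """
    record = []
    for seg in segments:
        speaker = identify_speaker(seg)  # who spoke this segment
        text = transcribe(seg)           # what was said
        record.append(f"{speaker}: {text}")
    return record


# Toy stand-ins: each "segment" is a (voiceprint_tag, text) tuple.
segs = [("vpA", "hello"), ("vpB", "hi there")]
known = {"vpA": "User A", "vpB": "User B"}
rec = associate_transcripts(
    segs,
    identify_speaker=lambda s: known[s[0]],
    transcribe=lambda s: s[1],
)
```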
The screen-recorded video also includes the speaking users' operation information. Processing the operation information may be: generating the interaction record data corresponding to the behavior data by extracting information from the operation object in the operation information.
Optionally, extracting information from the operation object in the operation information to generate the interaction record data corresponding to the behavior data may include: first determining, based on image recognition, target elements in the target image corresponding to the operation information; then generating, based on the target elements, the interaction record corresponding to the behavior data; where the target image includes an image corresponding to a shared document and/or an image corresponding to a shared screen, and a target element may be at least one of identifying pieces of information such as a target link, a target storage address, or a movie or TV series title. The interaction record data can then be generated based on such information (i.e. the interaction record data includes information such as the target link, the target storage address and the movie or TV series title).
In the technical solution of the embodiment of the present disclosure, by collecting the behavior data of each speaking user in the screen-recorded video and generating the corresponding interaction record data, users can determine each speaker's main point by browsing the interaction record data. This avoids the problems that arise in the related art when the screen recording has to be replayed to determine each speaker's main point: for example, if there is a long pause in the recording, the user has to wait for a while or press the fast-forward button, and fast-forwarding cannot accurately locate the position the user wants to browse, which wastes time and makes reviewing the real-time interaction inefficient. The solution thus converts the screen-recorded video into corresponding interaction record text that can be browsed quickly, allowing each speaker's main point to be understood promptly and conveniently and achieving the technical effect of saving time.
Embodiment 4
On the basis of the above embodiments, after the interaction record data is generated it can be displayed on the display interface. FIG. 4 is a schematic flowchart of a method for generating an interaction record according to Embodiment 4 of the present disclosure. As shown in FIG. 4, the method includes:
S410: Collect, from a multimedia data stream, behavior data of users represented by the multimedia data stream, where the behavior data includes voice information and/or operation information.
S420: Generate, based on the behavior data, interaction record data corresponding to the behavior data.
S430: Send the interaction record data to a target client so as to display the interaction record data on the target client.
The client corresponding to a speaking user is used as the target client.
Specifically, after the interaction record data is determined, it can be sent to the target client for display there.
When displayed on the client, the interaction record data can be shown in a target area. Optionally, the interaction record data is displayed in the target area.
The display area of the interaction record can be preset, and the preset display area used as the target area. The target area may, for example, be an area around the main interaction area, such as the top, bottom or side. For example, in a video conference scenario, the video interaction window is the main interaction area and occupies two thirds of the screen; the area displaying the interaction record data may be the one-third area at the side, which accordingly is the target area, and the interaction record data can be displayed in that one-third side area. Optionally, displaying the interaction record data in the target area includes: displaying the interaction record data in the target area in the form of bullet-screen comments, where the target area includes a blank area in the video frame.
The blank area may be an area of the interactive interface that contains no elements, such as text or avatars. The blank area may be preset so that the interaction record data is placed in the target area. Of course, the blank area may also be updated in real time, with the specific update following the image information displayed on the display interface in real time.
That is, the server can monitor the interactive interface in real time, for example a video conference interface, determine in real time, based on the elements displayed on the interface, the region of the display interface containing no elements, and use the region so determined as the blank area.
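A naive sketch of locating such a blank region, treating the frame as a 0/1 occupancy grid where 1 marks a pixel covered by some UI element. This is purely illustrative of the idea; a real implementation would work from the rendered UI layout or image analysis rather than a hand-built grid.

```python
def find_blank_region(frame, cell=4):
    """Return the top-left (row, col) of the first cell x cell block of the
    occupancy grid that contains no UI elements, or None if there is none.

    frame: 2D list of 0/1; 1 means the pixel is covered by an element.
    """
    rows, cols = len(frame), len(frame[0])
    for r in range(rows - cell + 1):
        for c in range(cols - cell + 1):
            if all(frame[r + i][c + j] == 0
                   for i in range(cell) for j in range(cell)):
                return (r, c)
    return None


# Toy frame: the top two rows are occupied (e.g. by a toolbar), the rest empty.
frame = [[1] * 8, [1] * 8] + [[0] * 8 for _ in range(4)]
spot = find_blank_region(frame, cell=4)
```

Re-running this per frame as elements move is what "updating the blank area in real time" amounts to in this sketch.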
On the basis of the above technical solutions, after the interaction record data is generated it can also be stored to a target location.
Optionally, the interaction record data can be stored locally; and/or stored in the cloud, with a storage link corresponding to the interaction record data generated so that the interaction record data can be obtained via the storage link.
In practice, for meeting review, for example when the content of a video conference needs to be reviewed or summarized after the conference, the interaction record data can be exported to the target location.
In this embodiment, the target location may be the cloud or local storage. Storing the interaction record data to the target location may then include: exporting the interaction record data locally; and/or storing the interaction record data in the cloud and generating a storage link corresponding to the interaction record data, so that the interaction record data can be obtained via the storage link.
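The cloud path can be sketched with an in-memory stand-in that hands back a storage link for later retrieval. The endpoint URL, hashing scheme and class name are invented for the example; any real cloud store with put/get semantics would fit the same shape.

```python
import hashlib


class RecordStore:
    """Toy in-memory stand-in for the cloud store described above."""

    def __init__(self, base="https://storage.example.com/records/"):
        self.base = base   # hypothetical storage endpoint
        self._blobs = {}

    def save(self, record_text: str) -> str:
        # Content-addressed key; the returned link is what clients keep.
        key = hashlib.sha1(record_text.encode("utf-8")).hexdigest()[:12]
        self._blobs[key] = record_text
        return self.base + key

    def fetch(self, link: str) -> str:
        # Resolve the storage link back to the stored interaction record.
        return self._blobs[link.rsplit("/", 1)[-1]]


store = RecordStore()
link = store.save("ID1-20:00-I agree")
```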
In the technical solution of the embodiment of the present disclosure, the interaction record data can be displayed in the target area, so that users can browse the interaction record data shown on the interactive interface during real-time interaction, which improves the convenience of reading the interaction record data.
Embodiment 5
FIG. 5 is a schematic structural diagram of an apparatus for generating an interaction record according to Embodiment 5 of the present disclosure. As shown in FIG. 5, the apparatus includes: a behavior data collection module 510 and an interaction record data generation module 520.
The behavior data collection module 510 is configured to collect, from a multimedia data stream, behavior data of users represented by the multimedia data stream, the behavior data including voice information and/or operation information; the interaction record data generation module 520 is configured to generate, based on the behavior data, interaction record data corresponding to the behavior data.
In the technical solution of the embodiment of the present disclosure, voice information and/or operation information is collected from the multimedia data stream and interaction record data is generated based on it, so that interacting users can determine the interaction information from the interaction record data, achieving the technical effect of improving the interaction efficiency of the users and thereby the user experience.
On the basis of the above technical solutions, the behavior data includes operation information, and the interaction record data generation module is further configured to: determine the operation object and operation behavior in the operation information, and generate the interaction record data based on the association relationship between the operation object and the operation behavior.
On the basis of the above technical solutions, the operation information includes document sharing operation information, in which case the operation object includes a shared document and the operation behavior includes a document sharing behavior, and the interaction record data generation module is further configured to: determine, based on the shared document, a document sharing address and/or storage address associated with the shared document, and generate the interaction record data based on the sharing address and/or storage address.
On the basis of the above technical solutions, the operation information includes screen sharing operation information, in which case the operation object includes a shared screen and the operation behavior includes a sharing behavior on the shared screen, and the interaction record data generation module is further configured to: determine, based on the shared screen, identification information in the shared screen, and generate the interaction record data based on the identification information.
On the basis of the above technical solutions, the behavior data collection module is further configured to: collect the voice information and operation information in the screen-recorded video.
On the basis of the above technical solutions, the behavior data collection module further includes:
a speaking user determination unit configured to perform voiceprint recognition on the voice information and determine the speaking user corresponding to the voice information; a speech recognition unit configured to perform speech recognition on the voice information to obtain a speech recognition result; and an interaction record data generation unit configured to generate, based on the association between the speaking user and the speech recognition result, the interaction record data corresponding to the behavior data.
On the basis of the above technical solutions, the behavior data collection module is further configured to: generate the interaction record data corresponding to the behavior data by extracting information from the operation object in the operation information.
On the basis of the above technical solutions, the behavior data collection module is further configured to:
determine, based on image recognition, target elements in the target image corresponding to the operation information; and generate, based on the target elements, the interaction record corresponding to the behavior data; where the target image includes an image corresponding to a shared document and/or an image corresponding to a shared screen. On the basis of the above technical solutions, the multimedia data stream is a data stream generated based on a real-time interactive interface, and the behavior data collection module is further configured to: when request information for generating an interaction record is received, collect the behavior data of each user in real time based on the request information.
On the basis of the above technical solutions, the behavior data collection module is further configured to: receive the voice information of each user collected by the client, and/or receive request information corresponding to a trigger operation and determine the operation information corresponding to the request information.
On the basis of the above technical solutions, if the behavior data includes operation information, the interaction record data generation module is further configured to: when an operation triggering document sharing is detected, obtain the shared document and the associated information corresponding to the shared document; determine the operation information based on the trigger operation, the shared document and the associated information, where the associated information includes the sharing link of the shared document and/or the storage address of the shared document; and generate, based on the operation information, the interaction record data corresponding to the behavior data.
On the basis of the above technical solutions, if the behavior data includes operation information, the interaction record data generation module is further configured to: when an operation triggering screen sharing is detected, recognize the identifying information in the shared screen; determine the operation information based on the identifying information, the trigger operation and the video frames of the shared screen; and generate, based on the operation information, the interaction record data corresponding to the behavior data; where the identifying information includes a link in the shared screen.
On the basis of the above technical solutions, if the behavior data includes voice information, the interaction record data generation module is further configured to:
perform speech recognition on the voice information and generate the interaction record data based on the obtained speech recognition result.
On the basis of the above technical solutions, the interaction record data generation module further includes:
a language type determination unit configured to determine the target language type of the speaking user to whom the voice information belongs; and an interaction record data generation submodule configured to process the voice information in the behavior data based on the target language type to generate the interaction record data.
On the basis of the above technical solutions, the language type determination unit is further configured to determine the target language type based on the language type of the speaking user to whom the current client belongs.
On the basis of the above technical solutions, the language type of the speaking user to whom the current client belongs is determined in at least one of the following ways: performing language type recognition on the voice information in the behavior data to determine the user's language type; obtaining the language type preset on the client; obtaining the login address of the client and determining the language type corresponding to the user based on the login address.
On the basis of the above technical solutions, the apparatus further includes an interaction record data display module configured to send the interaction record data to a target client so as to display the interaction record data on the target client.
On the basis of the above technical solutions, the interaction record data display module is further configured to display the interaction record data in a target area.
On the basis of the above technical solutions, the target area is located on the periphery of the multimedia picture, or in a blank area in the video picture.
On the basis of the above technical solutions, the interaction record data display module is further configured to display the interaction record data in the target area in the form of bullet-screen comments; the target area includes a blank area in the video picture.
On the basis of the above technical solutions, the blank area is updated in real time based on the image information displayed on the display interface.
On the basis of the above technical solutions, the apparatus further includes: an interaction record data storage module configured to store the interaction record data to a target location.
On the basis of the above technical solutions, the interaction record data storage module is further configured to store the interaction record data locally; and/or store the interaction record data in the cloud and generate a storage link corresponding to the interaction record data, so that the interaction record data can be obtained via the storage link.
On the basis of the above technical solutions, the multimedia data stream includes a video data stream based on a multimedia conference, a video data stream based on a live video broadcast, or a video data stream of a group chat.
The apparatus provided by the embodiment of the present disclosure can perform the method provided by any embodiment of the present disclosure, and has the functional modules and beneficial effects corresponding to performing the method.
It is worth noting that the units and modules included in the above apparatus are divided only according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, the specific names of the functional units are only for ease of distinguishing them from one another and are not intended to limit the scope of protection of the embodiments of the present disclosure.
Embodiment 6
Referring now to FIG. 6, it shows a schematic structural diagram of an electronic device (e.g. the terminal device or server in FIG. 6) 600 suitable for implementing embodiments of the present disclosure. Terminal devices in embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and vehicle-mounted terminals (e.g. vehicle navigation terminals), and fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 6 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 6, the electronic device 600 may include a processing apparatus (e.g. a central processing unit, a graphics processing unit, etc.) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage apparatus 606 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the electronic device 600. The processing apparatus 601, the ROM 602 and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following apparatuses may be connected to the I/O interface 605: input apparatuses 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer and gyroscope; output apparatuses 607 including, for example, a liquid crystal display (LCD), speaker and vibrator; storage apparatuses 606 including, for example, a magnetic tape and hard disk; and a communication apparatus 609. The communication apparatus 609 may allow the electronic device 600 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 6 shows the electronic device 600 with various apparatuses, it should be understood that it is not required to implement or have all the apparatuses shown; more or fewer apparatuses may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via the communication apparatus 609, or installed from the storage apparatus 606, or installed from the ROM 602. When the computer program is executed by the processing apparatus 601, the above functions defined in the methods of the embodiments of the present disclosure are performed.
The electronic device provided by the embodiment of the present disclosure belongs to the same inventive concept as the method for generating an interaction record provided by the above embodiments; technical details not described in detail in this embodiment can be found in the above embodiments, and this embodiment has the same beneficial effects as the above embodiments.
Embodiment 7
An embodiment of the present disclosure provides a computer storage medium on which a computer program is stored, the program, when executed by a processor, implementing the method for generating an interaction record provided by the above embodiments.
It should be noted that the computer-readable medium described above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device. The program code contained on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to: an electric wire, an optical cable, RF (radio frequency), etc., or any suitable combination of the above.
In some implementations, the client and the server may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g. a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g. the Internet) and a peer-to-peer network (e.g. an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
The above computer-readable medium may be contained in the above electronic device, or may exist separately without being assembled into the electronic device.
The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to:
collect, from a multimedia data stream, behavior data of users represented by the multimedia data stream, the behavior data including voice information and/or operation information;
generate, based on the behavior data, interaction record data corresponding to the behavior data.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architecture, functions and operations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, program segment or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a unit/module does not in some cases constitute a limitation on the unit itself; for example, the behavior data collection module may also be described as a "collection module".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that can be used include: field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, [Example 1] provides a method for generating an interaction record, the method including:
collecting, from a multimedia data stream, behavior data of users represented by the multimedia data stream, the behavior data including voice information and/or operation information;
generating, based on the behavior data, interaction record data corresponding to the behavior data.
According to one or more embodiments of the present disclosure, [Example 2] provides a method for generating an interaction record, further including:
optionally, the behavior data includes operation information, and generating the interaction record data corresponding to the behavior data includes: determining the operation object and operation behavior in the operation information, and generating the interaction record data based on the association relationship between the operation object and the operation behavior.
According to one or more embodiments of the present disclosure, [Example 3] provides a method for generating an interaction record, further including:
optionally, the operation information includes document sharing operation information, in which case the operation object includes a shared document and the operation behavior includes a document sharing behavior, and generating, based on the behavior data, the interaction record data corresponding to the behavior data includes: determining, based on the shared document, a document sharing address and/or storage address associated with the shared document, and generating the interaction record data based on the sharing address and/or storage address.
According to one or more embodiments of the present disclosure, [Example 4] provides a method for generating an interaction record, further including:
optionally, the operation information includes screen sharing operation information, in which case the operation object includes a shared screen and the operation behavior includes a sharing behavior on the shared screen, and generating, based on the behavior data, the interaction record data corresponding to the behavior data includes:
determining, based on the shared screen, identification information in the shared screen, and generating the interaction record data based on the identification information.
According to one or more embodiments of the present disclosure, [Example 5] provides a method for generating an interaction record, further including:
optionally, the interactive interface is a screen-recorded video of a real-time interactive interface, and collecting the user behavior data represented by the multimedia data stream includes:
collecting the voice information and the operation information in the screen-recorded video.
According to one or more embodiments of the present disclosure, [Example 6] provides a method for generating an interaction record, further including:
optionally, the behavior data includes voice information, and generating, based on the behavior data, the interaction record data corresponding to the behavior data includes:
performing voiceprint recognition on the voice information to determine the speaking user corresponding to the voice information;
performing speech recognition on the voice information to obtain a speech recognition result;
generating, based on the association between the speaking user and the speech recognition result, the interaction record data corresponding to the behavior data.
According to one or more embodiments of the present disclosure, [Example 7] provides a method for generating an interaction record, further including:
optionally, generating, based on the behavior data, the interaction record data corresponding to the behavior data includes:
generating the interaction record data corresponding to the behavior data by extracting information from the operation object in the operation information.
According to one or more embodiments of the present disclosure, [Example 8] provides a method for generating an interaction record, further including:
optionally, generating the interaction record data corresponding to the behavior data by extracting information from the operation object in the operation information includes:
determining, based on image recognition, target elements in a target image corresponding to the operation information; generating, based on the target elements, the interaction record corresponding to the behavior data; where the target image includes an image corresponding to a shared document and/or an image corresponding to a shared screen.
According to one or more embodiments of the present disclosure, [Example 9] provides a method for generating an interaction record, further including:
optionally, the multimedia data stream is a data stream generated based on a real-time interactive interface, and collecting, from the multimedia data stream, the user behavior data represented by the multimedia data stream includes:
when request information for generating an interaction record is received, collecting the behavior data of each user in real time based on the request information.
According to one or more embodiments of the present disclosure, [Example 10] provides a method for generating an interaction record, further including:
optionally, collecting the user behavior data represented by the multimedia data stream includes:
receiving the voice information of each user collected by the client, and/or receiving request information corresponding to a trigger operation and determining the operation information corresponding to the request information.
According to one or more embodiments of the present disclosure, [Example 11] provides a method for generating an interaction record, further including:
optionally, if the behavior data includes operation information, determining the operation object and operation behavior in the operation information and generating the interaction record data based on the association relationship between the operation object and the operation behavior includes:
when an operation triggering document sharing is detected, obtaining the shared document and the associated information corresponding to the shared document;
determining the operation information based on the trigger operation, the shared document and the associated information; the associated information includes the sharing link of the shared document and/or the storage address of the shared document;
generating, based on the operation information, the interaction record data corresponding to the behavior data.
According to one or more embodiments of the present disclosure, [Example 12] provides a method for generating an interaction record, further including:
optionally, if the behavior data includes operation information, determining the operation object and operation behavior in the operation information and generating the interaction record data based on the association relationship between the operation object and the operation behavior includes:
when an operation triggering screen sharing is detected, recognizing the identifying information in the shared screen; determining the operation information based on the identifying information, the trigger operation and the video frames of the shared screen; generating, based on the operation information, the interaction record data corresponding to the behavior data; where the identifying information includes a link in the shared screen. According to one or more embodiments of the present disclosure, [Example 13] provides a method for generating an interaction record, further including:
optionally, if the behavior data includes voice information, generating, based on the behavior data, the interaction record data corresponding to the behavior data includes:
performing speech recognition on the voice information and generating the interaction record data based on the obtained speech recognition result.
According to one or more embodiments of the present disclosure, [Example 14] provides a method for generating an interaction record, further including:
optionally, performing speech recognition on the voice information and generating the interaction record data based on the obtained speech recognition result includes:
determining the target language type of the speaking user to whom the voice information belongs;
processing the voice information in the behavior data based on the target language type to generate the interaction record data.
According to one or more embodiments of the present disclosure, [Example 15] provides a method for generating an interaction record, further including:
optionally, determining the target language type of the speaking user to whom the voice information belongs includes:
determining the target language type based on the language type of the speaking user to whom the current client belongs.
According to one or more embodiments of the present disclosure, [Example 16] provides a method for generating an interaction record, further including:
optionally, the language type of the speaking user to whom the current client belongs is determined in at least one of the following ways:
performing language type recognition on the voice information in the behavior data to determine the user's language type;
obtaining the language type preset on the client;
obtaining the login address of the client and determining the language type corresponding to the user based on the login address.
According to one or more embodiments of the present disclosure, [Example 17] provides a method for generating an interaction record, further including:
when a new user is detected, obtaining historical interaction record data and pushing the historical interaction record data to the client of the new user.
According to one or more embodiments of the present disclosure, [Example 18] provides a method for generating an interaction record, further including:
optionally, when a new user is detected, obtaining historical interaction record data and pushing the historical interaction record data to the client of the new user includes:
when a new user is detected, determining the target language type of the new user;
obtaining historical interaction record data according to the target language type of the new user;
converting the historical interaction record data into interaction record data in the same target language type as the new user, and sending the converted interaction record data to the client corresponding to the new user.
According to one or more embodiments of the present disclosure, [Example 19] provides a method for generating an interaction record, further including:
optionally, sending the interaction record data to a target client so as to display the interaction record data on the target client.
According to one or more embodiments of the present disclosure, [Example 20] provides a method for generating an interaction record, further including:
optionally, displaying the interaction record data on the target client includes:
displaying the interaction record data in a target area.
According to one or more embodiments of the present disclosure, [Example 21] provides a method for generating an interaction record, further including:
optionally, the target area is located on the periphery of the multimedia picture, or in a blank area in the video picture.
According to one or more embodiments of the present disclosure, [Example 22] provides a method for generating an interaction record, further including:
optionally, displaying the interaction record data in the target area includes:
displaying the interaction record data in the target area in the form of bullet-screen comments;
the target area includes a blank area in the video picture.
According to one or more embodiments of the present disclosure, [Example 23] provides a method for generating an interaction record, further including:
optionally, the blank area is updated in real time based on the image information displayed on the display interface.
According to one or more embodiments of the present disclosure, [Example 24] provides a method for generating an interaction record, further including:
optionally, storing the interaction record data to a target location.
According to one or more embodiments of the present disclosure, [Example 25] provides a method for generating an interaction record, further including:
optionally, storing the interaction record data to a target location includes:
storing the interaction record data locally; and/or,
storing the interaction record data in the cloud and generating a storage link corresponding to the interaction record data, so that the interaction record data can be obtained via the storage link.
According to one or more embodiments of the present disclosure, [Example 26] provides a method for generating an interaction record, further including:
optionally, the multimedia data stream includes a video data stream generated based on a multimedia conference, a video data stream generated based on a live video broadcast, or a video data stream generated during a group chat.
According to one or more embodiments of the present disclosure, [Example 27] provides an apparatus for generating an interaction record, the apparatus including:
a behavior data collection module configured to collect, from a multimedia data stream, behavior data of users represented by the multimedia data stream, the behavior data including voice information and/or operation information;
an interaction record data generation module configured to generate, based on the behavior data, interaction record data corresponding to the behavior data.
The above description is merely a description of preferred embodiments of the present disclosure and of the technical principles applied. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept, for example technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
Furthermore, although the operations are depicted in a particular order, this should not be understood as requiring that these operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments can also be implemented in combination in a single embodiment; conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or methodological logical acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.

Claims (29)

  1. A method for generating an interaction record, comprising:
    collecting, from a multimedia data stream, behavior data of users represented by the multimedia data stream, the behavior data including voice information and/or operation information;
    generating, based on the behavior data, interaction record data corresponding to the behavior data.
  2. The method according to claim 1, wherein the behavior data includes operation information, and generating the interaction record data corresponding to the behavior data includes:
    determining an operation object and an operation behavior in the operation information, and generating the interaction record data based on an association relationship between the operation object and the operation behavior.
  3. The method according to claim 2, wherein the operation information includes document sharing operation information, the operation object includes a shared document, the operation behavior includes a document sharing behavior, and generating, based on the behavior data, the interaction record data corresponding to the behavior data includes:
    determining, based on the shared document, a document sharing address and/or storage address associated with the shared document, and generating the interaction record data based on the sharing address and/or storage address.
  4. The method according to claim 2, wherein the operation information includes screen sharing operation information, the operation object includes a shared screen, the operation behavior includes a sharing behavior on the shared screen, and generating, based on the behavior data, the interaction record data corresponding to the behavior data includes:
    determining, based on the shared screen, identification information in the shared screen, and generating the interaction record data based on the identification information.
  5. The method according to claim 1, wherein the behavior data includes voice information, and generating, based on the behavior data, the interaction record data corresponding to the behavior data includes:
    performing voiceprint recognition on the voice information to determine the speaking user corresponding to the voice information;
    performing speech recognition on the voice information to obtain a speech recognition result;
    generating, based on an association between the speaking user and the speech recognition result, the interaction record data corresponding to the behavior data.
  6. The method according to claim 2, wherein collecting the user behavior data represented by the multimedia data stream includes:
    collecting the voice information and the operation information in a screen-recorded video.
  7. The method according to claim 6, wherein generating, based on the behavior data, the interaction record data corresponding to the behavior data includes:
    generating the interaction record data corresponding to the behavior data by extracting information from the operation object in the operation information.
  8. The method according to claim 7, wherein generating the interaction record data corresponding to the behavior data by extracting information from the operation object in the operation information includes:
    determining, based on image recognition, target elements in a target image corresponding to the operation information;
    generating, based on the target elements, the interaction record corresponding to the behavior data;
    wherein the target image includes an image corresponding to a shared document and/or an image corresponding to a shared screen.
  9. The method according to claim 2, wherein the multimedia data stream is a data stream generated based on a real-time interactive interface, and collecting, from the multimedia data stream, the user behavior data represented by the multimedia data stream includes:
    when request information for generating an interaction record is received, collecting the behavior data of each user based on the request information.
  10. The method according to claim 9, wherein collecting the user behavior data represented by the multimedia data stream includes:
    receiving the voice information of each user collected by the client, and/or receiving request information corresponding to a trigger operation and determining the operation information corresponding to the request information.
  11. The method according to claim 9, wherein, if the behavior data includes operation information, determining the operation object and operation behavior in the operation information and generating the interaction record data based on the association relationship between the operation object and the operation behavior includes:
    when an operation triggering document sharing is detected, obtaining the shared document and associated information corresponding to the shared document;
    determining the operation information based on the trigger operation, the shared document and the associated information; the associated information including a sharing link of the shared document and/or a storage address of the shared document;
    generating, based on the operation information, the interaction record data corresponding to the behavior data.
  12. The method according to claim 9, wherein, if the behavior data includes operation information, determining the operation object and operation behavior in the operation information and generating the interaction record data based on the association relationship between the operation object and the operation behavior includes:
    when an operation triggering screen sharing is detected, recognizing identifying information in the shared screen;
    determining the operation information based on the identifying information, the trigger operation and video frames of the shared screen;
    generating, based on the operation information, the interaction record data corresponding to the behavior data;
    wherein the identifying information includes a link in the shared screen.
  13. The method according to claim 1, wherein the behavior data includes voice information, and generating, based on the behavior data, the interaction record data corresponding to the behavior data includes:
    performing speech recognition on the voice information and generating the interaction record data based on the obtained speech recognition result.
  14. The method according to claim 13, wherein performing speech recognition on the voice information and generating the interaction record data based on the obtained speech recognition result includes:
    determining a target language type of the speaking user to whom the voice information belongs;
    processing the voice information in the behavior data based on the target language type to generate the interaction record data.
  15. The method according to claim 14, wherein determining the target language type of the speaking user to whom the voice information belongs includes:
    determining the target language type based on the language type of the speaking user to whom the current client belongs.
  16. The method according to claim 15, wherein the language type of the speaking user to whom the current client belongs is determined in at least one of the following ways:
    performing language type recognition on the voice information in the behavior data to determine the user's language type;
    obtaining a language type preset on the client;
    obtaining a login address of the client, and determining the language type corresponding to the user based on the login address.
  17. The method according to claim 1, further comprising:
    when a new user is detected, obtaining historical interaction record data and pushing the historical interaction record data to the client of the new user.
  18. The method according to claim 17, wherein, when a new user is detected, obtaining historical interaction record data and pushing the historical interaction record data to the client of the new user includes:
    when a new user is detected, determining a target language type of the new user;
    obtaining historical interaction record data according to the target language type of the new user;
    converting the historical interaction record data into interaction record data in the same target language type as the new user, and sending the converted interaction record data to the client corresponding to the new user.
  19. The method according to claim 1, further comprising:
    sending the interaction record data to a target client so as to display the interaction record data on the target client.
  20. The method according to claim 19, wherein displaying the interaction record data on the target client includes:
    displaying the interaction record data in a target area.
  21. The method according to claim 20, wherein the target area is located on the periphery of the multimedia picture, or in a blank area in the multimedia picture.
  22. The method according to claim 20, wherein displaying the interaction record data in the target area includes:
    displaying the interaction record data in the target area in the form of bullet-screen comments;
    the target area including a blank area in the multimedia picture.
  23. The method according to claim 21, wherein the blank area is updated in real time based on image information displayed on the display interface.
  24. The method according to claim 1, further comprising:
    storing the interaction record data to a target location.
  25. The method according to claim 24, wherein storing the interaction record data to the target location includes:
    storing the interaction record data locally; and/or,
    storing the interaction record data in the cloud and generating a storage link corresponding to the interaction record data, so that the interaction record data can be obtained via the storage link.
  26. The method according to any one of claims 1-25, wherein the multimedia data stream includes a video data stream generated based on a multimedia conference, a video data stream generated based on a live video broadcast, or a video data stream generated during a group chat.
  27. An apparatus for generating an interaction record, comprising:
    a behavior data collection module configured to collect, from a multimedia data stream, behavior data of users represented by the multimedia data stream, the behavior data including voice information and/or operation information;
    an interaction record data generation module configured to generate, based on the behavior data, interaction record data corresponding to the behavior data.
  28. An electronic device, comprising:
    one or more processors;
    a storage apparatus configured to store one or more programs,
    which, when executed by the one or more processors, cause the one or more processors to implement the method for generating an interaction record according to any one of claims 1-26.
  29. A storage medium containing computer-executable instructions which, when executed by a computer processor, perform the method for generating an interaction record according to any one of claims 1-26.
PCT/CN2021/090395 2020-04-30 2021-04-28 Method, apparatus, device and medium for generating an interaction record WO2021218981A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2022563348A JP2023522092A (ja) 2020-04-30 2021-04-28 Interaction record generation method, apparatus, device and medium
EP21795627.5A EP4124024A4 (en) 2020-04-30 2021-04-28 METHOD AND APPARATUS FOR GENERATING AN INTERACTION RECORD, DEVICE AND MEDIUM
US17/881,999 US20220375460A1 (en) 2020-04-30 2022-08-05 Method and apparatus for generating interaction record, and device and medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010366930.4 2020-04-30
CN202010366930.4A CN113014854B (zh) 2020-04-30 2020-04-30 Method and apparatus for generating interaction record, and device and medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/881,999 Continuation US20220375460A1 (en) 2020-04-30 2022-08-05 Method and apparatus for generating interaction record, and device and medium

Publications (1)

Publication Number Publication Date
WO2021218981A1 true WO2021218981A1 (zh) 2021-11-04

Family

ID=76383544

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/090395 WO2021218981A1 (zh) 2020-04-30 2021-04-28 Method and apparatus for generating interaction record, and device and medium

Country Status (5)

Country Link
US (1) US20220375460A1 (zh)
EP (1) EP4124024A4 (zh)
JP (1) JP2023522092A (zh)
CN (1) CN113014854B (zh)
WO (1) WO2021218981A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113891146B (zh) * 2021-09-23 2023-08-22 China Automotive Innovation Co., Ltd. Multi-screen interaction system and method, driving device and storage medium
CN114125342A (zh) * 2021-11-16 2022-03-01 Bank of China Limited Emergency operation recording method and apparatus
CN115547330A (zh) * 2022-10-31 2022-12-30 Beijing Zitiao Network Technology Co., Ltd. Information display method and apparatus based on voice interaction, and electronic device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102572356A (zh) * 2012-01-16 2012-07-11 Huawei Technologies Co., Ltd. Method for recording conference and conference system
US8887067B2 (en) * 2008-05-30 2014-11-11 Microsoft Corporation Techniques to manage recordings for multimedia conference events
CN104427294A (zh) * 2013-08-29 2015-03-18 ZTE Corporation Method for supporting simultaneous interpretation in video conference and cloud server
CN106657865A (zh) * 2016-12-16 2017-05-10 Lenovo (Beijing) Co., Ltd. Method and apparatus for generating meeting minutes, and video conference system
CN107430723A (zh) * 2015-03-09 2017-12-01 Microsoft Technology Licensing, LLC Meeting summary
CN109361825A (zh) * 2018-11-12 2019-02-19 Ping An Technology (Shenzhen) Co., Ltd. Meeting minutes recording method, terminal and computer storage medium

Family Cites Families (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6594688B2 (en) * 1993-10-01 2003-07-15 Collaboration Properties, Inc. Dedicated echo canceler for a workstation
US6343313B1 (en) * 1996-03-26 2002-01-29 Pixion, Inc. Computer conferencing system with real-time multipoint, multi-speed, multi-stream scalability
US6119147A (en) * 1998-07-28 2000-09-12 Fuji Xerox Co., Ltd. Method and system for computer-mediated, multi-modal, asynchronous meetings in a virtual space
US7356763B2 (en) * 2001-09-13 2008-04-08 Hewlett-Packard Development Company, L.P. Real-time slide presentation multimedia data object and system and method of recording and browsing a multimedia data object
US20060136220A1 (en) * 2004-12-22 2006-06-22 Rama Gurram Controlling user interfaces with voice commands from multiple languages
US8125509B2 (en) * 2006-01-24 2012-02-28 Lifesize Communications, Inc. Facial recognition for a videoconference
JP5522369B2 (ja) * 2009-12-25 2014-06-18 NEC Corporation Conference record summarization system, conference record summarization method and program
US8768686B2 (en) * 2010-05-13 2014-07-01 International Business Machines Corporation Machine translation with side information
WO2012118917A2 (en) * 2011-03-03 2012-09-07 Social Communications Company Realtime communications and network browsing client
US9131280B2 (en) * 2013-03-15 2015-09-08 Sony Corporation Customizing the display of information by parsing descriptive closed caption data
US10965633B2 (en) * 2014-09-29 2021-03-30 Microsoft Technology Licensing, LLC Session history horizon control
CN105632498A (zh) * 2014-10-31 2016-06-01 Toshiba Corporation Method, apparatus and system for generating meeting minutes
CN105306861B (zh) * 2015-10-15 2017-03-01 Shenzhen Yingshuo Technology Co., Ltd. Network teaching recording and broadcasting method and system
CN105893636A (zh) * 2016-06-23 2016-08-24 Leshi Holdings (Beijing) Co., Ltd. Method and apparatus for recording history sharing
US10820061B2 (en) * 2016-10-17 2020-10-27 DISH Technologies L.L.C. Apparatus, systems and methods for presentation of media content using an electronic Braille device
CN108510976B (zh) * 2017-02-24 2021-03-19 Yutou Technology (Hangzhou) Co., Ltd. Multilingual mixed speech recognition method
CN107659416B (zh) * 2017-03-27 2021-11-16 Guangzhou Shiyuan Electronics Co., Ltd. Method and apparatus for sharing meeting minutes, conference terminal and storage medium
US10296289B2 (en) * 2017-06-01 2019-05-21 Salesforce.Com, Inc. Multimodal commands
US20190065458A1 (en) * 2017-08-22 2019-02-28 Linkedin Corporation Determination of languages spoken by a member of a social network
CN109388701A (zh) * 2018-08-17 2019-02-26 Shenzhen OneConnect Smart Technology Co., Ltd. Meeting minutes generation method, apparatus, device and computer storage medium
CN109660447B (zh) * 2018-11-08 2022-03-15 Xiamen Kuaishangtong Information Technology Co., Ltd. Method for targeted capture of information based on chat data, and information management system
CN112312057A (zh) * 2020-02-24 2021-02-02 Beijing ByteDance Network Technology Co., Ltd. Multimedia conference data processing method, apparatus and electronic device
CN111343473B (zh) * 2020-02-25 2022-07-01 Beijing Dajia Internet Information Technology Co., Ltd. Data processing method and apparatus for live streaming application, electronic device and storage medium
CN112291504B (zh) * 2020-03-27 2022-10-28 Beijing ByteDance Network Technology Co., Ltd. Information interaction method, apparatus and electronic device
CN112231498A (zh) * 2020-09-29 2021-01-15 Beijing Zitiao Network Technology Co., Ltd. Interaction information processing method, apparatus, device and medium


Also Published As

Publication number Publication date
EP4124024A4 (en) 2023-08-30
CN113014854B (zh) 2022-11-11
JP2023522092A (ja) 2023-05-26
EP4124024A1 (en) 2023-01-25
US20220375460A1 (en) 2022-11-24
CN113014854A (zh) 2021-06-22

Similar Documents

Publication Publication Date Title
WO2021218981A1 (zh) Method and apparatus for generating interaction record, and device and medium
WO2021196903A1 (zh) Video processing method and apparatus, readable medium and electronic device
WO2022095957A1 (zh) Information display method, apparatus, device and medium
WO2022042593A1 (zh) Subtitle editing method and apparatus, and electronic device
WO2020233142A1 (zh) Multimedia file playback method and apparatus, electronic device and storage medium
WO2021008223A1 (zh) Information determination method and apparatus, and electronic device
CN107659850B (zh) Media information processing method and apparatus
US11652763B2 (en) Information display method and apparatus, and electronic device
CN111901695B (zh) Video content clipping method, apparatus, device and computer storage medium
CN111629253A (zh) Video processing method and apparatus, computer-readable storage medium and electronic device
US20220391058A1 (en) Interaction information processing method and apparatus, electronic device and storage medium
WO2023051294A9 (zh) Prop processing method, apparatus, device and medium
WO2023005831A1 (zh) Resource playback method and apparatus, electronic device and storage medium
CN111818383B (zh) Video data generation method, system, apparatus, electronic device and storage medium
WO2024008184A1 (zh) Information display method and apparatus, electronic device and computer-readable medium
WO2020124966A1 (zh) Program search method, apparatus, device and medium
US20230139416A1 (en) Search content matching method, and electronic device and storage medium
WO2023143299A1 (zh) Message display method, apparatus, device and storage medium
WO2022105760A1 (zh) Multimedia browsing method, apparatus, device and medium
WO2024083149A1 (zh) Media content processing method and apparatus, and electronic device
US20220374618A1 (en) Interaction information processing method and apparatus, device, and medium
WO2020233171A1 (zh) Playlist switching method, apparatus, system, terminal and storage medium
CN115278346B (zh) Method for sending and receiving comments in live streaming room, and related devices
WO2022257777A1 (zh) Multimedia processing method, apparatus, device and medium
JP2023536992A (ja) Target content search method, apparatus, electronic device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21795627

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022563348

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2021795627

Country of ref document: EP

Effective date: 20221018

NENP Non-entry into the national phase

Ref country code: DE