WO2021155692A1 - Online virtual commentary method, device, and medium (在线虚拟解说方法、设备和介质) - Google Patents

Online virtual commentary method, device, and medium

Info

Publication number
WO2021155692A1
Authority
WO
WIPO (PCT)
Prior art keywords
event
commentary
attribute data
events
frame image
Prior art date
Application number
PCT/CN2020/128018
Other languages
English (en)
French (fr)
Inventor
林少彬
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Company Limited)
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Company Limited)
Publication of WO2021155692A1
Priority to US17/580,553 (published as US11833433B2)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/14Digital output to display device ; Cooperation and interconnection of the display device with other functional units
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/55Rule-based translation
    • G06F40/56Natural language generation
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/60Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
    • A63F13/67Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor adaptively or by learning from player actions, e.g. skill level adjustment or by storing successful combat sequences for re-use
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/50Controlling the output signals based on the game progress
    • A63F13/53Controlling the output signals based on the game progress involving additional visual information provided to the game scene, e.g. by overlay to simulate a head-up display [HUD] or displaying a laser sight in a shooting game
    • A63F13/537Controlling the output signals based on the game progress involving additional visual information provided to the game scene, e.g. by overlay to simulate a head-up display [HUD] or displaying a laser sight in a shooting game using indicators, e.g. showing the condition of a game character on screen
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/50Controlling the output signals based on the game progress
    • A63F13/54Controlling the output signals based on the game progress involving acoustic signals, e.g. for simulating revolutions per minute [RPM] dependent engine sounds in a driving game or reverberation against a virtual wall
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/55Controlling game characters or game objects based on the game progress
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • G06F40/186Templates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/44Event detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects

Definitions

  • The present disclosure relates to the field of artificial intelligence and, more specifically, to online virtual commentary methods, devices, and media.
  • In the related art, a virtual host technology for news broadcast scenes has been proposed. The text to be broadcast, the speech rate, and the tone of voice are prepared in advance. TTS (text-to-speech) technology then converts the text into a simulated human voice for playback, while facial expression parameters, action parameters, and other body parameters configured offline are converted into anthropomorphic expressions and actions using 3D technology and displayed.
  • According to one aspect of the present disclosure, an online virtual commentary method is provided, including: acquiring, based at least on a frame image of a video being played, attribute data describing basic information of elements in the frame image; extracting, based on the attribute data, feature data representing comprehensive commentary-related information in the frame image; generating candidate events based on the feature data; selecting a commentary event from the generated candidate events; determining corresponding commentary text based on the selected commentary event; and outputting corresponding commentary content based on the commentary text.
  • According to another aspect of the present disclosure, an online virtual commentary device is provided, including: an attribute acquisition unit configured to acquire, based at least on a frame image of a video being played, attribute data describing basic information of elements in the frame image; a feature extraction unit configured to extract, based on the attribute data, feature data representing comprehensive commentary-related information in the frame image; an event generation unit configured to generate candidate events based on the feature data; a selection unit configured to select a commentary event from the generated candidate events; a text generation unit configured to determine corresponding commentary text based on the selected commentary event; and an output unit configured to output corresponding commentary content based on the commentary text.
  • According to another aspect of the present disclosure, an online virtual commentary device is provided, including one or more processors and one or more memories, wherein the one or more memories store a computer program that, when executed by the one or more processors, causes the device to perform the online virtual commentary method described above.
  • According to another aspect of the present disclosure, a computer-readable storage medium is provided, having a computer program stored thereon that, when executed by a processor of a computing device, causes the computing device to perform the above online virtual commentary method.
  • With the online virtual commentary method, device, and medium according to the embodiments of the present disclosure, the problem of real-time online commentary can be solved: for videos currently in progress, such as online games or sports events, the commentary content can be output synchronously in real time, and the commentary voice can even be broadcast by a virtual commentary host with corresponding facial expressions, actions, and other anthropomorphic effects.
  • FIG. 1A shows a schematic diagram of an application environment of an online virtual commentary method according to an embodiment of the present disclosure.
  • FIG. 1B shows a flowchart of an online virtual commentary method according to an embodiment of the present disclosure.
  • FIG. 2 shows a flowchart of a process for establishing a commentary feature library according to an embodiment of the present disclosure.
  • FIG. 3 shows a schematic diagram illustrating an example of the feature classification included in a feature library in an online game application scenario according to an embodiment of the present disclosure.
  • FIG. 4 shows a schematic diagram of a data flow diagram for feature extraction in an online game virtual commentary application scenario according to an embodiment of the present disclosure.
  • FIG. 5 shows a schematic diagram of a data flow diagram for feature extraction in a sports event virtual commentary application scenario according to an embodiment of the present disclosure.
  • FIG. 6 shows a schematic diagram of a data flow diagram for event generation in an online game virtual commentary application scenario according to an embodiment of the present disclosure.
  • FIG. 7 shows a schematic diagram of the relationship between a single-frame event and a multi-frame event according to an embodiment of the present disclosure.
  • FIG. 8 shows a schematic diagram of a data flow diagram for event selection in an online game virtual commentary application scenario according to an embodiment of the present disclosure.
  • FIG. 9 shows a schematic diagram of an example of a data flow diagram for commentary content generation according to an embodiment of the present disclosure.
  • FIG. 10 shows a schematic diagram of an example of commentary content output in an online game virtual commentary application scenario according to an embodiment of the present disclosure.
  • FIG. 11 shows a schematic structural diagram of an online virtual commentary device according to an embodiment of the present disclosure.
  • FIG. 12 shows a schematic diagram of an exemplary computing device architecture according to an embodiment of the present disclosure.
  • FIG. 1A shows a schematic diagram of an application environment of an online virtual commentary method according to an embodiment of the present disclosure.
  • As shown in FIG. 1A, the method is applied to a communication system 100, which includes one or more terminal devices 101, a server 102, a video data acquisition device 103, and a database 104, where the terminal device 101 and the server 102 are connected through a network 105, for example, through a wired or wireless network connection.
  • the video data acquisition device 103 may acquire frame images of the video being played, and send the acquired frame images to the server 102.
  • The server 102 generates commentary for the received frame image according to the reference feature data stored in the database 104, and outputs the commentary content to the terminal device 101 via the network 105.
  • the terminal device 101 can display the commentary content to the user.
  • the server 102 may be a single server, a server cluster composed of multiple servers, or a cloud computing platform.
  • the server 102, the video data acquisition device 103, and the database 104 may be independent devices or one device.
  • The terminal device 101 may be a mobile phone, a tablet computer, a notebook computer, a personal computer (PC), a smart TV, or the like.
  • An online virtual commentary method according to an embodiment of the present disclosure will now be described with reference to FIG. 1B. As shown in FIG. 1B, the method includes the following steps S101-S106.
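  • The overall flow of steps S101-S106 can be pictured with the following minimal sketch; Python is used only for illustration, and all stage functions are hypothetical placeholders passed in by the caller rather than part of the disclosed method:

```python
from typing import Callable, Optional

def narrate_frame(
    frame_image,
    frame_instructions,
    acquire_attributes: Callable,   # S101: attribute data of elements in the frame
    extract_features: Callable,     # S102: commentary-related feature data
    generate_candidates: Callable,  # S103: candidate events
    select_event: Callable,         # S104: the event that will actually be narrated
    build_text: Callable,           # S105: commentary text from a template
    render_output: Callable,        # S106: subtitle / audio / virtual-host output
) -> Optional[dict]:
    """One commentary pass over a single frame of the video being played."""
    attributes = acquire_attributes(frame_image, frame_instructions)
    features = extract_features(attributes)
    candidates = generate_candidates(features)
    event = select_event(candidates)
    if event is None:
        return None  # nothing in this frame is worth narrating
    text = build_text(event, attributes)
    return render_output(text)
```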
  • In step S101, attribute data describing basic information of elements in the frame image is acquired based at least on the frame image of the video being played.
  • the online virtual explanation method can be applied to the scene of online game virtual explanation.
  • the video being played is an online game video.
  • Game players can input game instructions to online games to control the movement, skills, actions and other behaviors of player characters and non-player characters in the game while the game is in progress.
  • In this case, acquiring attribute data describing basic information of elements in the frame image based at least on the frame image of the video being played further includes: acquiring the attribute data based on the frame image of the video being played and a frame instruction (e.g., a game instruction) input for the frame image.
  • the attribute data includes direct attribute data and indirect attribute data.
  • In some embodiments, obtaining the attribute data used to describe the basic information of elements in the frame image based at least on the frame image of the video being played may further include: obtaining the direct attribute data of elements in the current frame image in the current frame.
  • the direct attribute data is automatically generated by the game application.
  • the direct attribute data may include the coordinate position of the game character in the image. Then, by performing analysis processing on the direct attribute data of the elements in the current frame image in the current frame, the indirect attribute data is determined.
  • the indirect attribute data is the data obtained by further calculation on the basis of the direct attribute data.
  • For example, if the direct attribute data includes the coordinate position of the game character in the image, this coordinate position can be compared with the coordinates of the wild area to obtain the indirect attribute data "the game character is in the wild area".
  • The wild area is a certain area on the map set in the game application, which belongs to the activity range of neutral third-party game characters.
  • In some embodiments, indirect attribute data can also be obtained by interpreting game instructions. For example, if the game player inputs a move instruction, the new coordinate position of the corresponding game character on the map after moving can be calculated based on its current coordinate position. As another example, if the game player inputs an instruction to cast a skill, the remaining cooldown time of that skill after it is cast can be calculated based on the current time.
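  • As a rough illustration of this instruction-based derivation (the field names and the instruction format below are assumptions made for the example, not part of the disclosure):

```python
from dataclasses import dataclass

@dataclass
class DirectAttributes:
    x: float                 # current map coordinates of the game character
    y: float
    skill_cooldown_s: float  # full cooldown of the character's skill, in seconds

def derive_indirect(direct: DirectAttributes, instruction: dict, now_s: float) -> dict:
    """Derive indirect attribute data from a frame instruction (hypothetical format)."""
    indirect = {}
    if instruction.get("type") == "move":
        # new coordinate position on the map after applying the move vector
        indirect["new_position"] = (direct.x + instruction["dx"],
                                    direct.y + instruction["dy"])
    elif instruction.get("type") == "cast_skill":
        # remaining cooldown, measured from the moment the skill is cast
        indirect["cooldown_ready_at_s"] = now_s + direct.skill_cooldown_s
    return indirect
```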
  • the direct attribute data of one or more previous frames of the image may be further combined to obtain indirect attribute data.
  • In some embodiments, obtaining the attribute data describing basic information of elements in the frame image based at least on the frame image of the video being played may further include: obtaining the direct attribute data of elements in the current frame image in the current frame and one or more previous frames; and determining the indirect attribute data by analyzing the direct attribute data of elements in the current frame image across the current frame and the previous frames.
  • the previous frame is a frame that precedes the current frame in time order, and the number of previous frames may be one or more.
  • For example, the direct attribute data may include the information boxes (e.g., status bars) in the current frame and the previous frame of the game image.
  • From these, the blood volume (HP) data of a game character in the current frame and the previous frame can be obtained and compared, so that the indirect attribute data "amount of change in the game character's blood volume" can be determined.
  • the online virtual explanation method can also be applied to the scene of virtual explanation of sports events.
  • the video being played is an ongoing sporting event.
  • sports events do not need to receive user input, that is, there is no input game instruction in each frame, and it can be regarded as an automatically played video.
  • the attribute data of the elements in the image is obtained based only on the frame image.
  • the direct attribute data may include the coordinate position of the player in a frame of image. By comparing the coordinate position of the player with the coordinate position of each area of the basketball court, the indirect attribute data of "the player is in the front court" can be obtained.
  • In step S101, the attribute data acquired based at least on the frame image of the video being played, whether direct attribute data acquired directly or indirect attribute data obtained by further analysis of the direct attribute data, describes basic information of a single dimension.
  • In step S102, feature data representing comprehensive commentary-related information in the frame image is extracted based on the attribute data.
  • The direct attribute data and the indirect attribute data obtained based on the frame image can be reused to extract different types of feature data; that is, when different feature data are extracted, some of the attribute data used may be the same.
  • In some embodiments, extracting feature data representing comprehensive commentary-related information in the frame image based on the attribute data may further include: for each feature included in a pre-established commentary feature library, selecting the attribute data associated with that feature from the attribute data. For example, for the feature "whether in the grass" included in the commentary feature library, the associated attribute data are the coordinates of the game character and the coordinates of the grass (the grass is a plant in the game application that can be used to hide game characters). Then, by analyzing the attribute data associated with the feature, the value of the feature is determined as feature data of the frame image. For example, by comparing the coordinates of the game character with the coordinates of the grass, the value of the feature "whether in the grass" is determined to be "in the grass" or "not in the grass".
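  • A minimal sketch of this comparison follows, assuming for illustration that grass regions are given as axis-aligned boxes (that representation is an assumption, not stated in the disclosure):

```python
def whether_in_grass(character_xy, grass_regions):
    """Value of the feature 'whether in the grass' for one game character."""
    x, y = character_xy
    for x_min, y_min, x_max, y_max in grass_regions:
        if x_min <= x <= x_max and y_min <= y <= y_max:
            return "in the grass"       # coordinates fall inside a grass region
    return "not in the grass"

# Example: whether_in_grass((12.0, 8.5), [(10, 7, 15, 11)]) -> "in the grass"
```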
  • The forward commentary process goes from attribute data to feature data, and then from feature data to commentary events.
  • The commentary feature library, in contrast, is established through the reverse process: the commentary events involved in the commentary process are disassembled into features, and features into attributes, and the library is gradually established and improved by applying a priori annotation and mining.
  • the process of establishing the interpretive feature library is described in detail with reference to FIG. 2.
  • the method for establishing the narration feature library specifically includes the following steps S201-S204.
  • In step S201, reference commentary text is extracted based on a standard reference commentary video.
  • the standard reference commentary video can be a video that has been manually explained, such as a game video or a sports event.
  • In such a video, the output commentary content is already included; the goal is to work backward from existing reference commentary videos to the specific features involved in the commentary process.
  • multiple reference commentary videos will be selected as the standard. The more the number of selected reference commentary videos, the more complete the features included in the established commentary feature library.
  • the established narration feature library corresponds to the specific type of the video to be narrated one-to-one.
  • the commentary feature library for online games and the commentary feature library for sports events must be different.
  • different types of online games correspond to different commentary feature libraries.
  • the commentary audio may be extracted from the reference commentary video, and then the extracted commentary audio may be converted into a reference commentary text.
  • For example, the extracted reference commentary text reads: "This Shen Mengxi threw a bomb, and the damage is really high. Impressive."
  • In step S202, a reference commentary event is determined based on the reference commentary text.
  • the reference commentary events are "Shen Mengxi dropped a bomb” and "Bomb damage is high”.
  • In step S203, reference feature data used to characterize comprehensive commentary-related information is determined based on the reference commentary event. For example, based on the reference commentary event "Shen Mengxi dropped a bomb", the reference feature data "name of the game character" and "action of the game character" can be determined. Based on the reference commentary event "the bomb damage is high", the reference feature data "damage output to other game characters" can be determined.
  • In step S204, the commentary feature library is established based on the reference feature data.
  • In some embodiments, a combination of manual labeling and automatic labeling can be used to determine the reference commentary events from the reference commentary text and the reference feature data from the reference commentary events. Specifically, a first batch of reference commentary videos is manually labeled to extract events and label features, so as to initially establish the commentary feature library. Then a new batch of reference commentary videos is taken, and event extraction and feature labeling are performed on this new batch through automatic labeling; after the automatic labeling, errors are corrected and missing items are supplemented by manual inspection, so as to further expand and improve the commentary feature library. Following this alternation of manual and automatic labeling, new reference commentary videos are processed batch by batch. As the processing progresses, fewer and fewer manual corrections and supplements are needed; when no manual error correction or supplementation remains, the commentary feature library is considered complete and can be used for feature extraction in online virtual commentary.
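  • The reverse disassembly described above (event to features, features to attributes) could be recorded in the library roughly as follows; the entries and field names are illustrative assumptions based on the Shen Mengxi example, not the actual library format:

```python
# One possible (simplified) shape for commentary feature library entries.
commentary_feature_library = {
    "Shen Mengxi dropped a bomb": {
        "features": ["name of the game character", "action of the game character"],
        "attributes": ["character_name", "skill_cast", "skill_position"],
    },
    "the bomb damage is high": {
        "features": ["damage output to other game characters"],
        "attributes": ["damage_dealt", "target_hp_before", "target_hp_after"],
    },
}
```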
  • FIG. 3 shows a schematic diagram illustrating an example of feature classification included in a feature library in an online game application scenario according to an embodiment of the present disclosure.
  • The commentary features mainly include categories such as hero, non-player character (NPC), battle (team battle), summary analysis, time description, global data, camp, and early game.
  • Hero features mainly relate to a hero's state (HP, location, etc.) and actions (using skills, moving, etc.); NPC features mainly relate to the state (HP, location, etc.) and actions (attacking heroes, hitting towers, spawning, etc.) of non-player characters in the game; battle features mainly relate to the attributes of many-versus-many team battles formed by heroes (start and end time, participating heroes, outcome, effect, etc.); summary analysis features are mainly statistics and analysis of the current game situation; the time description mainly divides the current game stage (early, mid, late, etc.); global data features are mainly overall statistics of the game (such as the hero in the first-person commentary view, the position and scope of the field of vision, and so on).
  • the feature data of each category is calculated based on the attribute data, which is the key to feature extraction.
  • Fig. 4 shows a schematic diagram of a data flow diagram used for feature extraction in an application scenario of online game virtual commentary according to an embodiment of the present disclosure.
  • indirect attribute data can be generated through real-time in-game element attribute calculation, and combined with the commentary feature library established for game commentary, game features can be extracted.
  • FIG. 5 shows a schematic diagram of a data flow diagram for feature extraction in a sports event virtual commentary application scenario according to an embodiment of the present disclosure.
  • Indirect attribute data can be generated through real-time calculation of the attributes of elements in the image and, combined with the commentary feature library established for sports event commentary, event features can be extracted.
  • In step S103, candidate events are generated based on the feature data.
  • Candidate events refer to events that occur during the playback of the video.
  • the candidate events generated here are not necessarily all events that occur during the playback of the video. Which candidate events are generated depends on which feature data has been extracted. In addition, the generated candidate events do not necessarily have to be explained.
  • the generating candidate events based on the characteristic data further includes: loading conditions corresponding to all pre-defined events.
  • An event can correspond to one or more conditions.
  • For example, the event "hero killed by wild monster" can correspond to two conditions: C1. the hero is killed; C2. the killer is a wild monster.
  • When it is determined based on the feature data that the conditions corresponding to an event are satisfied, the event is generated as a candidate event. For example, if the value of the extracted feature data indicates that the hero is alive or that the killer is another hero (in other words, conditions C1 and C2 are not both met), the "hero killed by wild monster" event is not generated.
  • Conversely, if the value of the extracted feature data indicates that the hero is killed and the killer is a wild monster (in other words, conditions C1 and C2 are both met), the "hero killed by wild monster" event is generated and regarded as a candidate event.
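  • In code, this condition check might look like the following sketch; the feature names and values mirror the example above and are otherwise assumptions:

```python
EVENT_CONDITIONS = {
    "hero killed by wild monster": [
        lambda f: f.get("hero_status") == "killed",            # C1: the hero is killed
        lambda f: f.get("killer_category") == "wild monster",  # C2: the killer is a wild monster
    ],
}

def generate_candidate_events(features: dict) -> list:
    """Traverse all pre-defined events and keep those whose conditions all hold."""
    return [event for event, conditions in EVENT_CONDITIONS.items()
            if all(condition(features) for condition in conditions)]
```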
  • candidate events are divided into basic events and advanced events.
  • the "hero is killed by a monster" event listed above is a basic event.
  • basic events are generated based on characteristic data.
  • High-level events are defined relative to basic events.
  • Advanced events may be generated based on basic events or may be generated based on basic events and characteristic data.
  • For example, consider an advanced event H1 involving a support hero, the tyrant, and a jungler; the support, the tyrant, and the jungler here can be regarded as game characters or as the occupations of game characters.
  • If this event were decomposed directly into game features, it would require: F1. hero occupation: support; F2. tyrant state: under attack; F3. occupation of the tyrant's attacker: jungler; F4. support hero position: in the grass behind the tyrant. Since the basic events E1 (the support is waiting in ambush behind the tyrant) and E2 (the jungler is fighting the tyrant) have already been defined and extracted, the advanced event H1 can be formed directly from the combination of basic events E1 + E2, instead of defining the lengthy combination of F1, F2, F3, and F4 and performing multiple condition judgments, as sketched below.
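  • A sketch of how H1 could be declared in terms of E1 and E2 rather than F1-F4; the identifiers come from the example above, while the data structures are assumptions:

```python
ADVANCED_EVENTS = {
    # H1 is triggered by the combination of the two basic events E1 and E2
    "H1": {"E1", "E2"},
}

def generate_advanced_events(basic_events: set) -> list:
    """Emit every advanced event whose required basic events are all present."""
    return [name for name, required in ADVANCED_EVENTS.items()
            if required <= basic_events]

# Example: generate_advanced_events({"E1", "E2", "E7"}) -> ["H1"]
```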
  • FIG. 6 shows a schematic diagram of a data flow diagram generated by events in an application scenario of virtual interpretation of online games according to an embodiment of the present disclosure.
  • candidate events can also be divided into team battle events and non-team battle events.
  • the team battle event is a collection of candidate events in which multiple game characters participate in a predetermined time period before the current frame
  • the non-team battle event is an event other than the team battle event.
  • Teamfight events and non-teamfight events can be basic events or advanced events.
  • Fig. 7 shows a schematic diagram of the relationship between a single-frame event and a multi-frame event according to an embodiment of the present disclosure.
  • a single-frame event is an event based on one frame of image
  • a multi-frame event is a collection of multiple single-frame events.
  • the process of generating a single frame event is shown.
  • the multi-frame event includes three single-frame events generated based on the first frame image, the second frame image, and the third frame image.
  • the present disclosure is not limited to this.
  • Multi-frame events can be any number of single-frame events.
  • the team battle described above is based on a period of time in the game. Game data exists in the form of frames, and a game time period contains multiple game frames. Therefore, the team battle event can be regarded as a multi-frame event.
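  • A team battle event can therefore be assembled from single-frame events roughly as follows; the window length, the participation threshold, and the event fields are assumptions made for illustration:

```python
def collect_team_battle(single_frame_events, current_frame, window_frames=90,
                        min_characters=4):
    """Group recent single-frame events into one multi-frame team battle event."""
    recent = [e for e in single_frame_events
              if current_frame - window_frames <= e["frame"] <= current_frame]
    characters = {c for e in recent for c in e["characters"]}
    if len(characters) >= min_characters:
        return {"kind": "team_battle", "events": recent, "characters": characters}
    return None  # not enough participants to count as a team battle
```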
  • In step S104, a commentary event is selected from the generated candidate events.
  • only a part of the candidate events will be selected as interpretive events and will be narrated.
  • the selecting a commentary event from the generated candidate events may further include: selecting a commentary event based on the playback state of the video.
  • In some embodiments, the generated candidate events include operational events, such as commentary opening events used to supplement the introduction of the currently playing video (for example, a game introduction or a sports event introduction), news, and the like.
  • Such events can be broadcast in time periods that are not related to the commentary (eg, game loading, game pause, intermission of sports events, etc.). Therefore, when the playing state of the video is a game pause or an intermission in a sports event, an operation event is selected from the candidate events as the narration event (operation selection).
  • In some embodiments, selecting a commentary event from the generated candidate events further includes: determining the importance of each generated candidate event according to predetermined rules; and selecting the candidate event with the highest importance as the commentary event. For example, in the sports event virtual commentary application scenario, a candidate event is considered highly important when it occurs in the central area of the screen, or when it causes a score change.
  • the commentary event can be selected in the following manner.
  • the candidate events include team battle events and non-team battle events.
  • In some embodiments, selecting the commentary event from the generated candidate events further includes: when there are multiple team battle events within the predetermined time period (for example, based on the positions of the participating game characters, a team battle in area A, a team battle in area B, and a team battle in area C), determining the importance of each team battle event according to predetermined rules and selecting the team battle event with the highest importance (team battle selection). For example, the team battle event with the largest number of kills, or the one in which the most skills are cast, is considered to be of high importance.
  • Or, the team battle event that appears in the commentary field of view is considered to be of high importance.
  • Next, for all candidate events included in the selected team battle event, the importance of each candidate event is determined and the candidate event with the highest importance is selected as the commentary event (in-group selection). For example, the static weight of an event (e.g., whether the game character is in the carry position) and the dynamic weight of an event (e.g., whether the event appears in the field of view, the team battle result, skill effects, etc.) can be scored and weighted, and the event with the highest score within the group is selected as the commentary event.
  • For candidate events outside team battles, the single-frame event with the highest static weight can be directly selected as the commentary event (out-of-group selection).
  • predetermined rules are not limited to the examples listed above. Any other possible rules should also be included in the scope of this disclosure.
  • FIG. 8 shows a schematic diagram of a data flow diagram of event selection in an application scenario of virtual interpretation of online games according to an embodiment of the present disclosure.
  • the narration events can be selected in the order of operation selection 801, team battle selection 802, in-group selection 803, and out-of-group selection 804.
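  • Putting the four stages together, the selection could be sketched as follows; the weighting of static and dynamic scores and the event fields are assumptions, while the stage numbers follow FIG. 8:

```python
def importance(event):
    # weighted combination of the static weight (e.g. carry position) and the
    # dynamic weight (e.g. visible in the commentary view, team battle result)
    return 0.4 * event.get("static_weight", 0.0) + 0.6 * event.get("dynamic_weight", 0.0)

def select_commentary_event(candidates, video_state):
    # 801: operation selection during loading screens, pauses or intermissions
    if video_state in ("loading", "paused", "intermission"):
        operations = [e for e in candidates if e["kind"] == "operation"]
        if operations:
            return max(operations, key=importance)
    # 802: pick the most important team battle, then 803: the most important
    # candidate event inside that team battle (in-group selection)
    team_battles = [e for e in candidates if e["kind"] == "team_battle"]
    if team_battles:
        best_battle = max(team_battles, key=importance)
        return max(best_battle["events"], key=importance)
    # 804: outside team battles, fall back to the single-frame event with the
    # highest static weight (out-of-group selection)
    others = [e for e in candidates if e["kind"] not in ("team_battle", "operation")]
    if others:
        return max(others, key=lambda e: e.get("static_weight", 0.0))
    return None
```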
  • In step S105, the corresponding commentary text is determined based on the selected commentary event. In other words, after the event to be narrated is determined, commentary corresponding to the event needs to be generated.
  • In some embodiments, determining the corresponding commentary text based on the selected commentary event includes the following steps. First, a commentary template corresponding to the selected commentary event is determined based on a pre-established commentary text library.
  • The commentary text library is essentially a collection of (event, commentary template) pairs, and includes all commentary stems and the commentary event corresponding to each commentary template.
  • the commentary template [M1. ⁇ hero name> don't worry, take a mouthful of medicine and save your strength] corresponds to the commentary event [E1. Hero uses the blood recovery skill].
  • the template field in the commentary template is replaced, and the commentary text is generated.
  • For example, based on the attribute data obtained in the attribute acquisition process described above, the hero name "Ake" is dynamically substituted into the template field to obtain the commentary text.
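  • The stem replacement itself can be a simple string substitution; the template and attribute names below mirror the example above and are otherwise assumptions:

```python
COMMENTARY_TEMPLATES = {
    # event E1 ("hero uses the blood recovery skill") mapped to template M1
    "E1": "<hero name>, don't worry, take a mouthful of medicine and save your strength first",
}

def build_commentary_text(event_id: str, attributes: dict) -> str:
    """Fill the template field with attribute data gathered for the frame."""
    template = COMMENTARY_TEMPLATES[event_id]
    return template.replace("<hero name>", attributes["hero name"])

# build_commentary_text("E1", {"hero name": "Ake"})
# -> "Ake, don't worry, take a mouthful of medicine and save your strength first"
```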
  • In step S106, the corresponding commentary content is output based on the commentary text.
  • the narration content here can be text, audio, or video.
  • In some embodiments, outputting the corresponding commentary content based on the commentary text further includes one or more of the following: outputting the commentary text (subtitle commentary); outputting a voice used to broadcast the commentary text (audio commentary); displaying an avatar and outputting a voice matching the avatar to broadcast the commentary text (virtual host video commentary); and displaying an avatar and broadcasting the commentary text through the actions of the avatar, such as a sign language broadcast (virtual host video commentary).
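  • The output modes listed above could be dispatched roughly as follows; the TTS engine and avatar renderer are hypothetical interfaces used for the sketch, not a specific library API:

```python
def render_commentary(text, mode="subtitle", tts=None, avatar=None):
    """Turn commentary text into subtitle, audio, or virtual-host output."""
    if mode == "subtitle":
        return {"subtitle": text}
    if mode == "audio":
        return {"audio": tts.synthesize(text)}       # audio commentary via TTS
    if mode == "virtual_host":
        return {"audio": tts.synthesize(text),       # voice matching the avatar
                "video": avatar.animate(text)}       # expressions, actions, sign language
    raise ValueError(f"unknown output mode: {mode}")
```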
  • FIG. 9 shows a schematic diagram of an example of a data flow diagram generated by the explanation content according to an embodiment of the present disclosure.
  • the commentary template corresponding to the selected commentary event is determined (commentary template selection 901).
  • the template field in the commentary template is replaced, and the commentary text is generated (stem replacement 902).
  • Based on the commentary text, a corresponding voice is generated (voice generation 903).
  • Based on the commentary text, a matching expression and action for the avatar are generated (expression and action generation 904).
  • generating the corresponding voice based on the narration text can be achieved through TTS (Text-to-Speech) technology.
  • FIG. 10 shows a schematic diagram of an example of commentary content output in an application scenario of online game virtual commentary according to an embodiment of the present disclosure.
  • the commentary content includes commentary text 1001 in the form of subtitles.
  • the commentary content may further include audio data for broadcasting the commentary text.
  • the online virtual explanation method according to the embodiment of the present disclosure has been described in detail with reference to FIGS. 1B to 10.
  • With the above online virtual commentary method, the problem of real-time online commentary can be solved: for videos currently in progress, such as online games or sports events, the commentary content can be output synchronously in real time, and the commentary voice can even be broadcast by a virtual commentary host with corresponding facial expressions, actions, and other anthropomorphic effects.
  • As shown in FIG. 11, the online virtual commentary device 1100 includes: an attribute acquisition unit 1101, a feature extraction unit 1102, an event generation unit 1103, a selection unit 1104, a text generation unit 1105, and an output unit 1106.
  • the attribute obtaining unit 1101 is configured to obtain attribute data describing basic information of elements in the frame image based at least on the frame image of the video being played.
  • the online virtual commentary device according to an embodiment of the present disclosure can be applied to a scene of virtual commentary of an online game.
  • the video being played is an online game video.
  • Game players can input game instructions to online games to control the movement, skills, actions and other behaviors of player characters and non-player characters in the game while the game is in progress.
  • the attribute acquiring unit 1101 is further configured to: acquire information describing the elements in the frame image based on the frame image of the video being played and the frame instruction input to the frame image (eg, game instruction) The attribute data of the basic information.
  • the attribute data includes direct attribute data and indirect attribute data.
  • the attribute obtaining unit 1101 may be further configured to obtain direct attribute data of elements in the current frame image in the current frame.
  • the direct attribute data is automatically generated by the game application.
  • the direct attribute data may include the coordinate position of the game character in the image. Then, by performing analysis processing on the direct attribute data of the elements in the current frame image in the current frame, the attribute acquisition unit 1101 determines the indirect attribute data.
  • the indirect attribute data is the data obtained by further calculation on the basis of the direct attribute data.
  • For example, if the direct attribute data includes the coordinate position of the game character in the image, this coordinate position can be compared with the coordinates of the wild area to obtain the indirect attribute data "the game character is in the wild area".
  • indirect attribute data can be obtained through understanding of game instructions. For example, if the game player inputs a game instruction to move, then based on the current coordinate position of the game character corresponding to the game player, the new coordinate position of the game character located on the map after moving can be calculated. Or, as another example, if the game player inputs a game instruction to cast a skill, based on the current time, the remaining time for the skill cooling of the game character corresponding to the game player after the skill is cast can be calculated.
  • the direct attribute data of one or more previous frames of the image may be further combined to obtain indirect attribute data.
  • In some embodiments, the attribute acquisition unit 1101 may be further configured to: obtain the direct attribute data of elements in the current frame image in the current frame and the previous frame; and determine the indirect attribute data by analyzing the direct attribute data of elements in the current frame image across the current frame and the previous frame.
  • the previous frame is a frame that precedes the current frame in time order, and the number of previous frames may be one or more.
  • For example, the direct attribute data may include the information boxes (e.g., status bars) in the current frame and the previous frame of the game image.
  • From these, the blood volume (HP) data of a game character in the current frame and the previous frame can be obtained and compared, so that the indirect attribute data "amount of change in the game character's blood volume" can be obtained.
  • the online virtual commentary device can also be applied to the scene of virtual commentary of sports events.
  • the video being played is an ongoing sporting event.
  • sports events do not need to receive user input, that is, there is no input game instruction in each frame, and it can be regarded as an automatically played video.
  • the attribute data of the elements in the image is obtained based only on the frame image.
  • the direct attribute data may include the coordinate position of the player in a frame of image. By comparing the coordinate position of the player with the coordinate position of each area of the basketball court, the indirect attribute data of "the player is in the front court" can be obtained.
  • In summary, the attribute data obtained by the attribute acquisition unit 1101 based on the frame image of the video being played, whether direct attribute data obtained directly or indirect attribute data obtained by further analysis of the direct attribute data, describes basic information of a single dimension.
  • the feature extraction unit 1102 is configured to extract feature data representing comprehensive information related to the commentary in the frame image based on the attribute data.
  • direct attribute data and indirect attribute data obtained based on frame images can be reused to extract different types of feature data. That is to say, when the feature extraction unit 1102 extracts different feature data, some of the used attribute data may be the same.
  • the feature extraction unit 1102 is further configured to: for each feature included in a pre-established interpretive feature library, select attribute data associated with the feature from the attribute data. For example, for the feature "whether it is in the grass" included in the interpretive feature library, the attribute data associated with the feature is the coordinates of the game character and the coordinates of the grass. Then, by performing analysis processing on the attribute data associated with the feature, the value of the feature is determined as the feature data of the frame image. For example, by comparing the coordinates of the game character with the coordinates of the grass, the value of the feature "whether in the grass” is determined to be "in the grass” or "not in the grass”.
  • The forward commentary process goes from attribute data to feature data, and then from feature data to commentary events.
  • The commentary feature library, in contrast, is established through the reverse process: the commentary events involved in the commentary process are disassembled into features, and features into attributes, and the library is gradually established and improved by applying a priori annotation and mining.
  • In some embodiments, the online virtual commentary device may further include a commentary feature library construction unit (not shown in the figure) configured to build the commentary feature library by: extracting reference commentary text based on a standard reference commentary video; determining a reference commentary event based on the reference commentary text; determining, based on the reference commentary event, reference feature data used to characterize comprehensive commentary-related information; and establishing the commentary feature library based on the reference feature data.
  • the standard reference commentary video can be a video that has been manually explained, such as a game video or a sports event.
  • In such a video, the output commentary content is already included; the goal is to work backward from existing reference commentary videos to the specific features involved in the commentary process.
  • multiple reference commentary videos will be selected as the standard. The more the number of selected reference commentary videos, the more complete the features included in the established commentary feature library.
  • the established narration feature library corresponds to the specific type of the video to be narrated one-to-one.
  • the commentary feature library for online games and the commentary feature library for sports events must be different.
  • different types of online games correspond to different commentary feature libraries.
  • the commentary feature library construction unit may extract commentary audio from the reference commentary video, and then convert the extracted commentary audio into a reference commentary text.
  • For example, the extracted reference commentary text reads: "This Shen Mengxi threw a bomb, and the damage is really high. Impressive."
  • the commentary feature library construction unit determines a reference commentary event based on the reference commentary text.
  • For example, based on the reference commentary text "This Shen Mengxi threw a bomb, and the damage is really high. Impressive.", the reference commentary events are determined to be "Shen Mengxi dropped a bomb" and "the bomb damage is high".
  • the commentary feature library construction unit determines reference feature data used to characterize comprehensive information related to the commentary based on the reference commentary event. For example, based on the reference commentary event of "Shen Mengxi dropping a bomb", the reference feature data of "the name of the game character” and the “action of the game character” can be determined. Based on the reference commentary event of "bomb damage high”, the reference feature data of "damage output to other game characters" can be determined.
  • the commentary feature library construction unit builds the commentary feature library based on the reference feature data.
  • In some embodiments, a combination of manual labeling and automatic labeling can be used to determine the reference commentary events from the reference commentary text and the reference feature data from the reference commentary events. Specifically, a first batch of reference commentary videos is manually labeled to extract events and label features, so as to initially establish the commentary feature library. Then a new batch of reference commentary videos is taken, and event extraction and feature labeling are performed on this new batch through automatic labeling; after the automatic labeling, errors are corrected and missing items are supplemented by manual inspection, so as to further expand and improve the commentary feature library. Following this alternation of manual and automatic labeling, new reference commentary videos are processed batch by batch. As the processing progresses, fewer and fewer manual corrections and supplements are needed; when no manual error correction or supplementation remains, the commentary feature library is considered complete and can be used for feature extraction in online virtual commentary.
  • Since the commentary feature library includes a large amount of feature data, it is divided into different categories to facilitate retrieval and management. The feature data of each category is calculated based on the attribute data, which is the key to feature extraction.
  • the event generating unit 1103 is configured to generate candidate events based on the characteristic data.
  • Candidate events refer to events that occur during the playback of the video.
  • the candidate events generated here are not necessarily all events that occur during the playback of the video. Which candidate events are generated depends on which feature data has been extracted. In addition, the generated candidate events do not necessarily have to be explained.
  • the event generating unit 1103 is further configured to load conditions corresponding to all pre-defined events.
  • An event can correspond to one or more conditions.
  • For example, the event "hero killed by wild monster" can correspond to two conditions: C1. the hero is killed; C2. the killer is a wild monster.
  • When it is determined based on the feature data that the conditions corresponding to an event are satisfied, the event generation unit 1103 generates the event as a candidate event. For example, if the value of the extracted feature data indicates that the hero is alive or that the killer is another hero (in other words, conditions C1 and C2 are not both met), the event generation unit 1103 does not generate the "hero killed by wild monster" event.
  • Conversely, if conditions C1 and C2 are both met, the event generation unit 1103 generates the "hero killed by wild monster" event and regards it as a candidate event.
  • the event generating unit 1103 traverses the conditions corresponding to all pre-defined events, and generates events that satisfy the corresponding conditions.
  • candidate events are divided into basic events and advanced events.
  • the "hero is killed by a monster" event listed above is a basic event.
  • the event generating unit 1103 may generate a basic event based on the characteristic data. High-level events are defined relative to basic events.
  • the event generating unit 1103 may generate advanced events based on basic events or may generate advanced events based on basic events and characteristic data.
  • If this event were decomposed directly into game features, it would require: F1. hero occupation: support; F2. tyrant state: under attack; F3. occupation of the tyrant's attacker: jungler; F4. support hero position: in the grass behind the tyrant. Since the basic events E1 (the support is waiting in ambush behind the tyrant) and E2 (the jungler is fighting the tyrant) have already been defined and extracted, the event generation unit 1103 can form the advanced event H1 directly from the combination of basic events E1 + E2, without defining the lengthy combination of F1, F2, F3, and F4 and performing multiple condition judgments.
  • events are divided into basic events and advanced events.
  • candidate events can also be divided into team battle events and non-team battle events.
  • the team battle event is a collection of candidate events in which multiple game characters participate in a predetermined time period before the current frame
  • the non-team battle event is an event other than the team battle event.
  • Teamfight events and non-teamfight events can be basic events or advanced events.
  • events can be divided into single-frame events and multi-frame events. Teamfights are based on a period of time in the game. Game data exists in the form of frames, and a game time period contains multiple game frames. Therefore, teamfight events can be regarded as multi-frame events.
  • the above-mentioned attribute acquisition unit 1101, feature extraction unit 1102, and event generation unit 1103 execute the processes of attribute acquisition, feature extraction, and event generation respectively.
  • the generated events will be used as candidate events and provided to the subsequent selection unit 1104.
  • the selection unit 1104 is configured to select a commentary event from the generated candidate events. In other words, only a part of the candidate events will be selected as interpretive events and will be narrated.
  • the selection unit 1104 is further configured to: select a commentary event based on the playing state of the video.
  • In some embodiments, the candidate events generated by the event generation unit 1103 include operational events, such as commentary opening events used to supplement the introduction of the currently playing video (for example, a game introduction or a sports event introduction), news, and the like.
  • Such events can be broadcast in time periods that are not related to the commentary (eg, game loading, game pause, intermission of sports events, etc.). Therefore, when the playing state of the video is a game pause or an intermission of a sports event, the selection unit 1104 selects an operation event from the candidate events as the narration event (operation selection).
  • the selection unit 1104 is further configured to: determine the degree of importance of the generated candidate event according to a predetermined rule; and select the candidate event with the highest degree of importance as the narration event. For example, in the application scenario of virtual interpretation of online sports events, when the location of the sports event is located in the central area of the screen, the candidate event is considered to have a high degree of importance. Or, when a sports event causes a score change, the candidate event is considered to have a high degree of importance.
  • the commentary event can be selected in the following manner.
  • the candidate events include team battle events and non-team battle events.
  • the selection unit 1104 may be further configured to perform the following processing.
  • the selection unit 1104 selects the team battle event with the highest degree of importance (team battle selection). For example, the teamfight event that brings the largest number of kills is of high importance. Or, the teamfight event that casts the most skills is of high importance. Or, the teamfight event that appears in the vision of the narration is of high importance.
  • Next, for all the candidate events included in the selected team battle event, the selection unit 1104 determines the importance of each candidate event according to predetermined rules and selects the candidate event with the highest importance as the commentary event (in-group selection). For example, the static weight of an event (e.g., whether the game character is in the carry position) and the dynamic weight of an event (e.g., whether the event appears in the field of view, the team battle result, skill effects, etc.) can be scored and weighted, and the event with the highest score within the group is selected as the commentary event.
  • In addition, for candidate events outside team battles, the selection unit 1104 can directly select the single-frame event with the highest static weight as the commentary event (out-of-group selection).
  • predetermined rules are not limited to the examples listed above. Any other possible rules should also be included in the scope of this disclosure.
  • the text generating unit 1105 is configured to determine the corresponding commentary text based on the selected commentary event. In other words, after the event to be narrated is determined, the commentary wording corresponding to that event needs to be generated.
  • the text generating unit 1105 may be further configured to determine a commentary template corresponding to the selected commentary event based on a pre-established commentary text library.
  • the commentary text library is essentially a collection of (event, commentary template) pairs, and includes all commentary stem content as well as the commentary event corresponding to each commentary template.
  • the commentary template [M1. <hero name>, don't worry, take a mouthful of medicine and save your strength first] corresponds to the commentary event [E1. Hero uses healing skills].
  • the text generating unit 1105 replaces the template field in the commentary template based on the attribute data corresponding to the commentary event, and generates the commentary text.
  • the text generating unit 1105 dynamically substitutes the attribute data obtained by the attribute obtaining unit described above (for example, the hero name “Ake”) into the template to obtain the commentary text; a template-substitution sketch follows this list.
  • the output unit 1106 is configured to output corresponding commentary content based on the commentary text.
  • the commentary content here can be text, audio, or video.
  • the output unit is further configured to perform one or more of the following processes: output the commentary text (subtitle commentary); output a voice that broadcasts the commentary text (audio commentary); display an avatar and output a voice matching the avatar to broadcast the commentary text (virtual host video commentary); and display an avatar and broadcast the commentary text through the actions of the avatar (virtual host video commentary), such as a sign language broadcast. A sketch of this output dispatch follows this list.
  • with the online virtual commentary device according to the embodiments of the present disclosure, the problem of real-time online commentary can be solved: for a video currently in progress (e.g., an online game or a sports event), the commentary content can be output synchronously in real time, and the commentary voice can even be broadcast by a virtual commentary host with corresponding facial expressions, actions, and other anthropomorphic effects.
  • an online virtual commentary device is provided, including one or more processors and one or more memories, wherein the one or more memories store a computer program which, when executed by the one or more processors, causes the device to execute the online virtual commentary method described above.
  • the method or device according to the embodiment of the present disclosure may also be implemented with the aid of the architecture of the computing device 1200 shown in FIG. 12.
  • the computing device 1200 may include a bus 1210, one or more CPUs 1220, a read-only memory (ROM) 1230, a random access memory (RAM) 1240, a communication port 1250 connected to a network, an input/output component 1260, a hard disk 1270, and the like.
  • a storage device in the computing device 1200, such as the ROM 1230 or the hard disk 1270, can store various data or files used in the processing and/or communication of the method provided by the embodiments of the present disclosure, as well as the program instructions executed by the CPU.
  • the architecture shown in FIG. 12 is only exemplary. When implementing different devices, one or more components in the computing device shown in FIG. 12 may be omitted according to actual needs.
  • the embodiments of the present disclosure can also be implemented as a computer-readable storage medium.
  • computer-readable instructions (i.e., a computer program) are stored on the computer-readable storage medium according to an embodiment of the present disclosure.
  • when the computer-readable instructions are run by a processor of a computing device, the computing device can execute the online virtual commentary method according to the embodiments of the present disclosure described with reference to the above drawings.
  • the computer-readable storage medium includes, but is not limited to, for example, volatile memory and/or non-volatile memory.
  • the volatile memory may include random access memory (RAM) and/or cache memory (cache), for example.
  • the non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, and the like.
  • the online virtual commentary method and device have been described with reference to FIGS. 1B to 12.
  • with the online virtual commentary method and device according to the embodiments of the present disclosure, the problem of real-time online commentary can be solved: for a video currently in progress (e.g., an online game or a sports event), the commentary content can be output synchronously in real time, and the commentary voice can even be broadcast by a virtual commentary host with corresponding facial expressions, actions, and other anthropomorphic effects.
  • the terms “include”, “comprise”, or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device including a series of elements includes not only those elements but also other elements not expressly listed, or further includes elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the statement “including …” does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
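The bullets above describe the selection and text-generation logic only at the level of behavior. The four short sketches below illustrate, in Python, one way such logic could be implemented; they are simplified illustrations under stated assumptions, not the implementation of the disclosure, and every data structure, field name, weight, and threshold in them (e.g., `position`, `score_delta`, the 0.2 central-area margin) is hypothetical.

The first sketch scores candidate events of an online sports event by predetermined rules (an event in the central area of the screen, or one that changes the score, is considered more important) and keeps the most important one:

```python
from dataclasses import dataclass

@dataclass
class CandidateEvent:
    name: str
    position: tuple        # (x, y) location of the event within the frame
    score_delta: int = 0   # change in the score caused by the event, if any

def importance(event, frame_width=1920, frame_height=1080):
    """Score a candidate event by predetermined rules; higher means more important."""
    score = 0.0
    # Rule 1: events occurring in the central area of the screen matter more.
    cx, cy = frame_width / 2, frame_height / 2
    x, y = event.position
    if abs(x - cx) <= 0.2 * frame_width and abs(y - cy) <= 0.2 * frame_height:
        score += 1.0
    # Rule 2: events that cause a score change matter more.
    if event.score_delta != 0:
        score += 2.0
    return score

def most_important_event(candidates):
    # The selection unit keeps the candidate with the highest degree of importance.
    return max(candidates, key=importance)

# Example: a scoring shot near mid-court outranks an off-screen substitution.
events = [CandidateEvent("substitution", (80, 950)),
          CandidateEvent("three_point_shot", (960, 500), score_delta=3)]
print(most_important_event(events).name)   # -> three_point_shot
```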
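The second sketch outlines the selection cascade described above: operation selection based on the playing state, then team battle selection, in-team selection, and finally out-of-team selection. The playing-state strings, the kill count as the team-battle importance criterion, and the 0.4/0.6 split between static and dynamic weights are illustrative assumptions only:

```python
def event_score(event):
    # Weighted combination of the static weight (e.g., whether the game character
    # is in the C/carry position) and the dynamic weight (e.g., whether the event
    # is in the commentary field of view, the team battle result, skill effects).
    return 0.4 * event["static_weight"] + 0.6 * event["dynamic_weight"]

def select_commentary_event(play_state, operation_events,
                            team_battle_events, single_frame_events):
    # Operation selection: during loading, pauses or intermissions, narrate an
    # operational event (game introduction, news, and so on).
    if play_state in ("loading", "paused", "intermission") and operation_events:
        return operation_events[0]

    # Team battle selection: pick the most important team battle (approximated
    # here by kill count), then the most important event inside it
    # (in-team selection).
    if team_battle_events:
        battle = max(team_battle_events, key=lambda b: b["kills"])
        return max(battle["events"], key=event_score)

    # Out-of-team selection: fall back to the single-frame event with the
    # highest static weight.
    if single_frame_events:
        return max(single_frame_events, key=lambda e: e["static_weight"])
    return None
```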
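The third sketch covers text generation: look up the commentary template that corresponds to the selected commentary event in the commentary text library, then replace its template fields with the attribute data obtained earlier. The library key, the `<field>` placeholder syntax, and the single example entry are hypothetical; only the (event, commentary template) pairing and the field substitution come from the description above:

```python
# Commentary text library: essentially a collection of (event, commentary template) pairs.
COMMENTARY_TEXT_LIBRARY = {
    "E1_hero_uses_healing_skill":
        "<hero_name>, don't worry, take a mouthful of medicine and save your strength first",
}

def generate_commentary_text(event_id, attributes):
    text = COMMENTARY_TEXT_LIBRARY[event_id]
    # Dynamically substitute each attribute value into its template field.
    for field, value in attributes.items():
        text = text.replace(f"<{field}>", str(value))
    return text

# The attribute data obtained earlier (here the hero name "Ake") fills the template.
print(generate_commentary_text("E1_hero_uses_healing_skill", {"hero_name": "Ake"}))
```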
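The last sketch shows the output unit dispatching the generated commentary text to one or more output modes (subtitle, audio, virtual-host video). The `text_to_speech` and `avatar` arguments are placeholders for whatever TTS engine or avatar renderer an implementation might plug in; no specific library is implied:

```python
def output_commentary(text, modes, text_to_speech=None, avatar=None):
    """Dispatch the commentary text to one or more output modes."""
    if "subtitle" in modes:
        print(text)                       # subtitle commentary
    if "audio" in modes and text_to_speech is not None:
        text_to_speech(text)              # audio commentary
    if "avatar" in modes and avatar is not None:
        avatar.speak(text)                # virtual host video commentary (voice)
        avatar.gesture(text)              # action-based broadcast, e.g., sign language

# Example: subtitles only, with no TTS engine or avatar wired in.
output_commentary("Ake, don't worry, take a mouthful of medicine first", {"subtitle"})
```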

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Optics & Photonics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Acoustics & Sound (AREA)
  • Processing Or Creating Images (AREA)

Abstract

An online virtual commentary method, device, and medium. The method includes: obtaining, at least based on a frame image of a video being played, attribute data describing basic information of elements in the frame image (S101); extracting, based on the attribute data, feature data representing comprehensive commentary-related information in the frame image (S102); generating candidate events based on the feature data (S103); selecting a commentary event from the generated candidate events (S104); determining corresponding commentary text based on the selected commentary event (S105); and outputting corresponding commentary content based on the commentary text (S106).

Description

在线虚拟解说方法、设备和介质
本申请要求于2020年2月7日提交中国专利局、申请号为202010082914.2、发明名称为“在线虚拟解说方法、设备和介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本公开涉及人工智能的领域,更具体地说,涉及在线虚拟解说方法、设备和介质。
背景技术
随着人工智能(Artificial Intelligence,AI)各方向不同能力的发展,大众已渐渐不满足于在实际场景中只应用某个AI能力,因此对于AI综合能力应用场景的探索也在不断推进。
例如,提出了针对新闻播报场景的虚拟主持人技术。首先,预先准备好需要播报的文本、播报时的语速、说话情绪等。然后,通过应用TTS(Text-to-speech)技术转换为模拟人声的声音播放出来,并且把离线配置好的人脸表情参数、动作参数等人体参数应用3D技术转换为拟人的表情和动作,并展示出来。
然而,这种新闻播报虚拟主持人技术只适用于离线场景。虚拟主持人的播报内容、语音效果、动作效果在播报前已经固定下来,不能适用于实时在线解说、播报主持的场景。
发明内容
鉴于以上情形,期望提供一种在线虚拟解说方法、设备和介质,以便对于当前正在播放的视频,能够实时同步地输出解说内容。
根据本公开的一个方面,提供了一种在线虚拟解说方法,包括:至少基于正在播放的视频的帧图像,获取用于描述帧图像内的元素的基础信息的属性数据;基于所述属性数据,提取用于表示帧图像中与解说相关的综合信息的特征数据;基于所述特征数据,生成候选事件;从生成的候选事件中,选择解说事件;基于所选择的解说事件,确定对应的解说文本;以及基于所述解说文本,输出对应的解说内容。
根据本公开的另一方面,提供了一种在线虚拟解说设备,包括:属性获取单元,用于至少基于正在播放的视频的帧图像,获取用于描述帧图像内的元素的基础信息的属性数据;特征提取单元,用于基于所述属性数据,提取用于表示帧图像中与解说相关的综合信息的特征数据;事件生成单元,用于基于所述特征数据,生成候选事件;选择单元,用于从生成的候选事件中,选择解说事件;文本生成单元,用于基于所选择的解说事件,确定对应的解说文本;以及输出单元,用于基于所述解说文本,输出对应的解说内容。
根据本公开的再一方面,提供了一种在线虚拟解说设备,包括:一个或多个处理器以及一个或多个存储器,其中所述一个或多个存储器中存储有计算机程序,当所述计算机程序由所述一个或多个处理器执行时,使得所述设备执行执行上文中所述的在线虚拟解说方法。
根据本公开的又一方面,提供了一种计算机可读存储介质,其上存储有计算机程序,当所述计算机程序由计算设备的处理器执行时,使得所述计算设备执行上文中所述的在线虚拟解说方法。
通过根据本公开的实施例的在线虚拟解说方法和设备,能够解决实时在线解说的问题。对于当前正在进行中的视频(如,在线游戏或体育赛事),可以实时同步地输出解说内容,甚至可以通过虚拟解说主持人播报解说语音,并搭配相应的表情、动作等拟人效果。
附图简要说明
图1A示出了本公开实施例的在线虚拟解说方法的应用环境示意图;
图1B示出了根据本公开实施例的在线虚拟解说方法的过程的流程图;
图2示出了根据本公开实施例的用于建立解说特征库的过程的流程图;
图3示出了根据本公开实施例的在在线游戏的应用场景下解说特征库中包括的特征分类的一种示例的示意图;
图4示出了根据本公开实施例的在在线游戏虚拟解说的应用场景下用于特征提取的数据流图的示意图;
图5示出了根据本公开实施例的在体育赛事虚拟解说的应用场景下用于特征提取的数据流图的示意图;
图6示出了根据本公开实施例的在在线游戏虚拟解说的应用场景下事件生成的数据流图的示意图;
图7示出了根据本公开实施例的单帧事件与多帧事件之间的关系的示意图;
图8示出了根据本公开实施例的在在线游戏虚拟解说的应用场景下事件选取的数据流图的示意图;
图9示出了根据本公开实施例的解说内容生成的数据流图的一种示例的示意图;
图10示出了根据本公开实施例的在在线游戏虚拟解说的应用场景下输出的解说内容的一种示例的示意图;
图11示出了根据本公开实施例的在线虚拟解说设备的结构示意图;以及
图12示出了根据本公开实施例的一种示例性的计算设备的架构的示意图。
具体实施方式
下面将参照附图对本公开的各个实施方式进行描述。提供以下参照附图的描述,以帮助对由权利要求及其等价物所限定的本公开的示例实施方式的理解。其包括帮助理解的各种具体细节,但它们只能被看作是示例性的。因此,本领域技术人员将认识到,可对这里描述的实施方式进行各种改变和修改,而不脱离本公开的范围。而且,为了使说明书更加清楚简洁,将省略对本领域熟知功能和构造的详细描述。
图1A示出了本公开实施例的在线虚拟解说方法的应用环境示意图。如图1所示,该方法应用于通信系统100,该通信系统100包括一个或多个终端设备101、服务器102、视频数据获取设备103和数据库104,其中,终端设备101和服务器102之间通过网络105连接,比如,通过有线或无线网络连接等。
在图1所示的实施例中,视频数据获取设备103可以获取正在播放的视频的帧图像, 并将获取的帧图像发送给服务器102。服务器102根据数据库104中存储的参考特征数据对接收到的帧图像进行解说,并将解说内容通过网络105输出至终端设备101。终端设备101可以将解说内容显示给用户。
其中,服务器102可以是单台服务器,也可以是由多台服务器组成的服务器集群,或云计算平台等。服务器102、视频数据获取设备103和数据库104可以是独立的设备,也可以是一台设备。终端设备10可以为手机、平板电脑、笔记本电脑、或个人计算机(PC,Personal Computer)、智能电视等。
将参照图1B描述根据本公开实施例的在线虚拟解说方法。如图1B所示,所述方法包括以下步骤S101-S106。
首先,在步骤S101,至少基于正在播放的视频的帧图像,获取用于描述帧图像内的元素的基础信息的属性数据。
例如,根据本公开实施例的在线虚拟解说方法可以应用于在线游戏虚拟解说的场景中。在这种情况下,正在播放的视频为在线游戏的视频。游戏玩家可以对在线游戏输入游戏指令,用于在游戏进行时操控游戏内玩家角色、非玩家角色等元素的移动、技能、动作等行为。并且,在这种情况下,所述至少基于正在播放的视频的帧图像,获取用于描述帧图像内的元素的基础信息的属性数据进一步包括:基于正在播放的视频的帧图像以及对于帧图像输入的帧指令(如,游戏指令),获取用于描述帧图像内的元素的基础信息的属性数据。
例如,属性数据包括直接属性数据和间接属性数据。所述至少基于正在播放的视频的帧图像,获取用于描述帧图像内的元素的基础信息的属性数据可以进一步包括:获取当前帧图像内的元素在当前帧的直接属性数据。这里,直接属性数据是由游戏应用自动生成的。例如,直接属性数据可以包括游戏角色在图像中的坐标位置。然后,通过对当前帧图像内的元素在当前帧的直接属性数据执行分析处理,确定间接属性数据。也就是说,间接属性数据是在直接属性数据的基础上进一步计算得到的数据。例如,沿用上文中的例子,在直接属性数据包括游戏角色在图像中的坐标位置的情况下,通过将该游戏角色的坐标位置与野区的坐标位置进行比较,可以得到“该游戏角色在野区”的间接属性数据。其中,野区是游戏应用中设定的地图上的某个区域,属于中立的第三方游戏角色的活动范围。在另一实施方式中,在直接属性数据的基础上,通过对游戏指令的理解,可以得到间接属性数据。例如,如果游戏玩家输入了移动的游戏指令,那么基于游戏玩家对应的游戏角色的当前坐标位置,可以计算出该游戏角色移动后位于地图的新坐标位置。或者,又如,如果游戏玩家输入了施放技能的游戏指令,那么基于当前时间,可以计算出游戏玩家对应的游戏角色在技能施放后的技能冷却剩余时间。
在一种实施方式中,除了仅考虑当前帧图像的直接属性数据之外,还可以进一步结合一个或多个先前帧的图像的直接属性数据,来获取间接属性数据。具体来说,所述至少基于正在播放的视频的帧图像,获取用于描述帧图像内的元素的基础信息的属性数据可以进一步包括:获取当前帧图像内的元素在当前帧和先前帧的直接属性数据;以及通过对当前帧图像内的元素在当前帧以及先前帧的直接属性数据执行分析处理,确定间接属性数据。这里先前帧是在时间顺序上排在当前帧之前的帧,并且先前帧的数量可以是一个或多个。例如,直接属性数据可以包括当前帧和先前帧游戏图像中的信息框。通过对所述信息框执行图像分析处理,可以得到游戏角色在当前帧和先前帧游戏图像中的血量数据,并将当前帧与先前帧游戏图像中的血量数据进行比较,确定“游戏角色的血量变化量”的间接属性数 据。
又如,根据本公开实施例的在线虚拟解说方法也可以应用于体育赛事虚拟解说的场景中。在这种情况下,正在播放的视频为正在进行中的体育赛事。与在线游戏不同,体育赛事不需要接收用户的输入,即每帧不存在输入的游戏指令,可以将其看作自动播放的视频。并且,针对每一帧图像,仅基于帧图像,获取图像内的元素的属性数据。例如,在正在进行中的体育赛事为篮球比赛的情况下,直接属性数据可以包括球员在一帧图像中的坐标位置。通过将该球员的坐标位置与篮球场地各区域的坐标位置进行比较,可以得到“该球员在前场”的间接属性数据。
可以看出,在步骤S101,对于至少基于正在播放的视频的帧图像而获取的属性数据,不论是直接获取的直接属性数据,还是基于直接获取的直接属性数据进一步分析得到的间接属性数据,都是用于描述单个维度的基础信息的数据。
然后,在步骤S102,基于所述属性数据,提取用于表示帧图像中与解说相关的综合信息的特征数据。这里,基于帧图像获得的直接属性数据和间接属性数据可以复用于提取不同类型的特征数据。也就是说,当提取不同特征数据时,所使用的属性数据中可能有部分是相同的。
作为一种可能的实施方式,所述基于所述属性数据,提取用于表示帧图像中与解说相关的综合信息的特征数据可以进一步包括:针对预先建立的解说特征库中包括的每一个特征,从所述属性数据中选择与该特征相关联的属性数据。例如,针对解说特征库中包括的特征“是否在草里”,与该特征相关联的属性数据为游戏角色的坐标和草的坐标。这里的草是游戏应用中的植物,可用于隐藏游戏角色。然后,通过对与该特征相关联的属性数据执行分析处理,确定该特征的取值作为帧图像的特征数据。例如,通过将游戏角色的坐标和草的坐标进行比较,确定特征“是否在草里”的取值为“在草里”或“不在草里”。
下面,将具体描述解说特征库的构建方法。
正向的解说流程为从属性数据到特征数据,然后从特征数据到解说事件。然而,解说特征库是基于反向的解说流程而建立的,即:解说特征库是针对解说过程涉及到的解说事件进行事件到特征、特征到属性的拆解,通过应用先验标注、挖掘的方式逐步建立和完善。
通过参照图2具体描述建立所述解说特征库的过程。如图2所示,用于建立所述解说特征库的方法具体包括以下步骤S201-S204。
首先,在步骤S201,基于作为标准的参考解说视频,提取参考解说文本。
这里,作为标准的参考解说视频可以是已经人工完成解说的视频,如游戏视频或体育赛事。也就是说,在该参考解说视频中,已经包括了输出的解说内容。期望基于已有的参考解说视频,反向推导出解说过程具体关注哪些特征。通常,在实践中,将选取多个作为标准的参考解说视频。选取的参考解说视频的数量越多,则建立的解说特征库中包括的特征将越完备。
并且,建立的解说特征库是与待解说的视频的具体类型一一对应的。例如,用于在线游戏的解说特征库与用于体育赛事的解说特征库必然是不同的。另外,同样在在线游戏虚拟解说的应用场景下,不同类型的在线游戏所对应的解说特征库也是不同的。
具体来说,在步骤S201,首先,可以从参考解说视频中提取出解说音频,然后将提取出的解说音频转换为参考解说文本。例如,提取出的参考解说文本为“这沈梦溪丢了一个混合炸弹伤害真心高,佩服佩服”。
然后,在步骤S202,基于所述参考解说文本,确定参考解说事件。沿用上文中的示例,基于“这沈梦溪丢了一个混合炸弹伤害真心高,佩服佩服”的参考解说文本,可以确定参考解说事件为“沈梦溪丢炸弹”和“炸弹伤害高”。
接下来,在步骤S203,基于所述参考解说事件,确定用于表征与解说相关的综合信息的参考特征数据。例如,基于“沈梦溪丢炸弹”的参考解说事件,可以确定“游戏角色的名字”和“游戏角色的动作”的参考特征数据。基于“炸弹伤害高”的参考解说事件,可以确定“对其他游戏角色的伤害输出”的参考特征数据。
最后,在步骤S204,基于所述参考特征数据,建立解说特征库。
可以采用人工标注和自动标注相结合的方式来从参考解说文本中确定参考解说事件并从参考解说事件中确定参考特征数据。具体来说,首先,通过人工标注的方式对一批参考解说视频进行事件的提取和特征的标注,以初步地建立所述解说特征库。然后,更换新的一批参考解说视频,通过自动标注的方式对这批新的参考解说视频进行事件的提取和特征的标注。在自动标注之后,再通过人工检查的方式进行纠错和补充,从而进一步扩充和完善所述解说特征库。按照这样人工和自动交替进行的方式,不断地更换新的参考解说视频。随着处理的进行,将发现人工纠错和补充的部分将越来越少。当不再存在人工纠错和补充的部分时,这意味着解说特征库已经完备,且可以用于在线虚拟解说的特征提取中。
由于解说特征库包括海量的特征数据,因此为了便于检索和管理,将其划分为不同的类别。例如,图3示出了根据本公开实施例的在在线游戏的应用场景下解说特征库中包括的特征分类的一种示例的示意图。
如图3所示,解说特征主要包括英雄、非玩家角色(NPC)、战斗(团战)、总结分析、时间描述、全局数据、阵营、前期等分类。其中,英雄类特征主要与英雄的状态(血量、位置等)、动作(使用技能、移动位置等)有关;NPC类特征主要与游戏内非玩家角色的状态(血量、位置等)、动作(攻击英雄、打塔、出生等)有关;战斗类特征主要与游戏内英雄参与形成的多对多团战的属性(开始结束时间、参团英雄、结局、产生效果等)有关;总结分析类特征主要是对当前游戏局势的统计、分析;时间描述主要是划分当前游戏阶段(前期、中期、后期等);全局数据类特征主要是游戏进行的整体数据统计(如游戏第一视角的英雄、游戏视野位置及范围、游戏进行时间等);阵营类特征主要为对双方阵营维度进行属性统计(阵营经济、阵营塔状态、阵营击杀野怪情况等);前期类特征主要是针对游戏解说开局介绍(开头、解说引导、阵容介绍、场外背景等)。
可以看出,各个类别的特征数据都是基于属性数据计算而来的,这是特征提取的关键所在。
另外,解说特征库包括哪些特征数据以及包括的特征数据的数量是需要权衡的。如果解说特征库包括尽可能多的类别和数量的特征数据,那么将不会出现遗漏发生的事件的问题,但是每帧事件生成的计算量将大大提高。如果解说特征库仅包括少量的关键特征数据,并且基于这些关键特征数据所生成的事件将被大概率地选择作为解说事件,那么这将大大地减少每帧事件生成的计算量并提升实时解说性能。当然,选择哪些关键特征数据需要根据特定的解说场合,并依靠多次实验来确定。
图4示出了根据本公开实施例的在在线游戏虚拟解说的应用场景下用于特征提取的数据流图的示意图。如图4所示,基于游戏图像和游戏指令,可以通过实时的游戏内元素属性计算来产生间接的属性数据,再结合针对游戏解说建立起来的解说特征库,可以提取出 游戏特征。
图5示出了根据本公开实施例的在体育赛事虚拟解说的应用场景下用于特征提取的特征流图的示意图。如图5所示,与图4不同的是,基于比赛图像,可以通过实时的图像内元素属性计算来产生间接的属性数据,再结合针对体育赛事解说建立起来的解说特征库,可以提取出比赛特征。
接下来,返回参照图1B,在步骤S103,基于所述特征数据,生成候选事件。候选事件是指在视频的播放过程中发生的事件。当然,这里生成的候选事件并不一定是在视频的播放过程中发生的所有事件。生成哪些候选事件取决于已经提取了哪些特征数据。另外,生成的候选事件也并不一定都要解说出来。
作为一种可能的实施方式,所述基于所述特征数据,生成候选事件进一步包括:加载预先定义的所有事件对应的条件。一个事件可以对应于一个或多个条件。举例而言,“英雄被野怪杀死”的事件可以对应于两个条件:C1.英雄死亡;C2.野怪杀的。当基于所述特征数据,确定与一个事件对应的条件满足时,生成该事件作为一个候选事件。例如,如果提取的特征数据的取值指示英雄状态为存活或者击杀者类别为其他英雄,换言之,没有同时满足条件C1和条件C2,那么将不会生成“英雄被野怪杀死”的事件。如果提取的特征数据的取值指示英雄状态为被击杀且击杀者类别为野怪,换言之,同时满足条件C1和条件C2,那么将生成“英雄被野怪杀死”的事件,并将该事件作为一个候选事件。
按照这样的方式,遍历预先定义的所有事件对应的条件,并生成对应条件都满足的事件。
当一个事件的实现复杂度较高时,该事件所对应的条件和特征数据的数量将增多。因此,作为一种可能的实施方式,从实现复杂度角度上,将候选事件区分为基础事件和高级事件。前面所列举的“英雄被野怪杀死”的事件属于基础事件。如前所述,基础事件基于特征数据而生成。高级事件是相对于基础事件而定义的。高级事件可以基于基础事件而生成或者可以基于基础事件和特征数据而生成。
由于高级事件可以由基础事件合成,因此对于那些特征非常复杂、条件层级非常多的事件,它们可以很轻易地由基础事件组合而成,不但可以避免拆分成非常多的游戏特征的问题,也提升了所有事件的复用率。
以高级事件H1:辅助蹲暴君后面的草丛帮打野卡视野打龙为例进行描述。在游戏应用中,该事件中的辅助、暴君、打野可视为游戏角色或游戏角色的职业。对于这个高级事件,直接拆分成的游戏特征包括:F1.英雄职业-辅助,F2.暴君状态-被攻击,F3.暴君攻击者职业-打野,F4.辅助英雄位置-暴君后草丛。由于之前已经定义并提取到了基础事件:E1.辅助蹲暴君后草丛,E2.打野英雄在打暴君,因此可以直接基于基础事件E1+E2组合形成高级事件H1,而不用定义冗长的F1、F2、F3、F4组合,并执行多次条件判断。
在上文中,从实现复杂度角度上,将事件区分为基础事件和高级事件。当然,还可以存在其他的用于区分事件的方式。例如,图6示出了根据本公开实施例的在在线游戏虚拟解说的应用场景下事件生成的数据流图的示意图。如图6所示,除了基础事件和高级事件之外,从事件特点角度上,还可以将候选事件区分为团战事件和非团战事件。具体来说,所述团战事件为当前帧之前的预定时间段内多个游戏角色共同参与的候选事件的集合,所述非团战事件为所述团战事件以外的事件。团战事件和非团战事件可以是基础事件,也可以是高级事件。
另外,从事件的时间跨度角度上,可以将事件区分为单帧事件和多帧事件。图7示出了根据本公开实施例的单帧事件与多帧事件之间的关系的示意图。单帧事件是基于一帧图像得到的事件,而多帧事件是多个单帧事件的集合。在图7中,以第1帧图像、第2帧图像和第3帧图像为例,示出了单帧事件的生成过程。并且,多帧事件包括基于第1帧图像、第2帧图像和第3帧图像而生成的三个单帧事件。然而,本领域的技术人员可以理解,本公开并不仅限于此。多帧事件可以是任意数量的单帧事件。另外,上文中描述的团战是基于游戏内一段时间进行的,游戏数据以帧的形式存在,且一个游戏时间段包含多个游戏帧,因此团战事件可以看作是多帧事件。
针对每一个游戏帧,执行上文中所述的特征提取和事件生成流程。并且,生成的事件都将作为候选事件,进入后续的事件选取流程,这将在下文中具体描述。
然后,返回参照图1B,在步骤S104,从生成的候选事件中,选择解说事件。也就是说,仅有一部分候选事件将被选择为解说事件,并被解说出来。
作为一种可能的实施方式,所述从生成的候选事件中,选择解说事件可以进一步包括:基于所述视频的播放状态,选择解说事件。例如,为了增强解说的现场效果,生成的候选事件包括运营事件,用于补充当前正在播放的视频的介绍(例如,游戏介绍或体育赛事介绍)、新闻等解说开场白事件。这类事件可以在与解说无关的时间段(如,游戏加载中、游戏暂停、体育赛事中场休息等)播出。因此,当所述视频的播放状态为游戏暂停或体育赛事中场休息时,从候选事件中选择运营事件作为解说事件(运营选取)。
除了上文中所述的事件选取依据之外,还可以按照如下的方式选择解说事件。具体来说,所述从生成的候选事件中,选择解说事件进一步包括:依据预定的规则,确定生成的候选事件的重要性程度;选取重要性程度最高的候选事件作为解说事件。例如,在在线体育赛事虚拟解说的应用场景下,当体育事件的发生位置位于画面中央区域时,认为该候选事件的重要性程度高。或者,当体育事件引起分数变化时,认为该候选事件的重要性程度高。
此外,在在线游戏虚拟解说的应用场景下,可以按照如下的方式来选择解说事件。如上文中所述,所述候选事件包括团战事件和非团战事件。所述从生成的候选事件中,选择解说事件进一步包括:当在所述预定时间段内存在多个团战事件时,例如,基于参与的游戏角色的位置,在所述预定时间段内存在位于区域A的团战、位于区域B的团战和位于区域C的团战,依据预定的规则,确定各团战事件的重要性程度。然后,选取重要性程度最高的团战事件(团战选取)。例如,带来最大数量的击杀数的团战事件的重要性程度高。或者,施放技能次数最多的团战事件的重要性高。或者,出现在解说视野内的团战事件的重要性高。接下来,依据预定的规则,对选中的团战事件中包括的所有候选事件,确定各候选事件的重要性程度;以及选取重要性程度最高的候选事件作为解说事件(团内选取)。例如,可以根据事件的静态权重(例如,游戏角色是否位于C位)和事件的动态权重(如,事件是否出现在解说视野内、团战结果、技能效果等)进行打分加权,并选取分数最高的团内事件作为解说事件。
此外,在不存在团战事件的情况下,可以直接从单帧事件中选取静态权重最高的事件作为解说事件(团外选取)。
当然,预定规则并不仅限于以上所列举的示例。任何其他可能的规则也应该包括在本公开的范围内。
图8示出了根据本公开实施例的在在线游戏虚拟解说的应用场景下事件选取的数据流图的示意图。如图8所示,针对所有的候选事件,可以按照运营选取801、团战选取802、团内选取803和团外选取804的顺序来选择解说事件。首先,判断当前游戏的播放状态是否为游戏加载中、或游戏暂停等。如果是,则选取运营事件。如果否,则进一步执行团战选取和团内选取。如果不存在团战,则执行团外选取。
接着,返回参照图1B,在步骤S105,基于所选择的解说事件,确定对应的解说文本。也就是说,在确定了需要进行解说的事件之后,需要生成对应事件的解说词。
具体来说,所述基于所选择的解说事件,确定对应的解说文本进一步包括以下步骤。
首先,基于预先建立的解说文本库,确定与所选择的解说事件对应的解说模板。解说文本库本质上是一个(事件,解说模板)集合,并且包括所有解说词干内容以及每个解说模板对应的解说事件。例如,解说模板【M1.<英雄名字>先不要着急,吃口药先保存一下实力】对应解说事件【E1.英雄使用回血技能】。
然后,基于与所述解说事件对应的属性数据,替换所述解说模板中的模板字段,并生成所述解说文本。例如,将在上文中描述的特征提取过程中获取的属性数据,英雄名字-阿轲,动态替换进去,以得到所述解说文本。
最后,在步骤S106,基于所述解说文本,输出对应的解说内容。这里的解说内容可以是文字,可以是音频,也可以是视频。
具体来说,所述基于所述解说文本,输出对应的解说内容进一步包括以下处理中的一个或多个:输出所述解说文本(字幕解说);输出用于播报所述解说文本的语音(音频解说);显示虚拟形象,并输出与所述虚拟形象相配合的语音来播报所述解说文本(虚拟主持人视频解说);以及显示虚拟形象,并通过所述虚拟形象的动作来播报所述解说文本(虚拟主持人视频解说),如手语播报。
图9示出了根据本公开实施例的解说内容生成的数据流图的一种示例的示意图。如图9所示,基于在之前的处理中确定的解说事件,首先基于预先建立的解说文本库,确定与所选择的解说事件对应的解说模板(解说模板选取901)。然后,基于与所述解说事件对应的属性数据,替换所述解说模板中的模板字段,并生成所述解说文本(词干替换902)。接下来,基于解说文本生成对应的语音(语音生成903)和表情动作(表情动作904)。例如,基于解说文本生成对应的语音可以通过TTS(Text-to-Speech)技术来实现。并且,由于本公开应用于实时解说的场景,因此在应用TTS技术来实现文本到语音的转换时,需要采用文本翻译和语音输出并行进行的方式。
图10示出了根据本公开实施例的在在线游戏虚拟解说的应用场景下输出的解说内容的一种示例的示意图。如图10所示,解说内容包括字幕形式的解说文本1001。当然,在此基础之上,解说内容还可以进一步包括用于播报该解说文本的音频数据。
在上文中,已经参照图1B至图10详细描述了根据本公开实施例的在线虚拟解说方法。通过根据本公开实施例的在线虚拟解说方法,能够解决实时在线解说的问题。对于当前正在进行中的视频(如,在线游戏或体育赛事),可以实时同步地输出解说内容,甚至可以通过虚拟解说主持人播报解说语音,并搭配相应的表情、动作等拟人效果。
在下文中,将参照图11描述根据本公开实施例的在线虚拟解说设备。
如图11所示,所述在线虚拟解说设备1100包括:属性获取单元1101、特征提取单元1102、事件生成单元1103、选择单元1104、文本生成单元1105和输出单元1106。
属性获取单元1101用于至少基于正在播放的视频的帧图像,获取用于描述帧图像内的元素的基础信息的属性数据。例如,根据本公开实施例的在线虚拟解说设备可以应用于在线游戏虚拟解说的场景中。在这种情况下,正在播放的视频为在线游戏的视频。游戏玩家可以对在线游戏输入游戏指令,用于在游戏进行时操控游戏内玩家角色、非玩家角色等元素的移动、技能、动作等行为。并且,在这种情况下,属性获取单元1101进一步被配置为:基于正在播放的视频的帧图像以及对于帧图像输入的帧指令(如,游戏指令),获取用于描述帧图像内的元素的基础信息的属性数据。
例如,属性数据包括直接属性数据和间接属性数据。所述属性获取单元1101可以进一步被配置为:获取当前帧图像内的元素在当前帧的直接属性数据。这里,直接属性数据是由游戏应用自动生成的。例如,直接属性数据可以包括游戏角色在图像中的坐标位置。然后,通过对当前帧图像内的元素在当前帧的直接属性数据执行分析处理,所述属性获取单元1101确定间接属性数据。也就是说,间接属性数据是在直接属性数据的基础上进一步计算得到的数据。例如,沿用上文中的例子,在直接属性数据包括游戏角色在图像中的坐标位置的情况下,通过将该游戏角色的坐标位置与野区的坐标位置进行比较,可以得到“该游戏角色在野区”的间接属性数据。在另一实施方式中,在直接属性数据的基础上,通过对游戏指令的理解,可以得到间接属性数据。例如,如果游戏玩家输入了移动的游戏指令,那么基于游戏玩家对应的游戏角色的当前坐标位置,可以计算出该游戏角色移动后位于地图的新坐标位置。或者,又如,如果游戏玩家输入了施放技能的游戏指令,那么基于当前时间,可以计算出游戏玩家对应的游戏角色在技能施放后的技能冷却剩余时间。
在一种实施方式中,除了仅考虑当前帧图像的直接属性数据之外,还可以进一步结合一个或多个先前帧的图像的直接属性数据,来获取间接属性数据。具体来说,所述属性获取单元1101可以进一步被配置为:获取当前帧图像内的元素在当前帧和先前帧的直接属性数据;以及通过对当前帧图像内的元素在当前帧以及先前帧的直接属性数据执行分析处理,确定间接属性数据。这里先前帧是在时间顺序上排在当前帧之前的帧,并且先前帧的数量可以是一个或多个。例如,直接属性数据可以包括当前帧和先前帧游戏图像中的信息框。通过对所述信息框执行图像分析处理,可以得到游戏角色在当前帧和先前帧游戏图像中的血量数据,并将当前帧与先前帧游戏图像中的血量数据进行比较,确定“游戏角色的血量变化量”的间接属性数据。
又如,根据本公开实施例的在线虚拟解说设备也可以应用于体育赛事虚拟解说的场景中。在这种情况下,正在播放的视频为正在进行中的体育赛事。与在线游戏不同,体育赛事不需要接收用户的输入,即每帧不存在输入的游戏指令,可以将其看作自动播放的视频。并且,针对每一帧图像,仅基于帧图像,获取图像内的元素的属性数据。例如,在正在进行中的体育赛事为篮球比赛的情况下,直接属性数据可以包括球员在一帧图像中的坐标位置。通过将该球员的坐标位置与篮球场地各区域的坐标位置进行比较,可以得到“该球员在前场”的间接属性数据。
可以看出,对于所述属性获取单元1101基于正在播放的视频的帧图像而获取的属性数据,不论是直接获取的直接属性数据,还是基于直接获取的直接属性数据进一步分析得到的间接属性数据,都是用于描述单个维度的基础信息的数据。
特征提取单元1102用于基于所述属性数据,提取用于表示帧图像中与解说相关的综合信息的特征数据。如上文中所述,基于帧图像获得的直接属性数据和间接属性数据可以复 用于提取不同类型的特征数据。也就是说,当特征提取单元1102提取不同特征数据时,所使用的属性数据中可能有部分是相同的。
作为一种可能的实施方式,特征提取单元1102进一步被配置为:针对预先建立的解说特征库中包括的每一个特征,从所述属性数据中选择与该特征相关联的属性数据。例如,针对解说特征库中包括的特征“是否在草里”,与该特征相关联的属性数据为游戏角色的坐标和草的坐标。然后,通过对与该特征相关联的属性数据执行分析处理,确定该特征的取值作为帧图像的特征数据。例如,通过将游戏角色的坐标和草的坐标进行比较,确定特征“是否在草里”的取值为“在草里”或“不在草里”。
下面,将具体描述解说特征库的构建方法。
正向的解说流程为从属性数据到特征数据,然后从特征数据到解说事件。然而,解说特征库是基于反向的解说流程而建立的,即:解说特征库是针对解说过程涉及到的解说事件进行事件到特征、特征到属性的拆解,通过应用先验标注、挖掘的方式逐步建立和完善。
根据本公开实施例的在线虚拟解说设备可以进一步包括解说特征库构建单元(图中未示出),被配置为通过执行以下处理来建立所述解说特征库:基于作为标准的参考解说视频,提取参考解说文本;基于所述参考解说文本,确定参考解说事件;基于所述参考解说事件,确定用于表征与解说相关的综合信息的参考特征数据;基于所述参考特征数据,建立解说特征库。
这里,作为标准的参考解说视频可以是已经人工完成解说的视频,如游戏视频或体育赛事。也就是说,在该参考解说视频中,已经包括了输出的解说内容。期望基于已有的参考解说视频,反向推导出解说过程具体关注哪些特征。通常,在实践中,将选取多个作为标准的参考解说视频。选取的参考解说视频的数量越多,则建立的解说特征库中包括的特征将越完备。
并且,建立的解说特征库是与待解说的视频的具体类型一一对应的。例如,用于在线游戏的解说特征库与用于体育赛事的解说特征库必然是不同的。另外,同样在在线游戏虚拟解说的应用场景下,不同类型的在线游戏所对应的解说特征库也是不同的。
具体来说,首先,解说特征库构建单元可以从参考解说视频中提取出解说音频,然后将提取出的解说音频转换为参考解说文本。例如,提取出的参考解说文本为“这沈梦溪丢了一个混合炸弹伤害真心高,佩服佩服”。
然后,解说特征库构建单元基于所述参考解说文本,确定参考解说事件。沿用上文中的示例,基于“这沈梦溪丢了一个混合炸弹伤害真心高,佩服佩服”的参考解说文本,可以确定参考解说事件为“沈梦溪丢炸弹”和“炸弹伤害高”。
接下来,解说特征库构建单元基于所述参考解说事件,确定用于表征与解说相关的综合信息的参考特征数据。例如,基于“沈梦溪丢炸弹”的参考解说事件,可以确定“游戏角色的名字”和“游戏角色的动作”的参考特征数据。基于“炸弹伤害高”的参考解说事件,可以确定“对其他游戏角色的伤害输出”的参考特征数据。
最后,解说特征库构建单元基于所述参考特征数据,建立解说特征库。
可以采用人工标注和自动标注相结合的方式来从参考解说文本中确定参考解说事件并从参考解说事件中确定参考特征数据。具体来说,首先,通过人工标注的方式对一批参考解说视频进行事件的提取和特征的标注,以初步地建立所述解说特征库。然后,更换新的一批参考解说视频,通过自动标注的方式对这批新的参考解说视频进行事件的提取和特 征的标注。在自动标注之后,再通过人工检查的方式进行纠错和补充,从而进一步扩充和完善所述解说特征库。按照这样人工和自动交替进行的方式,不断地更换新的参考解说视频。随着处理的进行,将发现人工纠错和补充的部分将越来越少。当不再存在人工纠错和补充的部分时,这意味着解说特征库已经完备,且可以用于在线虚拟解说的特征提取中。
由于解说特征库包括海量的特征数据,因此为了便于检索和管理,将其划分为不同的类别。可以看出,各个类别的特征数据都是基于属性数据计算而来的,这是特征提取的关键所在。
另外,解说特征库包括哪些特征数据以及包括的特征数据的数量是需要权衡的。如果解说特征库包括尽可能多的类别和数量的特征数据,那么将不会出现遗漏发生的事件的问题,但是每帧事件生成的计算量将大大提高。如果解说特征库仅包括少量的关键特征数据,并且基于这些关键特征数据所生成的事件将被大概率地选择作为解说事件,那么这将大大地减少每帧事件生成的计算量并提升实时解说性能。当然,选择哪些关键特征数据需要根据特定的解说场合,并依靠多次实验来确定。
事件生成单元1103用于基于所述特征数据,生成候选事件。候选事件是指在视频的播放过程中发生的事件。当然,这里生成的候选事件并不一定是在视频的播放过程中发生的所有事件。生成哪些候选事件取决于已经提取了哪些特征数据。另外,生成的候选事件也并不一定都要解说出来。
作为一种可能的实施方式,所述事件生成单元1103进一步被配置为:加载预先定义的所有事件对应的条件。一个事件可以对应于一个或多个条件。举例而言,“英雄被野怪杀死”的事件可以对应于两个条件:C1.英雄死亡;C2.野怪杀的。当基于所述特征数据,确定与一个事件对应的条件满足时,所述事件生成单元1103生成该事件作为一个候选事件。例如,如果提取的特征数据的取值指示英雄状态为存活或者击杀者类别为其他英雄,换言之,没有同时满足条件C1和条件C2,那么所述事件生成单元1103将不会生成“英雄被野怪杀死”的事件。如果提取的特征数据的取值指示英雄状态为被击杀且击杀者类别为野怪,换言之,同时满足条件C1和条件C2,那么所述事件生成单元1103将生成“英雄被野怪杀死”的事件,并将该事件作为一个候选事件。
按照这样的方式,所述事件生成单元1103遍历预先定义的所有事件对应的条件,并生成对应条件都满足的事件。
当一个事件的实现复杂度较高时,该事件所对应的条件和特征数据的数量将增多。因此,作为一种可能的实施方式,从实现复杂度角度上,将候选事件区分为基础事件和高级事件。前面所列举的“英雄被野怪杀死”的事件属于基础事件。如前所述,所述事件生成单元1103可以基于特征数据而生成基础事件。高级事件是相对于基础事件而定义的。所述事件生成单元1103可以基于基础事件而生成高级事件或者可以基于基础事件和特征数据而生成高级事件。
由于高级事件可以由基础事件合成,因此对于那些特征非常复杂、条件层级非常多的事件,它们可以很轻易地由基础事件组合而成,不但可以避免拆分成非常多的游戏特征的问题,也提升了所有事件的复用率。
以高级事件H1:辅助蹲暴君后面的草丛帮打野卡视野打龙为例进行描述。对于这个高级事件,直接拆分成的游戏特征包括:F1.英雄职业-辅助,F2.暴君状态-被攻击,F3.暴君攻击者职业-打野,F4.辅助英雄位置-暴君后草丛。由于之前已经定义并提取到了基础事件: E1.辅助蹲暴君后草丛,E2.打野英雄在打暴君,因此所述事件生成单元1103可以直接基于基础事件E1+E2组合形成高级事件H1,而不用定义冗长的F1、F2、F3、F4组合,并执行多次条件判断。
在上文中,从实现复杂度角度上,将事件区分为基础事件和高级事件。当然,还可以存在其他的用于区分事件的方式。例如,从事件特点角度上,还可以将候选事件区分为团战事件和非团战事件。具体来说,所述团战事件为当前帧之前的预定时间段内多个游戏角色共同参与的候选事件的集合,所述非团战事件为所述团战事件以外的事件。团战事件和非团战事件可以是基础事件,也可以是高级事件。另外,从事件的时间跨度角度上,可以将事件区分为单帧事件和多帧事件。团战是基于游戏内一段时间进行的,游戏数据以帧的形式存在,且一个游戏时间段包含多个游戏帧,因此团战事件可以看作是多帧事件。
针对每一个游戏帧,分别由上文中所述的属性获取单元1101、特征提取单元1102和事件生成单元1103执行属性获取、特征提取和事件生成的流程。并且,生成的事件都将作为候选事件,提供给后续的选择单元1104。
选择单元1104用于从生成的候选事件中,选择解说事件。也就是说,仅有一部分候选事件将被选择为解说事件,并被解说出来。
作为一种可能的实施方式,所述选择单元1104进一步被配置为:基于所述视频的播放状态,选择解说事件。例如,为了增强解说的现场效果,事件生成单元1103生成的候选事件包括运营事件,用于补充当前正在播放的视频的介绍(例如,游戏介绍或体育赛事介绍)、新闻等解说开场白事件。这类事件可以在与解说无关的时间段(如,游戏加载中、游戏暂停、体育赛事中场休息等)播出。因此,当所述视频的播放状态为游戏暂停或体育赛事中场休息时,所述选择单元1104从候选事件中选择运营事件作为解说事件(运营选取)。
除了上文中所述的事件选取依据之外,还可以按照如下的方式选择解说事件。具体来说,所述选择单元1104进一步被配置为:依据预定的规则,确定生成的候选事件的重要性程度;选取重要性程度最高的候选事件作为解说事件。例如,在在线体育赛事虚拟解说的应用场景下,当体育事件的发生位置位于画面中央区域时,认为该候选事件的重要性程度高。或者,当体育事件引起分数变化时,认为该候选事件的重要性程度高。
此外,在在线游戏虚拟解说的应用场景下,可以按照如下的方式来选择解说事件。如上文中所述,所述候选事件包括团战事件和非团战事件。所述选择单元1104可以进一步被配置为执行以下处理。
当在所述预定时间段内存在多个团战事件时,例如,基于参与的游戏角色的位置,在所述预定时间段内存在位于区域A的团战、位于区域B的团战和位于区域C的团战,依据预定的规则,确定各团战事件的重要性程度。然后,所述选择单元1104选取重要性程度最高的团战事件(团战选取)。例如,带来最大数量的击杀数的团战事件的重要性程度高。或者,施放技能次数最多的团战事件的重要性高。或者,出现在解说视野内的团战事件的重要性高。
接下来,所述选择单元1104依据预定的规则,对选中的团战事件中包括的所有候选事件,确定各候选事件的重要性程度;并选取重要性程度最高的候选事件作为解说事件(团内选取)。例如,可以根据事件的静态权重(例如,游戏角色是否位于C位)和事件的动态权重(如,事件是否出现在解说视野内、团战结果、技能效果等)进行打分加权,并选取分数最高的团内事件作为解说事件。
此外,在不存在团战事件的情况下,所述选择单元1104可以直接从单帧事件中选取静态权重最高的事件作为解说事件(团外选取)。
当然,预定规则并不仅限于以上所列举的示例。任何其他可能的规则也应该包括在本公开的范围内。
文本生成单元1105用于基于所选择的解说事件,确定对应的解说文本。也就是说,在确定了需要进行解说的事件之后,需要生成对应事件的解说词。
具体来说,文本生成单元1105可以进一步被配置为:基于预先建立的解说文本库,确定与所选择的解说事件对应的解说模板。解说文本库本质上是一个(事件,解说模板)集合,并且包括所有解说词干内容以及每个解说模板对应的解说事件。例如,解说模板【M1.<英雄名字>先不要着急,吃口药先保存一下实力】对应解说事件【E1.英雄使用回血技能】。然后,文本生成单元1105基于与所述解说事件对应的属性数据,替换所述解说模板中的模板字段,并生成所述解说文本。例如,文本生成单元1105将由上文中描述的属性获取单元获取的属性数据,英雄名字-阿轲,动态替换进去,以得到所述解说文本。
输出单元1106用于基于所述解说文本,输出对应的解说内容。这里的解说内容可以是文字,可以是音频,也可以是视频。
具体来说,所述输出单元进一步被配置为执行以下处理中的一个或多个:输出所述解说文本(字幕解说);输出用于播报所述解说文本的语音(音频解说);显示虚拟形象,并输出与所述虚拟形象相配合的语音来播报所述解说文本(虚拟主持人视频解说);以及显示虚拟形象,并通过所述虚拟形象的动作来播报所述解说文本(虚拟主持人视频解说),如手语播报。
通过根据本公开实施例的在线虚拟解说设备,能够解决实时在线解说的问题。对于当前正在进行中的视频(如,在线游戏或体育赛事),可以实时同步地输出解说内容,甚至可以通过虚拟解说主持人播报解说语音,并搭配相应的表情、动作等拟人效果。
此外,根据本公开的实施例,提供了一种在线虚拟解说设备,包括:一个或多个处理器以及一个或多个存储器,其中所述一个或多个存储器中存储有计算机程序,当所述计算机程序由所述一个或多个处理器执行时,使得所述设备执行执行上文中所述的在线虚拟解说方法。具体地,根据本公开实施例的方法或设备也可以借助于图12所示的计算设备1200的架构来实现。如图12所示,计算设备1200可以包括总线1210、一个或多个CPU1220、只读存储器(ROM)1230、随机存取存储器(RAM)1240、连接到网络的通信端口1250、输入/输出组件1260、硬盘1270等。计算设备1200中的存储设备,例如ROM 1230或硬盘1270可以存储本公开实施例提供的图像处理方法的处理和/或通信使用的各种数据或文件以及CPU所执行的程序指令。当然,图12所示的架构只是示例性的,在实现不同的设备时,根据实际需要,可以省略图12示出的计算设备中的一个或多个组件。
本公开的实施例也可以被实现为计算机可读存储介质。根据本公开实施例的计算机可读存储介质上存储有计算机可读指令(即计算机程序)。当所述计算机可读指令由计算设备的处理器运行时,使得该计算设备可以执行参照以上附图描述的根据本公开实施例的在线虚拟解说方法。所述计算机可读存储介质包括但不限于例如易失性存储器和/或非易失性存储器。所述易失性存储器例如可以包括随机存取存储器(RAM)和/或高速缓冲存储器(cache)等。所述非易失性存储器例如可以包括只读存储器(ROM)、硬盘、闪存等。
迄今为止,已经参照图1B到图12描述了根据本公开实施例的在线虚拟解说方法和设 备。通过根据本公开实施例的在线虚拟解说方法和设备,能够解决实时在线解说的问题。对于当前正在进行中的视频(如,在线游戏或体育赛事),可以实时同步地输出解说内容,甚至可以通过虚拟解说主持人播报解说语音,并搭配相应的表情、动作等拟人效果。
在本说明书中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括……”限定的要素,并不排除在包括所述要素的过程、方法、物品或者设备中还存在另外的相同要素。
最后,还需要说明的是,上述一系列处理不仅包括以这里所述的顺序按时间序列执行的处理,而且包括并行或分别地、而不是按时间顺序执行的处理。
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到本公开实施例可借助软件加必需的硬件平台的方式来实现,当然也可以全部通过软件来实施。基于这样的理解,本公开实施例的技术方案对背景技术做出贡献的全部或者部分可以以软件产品的形式体现出来,该计算机软件产品可以存储在存储介质中,如ROM/RAM、磁碟、光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本公开各个实施例或者实施例的某些部分所述的方法。
以上对本公开实施例进行了详细介绍,本文中应用了具体个例对本公开的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本公开的方法及其核心思想;同时,对于本领域的一般技术人员,依据本公开的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本公开的限制。

Claims (17)

  1. An online virtual commentary method, executed by a computing device, comprising:
    obtaining, at least based on a frame image of a video being played, attribute data describing basic information of elements in the frame image;
    extracting, based on the attribute data, feature data representing comprehensive commentary-related information in the frame image;
    generating candidate events based on the feature data;
    selecting a commentary event from the generated candidate events;
    determining corresponding commentary text based on the selected commentary event; and
    outputting corresponding commentary content based on the commentary text.
  2. The method according to claim 1, wherein the obtaining, at least based on a frame image of a video being played, attribute data describing basic information of elements in the frame image further comprises:
    obtaining, based on the frame image of the video being played and a frame instruction input for the frame image, the attribute data describing the basic information of the elements in the frame image.
  3. The method according to claim 1, wherein the attribute data comprises direct attribute data and indirect attribute data, and
    the obtaining, at least based on a frame image of a video being played, attribute data describing basic information of elements in the frame image further comprises:
    obtaining direct attribute data, in the current frame, of elements in the current frame image; and
    determining indirect attribute data by performing analysis processing on the direct attribute data, in the current frame, of the elements in the current frame image.
  4. The method according to claim 1, wherein the attribute data comprises direct attribute data and indirect attribute data, and
    the obtaining, at least based on a frame image of a video being played, attribute data describing basic information of elements in the frame image further comprises:
    obtaining direct attribute data, in the current frame and a previous frame, of elements in the current frame image; and
    determining indirect attribute data by performing analysis processing on the direct attribute data, in the current frame and the previous frame, of the elements in the current frame image.
  5. The method according to claim 1, wherein the extracting, based on the attribute data, feature data representing comprehensive commentary-related information in the frame image further comprises:
    for each feature included in a pre-established commentary feature library, selecting, from the attribute data, attribute data associated with the feature; and
    determining a value of the feature as feature data of the frame image by performing analysis processing on the attribute data associated with the feature.
  6. The method according to claim 5, wherein the commentary feature library is established by:
    extracting reference commentary text based on a reference commentary video serving as a standard;
    determining a reference commentary event based on the reference commentary text;
    determining, based on the reference commentary event, reference feature data representing comprehensive commentary-related information; and
    establishing the commentary feature library based on the reference feature data.
  7. The method according to claim 1, wherein the generating candidate events based on the feature data further comprises:
    loading conditions corresponding to all predefined events; and
    when it is determined, based on the feature data, that the conditions corresponding to an event are satisfied, generating the event as a candidate event.
  8. The method according to claim 1, wherein the candidate events comprise basic events and advanced events, the basic events being generated based on the feature data, and the advanced events being generated based on the basic events, or based on the basic events and the feature data.
  9. The method according to claim 1, wherein the selecting a commentary event from the generated candidate events further comprises:
    selecting the commentary event based on a playing state of the video.
  10. The method according to claim 1, wherein the selecting a commentary event from the generated candidate events further comprises:
    determining degrees of importance of the generated candidate events according to predetermined rules; and
    selecting the candidate event with the highest degree of importance as the commentary event.
  11. The method according to claim 1, wherein the video being played is an online game, the candidate events comprise team battle events and non-team battle events, a team battle event being a set of candidate events in which a plurality of game characters jointly participate within a predetermined time period before the current frame, and a non-team battle event being an event other than the team battle events.
  12. The method according to claim 11, wherein the selecting a commentary event from the generated candidate events further comprises:
    when a plurality of team battle events exist within the predetermined time period, determining a degree of importance of each team battle event according to predetermined rules;
    selecting the team battle event with the highest degree of importance;
    determining, according to predetermined rules, a degree of importance of each candidate event included in the selected team battle event; and
    selecting the candidate event with the highest degree of importance as the commentary event.
  13. The method according to claim 1, wherein the determining corresponding commentary text based on the selected commentary event further comprises:
    determining, based on a pre-established commentary text library, a commentary template corresponding to the selected commentary event; and
    replacing a template field in the commentary template based on attribute data corresponding to the commentary event, and generating the commentary text.
  14. The method according to claim 1, wherein the outputting corresponding commentary content based on the commentary text further comprises one or more of the following: outputting the commentary text; outputting a voice for broadcasting the commentary text; displaying an avatar and outputting a voice matching the avatar to broadcast the commentary text; and displaying an avatar and broadcasting the commentary text through actions of the avatar.
  15. An online virtual commentary device, comprising:
    an attribute obtaining unit, configured to obtain, based on a frame image of a video being played, attribute data describing basic information of elements in the frame image;
    a feature extraction unit, configured to extract, based on the attribute data, feature data representing comprehensive commentary-related information in the frame image;
    an event generating unit, configured to generate candidate events based on the feature data;
    a selection unit, configured to select a commentary event from the generated candidate events;
    a text generating unit, configured to determine corresponding commentary text based on the selected commentary event; and
    an output unit, configured to output corresponding commentary content based on the commentary text.
  16. An online virtual commentary device, comprising one or more processors and one or more memories, wherein the one or more memories store a computer program which, when executed by the one or more processors, causes the device to perform the method according to any one of claims 1 to 14.
  17. A computer-readable storage medium, storing a computer program which, when executed by a processor of a computing device, causes the computing device to perform the method according to any one of claims 1 to 14.
PCT/CN2020/128018 2020-02-07 2020-11-11 在线虚拟解说方法、设备和介质 WO2021155692A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/580,553 US11833433B2 (en) 2020-02-07 2022-01-20 Online virtual narration method and device, and medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010082914.2A CN111290724B (zh) 2020-02-07 2020-02-07 在线虚拟解说方法、设备和介质
CN202010082914.2 2020-02-07

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/580,553 Continuation US11833433B2 (en) 2020-02-07 2022-01-20 Online virtual narration method and device, and medium

Publications (1)

Publication Number Publication Date
WO2021155692A1 true WO2021155692A1 (zh) 2021-08-12

Family

ID=71026705

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/128018 WO2021155692A1 (zh) 2020-02-07 2020-11-11 在线虚拟解说方法、设备和介质

Country Status (3)

Country Link
US (1) US11833433B2 (zh)
CN (1) CN111290724B (zh)
WO (1) WO2021155692A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111290724B (zh) * 2020-02-07 2021-07-30 腾讯科技(深圳)有限公司 在线虚拟解说方法、设备和介质
CN111953910B (zh) * 2020-08-11 2024-05-14 腾讯科技(深圳)有限公司 基于人工智能的视频处理方法、装置及电子设备
CN114697685B (zh) * 2020-12-25 2023-05-23 腾讯科技(深圳)有限公司 解说视频生成方法、装置、服务器及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030139209A1 (en) * 2002-01-18 2003-07-24 Konami Corporation Game apparatus and storage medium for carrying program therefore
CN1759909A (zh) * 2004-09-15 2006-04-19 微软公司 在线游戏观众系统
CN107362538A (zh) * 2017-07-05 2017-11-21 腾讯科技(深圳)有限公司 一种游戏辅助信息展示方法、装置以及客户端
CN108549486A (zh) * 2018-04-11 2018-09-18 腾讯科技(深圳)有限公司 虚拟场景中实现解说的方法及装置
CN111290724A (zh) * 2020-02-07 2020-06-16 腾讯科技(深圳)有限公司 在线虚拟解说方法、设备和介质

Family Cites Families (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6976031B1 (en) * 1999-12-06 2005-12-13 Sportspilot, Inc. System and method for automatically generating a narrative report of an event, such as a sporting event
US9756349B2 (en) * 2002-12-10 2017-09-05 Sony Interactive Entertainment America Llc User interface, system and method for controlling a video stream
US8016664B2 (en) * 2005-04-13 2011-09-13 Take Two Interactive Software, Inc. Systems and methods for simulating a particular user in an interactive computer system
US20070233680A1 (en) * 2006-03-31 2007-10-04 Microsoft Corporation Auto-generating reports based on metadata
JP4462339B2 (ja) * 2007-12-07 2010-05-12 ソニー株式会社 情報処理装置、および情報処理方法、並びにコンピュータ・プログラム
US8688434B1 (en) * 2010-05-13 2014-04-01 Narrative Science Inc. System and method for using data to automatically generate a narrative story
US9396385B2 (en) * 2010-08-26 2016-07-19 Blast Motion Inc. Integrated sensor and video motion analysis method
US8892417B1 (en) * 2011-01-07 2014-11-18 Narrative Science, Inc. Method and apparatus for triggering the automatic generation of narratives
US9026446B2 (en) * 2011-06-10 2015-05-05 Morgan Fiumi System for generating captions for live video broadcasts
US8821271B2 (en) * 2012-07-30 2014-09-02 Cbs Interactive, Inc. Techniques for providing narrative content for competitive gaming events
US20140279731A1 (en) * 2013-03-13 2014-09-18 Ivan Bezdomny Inc. System and Method for Automated Text Coverage of a Live Event Using Structured and Unstructured Data Sources
CN103927445B (zh) * 2014-04-16 2017-06-20 北京酷云互动科技有限公司 一种特征事件生成方法和装置
US20170228600A1 (en) * 2014-11-14 2017-08-10 Clipmine, Inc. Analysis of video game videos for information extraction, content labeling, smart video editing/creation and highlights generation
US20160317933A1 (en) * 2015-05-01 2016-11-03 Lucidlogix Technologies Ltd. Automatic game support content generation and retrieval
US10300394B1 (en) * 2015-06-05 2019-05-28 Amazon Technologies, Inc. Spectator audio analysis in online gaming environments
US9578351B1 (en) * 2015-08-28 2017-02-21 Accenture Global Services Limited Generating visualizations for display along with video content
US10569180B2 (en) * 2015-11-06 2020-02-25 Sportal Systems, LLC Visually representing virtual fantasy sports contests
US11305198B2 (en) * 2015-11-06 2022-04-19 Sportal Systems, LLC Visually representing virtual fantasy sports contests II
US10223449B2 (en) * 2016-03-15 2019-03-05 Microsoft Technology Licensing, Llc Contextual search for gaming video
CN107707931B (zh) * 2016-08-08 2021-09-10 阿里巴巴集团控股有限公司 根据视频数据生成解释数据、数据合成方法及装置、电子设备
CN107423274B (zh) * 2017-06-07 2020-11-20 北京百度网讯科技有限公司 基于人工智能的比赛解说内容生成方法、装置及存储介质
CN109145733A (zh) * 2018-07-17 2019-01-04 焦点科技股份有限公司 一种篮球比赛的人工智能解说方法及系统
US11087584B2 (en) * 2019-10-10 2021-08-10 Igt Gaming systems and methods for alternating the presentation of live events

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030139209A1 (en) * 2002-01-18 2003-07-24 Konami Corporation Game apparatus and storage medium for carrying program therefore
CN1759909A (zh) * 2004-09-15 2006-04-19 微软公司 在线游戏观众系统
CN107362538A (zh) * 2017-07-05 2017-11-21 腾讯科技(深圳)有限公司 一种游戏辅助信息展示方法、装置以及客户端
CN108549486A (zh) * 2018-04-11 2018-09-18 腾讯科技(深圳)有限公司 虚拟场景中实现解说的方法及装置
CN111290724A (zh) * 2020-02-07 2020-06-16 腾讯科技(深圳)有限公司 在线虚拟解说方法、设备和介质

Also Published As

Publication number Publication date
US20220143510A1 (en) 2022-05-12
CN111290724A (zh) 2020-06-16
US11833433B2 (en) 2023-12-05
CN111290724B (zh) 2021-07-30

Similar Documents

Publication Publication Date Title
WO2021155692A1 (zh) 在线虚拟解说方法、设备和介质
US11765439B2 (en) Intelligent commentary generation and playing methods, apparatuses, and devices, and computer storage medium
JP7253570B2 (ja) 遠隔ユーザ入力に基づく文脈インゲーム要素認識、注釈及び対話
US20210170281A1 (en) System and Method for Replaying Video Game Streams
US10300394B1 (en) Spectator audio analysis in online gaming environments
CN111953910A (zh) 基于人工智能的视频处理方法、装置及电子设备
CN110841287B (zh) 视频处理方法、装置、计算机可读存储介质和计算机设备
CN115348458A (zh) 虚拟直播控制方法以及系统
Salge et al. Generative design in minecraft: Chronicle challenge
CN112827172A (zh) 拍摄方法、装置、电子设备及存储介质
EP4347070A1 (en) Simulating crowd noise for live events through emotional analysis of distributed inputs
US20010006909A1 (en) Speech generating device and method in game device, and medium for same
CN112423093A (zh) 游戏视频生成方法、装置、服务器和存储介质
Mangiron Found in translation: Evolving approaches for the localization of Japanese video games. Arts, 10 (1), 9
CN114297354B (zh) 一种弹幕生成方法及装置、存储介质、电子装置
WO2022198971A1 (zh) 一种虚拟角色的动作切换方法、装置及存储介质
US11704980B2 (en) Method, apparatus, and computer storage medium for outputting virtual application object
CN110990550B (zh) 一种话术生成的方法、基于人工智能的解说方法及装置
US11778279B2 (en) Social media crowd-sourced discussions
CN113509724A (zh) 记录介质、信息处理装置及方法
US20230018621A1 (en) Commentary video generation method and apparatus, server, and storage medium
CN114697741A (zh) 多媒体信息的播放控制方法及相关设备
Cesário et al. Design recommendations for improving immersion in role-playing video games. A focus on storytelling and localisation
CN112231220B (zh) 一种游戏测试方法和装置
KR102343359B1 (ko) 친구 감정 표정을 이용한 게임의 에너지 충전 장치 및 방법

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20917336

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20917336

Country of ref document: EP

Kind code of ref document: A1