CN116506674B - Target event triggering method and device based on virtual object - Google Patents
- Publication number
- CN116506674B (application CN202310795861.2A)
- Authority
- CN
- China
- Prior art keywords
- features
- trigger
- driving
- virtual object
- voice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/4302—Content synchronisation processes, e.g. decoder synchronisation
- H04N21/4307—Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/81—Monomedia components thereof
- H04N21/8146—Monomedia components thereof involving graphical data, e.g. 3D object, 2D graphics
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
- H04N21/8455—Structuring of content, e.g. decomposing content into time segments involving pointers to the content, e.g. pointers to the I-frames of the video stream
Abstract
The application provides a target event triggering method and device based on a virtual object, wherein the method comprises the following steps: acquiring text content to be played of a virtual object, extracting keywords associated with a target event from the text content, and setting a first trigger mark at the position of the keywords; converting the text content with the first trigger mark into voice data to be played, and extracting a plurality of voice features from the voice data to be played, part of which carry a second trigger mark corresponding to the first trigger mark; acquiring driving data for driving the virtual object, extracting a plurality of driving features from the driving data, and mapping the plurality of voice features and the second trigger mark onto the plurality of driving features, part of which then carry a third trigger mark; and driving the virtual object based on the mapped plurality of driving features, and triggering the target event when the third trigger mark is identified. The application solves the technical problem that the picture content at the interaction node is inconsistent with the live data stream during a virtual-object live broadcast.
Description
Technical Field
The application relates to the technical field of data processing, in particular to a target event triggering method and device based on a virtual object.
Background
With the progress of computer technology and internet technology, virtual objects provide a variety of functions and services in fields such as daily life and entertainment. For example, real-time commentary by a virtual object in a digital live-streaming room is one such application. However, during live commentary the virtual object needs to trigger certain business-related instructions in specific contexts. For example, when the virtual object mentions a product during its explanation, the user interface should pop up a purchase link or a related recommendation for that product. Therefore, how to accurately trigger the corresponding business behavior during the virtual object's explanation, so that the explanation and the trigger instruction remain synchronized, has become an urgent technical problem.
In some related art, when a virtual object performs automatic explanation and a node that requires triggering an interaction in the live-streaming room is reached, a live operator manually triggers the corresponding interaction in the background according to the live content or at a predetermined time. However, because manual triggering is error-prone, the interaction node may not coincide with the live stream content, and the trigger may be premature or delayed; human error may even prevent the instruction from being triggered at all.
In other related art, a timed task executes a script at a predetermined time to issue the trigger instruction corresponding to the interaction behavior in the live-streaming room. However, the time at which a given picture appears in the virtual object's live stream is difficult to estimate accurately, and the content of the virtual object's live script may be adjusted at any time, which makes accurate estimation even harder. A scheme that relies on timed tasks to trigger an instruction at a specified picture can therefore cause the picture and the instruction trigger timing to be inconsistent.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiment of the application provides a target event triggering method and device based on a virtual object, which at least solve the prior-art technical problem that, because the trigger instruction for a target event carries a timing error, the picture content at an interaction node is inconsistent with the live data stream during a virtual-object live broadcast.
According to an aspect of an embodiment of the present application, there is provided a target event triggering method based on a virtual object, including: acquiring text content to be played of a virtual object, extracting keywords associated with a target event from the text content, and setting a first trigger mark at the position of the keywords; converting the text content provided with the first trigger mark into voice data to be played, and extracting a plurality of voice features from the voice data to be played, wherein part of voice features in the voice features carry a second trigger mark corresponding to the first trigger mark; obtaining driving data for driving the virtual object, extracting a plurality of driving features from the driving data, and mapping the voice features and the second trigger marks onto the driving features, wherein part of the driving features after mapping carry third trigger marks corresponding to the second trigger marks; the virtual object is driven based on the mapped plurality of driving features, and the target event is triggered when the third trigger tag is identified.
According to another aspect of the embodiment of the present application, there is further provided a target event triggering device based on a virtual object, including an acquisition module configured to acquire text content to be played of the virtual object, extract a keyword associated with a target event from the text content, and set a first trigger mark at a position of the keyword; the voice conversion module is configured to convert the text content provided with the first trigger mark into voice data to be played, and extract a plurality of voice features from the voice data to be played, wherein part of the voice features in the plurality of voice features carry a second trigger mark corresponding to the first trigger mark; the mapping module is configured to acquire driving data for driving the virtual object, extract a plurality of driving features from the driving data, and map the plurality of voice features and the second trigger marks onto the plurality of driving features, wherein part of driving features in the mapped plurality of driving features carry third trigger marks corresponding to the second trigger marks; a triggering module configured to drive the virtual object based on the mapped plurality of drive features and trigger the target event upon recognition of the third trigger tag.
In the embodiment of the application, text content to be played of a virtual object is obtained, keywords associated with a target event are extracted from the text content, and a first trigger mark is set at the position of the keywords; the text content with the first trigger mark is converted into voice data to be played, and a plurality of voice features are extracted from the voice data to be played; driving data for driving the virtual object is acquired, a plurality of driving features are extracted from the driving data, and the plurality of voice features and the second trigger mark are mapped onto the plurality of driving features; the virtual object is driven based on the mapped plurality of driving features, and the target event is triggered when the third trigger mark is identified. This scheme solves the prior-art technical problem that, because the trigger instruction for a target event carries a timing error, the picture content at an interaction node is inconsistent with the live data stream during a virtual-object live broadcast.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application. In the drawings:
FIG. 1 is a flow chart of a virtual object based target event triggering method according to an embodiment of the present application;
FIG. 2 is a flow chart of another virtual object based target event triggering method according to an embodiment of the application;
FIG. 3 is a flow chart of a method of extracting a plurality of drive features according to an embodiment of the present application;
FIG. 4 is a flow chart of a method of mapping voice features and second trigger markers onto drive features according to an embodiment of the application;
FIG. 5 is a flow chart of a method of driving a virtual object and triggering a target event according to an embodiment of the application;
FIG. 6 is a schematic diagram of a virtual object-based target event triggering apparatus according to an embodiment of the present application;
FIG. 7 is a schematic diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
The relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present application unless it is specifically stated otherwise. Meanwhile, it should be understood that the sizes of the respective parts shown in the drawings are not drawn in actual scale for convenience of description. Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but should be considered part of the specification where appropriate. In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not a limitation. Thus, other examples of the exemplary embodiments may have different values. It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
Example 1
The embodiment of the application provides a flow chart of a target event triggering method based on a virtual object, as shown in fig. 1, comprising the following steps:
step S102, obtaining text content to be played of a virtual object, extracting keywords associated with a target event from the text content, and setting a first trigger mark at the position of the keywords.
The appearance position of the keyword in the text content is searched for using a character string matching method, and the first trigger mark is set at that appearance position, arranged immediately after the keyword.
In this embodiment, extracting keywords associated with a target event may help determine the specific actions or triggered events that the virtual object needs to perform. These keywords may be instructions, trigger conditions, or identifiers of specific content. In addition, by setting the first trigger mark at the position of the keyword, the portion related to the target event can be accurately marked. This facilitates further processing and driving in subsequent steps.
Step S104, converting the text content with the first trigger mark into voice data to be played, and extracting a plurality of voice features from the voice data to be played, wherein part of the voice features in the plurality of voice features carry a second trigger mark corresponding to the first trigger mark.
First, the text content containing the first trigger mark is passed to a speech synthesis system, which converts the text into corresponding speech data. This process may be accomplished using text-to-speech (TTS) technology, in which the text is converted into audio with natural speech. Then, the voice data to be played is analyzed and processed using voice signal processing technology to extract a plurality of voice features. Finally, the extracted voice features are organized into a feature vector matrix. The voice feature vector corresponding to the first trigger mark is located according to the position of the first trigger mark, and a second trigger mark is set on that vector. The second trigger mark may be a special flag value or flag bit indicating that the voice feature vector is associated with the first trigger mark.
In this embodiment, the extracted voice features are organized into a feature vector matrix, and the voice feature vectors corresponding to the first trigger marks are located according to the positions of the first trigger marks. A second trigger is placed on the vector to indicate the association of the speech feature vector with the first trigger. Such a flag setting may help identify and process specific events or information, providing more accurate positioning and reference for subsequent analysis and applications.
Step S106, obtaining driving data for driving the virtual object, extracting a plurality of driving features from the driving data, and mapping the voice features and the second trigger marks onto the driving features, wherein part of the driving features after mapping carry third trigger marks corresponding to the second trigger marks.
First, driving data is acquired, and a plurality of driving features are extracted from the driving data.
The drive data may be various forms of input data, such as sensor data, user input, and the like. A plurality of drive features are extracted from the drive data to capture useful information related to the drive behavior. The driving characteristics may include, but are not limited to, characteristics in terms of time, position, speed, direction, force, sound, image, etc.
The plurality of speech features and the second trigger flag are then mapped to the plurality of drive features.
Temporally aligning the plurality of speech features and the plurality of drive features by interpolating the plurality of speech features and the plurality of drive features; setting the third trigger mark corresponding to the time point where the second trigger mark is located on the aligned driving features.
For example, mapping the plurality of speech features and the plurality of drive features onto the same timeline; the plurality of speech features and the plurality of drive features are respectively interpolated on the time axis using a linear interpolation method to align the plurality of speech features and the plurality of drive features in time. Specifically, determining the positions of the plurality of voice features at a first time point on the time axis, respectively calculating linear weights between two adjacent voice features in the plurality of voice features according to the positions of the first time point, and performing interpolation operation based on the linear weights; determining a location of the plurality of drive features at a second point in time on the time axis, interpolating between two adjacent ones of the plurality of drive features such that the plurality of drive features are aligned in time with the plurality of speech features.
The present embodiment can align a plurality of voice features and a plurality of driving features on the time axis by interpolating them. This has the advantage that it is ensured that the speech features and the driving features correspond at the same point in time, thereby enabling a more accurate data analysis and processing. Furthermore, aligning the speech features and the drive features may improve data consistency. By interpolation and alignment, correlation and consistency between speech features and driving features can be ensured for better understanding and analysis of the data. And finally, setting a third trigger mark corresponding to the time point where the second trigger mark is positioned on the aligned driving feature. This facilitates marking or identifying particular events at particular points in time to better understand and utilize the data.
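By way of illustration only, the linear-interpolation alignment described above could be sketched as follows in Python; the function name, the array layouts, and the use of NumPy are assumptions of this example and not part of the claimed method.

```python
import numpy as np

def align_features(speech_t, speech_feats, drive_t, drive_feats):
    """Resample both feature sequences onto a shared time axis; np.interp
    performs the linear interpolation for each feature dimension."""
    common_t = np.union1d(speech_t, drive_t)      # merged, sorted time axis
    speech_aligned = np.stack(
        [np.interp(common_t, speech_t, speech_feats[:, d])
         for d in range(speech_feats.shape[1])], axis=1)
    drive_aligned = np.stack(
        [np.interp(common_t, drive_t, drive_feats[:, d])
         for d in range(drive_feats.shape[1])], axis=1)
    return common_t, speech_aligned, drive_aligned
```

Here speech_t and drive_t are assumed to be increasing arrays of time stamps, and the feature arrays are assumed to be shaped as (number of time points × feature dimensions).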
Step S108, driving the virtual object based on the mapped driving features, and triggering the target event when the third trigger flag is identified.
Identifying, for each of the mapped plurality of drive features, whether the drive feature has a corresponding third trigger flag; driving the virtual object according to the driving characteristic under the condition that the third trigger mark exists, enabling the virtual object to execute corresponding actions and triggering the target event; in the absence of the third trigger, the virtual object is driven directly in accordance with the driving feature. Mapping the driving characteristics to control parameters of the virtual object, for example, by a mapping function; a data stream of the virtual object is generated based on the control parameter and a trigger instruction that triggers the target event.
In this embodiment, based on the mapped multiple driving features, the virtual object can be driven to execute a corresponding action. By associating drive characteristics with control parameters of the virtual object, precise control and operation of the virtual object may be achieved. Further, by detecting whether a corresponding third trigger flag is present in the drive feature, it may be determined whether a particular event or operation needs to be triggered. This helps trigger the target event of the virtual object at the appropriate time so that the triggered target event matches the content of the live video stream.
Example 2
The embodiment of the application provides a flow chart of another target event triggering method based on a virtual object, as shown in fig. 2, the method comprises the following steps:
step S202, extracting keywords related to a target event from text content to be played, and setting a first trigger mark at the position of the keywords.
The text content to be played by the virtual object is acquired. For the target event, the keywords associated with it are determined. These keywords may be instructions, trigger conditions, or identifiers of specific content, and should be able to accurately represent the target event; examples are "link" or "red packet". Then, the appearance position of each keyword is searched for in the text content using a character string matching method, for example a string search algorithm such as the KMP algorithm, or a regular expression. A first trigger mark is placed at the position of the keyword so that subsequent steps can accurately identify the portion associated with the target event. After keyword extraction and first-trigger-mark setting are completed, the keyword locations in the text content are accurately marked, so that subsequent steps can perform further processing and driving based on these marks.
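A minimal sketch of this keyword search and first-trigger-mark insertion is given below; the `<trig:...>` marker syntax, the event identifier, and the keyword table are assumptions introduced purely for the example.

```python
import re

# Hypothetical mapping from a target event to its associated keywords
# (event name and keywords are assumptions for illustration).
TARGET_EVENT_KEYWORDS = {
    "show_purchase_link": ["link", "red packet"],
}

def insert_first_trigger_marks(text: str) -> str:
    """Find each keyword with a string/regex match and place a first
    trigger mark immediately after every occurrence."""
    for event_id, keywords in TARGET_EVENT_KEYWORDS.items():
        for kw in keywords:
            pattern = re.compile(re.escape(kw))
            text = pattern.sub(lambda m: m.group(0) + f"<trig:{event_id}>", text)
    return text

marked = insert_first_trigger_marks("Click the link to grab a red packet.")
# -> "Click the link<trig:show_purchase_link> to grab a red packet<trig:show_purchase_link>."
```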
In step S204, the text content with the first trigger flag set is converted into voice data to be played, and a plurality of voice features are extracted.
The acquired text content is passed to a speech synthesis system. The speech synthesis system uses text-to-speech (TTS) technology to convert the text into corresponding speech data; this process may be implemented by invoking an appropriate speech synthesis API or library. The speech synthesis system generates audio with natural speech from the input text content. Then, the voice data to be played is analyzed and processed using voice signal processing techniques to extract a plurality of voice features.
And finally, organizing the extracted voice features into a feature vector matrix. Each speech feature may be represented as a vector, which is arranged in time order to form a feature vector matrix. Such feature vector matrices will provide detailed information about the speech data, providing a basis for subsequent processing and driving steps. By executing the above steps, the text content on which the first trigger flag is set can be converted into a feature vector matrix carrying the second trigger flag corresponding to the first trigger flag.
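For illustration, this conversion and feature-extraction step could be sketched as follows, assuming MFCC vectors as the voice features. The `synthesize_speech` stand-in and the `<trig:...>` marker syntax are assumptions (a real deployment would call an actual TTS engine); `librosa.feature.mfcc` is used here only to build the feature vector matrix.

```python
import re
import numpy as np
import librosa

def synthesize_speech(text: str, sr: int = 16000) -> np.ndarray:
    # Stand-in for a real TTS engine (an assumption of this sketch):
    # returns a silent waveform whose length grows with the text length.
    return np.zeros(int(0.08 * max(len(text), 1) * sr), dtype=np.float32)

def extract_speech_features(text_with_marks: str, sr: int = 16000):
    """Convert marked text to speech, build a (frames x coefficients) feature
    matrix, and flag the frames that carry the second trigger mark."""
    positions, parts, cursor, clean_len = [], [], 0, 0
    for m in re.finditer(r"<trig:[^>]+>", text_with_marks):
        parts.append(text_with_marks[cursor:m.start()])
        clean_len += m.start() - cursor
        positions.append(clean_len)          # index of the mark in the clean text
        cursor = m.end()
    parts.append(text_with_marks[cursor:])
    clean = "".join(parts)

    audio = synthesize_speech(clean, sr)
    feats = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13).T   # feature vector matrix
    second_marks = np.zeros(len(feats), dtype=bool)
    for pos in positions:
        frame = min(int(pos / max(len(clean), 1) * len(feats)), len(feats) - 1)
        second_marks[frame] = True           # second trigger mark on this feature vector
    return feats, second_marks
```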
Step S206, obtaining driving data for driving the virtual object, and extracting a plurality of driving features therefrom.
As shown in fig. 3, the method of extracting a plurality of driving features may include the steps of:
in step S2062, drive data for driving the virtual object is acquired.
Drive data for driving the virtual object is acquired, wherein the drive data may be various forms of input data, such as sensor data, user input, etc.
Step S2064 extracts a plurality of drive features from the acquired drive data.
A driving feature is useful information related to driving behavior that is used to control the behavior of the virtual object. Driving features include, but are not limited to: time, indicating the current time stamp or time period, used to control the virtual object to perform different actions at different points in time; position, indicating the position information of the virtual object, which may be two-dimensional or three-dimensional coordinates, used to control the movement of the virtual object in space; speed, indicating the movement speed of the virtual object, used to control how fast the virtual object moves; direction, indicating the orientation or movement direction of the virtual object, used to control its heading or path; strength, indicating the force or intensity of the virtual object, used to control the force or amplitude of its actions; sound, indicating sound characteristics such as volume and tone, which may be used to control the acoustic presentation of the virtual object; and image, indicating visual characteristics of the virtual object such as color and shape, which may be used to control its appearance.
In some embodiments, statistical analysis may be performed on continuous features of time, location, speed, direction, etc., such as calculating means, variances, maxima, minima, etc. For discrete features such as sound, image, etc., corresponding signal processing or image processing methods such as spectrum analysis, color histogram, etc. may be employed.
The extracted driving features are converted into a feature vector or feature matrix form for subsequent use. The feature vector may be a one-dimensional array or matrix in which each element corresponds to the value of one driving feature. For each driving feature, normalization or standardization may be performed according to actual requirements to ensure that the features have similar scales or ranges, which prevents certain features from having an excessive influence on the model or system. During feature extraction, feature selection or dimensionality reduction may also be performed as needed, so as to reduce the data dimensionality and redundancy and improve computational efficiency and model performance.
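A brief sketch of assembling the extracted driving features into a normalized feature matrix is shown below; the sample schema (keys such as 'time', 'x', 'y', 'speed') and the choice of min-max normalization are assumptions made for the example only.

```python
import numpy as np

def build_drive_feature_matrix(samples):
    """samples: a list of dicts with keys such as 'time', 'x', 'y', 'speed',
    'direction' and 'strength' (an assumed schema for illustration)."""
    keys = ["time", "x", "y", "speed", "direction", "strength"]
    mat = np.array([[s[k] for k in keys] for s in samples], dtype=float)
    # Min-max normalization per feature so all dimensions share a similar range,
    # preventing any single feature from dominating the model.
    mins, maxs = mat.min(axis=0), mat.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)
    return (mat - mins) / span
```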
Step S208, mapping the voice feature and the second trigger mark onto the driving feature.
As shown in fig. 4, the method of mapping the voice feature and the second trigger mark onto the drive feature comprises the steps of:
Step S2082, obtaining the position information of the second trigger mark.
The second trigger tag is a tag for marking a relationship with the target event. By identifying and locating the position of the second trigger markers in the feature vector matrix, trigger opportunities associated with the target event can be determined.
Step S2084, maps the second trigger to the drive feature.
Depending on the position of the second trigger mark, a driving feature vector corresponding thereto may be located, and a third trigger mark may be set on the vector. The third trigger flag may be a special flag value or flag bit to indicate that the drive feature vector is associated with the second trigger flag. The mapped drive characteristics may be a one-dimensional array or matrix in which each element corresponds to a value of a drive characteristic.
In order to achieve the mapping of the driving features, data alignment or interpolation operations are required to keep the speech features and driving features consistent in time. An interpolation method may be used to map the speech feature and the driving feature to the same time axis, and a third trigger mark corresponding to the time point where the second trigger mark is located may be set on the aligned driving feature.
Specifically, the speech features and the drive features are first aligned by interpolation. On the voice characteristic, corresponding voice characteristic values are acquired according to known time points or time periods. On the driving feature, a position closest to the time point or the time period is found, and the driving feature value of the position is calculated by using an interpolation method. The interpolated drive feature values are then aligned with the speech feature values so that they remain consistent in time.
Assuming the driving feature is Y, the driving feature Y may be interpolated with an interpolation polynomial of the form

Y'(t_new) = Σ_{i=1}^{n} c_i · Y(t_i),  with  c_i = Π_{j=1, j≠i}^{n} (t_new − t_j) / (t_i − t_j),

where Y'(t_new) is the value of the interpolated driving feature at the time point t_new, c_i is a coefficient of the interpolation polynomial, t_i is the time point corresponding to the i-th sample point of the original driving feature, t_j is the time point corresponding to the j-th sample point of the original driving feature, and n denotes the number of data points used in the interpolation, i.e. the number of sample points of the original driving feature.
In this embodiment, by the above interpolation alignment method, the voice feature and the driving feature can be mapped on the same time axis so that they are kept coincident in time. Thus, the corresponding relation between the voice and the driving characteristics can be ensured to be accurate. In other embodiments, the drive characteristic values may also be smoothly varied over time by interpolation of neighboring data points. This helps to reduce the disturbance of the feature analysis by abrupt changes and noise, making the features more continuous and reliable.
Then, after interpolation alignment, a third trigger mark is set. The third trigger mark may be set, for example, by a relation of the form

Y'_{i,j} = Y_{i,j} + w · f(j),  with  f(j) = 1 when column j corresponds to t_trigger and f(j) = 0 otherwise,

where Y'_{i,j} represents the element in the i-th row and j-th column of the marked driving feature matrix, Y_{i,j} represents the element at the corresponding position in the original driving feature matrix, w represents a weight, f(j) represents the third trigger mark, and t_trigger is the position of the second trigger mark.
In this embodiment, the voice feature and the driving feature can be aligned in time by the data alignment or interpolation operation so that they have the same time resolution. This helps to maintain consistency and comparability of the data during subsequent analysis and processing. By providing a third trigger on the aligned drive feature, it may be indicated that the drive feature is associated with a second trigger. Thus, the position of the driving feature corresponding to the voice feature can be marked, and the subsequent feature analysis and processing are convenient.
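The interpolation and third-trigger-mark assignment above could be sketched as follows; the function names are assumptions, and the additive indicator form of f(j) mirrors the example relation given above rather than a mandated implementation.

```python
import numpy as np

def interpolate_drive_feature(t_new, t_samples, y_samples):
    """Y'(t_new) = sum_i c_i * Y(t_i), with polynomial (Lagrange-style)
    coefficients c_i built from the sample time points, as in the formula above."""
    value = 0.0
    for i, t_i in enumerate(t_samples):
        c_i = np.prod([(t_new - t_j) / (t_i - t_j)
                       for j, t_j in enumerate(t_samples) if j != i])
        value += c_i * y_samples[i]
    return value

def set_third_trigger(drive_matrix, t_trigger, w=1.0):
    """Y'[i, j] = Y[i, j] + w * f(j), where f(j) = 1 only at the column that
    corresponds to the second trigger mark (assumed indicator form)."""
    marked = np.asarray(drive_matrix, dtype=float).copy()
    marked[:, t_trigger] += w
    return marked
```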
Step S210, driving the virtual object based on the mapped driving features, and triggering the target event.
As shown in fig. 5, the method of driving the virtual object and triggering the target event includes the steps of:
step S2102, a plurality of mapped driving features are acquired.
A mapped plurality of drive characteristics is obtained, which may include position, velocity, acceleration, etc. These features reflect the state and behavior of the virtual object. To apply the driving features to the virtual object, the present embodiment employs a driving model based on machine learning.
Step S2104, applies a driving signature to the virtual object and detects whether a third trigger is present.
For each drive feature, it is detected whether a corresponding third trigger flag is present. If a third trigger is present, the virtual object may be driven according to the driving feature. For example, the driving characteristics are mapped onto control parameters of the virtual object to generate a video stream of the virtual object. Meanwhile, a target event is generated according to the third trigger mark so as to trigger related interaction or action. Then, based on the video stream of the virtual object and the target event, a final live video stream is generated. If the third trigger mark does not exist, the virtual object can be directly driven according to the driving characteristic, and a live video stream is generated.
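As an illustrative sketch of this driving loop, the following walks the mapped driving features and fires the target event whenever a third trigger mark is present; `render_frame`, `push_interaction`, and the `drive_model` callable are hypothetical placeholders, not components defined by this application.

```python
def render_frame(control_params):
    # Placeholder for the virtual-object rendering engine (assumption).
    pass

def push_interaction():
    # Placeholder for the business behavior, e.g. popping up a purchase link.
    pass

def drive_and_trigger(drive_features, third_marks, drive_model):
    """Walk the mapped driving features in order; when a feature carries a
    third trigger mark, fire the target event alongside the normal drive step."""
    for feature, has_mark in zip(drive_features, third_marks):
        control_params = drive_model(feature)   # map feature -> control parameters
        render_frame(control_params)            # generate the virtual-object picture
        if has_mark:
            push_interaction()                  # trigger the target event in sync
```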
To render and display the virtual object, the picture may be loaded with a virtual-object rendering engine. On this basis, a corresponding pseudo image, namely a target event video frame, can be loaded according to the triggered target event instruction so that the target event video frame is displayed on the client, which avoids the virtual-object video stream being displayed abnormally on the client because of the target event trigger instruction. Specifically, the video stream of the virtual object generated from the driving data and the pseudo image generated from the target event trigger instruction are acquired and synthesized into the live data stream.
In some embodiments, smoothing may also be performed during generation of the live data stream. In particular, its inferred position in the target event video frame may be calculated from the position of the pixel in the last frame of the video stream of the virtual object and the motion vector. The pixel values of the intermediate frame to be inserted are then inferred based on the inferred position of the pixel in the target event video frame and the actual position of the pixel in the target event video frame.
The inferred position in the target event video frame may be derived from the initial abscissa and ordinate of the pixel in the last frame of the video stream of the virtual object, the motion vector function of that last frame, the motion vector function in the target event video frame, the bias parameters, and the weight parameters. For example, it may be obtained from a relation of the form

x' = x + MV_current_end(x, y) + weightx · MV_target(x, y) + Biasx,
y' = y + MV_current_end(x, y) + weighty · MV_target(x, y) + Biasy,

in which each motion vector function contributes its horizontal offset component to the first equation and its vertical offset component to the second. Here x and y represent the initial abscissa and ordinate of the pixel; MV_current_end(x, y) represents the motion vector function in the last frame of the video stream of the virtual object, used to calculate the positional offset of the pixel in that video stream; MV_target(x, y) represents the motion vector function in the target event video frame, used to represent the motion characteristics of the pixel in that frame; Biasx and Biasy represent the first and second bias parameters, used to fine-tune the inferred position and to compensate for possible offset or distortion; and weightx and weighty represent the first and second weight parameters used when inferring the pixel position in the target event video frame, for adjusting the degree of dependence on the motion vectors in the target event video frame. Biasx, Biasy, weightx and weighty can be obtained by a deep learning method.
In this embodiment, motion vectors and weight parameters are introduced, and by analyzing the video stream of the virtual object, the video frame of the target event, and the pixel motion between them, the pixel position of the intermediate frame to be inserted can be inferred more accurately. By the method, the quality and accuracy of the live video stream are improved, and the generated live video stream is smoother and more natural.
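A sketch of this pixel-position inference is given below; the exact way the two motion-vector terms, weights, and biases are combined is an assumption consistent with the variables listed above, and the motion-vector functions are assumed to return horizontal and vertical offsets.

```python
def infer_pixel_position(x, y, mv_current_end, mv_target,
                         bias_x, bias_y, weight_x, weight_y):
    """Estimate where a pixel of the last virtual-object frame lands in the
    target event video frame by blending both motion-vector fields."""
    dx_cur, dy_cur = mv_current_end(x, y)    # offset from the last rendered frame
    dx_tgt, dy_tgt = mv_target(x, y)         # motion inside the target event frame
    x_new = x + dx_cur + weight_x * dx_tgt + bias_x
    y_new = y + dy_cur + weight_y * dy_tgt + bias_y
    return x_new, y_new
```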
In this embodiment, by the above manner, each driving feature may be determined, whether the third trigger mark exists or not is determined, and a corresponding driving manner is adopted. Therefore, the method can realize accurate control of the virtual object, trigger the target event according to the requirement, and provide richer virtual environment experience with strong interactivity for the user.
Example 3
The embodiment of the application provides a structure schematic diagram of a target event triggering device based on a virtual object, as shown in fig. 6, the device comprises: an acquisition module 62, a speech conversion module 64, a mapping module 66, and a triggering module 68.
The acquisition module 62 is configured to acquire text content to be played of a virtual object, extract keywords associated with a target event from the text content, and set a first trigger mark at the position of the keywords; the voice conversion module 64 is configured to convert the text content with the first trigger mark set into voice data to be played, and extract a plurality of voice features from the voice data to be played, wherein part of the voice features in the plurality of voice features carry a second trigger mark corresponding to the first trigger mark; the mapping module 66 is configured to obtain driving data for driving the virtual object, extract a plurality of driving features from the driving data, and map the plurality of voice features and the second trigger mark onto the plurality of driving features, wherein part of driving features in the mapped plurality of driving features carry a third trigger mark corresponding to the second trigger mark; the triggering module 68 is configured to drive the virtual object based on the mapped plurality of drive features and trigger the target event upon recognition of the third trigger flag.
It should be noted that: the virtual object-based target event triggering device provided in the above embodiment is only exemplified by the division of the above functional modules, and in practical application, the above functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the virtual object-based target event triggering device and the virtual object-based target event triggering method provided in the foregoing embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments and are not described herein again.
Example 4
Fig. 7 shows a schematic diagram of an electronic device suitable for use in implementing embodiments of the present disclosure. It should be noted that the electronic device shown in fig. 7 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present disclosure.
As shown in fig. 7, the electronic device includes a Central Processing Unit (CPU) 1001 that can execute various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data required for system operation are also stored. The CPU1001, ROM 1002, and RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.
The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output portion 1007 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), etc., and a speaker, etc.; a storage portion 1008 including a hard disk or the like; and a communication section 1009 including a network interface card such as a LAN card, a modem, or the like. The communication section 1009 performs communication processing via a network such as the internet. The drive 1010 is also connected to the I/O interface 1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is installed as needed in the drive 1010, so that a computer program read out therefrom is installed as needed in the storage section 1008.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 1009, and/or installed from the removable medium 1011. When executed by the Central Processing Unit (CPU) 1001, the computer program performs the various functions defined in the method and apparatus of the present application. In some embodiments, the electronic device may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
It should be noted that the computer readable medium shown in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented by means of software, or may be implemented by means of hardware, and the described units may also be provided in a processor. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.
As another aspect, the present application also provides a computer-readable medium that may be contained in the electronic device described in the above embodiment; or may exist alone without being incorporated into the electronic device.
The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the methods described in the above embodiments. For example, the electronic device may implement the steps of the method embodiments described above, and so on.
The integrated units in the above embodiments may be stored in the above-described computer-readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a storage medium, comprising several instructions for causing one or more computer devices (which may be personal computers, servers or network devices, etc.) to perform all or part of the steps of the method described in the embodiments of the present application.
In the foregoing embodiments of the present application, the descriptions of the embodiments are emphasized, and for a portion of this disclosure that is not described in detail in this embodiment, reference is made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed terminal device may be implemented in other manners. The above-described apparatus embodiments are merely exemplary; for example, the division of the units is merely a logical function division, and other division manners are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed between components may be through some interfaces, units or modules, and may be electrical or in other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The foregoing is merely a preferred embodiment of the present application. It should be noted that modifications and adaptations may be made by those skilled in the art without departing from the principles of the present application, and such modifications and adaptations are intended to fall within the scope of the present application.
Claims (8)
1. A virtual object-based target event triggering method, comprising:
acquiring text content to be played of a virtual object, extracting keywords associated with a target event from the text content, and setting a first trigger mark at the position of the keywords;
converting the text content provided with the first trigger mark into voice data to be played, and extracting a plurality of voice features from the voice data to be played, wherein part of voice features in the voice features carry a second trigger mark corresponding to the first trigger mark;
Obtaining driving data for driving the virtual object, extracting a plurality of driving features from the driving data, and mapping the voice features and the second trigger marks onto the driving features, wherein part of the driving features after mapping carry third trigger marks corresponding to the second trigger marks;
driving the virtual object based on the mapped plurality of driving features, and triggering the target event when the third trigger mark is identified;
wherein mapping the plurality of speech features and the second trigger tag onto the plurality of drive features comprises: temporally aligning the plurality of speech features and the plurality of drive features by interpolating the plurality of speech features and the plurality of drive features; setting the third trigger marks corresponding to the time points of the second trigger marks on the aligned driving features;
wherein driving the virtual object based on the mapped plurality of driving features and triggering the target event upon identifying the third trigger tag comprises: identifying, for each of the mapped plurality of drive features, whether the drive feature has a corresponding third trigger flag; driving the virtual object according to the driving characteristic under the condition that the third trigger mark exists, enabling the virtual object to execute corresponding actions and triggering the target event; in the absence of the third trigger, the virtual object is driven directly in accordance with the driving feature.
2. The method of claim 1, wherein the aligning the plurality of speech features and the plurality of drive features in time by interpolating the plurality of speech features and the plurality of drive features comprises:
mapping the plurality of speech features and the plurality of drive features onto a same timeline;
the plurality of speech features and the plurality of drive features are respectively interpolated on the time axis using a linear interpolation method to align the plurality of speech features and the plurality of drive features in time.
3. The method of claim 2, wherein interpolating the plurality of speech features and the plurality of drive features, respectively, on the time axis using a linear interpolation method to align the plurality of speech features and the plurality of drive features in time comprises:
determining the positions of the plurality of voice features at a first time point on the time axis, respectively calculating linear weights between two adjacent voice features in the plurality of voice features according to the positions of the first time point, and performing interpolation operation based on the linear weights;
determining a position of a second time point of the plurality of driving features on the time axis, and performing interpolation operation between two adjacent driving features in the plurality of driving features according to the position of the second time point, so that the plurality of driving features are aligned in time with the plurality of voice features.
4. The method of claim 1, wherein driving the virtual object according to the driving characteristic causes the virtual object to perform a corresponding action and trigger the target event, comprising:
mapping the driving characteristics to control parameters of the virtual object through a mapping function;
generating a live data stream of the virtual object based on the control parameter and a trigger instruction that triggers the target event.
5. The method of claim 1, wherein extracting a plurality of speech features from the speech data to be played comprises:
extracting a plurality of voice features from the voice data to be played to obtain a voice feature vector matrix;
and setting the second trigger mark on the voice characteristic vector corresponding to the position of the first trigger mark in the voice characteristic vector matrix.
6. The method of claim 1, wherein extracting keywords associated with a target event from the text content, setting a first trigger at a location of the keywords, comprises:
searching the appearance position of the keyword in the text content by using a character string matching method;
And setting the first trigger mark at the appearance position, wherein the first trigger mark is arranged behind the keyword.
7. A virtual object-based target event triggering apparatus, comprising:
an acquisition module configured to acquire text content to be played by a virtual object, extract a keyword associated with a target event from the text content, and set a first trigger mark at the position of the keyword;
a voice conversion module configured to convert the text content provided with the first trigger mark into voice data to be played, and extract a plurality of voice features from the voice data to be played, wherein some of the plurality of voice features carry second trigger marks corresponding to the first trigger mark;
a mapping module configured to acquire driving data for driving the virtual object, extract a plurality of driving features from the driving data, and map the plurality of voice features and the second trigger marks onto the plurality of driving features, wherein some of the mapped plurality of driving features carry third trigger marks corresponding to the second trigger marks; and
a triggering module configured to drive the virtual object based on the mapped plurality of driving features and trigger the target event upon identifying the third trigger mark;
wherein the mapping module is further configured to: align the plurality of voice features and the plurality of driving features in time by interpolating the plurality of voice features and the plurality of driving features; and set, on the aligned driving features, the third trigger marks corresponding to the time points of the second trigger marks;
and the triggering module is further configured to: identify, for each of the mapped plurality of driving features, whether the driving feature carries a corresponding third trigger mark; in a case where the third trigger mark is present, drive the virtual object according to the driving feature so that the virtual object performs a corresponding action, and trigger the target event; and in a case where the third trigger mark is absent, drive the virtual object directly according to the driving feature.
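A skeletal Python sketch of the four modules of the apparatus in claim 7, with stubbed data and illustrative names only; it mirrors the module boundaries rather than any concrete implementation:

```python
class TargetEventTriggeringApparatus:
    """Skeletal sketch of the four modules; all names and stub data are illustrative."""

    def acquisition_module(self, text: str, keyword: str) -> str:
        # set the first trigger mark right after the keyword
        return text.replace(keyword, keyword + "<T1>", 1)

    def voice_conversion_module(self, marked_text: str) -> list[dict]:
        # stand-in for TTS + feature extraction: one feature dict per frame,
        # with a second trigger mark on the frame nearest the <T1> position
        frames = 10
        mark_frame = marked_text.find("<T1>") * frames // max(1, len(marked_text))
        return [{"feature": [0.1, 0.2], "mark": "<T2>" if i == mark_frame else None}
                for i in range(frames)]

    def mapping_module(self, voice_feats: list[dict], drive_feats: list[dict]) -> list[dict]:
        # copy second trigger marks onto the time-aligned driving features (third marks)
        for i, vf in enumerate(voice_feats[: len(drive_feats)]):
            if vf["mark"]:
                drive_feats[i]["mark"] = "<T3>"
        return drive_feats

    def triggering_module(self, drive_feats: list[dict], render, fire_event) -> None:
        for df in drive_feats:
            render(df["feature"])          # drive the virtual object
            if df.get("mark"):
                fire_event(df["mark"])     # trigger the target event
```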
8. A computer-readable storage medium, on which a program is stored, characterized in that the program, when run, causes a computer to perform the method of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310795861.2A CN116506674B (en) | 2023-07-01 | 2023-07-01 | Target event triggering method and device based on virtual object |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310795861.2A CN116506674B (en) | 2023-07-01 | 2023-07-01 | Target event triggering method and device based on virtual object |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116506674A CN116506674A (en) | 2023-07-28 |
CN116506674B true CN116506674B (en) | 2023-09-05 |
Family
ID=87318835
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310795861.2A Active CN116506674B (en) | 2023-07-01 | 2023-07-01 | Target event triggering method and device based on virtual object |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116506674B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110032355A (en) * | 2018-12-24 | 2019-07-19 | 阿里巴巴集团控股有限公司 | Speech playing method, device, terminal device and computer storage medium |
CN110309238A (en) * | 2018-03-08 | 2019-10-08 | 上海博泰悦臻网络技术服务有限公司 | Point of interest interactive approach, system, electric terminal and storage medium in music |
CN111290568A (en) * | 2018-12-06 | 2020-06-16 | 阿里巴巴集团控股有限公司 | Interaction method and device and computer equipment |
CN112667068A (en) * | 2019-09-30 | 2021-04-16 | 北京百度网讯科技有限公司 | Virtual character driving method, device, equipment and storage medium |
CN113571099A (en) * | 2021-06-25 | 2021-10-29 | 海南视联大健康智慧医疗科技有限公司 | Operation recording and broadcasting method and device, electronic equipment and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10708725B2 (en) * | 2017-02-03 | 2020-07-07 | T-Mobile Usa, Inc. | Automated text-to-speech conversion, such as driving mode voice memo |
- 2023-07-01: CN application CN202310795861.2A granted as patent CN116506674B (Active)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110309238A (en) * | 2018-03-08 | 2019-10-08 | 上海博泰悦臻网络技术服务有限公司 | Point of interest interactive approach, system, electric terminal and storage medium in music |
CN111290568A (en) * | 2018-12-06 | 2020-06-16 | 阿里巴巴集团控股有限公司 | Interaction method and device and computer equipment |
CN110032355A (en) * | 2018-12-24 | 2019-07-19 | 阿里巴巴集团控股有限公司 | Speech playing method, device, terminal device and computer storage medium |
CN112667068A (en) * | 2019-09-30 | 2021-04-16 | 北京百度网讯科技有限公司 | Virtual character driving method, device, equipment and storage medium |
CN113571099A (en) * | 2021-06-25 | 2021-10-29 | 海南视联大健康智慧医疗科技有限公司 | Operation recording and broadcasting method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN116506674A (en) | 2023-07-28 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US12094209B2 (en) | Video data processing method and apparatus, device, and medium | |
CN110288682B (en) | Method and apparatus for controlling changes in a three-dimensional virtual portrait mouth shape | |
CN107707931B (en) | Method and device for generating interpretation data according to video data, method and device for synthesizing data and electronic equipment | |
CN110347867B (en) | Method and device for generating lip motion video | |
CN109308469B (en) | Method and apparatus for generating information | |
EP3889912A1 (en) | Method and apparatus for generating video | |
CN113365147B (en) | Video editing method, device, equipment and storage medium based on music card point | |
KR20190139751A (en) | Method and apparatus for processing video | |
CN111901626A (en) | Background audio determining method, video editing method, device and computer equipment | |
CN109844736A (en) | Summarize video content | |
CN110534085B (en) | Method and apparatus for generating information | |
CN110213614A (en) | The method and apparatus of key frame are extracted from video file | |
CN111784776A (en) | Visual positioning method and device, computer readable medium and electronic equipment | |
CN110309720A (en) | Video detecting method, device, electronic equipment and computer-readable medium | |
CN112561840A (en) | Video clipping method and device, storage medium and electronic equipment | |
CN110673717A (en) | Method and apparatus for controlling output device | |
Sexton et al. | Automatic CNN-based enhancement of 360° video experience with multisensorial effects | |
CN117649537B (en) | Monitoring video object identification tracking method, system, electronic equipment and storage medium | |
CN111741329A (en) | Video processing method, device, equipment and storage medium | |
KR20230065339A (en) | Model data processing method, device, electronic device and computer readable medium | |
Nitika et al. | A study of Augmented Reality performance in web browsers (WebAR) | |
CN116506674B (en) | Target event triggering method and device based on virtual object | |
CN111292333A (en) | Method and apparatus for segmenting an image | |
CN113762056A (en) | Singing video recognition method, device, equipment and storage medium | |
CN109816791B (en) | Method and apparatus for generating information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CP03 | Change of name, title or address | Address after: Building 60, 1st Floor, No.7 Jiuxianqiao North Road, Chaoyang District, Beijing 021; Patentee after: Shiyou (Beijing) Technology Co.,Ltd.; Country or region after: China; Address before: 4017, 4th Floor, Building 2, No.17 Ritan North Road, Chaoyang District, Beijing; Patentee before: 4U (BEIJING) TECHNOLOGY CO.,LTD.; Country or region before: China |