WO2017094527A1 - Moving image generation system and moving image display system - Google Patents

Moving image generation system and moving image display system

Info

Publication number
WO2017094527A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
moving image
frames
similarity
computer
Prior art date
Application number
PCT/JP2016/084224
Other languages
English (en)
Japanese (ja)
Inventor
五十嵐 健夫
伸樹 依田
Original Assignee
日本電産株式会社
Priority date
Filing date
Publication date
Application filed by 日本電産株式会社
Priority to JP2017553773A (patent JP7009997B2)
Publication of WO2017094527A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 - Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 - Sound input; Sound output
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 - Details of television systems
    • H04N5/76 - Television signal recording
    • H04N5/765 - Interface circuits between an apparatus for recording and another apparatus
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 - Details of television systems
    • H04N5/76 - Television signal recording
    • H04N5/91 - Television signal processing therefor
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 - Details of television systems
    • H04N5/76 - Television signal recording
    • H04N5/91 - Television signal processing therefor
    • H04N5/93 - Regeneration of the television signal or of selected parts thereof

Definitions

  • This application relates to a moving image generation system and a moving image display system.
  • The present application also relates to a data structure used in the moving image generation system and to a robot system including the moving image generation system.
  • Anthropomorphic conversational agents using a computer have been developed.
  • An anthropomorphic dialogue agent displays a human or human-like character on the screen of a display device (display) and interacts with the user according to the user's voice or keyboard input.
  • In such an agent, a moving image is displayed on the screen of the display, and the facial expression (particularly the mouth movement) and posture of the subject image change in accordance with the content of the sound produced by a subject such as a human.
  • The moving image is generated by a moving image generation system.
  • Conventional video generation systems that display realistic people on a display screen are broadly classified into two types.
  • In the first type, a lip-sync animation is synthesized from a two-dimensional (2D) photograph or moving image.
  • Such a synthetic moving image can be generated by a method using morphing of a still image (Non-Patent Document 1) or by a method of preparing many mouth images and displaying them in an optimal order (Non-Patent Document 2).
  • In the second type, a three-dimensional (3D) human model and its speech animation are created and rendered. Research on the human face modeling and animation necessary for this rendering has been conducted (Non-Patent Document 3).
  • With this approach, it is necessary to combine various techniques such as hair modeling, animation, and rendering, and skin rendering.
  • In either approach, only the mouth moves according to the content of the utterance, so the result is far from a realistic change in human facial expression.
  • Embodiments of the present disclosure provide a moving image generation system and a moving image display system capable of displaying realistic changes in human facial expression or posture.
  • An exemplary moving image generation system includes a recording device in which a moving image clip, which is a sequence of a plurality of frames including a subject image, is recorded, and a computer that reconstructs the plurality of frames based on the similarity between each frame included in a frame group selected from the plurality of frames and each of the other frames, and that generates composite moving image data. The similarity is defined based on a feature amount of the subject image in each frame.
  • As the (N + 1)th frame, which is the frame following the Nth frame (N is a positive integer) of the synthesized moving image, the computer determines one frame from a plurality of candidate frames selected from the frame group based on the similarity to the Nth frame.
  • FIG. 1 is a diagram illustrating an example of a basic configuration of a moving image generation system according to the present disclosure.
  • FIG. 2 is a diagram schematically illustrating a configuration example of a table that defines a relationship between a certain frame and “a plurality of candidate frames” associated with the frame.
  • FIG. 3 is a diagram illustrating a configuration example of a non-limiting exemplary embodiment of the moving image generation system according to the present disclosure.
  • FIG. 4 is a diagram illustrating a configuration example of a moving image clip in which a standby section and an utterance section are alternately repeated.
  • FIG. 5 is a diagram showing a plurality of feature points of the subject image automatically extracted by the photographing apparatus.
  • FIG. 6 is a diagram schematically illustrating feature points of the subject image in the i-th frame i and the j-th frame j in the moving image clip.
  • FIG. 7 is a diagram schematically illustrating an example of a matrix that defines the distances D i,j between frame i and frame j obtained by pre-calculation.
  • FIG. 8 is a diagram schematically illustrating an example of a matrix that defines the transition costs C i,j between frame i and frame j obtained by pre-calculation.
  • FIG. 9 is a diagram illustrating an example of transition between frames when cross dissolve processing is used for frame interpolation.
  • FIG. 10 is a diagram schematically illustrating an example in which a predetermined frame is excluded from candidate frames.
  • FIG. 11 is a diagram illustrating a modification of the moving image display system.
  • FIG. 12 is a diagram schematically illustrating an exemplary embodiment of a robot system according to the present disclosure.
  • FIG. 13 is a flowchart illustrating an example of a processing procedure performed in advance.
  • FIG. 14 is a flowchart illustrating an example of a processing procedure for moving image generation and moving image display.
  • FIG. 15 is a diagram illustrating an internal configuration example of the computer 20.
  • Non-Patent Document 4 and Patent Document 1 disclose the technique of "Video Textures" as a technique capable of generating a realistic moving image of a human being.
  • Video Textures takes a moving image clip of finite length, which has a beginning and an end and is acquired by shooting, and continuously reproduces a moving image synthesized from the clip. Instead of simply connecting the end point and start point of the clip to form an infinite loop, the transition from one displayed frame to the next is made probabilistically based on the similarity between frames. For this reason, a frame sequence without simple repetition can be generated.
  • In Video Textures, a plurality of frames similar to each other are selected in advance as candidate frames from among the large number of frames constituting the moving image clip acquired by shooting.
  • A "transition" from the current frame to the next frame is then performed according to a probability distribution that depends on the similarity. More specifically, one frame is selected from a plurality of candidate frames similar to the current frame according to a probability distribution corresponding to the degree of similarity. This probability distribution is defined by a function having the similarity between frames as a variable, and transitions to frames with high similarity occur more frequently.
  • In Video Textures, the degree of similarity between frames is given by the distance between the two frames of interest (frame-to-frame distance).
  • This "distance" is defined by summing the difference (absolute value or square) of the pixel values of the two frames over all the pixels constituting the frame. The smaller the "distance" defined in this way, the higher the similarity between the frames.
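  • For reference, a minimal sketch of this pixel-based frame distance (the quantity used by Video Textures, not by the present embodiment) might look as follows; the function name and array layout are illustrative assumptions, not code from the patent.

```python
import numpy as np

def pixel_frame_distance(frame_a: np.ndarray, frame_b: np.ndarray) -> float:
    """Sum of squared pixel-value differences between two frames.

    frame_a and frame_b are arrays of identical shape (height, width, channels).
    The smaller the returned value, the more similar the two frames are.
    """
    diff = frame_a.astype(np.float64) - frame_b.astype(np.float64)
    return float(np.sum(diff * diff))
```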
  • Patent Document 2 discloses detecting the positions of a plurality of feature points arranged on facial parts. The apparatus of Patent Document 2 establishes correspondences of feature positions between captured face images, and analyzes how each feature position moves and how the video pattern changes due to utterance.
  • In the embodiments of the present disclosure, the degree of similarity is defined based on the "feature amount of the subject image" in each frame.
  • A typical example of the subject is a human face; the subject may also include a part or the whole of the human body.
  • The subject may also be an animal or a robot that can be anthropomorphized.
  • The subject image includes a face that defines an expression, such as a human face, and in particular images of the eyes and mouth.
  • Such subject images are characterized by various feature quantities that can be used in the field of pattern recognition.
  • The feature amount may be defined based on a plurality of points (feature points) extracted to express the facial expression and orientation, that is, the appearance of the face. For example, the position coordinates of the feature points of the subject image within the frame can themselves be the feature amount.
  • In this case, the sum of the differences in the position coordinates of the corresponding feature points in the two frames corresponds to the distance that defines the similarity.
  • The position of a feature point may also be a three-dimensional coordinate in the space where the subject is located. Note that pixel values at each feature point of the subject image, such as chromaticity or brightness, may also be included in the feature quantity.
  • In the embodiments of the present disclosure, the similarity between frames is defined based on the feature amount of the subject image included in the frame rather than on the entire frame, and therefore the similarity between frames can be obtained with a smaller amount of calculation.
  • In some embodiments, some frames are excluded from the plurality of candidate frames selected based on the degree of similarity, according to the order of the frames constituting the synthesized moving image.
  • By excluding such frames, it is possible to suppress repeated use of the same highly similar frame within a short period of the synthesized moving image.
  • For example, frames that have already been used K L times or more (K L is an integer of 1 or more) within the most recent T L frames (T L is an integer of 2 or more) of the synthesized moving image can be selected as the "some frames" above and excluded from the candidate frames.
  • FIG. 1 illustrates an example of a basic configuration of a moving image generation system according to the present disclosure.
  • The illustrated moving image generation system 100 includes a recording device 10 and a computer 20.
  • A moving image clip acquired by prior shooting is recorded in the recording device 10.
  • The moving image clip is a sequence of a plurality of frames including a subject image, such as a person facing the front.
  • The computer 20 reconstructs the frames based on the similarity between the frames included in the frame group of the moving image clip, and generates synthesized moving image data.
  • The "similarity" is defined based on the feature amount of the subject image in each frame.
  • The synthesized moving image generated by the moving image generation system 100 is a sequence of a plurality of frames whose display order has been reconfigured. That is, the composite moving image is made up of frames in which the order of the frames in the moving image clip acquired by shooting has been at least partially rearranged into a new order.
  • The computer 20 selects, from the "plurality of candidate frames", the (N + 1)th frame, which is the frame following the Nth frame (N is a positive integer) in the combined moving image.
  • The "plurality of candidate frames" are selected from the frame group of the moving image clip based on the similarity.
  • The computer 20 determines one frame (for example, the frame 12 including the subject image 14) as the (N + 1)th frame of the synthesized moving image from the plurality of candidate frames selected in this way.
  • If the frame with the highest similarity were always chosen, the same specific frame would always be selected from the candidate frames.
  • Instead, frames are selected probabilistically based on a probability distribution function with the similarity as a parameter.
  • As a result, the change in the feature amount of the subject image between successive frames is small.
  • The moving image is thus synthesized so that the positions and sizes of the face, eyes, and mouth in the frame change naturally.
  • Moreover, the frame that follows a given frame is not fixed to one frame but changes probabilistically, so even if the moving image is displayed for a long time, monotony in the facial expression or movement of the subject can be reduced.
  • Multiple candidate frames are determined in advance for each frame.
  • The set of "candidate frames" associated with a frame may differ from frame to frame.
  • The relationship between a frame and the "plurality of candidate frames" associated with that frame can be represented by a table, for example.
  • FIG. 2 is a diagram schematically showing a configuration example of such a table.
  • This table can store a value corresponding to the similarity (for example, the "distance" between frames) for 12 frames selected from the moving image clip.
  • In practice, a table can be prepared for a huge number of frames, possibly exceeding tens of thousands.
  • The 12 frames are typically 12 consecutive frames acquired by shooting. However, the order of the frames does not necessarily need to match the order in which they were acquired at the time of shooting.
  • Here, i and j are integers from 1 to 12.
  • In each cell of the table, a numerical value corresponding to the similarity between frame i and frame j is stored. The numerical values in this example indicate how far the coordinates of the feature points on the face of the subject person are shifted between the frames.
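  • A minimal sketch of such a table, assuming it is held as a dense matrix of interframe distances from which candidate frames are then taken, might look as follows; the threshold-based selection shown here is only one of the selection rules mentioned in the text, and all names are illustrative.

```python
import numpy as np

def build_distance_table(distance, num_frames: int) -> np.ndarray:
    """Fill a num_frames x num_frames table with pairwise frame distances.

    `distance(i, j)` is any interframe distance, for example one based on
    feature-point positions as described in the text.
    """
    table = np.zeros((num_frames, num_frames))
    for i in range(num_frames):
        for j in range(num_frames):
            table[i, j] = distance(i, j)
    return table

def candidate_frames(table: np.ndarray, i: int, threshold: float) -> list:
    """Frames whose distance to frame i is below a threshold (i.e. higher similarity)."""
    return [j for j in range(table.shape[1]) if j != i and table[i, j] < threshold]
```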
  • Alternatively, the next frame may be selected randomly from the candidate frames regardless of the degree of similarity, or a frame may be selected according to some other rule.
  • A typical example of the subject image in the moving image clip is a human being, but it may also be an anthropomorphized animal or robot.
  • "Reconstruction" includes not only a mode in which the frames actually shot are used as they are to generate a synthesized moving image, but also a mode in which a synthesized moving image is generated by additionally using frames other than those acquired by shooting.
  • A typical example of the former mode is to generate a composite moving image by rearranging the display order of the frames acquired at the time of shooting.
  • An example of a "frame other than a frame acquired by shooting" is a frame generated, for example by morphing, from frames acquired at the time of shooting.
  • FIG. 3 illustrates an example configuration of a non-limiting exemplary embodiment of a video generation system according to the present disclosure.
  • The illustrated moving image generation system 100 includes a recording device 10 in which a moving image clip is recorded, and a computer 20 that reconstructs the plurality of frames constituting the moving image clip and generates composite moving image data.
  • The moving image generation system 100 displays the combined moving image on the screen of a display 50 such as a liquid crystal display or an organic EL display.
  • The display 50 may also be a projector that projects the moving image onto a screen.
  • The display 50 can be connected to the moving image generation system 100 by wire or wirelessly, either at all times or as necessary.
  • One or both of the recording device 10 and the computer 20 in the moving image generation system 100 may be provided at a position away from the display 50.
  • The display 50 may be a part of the moving image generation system 100, or it may not be included in the moving image generation system 100.
  • The display 50 in this embodiment has a built-in speaker (not shown) that outputs sound.
  • The computer 20 only needs to have a known configuration, and can typically be a commercially available general-purpose computer incorporating a central processing unit (CPU) and a memory.
  • The memory stores a program containing a plurality of instructions that define the processing performed by the computer 20. In operation, the computer 20 operates according to the instructions of this program.
  • The computer 20 determines one frame from the plurality of candidate frames as the (N + 1)th frame, which is the frame following the Nth frame (N is a positive integer) of the synthesized moving image.
  • This "determination" can be performed continuously in a mode in which a synthesized moving image including a waiting subject image is displayed on the display 50.
  • When the displayed frame transitions from a certain frame (the Nth frame) of the synthesized moving image to the next frame (the (N + 1)th frame), the transition occurs between two frames whose similarity satisfies a predetermined relationship.
  • As a result, the subject image in the synthesized moving image shows natural movement as seen by the user.
  • FIG. 3 shows a moving image display system 200 including the moving image generating system 100.
  • The moving image display system 200 includes, in addition to the moving image generation system 100, an interface device 60 such as a microphone or a keyboard, and a dialogue engine 70.
  • The dialogue engine 70 obtains input from the user via the interface device 60 and causes the computer 20 to execute a dialogue corresponding to the input, for example to make an appropriate reply to the user. For example, when the user says "Hello", the user's voice is converted into a digital signal by the interface device 60 and input to the dialogue engine 70.
  • The dialogue engine 70 determines the content of the words uttered by the user by voice recognition and makes a response according to the result. Specifically, in response to a response instruction from the dialogue engine 70, the computer 20 reads an appropriate part of the moving image clip from the recording device 10 and reproduces it.
  • The dialogue engine 70 may be realized by a program installed in the computer 20.
  • The dialogue engine 70 may also be located away from the display 50.
  • The interface device 60 is typically placed near the display 50. Alternatively, the interface device 60 may be integrated with the display 50.
  • The moving image display system 200 according to one aspect includes a housing that houses the recording device 10, the computer 20, the display 50, and the interface device 60.
  • The moving image clip recorded in the recording device 10 includes moving image portions for replies, as will be described later.
  • For example, when the voice "Hello" is output from the speaker of the display 50, a moving image of a human uttering "Hello" ("konnichiwa") is displayed on the screen of the display 50 in accordance with the audio. That is, a moving image showing the changes in the appearance of the face corresponding to the sounds "ko", "n", "ni", "chi", and "wa" is displayed in synchronization with the voice.
  • Such a moving image, obtained by prior shooting, can be produced most simply by reproducing the series of frames in which the subject utters "Hello" in the same order as at the time of shooting.
  • Note that the moving image generation system according to the present embodiment, unlike the above-described prior art, does not generate a speech animation from an input character string or voice.
  • The moving image generation system according to the present embodiment is therefore suitable for dialogue systems that select an appropriate response from predetermined lines, such as simple visitor-reception and reservation-reception systems.
  • In periods when no reply is being reproduced, a subject image in a standby state is displayed on the display 50. Such a period is called a "standby period".
  • The operation state of the moving image generation system 100 during the "standby period" is referred to as the "standby mode".
  • The period in which the voice and the moving image for replying to the user are reproduced is called the "utterance period".
  • The operation state of the moving image generation system 100 in the "utterance period" is referred to as the "utterance mode".
  • The subject image displayed on the display 50 during the standby period is not a still image but a moving image synthesized by the computer 20.
  • This synthesized moving image is a new sequence of frames generated by the computer 20 by reconstructing a frame group of the moving image clip. Therefore, the order of the frames constituting the moving image displayed during the standby period is determined by the probabilistic transitions between frames described above.
  • The composite moving image data of the standby period and the reproduced moving image data of the utterance period are sent alternately to the display 50.
  • The length of the standby period varies depending on the timing of the user's input, whereas the length of the utterance period is defined by the length of the moving image portion selected according to the content of the utterance.
  • The "moving image clip" in the present embodiment includes moving image portions of the subject image uttering predetermined lines, and moving image portions of the subject image in the state before and after such lines are uttered.
  • The computer 20 is programmed to perform, in accordance with user input, either generation of the synthesized moving image data or selection of a partial clip that is a part of the moving image clip.
  • The display 50 displays the generated synthesized moving image or the selected partial clip.
  • A recording system used to acquire a moving image clip includes, for example, one computer and one moving image shooting device.
  • The moving image shooting device used in the present embodiment is a Kinect (registered trademark) manufactured by Microsoft (registered trademark).
  • Alternatively, a moving image capturing device that does not acquire depth information, such as a normal camera, may be used.
  • Immediately after the start of recording, the actor waits without doing anything in order to record the standby state (standby section).
  • Next, the operator selects a line,
  • and the actor utters the selected line (utterance section). For example, when the line "Hello" is selected, the actor says "Hello".
  • The operator inputs to the system the timing at which the actor finishes speaking. After that, recording of the standby state continues for a while, and when the operator selects a line again, recording of the next line starts.
  • In this way, a moving image clip in which standby sections and utterance sections alternate is obtained, as illustrated in FIG. 4.
  • This increases the number of candidate frames at the transition positions before and after the utterance state, making it easier to realize natural frame transitions when generating a moving image.
  • The same line may be recorded multiple times.
  • If a plurality of utterance-section moving images of the same line are acquired, then even when the same line is uttered repeatedly in response to user input at the time of moving image generation, repeated reproduction of the identical moving image can be avoided.
  • Furthermore, as the number of utterance-state videos of the same line increases, the number of frames that can be selected when transitioning from the various standby-state frames to the first frame of an utterance state also increases, which makes natural transitions easier to realize.
  • The information stored by the recording system includes the voice, the camera RGB data, the face recognition results, the posture recognition results, and the utterance start and end timings input by the operator.
  • FIG. 5 shows some of the face feature point positions f i,k (1 ≤ k ≤ F) of the subject image in the i-th frame of the moving image clip.
  • Some of the posture feature point positions s i,k (1 ≤ k ≤ S) of the subject image are also shown.
  • Here, F and S are the maximum numbers of face feature points and posture feature points used for processing, respectively.
  • The positions f i,k and s i,k of these feature points change over time; even corresponding feature points may be located at different positions in different frames.
  • A predetermined number of feature point positions is acquired for each frame.
  • In FIG. 5, the position of the head of the human subject is indicated by the intersection of the dash-dot line 15X extending in the horizontal direction (X-axis direction) and the dash-dot line 15Y extending in the vertical direction (Y-axis direction).
  • Positions indicating the posture of the human subject are indicated by the intersections of one broken line 16X extending in the horizontal direction with each of two broken lines 16Y extending in the vertical direction.
  • With the photographing apparatus 80, the difference in posture from the start of recording can be displayed on the display of the recording system. If the difference becomes too large during shooting, the operator can interrupt the recording, and the display can also be used to reproduce the posture when resuming after a break in shooting.
  • Pre-computation is performed to determine the candidate frames for the frame transitions that will be carried out when generating the synthesized video.
  • First, a transition cost between frames is calculated for every pair of frames in the recorded clip, yielding a cost matrix.
  • The transition cost is a numerical measure of the unnaturalness of transitioning from one frame of the recorded clip to another.
  • This calculation can follow the method disclosed in Non-Patent Document 4.
  • The entire contents of Non-Patent Document 4 are incorporated herein by reference.
  • However, in the present embodiment the similarity between frames is computed using the feature points obtained by face recognition, not by pixel-by-pixel comparison.
  • Next, a plurality of frames realizing low-cost transitions are extracted, both for transitions from the standby state to the standby state and for transitions from the standby state to the utterance state.
  • These low-cost transition destinations are the candidate frames.
  • In addition, the sound recorded in parallel with the shooting of the candidate frames is cut out, and the RGB data of each frame is converted for reproduction.
  • FIG. 6 schematically shows the feature points of the subject image in the i-th frame i and the j-th frame j of the moving image clip.
  • Here, 1 ≤ i ≤ j.
  • In frame i, arrows indicating the positions f i,15 and s i,2 of two feature points are drawn.
  • The position f i,15 is that of the fifteenth facial feature point,
  • and s i,2 is that of the second posture feature point.
  • In frame j, arrows indicating the positions f j,15 and s j,2 of the feature points corresponding to these two feature points in frame i are drawn.
  • For the fifteenth facial feature point, a distance (difference) may exist between the position f i,15 in frame i and the position f j,15 in frame j.
  • If this distance (difference) is large, the movement of the feature point will be noticeable to the user when a transition from frame i to frame j occurs at the time of moving image generation.
  • Therefore, the distance (difference) between the feature points of the subject image is used as a value indicating the similarity between frames; the smaller this distance, the higher the similarity.
  • In the present embodiment, the distances between the feature points of the subject image are obtained from the face detection and posture information provided by Kinect (registered trademark), and the sum of the distances over the plurality of feature points is defined as the "interframe distance".
  • An example of this "interframe distance" will now be described.
  • In this example, the number F of facial feature points is 121.
  • As posture feature points, those that are always visible when photographing the upper body (the head, right shoulder, left shoulder, and center of the shoulders) are used;
  • the number S of posture feature points is therefore four.
  • The distance between the i-th frame and the j-th frame is expressed by the following expression, in which weighting parameters α and β are introduced.
  • In addition, the distances are normalized by the average of the distances, and the face distance and the posture distance are combined at a ratio of 7:3, as shown in Expressions 2 and 3.
  • The distances D i,j between frame i and frame j obtained by this calculation can be arranged in a matrix as shown in FIG. 7.
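  • Expressions 1 to 3 are not reproduced in this text. Purely as an assumption consistent with the description (per-feature-point distances weighted by α and β, normalized by average distances, and mixed at a 7:3 face-to-posture ratio), the interframe distance could take a form such as:

```latex
D_{i,j} = \alpha \sum_{k=1}^{F} \bigl\lVert f_{i,k} - f_{j,k} \bigr\rVert
        + \beta  \sum_{k=1}^{S} \bigl\lVert s_{i,k} - s_{j,k} \bigr\rVert,
\qquad
\alpha = \frac{0.7}{\bar{d}_{\mathrm{face}}}, \qquad
\beta  = \frac{0.3}{\bar{d}_{\mathrm{pose}}},
```

  • where the assumed quantities \bar{d}_face and \bar{d}_pose denote the average face and posture distances over the frame pairs used for normalization.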
  • In this embodiment, interpolation by cross-dissolve processing is performed at the time of a transition between non-adjacent frames.
  • As the number of interpolated frames d, the value d = 3, corresponding to 0.25 seconds, is used.
  • The cost of a transition (edge) can be approximated by the feature point distance; that is, the cost of the transition from frame i to frame j can be defined using Equation 4 below.
  • The transition costs C i,j between frame i and frame j obtained by this calculation can be arranged in a matrix as shown in FIG. 8.
  • Note that the transition cost from the i-th frame to the (i + 1)-th frame is zero. Furthermore, in order to take into account not only the static difference between frames but also the difference in motion expressed over a plurality of frames, it is preferable to take the sum of distances over a plurality of adjacent frames.
  • In this case, the cost of a transition from one frame to another is specified using a plurality of frames (a frame set).
  • When frame sets are used, it is preferable to take into account the order of the frames within each set.
  • If the similarity determined by a frame set is obtained simply as the sum of the similarities of the individual frames, the resulting moving image may become unnatural.
  • For example, if only the sum of the per-frame similarities is used, a frame set in which the mouth is closing may be selected as the transition destination of a frame set in which the mouth is opening.
  • A video played back in that order looks very unnatural. Therefore, in calculating the transition cost, the order of the frames within the frame set may be taken into account as a parameter.
  • When the transition cost from frame i to frame j is calculated using frame sets, it can be obtained as a linear combination of the similarities between corresponding frames of the same order in a first set of consecutive frames including frame i and a second set of consecutive frames including frame j.
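  • Equation 4 itself is not reproduced in this text. A plausible frame-set cost in the spirit of Video Textures (Non-Patent Document 4), given here only as an assumption, compares the m frames that would naturally follow frame i with the m frames actually displayed starting at frame j:

```latex
C_{i \to j} = \sum_{k=0}^{m-1} w_k \, D_{\,i+1+k,\; j+k}
```

  • Here the weights w_k (an assumed choice) emphasize the frames closest to the transition, and D is the interframe distance defined above. With this form, C from frame i to frame i + 1 is zero, consistent with the statement above, since each term then compares a frame with itself.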
  • Transitions between standby states: As methods of choosing, from the transition cost matrix described above, the transitions that are actually used, Non-Patent Document 1 discloses a method of selecting the minimum-cost transition for each frame and a method of selecting all transitions whose cost is at or below a certain threshold. In the embodiment of the present disclosure the former is used, but the latter may be used to select only the minimum necessary number of more accurate transitions.
  • The moving image clip according to the embodiment of the present disclosure contains standby-state sections and utterance-state sections alternately. For this reason, if the Video Textures technique were applied as is, there is a possibility that only transitions within the same section would occur.
  • Therefore, transitions within the same section (local transitions) and transitions to other sections (global transitions) are distinguished, and a fixed number of minimum-cost transitions is selected for each. Specifically, the following calculation is executed for each frame in the standby state (a sketch of this selection is given below): (1) select the N L lowest-cost transitions among transitions to the same section; (2) select the N G lowest-cost transitions among transitions to other sections, choosing them so that the transition destination sections do not overlap.
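  • A minimal sketch of this selection, assuming the cost matrix C is indexed by frame and that section_of[f] gives the section to which frame f belongs (both names are illustrative assumptions), might be:

```python
def select_transitions(C, frame: int, section_of, n_local: int, n_global: int):
    """Pick the N_L lowest-cost local and N_G lowest-cost global transitions for one frame."""
    num_frames = len(C[frame])
    same, other = [], []
    for j in range(num_frames):
        if j == frame + 1:
            continue  # the natural next frame needs no stored transition
        (same if section_of[j] == section_of[frame] else other).append((C[frame][j], j))

    # Local transitions: the n_local cheapest destinations within the same section.
    local = [j for _, j in sorted(same)[:n_local]]

    # Global transitions: cheapest first, at most one per destination section.
    global_, used_sections = [], set()
    for cost, j in sorted(other):
        if section_of[j] in used_sections:
            continue
        global_.append(j)
        used_sections.add(section_of[j])
        if len(global_) == n_global:
            break
    return local, global_
```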
  • Transition to the utterance state: The transition from the standby state to the utterance state occurs according to an instruction from the dialogue engine. It is not always necessary to make the transition immediately after the instruction; since it is natural for the subject image to take about one second to "react", a time delay may be provided between the instruction and the actual transition. In that case, the number of candidate transition timings increases, and a more natural transition can be selected. In the present embodiment, the maximum number of frames that may elapse from the time the instruction is given until the first frame of the utterance state is reached is k. When a transition to the utterance state starting from the j-th frame is instructed during display of the i-th frame, the transition occurs as described below.
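  • The precise rule is not reproduced in this text. One plausible reading, stated only as an assumption, is that standby playback continues for at most k further frames and the jump to frame j is made from whichever of those frames gives the lowest transition cost:

```python
def choose_transition_point(C, i: int, j: int, k: int) -> int:
    """Return the standby frame from which to jump to utterance frame j.

    Standby playback may continue from frame i for at most k frames after the
    instruction; the frame with the cheapest transition to j is chosen.
    """
    candidates = range(i, i + k)
    return min(candidates, key=lambda i2: C[i2][j])
```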
  • In this way, the data structure used by the moving image generation system of the present embodiment can be prepared in advance.
  • Such a data structure includes a moving image clip, which is a sequence of a plurality of frames including a subject image, and similarity information that defines the similarity between each frame included in a frame group selected from the plurality of frames and each of the other frames.
  • At run time, a dialogue agent is drawn using the recorded data and the results of the pre-calculation.
  • A synthesized moving image in the standby state continues to be generated unless there is an instruction from the dialogue engine.
  • In the standby mode, the moving image generation system accepts an instruction to utter any of the registered lines. The best transition to the instructed line is then selected, the transition is made, and the moving image is reproduced. When playback of the utterance period finishes, the system returns to the standby mode. When transitioning to either the standby state or the utterance state, a smoother transition is possible by performing frame interpolation.
  • Frame interpolation at transitions: Frame interpolation is performed when transitioning between non-adjacent frames.
  • Methods using cross dissolve, morphing, or optical flow, or methods developed further from these, can be considered.
  • Cross dissolve is the simplest of these, performing linear interpolation for each pixel. In the present embodiment, cross dissolve is used for frame interpolation.
  • The transition between frames is then performed as shown in FIG. 9.
  • Suppose frame i is displayed at time t and a transition to frame j (j ≠ i + 1) is to be made at time t + 1.
  • At the k-th interpolation step, frame i + k and frame j + k - 1 are blended and displayed at a ratio of (d + 1 - k) : k.
  • Here, d is the value defined in "3.2. Transition Cost" above, and k satisfies 1 ≤ k ≤ d.
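  • A minimal sketch of this cross dissolve, assuming frames are held as NumPy arrays and using the blend ratio (d + 1 - k) : k given above, might be:

```python
import numpy as np

def cross_dissolve(frames, i: int, j: int, d: int):
    """Yield d blended frames for a transition from frame i to frame j.

    At step k (1 <= k <= d), frame i + k and frame j + k - 1 are mixed at a
    ratio of (d + 1 - k) : k, so the weight shifts linearly from the source
    sequence to the destination sequence.
    """
    for k in range(1, d + 1):
        w = (d + 1 - k) / (d + 1)  # weight of the source-side frame
        a = frames[i + k].astype(np.float64)
        b = frames[j + k - 1].astype(np.float64)
        yield (w * a + (1.0 - w) * b).astype(np.uint8)
```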
  • In the standby state, time is divided into two types of sections: sections in which a transition by cross dissolve is performed, and sections in which no transition is performed.
  • The "sections in which no transition is performed" include, for example, the clip playback section before a transition and the clip playback section after a transition. Suppose that the clip playback section before a transition is at the i-th frame at time t, where t is counted in frames at the same frame rate as the recorded data. The probability of transitioning to the j-th frame at time t + 1 is then determined by a probability distribution based on the transition cost and a parameter σ 0. The transition destination is selected from among the N L + N G candidates chosen in the pre-calculation. The smaller σ 0 is, the more the probability is biased toward lower-cost transitions.
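  • The probability expression itself is not reproduced here. A common choice in this setting, for example in Video Textures, is an exponential of the negative transition cost; a sketch under that assumption is:

```python
import math
import random

def choose_next_frame(C, i: int, candidates, sigma0: float) -> int:
    """Probabilistically pick the next frame from the N_L + N_G candidate list.

    Lower-cost transitions are more likely; the smaller sigma0 is, the more the
    choice is biased toward the lowest-cost transition.
    """
    weights = [math.exp(-C[i][j] / sigma0) for j in candidates]
    return random.choices(list(candidates), weights=weights, k=1)[0]
```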
  • When the moving image generation system according to the present embodiment receives an utterance instruction from the dialogue engine, it switches from the standby mode to the utterance mode.
  • Since the same line is contained in a plurality of moving image portions of the moving image clip, a parameter σ 1 for adjusting the probability distribution is introduced, and the probability is calculated in the same way using Equation 6. The transition cost described above is used in this calculation.
  • Because the standby state is recorded after the utterance state, the system naturally transitions back to the standby state when the utterance ends.
  • An HTTP (Hypertext Transfer Protocol) server can be implemented in the moving image generation system in one embodiment.
  • In that case, the utterance instruction may be given as an HTTP request.
  • FIG. 10 schematically shows a case in which, when a transition is to be made from the (N + X)-th frame (X is an integer of 2 or more) to the next frame, a frame that has already been used for display twice within the past 10 seconds exists among the candidate frames. In this case, that frame (marked with an X in FIG. 10) is excluded from the candidate frames. However, such exclusion is not performed when no other transition candidate remains. To implement this, a history of the most recent K L uses of each frame is recorded.
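  • A minimal sketch of this exclusion rule, assuming the recent display history is kept in a fixed-length queue and that a frame used at least max_uses times within that window is skipped unless it is the only remaining candidate, might be:

```python
from collections import deque

class RecentFrameFilter:
    """Excludes candidate frames that were displayed too often recently."""

    def __init__(self, window: int, max_uses: int):
        self.history = deque(maxlen=window)  # last `window` displayed frames
        self.max_uses = max_uses             # roughly the K_L of the text (assumed)

    def record(self, frame: int) -> None:
        self.history.append(frame)

    def filter(self, candidates):
        kept = [f for f in candidates if self.history.count(f) < self.max_uses]
        return kept if kept else list(candidates)  # never leave zero candidates
```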
  • <Example> Shooting was performed using a Kinect (registered trademark) camera with a human (actor) as the subject.
  • RGB data of 1280 × 960 pixels was acquired at 12 FPS,
  • and depth data of 640 × 480 pixels was acquired at 30 FPS in the near mode.
  • The central 640 × 480 pixels of the 1280 × 960 image were cut out and used. This is because posture recognition and face recognition do not function if the distance between the camera and the actor is too small.
  • Variation in the posture of the subject image can be reduced further by displaying the depth direction of the posture data on the screen of the display and showing it to the actor, or by using an actually recorded image for posture control.
  • A frame may also be formed by recording at a larger size and cutting out the central portion; the range of the central portion to be cut out may be adjusted based on the recognized posture.
  • The program in the system of this embodiment is written mainly in C#.
  • In the pre-calculation, the cost matrix is not held in the memory of the computer but is stored on a hard disk drive (HDD) operating as a separate recording device. For this reason, the processing involves many disk accesses.
  • The computer that performed the pre-calculation has a central processing unit (CPU) operating at 3.2 GHz and 8 GB of memory, and is connected to a 7200 rpm, 500 GB HDD.
  • The playback system according to the present embodiment is realized by a tablet (with a CPU operating at 1.33 GHz, 2 GB of memory, and a 32 GB storage medium) running Windows 8.1 (registered trademark).
  • The moving image generation system of the present disclosure may further include a line-of-sight tracking device that tracks the line of sight of the subject and outputs line-of-sight information related to the line of sight.
  • The computer calculates an additional cost that decreases as the difference in the subject's line of sight specified based on the line-of-sight information decreases, and increases as that difference increases.
  • The above-mentioned transition cost is then corrected using this additional cost. According to this modification, it is possible to prevent the generation of a composite moving image in which the line-of-sight direction of the subject image changes discontinuously.
  • In the above description, the positions of feature points are used to calculate the similarity of the subject image, but the similarity is not limited to this example.
  • The subject image can also be expressed as a linear combination of a plurality of base images prepared in advance, each multiplied by a coefficient.
  • Such a base image may be referred to as an "eigenface".
  • In this case, the feature amount of the subject image can be defined by the combination of these coefficients.
  • When the "face" appearing in the i-th frame is expressed by a linear superposition of a plurality of base face images, the "similarity" between frames may be defined by the distance or difference between the set of weighting coefficients that defines the face in the i-th frame and the set of weighting coefficients that defines the face in the j-th frame.
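  • A minimal sketch of this coefficient-based similarity, assuming each frame's face has already been projected onto a fixed set of orthonormal base images ("eigenfaces") to give a coefficient vector, might be:

```python
import numpy as np

def project_onto_bases(face: np.ndarray, bases: np.ndarray, mean_face: np.ndarray) -> np.ndarray:
    """Coefficients of a face image in a set of orthonormal base images.

    face and mean_face have shape (H, W); bases has shape (num_bases, H, W).
    """
    centered = (face - mean_face).reshape(-1)
    return bases.reshape(bases.shape[0], -1) @ centered

def coefficient_distance(coeffs_i: np.ndarray, coeffs_j: np.ndarray) -> float:
    """Distance between two frames' coefficient sets; smaller means more similar."""
    return float(np.linalg.norm(coeffs_i - coeffs_j))
```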
  • FIG. 11 is a diagram illustrating a modification of the moving image display system.
  • The moving image generation system 100 can distribute part or all of the generated combined moving image to a plurality of mobile terminal devices 500 via a communication network 300.
  • The moving image generation system 100 includes a system-side communication circuit 120 that transmits the data of the composite moving image generated by the computer 20.
  • Each mobile terminal device 500 includes a device-side communication circuit 520 that receives the composite moving image data transmitted from the moving image generation system 100, and a display 550 that displays the composite moving image.
  • The system-side communication circuit 120 can receive an instruction signal transmitted from outside via the communication network 300, for example.
  • A part of the communication network 300 may be an Internet connection.
  • The instruction signal is sent, for example, by a dialogue engine (not shown) in response to a user's voice input.
  • The dialogue engine may be provided in the moving image generation system 100.
  • Until the system-side communication circuit 120 receives the instruction signal, the computer 20 generates data of the synthesized moving image; after the instruction signal has been received, the computer 20 selects a partial clip, which is a part of the moving image clip, in accordance with the instruction signal.
  • The system-side communication circuit 120 transmits the composite moving image and the partial clip data via the communication network 300.
  • On the display 550, the synthesized moving image and the partial clip received by the device-side communication circuit 520 are displayed.
  • FIG. 12 is a diagram schematically illustrating an exemplary embodiment of a robot system according to the present disclosure.
  • The robot system 400 shown in this figure includes the above-described moving image generation system 100 and a robot 90 that has a plurality of actuators (electric motors) 92 for changing at least one of its facial expression and posture.
  • The robot 90 includes a drive circuit 94 that changes at least one of the expression and posture of the robot 90 by driving the plurality of actuators 92 so as to follow changes in at least one of the expression and posture of the subject image in the composite moving image.
  • The drive circuit 94 supplies power to the appropriate actuators 92 so that, for example, the mouth of the robot 90 opens when the subject image in the composite moving image opens its mouth.
  • The robot 90 is equipped with a speaker (not shown) used for speech, and the line selected by the dialogue engine 70 is output as voice.
  • In this system, the robot 90, which has a three-dimensional structure, interacts with the user instead of an image displayed on a display.
  • The robot 90 does not have to be of human type; it may, for example, be of animal type.
  • FIG. 13 is a flowchart illustrating an example of a processing procedure performed in advance.
  • In step S10, moving image clip data including standby sections and utterance sections is stored in a recording device.
  • This recording device may typically be a different recording device from the one provided in the moving image generation system.
  • In step S12, a frame group excluding unnecessary frames is selected from the many frames constituting the moving image clip. This selection can typically be made by a human picking out and specifying the unnecessary frames.
  • In step S14, the computer used for the pre-calculation calculates the cost matrix described above. The calculation result is stored in the recording device in the form of a table, such as that described with reference to FIG. 2.
  • FIG. 14 is a flowchart illustrating an example of a processing procedure for moving image generation and moving image display.
  • After the processing is started, it is determined in step S20 whether or not the input device has received an input from the user. If there is no input, the process proceeds to step S30 and the standby mode operation is executed. In the standby mode, a plurality of candidate frames, one of which will become the frame following the currently displayed frame, is selected with reference to the cost matrix prepared in advance and stored in the recording device. In step S32, one frame is determined probabilistically from the candidate frames according to the principle described above. In step S34, the determined next frame is displayed on the display. Thereafter, the process returns to step S20. In this way, stochastic inter-frame transitions are realized.
  • If there is an input, an utterance section corresponding to the input is selected in step S40.
  • The first frame of one of the utterance sections is selected probabilistically by the same operation as in the standby mode.
  • In step S42, the selected first frame is displayed on the display.
  • The moving image of the utterance section including this first frame is then reproduced. In this moving image, the frames transition according to the frame sequence of the moving image clip.
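  • Putting the pieces together, a highly simplified sketch of this loop (standby transitions chosen probabilistically, utterance sections played back in recorded order) could look like the following; all helper names are assumptions, not functions defined by the patent.

```python
import random

def run_display_loop(frames, utterance_sections, candidates,
                     choose_next_frame, get_user_input, show):
    """Simplified standby/utterance loop in the spirit of FIG. 14.

    frames: recorded frames; utterance_sections: {line: [(start, end), ...]};
    candidates[i]: precomputed candidate next frames for frame i;
    choose_next_frame(i, cands): a probabilistic selection like the one sketched above;
    get_user_input(): recognized line or None; show(frame): display one frame.
    """
    current = 0
    while True:
        line = get_user_input()                      # step S20
        if line is None:                             # standby mode (S30 to S34)
            current = choose_next_frame(current, candidates[current])
            show(frames[current])
        else:                                        # utterance mode (S40, S42)
            start, end = random.choice(utterance_sections[line])
            for f in range(start, end + 1):          # play back in recorded order
                show(frames[f])
            current = end                            # playback continues into standby
```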
  • FIG. 15 is a diagram illustrating an internal configuration example of the computer 20.
  • The computer 20 includes a CPU 22, a memory 24, a GPU 26, and an interface (I/F) terminal 28, which are connected to one another via a bus so that they can communicate with each other.
  • FIG. 15 also shows the recording device 10 connected to the computer 20.
  • The recording device 10 is, for example, a solid state drive (SSD) or an HDD.
  • The CPU 22 is an arithmetic circuit integrated on one chip.
  • The memory 24 is a ROM and/or RAM that stores a computer program composed of instructions defining the processing performed by the CPU 22.
  • The computer program is a group of instructions for realizing the processing described above (for example, that of FIG. 14).
  • The computer program stored in the ROM is loaded into the RAM.
  • The CPU 22 sequentially reads the instructions of the computer program from the RAM, interprets them, and executes them.
  • The GPU 26 is an image processing circuit, a so-called graphics processing unit.
  • The GPU 26 performs the image processing required when generating the above-described moving image and when displaying it.
  • The GPU 26 may also perform the processing for displaying the resulting image on the display 50 (FIG. 3).
  • The I/F terminal 28 is a connection terminal through which the computer 20 exchanges information with devices provided outside the computer 20. This will be described with reference to the example shown in FIG. 15. Assume that the recording device 10, the display 50, and the dialogue engine 70 are provided outside the computer 20. For the connection between the computer 20 and the recording device 10, the I/F terminal 28 may be, for example, a Serial ATA connection terminal. For the connection between the computer 20 and the display 50, the I/F terminal 28 can be, for example, a DisplayPort (registered trademark), HDMI (registered trademark), or DVI video connection terminal.
  • Alternatively, the I/F terminal 28 may be a communication terminal capable of wired communication according to, for example, the Ethernet (registered trademark) standard or the IEEE 1394 standard, or a communication circuit capable of wireless communication according to the Wi-Fi (registered trademark) standard or the like.
  • The moving image generation system and the moving image display system of the present disclosure can be used to generate and display a moving image of an anthropomorphic dialogue agent in, for example, computer-operated customer response systems and reservation reception systems.
  • The moving image generation system of the present disclosure can also be used in a system in which a robot operates so as to follow changes in at least one of the expression and posture of the subject image in the combined moving image.
  • DESCRIPTION OF SYMBOLS: 10 ... recording device, 20 ... computer, 50 ... display, 60 ... interface device, 70 ... dialogue engine, 80 ... photographing apparatus, 90 ... robot, 92 ... actuator, 94 ... drive circuit, 100 ... moving image generation system, 120 ... system-side communication circuit, 200 ... moving image display system, 300 ... communication network, 400 ... robot system, 500 ... mobile terminal device, 520 ... device-side communication circuit, 550 ... display

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)
  • Television Signal Processing For Recording (AREA)
  • Image Analysis (AREA)

Abstract

The invention aims to generate synthesized moving images capable of displaying realistic changes in a subject's expression or posture. A moving image generation system according to the invention comprises: a recording device (10) for recording moving image clips, which are sequences of a plurality of frames containing an image of a subject; and a computer (20) that generates synthesized moving image data by reconstructing the plurality of frames on the basis of degrees of similarity between each frame included in a frame group selected from the plurality of frames and each of the other frames of the frame group. The degree of similarity is defined on the basis of feature amounts of the subject image in each frame. As the (N+1)th frame, which is the frame following an Nth frame (N being an integer) in the synthesized moving images, the computer (20) designates one frame from a plurality of candidate frames selected from the frame group on the basis of the degree of similarity to the Nth frame.
PCT/JP2016/084224 2015-12-04 2016-11-18 Système de génération d'images animées, et système d'affichage d'images animées WO2017094527A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2017553773A JP7009997B2 (ja) 2015-12-04 2016-11-18 動画生成システムおよび動画表示システム

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2015237517 2015-12-04
JP2015-237517 2015-12-04

Publications (1)

Publication Number Publication Date
WO2017094527A1 true WO2017094527A1 (fr) 2017-06-08

Family

ID=58797258

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/084224 WO2017094527A1 (fr) 2015-12-04 2016-11-18 Système de génération d'images animées, et système d'affichage d'images animées

Country Status (2)

Country Link
JP (1) JP7009997B2 (fr)
WO (1) WO2017094527A1 (fr)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113066155A (zh) * 2021-03-23 2021-07-02 华强方特(深圳)动漫有限公司 一种3d表情处理方法及装置
KR20220057754A (ko) * 2020-10-30 2022-05-09 주식회사 딥브레인에이아이 발화 영상 제공 방법 및 이를 수행하기 위한 컴퓨팅 장치
WO2022265148A1 (fr) * 2021-06-16 2022-12-22 주식회사 딥브레인에이아이 Procédé pour fournir une vidéo de parole et dispositif informatique pour exécuter le procédé
WO2022270669A1 (fr) * 2021-06-25 2022-12-29 주식회사 딥브레인에이아이 Procédé de fourniture d'une image d'énoncé et dispositif informatique pour la mise en œuvre de ce procédé
US20230005202A1 (en) * 2021-06-30 2023-01-05 Deepbrain Ai Inc. Speech image providing method and computing device for performing the same
KR20230049473A (ko) * 2021-10-06 2023-04-13 주식회사 마인즈랩 영상 촬영 가이드 제공 방법, 장치 및 컴퓨터 프로그램
WO2024038976A1 (fr) * 2022-08-16 2024-02-22 주식회사 딥브레인에이아이 Appareil et procédé permettant de fournir une vidéo vocale
WO2024038975A1 (fr) * 2022-08-16 2024-02-22 주식회사 딥브레인에이아이 Appareil et procédé de fourniture de vidéo de discours
KR102679446B1 (ko) 2022-08-16 2024-06-28 주식회사 딥브레인에이아이 발화 비디오 제공 장치 및 방법

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001076165A (ja) * 1999-09-06 2001-03-23 Fujitsu Ltd アニメーション編集システムおよびアニメーション編集プログラムを記録した記憶媒体
JP2012199613A (ja) * 2011-03-18 2012-10-18 Sony Corp 画像処理装置および方法、並びにプログラム
WO2013146508A1 (fr) * 2012-03-30 2013-10-03 ソニー株式会社 Dispositif et procédé de traitement d'image, et programme

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102506604B1 (ko) * 2020-10-30 2023-03-06 주식회사 딥브레인에이아이 발화 영상 제공 방법 및 이를 수행하기 위한 컴퓨팅 장치
KR20220057754A (ko) * 2020-10-30 2022-05-09 주식회사 딥브레인에이아이 발화 영상 제공 방법 및 이를 수행하기 위한 컴퓨팅 장치
KR20220109373A (ko) * 2020-10-30 2022-08-04 주식회사 딥브레인에이아이 발화 영상 제공 방법
US11967336B2 (en) 2020-10-30 2024-04-23 Deepbrain Ai Inc. Method for providing speech video and computing device for executing the method
KR102639526B1 (ko) * 2020-10-30 2024-02-22 주식회사 딥브레인에이아이 발화 영상 제공 방법
CN113066155A (zh) * 2021-03-23 2021-07-02 华强方特(深圳)动漫有限公司 一种3d表情处理方法及装置
WO2022265148A1 (fr) * 2021-06-16 2022-12-22 주식회사 딥브레인에이아이 Procédé pour fournir une vidéo de parole et dispositif informatique pour exécuter le procédé
KR20220168328A (ko) * 2021-06-16 2022-12-23 주식회사 딥브레인에이아이 발화 영상 제공 방법 및 이를 수행하기 위한 컴퓨팅 장치
KR102510892B1 (ko) * 2021-06-16 2023-03-27 주식회사 딥브레인에이아이 발화 영상 제공 방법 및 이를 수행하기 위한 컴퓨팅 장치
WO2022270669A1 (fr) * 2021-06-25 2022-12-29 주식회사 딥브레인에이아이 Procédé de fourniture d'une image d'énoncé et dispositif informatique pour la mise en œuvre de ce procédé
KR102509106B1 (ko) * 2021-06-25 2023-03-10 주식회사 딥브레인에이아이 발화 영상 제공 방법 및 이를 수행하기 위한 컴퓨팅 장치
KR20230000702A (ko) * 2021-06-25 2023-01-03 주식회사 딥브레인에이아이 발화 영상 제공 방법 및 이를 수행하기 위한 컴퓨팅 장치
US11830120B2 (en) 2021-06-30 2023-11-28 Deepbrain Ai Inc. Speech image providing method and computing device for performing the same
WO2023277231A1 (fr) * 2021-06-30 2023-01-05 주식회사 딥브레인에이아이 Procédé permettant de fournir une vidéo de parole et dispositif informatique pour exécuter celui-ci
US20230005202A1 (en) * 2021-06-30 2023-01-05 Deepbrain Ai Inc. Speech image providing method and computing device for performing the same
KR102546532B1 (ko) * 2021-06-30 2023-06-22 주식회사 딥브레인에이아이 발화 영상 제공 방법 및 이를 수행하기 위한 컴퓨팅 장치
KR20230003894A (ko) * 2021-06-30 2023-01-06 주식회사 딥브레인에이아이 발화 영상 제공 방법 및 이를 수행하기 위한 컴퓨팅 장치
KR102604672B1 (ko) * 2021-10-06 2023-11-23 주식회사 마음에이아이 영상 촬영 가이드 제공 방법, 장치 및 컴퓨터 프로그램
WO2023058813A1 (fr) * 2021-10-06 2023-04-13 주식회사 마인즈랩 Procédé, dispositif et programme d'ordinateur pour fournir un guide de tournage vidéo
KR20230049473A (ko) * 2021-10-06 2023-04-13 주식회사 마인즈랩 영상 촬영 가이드 제공 방법, 장치 및 컴퓨터 프로그램
WO2024038976A1 (fr) * 2022-08-16 2024-02-22 주식회사 딥브레인에이아이 Appareil et procédé permettant de fournir une vidéo vocale
WO2024038975A1 (fr) * 2022-08-16 2024-02-22 주식회사 딥브레인에이아이 Appareil et procédé de fourniture de vidéo de discours
KR102679446B1 (ko) 2022-08-16 2024-06-28 주식회사 딥브레인에이아이 발화 비디오 제공 장치 및 방법

Also Published As

Publication number Publication date
JP7009997B2 (ja) 2022-01-26
JPWO2017094527A1 (ja) 2018-09-27

Similar Documents

Publication Publication Date Title
WO2017094527A1 (fr) Système de génération d'images animées, et système d'affichage d'images animées
JP7248859B2 (ja) 仮想、拡張、または複合現実環境内で3dビデオを生成および表示するための方法およびシステム
JP7228682B2 (ja) 動画解析のためのゲーティングモデル
US10217261B2 (en) Deep learning-based facial animation for head-mounted display
US8958686B2 (en) Information processing device, synchronization method, and program
CN111540055B (zh) 三维模型驱动方法、装置、电子设备及存储介质
KR20240050463A (ko) 얼굴 재연을 위한 시스템 및 방법
US9852767B2 (en) Method for generating a cyclic video sequence
US20130321586A1 (en) Cloud based free viewpoint video streaming
CN111464834B (zh) 一种视频帧处理方法、装置、计算设备及存储介质
CN113228625A (zh) 支持复合视频流的视频会议
CN117873313A (zh) 具有彩色虚拟内容扭曲的混合现实系统及使用该系统生成虚拟内容的方法
KR20210124312A (ko) 인터랙티브 대상의 구동 방법, 장치, 디바이스 및 기록 매체
US20220156998A1 (en) Multiple device sensor input based avatar
JP7127659B2 (ja) 情報処理装置、仮想・現実合成システム、学習済みモデルの生成方法、情報処理装置に実行させる方法、プログラム
JP2015184689A (ja) 動画生成装置及びプログラム
JP2018113616A (ja) 情報処理装置、情報処理方法、およびプログラム
JP7502354B2 (ja) 3次元(3d)環境用の統合された入出力(i/o)
CN113395569B (zh) 视频生成方法及装置
WO2023231712A1 (fr) Procédé de conduite humaine numérique, dispositif de conduite humaine numérique et support de stockage
JP2022522579A (ja) ウェアラブルヘッドマウントディスプレイのための装置、システム、および方法
US20150002516A1 (en) Choreography of animated crowds
CN116524087A (zh) 融合神经辐射场的音频驱动的说话人视频合成方法及系统
US11961201B2 (en) Viewing 3D photo galleries in VR
KR102284913B1 (ko) 360도 영상에서의 사용자 시야각 선택 방법 및 그 장치

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16870465

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2017553773

Country of ref document: JP

Kind code of ref document: A

122 Ep: pct application non-entry in european phase

Ref document number: 16870465

Country of ref document: EP

Kind code of ref document: A1