CN117933318A - Method for constructing teaching digital person - Google Patents
- Publication number: CN117933318A
- Application number: CN202410214465.0A
- Authority: CN (China)
- Prior art keywords: human, limb, digital, video, teaching
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Landscapes
- Processing Or Creating Images (AREA)
Abstract
The invention relates to the field of artificial intelligence and provides a method for constructing a teaching digital human. The method uses the deep-learning algorithm so-vits-svc to clone the voice timbre of a teaching teacher and generate an audio stream in that timbre; constructs a Wav2Talker digital-human model based on SadTalker, in which a deep-learning model generates video of the teacher's natural limb movements, gestures, and expression dynamics from the audio stream and a character picture; applies video-retalking technology to add emotional change to the facial expression; extends GFPGAN's face, eye, and nose super-resolution algorithm to sharpen all facial features of the whole figure; and adopts the FaceChain deep-learning model tools to construct a digital-human image resembling a real portrait. The method reduces cost and requires neither three-dimensional modeling nor motion-capture technology to form a virtual person. The various generation models, covering timbre, gesture, expression, and limb action, are trained from the teacher's recorded and live broadcasts, yielding a digital person with multiple visual personas.
Description
Technical Field
The invention relates to the field of artificial intelligence, in particular to a method for constructing teaching digital persons.
Background
Digital human technology is an innovative technology based on artificial intelligence and natural language processing that aims to create more intelligent, personalized, and humanized human-computer interaction. It can be applied to virtual assistants, customer-service robots, education and training, and similar fields, and achieves more natural and fluent communication with users by simulating human language and reasoning.
Behind digital human technology lie many techniques, such as deep learning, natural language processing, and dialogue-system design, together with large corpora and algorithmic optimization. Through continuous training and optimization, a digital human can gradually raise its level of intelligence so that interaction with users meets personalized requirements.
In general, the development of digital human technology represents the latest progress of artificial intelligence and natural language processing in human-computer interaction and offers new possibilities for improving user experience and intelligent service capability.
The prior art realizes live broadcasting of virtual characters with computer technology and artificial intelligence. The key technical points of existing virtual-person live broadcasting are as follows:
(1) Three-dimensional modeling: live broadcasting requires three-dimensional models of the virtual character, including head, body, and limbs. The appearance contours, facial expressions, gestures, and other features of the real person are converted into a three-dimensional model by graphics processing and computer-vision algorithms.
(2) Motion capture: the motion of a real person must be captured and applied to the virtual character so that it can mimic the person's movements in real time. Motion data is typically captured with sensor devices or cameras and mapped onto the virtual character's skeletal system by an algorithm.
(3) Speech synthesis: the virtual person needs natural, fluent speech. Speech-synthesis technology converts text into realistic speech so that the virtual character can broadcast in a natural voice.
(4) Semantic understanding: natural language processing and artificial-intelligence algorithms are needed so that the virtual character can understand and automatically respond to viewers. These techniques analyze audience questions or comments and generate meaningful responses; semantic analysis, emotion recognition, and dialogue generation all play important roles here.
(5) Real-time rendering: the virtual character must be rendered into the live scene in real time to maintain a smooth broadcasting experience. Real-time rendering exploits the parallel computing power of the graphics processing unit (GPU) to turn the three-dimensional model into realistic images presented to viewers in real time.
(6) Interactive communication: live broadcasting requires real-time interaction with the audience. This is achieved through natural language processing, emotion recognition, and similar techniques, so that the avatar can understand audience questions and respond accordingly.
Combining these key techniques makes real-time live broadcasting of a virtual character possible and brings the audience a brand-new entertainment and communication experience.
Disadvantages of the prior art
(1) Professionals, professional sensors, and professional software are required to capture the appearance contours, facial expressions, and pose gestures of real persons and convert them into three-dimensional models, which is very complex and costly.
(2) Modeling with professional software generally produces an animated virtual character that differs considerably from a real person and feels unreal and unnatural.
(3) The voice lacks the real person's timbre and is not dynamically synchronized with the virtual person's lips.
(4) The expression lacks emotion; it is stiff or lacks emotional expressiveness.
(5) The posture is awkward, and it is unsynchronized and uncoordinated with the virtual person's voice.
(6) Facial expressions are missing or mechanical.
(7) The visual persona is single and fixed.
Disclosure of Invention
Aiming at the above defects in the prior art, and in particular at the lack of limb action, the invention provides a method for creating a more complete teaching digital human by combining timbre, expression, limbs, and gestures.
The technical scheme adopted by the invention is as follows:
A method of constructing a teaching digital person, comprising:
(1) Using the deep-learning algorithm so-vits-svc to clone the voice timbre of a designated teaching teacher and generate an audio stream in that timbre;
(2) Constructing the Wav2Talker digital-human model. Wav2Talker is based on the SadTalker model, but SadTalker fuses only three features, namely speech audio, head pose, and facial expression, to generate a talking-head video; it is limited to head dynamics, lacks limb motion below the head, and cannot express the body language of a complete digital human. The Wav2Talker model therefore adds human limb-motion generation, covering body-skeleton key points and hand key points, on top of SadTalker, forming a complete digital-human image;
(3) Applying video-retalking technology so that the facial expression carries emotional changes such as happiness, neutrality, and sadness, giving the digital person the same emotional expressiveness as a real person;
(4) Extending GFPGAN's face, eye, and nose super-resolution algorithm to sharpen the entire face. The algorithm converts a low-resolution input image into a high-resolution image while preserving the image's detail and definition; with it, all facial features of the digital person can be refined so that they appear more realistic and clear;
(5) Adopting the FaceChain deep-learning model tools to construct a digital-human image resembling a real portrait.
By combining the above technologies, the audio stream in the teacher's timbre is synchronized with lip dynamics, facial expression, and limb action, so the digital person performs more truly and naturally than a traditional virtual person.
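The five steps above chain into a single pipeline. The following Python sketch shows that orchestration only; every stage function is a hypothetical stand-in (the text names the tools, so-vits-svc, Wav2Talker, video-retalking, GFPGAN, and FaceChain, but not their APIs), so each stage merely records its role and tags the artifact it passes along.

```python
from dataclasses import dataclass, field

@dataclass
class TeachingDigitalHuman:
    """Hypothetical five-stage pipeline; stage bodies are placeholders."""
    stages: list = field(default_factory=list)

    def clone_voice(self, script: str) -> str:
        # Stage 1: so-vits-svc clones the teacher's timbre -> audio stream.
        self.stages.append("so-vits-svc")
        return f"audio<{script}>"

    def generate_motion(self, audio: str, portrait: str) -> str:
        # Stage 2: Wav2Talker generates head + limb motion from audio + picture.
        self.stages.append("wav2talker")
        return f"video<{audio}+{portrait}>"

    def add_emotion(self, video: str, emotion: str) -> str:
        # Stage 3: video-retalking edits the facial expression (happy/neutral/sad).
        self.stages.append("video-retalking")
        return f"{video}:emotion={emotion}"

    def enhance_face(self, video: str) -> str:
        # Stage 4: GFPGAN super-resolves eyes, nose, and the rest of the face.
        self.stages.append("gfpgan")
        return f"{video}:hd"

    def stylize(self, video: str, persona: str) -> str:
        # Stage 5: FaceChain renders the chosen visual persona.
        self.stages.append("facechain")
        return f"{video}:persona={persona}"

def build(script: str, portrait: str, emotion="neutral", persona="teacher"):
    """Run all five stages in order; returns the final artifact and stage log."""
    p = TeachingDigitalHuman()
    audio = p.clone_voice(script)
    video = p.generate_motion(audio, portrait)
    video = p.add_emotion(video, emotion)
    video = p.enhance_face(video)
    return p.stylize(video, persona), p.stages
```

The fixed stage order mirrors the numbered steps: audio is produced first because every downstream module (motion, emotion, sharpening, persona) is conditioned on it.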
The process of constructing the Wav2Talker digital-human model is as follows:
Input: the key-frame sequence of the input video is V = {V0, ..., Vn}, where n is the number of key frames, and the input audio corresponding to the video is denoted α = {α0, ..., αn};
First, 24 body-skeleton key points and 21 hand key points are extracted from the teaching teacher's picture in the initial key frame V0 (a single-frame image); these 45 points are called the initial limb key points and denoted η0;
Then, a conditional generative adversarial network CGAN (the limb GAN) is built; by stepwise input of η0 and {α0, ..., αn}, the generator module of the limb GAN progressively generates the subsequent limb key-point sequence {η1, ..., ηn};
Next, the sequence {η1, ..., ηn} is passed through skeleton-to-video rendering to produce a natural, continuous, and consistent limb-action video of the teacher;
Finally, SadTalker's head-pose (PoseVAE) and facial-expression (ExpNet) modules are combined to form the complete teaching digital-human video.
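As a concrete illustration of the input structure, the sketch below assembles η0 from the two detector outputs the text names (24 body-skeleton points and 21 hand points) into a single (45, 2) array of image coordinates. The detector itself is not specified by the text; an OpenPose- or MediaPipe-style extractor is assumed, and the dummy values here merely stand in for its output.

```python
import numpy as np

N_BODY, N_HAND = 24, 21  # key-point counts given in the description

def initial_limb_keypoints(body_kps, hand_kps) -> np.ndarray:
    """Validate and stack detector outputs into eta_0 with shape (45, 2)."""
    body_kps = np.asarray(body_kps, dtype=float)
    hand_kps = np.asarray(hand_kps, dtype=float)
    if body_kps.shape != (N_BODY, 2):
        raise ValueError(f"expected {N_BODY} body key points, got {body_kps.shape}")
    if hand_kps.shape != (N_HAND, 2):
        raise ValueError(f"expected {N_HAND} hand key points, got {hand_kps.shape}")
    # eta_0: rows 0-23 are body joints, rows 24-44 are hand joints
    return np.vstack([body_kps, hand_kps])

# Example: dummy (x, y) detections on a 512x512 key frame V0
eta0 = initial_limb_keypoints(np.random.rand(24, 2) * 512,
                              np.random.rand(21, 2) * 512)
```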
Compared with the prior art, the invention has the following beneficial effects:
With this method, by cloning the voice timbre, facial expressions, and limb-action characteristics of a real teaching teacher, the digital person can replace the teacher for 24-hour live broadcasting and interaction, achieving the same teaching effect as the real person.
Building on Wav2Lip and SadTalker, the method proposes the Wav2Talker deep-learning model, which remedies the limitation that Wav2Lip and SadTalker can generate only head expressions and motion; Wav2Talker also generates limb action alongside facial expression, making the digital person more complete and realistic.
The method reduces cost: no three-dimensional modeling or motion-capture technology is needed to form the virtual person, and the real person's original timbre and gesture motion are cloned directly for new video production.
From the cloned person's past recorded and live-broadcast audio, the method forms a true timbre model of the person and produces naturally true lip dynamics, facial expressions, and limb movements, facial expressions with emotional expressiveness, and a digital person with multiple visual personas.
Drawings
FIG. 1 is the main framework diagram of Wav2Talker.
In the SadTalker framework, the monocular three-dimensional face-reconstruction module uses 3DMM coefficients as the intermediate motion representation. Realistic 3D motion coefficients (facial expression β, head pose ρ) are first generated from the audio, and these coefficients are then used to implicitly modulate a three-dimensional-aware face renderer that generates the final video.
On top of the SadTalker framework, a limb-action key-point extraction module is added; it extracts 24 body-skeleton key points and 21 hand key points, i.e., 45 initial limb key points, denoted η0. A conditional generative adversarial network CGAN (the limb GAN) is then built; by stepwise input of η0 and {α0, ..., αn}, its generator module progressively generates the subsequent limb key-point sequence {η1, ..., ηn}. Next, this sequence is passed through skeleton-to-video rendering to produce a natural, continuous, and consistent limb-action video of the teacher; finally, SadTalker's head-pose (PoseVAE) and facial-expression (ExpNet) modules are combined to form the complete teaching digital-human video.
Detailed Description
The invention is described in detail below with reference to the attached drawings and examples:
A method of constructing a teaching digital person, comprising:
(1) Using the deep-learning algorithm so-vits-svc to clone the voice timbre of a designated teaching teacher and generate an audio stream in that timbre;
(2) Constructing the Wav2Talker digital-human model. Wav2Talker is based on the SadTalker model, but SadTalker fuses only three features, namely speech audio, head pose, and facial expression, to generate a talking-head video; it is limited to head dynamics, lacks limb action below the head, and cannot express the body language of a complete digital human. The Wav2Talker model therefore adds human limb-action features, including body-skeleton key points and hand key points, on top of SadTalker, forming a complete digital-human image;
(3) Applying video-retalking technology so that the facial expression carries emotional changes such as happiness, neutrality, and sadness, giving the digital person the same emotional expressiveness as a real person;
(4) Extending GFPGAN's face, eye, and nose super-resolution algorithm to sharpen the entire face. The algorithm converts a low-resolution input image into a high-resolution image while preserving the image's detail and definition; with it, all facial features of the digital person can be refined so that they appear more realistic and clear;
(5) Adopting the FaceChain deep-learning model tools to construct a digital-human image resembling a real portrait.
Digital persons with various visual personas, such as office, teacher, and white-collar images, are generated in real time.
The Shangde teaching digital-human system is built through a pipeline.
By combining the above technologies, the audio stream in the teacher's timbre is synchronized with lip dynamics, facial expression, and gesture action, so the digital person performs more realistically and naturally than a traditional virtual person.
As can be seen from FIG. 1, the process of constructing the Wav2Talker-based digital-human model is as follows:
Input: the key-frame sequence of the input video is V = {V0, ..., Vn}, where n is the number of key frames, and the input audio corresponding to the video is denoted α = {α0, ..., αn};
First, 24 body-skeleton key points and 21 hand key points are extracted from the teaching teacher's picture in the initial key frame V0 (a single-frame image); these 45 points are called the initial limb key points and denoted η0;
Then, a conditional generative adversarial network CGAN (the limb GAN) is built; by stepwise input of η0 and {α0, ..., αn}, the generator module of the limb GAN progressively generates the subsequent limb key-point sequence {η1, ..., ηn};
Next, the sequence {η1, ..., ηn} is passed through skeleton-to-video rendering to produce a natural, continuous, and consistent limb-action video of the teacher;
Finally, SadTalker's head-pose (PoseVAE) and facial-expression (ExpNet) modules are combined to form the complete teaching digital-human video.
FIG. 1 shows the main framework of Wav2Talker: a limb-GAN model is added on top of SadTalker; the limb GAN progressively generates the subsequent limb key points from the input audio and the initial limb key points, and a realistic action video is finally rendered from the key points (skeleton-to-video).
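The skeleton-to-video step can be pictured with a toy rasterizer: each frame's 45 key points are drawn onto a blank canvas, turning the sequence {η1, ..., ηn} into a frame stack. Real skeleton-to-video rendering would use a learned generator (for example a vid2vid-style network, as the cited few-shot-vid2vid reference suggests) to synthesize photorealistic frames; this sketch only makes the data flow concrete.

```python
import numpy as np

def rasterize(keypoints: np.ndarray, h: int = 256, w: int = 256) -> np.ndarray:
    """Draw each (x, y) key point as a 3x3 white dot on an (h, w) grayscale frame."""
    frame = np.zeros((h, w), dtype=np.uint8)
    for x, y in keypoints:
        c, r = int(round(x)), int(round(y))
        frame[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2] = 255
    return frame

def render_video(eta_seq) -> np.ndarray:
    """Map the key-point sequence {eta_1..eta_n} to a stack of frames."""
    return np.stack([rasterize(eta) for eta in eta_seq])

# Example: 4 frames of dummy key points on a 256x256 canvas
frames = render_video([np.random.rand(45, 2) * 255 for _ in range(4)])
```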
The limb GAN is a conditional generative adversarial network (CGAN), an extension of the original generative adversarial network (GAN) that can generate limb-action key points under specific conditions. CGAN is based on the following two key ideas:
(1) Conditional input: unlike the original GAN, a conditional GAN introduces an additional condition input into both the generator and the discriminator. The condition may be auxiliary information of any form, such as audio features or preceding limb key points. The generator combines the condition with a random-noise input to generate limb key points (limb landmarks) under the current condition; the discriminator takes the condition as an extra input to judge the authenticity of the generated limb landmarks and their consistency and continuity with the preceding key points.
(2) Adversarial training: as in the original GAN, the generator and the discriminator are trained against each other. The generator's goal is to generate fluent limb-action key points that fool the discriminator as far as possible; the discriminator's goal is to judge as accurately as possible whether the generated key points are real or fake and whether they are consistent and continuous with the preceding limb key points.
The specific working procedure is as follows:
(1) Condition input: the condition information (audio features and preceding limb key points) is provided as an additional input to the generator and the discriminator.
(2) Generation: the generator receives random noise together with the condition and generates successive limb-action key points.
(3) Discrimination: the discriminator receives the generated limb-action key points and the condition, tries to distinguish real key points from generated ones, and judges whether they are consistent and continuous with the preceding key points.
(4) Loss computation: the losses of the generator and the discriminator are computed from the discriminator's output. The generator's loss objective is for its generated key points to be misclassified as real by the discriminator; the discriminator's loss objective is to distinguish real from generated key points accurately and to judge whether consistency and continuity are maintained.
(5) Parameter update: the parameters of the generator and the discriminator are updated along the gradients of the loss functions.
By iterating these steps, the performance of the generator and the discriminator gradually improves, and the generated limb-action key points become increasingly lifelike. The condition input enables the generator to produce key points with given attributes under specific conditions, extending the range of GAN applications.
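The loss structure in steps (1) to (4) can be made numerically concrete. In the sketch below the generator and discriminator are collapsed to single linear maps so that the conditioning (audio feature plus preceding key points, concatenated into every input) stays visible; a real limb GAN would use deep networks and alternate gradient updates. All dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D_AUDIO, D_NOISE, D_KP = 16, 8, 45 * 2   # audio feature, noise, flattened eta

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Condition = audio feature of the current step + preceding key points eta_{t-1}
cond = np.concatenate([rng.normal(size=D_AUDIO), rng.normal(size=D_KP)])

# Generator G(z, cond) -> next key points eta_t (flattened), as in step (2)
W_g = rng.normal(scale=0.1, size=(D_KP, D_NOISE + cond.size))
z = rng.normal(size=D_NOISE)
eta_fake = W_g @ np.concatenate([z, cond])

# Discriminator D(eta, cond) -> probability that eta is a real continuation
w_d = rng.normal(scale=0.1, size=D_KP + cond.size)
def discriminate(eta):
    return sigmoid(w_d @ np.concatenate([eta, cond]))

eta_real = rng.normal(size=D_KP)          # dummy ground-truth key points
p_real, p_fake = discriminate(eta_real), discriminate(eta_fake)

# Binary cross-entropy objectives from step (4):
d_loss = -np.log(p_real) - np.log(1.0 - p_fake)  # discriminator: real vs fake
g_loss = -np.log(p_fake)                          # generator: fool D
```

Step (5) would then move W_g down the gradient of g_loss and w_d down the gradient of d_loss, alternating until the generated key-point sequences become hard to distinguish from real ones.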
As mentioned above, for a better understanding of the invention, reference may be made to the following references, each of which is incorporated herein by reference in its entirety.
1. svc-develop-team/so-vits-svc: SoftVC VITS Singing Voice Conversion, voice-timbre conversion based on a generative adversarial network.
2. Rudrabha/Wav2Lip: code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020.
3. OpenTalker/SadTalker: "SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation", CVPR 2023 (github.com).
4. TencentARC/GFPGAN: GFPGAN aims at developing practical algorithms for real-world face restoration.
5. modelscope/facechain: FaceChain is a deep-learning toolchain for generating your digital twin.
6. OpenTalker/video-retalking: "VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild", SIGGRAPH Asia 2022 (github.com).
7. NVlabs/few-shot-vid2vid: PyTorch implementation of few-shot photorealistic video-to-video translation, enabling motion transfer (github.com).
The above description covers only preferred embodiments of the present invention and does not limit the structure of the invention in any way. Any simple modification, equivalent variation, or alteration of the above embodiments in accordance with the technical substance of the present invention falls within the technical scope of the present invention.
Claims (2)
1. A method of constructing a teaching digital person, comprising:
(1) Using the deep-learning algorithm so-vits-svc to clone the voice timbre of a designated teaching teacher and generate an audio stream in that timbre;
(2) Constructing the Wav2Talker digital-human model, wherein Wav2Talker is based on the SadTalker model;
(3) Applying video-retalking technology to add emotional change to the facial expression, so that the digital person's expression carries the same emotional expressiveness as a real person;
(4) Extending GFPGAN's face, eye, and nose super-resolution algorithm to sharpen the entire face, the algorithm converting a low-resolution input image into a high-resolution image while preserving the image's detail and definition;
(5) Adopting the FaceChain deep-learning model tools to construct a digital-human image resembling a real portrait.
2. The method of claim 1, wherein:
The process of constructing the Wav2Talker-based digital-human model is as follows:
Input: the key-frame sequence of the input video is V = {V0, ..., Vn}, where n is the number of key frames, and the input audio corresponding to the video is denoted α = {α0, ..., αn};
First, 24 body-skeleton key points and 21 hand key points are extracted from the teaching teacher's picture in the initial key frame V0 (a single-frame image); these 45 points are called the initial limb key points and denoted η0;
Then, a conditional generative adversarial network CGAN (the limb GAN) is built; by stepwise input of η0 and {α0, ..., αn}, the generator module of the limb GAN progressively generates the subsequent limb key-point sequence {η1, ..., ηn};
Next, the sequence {η1, ..., ηn} is passed through skeleton-to-video rendering to produce a natural, continuous, and consistent limb-action video of the teacher;
Finally, SadTalker's head-pose (PoseVAE) and facial-expression (ExpNet) modules are combined to form the complete teaching digital-human video.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410214465.0A CN117933318A (en) | 2024-02-27 | 2024-02-27 | Method for constructing teaching digital person |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117933318A true CN117933318A (en) | 2024-04-26 |
Family
ID=90755572
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410214465.0A Pending CN117933318A (en) | 2024-02-27 | 2024-02-27 | Method for constructing teaching digital person |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117933318A (en) |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |