WO2022166801A1 - Data processing method, apparatus, device, and medium - Google Patents
Data processing method, apparatus, device, and medium
- Publication number
- WO2022166801A1 (application PCT/CN2022/074513)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- text
- user
- video
- target
- data
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/488—Data services, e.g. news ticker
- H04N21/4888—Data services, e.g. news ticker for displaying teletext characters
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/2222—Prompting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/85—Assembly of content; Generation of multimedia applications
- H04N21/854—Content authoring
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/414—Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance
- H04N21/41407—Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance embedded in a portable device, e.g. video client on a mobile phone, PDA, laptop
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/42203—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/433—Content storage operation, e.g. storage operation in response to a pause request, caching operations
- H04N21/4334—Recording operations
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/81—Monomedia components thereof
- H04N21/8166—Monomedia components thereof involving executable data, e.g. software
- H04N21/8173—End-user applications, e.g. Web browser, game
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/64—Computer-aided capture of images, e.g. transfer from script file into camera, check of taken image quality, advice or proposal for image composition or decision on when to take image
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/76—Television signal recording
- H04N5/765—Interface circuits between an apparatus for recording and another apparatus
- H04N5/77—Interface circuits between an apparatus for recording and another apparatus between a recording apparatus and a television camera
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/76—Television signal recording
- H04N5/91—Television signal processing therefor
- H04N5/915—Television signal processing therefor for field- or frame-skip recording or reproducing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
Definitions
- the present application relates to the field of Internet technology, in particular to data processing technology.
- users can print out the content of the document and place it next to the camera as a reminder.
- the user may not be able to quickly locate the content of the speech to be delivered, or the positioning may be wrong.
- the camera will capture the user's glances toward the printed script, which will affect the quality of the final video.
- Embodiments of the present application provide a data processing method, apparatus, device, and medium, which can improve the effectiveness of a word prompting function in a video recording service, thereby improving the quality of the recorded video.
- an embodiment of the present application provides a data processing method, and the method is executed by a computer device, including:
- collect the user's voice in the video recording service, determine the target text that matches the user's voice in the prompt text data associated with the video recording service, and mark the target text;
- an embodiment of the present application provides a data processing method, and the method is executed by a computer device, including:
- collect the user voice corresponding to the target user, perform text conversion on the user voice, and generate the user voice text corresponding to the user voice;
- in the prompt text data, the text that is the same as the user voice text is determined as the target text, and the target text is marked in the prompting application.
- an embodiment of the present application provides a data processing apparatus, and the apparatus is deployed on computer equipment, including:
- the startup module is used to respond to the business startup operation in the video application and start the video recording business in the video application;
- the display module is used to collect the user's voice in the video recording service, determine the target text that matches the user's voice in the prompt text data associated with the video recording service, and identify the target text;
- the acquiring module is configured to acquire the target video data corresponding to the video recording service when the text position of the target text in the prompt text data is the end position in the prompt text data.
- an embodiment of the present application provides a data processing apparatus, and the apparatus is deployed on computer equipment, including:
- the prompt text uploading module is used to upload the prompt text data to the prompting application;
- the user voice acquisition module is used to collect the user voice corresponding to the target user, perform text conversion on the user voice, and generate the user voice text corresponding to the user voice;
- the user voice text display module is used for determining the same text as the user voice text as the target text in the prompt text data, and marking the target text in the prompting application.
- an embodiment of the present application provides a computer device, including a memory and a processor, where the memory is connected to the processor and is used for storing a computer program, and the processor is used for calling the computer program, so that the computer device executes the method provided by the embodiments of the present application.
- One aspect of the embodiments of the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and the computer program is adapted to be loaded and executed by a processor, so that a computer device having the processor executes the method provided by any of the above aspects.
- a computer program product or computer program comprising computer instructions stored in a computer readable storage medium.
- the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, causing the computer device to perform the method provided by any of the above aspects.
- This embodiment of the present application may respond to a service start operation in a video application, start a video recording service in the video application, collect user voice in the video recording service, determine the target text that matches the user voice in the prompt text data associated with the video recording service, and mark the target text, so that the speaking user can quickly and accurately locate the content of the speech according to the mark, which improves the effectiveness of the text prompt function in the video recording service.
- When the text position of the target text in the prompt text data is the end position in the prompt text data, the target video data corresponding to the video recording service is obtained.
- the target text that matches the user's voice can be located and marked in the prompt text data; that is, the target text displayed in the video application matches the content of the user's speech, which improves the effectiveness of the text prompt function in the video recording service and reduces the risk of recording failure caused by users forgetting their lines, thereby improving the quality of the recorded video.
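The end-of-script condition above (stopping once the matched target text reaches the end of the prompt text data) can be sketched as follows. This is an illustrative Python sketch; the function and parameter names are assumptions, not taken from the patent.

```python
def is_script_finished(prompt_text: str, match_start: int, match_len: int) -> bool:
    """Return True when the matched target text reaches the end of the
    prompt text data (trailing whitespace and punctuation are ignored)."""
    # Assumed convention: match_start/match_len are character offsets of
    # the most recently matched target text within prompt_text.
    tail = prompt_text.rstrip(" \t\n.。!！?？")
    return match_start + match_len >= len(tail)

# When this returns True, the client would stop capture and treat the
# recorded footage as the target video data.
```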
- FIG. 1 is a schematic structural diagram of a network architecture provided by an embodiment of the present application.
- FIG. 2 is a schematic diagram of a data processing scenario provided by an embodiment of the present application.
- FIG. 3 is a schematic flowchart of a data processing method provided by an embodiment of the present application.
- FIG. 4 is a schematic diagram of an interface for inputting prompt text data provided by an embodiment of the present application.
- FIG. 5 is a schematic diagram of an interface for starting a video recording service in a video application provided by an embodiment of the present application
- FIG. 6 is a schematic diagram of an interface for displaying prompt text data provided by an embodiment of the present application.
- FIG. 7 is a schematic diagram of an interface for displaying speech rate prompt information provided by an embodiment of the present application.
- FIG. 8 is a schematic diagram of an interface for stopping a video recording service provided by an embodiment of the present application.
- FIG. 9 is a schematic diagram of an interface for performing editing optimization on a recorded video provided by an embodiment of the present application.
- FIG. 10 is a schematic diagram of an interface for recommending a tutorial video according to a speech error type provided by an embodiment of the present application
- FIG. 11 is a flowchart for realizing a video recording service provided by an embodiment of the present application.
- FIG. 12 is a schematic flowchart of a data processing method provided by an embodiment of the present application.
- FIG. 13 is a schematic diagram of an application scenario of a teleprompter provided by an embodiment of the present application.
- FIG. 14 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present application.
- FIG. 15 is a schematic structural diagram of implementing a data processing apparatus provided by an embodiment of the present application.
- FIG. 16 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
- FIG. 17 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
- FIG. 1 is a schematic structural diagram of a network architecture provided by an embodiment of the present application.
- the network architecture may include a server 10d and a user terminal cluster, and the user terminal cluster may include one or more user terminals, and the number of user terminals is not limited here.
- the user terminal cluster may specifically include a user terminal 10a, a user terminal 10b, a user terminal 10c, and the like.
- the server 10d may be an independent physical server, a server cluster or a distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.
- the user terminal 10a, the user terminal 10b, the user terminal 10c, etc. may include: a smart phone, a tablet computer, a notebook computer, a palmtop computer, a mobile internet device (MID), a wearable device (such as a smart watch or a smart bracelet), and smart terminals with video/image playback functions such as smart TVs.
- the user terminal 10a, the user terminal 10b, and the user terminal 10c can be respectively connected to the server 10d via a network, so that each user terminal can exchange data with the server 10d through the network connection.
- a video application with a video recording function may be installed in the user terminal 10a, wherein the video application may be a video editing application, a short video application, or the like.
- the user can open the video application installed in the user terminal 10a, and the video application can provide the user with a video recording function, which can include a conventional shooting mode and a teleprompter shooting mode.
- the conventional shooting mode may refer to shooting the user directly with the camera built into the user terminal 10a or an external camera device.
- the teleprompter shooting mode may refer to shooting the user with the camera built into the user terminal 10a or an external camera device while displaying manuscript content for the user on the terminal screen of the user terminal 10a.
- the content of the manuscript can be switched and displayed according to the user's voice progress (eg scrolling display, etc.).
- the content of the manuscript here can also be referred to as prompt text data in the video recording service.
- the user terminal 10a can respond to the trigger operation on the teleprompter shooting entry and display the recording page in the video application.
- before recording, the user can enter prompt text data on the recording page, or upload existing prompt text data to the recording page.
- when the user starts video recording, the user terminal 10a can respond to the user's video recording start operation and start the video recording function in the video application; during the video recording process, the user terminal 10a can display the prompt text data on its terminal screen according to the progress of the user's voice. In other words, during the video recording process, prompt text data can be displayed according to the progress of the user's voice.
- when the user's speech rate increases, the switching display speed of the prompt text data in the video application (e.g., the scrolling speed) is accelerated; when the user's speech rate decreases, the switching display speed of the prompt text data in the video application is slowed down. That is, the portion of the prompt text data displayed in the video application matches the user's voice, which ensures the effectiveness of the text prompt function during the video recording process and helps the user complete the video recording smoothly, thereby improving the quality of the recorded video.
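A minimal sketch of that rate-adaptive scrolling, assuming a baseline speaking rate and a baseline scroll speed (both values below are illustrative, not from the patent):

```python
def scroll_speed(chars_matched: int, elapsed_s: float,
                 base_chars_per_s: float = 4.0,
                 base_px_per_s: float = 30.0) -> float:
    """Scale the teleprompter scroll speed (pixels/second) by the ratio of
    the observed speech rate to an assumed baseline speaking rate."""
    if elapsed_s <= 0:
        return base_px_per_s
    observed = chars_matched / elapsed_s  # characters spoken per second
    return base_px_per_s * (observed / base_chars_per_s)
```

Faster speech yields a proportionally faster scroll, and slower speech slows the scroll, keeping the displayed text aligned with the speaker.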
- FIG. 2 is a schematic diagram of a data processing scenario provided by an embodiment of the present application. Taking a video recording scenario as an example, the implementation process of the data processing method provided by the embodiment of the present application is described.
- the user terminal 20a shown in FIG. 2 may be any user terminal in the user terminal cluster shown in FIG. 1, and the user terminal 20a is installed with a video application, and the video application has a video recording function.
- User A (the user of the user terminal 20a) can open the video application in the user terminal 20a and enter the homepage of the video application; the user can perform a trigger operation on the shooting entry in the video application, and the user terminal 20a, in response to the trigger operation on the shooting entry, displays a shooting page 20m in the video application, where the shooting page 20m may include a shooting area 20b, a filter control 20c, a shooting control 20d, a beauty control 20e, and the like.
- the shooting area 20b is used to display the video image captured by the user terminal 20a, and the video image can be a video image directed to the user A, which can be obtained through a camera built in the user terminal 20a or a camera device having a communication connection with the user terminal 20a.
- the shooting control 20d can be used to start and stop video recording; after entering the shooting page 20m, triggering the shooting control 20d starts shooting, and the captured video can be displayed in the shooting area 20b.
- the filter control 20c can be used to apply filters to the video picture collected by the user terminal 20a.
- the beauty control 20e can be used to apply beautification to the portraits in the video picture collected by the user terminal 20a, such as automatically retouching the face shape, enlarging the eyes, or reshaping the nose of the portrait.
- the shooting page 20m may also include a prompting and shooting portal 20f.
- user A can select the teleprompter shooting function in the video application; that is, user A can perform a trigger operation on the teleprompter shooting entry 20f in the shooting page 20m, and the user terminal 20a can respond to this trigger operation by switching the shooting page 20m in the video application to the recording page corresponding to the teleprompter shooting entry 20f.
- the recording page can first display a text input area, in which user A can enter the manuscript content required for recording the video; this content can be used to prompt user A during the video recording process. In short, during the video recording process, user A can record according to the manuscript content displayed in the video application, and the manuscript content at this time can also be called prompt text data 20g.
- the statistical information 20h of the manuscript content entered by user A can also be displayed in the text input area, and the statistical information 20h can include the word count of the entered manuscript content (that is, the number of prompt words; for example, 134 words) and the estimated video duration (e.g., 35 seconds) corresponding to the entered manuscript content.
- User A can add to or delete from the manuscript content according to the estimated video duration. For example, suppose user A wants to record a 1-minute video. When the estimated video duration corresponding to the manuscript content entered in the text input area is 4 minutes, user A can delete part of the manuscript content displayed in the text input area, so that the estimated video duration of the remaining content is about 1 minute (for example, in the range of 55 to 65 seconds); when the estimated video duration corresponding to the entered manuscript content is 35 seconds, user A can add to the manuscript content displayed in the text input area, so that the estimated video duration of the expanded content is about 1 minute. The finally determined manuscript content is then determined as the prompt text data 20g.
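The word-count and estimated-duration statistics in area 20h could be computed as below. The assumed speaking rate (about 230 characters per minute) is chosen only so that the 134-character example maps to roughly 35 seconds; the patent does not specify a rate.

```python
def estimate_duration_s(char_count: int, chars_per_minute: float = 230.0) -> float:
    """Estimate the video duration implied by the manuscript length,
    assuming a constant speaking rate (an illustrative assumption)."""
    return char_count / chars_per_minute * 60.0

print(round(estimate_duration_s(134)))  # prints 35
```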
- after user A determines the prompt text data 20g, user A can perform a trigger operation on the "next" control on the recording page, and the user terminal 20a can respond to the trigger operation on the "next" control by turning on the camera (or a camera device with a communication connection) and entering the video recording preparation state (that is, the state before recording starts). As shown in FIG. 2, the video picture 20i of user A collected by the user terminal 20a can be displayed on the recording page, together with the prompt message "adjust the position and put down the mobile phone, and say 'start' to start teleprompter shooting"; that is, user A can adjust his own position and the position of the user terminal 20a according to the video picture 20i. After adjusting the position, user A can initiate video recording by voice, e.g., by saying "start".
- the user terminal 20a may respond to the voice activation operation of the user A, start the video recording in the video application, and display the prompt text data 20g on the recording page.
- the text displayed on the recording page may be only part of the text in the prompt text data 20g, such as one sentence of the prompt text data 20g, so after starting the video recording, the first sentence in the prompt text data 20g can be displayed first.
- the user terminal 20a can collect the user voice corresponding to user A, and the client of the video application installed in the user terminal 20a can transmit the user voice, together with a voice matching instruction, to the background server 20j of the video application.
- the background server 20j can convert the user's voice into user voice text.
- the background server 20j can also convert the user voice text into the first Hanyu Pinyin (when the user voice text is Chinese, the first syllable information can be called the first Hanyu Pinyin); of course, after user A enters the prompt text data 20g in the text input area, the client of the video application can also transmit the prompt text data 20g to the background server 20j, so the background server 20j can convert the prompt text data 20g into the second Hanyu Pinyin (when the user voice text is Chinese, the second syllable information can be called the second Hanyu Pinyin).
- the background server 20j can match the first Hanyu Pinyin against the second Hanyu Pinyin, searching the second Hanyu Pinyin for the same pinyin as the first Hanyu Pinyin, that is, searching for the text position of the first Hanyu Pinyin within the second Hanyu Pinyin.
- the text at that text position in the prompt text data 20g is determined as the target text (that is, the text in the prompt text data 20g matched by the user's voice), and the background server 20j can transmit the target text to the client of the video application, so that the user terminal 20a can mark the target text in the video application (for example, by increasing the display size of the target text, changing its display color, or enclosing it with a circle or a rectangular frame). Understandably, when user A speaks in the order of the prompt text data, the prompt text data can be scrolled on the recording page; when user A does not speak in that order, the display can jump to the position of the matched target text.
- the sentence in which the target text is located can be marked in the video application.
- the background server 20j can match the target text corresponding to the user's voice in the prompt text data 20g as "weekend".
- on the recording page, the sentence containing the target text "weekend" ("On weekends, participate in the consumption class of xx and xx in Changsha") can be marked, for example by increasing the display size of the text and making it bold, as shown in area 20k in Figure 2.
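Locating the sentence that contains the matched target text, so the whole sentence can be highlighted as in area 20k, might look like the sketch below. Splitting on common Chinese and Western sentence terminators is a simplifying assumption.

```python
import re

def sentence_containing(prompt_text: str, pos: int) -> str:
    """Return the sentence of prompt_text covering character index pos."""
    start = 0
    for m in re.finditer(r"[。！？!?.]", prompt_text):
        end = m.end()
        if start <= pos < end:
            return prompt_text[start:end].strip()
        start = end
    return prompt_text[start:].strip()  # last, possibly unterminated, sentence

text = "Hello everyone. On weekends, join the class in Changsha."
print(sentence_containing(text, text.index("weekends")))
# prints: On weekends, join the class in Changsha.
```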
- the prompt text data 20g can be displayed directly on the recording page, or on a sub-page displayed independently on the recording page; this application does not limit the display form of the prompt text data 20g on the recording page.
- the purpose of matching the user's voice against the prompt text data 20g is to determine the text position of the user's voice in the prompt text data 20g; when converting the user's voice into user voice text, only the consistency between the pronunciation of the text and the user's voice needs to be considered, not the accuracy of the converted user voice text itself, so Hanyu Pinyin can be used for matching, which can improve the matching efficiency between the user voice and the prompt text data.
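A toy sketch of that pinyin-level matching: both the recognized speech text and the prompt text are converted to syllable sequences, and the first sequence is located inside the second. A real system would use a full grapheme-to-pinyin converter; the tiny lookup table here is a stand-in assumption.

```python
# Stand-in pinyin table for a handful of characters (illustrative only).
DEMO_PINYIN = {"你": "ni", "好": "hao", "周": "zhou", "粥": "zhou",
               "末": "mo", "参": "can", "加": "jia"}

def to_pinyin(text: str) -> list:
    """Convert each character to its (demo) pinyin syllable."""
    return [DEMO_PINYIN.get(ch, ch) for ch in text]

def match_position(voice_text: str, prompt_text: str) -> int:
    """Find where the recognized speech occurs in the prompt text by
    comparing syllable sequences, so a homophone misrecognition
    (wrong character, right sound) still matches correctly."""
    first = to_pinyin(voice_text)    # "first Hanyu Pinyin"
    second = to_pinyin(prompt_text)  # "second Hanyu Pinyin"
    n = len(first)
    for i in range(len(second) - n + 1):
        if second[i:i + n] == first:
            return i  # character index of the target text
    return -1

prompt = "你好周末参加"
assert match_position("周末", prompt) == 2   # exact characters
assert match_position("粥末", prompt) == 2   # homophone still matches
```

Matching on syllables rather than characters is why recognition accuracy matters less than pronunciation consistency, as the paragraph above notes.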
- the user terminal 20a can collect the user voice spoken by user A in real time, and the background server 20j can determine in real time the target text corresponding to the user voice in the prompt text data 20g, and then scroll the prompt text data according to the progress of the user voice. For example, when user A speaks the first sentence in the prompt text data 20g, that first sentence can be marked on the recording page; when user A speaks the second sentence, the recording page can switch from displaying the first sentence to the second sentence and mark it. The target text marked each time on the recording page is the content user A is currently speaking.
- the user terminal 20a may turn off the video recording and determine the video recorded this time as the recorded video. If user A is satisfied with the video recorded this time, the video can be saved; if user A is not satisfied, user A can re-shoot. Of course, user A can also perform editing optimization on the recorded video to obtain the final recorded video, that is, the target video data.
- prompt text data may be displayed according to the progress of the user's voice, so as to achieve the effect of accurate word prompting for the user, thereby improving the quality of the recorded video.
- FIG. 3 is a schematic flowchart of a data processing method provided by an embodiment of the present application. Understandably, the data processing method can be executed by a computer device, and the computer device can be a user terminal, an independent server, a cluster composed of multiple servers, or a system composed of a user terminal and a server.
- the computer device may also be a computer program application (including program code); this is not specifically limited here.
- the data processing method may include the following S101-S103:
- a user who needs to perform video recording may be referred to as a target user, and a device used by the target user for video recording may be referred to as a computer device.
- when the target user performs a service start operation for the video recording service in the video application installed on the computer device, the computer device can respond to the service start operation in the video application and start the video recording service, that is, start video recording in the video application.
- the service initiation operations may include, but are not limited to, contact trigger operations such as single-click, double-click, long-press, and tapping on the screen, and non-contact trigger operations such as voice, remote control, and gesture.
- before the computer device starts the video recording service, the target user can also upload the prompt text data required in the video recording service to the video application. The prompt text data can be used to prompt the target user in the video recording service, which can greatly reduce the chance that the target user forgets words during the video recording process.
- when the target user opens the video application installed in the computer device, he can enter the shooting page in the video application (for example, the shooting page 20m in the embodiment corresponding to FIG. 2 above), and the shooting page of the video application can include a teleprompter shooting entrance.
- the computer device may respond to the trigger operation on the teleprompter shooting entry in the video application and display a recording page in the video application, where the recording page may include a text input area that can be used to edit text content; the computer device can respond to an information editing operation for the text input area and display the prompt text data determined by the information editing operation in the text input area.
- the quantity threshold here can be preset according to actual needs, for example, the quantity threshold can be set to 100
- the number of prompt texts and the estimated video duration corresponding to the number of prompt texts can be displayed in the text input area.
- after the teleprompter shooting entry is triggered, the shooting page can be switched to the recording page in the video application, and the target user can edit text in the text input area of the recording page; the manuscript content received there is the above-mentioned prompt text data.
- the number of prompt characters input in the text input area can be counted in real time, and when the number of prompt characters is greater than the preset quantity threshold, the number of prompt characters and the estimated video duration corresponding to the currently input prompt text data can be displayed in the text input area.
- the teleprompter shooting portal can also be displayed on any page of the video application, and the embodiment of the present application does not limit the display position of the teleprompter shooting portal.
- the estimated video duration can be used as the duration reference information of the finished video recorded in the subsequent video recording service.
- the target user can add or delete text in the text input area. For example, when the estimated video duration displayed in the text input area is 35 seconds and the target user expects the recorded video to be 2 minutes long, the target user can continue to edit the text in the text input area until the estimated video duration displayed there falls within the set duration range (for example, between 1 minute 50 seconds and 2 minutes 10 seconds).
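The character-count and estimated-duration display described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the average speaking rate constant is an assumption, calibrated only so that 32 characters map to roughly 15 seconds as in the example of FIG. 4, and the 100-character display threshold follows the quantity threshold mentioned above.

```python
from typing import Optional

# Assumed calibration: about 32 characters per 15 seconds of speech,
# matching the "32 characters -> estimated 15 seconds" example in FIG. 4.
AVG_CHARS_PER_SECOND = 32 / 15
QUANTITY_THRESHOLD = 100  # preset quantity threshold from the text


def estimate_video_duration(prompt_text: str) -> float:
    """Estimated video duration (seconds) for the given prompt text."""
    return len(prompt_text) / AVG_CHARS_PER_SECOND


def input_area_hint(prompt_text: str) -> Optional[str]:
    """Hint shown in the text input area once the character count
    exceeds the quantity threshold; None below the threshold."""
    n = len(prompt_text)
    if n <= QUANTITY_THRESHOLD:
        return None
    seconds = round(estimate_video_duration(prompt_text))
    return (f"the current number of characters is {n}, "
            f"and the estimated filming time is {seconds} seconds")
```

A real implementation would presumably calibrate the rate per language and speaker rather than using a single constant.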
- the recording page can also display a text upload control. The target user can perform a trigger operation on the text upload control in the recording page to upload the edited prompt text data to the recording page; that is, the computer device can respond to the trigger operation for the text upload control, determine the text content uploaded to the recording page as prompt text data, and display the prompt text data in the text input area of the recording page.
- the number of prompt texts corresponding to the prompt text data and the estimated video duration corresponding to the prompt text data can also be displayed.
- the text upload control may include, but is not limited to, a paste text control and a select-last-text control. When the target user performs a trigger operation on the paste text control, the target user can directly paste pre-edited prompt text data into the text input area without editing the text content on the spot. When the target user performs a trigger operation on the select-last-text control, the target user can reuse the prompt text data of the last video recording service in this one; that is, a target user who is not satisfied with the finished video recorded in the last video recording service can re-record without repeatedly inputting the same prompt text data, thereby improving the input efficiency of the prompt text data.
- FIG. 4 is a schematic diagram of an interface for inputting prompt text data provided by an embodiment of the present application.
- in FIG. 4, the user terminal 30a (which at this time can be the above-mentioned computer equipment) can respond to the trigger operation for the shooting portal and display a shooting page 30g in the video application; the shooting page 30g may include a shooting area 30b, a filter control 30c, a shooting control 30d, a beauty control 30e, a teleprompter shooting entrance 30f, and the like.
- for the description of the functions of the shooting area 30b, the filter control 30c, the shooting control 30d, and the beauty control 30e in the video application, reference can be made to the functional descriptions of the corresponding controls (such as the beauty control 20e) in the embodiment corresponding to FIG. 2 above, which will not be repeated here.
- the user terminal 30a may respond to the trigger operation on the teleprompter shooting entry 30f in the shooting page 30g and switch the shooting page 30g to display the recording page 30h in the video application; the recording page 30h may include a text input area 30i, which may be used to directly edit text content.
- when the target user clicks on the text input area 30i, a keyboard 30p pops up on the recording page 30h, and the prompt text data required in this video recording service can be edited through the keyboard 30p; the text content determined by the editing operation is displayed in the text input area 30i as the prompt text data.
- the user terminal 30a can count, in real time, the number of prompt characters of the prompt text data input in the text input area 30i. When the number of prompt characters is greater than the preset quantity threshold (for example, when the quantity threshold is set to 100), the number of prompt characters and the estimated filming duration corresponding to the input prompt text data (that is, the estimated video duration) can be displayed.
- suppose the target user enters in the text input area 30i the text content "On weekends, participate in the consumption class of xx and xx cooperation in Changsha...". The user terminal 30a counts 32 prompt characters and estimates a filming duration of 15 seconds, that is, area 30m displays "the current number of characters is 32, and the estimated filming time is 15 seconds".
- the target user can edit the text content according to the estimated filming duration displayed in area 30m. After the target user finishes editing the text content in the text input area 30i, that text content can be determined as prompt text data, and a trigger operation can then be performed on the "Next" control 30n in the recording page 30h, so that the user terminal 30a enters the next operation of the video recording service.
- the text input area 30i may further include a paste text control 30j and a last text control 30k.
- when the target user performs a trigger operation on the paste text control 30j, it means that the target user has edited the prompt text data in another application and copied it from there; in response to the trigger operation for the paste text control 30j, the user terminal 30a pastes the copied prompt text data into the text input area 30i.
- the target user can also perform a trigger operation on the last text control 30k. In response to this trigger operation, the user terminal 30a obtains the prompt text data used in the last video recording service, displays it in the text input area 30i, and directly uses the prompt text data of the last video recording service as the prompt text data of this video recording service.
- the target user can adjust, in the text input area 30i, the prompt text data used in the last video recording service according to the experience of that service. For example, if a sentence contains a logic error, the prompt text data of the previous video recording service can be modified in the text input area 30i during this video recording service.
- the target user uses the paste text control 30j and the last text control 30k to input the prompt text data in the video recording service into the text input area 30i, which can improve the input efficiency of the prompt text data in the video recording service.
- after completing the editing operation of the prompt text data, the target user can perform a voice start operation on the video recording service in the video application, and the computer device can respond to the voice start operation and display, on the recording page of the video application, the recording countdown animation associated with the video recording service. When the recording countdown animation ends, the video recording service in the video application is started and executed, that is, the video recording officially starts.
- the camera device corresponding to the computer device can be turned on, and the target user can adjust the position of himself and the computer device according to the video screen displayed on the recording page to find the best shooting angle.
- the animation cancellation control corresponding to the recording countdown animation can also be displayed on the recording page.
- the animation cancel control can be triggered to cancel the recording countdown animation; that is, the computer device can respond to the target user's trigger operation on the animation cancel control, cancel the recording countdown animation on the recording page, and start and execute the video recording service in the video application.
- in other words, the video application does not directly enter the formal recording mode; it first plays the recording countdown animation on the recording page, giving the target user a short preparation time (that is, the duration of the recording countdown animation, such as 5 seconds), and enters the formal recording mode only after the recording countdown animation finishes playing. Alternatively, a target user who is already prepared can cancel the playback of the recording countdown animation and enter the formal recording mode directly.
- FIG. 5 is a schematic diagram of an interface for starting a video recording service in a video application according to an embodiment of the present application.
- after the target user edits the prompt text data and performs the next-step operation (such as a trigger operation on the "next step" control 30n in the embodiment corresponding to FIG. 4 above), the text input area can be exited on the recording page 40b, and the video picture of the target user can be displayed in the area 40c of the recording page 40b.
- the prompt information 40d ("adjust the position, position the mobile phone and say 'start' to start the teleprompter shooting") can also be displayed on the recording page 40b. That is, before starting the video recording service, the user terminal 40a (which at this time can be called the computer equipment) can open its associated camera equipment (such as the camera that comes with the user terminal 40a), collect image data of the target user, render the collected image data into the video picture corresponding to the target user, and display the video picture in the area 40c of the recording page 40b.
- the target user can adjust the position of himself and the lens according to the video image displayed in the area 40c, so as to find the best shooting angle.
- the target user adjusts the position of himself and the camera, that is, after the target user is ready to record the video, he can say "start” to start the video recording service in the video application.
- the user terminal 40a may respond to the voice activation operation for the video recording service and display a recording countdown animation in the area 40e of the recording page 40b; the recording countdown animation may last, for example, 5 seconds.
- the first few sentences of the prompt text data (e.g., the first two sentences) can also be displayed during the countdown.
- after the recording countdown animation finishes playing, the user terminal 40a can start and execute the video recording service in the video application. If the target user does not want to wait for the recording countdown animation to finish playing before starting the video recording service, he can perform a trigger operation on the animation cancel control 40f on the recording page 40b to cancel the playback of the recording countdown animation and directly start and execute the video recording service.
- after the video recording service is started, the target user can start to speak, and the user terminal 40a can collect the target user's user voice, find the target text matching the user voice in the prompt text data, and identify the target text in the area 40g of the recording page 40b (for example, by bolding or enlarging the target text). The specific determination process of the target text will be described in the following S102.
- S102 Collect the user's voice in the video recording service, determine a target text matching the user's voice in the prompt text data associated with the video recording service, and identify the target text.
- the computer device can enable the audio capture function, collect the user voice of the target user in the video recording service, find the target text that matches the user voice in the prompt text data, and identify the target text included in the prompt text data on the recording page.
- the computer equipment can collect the user voice of the target user in the video recording service in real time, determine the text position corresponding to the user voice in the prompt text data by performing text conversion on the user voice, determine the target text corresponding to the user voice according to the text position, and identify the target text on the recording page.
- the identification can include but is not limited to: text display color, text font size, text background, and the target text can refer to the text data containing the user's voice text.
- for example, if the user voice text is "New Year", the target text at this time can refer to a complete sentence containing "New Year", such as the target text: "As the New Year arrives, I wish everyone a prosperous Year of the Ox."
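As a rough sketch (not the patent's implementation), locating the complete sentence that contains the recognized voice text can be done by splitting the prompt text on sentence-ending punctuation and searching each sentence; the punctuation set below is an assumed, non-exhaustive choice.

```python
import re

def find_target_sentence(prompt_text: str, voice_text: str):
    """Return the complete sentence of the prompt text that contains the
    recognized user voice text, or None if no sentence matches."""
    # Split after common Chinese/English sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[。！？.!?])", prompt_text)
                 if s.strip()]
    for sentence in sentences:
        if voice_text in sentence:
            return sentence
    return None
```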
- the computer equipment refers to the directly collected voice as the user's initial voice. That is, the computer equipment can collect the user's initial voice in the video recording service, perform Voice Activity Detection (VAD) on the user's initial voice to obtain the valid voice data in the user's initial voice, and determine the valid voice data as the user voice. The user voice can then be converted into user voice text, the user voice text can be matched against the prompt text data associated with the video recording service, the target text matching the user voice text can be determined in the prompt text data, and the target text can be identified on the recording page.
- the user's initial voice collected by the computer equipment may contain noise from the target user's environment and pauses in the target user's speech. Therefore, voice endpoint detection can be performed on the user's initial voice, the silence and noise can be deleted as interference information, and the valid voice data in the user's initial voice can be retained; the valid voice data at this time may be called the user voice of the target user.
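The voice endpoint detection step can be illustrated with a minimal energy-based sketch. This is an assumption-laden toy, not the patent's method; production systems use far more robust VAD features, and the frame size and energy threshold below are arbitrary.

```python
def frame_energy(frame):
    """Mean energy of one audio frame (a list of float samples)."""
    return sum(s * s for s in frame) / len(frame)

def simple_vad(frames, energy_threshold=0.01):
    """Keep only frames whose mean energy exceeds the threshold; the kept
    frames stand in for the 'valid voice data', while low-energy frames
    (silence, faint noise) are dropped as interference."""
    return [f for f in frames if frame_energy(f) > energy_threshold]
```

A real pipeline would also merge adjacent kept frames into continuous speech segments rather than treating frames independently.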
- the computer equipment can convert the user voice into user voice text through a fast speech-to-text model, compare the user voice text with the prompt text data, find the text position of the user voice text in the prompt text data, determine the target text corresponding to the user voice in the prompt text data according to that text position, and identify the target text on the recording page of the video recording service.
- the fast speech-to-text model means that, in the process of converting the user's speech into text, no context-based correction is performed and it is not necessary to consider whether the semantics are correct.
- the computer device can determine the target text corresponding to the user voice in the prompt text data according to the pronunciation of the user voice text and the pronunciation of the prompt text data. That is, the computer device can obtain the first syllable information corresponding to the user voice text, obtain the second syllable information corresponding to the prompt text data associated with the video recording service, obtain, in the second syllable information, the target syllable information that is the same as the first syllable information, and determine the target text corresponding to the target syllable information in the prompt text data.
- the syllable information may refer to pinyin information in Chinese, or phonetic symbol information in English, or the like.
- when the prompt text data is in Chinese, the computer device can convert the user voice text into first pinyin information, convert the prompt text data into second pinyin information, find the text position corresponding to the first pinyin information in the second pinyin information, and determine the target text corresponding to the user voice in the prompt text data according to that text position. When the prompt text data is in another language such as English, the computer device can convert the user voice text into first phonetic symbol information and the prompt text data into second phonetic symbol information, and then determine the target text corresponding to the user voice in the prompt text data according to the first phonetic symbol information and the second phonetic symbol information.
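A toy sketch of the pinyin-based matching idea, under loud assumptions: the tiny character-to-syllable table below stands in for a full pronunciation lexicon (in practice a library such as pypinyin could supply one), tones are ignored, and matching is a plain sliding-window comparison of syllable sequences. The point it illustrates is that a homophone transcription error (e.g., 粥 for 周) still matches by pronunciation.

```python
# Hypothetical mini pronunciation lexicon; a real system would use a
# complete one (e.g., via the pypinyin library).
SYLLABLES = {"周": "zhou", "末": "mo", "粥": "zhou", "新": "xin", "年": "nian"}

def to_syllables(text):
    """Map each character to its (toneless) syllable; unknown characters
    fall back to themselves."""
    return [SYLLABLES.get(ch, ch) for ch in text]

def match_position(prompt_text, voice_text):
    """Start index of the voice text within the prompt text, compared by
    pronunciation rather than by characters; -1 if not found."""
    prompt_syl = to_syllables(prompt_text)
    voice_syl = to_syllables(voice_text)
    m = len(voice_syl)
    for i in range(len(prompt_syl) - m + 1):
        if prompt_syl[i:i + m] == voice_syl:
            return i
    return -1
```

Here `match_position("周末新年", "粥末")` matches at position 0 even though the recognized character 粥 differs from the prompt's 周, because both map to the syllables "zhou mo".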
- the area used to display the target text in the recording page can be set according to the terminal screen size of the computer device. As shown above, the display width of the area 40g is the same as the screen width of the computer device (e.g., the user terminal 40a), and the display height of the area 40g is smaller than the screen height of the computer device.
- when the terminal screen size of the computer equipment is large (such as the display screen of a desktop computer), if the area used to display the target text were the same size and width as the terminal screen, the target user's gaze would sweep across the wide screen while watching the target text in the video recording service.
- therefore, a text prompt area corresponding to the target text can be determined on the recording page of the video recording service according to the position of the camera device corresponding to the computer device, and the target text can be marked in the text prompt area according to the text position of the target text in the prompt text data. In this way, when reading the target text, the target user can face the camera head-on.
- FIG. 6 is a schematic diagram of an interface for displaying prompt text data provided by an embodiment of the present application.
- after the user terminal 50a (that is, the above-mentioned computer equipment) determines the target text corresponding to the user voice "Weekend, participate in the consumption class of xx and xx in Changsha" in the prompt text data, it can determine a text prompt area 50e for displaying the target text on the recording page 50b of the video recording service, where the text prompt area 50e is located in the same orientation as the camera 50d.
- the target user's video picture may be displayed in the area 50c of the recording page 50b, and the video recording duration (eg, the video recording duration is 00:13 seconds) may be displayed in the area 50f of the recording page 50b.
- the computer equipment can collect the user's initial voice of the target user in real time, obtain the voice duration corresponding to the user's initial voice and the number of voice characters contained in the user's initial voice, and determine the ratio of the number of voice characters to the voice duration as the user's speech rate.
- the speech rate threshold can be manually set based on actual needs, for example, the speech rate threshold is 500 words/minute
- the speech rate prompt information can be displayed on the recording page.
- the speech rate prompt information may be used to prompt the target user associated with the video recording service to reduce the user's speech rate.
- the computer device can obtain the user's speech rate of the target user in real time.
- when the user's speech rate is greater than the speech rate threshold, it indicates that the target user's speech rate in the video recording service is too fast, and the target user can be reminded to appropriately slow down the speech rate.
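The speech-rate check can be sketched as below; the 500 words/minute threshold follows the example above, and the prompt wording is illustrative only.

```python
SPEECH_RATE_THRESHOLD = 500  # words per minute, per the example above

def speech_rate(char_count, duration_seconds):
    """User speech rate: voice characters divided by voice duration,
    expressed in characters (words) per minute."""
    return char_count / duration_seconds * 60

def rate_prompt(char_count, duration_seconds):
    """Return speech-rate prompt information when the rate is too fast,
    otherwise None."""
    if speech_rate(char_count, duration_seconds) > SPEECH_RATE_THRESHOLD:
        return ("Your current speaking rate is too fast; to ensure the "
                "quality of the recorded video, please slow down")
    return None
```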
- FIG. 7 is a schematic diagram of an interface for displaying speech rate prompt information provided by an embodiment of the present application.
- the user's speech rate of the target user can be determined according to the number of voice characters and the voice duration included in the user's initial voice.
- the speech rate prompt information 60c may be displayed on the recording page 60b of the video recording service (for example, the speech rate prompt information may be "Your current speaking rate is too fast; to ensure the quality of the recorded video, please slow down your speaking rate").
- the target user may also be reminded to slow down the speech rate in the form of voice broadcast, and the embodiment of the present application does not limit the display form of the speech rate prompt information.
- the recording page of the video recording service may further include a cancel recording control and a complete recording control.
- the computer device can respond to the trigger operation for the cancel recording control, cancel the video recording service, delete the video data recorded by the video recording service, generate recording prompt information for the video recording service, and display the recording prompt information on the recording page, wherein the recording prompt information may include a re-recording control.
- the computer device can respond to the trigger operation for the re-recording control and switch from the target text displayed on the recording page to displaying the prompt text data, that is, display the prompt text data in the text input area of the recording page and restart the video recording service.
- the recording prompt information can also include a home page control. When the target user performs a trigger operation on the home page control, the computer device can respond to the trigger operation and, in the video application, switch the recording page to display the application home page; that is, after the ongoing video recording service is canceled, the video recording service is not re-enabled for the time being.
- the computer device may respond to the trigger operation for the recording completion control, stop the video recording service, and determine the video data recorded by the video recording service as the recorded target video data. That is, the video recording service can be stopped even before the prompt text data has been read to the end, and the video recorded before the video recording service is stopped is called the target video data.
- FIG. 8 is a schematic diagram of an interface for stopping a video recording service provided by an embodiment of the present application.
- in FIG. 8, the user terminal 70a (that is, the above-mentioned computer equipment) can determine, according to the user voice of the target user in the video recording service, the target text corresponding to the user voice in the prompt text data, and identify the target text on the recording page 70b; that is, the user terminal 70a can scroll and display the prompt text data according to the progress of the user voice.
- a cancel recording control 70c and a complete recording control 70d may also be displayed in the recording page 70b.
- the user terminal 70a may respond to the trigger operation on the recording completion control 70d, stop the video recording service, and save the video data recorded by the current video recording service; that is, the current video recording service is completed.
- the user terminal 70a can respond to the trigger operation for the cancel recording control 70c, cancel the video recording service, and delete the video data recorded by this video recording service; the user terminal 70a can also generate recording prompt information 70e for the target user in the video recording service (for example, the recording prompt information can be "The recorded segment will be cleared, re-shoot the segment?"), and display the recording prompt information 70e on the recording page 70b of the video recording service.
- the recording prompt information 70e may include a "return to the home page" control and a "re-shoot" control. When the target user performs a trigger operation on the "return to the home page" control, the user terminal 70a can exit the video recording service and return from the recording page 70b to the application home page of the video application, that is, the target user gives up re-shooting. When the target user performs a trigger operation on the "re-shoot" control, the user terminal 70a can exit the video recording service, return from the recording page 70b to the text input area, and display the prompt text data in the text input area, that is, the target user chooses to re-record the video.
- the computer device can also automatically end the video recording service without the target user's operation, save the video data recorded in the video recording service, and determine the target video data from the video data recorded in the video recording service.
- the computer device may determine the video data saved when the video recording service is stopped as the original video data, enter the editing page of the video application, and display the original video data and the editing optimization controls corresponding to the original video data in the editing page of the video application.
- the target user can perform a trigger operation on the editing optimization control displayed in the editing page, and the computer device can respond to the trigger operation for the editing optimization control and display M editing optimization methods for the original video data, where M is a positive integer, that is, M can take a value of 1, 2, and so on.
- the editing optimization method that handles pauses between sentences may be called the second editing optimization method.
- the computer equipment can respond to the selection operation for the M editing optimization methods and perform editing optimization processing on the original video data according to the editing optimization method determined by the selection operation, to obtain the target video data corresponding to the video recording service. It can be understood that the display area and display size of the original video data and the target video data in the editing page can be adjusted according to actual requirements.
- for example, the display area of the original video data may be located at the top, bottom, or middle of the editing page; the display size of the original video data (or the target video data) can be a 16:9 aspect ratio, and so on.
- the computer equipment can obtain the target voice data contained in the original video data, convert the target voice data into the target text result, compare the target text result with the prompt text data, and determine the text in the target text result that differs from the prompt text data as wrong text; the voice data corresponding to the wrong text is then deleted from the original video data to obtain the target video data corresponding to the video recording service.
- the computer equipment can use an accurate speech-to-text model to perform text conversion processing on the target speech data contained in the original video data. The accurate speech-to-text model can learn the semantic information in the target speech data: it needs to consider not only the consistency between the pronunciation of the converted text and the user's speech, but also the semantic information of the user's speech, and it corrects the converted text through contextual semantic information.
- the computer equipment can perform voice endpoint detection on the target voice data contained in the original video data, remove noise and silence in the original video data to obtain the valid voice data in the original video data, and convert the valid voice data into text to obtain the target text result corresponding to the target voice data. The text contained in the target text result and the text contained in the prompt text data are then compared one by one, and the text that differs between the target text result and the prompt text data is determined as wrong text; the wrong text here may be caused by the target user making a slip of the tongue during the recording process of the video recording service.
- The computer equipment deletes the voice data corresponding to the wrong text from the original video data to obtain the final target video data.
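As an illustrative sketch (not the patent's actual implementation), the comparison between the converted text result and the prompt text data can be performed with a sequence alignment; here Python's standard difflib is applied to word lists, and the helper name `find_error_words` is hypothetical:

```python
import difflib

def find_error_words(target_words, prompt_words):
    """Return words in the converted text that differ from the prompt text."""
    matcher = difflib.SequenceMatcher(a=prompt_words, b=target_words)
    errors = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'replace' and 'insert' spans are words the user actually spoke that
        # do not appear in the prompt, i.e. candidate slips of the tongue.
        if tag in ("replace", "insert"):
            errors.extend(target_words[j1:j2])
    return errors

# The speaker said "worlt" where the prompt says "world".
spoken = "hello worlt today".split()
prompt = "hello world today".split()
print(find_error_words(spoken, prompt))  # ['worlt']
```

The spans flagged by the matcher map back to timestamps in the recording, so the corresponding speech segments can then be cut.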
- the computer equipment can convert the target voice data contained in the original video data into the target text result.
- The text in the target text result that differs from the prompt text data is determined as the wrong text; the target text result can then be divided into N text characters, and the timestamps of the N text characters in the target speech data can be obtained, where N is a positive integer, such as 1, 2, and so on. The computer device can determine the speech pause segments in the target speech data according to the timestamps, and delete the speech pause segments and the speech data corresponding to the wrong text from the original video data to obtain the target video data corresponding to the video recording service.
- The process of obtaining the speech pause segments by the computer device may include: the computer device performs word segmentation processing on the target text result corresponding to the target speech data to obtain N text characters, and obtains the timestamp of each text character in the target speech data (that is, the timestamp in the original video data); according to the timestamps of each two adjacent text characters among the N text characters, the time interval between each pair of adjacent text characters is obtained. When a time interval is greater than the duration threshold (for example, the duration threshold can be set to 1.5 seconds), the speech segment between the two adjacent text characters can be determined as a speech pause segment, where the number of speech pause segments can be one, multiple, or zero (that is, there is no speech pause segment).
- For example, the N text characters can be represented, according to their arrangement order in the target text result, as: text character 1, text character 2, text character 3, text character 4, text character 5, and text character 6. The timestamp of text character 1 in the original video data is t1, the timestamp of text character 2 is t2, the timestamp of text character 3 is t3, the timestamp of text character 4 is t4, the timestamp of text character 5 is t5, and the timestamp of text character 6 is t6.
- When the computer device calculates that the time interval between text character 2 and text character 3 is greater than the duration threshold, the speech segment between text character 2 and text character 3 can be determined as speech pause segment 1.
- Similarly, when the time interval between another pair of adjacent text characters is greater than the duration threshold, the speech segment between them is determined as speech pause segment 2.
- The final target video data can be obtained by deleting, from the original video data, the speech corresponding to the wrong text and the video segments corresponding to speech pause segment 1 and speech pause segment 2 respectively.
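The pause-segment computation described above can be sketched as follows (a minimal illustration; the 1.5-second threshold follows the example in the text, and the character/timestamp values are made up):

```python
DURATION_THRESHOLD = 1.5  # seconds, as in the example above

def find_pause_segments(char_timestamps):
    """Given (character, timestamp) pairs in order, return (start, end) spans
    whose inter-character gap exceeds the duration threshold."""
    pauses = []
    for (_, t_prev), (_, t_next) in zip(char_timestamps, char_timestamps[1:]):
        if t_next - t_prev > DURATION_THRESHOLD:
            pauses.append((t_prev, t_next))
    return pauses

# Six characters; the gaps between the 2nd/3rd and 4th/5th exceed 1.5 s.
chars = [("A", 0.2), ("B", 0.6), ("C", 2.5), ("D", 3.0), ("E", 5.0), ("F", 5.4)]
print(find_pause_segments(chars))  # [(0.6, 2.5), (3.0, 5.0)]
```

Each returned span identifies a stretch of the recording that can be cut together with the erroneous speech segments.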
- FIG. 9 is a schematic diagram of an interface for editing and optimizing a recorded video provided by an embodiment of the present application.
- the editing page 80b of the video application can be entered, and the video data 80c (such as the above-mentioned original video data) recorded in the video recording service can be previewed and played in the editing page 80b.
- The video data 80c can be displayed in the clip page 80b at a 16:9 aspect ratio, and the time axis 80d corresponding to the video data 80c can also be displayed in the clip page 80b.
- The time axis 80d can include the video nodes in the video data 80c, and the target user can quickly locate the playback point in the video data 80c through the video nodes in the time axis 80d.
- the clip optimization control 80e (also referred to as a clip optimization option button) may also be displayed in the clip page 80b.
- The user terminal 80a (i.e., the computer device) can respond to the trigger operation for the clip optimization control 80e.
- The selection page 80f pops up in the editing page 80b (in the embodiment of the present application, the selection page may refer to a certain area in the editing page, a sub-page displayed independently in the editing page, a floating page in the editing page, or a page covering the clip page; the display form of the selection page is not limited here).
- In the selection page 80f, different editing optimization methods for the video data 80c can be displayed, together with the video duration corresponding to each editing optimization method. As shown in FIG. 9, if the target user selects the method of removing the slip-of-the-tongue parts (i.e., the above-mentioned first editing mode), the video duration of the optimized video data 80c is 57 seconds (the original video duration of the video data 80c is 60 seconds); if the target user selects the method of removing the slip-of-the-tongue parts and the pauses (that is, the above-mentioned second editing mode), the video duration of the optimized video data 80c is 50 seconds; if the target user chooses not to do any processing in the selection page 80f, the video data 80c is kept without processing.
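The per-mode duration preview shown in the selection page can be computed from the spans scheduled for deletion, as in this illustrative sketch (the segment values below are made up to reproduce the 60/57/50-second example; the function name is hypothetical):

```python
def preview_durations(total, error_segments, pause_segments):
    """Predict the optimized video duration for each editing mode.

    error_segments / pause_segments are (start, end) spans in seconds."""
    err = sum(end - start for start, end in error_segments)
    pause = sum(end - start for start, end in pause_segments)
    return {
        "keep original": total,
        "remove slips": total - err,
        "remove slips and pauses": total - err - pause,
    }

# 60 s video, 3 s of slips, 7 s of pauses -> previews of 60 s, 57 s, 50 s.
print(preview_durations(60.0, [(10.0, 13.0)], [(20.0, 23.5), (40.0, 43.5)]))
```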
- If the first editing optimization mode is selected, the user terminal 80a can perform text conversion processing on the target voice data in the video data 80c to obtain the target text result corresponding to the target voice data, perform text matching between the target text result and the prompt text data to determine the erroneous text, and delete the voice data corresponding to the erroneous text from the video data 80c to obtain the target video data.
- If the second editing optimization mode is selected, the user terminal 80a deletes the voice data corresponding to the erroneous text and the voice pause segments from the video data 80c to obtain the target video data.
- The target video data here refers to the video data in which the slip-of-the-tongue parts and the pause parts between sentences have been deleted.
- the target user can save the target video data, or upload the target video data to the information release platform, so that all user terminals in the information release platform can watch the target video data.
- The above error text may include K error sub-texts, where K is a positive integer, for example, 1, 2, and so on. The computer device may determine the error frequency in the video recording service according to the K error sub-texts and the video duration corresponding to the original video data; when the error frequency is greater than the error threshold (for example, the error threshold can be set to 2 errors per minute), the speech error types corresponding to the K error sub-texts are identified, so that tutorial videos can then be recommended in the video application.
- the computer device can recommend a corresponding tutorial video for the target user in the video application according to the speech error type corresponding to the error text, where the speech error type includes but is not limited to: non-standard Mandarin, wrong pronunciation, and unclear words.
- The computer device can determine the speech error types of the error sub-texts corresponding to the three errors.
- If the speech error type is the non-standard Mandarin type, the computer device can push a Mandarin tutorial video for the target user in the video application; if the speech error type is the pronunciation error type, the computer device can push a language tutorial video for the target user in the video application; if the speech error type is the slurred speech type, the computer device can push a dubbing tutorial video for the target user in the video application.
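The error-frequency check and the tutorial recommendation described above can be sketched as follows (the 2-errors-per-minute threshold and the type-to-tutorial mapping follow the examples in the text; the function and dictionary names are hypothetical):

```python
ERROR_THRESHOLD = 2.0  # errors per minute, as in the example above

# Mapping from speech error type to the tutorial pushed in the video application.
TUTORIALS = {
    "non-standard mandarin": "Mandarin tutorial video",
    "pronunciation error": "language tutorial video",
    "slurred speech": "dubbing tutorial video",
}

def recommend_tutorial(k_errors, video_seconds, error_type):
    """Return a tutorial category when the error frequency exceeds the threshold,
    otherwise None (no recommendation)."""
    frequency = k_errors / (video_seconds / 60.0)  # errors per minute
    if frequency > ERROR_THRESHOLD:
        return TUTORIALS.get(error_type)
    return None

print(recommend_tutorial(3, 60.0, "non-standard mandarin"))  # Mandarin tutorial video
print(recommend_tutorial(1, 60.0, "non-standard mandarin"))  # None
```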
- FIG. 10 is a schematic diagram of an interface for recommending a tutorial video according to a speech error type provided by an embodiment of the present application.
- After the target user selects the "remove the slip-of-the-tongue parts" editing optimization mode, the original video data recorded in the video recording service is edited and optimized to obtain the target video data 90c (that is, the video data with the slip-of-the-tongue parts removed).
- The user terminal 90a (that is, the above-mentioned computer equipment) can display the target video data 90c in the editing page 90b, and the time axis 90d can also be displayed in the editing page 90b. The time axis 90d can include the video nodes associated with the target video data 90c; by triggering a video node in the time axis 90d, playback can be positioned at a specific time point in the target video data 90c, and the target user can preview and play the target video data 90c on the clip page 90b.
- the user terminal 90a may push a tutorial video matching the speech error type to the target user in the video application according to the speech error type corresponding to the error text in the editing optimization process.
- For example, the speech error type corresponding to the error text is the non-standard Mandarin type (that is, the reason for the slip of the tongue is that the target user's Mandarin is not standard).
- The user terminal 90a can obtain the tutorial video related to Mandarin video teaching (that is, the Mandarin tutorial video) in the video application, and display the pushed Mandarin tutorial video in the area 90e of the editing page 90b.
- FIG. 11 is a flowchart for implementing a video recording service provided by an embodiment of the present application.
- the implementation process of the video recording service is described by taking the client and the background server of the video application as an example.
- The client and the background server at this time can both be called computer equipment; the implementation process of the video recording service can be achieved through the following S11-S25.
- S11: input the prompt text data. That is, the target user can open the client of the video application, enter the shooting page of the client, and enter the recording page from the teleprompter shooting entrance of the shooting page.
- The recording page includes a text input area, and the target user can enter the prompt text data in the text input area.
- After the target user finishes editing the prompt text data, he can execute S12 and start the recording by voice with "start"; that is, "start" can be used as the wake-up word.
- The client can respond to the user's voice start operation and perform S13 to start the video recording service, that is, start to enter the recording mode.
- The target user can read the text on the screen aloud (the screen is the screen of the terminal device where the client is installed, and the text on the screen at this time can be part of the text content in the prompt text data; for example, after entering the recording mode, the text displayed can be the first two sentences in the prompt text data).
- The client can collect the user's initial voice of the target user, transmit the user's initial voice to the background server of the video application, and send a text conversion instruction to the background server.
- The background server can perform S15: detect the user's initial voice through voice endpoint detection (VAD) technology, delete the noise and silence in the user's initial voice, and obtain the user voice corresponding to the target user (that is, the valid voice data).
- S15 may be performed by the client through a local voice endpoint detection module, or may be performed by the background server using the VAD technology.
- The background server can use a fast text conversion model to perform text conversion on the user voice, convert the user voice into text (that is, the user voice text), and then continue to perform S17 to convert the user voice text into pinyin (the embodiment of the present application assumes by default that the prompt text data is Chinese).
- The background server can obtain the prompt text data input by the target user, convert the prompt text data into pinyin, and match the pinyin of the user voice text against the pinyin of the prompt text data; it can then continue to execute S19 to find the text position matching the user voice in the prompt text data, and transmit this text position to the client.
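The pinyin-based matching of S17-S19 can be sketched as follows. This is a minimal illustration: the character-to-pinyin table is a hypothetical stand-in for a full pinyin conversion library, and the matching is a simple sliding-window comparison of syllable sequences:

```python
# Hypothetical character-to-pinyin table; a real system would use a complete
# pinyin conversion library (and handle heteronyms) instead.
PINYIN = {"你": "ni", "好": "hao", "世": "shi", "界": "jie", "再": "zai", "见": "jian"}

def to_pinyin(text):
    return [PINYIN[ch] for ch in text]

def locate_in_prompt(user_text, prompt_text):
    """Return the start index of the user's speech text within the prompt text
    by matching pinyin sequences, or -1 if no match is found."""
    user_py, prompt_py = to_pinyin(user_text), to_pinyin(prompt_text)
    n = len(user_py)
    for i in range(len(prompt_py) - n + 1):
        if prompt_py[i:i + n] == user_py:
            return i
    return -1

prompt = "你好世界再见"
print(locate_in_prompt("世界", prompt))  # 2
```

Matching on pinyin rather than on the characters themselves tolerates recognition errors that preserve pronunciation, which is why the text position rather than the recognized characters is sent back to the client.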
- After receiving the text position transmitted by the background server, the client can determine the target text corresponding to the user voice according to the text position and identify the target text on the recording page of the client; that is, the prompt text data can be scrolled and displayed according to the text position.
- the client can execute S21 to end the video recording service.
- the target user can trigger the complete recording control on the recording page or trigger the cancel recording control on the recording page to end the video recording service.
- the client can transmit the recorded video corresponding to the video recording service (that is, the above-mentioned original video data) to the backend server, and send a text conversion instruction to the backend server.
- The backend server can execute S22: use an accurate text conversion model to perform text conversion on the voice data contained in the recorded video, convert the voice data contained in the recorded video into text (that is, the target text result), and obtain the timing at which the text appears in the recorded video, which can also be called the timestamp of the text in the recorded video; the background server at this time can execute S23 and S24 in parallel.
- S23: the background server can compare the target text result with the prompt text data, and find the slip-of-the-tongue parts in the recorded video (that is, the voice data corresponding to the above-mentioned error text); S24: the background server can use the timing (i.e., the timestamps) at which the text appears in the recorded video to find the pause parts in the user's speech contained in the recorded video.
- the backend server can transmit the slips and pauses in the recorded video to the client.
- After the client receives the slip-of-the-tongue parts and the pause parts transmitted by the background server, it can execute S25: according to the slip-of-the-tongue parts and the pause parts, the client can provide different editing optimization modes for the target user; the target user selects an appropriate editing optimization mode among them, and the client can perform editing optimization on the recorded video based on the editing optimization mode selected by the target user to obtain the final target video data.
- In the embodiment of the present application, the video recording service can be started by voice, and a word prompting function can be provided for the user during the recording process of the video recording service. The target text matching the user voice is determined in the prompt text data and identified in the video application; that is, the target text displayed in the video application matches the content of the user's speech, which can improve the effectiveness of the text prompt function in the video recording service and reduce the risk of recording failure caused by the user forgetting words, thereby improving the quality of the recorded video. Starting or stopping the video recording service through the user's voice can reduce user operations in the video recording service and improve the video recording effect; after the video recording service ends, the recorded video in the video recording service can be automatically edited and optimized, which can further improve the quality of the recorded video.
- FIG. 12 is a schematic flowchart of a data processing method provided by an embodiment of the present application. Understandably, the data processing method can be executed by a computer device, and the computer device can be a user terminal, an independent server, a cluster composed of multiple servers, a system composed of a user terminal and a server, or a computer program application (including program code), which is not specifically limited here.
- the data processing method may include the following S201-S203:
- S201: the target user can input prompt text data in the teleprompter application, or upload the edited prompt text data to the teleprompter application.
- The computer device can respond to the target user's text input operation or text upload operation and upload the prompt text data to the teleprompter application; that is, when using the prompting function provided by the teleprompter application, the prompt text data needs to be uploaded to the teleprompter application first.
- the computer device in the embodiment of the present application may refer to a device installed with a teleprompter application, and may also be referred to as a teleprompter.
- S202: collect the user voice corresponding to the target user, perform text conversion on the user voice, and generate the user voice text corresponding to the user voice.
- The computer equipment can collect the user's initial voice of the target user, perform voice endpoint detection on the user's initial voice, and delete the noise and silence contained in the user's initial voice to obtain the user voice corresponding to the target user (that is, the valid voice data in the user's initial voice); it then performs text conversion on the user voice and generates the user voice text corresponding to the user voice.
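The endpoint-detection step can be illustrated with a minimal energy-based sketch. A production VAD (such as the one used in WebRTC) is far more elaborate; the frame size and threshold below are arbitrary illustrative values:

```python
def simple_vad(samples, frame_size=160, energy_threshold=0.01):
    """A minimal energy-based voice activity detector: frames whose mean
    squared amplitude is below the threshold are treated as silence/noise
    and dropped, keeping only the valid voice data."""
    voiced = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        energy = sum(x * x for x in frame) / len(frame)
        if energy >= energy_threshold:
            voiced.extend(frame)
    return voiced

# A quiet frame followed by a loud frame: only the loud frame is kept.
quiet = [0.001] * 160
loud = [0.5] * 160
print(len(simple_vad(quiet + loud)))  # 160
```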
- S203: the computer device can convert the user voice text into first syllable information, convert the prompt text data into second syllable information, and compare the first syllable information with the second syllable information to determine the text position of the user voice text in the prompt text data; according to the text position, the target text matching the user voice can be determined in the prompt text data and identified in the teleprompter application.
- For the specific implementation of S202 and S203, reference may be made to S102 in the above-mentioned embodiment corresponding to FIG. 3, which will not be repeated here.
- The number of target users can be one or more, and different target users can correspond to different prompt text data. When the number of target users is one, the determination and display process of the target text in the teleprompter application can refer to S102 in the embodiment corresponding to FIG. 3 above. When the number of target users is multiple, after the computer equipment collects the user voice, voiceprint recognition can be performed on the user voice, the user identity corresponding to the collected user voice can be determined according to the voiceprint recognition result, the target text corresponding to the user voice can be determined in the prompt text data corresponding to that user identity, and the target text can be identified in the teleprompter application.
- Voiceprint recognition may refer to extracting voiceprint features (for example, spectrum, cepstrum, formants, fundamental tone, reflection coefficients, etc.) from the user's voice data; by identifying the voiceprint features, the user identity corresponding to the user voice can be determined, so voiceprint recognition can also be called speaker recognition.
- The following description takes the number of target users as two, that is, the target users include a first user and a second user, and the prompt text data at this time includes a first prompt text corresponding to the first user and a second prompt text corresponding to the second user. The computer device can obtain the user voiceprint feature in the user voice and determine the user identity corresponding to the user voice according to the user voiceprint feature. If the user identity is the first user, the text in the first prompt text that is the same as the user voice text is determined as the target text, and the target text is identified in the teleprompter application; if the user identity is the second user, the text in the second prompt text that is the same as the user voice text is determined as the target text, and the target text is identified in the teleprompter application.
- In a multi-user scenario, the user identity corresponding to the user voice needs to be determined first; then the target text matching the user voice can be determined in the prompt text data corresponding to that user identity and identified, which can improve the effectiveness of the teleprompter function in the teleprompter application.
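A minimal sketch of the multi-user routing described above, assuming voiceprint features are simple fixed-length vectors and identification is nearest-neighbour matching (the enrolled features and helper names are hypothetical; real speaker recognition uses learned embeddings):

```python
# Hypothetical enrolled voiceprint features (e.g. averaged spectral vectors).
ENROLLED = {
    "first user": [0.9, 0.1, 0.3],
    "second user": [0.2, 0.8, 0.5],
}

def identify_speaker(feature):
    """Return the enrolled identity whose voiceprint is closest (Euclidean)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(ENROLLED, key=lambda user: dist(ENROLLED[user], feature))

def route_prompt(feature, prompts):
    """Pick the prompt text belonging to the identified speaker, so that the
    target text is searched only in that speaker's prompt text data."""
    return prompts[identify_speaker(feature)]

prompts = {"first user": "first prompt text", "second user": "second prompt text"}
print(route_prompt([0.85, 0.15, 0.3], prompts))  # first prompt text
```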
- FIG. 13 is a schematic diagram of an application scenario of a teleprompter provided by an embodiment of the present application. Taking the teleprompter scene of a party as an example, the data processing process is described below.
- The lines 90a of the hosts in the party can be edited in advance, and the lines 90a can be uploaded to the teleprompter (understandably, the teleprompter is the device where the above-mentioned teleprompter application is located, which can provide the hosts with a line prompting function). The lines 90a can include the lines of host A and host B. After the teleprompter receives the lines 90a, it can store the lines 90a locally.
- the teleprompter can collect the voice data of all hosts in real time.
- The teleprompter can perform voiceprint recognition on the user voice and determine the user identity corresponding to the collected user voice according to the voiceprint recognition result. When the user identity of the collected user voice is host Xiao A, the teleprompter can search for the target text matching the collected user voice in the lines of host Xiao A (such as "With the warm blessings of winter, full of joyful mood"), and mark "With the warm blessings of winter, full of joyful mood" in the teleprompter.
- When the user identity of the collected user voice is host Xiao B, the teleprompter can search for the target text matching the collected user voice in the lines of host Xiao B (such as "In the past year, we have sweated"), and mark "In the past year, we have sweated" in the teleprompter.
- The teleprompter can identify the sentence that the target user is reading aloud, automatically follow the target user's reading progress, and scroll and display the prompt text data in the teleprompter, which can improve the effectiveness of the text prompt function in the teleprompter.
- FIG. 14 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present application.
- the data processing apparatus can perform the steps in the above-mentioned embodiment corresponding to FIG. 3 .
- the data processing apparatus 1 may include: a startup module 101 , a display module 102 , and an acquisition module 103 ;
- a startup module 101 configured to start a video recording service in the video application in response to a service start operation in the video application;
- the display module 102 is used to collect the user's voice in the video recording service, determine the target text that matches the user's voice in the prompt text data associated with the video recording service, and identify the target text;
- the obtaining module 103 is configured to obtain target video data corresponding to the video recording service when the text position of the target text in the prompt text data is the end position of the prompt text data.
- the specific function implementation manner of the startup module 101 , the display module 102 , and the acquisition module 103 may refer to S101 - S103 in the embodiment corresponding to FIG. 3 , and details will not be repeated here.
- the data processing apparatus 1 may further include: a first recording page display module 104 , an editing module 105 , a first estimated duration display module 106 , a second recording page display module 107 , and a text upload module 108 , the second estimated duration display module 109;
- the first recording page display module 104 is used to display the recording page in the video application in response to the triggering operation for the teleprompter shooting portal in the video application before starting the video recording service in the video application; the recording page includes a text input area;
- the editing module 105 is configured to display the prompt text data determined by the information editing operation in the text input area in response to the information editing operation for the text input area;
- the first estimated duration display module 106 is configured to display the number of prompt texts and the estimated video duration corresponding to the prompt text data in the text input area when the number of prompt texts corresponding to the prompt text data is greater than the quantity threshold.
- the second recording page display module 107 is configured to display the recording page in the video application in response to a trigger operation for the prompting entry in the video application before starting the video recording service in the video application;
- the recording page includes text uploading controls and text input areas;
- the text uploading module 108 is configured to respond to the trigger operation for the text uploading control, determine the text content uploaded to the recording page as prompt text data, and display the prompt text data in the text input area;
- the second estimated duration display module 109 is configured to display the number of prompt texts corresponding to the prompt text data and the estimated video duration corresponding to the prompt text data.
- For the specific implementation of the first recording page display module 104, the editing module 105, the first estimated duration display module 106, the second recording page display module 107, the text upload module 108, and the second estimated duration display module 109, refer to S101 in the embodiment corresponding to FIG. 3 above, which will not be repeated here.
- When the first recording page display module 104, the editing module 105, and the first estimated duration display module 106 are performing corresponding operations, the second recording page display module 107, the text upload module 108, and the second estimated duration display module 109 suspend the execution of operations; when the second recording page display module 107, the text upload module 108, and the second estimated duration display module 109 are performing corresponding operations, the first recording page display module 104, the editing module 105, and the first estimated duration display module 106 suspend the execution of operations.
- The first recording page display module 104 and the second recording page display module 107 may be combined into the same recording page display module; the first estimated duration display module 106 and the second estimated duration display module 109 may be combined into the same estimated duration display module.
- the service activation operation includes a voice activation operation
- the startup module 101 may include: a countdown animation display unit 1011, and a recording service startup unit 1012;
- the countdown animation display unit 1011 is used to display the recording countdown animation associated with the video recording service in the recording page of the video application in response to the voice activation operation in the video application;
- the recording service starting unit 1012 is configured to start and execute the video recording service in the video application when the recording countdown animation ends.
- the specific function implementation manner of the countdown animation display unit 1011 and the recording service initiation unit 1012 may refer to S101 in the embodiment corresponding to FIG. 3 above, which will not be repeated here.
- The recording countdown animation includes an animation cancellation control;
- the data processing apparatus 1 may further include: a countdown animation cancellation module 110;
- The countdown animation cancellation module 110 is configured to, before the video recording service is started and executed when the recording countdown animation ends, respond to a triggering operation for the animation cancellation control, cancel the display of the recording countdown animation, and start and execute the video recording service in the video application.
- the specific function implementation manner of the countdown animation canceling module 110 may refer to S101 in the embodiment corresponding to FIG. 3 above, which will not be repeated here.
- the display module 102 may include: a voice endpoint detection unit 1021, a target text determination unit 1022, and a target text display unit 1023;
- The voice endpoint detection unit 1021 is used to collect the user's initial voice in the video recording service, perform voice endpoint detection on the user's initial voice to obtain the valid voice data in the user's initial voice, and determine the valid voice data as the user voice;
- the target text determining unit 1022 is used to convert the user's voice into the user's voice and text, perform text matching on the prompt text data associated with the user's voice text and the video recording service, and determine the target text that matches the user's voice text in the prompt text data;
- the target text display unit 1023 is configured to identify the target text in the recording page of the video recording service.
- the specific function implementation of the voice endpoint detection unit 1021 , the target text determination unit 1022 , and the target text display unit 1023 may refer to S102 in the embodiment corresponding to FIG. 3 , and will not be repeated here.
- the target text determination unit 1022 may include: a syllable information acquisition subunit 10221, and a syllable matching subunit 10222;
- the syllable information obtaining subunit 10221 is used to obtain the first syllable information of the user's voice text, and obtain the second syllable information of the prompt text data associated with the video recording service;
- the syllable matching subunit 10222 is configured to obtain the same target syllable information as the first syllable information in the second syllable information, and determine the target text corresponding to the target syllable information in the prompt text data.
- the specific function implementation manner of the syllable information acquisition subunit 10221 and the syllable matching subunit 10222 may refer to S102 in the embodiment corresponding to FIG. 3 above, which will not be repeated here.
- the target text display unit 1023 may include: a prompt area determination subunit 10231, and an identification subunit 10232;
- The prompt area determination subunit 10231 is used to determine the text prompt area corresponding to the target text in the recording page of the video recording service;
- the identification subunit 10232 is configured to identify the target text in the text prompt area according to the text position of the target text in the prompt text data.
- the specific function implementation manner of the prompt area determination subunit 10231 and the identification subunit 10232 may refer to S102 in the embodiment corresponding to FIG. 3 above, which will not be repeated here.
- the recording page includes a cancel recording control
- the data processing device 1 may further include: a recording cancellation module 111, a recording prompt information display module 112, and a re-recording module 113;
- the recording cancellation module 111 is used to cancel the video recording service and delete the video data recorded by the video recording service in response to the triggering operation for the cancellation recording control;
- the recording prompt information display module 112 is used to generate the recording prompt information for the video recording service, and display the recording prompt information on the recording page; the recording prompt information includes a re-recording control;
- the re-recording module 113 is configured to switch and display the target text displayed on the recording page as prompt text data in response to the triggering operation for the re-recording control.
- the specific function implementation of the recording cancellation module 111 , the recording prompt information display module 112 , and the re-recording module 113 can be referred to S102 in the embodiment corresponding to FIG. 3 , which will not be repeated here.
- the recording page includes a complete recording control
- the data processing apparatus 1 may include: a recording completion module 114;
- the recording completion module 114 is used to, before the target video data corresponding to the video recording service is acquired when the text position of the target text in the prompt text data is the end position of the prompt text data, stop the video recording service in response to the trigger operation for the complete recording control, and determine the video data recorded by the video recording service as the target video data.
- the acquisition module 103 may include: an original video acquisition unit 1031, an optimization control display unit 1032, an optimization mode display unit 1033, and an optimization processing unit 1034;
- the original video acquisition unit 1031 is used to stop the video recording service when the text position of the target text in the prompt text data is the end position of the prompt text data, and determine the video data recorded by the video recording service as the original video data;
- the optimization control display unit 1032 is used to display the original video data, and the editing optimization control corresponding to the original video data, in the editing page of the video application;
- the optimization mode display unit 1033 is used to respond to the trigger operation for the editing optimization control, and display M editing optimization modes for the original video data; M is a positive integer;
- the optimization processing unit 1034 is configured to, in response to the selection operation for the M editing optimization modes, perform editing optimization processing on the original video data according to the editing optimization mode determined by the selection operation to obtain target video data.
- the specific function implementation manner of the original video acquisition unit 1031, the optimization control display unit 1032, the optimization mode display unit 1033, and the optimization processing unit 1034 may refer to S103 in the embodiment corresponding to FIG. 3 above, which will not be repeated here.
- the optimization processing unit 1034 may include: a first speech conversion subunit 10341, a text comparison subunit 10342, a speech deletion subunit 10343, a second speech conversion subunit 10344, a timestamp acquisition subunit 10345, and a speech pause segment determination subunit 10346;
- the first voice conversion subunit 10341 is used to obtain the target voice data contained in the original video data if the clipping optimization mode determined by the selection operation is the first clipping mode, and convert the target voice data into the target text result;
- the text comparison subunit 10342 is used to perform text comparison between the target text result and the prompt text data, and determine the text that is different from the prompt text data in the target text result as the wrong text;
- the voice deletion subunit 10343 is used to delete the voice data corresponding to the wrong text in the original video data to obtain the target video data.
- the second voice conversion subunit 10344 is configured to, if the clipping optimization mode determined by the selection operation is the second clipping mode, convert the target voice data contained in the original video data into a target text result, and determine the text in the target text result that differs from the prompt text data as the error text;
- the timestamp obtaining subunit 10345 is used to divide the target text result into N text characters, and obtain the timestamps of the N text characters in the target speech data respectively; N is a positive integer;
- the speech pause segment determination subunit 10346 is configured to determine the speech pause segment in the target speech data according to the timestamp, delete the speech pause segment and the speech data corresponding to the wrong text in the original video data, and obtain the target video data.
- the specific function implementation manner of the first speech conversion subunit 10341, the text comparison subunit 10342, the speech deletion subunit 10343, the second speech conversion subunit 10344, the timestamp acquisition subunit 10345, and the speech pause segment determination subunit 10346 may refer to S103 in the embodiment corresponding to FIG. 3 above, and details are not repeated here.
- when the first speech conversion subunit 10341, the text comparison subunit 10342, and the speech deletion subunit 10343 perform their corresponding operations, the second speech conversion subunit 10344, the timestamp acquisition subunit 10345, and the speech pause segment determination subunit 10346 all suspend operation; conversely, when the second speech conversion subunit 10344, the timestamp acquisition subunit 10345, and the speech pause segment determination subunit 10346 perform their corresponding operations, the first speech conversion subunit 10341, the text comparison subunit 10342, and the speech deletion subunit 10343 all suspend operation.
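The second clipping mode above (error-text detection plus timestamp-based pause removal, subunits 10344-10346) can be sketched as follows. This is an illustrative Python sketch only, not the disclosed implementation: the per-character timestamps are assumed to come from an external speech recognizer, the `clip_segments` helper and the 1.0 s pause threshold are invented for illustration, and the "error text" test here is deliberately simplistic:

```python
def clip_segments(chars, prompt_text, pause_gap=1.0):
    """Return time segments to cut: speech for characters that differ
    from the prompt text, plus long pauses between characters.
    chars: list of (character, start_time, end_time) from ASR."""
    cuts = []
    for i, (ch, start, end) in enumerate(chars):
        if ch not in prompt_text:          # simplistic "error text" test
            cuts.append((start, end))
        if i + 1 < len(chars):
            gap_start, gap_end = end, chars[i + 1][1]
            if gap_end - gap_start > pause_gap:  # long silence → pause segment
                cuts.append((gap_start, gap_end))
    return cuts

# Toy transcript "你好呃世界" against prompt "你好世界":
# "呃" is a filler (error text) and a 1.5 s pause follows it.
chars = [("你", 0.0, 0.3), ("好", 0.3, 0.6),
         ("呃", 0.6, 0.9), ("世", 2.4, 2.7), ("界", 2.7, 3.0)]
print(clip_segments(chars, "你好世界"))  # → [(0.6, 0.9), (0.9, 2.4)]
```

A real editor would then cut these segments from the video track; the first clipping mode corresponds to returning only the error-text segments and skipping the pause check.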
- the data processing apparatus 1 may further include: a user speech rate determination module 115, and a speech rate prompt information display module 116;
- the user's speech rate determination module 115 is used to obtain the speech duration corresponding to the user's initial speech and the number of speech characters contained in the user's initial speech, and determine the ratio of the number of speech characters to the speech duration as the user's speech rate;
- the speech rate prompt information display module 116 is used to display the speech rate prompt information on the recording page when the user's speech rate is greater than the speech rate threshold; the speech rate prompt information is used to prompt the target user associated with the video recording service to reduce the user's speech rate.
- the specific function implementation of the user speech rate determination module 115 and the speech rate prompt information display module 116 may refer to S102 in the embodiment corresponding to FIG. 3 above, which will not be repeated here.
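The speech-rate check performed by modules 115 and 116 reduces to a ratio and a threshold comparison. The following Python sketch is illustrative only; the function name, the 5 characters-per-second threshold, and the prompt wording are assumptions, not values from the disclosure:

```python
def speech_rate_prompt(num_chars, duration_s, threshold=5.0):
    """User speech rate = voice character count / voice duration;
    above the threshold, return a slow-down prompt (sketch of
    modules 115/116)."""
    rate = num_chars / duration_s
    if rate > threshold:
        return rate, "You are speaking too fast - please slow down."
    return rate, None

rate, msg = speech_rate_prompt(num_chars=33, duration_s=5.5)  # 6 chars/s
print(round(rate, 1), msg)
```

When `msg` is not `None`, the recording page would display it as the speech rate prompt information.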
- the error text includes K error sub-texts, and K is a positive integer
- the data processing device 1 may further include: an error frequency determination module 117, an error type identification module 118, and a tutorial video push module 119;
- the error frequency determination module 117 is used to determine the error frequency in the video recording service according to the video duration corresponding to the K error subtexts and the original video data;
- the error type identification module 118 is used to identify the speech error types corresponding to the K error sub-texts respectively when the error frequency is greater than the error threshold;
- the tutorial video push module 119 is configured to push the tutorial video associated with the speech error type to the target user associated with the video recording service in the video application.
- the specific function implementation manner of the error frequency determination module 117 , the error type identification module 118 , and the tutorial video push module 119 may refer to S103 in the embodiment corresponding to FIG. 3 , which will not be repeated here.
- In this embodiment of the present application, the video recording service can be started by voice, and a word prompting function can be provided to the user during the recording process of the video recording service; the target text matching the user's voice is determined in the prompt text data and identified in the video application, that is, the target text displayed in the video application matches the content of the user's speech, which can improve the effectiveness of the text prompt function in the video recording service and reduce the risk of recording failure caused by forgetting words, thereby improving the quality of the recorded video; starting or stopping the video recording service through the user's voice can reduce user operations in the video recording service and improve the video recording experience; and after the video recording service ends, the recorded video can be automatically edited and optimized, which can further improve the quality of the recorded video.
- FIG. 15 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present application.
- the data processing apparatus can perform the steps in the embodiment corresponding to FIG. 12 .
- the data processing apparatus 2 can include: a prompt text uploading module 21 , a user voice collection module 22 , and a user voice text display module 23 ;
- the prompt text uploading module 21 is used to upload the prompt text data to the prompting application
- the user voice collection module 22 is used to collect the user voice corresponding to the target user, perform text conversion on the user voice, and generate the user voice text corresponding to the user voice;
- the user voice text display module 23 is configured to determine the same text as the user voice text as the target text in the prompt text data, and identify the target text in the prompting application.
- the specific implementation of the prompt text uploading module 21, the user voice collection module 22, and the user voice text display module 23 can refer to S201-S203 in the embodiment corresponding to FIG. 12, which will not be repeated here.
- the target user includes a first user and a second user
- the prompt text data includes a first prompt text corresponding to the first user and a second prompt text corresponding to the second user
- the user voice and text display module 23 may include: a user identity determination unit 231, a first determination unit 232, and a second determination unit 233;
- the user identity determination unit 231 is used to obtain the user voiceprint feature in the user voice, and determine the user identity corresponding to the user voice according to the user voiceprint feature;
- the first determining unit 232 is used for, if the user identity is the first user, in the first prompt text, determine the text identical to the user's voice text as the target text, and identify the target text in the prompting application;
- the second determining unit 233 is configured to determine, in the second prompt text, the same text as the user's voice text as the target text if the user identity is the second user, and identify the target text in the prompting application.
- the specific implementation manner of the user identity determination unit 231, the first determination unit 232, and the second determination unit 233 may refer to S203 in the embodiment corresponding to FIG. 12, which will not be repeated here.
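When two users share one teleprompter, units 231-233 select the prompt text by the speaker's voiceprint. The sketch below is illustrative only: real voiceprint features would come from an external speaker-embedding model, and the toy 2-dimensional embeddings, cosine-similarity decision rule, and function name are assumptions:

```python
import math

def pick_prompt_text(voice_embedding, enrolled, prompts):
    """Choose which user's prompt text to search by comparing the
    utterance's voiceprint embedding against enrolled embeddings
    with cosine similarity (sketch of units 231-233)."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    identity = max(enrolled, key=lambda u: cosine(voice_embedding, enrolled[u]))
    return identity, prompts[identity]

# Toy enrollment: each user has a 2-d "voiceprint" for illustration.
enrolled = {"first_user": [1.0, 0.1], "second_user": [0.1, 1.0]}
prompts = {"first_user": "第一提示文本", "second_user": "第二提示文本"}
who, text = pick_prompt_text([0.9, 0.2], enrolled, prompts)
print(who, text)  # → first_user 第一提示文本
```

The target text search (same-text matching as in unit 232/233) would then run only within the selected user's prompt text.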
- In this embodiment of the present application, the teleprompter can identify the sentence that the target user is reading aloud and, by recognizing the target user's voice and following the target user's reading progress, scroll and display the prompt text data in the teleprompter, which can improve the effectiveness of the text prompt function in the teleprompter.
- FIG. 16 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
- the computer device 1000 may include: a processor 1001 , a network interface 1004 and a memory 1005 , in addition, the above-mentioned computer device 1000 may further include: a user interface 1003 , and at least one communication bus 1002 .
- the communication bus 1002 is used to realize the connection and communication between these components.
- the user interface 1003 may include a display screen (Display) and a keyboard (Keyboard), and the optional user interface 1003 may also include a standard wired interface and a wireless interface.
- the network interface 1004 may include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface).
- the memory 1005 may be high-speed RAM memory or non-volatile memory, such as at least one disk memory.
- the memory 1005 may also be at least one storage device located remotely from the aforementioned processor 1001 .
- the memory 1005 as a computer-readable storage medium may include an operating system, a network communication module, a user interface module, and a device control application program.
- the network interface 1004 can provide a network communication function;
- the user interface 1003 is mainly used to provide an input interface for the user; and
- the processor 1001 can be used to call the device control application stored in the memory 1005 to achieve:
- Start the video recording service in the video application in response to a service activation operation in the video application; collect the user's voice in the video recording service, determine the target text that matches the user's voice in the prompt text data associated with the video recording service, and identify the target text; and when the text position of the target text in the prompt text data is the end position of the prompt text data, acquire the target video data corresponding to the video recording service.
- The computer device 1000 described in the embodiment of the present application can execute the description of the data processing method in the embodiment corresponding to FIG. 3 above, and can also execute the description of the data processing apparatus 1 in the embodiment corresponding to FIG. 14 above, which is not repeated here. In addition, the description of the beneficial effects of using the same method will not be repeated.
- FIG. 17 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
- the computer device 2000 may include: a processor 2001 , a network interface 2004 and a memory 2005 , in addition, the above-mentioned computer device 2000 may further include: a user interface 2003 , and at least one communication bus 2002 .
- the communication bus 2002 is used to realize the connection and communication between these components.
- the user interface 2003 may include a display screen (Display) and a keyboard (Keyboard), and the optional user interface 2003 may also include a standard wired interface and a wireless interface.
- the network interface 2004 may include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface).
- the memory 2005 may be high-speed RAM memory or non-volatile memory, such as at least one disk memory.
- the memory 2005 may also be at least one storage device located remotely from the aforementioned processor 2001 .
- the memory 2005 as a computer-readable storage medium may include an operating system, a network communication module, a user interface module, and a device control application program.
- the network interface 2004 can provide network communication functions;
- the user interface 2003 is mainly used to provide an input interface for the user; and
- the processor 2001 can be used to call the device control application stored in the memory 2005 to achieve:
- Upload the prompt text data to the prompting application; collect the user voice corresponding to the target user, perform text conversion on the user voice, and generate the user voice text corresponding to the user voice; and in the prompt text data, determine the same text as the user voice text as the target text, and identify the target text in the prompting application.
- The computer device 2000 described in the embodiment of the present application can execute the description of the data processing method in the embodiment corresponding to FIG. 12 above, and can also execute the description of the data processing apparatus 2 in the embodiment corresponding to FIG. 15 above, which is not repeated here. In addition, the description of the beneficial effects of using the same method will not be repeated.
- The embodiment of the present application further provides a computer-readable storage medium, which stores a computer program executed by the aforementioned data processing apparatus 1, the computer program comprising program instructions; when the processor executes the program instructions, it can execute the description of the data processing method in any one of the embodiments corresponding to FIG. 3, FIG. 11, and FIG. 12.
- the description of the beneficial effects of using the same method will not be repeated.
- program instructions may be deployed for execution on one computing device, or on multiple computing devices located at one site, or alternatively, distributed across multiple sites and interconnected by a communications network.
- multiple computing devices distributed in multiple locations and interconnected by a communication network can form a blockchain system.
- the embodiments of the present application further provide a computer program product or computer program
- the computer program product or computer program may include computer instructions, and the computer instructions may be stored in a computer-readable storage medium.
- the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor can execute the computer instructions, so that the computer device executes the data processing method in any of the corresponding embodiments of FIG. 3 , FIG. 11 and FIG. 12 . Therefore, it will not be repeated here.
- the description of the beneficial effects of using the same method will not be repeated.
- the storage medium may be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM) or the like.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- Computer Security & Cryptography (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Television Signal Processing For Recording (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Management Or Editing Of Information On Record Carriers (AREA)
- Signal Processing For Digital Recording And Reproducing (AREA)
- User Interface Of Digital Computer (AREA)
Claims (22)
- 1. A data processing method, executed by a computer device, the method comprising: starting a video recording service in a video application in response to a service activation operation in the video application; collecting a user voice in the video recording service, determining, in prompt text data associated with the video recording service, a target text matching the user voice, and identifying the target text; and when a text position of the target text in the prompt text data is an end position of the prompt text data, acquiring target video data corresponding to the video recording service.
- 2. The method according to claim 1, wherein before the starting a video recording service in the video application, the method further comprises: displaying a recording page in the video application in response to a trigger operation for a teleprompter shooting entry in the video application, the recording page comprising a text input area; displaying, in the text input area, prompt text data determined by an information editing operation in response to the information editing operation for the text input area; and when a prompt character count corresponding to the prompt text data is greater than a count threshold, displaying, in the text input area, the prompt character count and an estimated video duration corresponding to the prompt text data.
- 3. The method according to claim 1, wherein before the starting a video recording service in the video application, the method further comprises: displaying a recording page in the video application in response to a trigger operation for a teleprompter shooting entry in the video application, the recording page comprising a text upload control and a text input area; determining, in response to a trigger operation for the text upload control, text content uploaded to the recording page as prompt text data, and displaying the prompt text data in the text input area; and displaying a prompt character count corresponding to the prompt text data and an estimated video duration corresponding to the prompt text data.
- 4. The method according to claim 1, wherein the service activation operation comprises a voice activation operation; and the starting a video recording service in the video application in response to a service activation operation in the video application comprises: displaying, in a recording page of the video application, a recording countdown animation associated with the video recording service in response to the voice activation operation in the video application; and starting and executing the video recording service in the video application when the recording countdown animation ends.
- 5. The method according to claim 4, wherein the recording countdown animation comprises an animation cancel control; and before the starting and executing the video recording service in the video application when the recording countdown animation ends, the method further comprises: canceling display of the recording countdown animation in response to a trigger operation for the animation cancel control, and starting and executing the video recording service in the video application.
- 6. The method according to claim 1, wherein the collecting a user voice in the video recording service, determining, in prompt text data associated with the video recording service, a target text matching the user voice, and identifying the target text comprises: collecting a user initial voice in the video recording service, performing voice endpoint detection on the user initial voice to obtain valid voice data in the user initial voice, and determining the valid voice data as the user voice; converting the user voice into a user voice text, performing text matching on the user voice text and the prompt text data associated with the video recording service, and determining, in the prompt text data, the target text matching the user voice text; and identifying the target text in a recording page of the video recording service.
- 7. The method according to claim 6, wherein the performing text matching on the user voice text and the prompt text data associated with the video recording service, and determining, in the prompt text data, the target text matching the user voice text comprises: acquiring first syllable information of the user voice text, and acquiring second syllable information of the prompt text data associated with the video recording service; and acquiring, in the second syllable information, target syllable information identical to the first syllable information, and determining, in the prompt text data, the target text corresponding to the target syllable information.
- 8. The method according to claim 6, wherein the identifying the target text in a recording page of the video recording service comprises: determining a text prompt area corresponding to the target text in the recording page of the video recording service; and identifying the target text in the text prompt area according to a text position of the target text in the prompt text data.
- 9. The method according to any one of claims 4 to 8, wherein the recording page comprises a cancel recording control; and the method further comprises: canceling the video recording service and deleting video data recorded by the video recording service in response to a trigger operation for the cancel recording control; generating recording prompt information for the video recording service, and displaying the recording prompt information on the recording page, the recording prompt information comprising a re-recording control; and switching, in response to a trigger operation for the re-recording control, the target text displayed on the recording page to be displayed as the prompt text data.
- 10. The method according to any one of claims 4 to 8, wherein the recording page comprises a complete recording control; and before the acquiring target video data corresponding to the video recording service when the text position of the target text in the prompt text data is the end position of the prompt text data, the method further comprises: stopping the video recording service in response to a trigger operation for the complete recording control, and determining video data recorded by the video recording service as the target video data.
- 11. The method according to claim 1, wherein the acquiring target video data corresponding to the video recording service when the text position of the target text in the prompt text data is the end position of the prompt text data comprises: stopping the video recording service when the text position of the target text in the prompt text data is the end position of the prompt text data, and determining video data recorded by the video recording service as original video data; displaying the original video data and a clip optimization control corresponding to the original video data in a clip page of the video application; displaying M clip optimization manners for the original video data in response to a trigger operation for the clip optimization control, M being a positive integer; and performing, in response to a selection operation for the M clip optimization manners, clip optimization processing on the original video data according to a clip optimization manner determined by the selection operation, to obtain the target video data.
- 12. The method according to claim 11, wherein the performing clip optimization processing on the original video data according to a clip optimization manner determined by the selection operation, to obtain the target video data comprises: acquiring, if the clip optimization manner determined by the selection operation is a first clip manner, target voice data comprised in the original video data, and converting the target voice data into a target text result; performing text comparison between the target text result and the prompt text data, and determining a text in the target text result that is different from the prompt text data as an error text; and deleting voice data corresponding to the error text from the original video data to obtain the target video data.
- 13. The method according to claim 11, wherein the performing clip optimization processing on the original video data according to a clip optimization manner determined by the selection operation, to obtain the target video data comprises: converting, if the clip optimization manner determined by the selection operation is a second clip manner, target voice data comprised in the original video data into a target text result, and determining a text in the target text result that is different from the prompt text data as an error text; dividing the target text result into N text characters, and acquiring timestamps of the N text characters respectively in the target voice data, N being a positive integer; and determining a voice pause segment in the target voice data according to the timestamps, and deleting the voice pause segment and voice data corresponding to the error text from the original video data to obtain the target video data.
- 14. The method according to claim 6, wherein in a process of executing the video recording service, the method further comprises: acquiring a voice duration corresponding to the user initial voice and a voice character count comprised in the user initial voice, and determining a ratio of the voice character count to the voice duration as a user speech rate; and displaying speech rate prompt information on the recording page when the user speech rate is greater than a speech rate threshold, the speech rate prompt information being used for prompting a target user associated with the video recording service to reduce the user speech rate.
- 15. The method according to any one of claims 12 to 13, wherein the error text comprises K error sub-texts, K being a positive integer; and the method further comprises: determining an error frequency in the video recording service according to the K error sub-texts and a video duration corresponding to the original video data; identifying, when the error frequency is greater than an error threshold, speech error types respectively corresponding to the K error sub-texts; and pushing, in the video application, a tutorial video associated with the speech error types to a target user associated with the video recording service.
- 16. A data processing method, executed by a computer device, the method comprising: uploading prompt text data to a prompting application; collecting a user voice corresponding to a target user, performing text conversion on the user voice, and generating a user voice text corresponding to the user voice; and determining, in the prompt text data, a text identical to the user voice text as a target text, and identifying the target text in the prompting application.
- 17. The method according to claim 16, wherein the target user comprises a first user and a second user, and the prompt text data comprises a first prompt text corresponding to the first user and a second prompt text corresponding to the second user; and the determining, in the prompt text data, a text identical to the user voice text as a target text, and identifying the target text in the prompting application comprises: acquiring a user voiceprint feature in the user voice, and determining a user identity corresponding to the user voice according to the user voiceprint feature; determining, if the user identity is the first user, a text identical to the user voice text in the first prompt text as the target text, and identifying the target text in the prompting application; and determining, if the user identity is the second user, a text identical to the user voice text in the second prompt text as the target text, and identifying the target text in the prompting application.
- 18. A data processing apparatus, deployed on a computer device, the apparatus comprising: a start module, configured to start a video recording service in a video application in response to a service activation operation in the video application; a display module, configured to collect a user voice in the video recording service, determine, in prompt text data associated with the video recording service, a target text matching the user voice, and identify the target text; and an acquisition module, configured to acquire target video data corresponding to the video recording service when a text position of the target text in the prompt text data is an end position of the prompt text data.
- 19. A data processing apparatus, deployed on a computer device, the apparatus comprising: a prompt text uploading module, configured to upload prompt text data to a prompting application; a user voice collection module, configured to collect a user voice corresponding to a target user, perform text conversion on the user voice, and generate a user voice text corresponding to the user voice; and a user voice text display module, configured to determine, in the prompt text data, a text identical to the user voice text as a target text, and identify the target text in the prompting application.
- 20. A computer device, comprising a memory and a processor, the memory being connected to the processor, the memory being configured to store a computer program, and the processor being configured to invoke the computer program, so that the computer device executes the method according to any one of claims 1 to 15, or executes the method according to any one of claims 16 to 17.
- 21. A computer-readable storage medium, storing a computer program adapted to be loaded and executed by a processor, so that a computer device having the processor executes the method according to any one of claims 1 to 15, or executes the method according to any one of claims 16 to 17.
- 22. A computer program product, configured to, when executed, execute the method according to any one of claims 1 to 15, or execute the method according to any one of claims 16 to 17.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2023547594A JP2024509710A (ja) | 2021-02-08 | 2022-01-28 | データ処理方法、装置、機器、及びコンピュータプログラム |
KR1020237019353A KR20230106170A (ko) | 2021-02-08 | 2022-01-28 | 데이터 처리 방법 및 장치, 디바이스, 및 매체 |
US17/989,620 US12041313B2 (en) | 2021-02-08 | 2022-11-17 | Data processing method and apparatus, device, and medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110179007.4 | 2021-02-08 | ||
CN202110179007.4A CN114911448A (zh) | 2021-02-08 | 2021-02-08 | 数据处理方法、装置、设备以及介质 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/989,620 Continuation US12041313B2 (en) | 2021-02-08 | 2022-11-17 | Data processing method and apparatus, device, and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022166801A1 true WO2022166801A1 (zh) | 2022-08-11 |
Family
ID=82741977
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/074513 WO2022166801A1 (zh) | 2021-02-08 | 2022-01-28 | 数据处理方法、装置、设备以及介质 |
Country Status (5)
Country | Link |
---|---|
US (1) | US12041313B2 (zh) |
JP (1) | JP2024509710A (zh) |
KR (1) | KR20230106170A (zh) |
CN (1) | CN114911448A (zh) |
WO (1) | WO2022166801A1 (zh) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102385176B1 (ko) * | 2021-11-16 | 2022-04-14 | 주식회사 하이 | 심리 상담 장치 및 그 방법 |
CN117975949B (zh) * | 2024-03-28 | 2024-06-07 | 杭州威灿科技有限公司 | 基于语音转换的事件记录方法、装置、设备及介质 |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180061256A1 (en) * | 2016-01-25 | 2018-03-01 | Wespeke, Inc. | Automated digital media content extraction for digital lesson generation |
CN111372119A (zh) * | 2020-04-17 | 2020-07-03 | 维沃移动通信有限公司 | 多媒体数据录制方法、装置及电子设备 |
CN112738618A (zh) * | 2020-12-28 | 2021-04-30 | 北京达佳互联信息技术有限公司 | 视频录制方法、装置及电子设备 |
CN113301362A (zh) * | 2020-10-16 | 2021-08-24 | 阿里巴巴集团控股有限公司 | 视频元素展示方法及装置 |
- 2021
- 2021-02-08: CN application CN202110179007.4A filed; publication CN114911448A (pending)
- 2022
- 2022-01-28: PCT application PCT/CN2022/074513 filed (application filing)
- 2022-01-28: JP application JP2023547594A filed (pending)
- 2022-01-28: KR application KR1020237019353A filed
- 2022-11-17: US application US17/989,620 filed; publication US12041313B2 (active)
Also Published As
Publication number | Publication date |
---|---|
CN114911448A (zh) | 2022-08-16 |
US20230109852A1 (en) | 2023-04-13 |
US12041313B2 (en) | 2024-07-16 |
JP2024509710A (ja) | 2024-03-05 |
KR20230106170A (ko) | 2023-07-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110634483B (zh) | 人机交互方法、装置、电子设备及存储介质 | |
US12020708B2 (en) | Method and system for conversation transcription with metadata | |
US10769495B2 (en) | Collecting multimodal image editing requests | |
WO2022166801A1 (zh) | 数据处理方法、装置、设备以及介质 | |
US20220343918A1 (en) | Systems and methods for live broadcasting of context-aware transcription and/or other elements related to conversations and/or speeches | |
JP4875752B2 (ja) | 編集可能なオーディオストリームにおける音声の認識 | |
CN111050201B (zh) | 数据处理方法、装置、电子设备及存储介质 | |
CN109859298B (zh) | 一种图像处理方法及其装置、设备和存储介质 | |
JP6280312B2 (ja) | 議事録記録装置、議事録記録方法及びプログラム | |
CN112188266A (zh) | 视频生成方法、装置及电子设备 | |
CN113343675B (zh) | 一种字幕生成方法、装置和用于生成字幕的装置 | |
TWI807428B (zh) | 一同管理與語音檔有關的文本轉換記錄和備忘錄的方法、系統及電腦可讀記錄介質 | |
CN112581965A (zh) | 转写方法、装置、录音笔和存储介质 | |
CN114154459A (zh) | 语音识别文本处理方法、装置、电子设备及存储介质 | |
US20230326369A1 (en) | Method and apparatus for generating sign language video, computer device, and storage medium | |
WO2023213314A1 (zh) | 用于编辑音频的方法、装置、设备和存储介质 | |
CN112837668B (zh) | 一种语音处理方法、装置和用于处理语音的装置 | |
CN115811665A (zh) | 一种视频生成方法、装置、终端设备及存储介质 | |
WO2021017302A1 (zh) | 一种数据提取方法、装置、计算机系统及可读存储介质 | |
KR102599001B1 (ko) | 템플릿 기반 회의문서 생성장치 및 그 방법 | |
KR102446300B1 (ko) | 음성 기록을 위한 음성 인식률을 향상시키는 방법, 시스템, 및 컴퓨터 판독가능한 기록 매체 | |
KR20170130198A (ko) | 모바일 기반의 실시간 대본 리딩 시스템 및 방법 | |
KR20170088255A (ko) | 온라인 기반의 연기자 대본 리딩을 위한 전자 대본 제공 시스템 및 방법 | |
CN113918114A (zh) | 文档控制方法、装置、计算机设备和存储介质 | |
CN118675555A (zh) | 音频处理方法、装置、电子设备及存储介质 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22749080; Country of ref document: EP; Kind code of ref document: A1
| ENP | Entry into the national phase | Ref document number: 20237019353; Country of ref document: KR; Kind code of ref document: A
| WWE | Wipo information: entry into national phase | Ref document number: 2023547594; Country of ref document: JP
| NENP | Non-entry into the national phase | Ref country code: DE
| 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 08/12/2023)
| 122 | Ep: pct application non-entry in european phase | Ref document number: 22749080; Country of ref document: EP; Kind code of ref document: A1