CN112906650A - Intelligent processing method, device and equipment for teaching video and storage medium - Google Patents


Info

Publication number
CN112906650A
CN112906650A · Application: CN202110315710.3A · Granted publication: CN112906650B
Authority
CN
China
Prior art keywords
action
processing result
teaching
mouth shape
language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110315710.3A
Other languages
Chinese (zh)
Other versions
CN112906650B (en)
Inventor
梁嘉兴 (Liang Jiaxing)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110315710.3A
Publication of CN112906650A
Application granted
Publication of CN112906650B
Legal status: Active (granted)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70: Information retrieval of video data
    • G06F 16/75: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70: Information retrieval of video data
    • G06F 16/78: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/7867: Retrieval characterised by using manually generated information, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/23: Clustering techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20: Movements or behaviour, e.g. gesture recognition
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Psychiatry (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Library & Information Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Machine Translation (AREA)

Abstract

The disclosure provides an intelligent processing method, apparatus, and device for teaching videos, and a storage medium, and relates to the technical field of computers, in particular to the technical field of online teaching. The specific implementation scheme is as follows: performing language form processing on the teaching audio in a teaching video to obtain a language form processing result of the teaching audio; respectively performing action type and mouth shape type processing on the teaching images in the teaching video to obtain an action type processing result and a mouth shape type processing result of the teaching images; and performing a cross check on at least two of the language form processing result, the action type processing result, and the mouth shape type processing result to obtain a teaching video processing result. Embodiments of the disclosure can improve the processing efficiency of teaching videos.

Description

Intelligent processing method, device and equipment for teaching video and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an intelligent processing method, apparatus, device, and storage medium for teaching videos.
Background
With the development of computer technology, users can learn over the Internet in an electronic environment built from communication technology, microcomputer technology, computer technology, artificial intelligence, network technology, multimedia technology, and the like.
In an online learning scenario, a teacher may pre-record a teaching video. How such teaching videos are processed is therefore very important.
Disclosure of Invention
The present disclosure provides an intelligent processing method, apparatus, device and storage medium for teaching video.
According to an aspect of the present disclosure, there is provided an intelligent processing method of a teaching video, including:
performing language form processing on teaching audio in a teaching video to obtain a language form processing result of the teaching audio;
respectively carrying out action type and mouth shape type processing on the teaching image in the teaching video to obtain an action type processing result and a mouth shape type processing result of the teaching image;
and performing a cross check on at least two of the language form processing result, the action type processing result, and the mouth shape type processing result to obtain a teaching video processing result.
According to another aspect of the present disclosure, there is provided an intelligent processing device for teaching video, comprising:
the language form processing module is used for carrying out language form processing on the teaching audio in the teaching video to obtain a language form processing result of the teaching audio;
the action mouth shape processing module is used for respectively carrying out action type and mouth shape type processing on the teaching image in the teaching video to obtain an action type processing result and a mouth shape type processing result of the teaching image;
and the cross checking module is used for performing a cross check on at least two of the language form processing result, the action type processing result, and the mouth shape type processing result to obtain a teaching video processing result.
According to still another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method for intelligent processing of instructional video provided by any of the embodiments of the disclosure.
According to yet another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method for intelligent processing of instructional video provided by any of the embodiments of the present disclosure.
According to yet another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the intelligent processing method of instructional videos provided by any of the embodiments of the present disclosure.
According to the technology of the present disclosure, the processing efficiency of teaching videos can be improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of a method for intelligent processing of instructional videos, according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of another intelligent processing method of instructional video, according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of yet another intelligent processing method for instructional videos, according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of an intelligent processing device for instructional videos, according to an embodiment of the present disclosure;
fig. 5 is a block diagram of an electronic device for implementing the intelligent processing method of instructional video according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The scheme provided by the embodiment of the disclosure is described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an intelligent processing method for teaching videos according to an embodiment of the present disclosure, which is applicable to a case of processing audio and images in teaching videos. The method can be executed by an intelligent processing device for teaching videos, which can be realized by hardware and/or software and can be configured in electronic equipment. Referring to fig. 1, the method specifically includes the following steps:
s110, performing language form processing on teaching audio in a teaching video to obtain a language form processing result of the teaching audio;
s120, respectively carrying out action type and mouth shape type processing on the teaching image in the teaching video to obtain an action type processing result and a mouth shape type processing result of the teaching image;
s130, performing cross check on at least two items of the language form processing result, the action type processing result and the mouth type processing result to obtain a teaching video processing result.
The teaching video can be a video which is pre-recorded by a teacher and is used for students to learn online. The teaching video can comprise teaching audio and teaching images, and the teaching audio and the teaching images can be associated through timestamps, namely the teaching audio and the teaching images associated with the same moment are associated with each other.
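For concreteness, such a timestamp association can be pictured as one time-aligned record per segment. The following is a minimal Python sketch; the `Segment` structure and all field names are illustrative assumptions, not part of the disclosure:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """One time-aligned slice of the teaching video (hypothetical structure).

    A word, an action, and a mouth shape that carry the same timestamp
    range are considered associated with one another.
    """
    start_ms: int      # segment start time in milliseconds
    end_ms: int        # segment end time in milliseconds
    word: str          # word recognized from the teaching audio
    action: str        # action recognized from the teaching image
    mouth_shape: str   # mouth shape recognized from the teaching image

# The word, action, and mouth shape at 12.0-12.5 s are mutually associated.
seg = Segment(12000, 12500, word="theorem",
              action="point_at_board", mouth_shape="open_mid")
```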
The language form can include a spoken language form and a written language form, where the spoken language form refers to spoken filler words, pet phrases (verbal tics), and the like. The action types may include valid actions and invalid actions: a valid action is one that is essential to the teaching process, and an invalid action is one that is not. The mouth shape types may include valid and invalid mouth shapes: a valid mouth shape is one that is essential to the teaching process, and an invalid mouth shape is one that is not.
Specifically, the teaching audio in the teaching video can be obtained and subjected to language form processing to obtain word sets belonging to different language forms; and the teaching images in the teaching video can be obtained, the actions and mouth shapes in them recognized and their types determined, yielding action sets of different types and mouth shape sets of different types.
The cross check selects at least one of the language form processing result, the action type processing result, and the mouth shape type processing result as a verification standard, selects at least one of the others as a verification object, and checks the verification object against the verification standard. That is, where the types of the verification standard and the verification object are inconsistent, the type of the verification standard is taken as the type of the verification object. For example, the action type processing result and/or the mouth shape type processing result may be checked and adjusted with the language form processing result as the standard; or the action type processing result and/or the mouth shape type processing result may serve as the standard to check and adjust the language form processing result; or any two of the three results may be used to check and adjust the remaining one. Cross checking among the language form, the action type, and the mouth shape type improves the accuracy of the check result, that is, the accuracy of the language form, action type, and mouth shape type labels; in addition, because the teaching video is processed automatically rather than manually, the processing efficiency of the teaching video can be improved.
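As a rough illustration, the cross check can be read as a per-timestamp label reconciliation. The sketch below assumes each processing result has been reduced to a mapping from timestamps to type labels; the function name, the dictionaries, and the "valid"/"invalid" labels are assumptions for illustration, not the disclosed implementation:

```python
def cross_check(standard: dict, target: dict) -> dict:
    """Reconcile the target's type labels with the verification standard.

    Both arguments map a timestamp to a type label such as "valid" or
    "invalid"; wherever the two disagree, the standard's label prevails.
    """
    checked = dict(target)
    for ts, std_label in standard.items():
        if ts in checked and checked[ts] != std_label:
            checked[ts] = std_label  # the verification standard is taken as truth
    return checked

# e.g. check the action type result against the language form result
language_form = {12000: "invalid", 12500: "valid"}   # spoken word -> invalid
action_type   = {12000: "valid",   12500: "valid"}
new_action_type = cross_check(standard=language_form, target=action_type)
# new_action_type == {12000: "invalid", 12500: "valid"}
```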
According to this technical scheme, cross verification is performed automatically among the language form, action type, and mouth shape type of the teaching video without manual work, which improves both the processing efficiency of the teaching video and the quality of the teaching video processing result.
Fig. 2 is a schematic flow chart diagram of another method for intelligently processing teaching videos according to an embodiment of the present disclosure. The present embodiment is an alternative proposed on the basis of the above-described embodiments. Referring to fig. 2, the intelligent processing method for teaching video provided by this embodiment includes:
s210, performing language form processing on teaching audio in a teaching video to obtain a language form processing result of the teaching audio;
s220, respectively carrying out action type and mouth shape type processing on the teaching image in the teaching video to obtain an action type processing result and a mouth shape type processing result of the teaching image;
s230, verifying the action type processing result and the mouth shape type processing result according to the language form processing result based on the timestamp incidence relation to obtain a new action type processing result and a new mouth shape type processing result;
and S240, aligning the language form processing result, the new action type processing result and the new mouth shape type processing result based on the timestamp incidence relation to obtain a teaching video processing result.
In the embodiment of the present disclosure, the language form processing result may be used as the verification standard to verify the action type processing result and the mouth shape type processing result, and type adjustment is performed on those entries of the action type and mouth shape type processing results whose types differ from the associated language form processing result, so as to obtain a new action type processing result and a new mouth shape type processing result. Based on the timestamp association relationship, the language form processing result, the new action type processing result, and the new mouth shape type processing result are aligned, so that the results associated with the same timestamp have the same type, that is, type alignment is achieved. The cross check further improves the accuracy of the language form, action, and mouth shape types and makes the language form, action type, and mouth shape type associated with the same timestamp consistent, which facilitates further processing of the teaching video processing result and further improves the quality of the teaching video.
In an alternative embodiment, the language form processing result includes a spoken language set and a written language set; the action type processing result includes an invalid action set and a valid action set; and the mouth shape type processing result includes an invalid mouth shape set and a valid mouth shape set.
One word in the spoken language set or the written language set can be associated with at least one teaching video timestamp, one action in the invalid action set or the valid action set can be associated with at least one teaching video timestamp, and one mouth shape in the invalid mouth shape set or the valid mouth shape set can be associated with at least one teaching video timestamp; words, actions, and mouth shapes associated with the same teaching video timestamp are associated with each other.
In an optional implementation manner, the verifying the action type processing result and the mouth shape type processing result according to the language form processing result based on the timestamp association relationship includes: acquiring the action associated with a spoken word in the spoken language set based on the timestamp association relationship, and adjusting the action into the invalid action set if it belongs to the valid action set; and acquiring the mouth shape associated with a spoken word in the spoken language set based on the timestamp association relationship, and adjusting the mouth shape into the invalid mouth shape set if it belongs to the valid mouth shape set.
Specifically, the spoken words in the spoken language set can be traversed; for each spoken word, the associated action is acquired based on the timestamp association relationship and, if it belongs to the valid action set, adjusted into the invalid action set, that is, a valid associated action is re-labeled as invalid; likewise, the associated mouth shape can be acquired and, if valid, adjusted to invalid. Adjusting actions and mouth shapes associated with spoken words to invalid improves the accuracy of the action and mouth shape types. This approach is particularly suitable when the language form processing result is more accurate than the action type and mouth shape type processing results.
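A minimal sketch of this adjustment, assuming each set is keyed by timestamp (the set structures, names, and sample values are illustrative assumptions):

```python
def invalidate_by_spoken_words(spoken_ts, valid_set, invalid_set):
    """Move entries associated with spoken-word timestamps from the valid
    set into the invalid set; applies alike to actions and mouth shapes."""
    for ts in spoken_ts:
        if ts in valid_set:
            invalid_set[ts] = valid_set.pop(ts)
    return valid_set, invalid_set

valid_actions   = {12000: "raise_hand", 13000: "point_at_board"}
invalid_actions = {14000: "scratch_head"}
spoken_word_ts  = [12000]  # timestamps of words in the spoken language set
invalidate_by_spoken_words(spoken_word_ts, valid_actions, invalid_actions)
# valid_actions == {13000: "point_at_board"}; 12000 moved to invalid_actions
```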
In an optional implementation, the aligning the language form processing result, the new action type processing result, and the new mouth shape type processing result based on the timestamp association relationship includes: for a target word in the spoken language set or the written language set, acquiring the target action associated with the target word and the target mouth shape associated with the target word based on the timestamp association relationship; determining that the target word also belongs to the same type when the target action and the target mouth shape belong to sets of the same type; and, when the target action and the target mouth shape belong to sets of different types, acquiring an annotation type and taking the annotation type as the type of the target word, the target action, and the target mouth shape.
The target word can belong to the spoken language set or to the written language set; that is, the target word may be spoken language or written language. Each word in the language form processing result (i.e., each target word) is traversed, and the target action and target mouth shape associated with it are acquired based on the timestamp association relationship.
When the types of the target action and the target mouth shape are both valid, it is determined that the target word belongs to the written language (that is, the target word is also valid); when both are invalid, it is determined that the target word belongs to the spoken language (that is, the target word is also invalid). When the target action and the target mouth shape differ in type, that is, one is valid and the other invalid, an annotation type determined according to a quantization standard may be obtained, and the annotation type is taken as the type of the target word, the target action, and the target mouth shape. Here the spoken language form, invalid actions, and invalid mouth shapes are of the same type, and the written language form, valid actions, and valid mouth shapes are of the same type. Type alignment among the language form, the action, and the mouth shape further improves the accuracy of all three.
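One way to express this alignment rule for a single target word is the following sketch, under the assumption that types have been reduced to "valid"/"invalid" labels and that an externally supplied annotation serves as the tiebreaker (all names are illustrative):

```python
def align_types(action_type: str, mouth_type: str, annotation: str):
    """Align word, action, and mouth shape types for one timestamp.

    If the action and the mouth shape agree, the target word takes their
    shared type; if they disagree, all three fall back to the annotation.
    """
    if action_type == mouth_type:
        shared = action_type
        return shared, shared, shared          # word, action, mouth shape
    return annotation, annotation, annotation  # annotation breaks the tie

print(align_types("invalid", "invalid", annotation="valid"))
# -> ('invalid', 'invalid', 'invalid'): the word is typed as spoken language
print(align_types("valid", "invalid", annotation="valid"))
# -> ('valid', 'valid', 'valid'): the annotation type prevails
```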
It should be noted that the language form and the action type may also be used to align-check the mouth shape type: traverse the mouth shapes in the valid and invalid mouth shape sets and acquire the word and the action associated with each mouth shape; when the associated word and action belong to sets of the same type, determine that the mouth shape also belongs to that type; and when they belong to sets of different types, take the annotation type as the type of the associated word, the associated action, and the mouth shape. Likewise, the language form and the mouth shape type may be used to align-check the action type: traverse the actions in the valid and invalid action sets and acquire the word and the mouth shape associated with each action; when the associated word and mouth shape belong to sets of the same type, determine that the action also belongs to that type; and when they belong to sets of different types, take the annotation type as the type of the associated word, the action, and the associated mouth shape.
In addition, in the teaching video processing result, if the word at any timestamp belongs to the spoken language, the action at that timestamp belongs to an invalid action, and the mouth shape belongs to an invalid mouth shape, the teaching audio and teaching image at that timestamp can be deleted directly, thereby removing invalid information from the teaching video, improving its quality, shortening its duration, and improving online learning efficiency. It should be noted that key knowledge in the teaching video can also be highlighted, improving the learning efficiency for that knowledge.
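Under these assumptions, deleting fully invalid segments amounts to a simple filter over the aligned processing result (a sketch; the segment fields are hypothetical, carried over from the earlier `Segment` illustration):

```python
def prune_invalid_segments(segments: list) -> list:
    """Drop segments whose word, action, and mouth shape are all invalid.

    A segment whose word is spoken language, whose action is invalid, and
    whose mouth shape is invalid carries no teaching content, so removing
    it shortens the video without losing information.
    """
    return [s for s in segments
            if not (s["word_type"] == "spoken"
                    and s["action_type"] == "invalid"
                    and s["mouth_type"] == "invalid")]

segments = [
    {"ts": 12000, "word_type": "spoken",  "action_type": "invalid", "mouth_type": "invalid"},
    {"ts": 12500, "word_type": "written", "action_type": "valid",   "mouth_type": "valid"},
]
print(prune_invalid_segments(segments))  # only the 12500 segment remains
```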
According to the technical scheme of the embodiment of the disclosure, the action type and the mouth shape type are verified by using the spoken language, and the type alignment is performed on the language form, the action and the mouth shape, so that the quality of a teaching video processing result can be further improved; in addition, invalid information in the processing result of the teaching video can be removed, the duration of the teaching video is shortened, and the online learning efficiency is improved.
Fig. 3 is a schematic flow chart diagram of another method for intelligently processing teaching videos according to an embodiment of the present disclosure. The present embodiment is an alternative proposed on the basis of the above-described embodiments. Referring to fig. 3, the intelligent processing method for teaching video provided by this embodiment includes:
s310, extracting words with a language form of a spoken language from teaching audio of the teaching video based on the spoken language dictionary, and replacing the words with written language;
s320, identifying overlapped words in the teaching audio, and performing overlap removal processing on the overlapped words outside the white list of the overlapped words to obtain a language form processing result;
s330, respectively carrying out action type and mouth shape type processing on the teaching image in the teaching video to obtain an action type processing result and a mouth shape type processing result of the teaching image;
s340, performing cross check on at least two items of the language form processing result, the action type processing result and the mouth type processing result to obtain a teaching video processing result.
The spoken language dictionary includes spoken words and the written language associated with each spoken word. Specifically, the whole teaching audio can be broken into sentences through semantic analysis, each sentence-break result converted into a text sentence, the text sentence compared with the spoken words in the spoken language dictionary to recognize the spoken words it contains, and those spoken words replaced with written words.
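A minimal sketch of this lookup-and-replace, using English stand-ins for the dictionary entries (the dictionary contents and all names are illustrative assumptions, not the disclosed dictionary):

```python
# Hypothetical spoken-word dictionary: spoken form -> written-language form.
SPOKEN_DICT = {"gonna": "going to", "kinda": "somewhat", "um": ""}

def replace_spoken_words(sentence: str) -> str:
    """Replace spoken-language words in a transcribed sentence with their
    written-language equivalents; emptied filler words are dropped."""
    replaced = (SPOKEN_DICT.get(w.lower(), w) for w in sentence.split())
    return " ".join(w for w in replaced if w)

print(replace_spoken_words("um I am gonna prove this theorem"))
# -> "I am going to prove this theorem"
```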
Overlapped words are words containing overlapping characters, that is, a character that occurs twice or more in succession; for example, "this this" is an overlapped word. The overlapped word white list contains overlapped words that conform to the grammar specification and can be built from commonly used overlapped words in a statistical dictionary; for example, although "the warm, warm sunlight" contains an overlapped word, it conforms to the grammar specification and can be added to the overlapped word white list.
Specifically, the overlapped words in the teaching audio can be extracted based on a character string matching technique; the extracted overlapped words are then matched against the overlapped word white list, overlapped words belonging to the white list are retained, and overlapped words outside the white list undergo de-overlap processing. De-overlap processing removes the repeated characters in an overlapped word; for example, "this this" is adjusted to "this". Performing language form processing on the teaching audio with the spoken word dictionary and the overlapped word white list improves the accuracy of the language form processing result.
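The de-overlap step might look like the following regular-expression sketch; the white-list entries are illustrative English stand-ins for the original overlapped words:

```python
import re

# Overlapped words that conform to the grammar specification are kept.
OVERLAP_WHITELIST = {"bye bye", "very very"}

def remove_overlap(text: str) -> str:
    """Collapse an immediately repeated word unless it is white-listed."""
    def collapse(m: re.Match) -> str:
        pair = m.group(0)
        return pair if pair.lower() in OVERLAP_WHITELIST else m.group(1)
    # (\b\w+\b) \1\b matches a word immediately repeated once
    return re.sub(r"(\b\w+\b) \1\b", collapse, text)

print(remove_overlap("this this is is the key step"))  # -> "this is the key step"
print(remove_overlap("bye bye everyone"))              # -> "bye bye everyone"
```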
In an alternative embodiment, the spoken word dictionary is determined from historical teaching audio of the user to whom the teaching video belongs. Specifically, a personalized spoken word dictionary for the user can be obtained from statistics over manually labeled historical teaching audio, further improving the accuracy of the language form processing.
In an optional implementation manner, the performing action type and mouth shape type processing on the teaching images in the teaching video respectively to obtain an action type processing result and a mouth shape type processing result of the teaching images includes: respectively recognizing the actions and the mouth shapes in the teaching images of the teaching video; clustering the actions and the mouth shapes in the teaching images to obtain at least two actions and at least two mouth shapes; dividing at least one action into a valid action set and at least one action into an invalid action set; and dividing at least one mouth shape into a valid mouth shape set and at least one mouth shape into an invalid mouth shape set.
Specifically, some of the at least two actions may be randomly divided into the valid action set and the remaining actions into the invalid action set, and some of the at least two mouth shapes may be randomly divided into the valid mouth shape set and the remaining mouth shapes into the invalid mouth shape set, so as to improve the efficiency of determining the action type and mouth shape type processing results.
Because the number of actions and mouth shapes in the teaching images is large, the actions and mouth shapes are clustered first and subsequent processing is based on the clustering result; reducing the number of distinct actions and mouth shapes improves the efficiency of cross checking the language form, action type, and mouth shape type. Moreover, cross checking at least two of the language form, action type, and mouth shape type processing results improves the accuracy of all three.
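As an illustration of the clustering step, k-means over per-detection feature vectors would reduce thousands of raw detections to a handful of representative actions and mouth shapes. This is a sketch assuming scikit-learn and hypothetical pose features; the disclosure does not mandate a particular clustering algorithm:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
action_features = rng.random((500, 34))  # e.g. 17 pose keypoints x (x, y) per action

# Cluster raw detections into a small set of representative actions, so the
# cross check operates on cluster labels rather than on every single frame.
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(action_features)
action_labels = kmeans.labels_            # cluster id per detected action

# Randomly split the clusters into an initial valid / invalid partition,
# to be corrected later by the cross check.
clusters = rng.permutation(8)
valid_action_clusters = set(clusters[:4])
invalid_action_clusters = set(clusters[4:])
# The same procedure applies to mouth-shape features.
```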
It should be noted that, in the embodiment of the present disclosure, a pre-recorded teaching video may be processed, or the processing may be performed online during recording; for example, the teaching video may be processed by a plug-in built into the recording device, or by a video capture device that integrates the loudspeaker and microphone used in teaching.
According to the technical scheme of the embodiment of the disclosure, the accuracy of the language form processing result, the action type processing result and the mouth shape type processing result can be improved; the oral word dictionary and the overlapped word white list are adopted to process the language form of the teaching audio, so that the accuracy of the language form processing result can be improved; moreover, the determination efficiency of the action type processing result and the mouth type processing result can be improved.
Fig. 4 is a schematic diagram of an intelligent processing apparatus for teaching video according to an embodiment of the present disclosure, where this embodiment is applicable to a case where a language type, an action type, a mouth shape type, and the like are processed on a teaching video, and the apparatus is configured in an electronic device, and can implement an intelligent processing method for teaching video according to any embodiment of the present disclosure. The intelligent processing device 400 for teaching video specifically includes the following:
the language form processing module 401 is configured to perform language form processing on a teaching audio in a teaching video to obtain a language form processing result of the teaching audio;
an action mouth shape processing module 402, configured to perform action type and mouth shape type processing respectively on the teaching images in the teaching video to obtain an action type processing result and a mouth shape type processing result of the teaching images;
and a cross checking module 403, configured to perform cross checking on at least two of the language form processing result, the action type processing result, and the mouth shape type processing result to obtain a teaching video processing result.
In an alternative embodiment, the cross-check module 403 includes:
the verification unit is used for verifying the action type processing result and the mouth shape type processing result according to the language form processing result based on the timestamp association relationship, so as to obtain a new action type processing result and a new mouth shape type processing result;
and the alignment unit is used for aligning the language form processing result, the new action type processing result, and the new mouth shape type processing result based on the timestamp association relationship.
In an alternative embodiment, the language form processing result includes a spoken language set and a written language set; the action type processing result includes an invalid action set and a valid action set; and the mouth shape type processing result includes an invalid mouth shape set and a valid mouth shape set.
In an optional implementation manner, the verification unit is specifically configured to:
acquire an action associated with a spoken word in the spoken language set based on the timestamp association relationship, and adjust the action into the invalid action set if it belongs to the valid action set;
and acquire a mouth shape associated with a spoken word in the spoken language set based on the timestamp association relationship, and adjust the mouth shape into the invalid mouth shape set if it belongs to the valid mouth shape set.
In an alternative embodiment, the alignment unit is specifically configured to:
for a target word in the spoken language set or the written language set, acquiring the target action associated with the target word and the target mouth shape associated with the target word based on the timestamp association relationship;
determining that the target word also belongs to the same type under the condition that the target action and the target mouth shape belong to the same type set;
and under the condition that the target action and the target mouth shape belong to different types of sets, acquiring a labeling type, and taking the labeling type as the types of the target word, the target action and the target mouth shape.
In an alternative embodiment, the language form processing module 401 includes:
a spoken language processing unit for extracting a word in a language form of a spoken language from a teaching audio of the teaching video based on a spoken language dictionary and replacing the word with a written language;
and the overlapped word processing unit is used for identifying overlapped words in the teaching audio and carrying out overlap removal processing on the overlapped words outside the white list of the overlapped words.
In an alternative embodiment, the spoken word dictionary is determined from historical instructional audio of a user to whom the instructional video pertains.
In an alternative embodiment, the action mouth shape processing module 402 comprises:
the action mouth shape recognition unit is used for respectively recognizing the action and the mouth shape of the teaching image in the teaching video to obtain the action and the mouth shape in the teaching image;
the action mouth shape clustering unit is used for clustering the actions and the mouth shapes in the teaching images to obtain at least two actions and at least two mouth shapes;
the action mouth shape dividing unit is used for dividing at least one action into a valid action set and dividing at least one action into an invalid action set; and dividing at least one mouth shape into a valid mouth shape set and at least one mouth shape into an invalid mouth shape set.
According to the technical scheme of the embodiment, the language form, the action type and the mouth shape type of the teaching video are automatically cross-checked without depending on manpower, so that the processing efficiency of the teaching video can be improved, and the quality of a teaching video processing result can be improved; moreover, the action type and the mouth shape type are verified by adopting the spoken language, and the type alignment is carried out on the language form, the action and the mouth shape, so that the quality of a teaching video processing result can be further improved.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 5 illustrates a schematic block diagram of an example electronic device 500 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the device 500 comprises a computing unit 501, which may perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 502 or loaded from a storage unit 508 into a random access memory (RAM) 503. The RAM 503 may also store various programs and data required for the operation of the device 500. The computing unit 501, the ROM 502, and the RAM 503 are connected to one another by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
A number of components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, or the like; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508, such as a magnetic disk, optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various specialized artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 501 performs the methods and processes described above, such as the intelligent processing method of teaching videos. For example, in some embodiments, the intelligent processing method of teaching videos may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the intelligent processing method of teaching videos described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured in any other suitable way (e.g., by means of firmware) to perform the intelligent processing method of teaching videos.
Various implementations of the systems and techniques described here may be realized in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs executing on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, a host product in a cloud computing service system that overcomes the drawbacks of high management difficulty and weak service scalability found in traditional physical hosts and virtual private server (VPS) services.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, and no limitation is imposed herein as long as the result desired by the technical solution of the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (19)

1. An intelligent processing method of teaching videos comprises the following steps:
performing language form processing on teaching audio in a teaching video to obtain a language form processing result of the teaching audio;
respectively carrying out action type and mouth shape type processing on the teaching image in the teaching video to obtain an action type processing result and a mouth shape type processing result of the teaching image;
and performing cross check on at least two items of the language form processing result, the action type processing result and the mouth shape type processing result to obtain a teaching video processing result.
2. The method of claim 1, wherein the cross checking of at least two of the language form processing result, the action type processing result, and the mouth shape type processing result comprises:
verifying the action type processing result and the mouth shape type processing result according to the language form processing result based on a timestamp association relationship, to obtain a new action type processing result and a new mouth shape type processing result;
and aligning the language form processing result, the new action type processing result, and the new mouth shape type processing result based on the timestamp association relationship.
3. The method of claim 2, wherein the language form processing result includes a spoken language set and a written language set; the action type processing result includes an invalid action set and a valid action set; and the mouth shape type processing result includes an invalid mouth shape set and a valid mouth shape set.
4. The method of claim 3, wherein the verifying the action type processing result and the mouth type processing result according to the language form processing result based on the timestamp association comprises:
acquiring an action associated with a spoken word in the spoken language set based on a timestamp association relationship, and adjusting the action associated with the spoken word into the invalid action set if the action belongs to the valid action set;
acquiring a mouth shape associated with a spoken word in the spoken language set based on the timestamp association relationship, and adjusting the mouth shape associated with the spoken word into the invalid mouth shape set if the mouth shape belongs to the valid mouth shape set.
5. The method of claim 3, wherein the aligning the linguistic form processing result, the new action type processing result, and the new mouth type processing result based on a timestamp association comprises:
for a target word in the spoken language set or the written language set, acquiring a target action associated with the target word and a target mouth shape associated with the target word based on a timestamp association relationship;
determining that the target word also belongs to the same type under the condition that the target action and the target mouth shape belong to the same type set;
and under the condition that the target action and the target mouth shape belong to different types of sets, acquiring a labeling type, and taking the labeling type as the types of the target word, the target action and the target mouth shape.
6. The method of claim 1, wherein the performing the language form processing on the teaching audio in the teaching video to obtain the language form processing result of the teaching audio comprises:
extracting a word in a language form of a spoken language from teaching audio of the teaching video based on a spoken language dictionary, and replacing the word with a written language;
and identifying overlapped words in the teaching audio, and performing de-overlapping processing on the overlapped words outside the white list of the overlapped words.
7. The method of claim 6, wherein the spoken word dictionary is determined from a historical instructional audio of a user to whom the instructional video pertains.
8. The method of claim 1, wherein the performing action type and mouth type processing on the teaching image in the teaching video respectively to obtain an action type processing result and a mouth type processing result of the teaching image comprises:
respectively identifying the motion and the mouth shape of the teaching image in the teaching video to obtain the motion and the mouth shape in the teaching image;
clustering the actions and the mouth shapes in the teaching images to obtain at least two actions and at least two mouth shapes;
dividing at least one action into a valid action set, and dividing at least one action into an invalid action set; and dividing at least one mouth shape into a valid mouth shape set and at least one mouth shape into an invalid mouth shape set.
9. An intelligent processing device for teaching video, comprising:
the language form processing module is used for carrying out language form processing on the teaching audio in the teaching video to obtain a language form processing result of the teaching audio;
the action mouth shape processing module is used for respectively carrying out action type and mouth shape type processing on the teaching image in the teaching video to obtain an action type processing result and a mouth shape type processing result of the teaching image;
and the cross checking module is used for carrying out cross checking on at least two items of the language form processing result, the action type processing result and the mouth shape type processing result so as to obtain a teaching video processing result.
10. The apparatus of claim 9, wherein the cross-check module comprises:
the verification unit is used for verifying the action type processing result and the mouth shape type processing result according to the language form processing result based on the timestamp association relationship, so as to obtain a new action type processing result and a new mouth shape type processing result;
and the alignment unit is used for aligning the language form processing result, the new action type processing result, and the new mouth shape type processing result based on the timestamp association relationship.
11. The apparatus of claim 10, wherein the language form processing result includes a spoken language set and a written language set; the action type processing result includes an invalid action set and a valid action set; and the mouth shape type processing result includes an invalid mouth shape set and a valid mouth shape set.
12. The apparatus according to claim 11, wherein the verification unit is specifically configured to:
acquiring an action associated with a spoken word in the spoken language set based on a timestamp association relationship, and adjusting the action associated with the spoken word into the invalid action set if the action belongs to the valid action set;
acquiring a mouth shape associated with a spoken word in the spoken language set based on the timestamp association relationship, and adjusting the mouth shape associated with the spoken word into the invalid mouth shape set if the mouth shape belongs to the valid mouth shape set.
13. The apparatus according to claim 11, wherein the alignment unit is specifically configured to:
for a target word in the spoken language set or the written language set, acquiring a target action associated with the target word and a target mouth shape associated with the target word based on a timestamp association relationship;
determining that the target word also belongs to the same type under the condition that the target action and the target mouth shape belong to the same type set;
and under the condition that the target action and the target mouth shape belong to different types of sets, acquiring a labeling type, and taking the labeling type as the types of the target word, the target action and the target mouth shape.
14. The apparatus of claim 9, wherein the linguistic form processing module comprises:
a spoken language processing unit for extracting a word in a language form of a spoken language from a teaching audio of the teaching video based on a spoken language dictionary and replacing the word with a written language;
and the overlapped word processing unit is used for identifying overlapped words in the teaching audio and carrying out overlap removal processing on the overlapped words outside the white list of the overlapped words.
15. The apparatus of claim 14, wherein the spoken word dictionary is determined from a historical instructional audio of a user to whom the instructional video pertains.
16. The apparatus of claim 9, wherein the action mouth shape processing module comprises:
the action mouth shape recognition unit is used for respectively recognizing the action and the mouth shape of the teaching image in the teaching video to obtain the action and the mouth shape in the teaching image;
the action mouth shape clustering unit is used for clustering the actions and the mouth shapes in the teaching images to obtain at least two actions and at least two mouth shapes;
the action mouth shape dividing unit is used for dividing at least one action into an effective action set and dividing at least one action into an ineffective action set; and, dividing at least one die into a set of valid dies and at least one die into a set of invalid dies.
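The recognize-cluster-divide pipeline of claim 16, sketched with scikit-learn's KMeans over per-frame embeddings. The embeddings are assumed to come from upstream action and mouth shape recognizers, and the size-based validity rule is only a placeholder, since the claim does not fix a criterion.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_and_divide(embeddings, timestamps, n_clusters=2):
    """Cluster per-frame embeddings (actions or mouth shapes) into at least
    two classes, then split the clusters into valid and invalid sets.
    embeddings: (n_frames, dim) array; timestamps: one entry per frame."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embeddings)
    clusters = {c: [timestamps[i] for i in np.flatnonzero(labels == c)]
                for c in range(n_clusters)}
    # Placeholder rule: the most frequent cluster is "valid", the rest "invalid".
    largest = max(clusters, key=lambda c: len(clusters[c]))
    valid = clusters[largest]
    invalid = [t for c, ts in clusters.items() if c != largest for t in ts]
    return valid, invalid
```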
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-8.
19. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-8.
CN202110315710.3A 2021-03-24 2021-03-24 Intelligent processing method, device, equipment and storage medium for teaching video Active CN112906650B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110315710.3A CN112906650B (en) 2021-03-24 2021-03-24 Intelligent processing method, device, equipment and storage medium for teaching video

Publications (2)

Publication Number Publication Date
CN112906650A true CN112906650A (en) 2021-06-04
CN112906650B CN112906650B (en) 2023-08-15

Family

ID=76106297

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110315710.3A Active CN112906650B (en) 2021-03-24 2021-03-24 Intelligent processing method, device, equipment and storage medium for teaching video

Country Status (1)

Country Link
CN (1) CN112906650B (en)

Citations (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0082304A1 (en) * 1981-11-20 1983-06-29 Siemens Aktiengesellschaft Method of identifying a person by speech and face recognition, and device for carrying out the method
CN1130969A (en) * 1993-09-08 1996-09-11 Idt股份有限公司 Method and apparatus for data analysis
WO2002050798A2 (en) * 2000-12-18 2002-06-27 Digispeech Marketing Ltd. Spoken language teaching system based on language unit segmentation
US20030018475A1 (en) * 1999-08-06 2003-01-23 International Business Machines Corporation Method and apparatus for audio-visual speech detection and recognition
US20110053123A1 (en) * 2009-08-31 2011-03-03 Christopher John Lonsdale Method for teaching language pronunciation and spelling
JP2011070139A (en) * 2009-09-24 2011-04-07 悦子 ▲蔭▼山 Construction of work system for teaching of language learning, and teaching method of language learning
CN102663928A (en) * 2012-03-07 2012-09-12 天津大学 Electronic teaching method for deaf people to learn speaking
EP2562746A1 (en) * 2011-08-25 2013-02-27 Samsung Electronics Co., Ltd. Apparatus and method for recognizing voice by using lip image
US20130226587A1 (en) * 2012-02-27 2013-08-29 Hong Kong Baptist University Lip-password Based Speaker Verification System
KR20130117624A (en) * 2012-04-17 2013-10-28 삼성전자주식회사 Method and apparatus for detecting talking segments in a video sequence using visual cues
CN103561277A (en) * 2013-05-09 2014-02-05 陕西思智通教育科技有限公司 Transmission method and system for network teaching
US20150325240A1 (en) * 2014-05-06 2015-11-12 Alibaba Group Holding Limited Method and system for speech input
CN108062533A (en) * 2017-12-28 2018-05-22 北京达佳互联信息技术有限公司 Analytic method, system and the mobile terminal of user's limb action
CN109063587A (en) * 2018-07-11 2018-12-21 北京大米科技有限公司 data processing method, storage medium and electronic equipment
CN109377540A (en) * 2018-09-30 2019-02-22 网易(杭州)网络有限公司 Synthetic method, device, storage medium, processor and the terminal of FA Facial Animation
US20190130628A1 (en) * 2017-10-26 2019-05-02 Snap Inc. Joint audio-video facial animation system
CN109830132A (en) * 2019-03-22 2019-05-31 邱洵 A kind of foreign language language teaching system and teaching application method
CN109919434A (en) * 2019-01-28 2019-06-21 华中科技大学 A kind of classroom performance intelligent Evaluation method based on deep learning
CN110534109A (en) * 2019-09-25 2019-12-03 深圳追一科技有限公司 Audio recognition method, device, electronic equipment and storage medium
CN110610534A (en) * 2019-09-19 2019-12-24 电子科技大学 Automatic mouth shape animation generation method based on Actor-Critic algorithm
CN111091824A (en) * 2019-11-30 2020-05-01 华为技术有限公司 Voice matching method and related equipment
CN111612352A (en) * 2020-05-22 2020-09-01 北京易华录信息技术股份有限公司 Student expression ability assessment method and device
CN111741326A (en) * 2020-06-30 2020-10-02 腾讯科技(深圳)有限公司 Video synthesis method, device, equipment and storage medium
CN111739534A (en) * 2020-06-04 2020-10-02 广东小天才科技有限公司 Processing method and device for assisting speech recognition, electronic equipment and storage medium
CN111800646A (en) * 2020-06-24 2020-10-20 北京安博盛赢教育科技有限责任公司 Method, device, medium and electronic equipment for monitoring teaching effect
CN111915148A (en) * 2020-07-10 2020-11-10 北京科技大学 Classroom teaching evaluation method and system based on information technology
US20200372115A1 (en) * 2019-05-24 2020-11-26 International Business Machines Corporation Method and System for Language and Domain Acceleration with Embedding Alignment
CN112150638A (en) * 2020-09-14 2020-12-29 北京百度网讯科技有限公司 Virtual object image synthesis method and device, electronic equipment and storage medium
CN112508750A (en) * 2021-02-03 2021-03-16 北京联合伟世科技股份有限公司 Artificial intelligence teaching device, method, equipment and storage medium
CN112528768A (en) * 2020-11-26 2021-03-19 腾讯科技(深圳)有限公司 Action processing method and device in video, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG Jiahua et al., "Mouth shape simulation technology and its application in web-based courses", Modern Educational Technology (《现代教育技术》), vol. 20, no. 3, pages 35-38 *

Also Published As

Publication number Publication date
CN112906650B (en) 2023-08-15

Similar Documents

Publication Publication Date Title
CN108962282B (en) Voice detection analysis method and device, computer equipment and storage medium
US10777207B2 (en) Method and apparatus for verifying information
WO2020207167A1 (en) Text classification method, apparatus and device, and computer-readable storage medium
US20180365209A1 (en) Artificial intelligence based method and apparatus for segmenting sentence
US11856277B2 (en) Method and apparatus for processing video, electronic device, medium and product
US9811517B2 (en) Method and system of adding punctuation and establishing language model using a punctuation weighting applied to chinese speech recognized text
CN112509566B (en) Speech recognition method, device, equipment, storage medium and program product
CN113657269A (en) Training method and device for face recognition model and computer program product
CN109670148A (en) Collection householder method, device, equipment and storage medium based on speech recognition
CN113657483A (en) Model training method, target detection method, device, equipment and storage medium
CN114639386A (en) Text error correction and text error correction word bank construction method
CN111144118A (en) Method, system, device and medium for identifying named entities in spoken text
CN111427996B (en) Method and device for extracting date and time from man-machine interaction text
CN116303951A (en) Dialogue processing method, device, electronic equipment and storage medium
US20220327803A1 (en) Method of recognizing object, electronic device and storage medium
CN112906650A (en) Intelligent processing method, device and equipment for teaching video and storage medium
CN115527520A (en) Anomaly detection method, device, electronic equipment and computer readable storage medium
CN115858776A (en) Variant text classification recognition method, system, storage medium and electronic equipment
CN115906797A (en) Text entity alignment method, device, equipment and medium
CN115565186A (en) Method and device for training character recognition model, electronic equipment and storage medium
CN115631502A (en) Character recognition method, character recognition device, model training method, electronic device and medium
CN114218393A (en) Data classification method, device, equipment and storage medium
CN114528851A (en) Reply statement determination method and device, electronic equipment and storage medium
CN115098729A (en) Video processing method, sample generation method, model training method and device
CN114141236A (en) Language model updating method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant