CN115238129A - Method for extracting multilingual network audio and video data based on AI engine - Google Patents

Method for extracting multilingual network audio and video data based on AI engine

Info

Publication number: CN115238129A
Application number: CN202210947268.0A
Authority: CN (China)
Prior art keywords: audio, video data, data, text, platform
Priority date / filing date: 2022-08-09 (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Publication date: 2022-10-25
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventor: 李斌斌 (Li Binbin)
Current assignee: Anhui Xinzhi Technology Co., Ltd. (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Anhui Xinzhi Technology Co., Ltd.
Application filed by Anhui Xinzhi Technology Co., Ltd.; priority to CN202210947268.0A; publication of CN115238129A

Classifications

    • G06F16/7834: Retrieval of video data characterised by metadata automatically derived from the content, using audio features
    • G06F16/71: Information retrieval of video data; indexing; data structures therefor; storage structures
    • G06F16/75: Information retrieval of video data; clustering; classification
    • G06F16/7844: Retrieval of video data characterised by metadata automatically derived from the content, using original textual content or text extracted from visual content or transcript of audio data
    • G06F16/7867: Retrieval of video data characterised by metadata, using manually generated information, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
    • G10L25/57: Speech or voice analysis techniques specially adapted for comparison or discrimination, for processing of video signals
    • H04N21/4884: Data services, e.g. news ticker, for displaying subtitles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Software Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention discloses a method for extracting multilingual network audio and video data based on an AI (artificial intelligence) engine, belonging to the technical field of AI. The method acquires target audio and video data, classifies the data into first data and second data, and establishes a database for storage. For the first data, the audio data are extracted and, based on the subtitles with a time axis, a VADNN algorithm classifies the audio of each time period as normal human voice audio, mixed human voice and environmental sound audio, or non-human voice audio. For the second data, the audio data are extracted, the long audio is divided into small audio segments by the VADNN algorithm and numbered from front to back; the T segmented audios are sent to a multilingual speech recognition engine for recognition to obtain the recognition text corresponding to each numbered audio, and the non-human voice parts in the segmented small audios are determined by combining the recognition texts with the VADNN algorithm results. Finally, the corresponding text segments are searched in the subtitles and the subtitle segments of all segmented audios are matched.

Description

Method for extracting multi-language network audio and video data based on AI engine
Technical Field
The invention belongs to the technical field of artificial intelligence (AI), and in particular relates to a method for extracting multilingual network audio and video data based on an AI engine.
Background
With the development of AI technology, the demand for data in various languages and of various types keeps increasing. To meet these data requirements, most AI companies rely on manual labeling, manual recording, manual verification and similar approaches. A purely manual approach has several problems: multilingual data places high demands on the quality and expertise of annotators, training labor is expensive, production efficiency is low, and data production costs are high. Considering that the network already contains a large amount of multilingual audio and video with subtitles, network radio stations and the like, and in order to solve the problems encountered in manually labeling such data, the invention provides a method for extracting multilingual network audio and video data based on an AI engine, which enables automatic extraction of large volumes of multilingual subtitled audio and video, network radio data and similar sources, reduces the requirements on the quality and expertise of annotators, lowers cost and improves data production efficiency.
Disclosure of Invention
In order to solve the problems existing in the above schemes, the invention provides a method for extracting multilingual network audio and video data based on an AI engine.
The purpose of the invention can be realized by the following technical scheme:
a method for extracting multi-language network audio and video data based on an AI engine specifically comprises the following steps:
Step one: acquiring target audio and video data based on the Internet, and classifying the target audio and video data to obtain first data and second data; establishing a database for storage;
Step two: extracting audio data of the first data and, using a VADNN algorithm together with the subtitles with a time axis, classifying the audio of each time period as normal human voice audio, mixed human voice and environmental sound audio, or non-human voice audio;
Step three: extracting audio data of the second data, dividing the long audio into small audio segments by using the VADNN algorithm, and numbering them from front to back as 001, 002, …, 00T; sending the T segmented audios to a multilingual speech recognition engine for recognition to obtain the recognition text corresponding to each numbered audio, and determining the non-human voice parts in the segmented small audios by combining the recognition texts with the VADNN algorithm results;
Step four: searching for the corresponding text segments in the subtitles and matching the subtitle segments of all segmented audios.
Further, in step four, preconditions and parameter settings need to be performed; the specific method comprises:
setting a minimum text segment length, denoted L, checking the length of each recognition text and filtering out recognition texts whose length is less than L;
and setting the first M characters to be used for matching, wherein M is a dynamic value and the initial parameter is set to N.
Further, the corresponding text segment is searched for in the subtitles in four ways, which are respectively:
a. the recognition text exactly matches a text segment in the subtitles;
b. the first character, a middle character, the last character and the text length of the recognition text match a text segment in the subtitles;
c. the first character, the last character and the text length of the recognition text match a text segment in the subtitles;
d. the first character or the last character, together with the text length, of the recognition text matches a text segment in the subtitles;
based on the four cases, the number of the current audio is recorded and the head and tail position information in the subtitles is calculated, wherein case a has the highest confidence and case d the lowest; combining the audio number information of the non-human voice segments, the audio number information determined under conditions a, b, c and d is filled into an array of T elements in priority order, with the high priority covering the low priority;
there are two positioning cases:
A. numbers 00(i-1) and 00(i+1) are matched, and the position information of audio number 00i is located directly;
B. numbers 00(i-2) and 00(i+1) are matched and 00(i-1) is non-human voice, and the position information of audio number 00i is then located;
all number information that can be located is found recursively, the value of M is changed, and all information is confirmed cyclically until the subtitle segments of all segmented audios are found.
Further, the method for acquiring the target audio and video data based on the Internet comprises the following steps:
setting limiting conditions for the target audio and video; acquiring from the Internet the network platforms whose target audio and video meet the limiting conditions, and marking them as candidate platforms; screening the candidate platforms to obtain the candidate platforms to be docked, and marking them as target platforms; providing a data acquisition module comprising a plurality of data acquisition units, each data acquisition unit being associated with a corresponding target platform; the data acquisition units are used for acquiring the target audio and video data in the associated target platforms, identifying the acquired target audio and video data and attaching corresponding identification tags.
Further, the method for screening the candidate platforms comprises the following steps:
marking the candidate platforms as j, where j = 1, 2, …, n and n is a positive integer; acquiring the amount of target audio and video data in each candidate platform, matching the corresponding data magnitude index according to the acquired amount and marking it as LZj; evaluating the implementation cost of docking the candidate platform and marking it as CBj; setting a quality model, acquiring target audio and video data from a number of candidate platforms, inputting them into the quality model for analysis, obtaining the corresponding quality scores and marking them as ZPj; calculating the corresponding priority values according to a priority formula, and sorting the candidate platforms by the calculated priority values; and acquiring the demand magnitude of the client, and selecting the corresponding target platforms according to the demand magnitude of the client and the ranking of the candidate platforms.
Further, the priority formula is Qj = b1 × LZj × (b2 × CBj + b3 × ZPj), where b1, b2 and b3 are all proportionality coefficients with value ranges 0 < b1 ≤ 1, 0 < b2 ≤ 1 and 0 < b3 ≤ 1.
Compared with the prior art, the invention has the beneficial effects that:
the method has the advantages that the corresponding data acquisition channel is established according to the actual condition of a client, the multi-language audio and video data generated in a large amount on the network are fully utilized, the multi-language AI engine is iteratively trained, various economic and time costs generated by the traditional production data are greatly reduced, the automatic extraction of big data such as multi-language audio and video with subtitles, network radio stations and the like is realized, the requirements on the quality and the specialty of a annotator are reduced, the cost is reduced, and the data making efficiency is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of the iterative feedback engine mechanism of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1 and FIG. 2, a method for extracting multilingual network audio and video data based on an AI engine specifically comprises the following steps:
Step one: acquiring a large amount of target audio and video data based on the Internet, and classifying the target audio and video data to obtain first data and second data; establishing a database for storage;
the method for acquiring a large amount of target audio and video data based on the Internet comprises the following steps:
setting a limiting condition of a target audio and video, wherein the limiting condition is set according to the needs of an AI company, acquiring a network platform of the target audio and video meeting the limiting condition from the Internet, marking the network platform as a platform to be selected, screening the platform to be selected, acquiring a platform to be selected to be butted, and marking the platform to be selected as a target platform; the data acquisition module is arranged and comprises a plurality of data acquisition units, the number of the data acquisition units is the same as that of the target platforms, the data acquisition units are associated with the corresponding target platforms, the data acquisition units are used for acquiring corresponding target audio and video data in the associated target platforms, identifying the acquired target audio and video data and marking corresponding identification tags.
Each data acquisition unit is set up for its corresponding target platform and performs data acquisition and identification within that platform, judging whether a subtitle with a time axis is provided and identifying the language type of the subtitle. Because audio and video data on a network platform generally come with a corresponding introduction that carries subtitle, language and similar information, the specific data acquisition unit can be implemented with existing technology; moreover, the same data acquisition unit can be used for some network platforms whose operating modes are the same, so no detailed description is given.
The method for screening the candidate platforms comprises the following steps:
Marking the candidate platforms as j, where j = 1, 2, …, n and n is a positive integer; acquiring the amount of target audio and video data in each candidate platform, matching the corresponding data magnitude index according to the acquired amount and marking it as LZj; evaluating the implementation cost of docking the candidate platform and marking it as CBj; setting a quality model, acquiring target audio and video data from a number of candidate platforms, inputting them into the quality model for analysis, obtaining the corresponding quality scores and marking them as ZPj; calculating the corresponding priority values according to the priority formula Qj = b1 × LZj × (b2 × CBj + b3 × ZPj), where b1, b2 and b3 are proportionality coefficients with value ranges 0 < b1 ≤ 1, 0 < b2 ≤ 1 and 0 < b3 ≤ 1, and sorting the candidate platforms by the calculated priority values; acquiring the demand magnitude of the client, which is set according to the amount of target audio and video data required by the client, and selecting the corresponding target platforms according to the demand magnitude of the client and the ranking of the candidate platforms.
That is, target platforms are selected by accumulating the data magnitude indexes of the sorted candidate platforms, in ranking order, until the client's demand magnitude is covered.
The corresponding data magnitude index is matched according to the obtained amount of target audio and video data: an expert group divides the possible amounts of target audio and video data into different intervals and sets a data magnitude index for each interval, and the obtained amount is matched to an interval to yield the corresponding data magnitude index.
The implementation cost of docking a candidate platform is evaluated from the costs of the whole process from docking to acquisition, such as the cost of acquiring the corresponding target audio and video, the construction cost and the operation cost; determining such costs is common knowledge in the field.
The quality model is established based on a CNN or DNN network, and its specific construction and training process is common knowledge in the field. The target audio and video data taken from the several candidate platforms for quality evaluation do not require docking with those platforms: because only a small amount of data is needed, it can be obtained directly from the network without incurring implementation cost. However, obtaining data in this way is inefficient and yields only small quantities, so it can only be used for evaluation and does not meet subsequent development requirements. A code sketch of the screening and selection procedure is given below.
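As a concrete illustration of this screening step, the following Python sketch (a minimal sketch, not part of the patent) computes the priority value Qj = b1 × LZj × (b2 × CBj + b3 × ZPj) for each candidate platform and then accumulates data amounts down the ranking until the client's demand magnitude is covered. The field names, the interval table for the data magnitude index and the example coefficient values are illustrative assumptions; the patent fixes only the formula and the constraint 0 < b1, b2, b3 ≤ 1.

from dataclasses import dataclass

@dataclass
class CandidatePlatform:
    name: str
    data_amount_hours: float   # amount of target audio/video data found on the platform
    docking_cost: float        # CBj: evaluated implementation cost of docking (normalized)
    quality_score: float       # ZPj: score returned by the quality model (normalized)

# Illustrative interval table, as would be set by an expert group: (lower bound in hours, index LZj).
MAGNITUDE_INTERVALS = [(0, 1), (100, 2), (1_000, 3), (10_000, 4)]

def data_magnitude_index(amount_hours: float) -> int:
    """Match the acquired data amount to its data magnitude index LZj."""
    index = MAGNITUDE_INTERVALS[0][1]
    for lower_bound, idx in MAGNITUDE_INTERVALS:
        if amount_hours >= lower_bound:
            index = idx
    return index

def priority(p: CandidatePlatform, b1=1.0, b2=0.4, b3=0.6) -> float:
    """Qj = b1 * LZj * (b2 * CBj + b3 * ZPj), with 0 < b1, b2, b3 <= 1 as in the patent."""
    return b1 * data_magnitude_index(p.data_amount_hours) * (b2 * p.docking_cost + b3 * p.quality_score)

def select_target_platforms(platforms, demand_hours: float):
    """Sort by priority and accumulate data amounts until the client's demand magnitude is covered."""
    ranked = sorted(platforms, key=priority, reverse=True)
    selected, covered = [], 0.0
    for p in ranked:
        if covered >= demand_hours:
            break
        selected.append(p)
        covered += p.data_amount_hours
    return selected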
In another embodiment, the target audio-video data may be directly obtained by the existing method.
The target audio and video data are categorized according to whether a subtitle scheme with a time axis is provided: data with a time-axis subtitle scheme are the first data, and data without one are the second data. A possible labeling and storage sketch is shown below.
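The sketch below shows one possible way, assumed purely for illustration, to label an acquired item and split the collected data into first data (with a time-axis subtitle scheme) and second data (without one) before storing them; the item fields and the SQLite storage are assumptions, since the patent does not prescribe a particular database schema.

import sqlite3
from dataclasses import dataclass
from typing import Optional

@dataclass
class AcquiredItem:
    platform: str
    url: str
    has_timed_subtitles: bool         # is a subtitle scheme with a time axis provided?
    subtitle_language: Optional[str]  # identification tag attached by the acquisition unit

def store(items):
    """Classify items into first/second data and store them in a small database."""
    con = sqlite3.connect("av_data.db")
    con.execute("""CREATE TABLE IF NOT EXISTS av_items
                   (platform TEXT, url TEXT, language TEXT, category TEXT)""")
    for it in items:
        category = "first" if it.has_timed_subtitles else "second"
        con.execute("INSERT INTO av_items VALUES (?, ?, ?, ?)",
                    (it.platform, it.url, it.subtitle_language, category))
    con.commit()
    con.close()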
Step two: extracting audio data of the first data and, using a VADNN algorithm together with the subtitles with a time axis, classifying the audio of each time period as normal human voice audio, mixed human voice and environmental sound audio, or non-human voice audio.
The extraction of audio data from network audio and video and the working principle of the corresponding VADNN algorithm are common knowledge in the art and are therefore not described in detail; a rough illustration of the per-period classification is sketched below.
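The patent does not disclose the internals of the VADNN algorithm, so the following sketch only assumes a generic voice activity detection interface that returns a per-frame speech probability; the thresholds and the three labels mirror the classification named in step two, but their exact values are illustrative.

def classify_periods(speech_prob, subtitle_periods, voiced_thr=0.8, mixed_thr=0.3):
    """
    speech_prob(start_s, end_s) -> list of per-frame speech probabilities, assumed to be
    provided by a VAD model standing in for the patent's VADNN algorithm.
    subtitle_periods: list of (start_s, end_s) taken from the subtitles with a time axis.
    Returns one label per period: "human_voice", "mixed" or "non_human_voice".
    """
    labels = []
    for start, end in subtitle_periods:
        probs = speech_prob(start, end)
        voiced_ratio = sum(p > 0.5 for p in probs) / max(len(probs), 1)
        if voiced_ratio >= voiced_thr:
            labels.append("human_voice")      # predominantly clean speech
        elif voiced_ratio >= mixed_thr:
            labels.append("mixed")            # speech mixed with environment sound
        else:
            labels.append("non_human_voice")  # music, noise, silence, ...
    return labels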
Step three: extracting audio data of the second data, dividing the long audio into small audio segments by using the VADNN algorithm, and numbering them from front to back as 001, 002, …, 00T; sending the T segmented audios to a multilingual speech recognition engine for recognition to obtain the recognition text corresponding to each numbered audio, and determining the non-human voice parts in the segmented small audios by combining the recognition texts with the VADNN algorithm results (see the sketch below).
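A minimal sketch of this numbering and recognition step is given below, under the assumption that the VADNN segmentation and the multilingual speech recognition engine are available as callables; the names recognize and speech_ratio, and the threshold for deciding non-human voice, are illustrative and not specified by the patent.

def segment_and_recognize(segments, recognize, speech_ratio, non_voice_thr=0.2):
    """
    segments: list of (start_s, end_s) obtained by cutting the long audio with the VAD
    recognize: callable (start_s, end_s) -> recognized text (multilingual ASR engine)
    speech_ratio: callable (start_s, end_s) -> fraction of voiced frames (from the VAD)
    Returns a dict mapping numbers "001".."00T" to span, recognition text and voice flag.
    """
    results = {}
    for i, (start, end) in enumerate(segments, start=1):
        number = f"{i:03d}"                    # numbered from front to back: 001, 002, ..., 00T
        text = recognize(start, end).strip()
        # A segment is treated as non-human voice when the VAD finds little speech
        # or the recognizer returns nothing usable.
        is_voice = speech_ratio(start, end) > non_voice_thr and bool(text)
        results[number] = {"span": (start, end), "text": text, "is_voice": is_voice}
    return results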
Step four: searching for the corresponding text segments in the subtitles.
Preconditions and parameter settings are performed first:
a minimum text segment length L is set, and recognition texts whose length is less than L are filtered out; this avoids wrong positioning information caused by a too-short text segment matching multiple segments in the subtitles;
the first M characters are set for matching, where M is a dynamic value such as 1, 2, 3, …, N; the initial parameter is set to N.
The corresponding text segment is searched for in the subtitles in four ways:
a. the recognition text exactly matches a certain text segment in the subtitles;
b. the first character, a middle character, the last character and the text length of the recognition text match a certain text segment in the subtitles;
c. the first character, the last character and the text length of the recognition text match a certain text segment in the subtitles;
d. the first character or the last character, together with the text length, of the recognition text matches a certain text segment in the subtitles;
In each of the above four cases, the number of the current audio is recorded and the head and tail position information in the subtitles is calculated; case a has the highest confidence and case d the lowest. Combining the audio number information of the non-human voice segments, the audio number information determined under conditions a, b, c and d is filled into an array of T elements in priority order, with the high priority covering the low priority; the specific parts not disclosed here are common knowledge in the field. Two positioning cases occur:
A. numbers 00(i-1) and 00(i+1) are matched, and the position information of audio number 00i is located directly;
B. numbers 00(i-2) and 00(i+1) are matched and 00(i-1) is non-human voice, and the position information of audio number 00i is then located;
All number information that can be located is found recursively, the value of M is changed, and all information is confirmed cyclically until the optimal subtitle segments of all segmented audios are found; a code sketch of this matching procedure follows this paragraph.
The above formulas are all calculated on dimensionless numerical values; each formula is obtained by collecting a large amount of data and performing software simulation so as to approximate the real situation as closely as possible, and the preset parameters and preset thresholds in the formulas are set by those skilled in the art according to the actual situation or obtained by simulation on a large amount of data.
Although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the spirit and scope of the present invention.

Claims (6)

1. A method for extracting multilingual network audio and video data based on an AI engine is characterized by comprising the following specific steps:
Step one: acquiring target audio and video data based on the Internet, and classifying the target audio and video data to obtain first data and second data; establishing a database for storage;
Step two: extracting audio data of the first data and, using a VADNN algorithm together with the subtitles with a time axis, classifying the audio of each time period as normal human voice audio, mixed human voice and environmental sound audio, or non-human voice audio;
Step three: extracting audio data of the second data, dividing the long audio into small audio segments by using the VADNN algorithm, and numbering them from front to back as 001, 002, …, 00T; sending the T segmented audios to a multilingual speech recognition engine for recognition to obtain the recognition text corresponding to each numbered audio, and determining the non-human voice parts in the segmented small audios by combining the recognition texts with the VADNN algorithm results;
Step four: searching for the corresponding text segments in the subtitles and matching the subtitle segments of all segmented audios.
2. The method for extracting multilingual network audio and video data based on an AI engine according to claim 1, wherein preconditions and parameter settings are required in step four; the specific method comprises:
setting a minimum text segment length, denoted L, checking the length of each recognition text and filtering out recognition texts whose length is less than L;
and setting the first M characters to be used for matching, wherein M is a dynamic value and the initial parameter is set to N.
3. The method for extracting multilingual network audio and video data based on an AI engine according to claim 2, wherein the corresponding text segment is searched for in the subtitles in four ways, which are respectively:
a. the recognition text exactly matches a text segment in the subtitles;
b. the first character, a middle character, the last character and the text length of the recognition text match a text segment in the subtitles;
c. the first character, the last character and the text length of the recognition text match a text segment in the subtitles;
d. the first character or the last character, together with the text length, of the recognition text matches a text segment in the subtitles;
based on the four cases, the number of the current audio is recorded and the head and tail position information in the subtitles is calculated, wherein case a has the highest confidence and case d the lowest; combining the audio number information of the non-human voice segments, the audio number information determined under conditions a, b, c and d is filled into an array of T elements in priority order, with the high priority covering the low priority;
there are two positioning cases:
A. numbers 00(i-1) and 00(i+1) are matched, and the position information of audio number 00i is located directly;
B. numbers 00(i-2) and 00(i+1) are matched and 00(i-1) is non-human voice, and the position information of audio number 00i is then located;
all number information that can be located is found recursively, the value of M is changed, and all information is confirmed cyclically until the subtitle segments of all segmented audios are found.
4. The method for extracting multilingual network audio and video data based on an AI engine according to claim 1, wherein the method for acquiring the target audio and video data based on the Internet comprises:
setting limiting conditions for the target audio and video; acquiring from the Internet the network platforms whose target audio and video meet the limiting conditions, and marking them as candidate platforms; screening the candidate platforms to obtain the candidate platforms to be docked, and marking them as target platforms; providing a data acquisition module comprising a plurality of data acquisition units, each data acquisition unit being associated with a corresponding target platform; the data acquisition units are used for acquiring the corresponding target audio and video data in the associated target platforms, identifying the acquired target audio and video data and attaching corresponding identification tags.
5. The method for extracting multilingual network audio and video data based on an AI engine according to claim 4, wherein the method for screening the candidate platforms comprises:
marking the candidate platforms as j, where j = 1, 2, …, n and n is a positive integer; acquiring the amount of target audio and video data in each candidate platform, matching the corresponding data magnitude index according to the acquired amount and marking it as LZj; evaluating the implementation cost of docking the candidate platform and marking it as CBj; setting a quality model, acquiring target audio and video data from a number of candidate platforms, inputting them into the quality model for analysis, obtaining the corresponding quality scores and marking them as ZPj; calculating the corresponding priority values according to a priority formula, and sorting the candidate platforms by the calculated priority values; and acquiring the demand magnitude of the client, and selecting the corresponding target platforms according to the demand magnitude of the client and the ranking of the candidate platforms.
6. The method for extracting multilingual network audio and video data based on an AI engine according to claim 5, wherein the priority formula is Qj = b1 × LZj × (b2 × CBj + b3 × ZPj), where b1, b2 and b3 are all proportionality coefficients with value ranges 0 < b1 ≤ 1, 0 < b2 ≤ 1 and 0 < b3 ≤ 1.
CN202210947268.0A (filed 2022-08-09, priority date 2022-08-09): Method for extracting multilingual network audio and video data based on AI engine. Status: Pending. Published as CN115238129A (en).

Priority Applications (1)

Application number: CN202210947268.0A; priority date: 2022-08-09; filing date: 2022-08-09; title: Method for extracting multilingual network audio and video data based on AI engine

Publications (1)

Publication number: CN115238129A; publication date: 2022-10-25

Family

ID=83678639

Family Applications (1)

Application number: CN202210947268.0A; title: Method for extracting multilingual network audio and video data based on AI engine; priority date: 2022-08-09; filing date: 2022-08-09

Country Status (1)

Country: CN; publication: CN (1) CN115238129A (en)


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination