CN116229943A - Conversational data set generation method and device - Google Patents
- Publication number
- CN116229943A CN116229943A CN202310505189.9A CN202310505189A CN116229943A CN 116229943 A CN116229943 A CN 116229943A CN 202310505189 A CN202310505189 A CN 202310505189A CN 116229943 A CN116229943 A CN 116229943A
- Authority
- CN
- China
- Prior art keywords
- dialogue
- segment
- audio
- data
- segments
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Signal Processing (AREA)
- Processing Or Creating Images (AREA)
- Studio Circuits (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application discloses a method and a device for generating a conversational dataset. The method comprises the following steps: acquiring dialogue data from subtitle data corresponding to a multimedia file, wherein the dialogue data comprises dialogue text and corresponding timestamps; segmenting the dialogue data into a plurality of dialogue segments, and segmenting the audio file corresponding to the multimedia file based on each dialogue segment and its start timestamp to obtain a plurality of audio segments corresponding to the dialogue segments; and performing speaker recognition on the audio segments, labeling the speaker of each sentence in the dialogue segment corresponding to each audio segment according to the recognition result, and taking the labeled dialogue segments as a conversational dataset. By obtaining dialogue data from subtitle data and then segmenting and labeling it, the method reduces the cost of generating a conversational dataset and improves the generation speed, generation efficiency, and diversity of the dataset.
Description
Technical Field
The application belongs to the field of artificial intelligence, and particularly relates to a method and a device for generating a conversational data set.
Background
Natural language processing is often regarded as the jewel in the crown of artificial intelligence, and human-machine dialogue is the final ring of natural language processing. Implementing human-machine dialogue typically requires large conversational datasets.
At present, conversational datasets are still generally produced by hand: the cost is too high, the speed is slow, the same group of annotators tends to repeat topics, and manual production cannot meet the ever-growing demand for data.
Disclosure of Invention
The embodiments of the application aim to provide a method and a device for generating a conversational dataset, so as to address the inability of the prior art to meet this data demand.
In order to solve the technical problems, the application is realized as follows:
in a first aspect, a method for generating a conversational dataset is provided, including the steps of:
acquiring dialogue data from subtitle data corresponding to a multimedia file, wherein the dialogue data comprises dialogue text and corresponding timestamps;
segmenting the dialogue data into a plurality of dialogue segments, and segmenting the audio file corresponding to the multimedia file based on each dialogue segment and its start timestamp to obtain a plurality of audio segments corresponding to the dialogue segments;
and performing speaker recognition on the plurality of audio segments, labeling the speaker of each sentence in the dialogue segment corresponding to each audio segment according to the recognition result, and taking the labeled dialogue segments as a conversational dataset.
In a second aspect, there is provided a device for generating a conversational dataset, comprising:
the acquisition module is used for acquiring dialogue data from subtitle data corresponding to the multimedia file, wherein the dialogue data comprises dialogue text and corresponding timestamps;
the segmentation module is used for segmenting the dialogue data into a plurality of dialogue segments, and segmenting the audio file corresponding to the multimedia file based on each dialogue segment and its start timestamp to obtain a plurality of audio segments corresponding to the dialogue segments;
and the labeling module is used for performing speaker recognition on the plurality of audio segments, labeling the speaker of each sentence in the dialogue segment corresponding to each audio segment according to the recognition result, and taking the labeled dialogue segments as a conversational dataset.
By obtaining dialogue data from subtitle data and then segmenting and labeling it, the embodiments of the application reduce the cost of generating a conversational dataset and improve the generation speed, generation efficiency, and diversity of the dataset.
Drawings
FIG. 1 is a flowchart of a method for generating a conversational dataset according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a device for generating a conversational dataset according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
Since manual generation alone cannot meet the ever-increasing demand for conversational datasets, it is necessary to supplement them by other means. Existing film and television subtitles are an ideal data source, but several problems remain: the subtitles are difficult to clean, dialogue segments and speakers cannot be distinguished, and the data cannot be classified by topic.
The embodiments of the application provide a method and a device for generating a conversational dataset that work automatically and can generate conversational datasets in batches. The method uses context comparison to distinguish dialogue segments, combines multimodal speaker recognition to distinguish speakers, and classifies topics by combining keyword extraction with BERT word vectors.
The following describes in detail a method for generating a conversational dataset according to the embodiments of the present application through specific embodiments and application scenarios thereof with reference to the accompanying drawings.
As shown in fig. 1, a flowchart of a method for generating a conversational dataset according to an embodiment of the present application includes the following steps:
Step 101, acquiring dialogue data from subtitle data corresponding to a multimedia file, wherein the dialogue data comprises dialogue text and corresponding timestamps.
Step 102, segmenting the dialogue data into a plurality of dialogue segments, and segmenting the audio file corresponding to the multimedia file based on each dialogue segment and its start timestamp to obtain a plurality of audio segments corresponding to the dialogue segments.
In particular, the dialogue data may be segmented into a plurality of dialogue segments based on the minimum and maximum dialogue turns imposed by a sliding-window limit.
In this embodiment, after the audio file corresponding to the multimedia file is segmented based on each dialogue segment and its start timestamp to obtain a plurality of audio segments, each audio segment may further be transcribed; the recognized text is matched and aligned with the text content of the corresponding dialogue segment, and any audio segment that cannot be matched is discarded.
Step 103, performing speaker recognition on the plurality of audio segments, labeling the speaker of each sentence in the dialogue segment corresponding to each audio segment according to the recognition result, and taking the labeled dialogue segments as a conversational dataset.
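As an illustrative sketch of this step (an assumption about how diarization output is mapped to sentences, not a definitive implementation of the claimed method), each sentence can be assigned the speaker whose diarization turn overlaps it the most in time:

```python
# Sketch: assign each subtitle sentence the speaker whose diarization
# turn overlaps it the most. Sentence times come from the subtitle
# timestamps; turns would come from a speaker-diarization tool.
def label_speakers(sentences, turns):
    """sentences: [(start, end, text)]; turns: [(start, end, speaker_id)]."""
    labeled = []
    for s_start, s_end, text in sentences:
        overlaps = {}
        for t_start, t_end, spk in turns:
            overlap = min(s_end, t_end) - max(s_start, t_start)
            if overlap > 0:
                overlaps[spk] = overlaps.get(spk, 0.0) + overlap
        speaker = max(overlaps, key=overlaps.get) if overlaps else None
        labeled.append((speaker, text))
    return labeled
```

A dialogue segment would then be kept only if the number of distinct speakers matches the dataset requirement (2, in the example given later in the description).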
In this embodiment, after the labeled dialog segments are used as the dialog data set, keywords may also be extracted from each dialog segment, and the keywords of each dialog segment and the corresponding weight values thereof may be recorded; according to the keywords of each dialogue segment and the corresponding weight values thereof, the matching degree of each dialogue segment and the existing topic classification is calculated respectively; based on the degree of matching, each dialog segment is classified into an existing topic classification.
Specifically, the word vector of the topic word of an existing topic classification and the word vectors of the keywords of the same dialogue segment can be computed respectively; the similarity between the topic-word vector and each keyword vector is computed using cosine similarity, and the similarities are weighted-averaged by the keyword weights to obtain the matching degree between the dialogue segment and the existing topic classification. Correspondingly, under an existing topic classification, the dialogue segments are sorted according to their matching degree with that classification.
By obtaining dialogue data from subtitle data and then segmenting and labeling it, the embodiments of the application reduce the cost of generating a conversational dataset and improve the generation speed, generation efficiency, and diversity of the dataset.
In the embodiment of the application, films, television-series videos, and the corresponding subtitle data may first be collected, and the subtitle data cleaned. Because different subtitles differ slightly in format, the cleaning strategy can be adjusted to the subtitle format encountered, for example using marker symbols to distinguish voice-overs, lyrics, and scene descriptions, so that as far as possible only dialogue text and the corresponding timestamps remain.
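A minimal sketch of this cleaning step, assuming SRT-format subtitles; the markers used to detect lyrics and scene descriptions (♪, brackets, 《》) are illustrative assumptions, since conventions vary between subtitle sources:

```python
import re

# Parse SRT-style subtitle blocks and keep only dialogue lines with
# their start timestamps, dropping lines that look like lyrics or
# scene descriptions (marker conventions are assumptions here).
SRT_BLOCK = re.compile(
    r"(\d+)\s*\n(\d{2}:\d{2}:\d{2}[,.]\d{3})\s*-->\s*"
    r"(\d{2}:\d{2}:\d{2}[,.]\d{3})\s*\n(.+?)(?:\n\n|\Z)",
    re.S,
)
NON_DIALOGUE = re.compile(r"^[\[(♪《].*|.*[♪》\])]$")

def clean_subtitles(srt_text: str) -> list[tuple[str, str]]:
    """Return (start_timestamp, dialogue_text) pairs, dropping non-dialogue."""
    pairs = []
    for m in SRT_BLOCK.finditer(srt_text):
        start, text = m.group(2), " ".join(m.group(4).split())
        if not NON_DIALOGUE.match(text):
            pairs.append((start, text))
    return pairs
```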
Then, the subtitle data is segmented at the correct points via context comparison. Specifically, a scene-segmentation method can be introduced to split the whole dialogue data into separate dialogue segments, as follows. First, a judgment algorithm is trained on an existing dialogue dataset to decide whether a given window of n sentences should be cut after its last sentence. Then, by limiting the size of a sliding window, the minimum and maximum dialogue turns of a segment are constrained (e.g., if the dataset requires 8–12 sentences per segment, no fewer than 4 and no more than 6 dialogue turns are allowed).
When sliding, windows of conforming size are checked in order from the beginning. When recognition fails (the last sentence is judged not to be a cut point), the next turn's sentences are added to the window and recognition continues; when recognition succeeds (the last sentence is judged to be a cut point), all sentences in the window form one dialogue segment, and the process repeats from the sentence after the cut. If the window grows beyond 12 sentences, a cut is forced there and the process continues from the following sentence. After recognition finishes, dialogue segments with fewer than 8 sentences are screened out.
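The sliding-window procedure above can be sketched as follows; the truncation judge (a model trained on an existing dialogue dataset, per the description) is abstracted as a callable, and the 8/12-sentence bounds follow the example in the text:

```python
from typing import Callable

# Sliding-window dialogue segmentation: grow the window from the
# minimum size until the judge accepts a cut or the maximum size
# forces one; segments shorter than the minimum are screened out.
MIN_SENTS, MAX_SENTS = 8, 12

def segment_dialogue(sentences: list[str],
                     is_cut_point: Callable[[list[str]], bool]) -> list[list[str]]:
    segments, start = [], 0
    while start < len(sentences):
        end = start + MIN_SENTS          # window begins at the minimum size
        while end <= len(sentences):
            window = sentences[start:end]
            if is_cut_point(window) or len(window) >= MAX_SENTS:
                break                    # cut here (forced cut at MAX_SENTS)
            end += 1                     # grow the window and retry
        window = sentences[start:min(end, len(sentences))]
        if len(window) >= MIN_SENTS:     # drop leftover segments that are too short
            segments.append(window)
        start += max(len(window), 1)
    return segments
```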
After the dialogue data is divided into dialogue segments, the corresponding audio can be cut out according to the timestamps, and a speaker diarization tool can be used to identify the speakers and assign them to the individual sentences (data with too many speakers is filtered out). Because a conversational dataset requires each sentence to be labeled with its speaker, and subtitle data usually does not carry this information, the speakers must be distinguished. First, the selected dialogue segments and their start timestamps are used to cut the corresponding film or television audio file (keeping roughly 2 seconds of padding before and after each segment to tolerate inaccurate timestamps). The cut audio is then transcribed with ASR, the recognized text is matched and aligned with the subtitle text (if they cannot be matched, i.e., the similarity is below a threshold, the segment is discarded), and each aligned audio file is retained together with its subtitle file. Finally, the speaker diarization tool is run on the retained audio to label the speaker of each sentence; since common conversational datasets involve only 2 interlocutors, only labeled subtitle files with exactly 2 speakers are selected.
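The similarity gate in the ASR alignment step might look like the following sketch; the ASR call itself is omitted, and the 0.8 threshold is an illustrative assumption (the description says only that the similarity falls below a threshold):

```python
from difflib import SequenceMatcher

# Keep an audio segment only if its ASR transcript is close enough to
# the subtitle text; otherwise discard it, as in the description.
def keep_segment(asr_text: str, subtitle_text: str, threshold: float = 0.8) -> bool:
    similarity = SequenceMatcher(None, asr_text, subtitle_text).ratio()
    return similarity >= threshold
```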
After labeling is complete, the labeled subtitle dialogue segments can be classified; keywords are extracted from each dialogue segment to summarize the dialogue content, which makes classification and correlation computation convenient. Specifically, tf-idf may be used to extract keywords from all dialogue segments, retaining the top 10 keywords of each segment in rank order and recording the keywords together with their weight values.
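A self-contained sketch of the tf-idf keyword step (the description names tf-idf but no library; this toy version takes pre-tokenized token lists, whereas a Chinese pipeline would first run a word segmenter):

```python
import math
from collections import Counter

# Minimal TF-IDF: score each token by term frequency times a smoothed
# inverse document frequency, and keep the top-k tokens per segment.
def top_keywords(segments: list[list[str]], k: int = 10) -> list[list[tuple[str, float]]]:
    """segments: list of token lists; returns top-k (keyword, weight) per segment."""
    n = len(segments)
    df = Counter(tok for seg in segments for tok in set(seg))
    results = []
    for seg in segments:
        tf = Counter(seg)
        scores = {t: (c / len(seg)) * math.log((1 + n) / (1 + df[t]) + 1)
                  for t, c in tf.items()}
        results.append(sorted(scores.items(), key=lambda x: -x[1])[:k])
    return results
```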
Afterwards, the keywords can be grouped into topic classes, and matching coefficients (used for sorting) are recorded, i.e., the matching degree between a given dialogue segment and a given topic is computed. This can be used to classify new dialogue-segment data into the existing topic classification; and when a topic requirement outside the existing classification arises, the segments can be quickly ranked by matching degree to return results that meet the requirement.
The matching degree is computed as follows. Let the topic word used for the computation be A, and let the 5 keywords of the dialogue segment currently being scored, with their corresponding weight values, be B1-w1, B2-w2, B3-w3, B4-w4, and B5-w5. The word vectors of A and each Bn can be computed with a pretrained general-purpose Chinese BERT model, giving AV and BVn; the cosine similarity Sn between AV and each BVn is computed, and the weighted average over the wn yields the final matching degree S.
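The weighted-average computation can be written down directly; plain Python lists stand in for the BERT vectors AV and BVn, which are assumptions here since the actual model output is not specified:

```python
import math

# Cosine similarity Sn between the topic vector AV and each keyword
# vector BVn, then a weighted average over the keyword weights wn.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def matching_degree(topic_vec, keyword_vecs, weights):
    sims = [cosine(topic_vec, kv) for kv in keyword_vecs]
    return sum(s * w for s, w in zip(sims, weights)) / sum(weights)
```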
All labeled dialogue segments are classified with this computation, and within each existing topic class they are sorted by matching degree, which makes later retrieval convenient. When a new topic requirement arises, for example a topic outside the existing topic system, the same matching-degree computation can be applied and the dialogue segments with the highest matching degrees selected as required.
According to the embodiment of the application, after the relevant subtitle and audio data are collected by web crawlers, the conversational dataset is generated automatically, together with the keywords of each dialogue segment (convenient for later personalized classification and use) and its topic class. This greatly improves the generation efficiency and diversity of the dataset: compared with manually reprocessing subtitle data, the speed is improved by more than 80%, at roughly one tenth of the cost of the manual method.
As shown in fig. 2, a schematic structural diagram of a device for generating a conversational dataset according to an embodiment of the present application includes:
the obtaining module 210 is configured to obtain dialogue data from subtitle data corresponding to the multimedia file, where the dialogue data includes dialogue text and a corresponding timestamp.
The segmentation module 220 is configured to segment the dialogue data into a plurality of dialogue segments, and to segment the audio file corresponding to the multimedia file based on each dialogue segment and its start timestamp, so as to obtain a plurality of audio segments corresponding to the dialogue segments.
Specifically, the segmentation module 220 segments the dialogue data into a plurality of dialogue segments based on the minimum and maximum dialogue turns imposed by the sliding-window limit.
The labeling module 230 is configured to identify a speaker from the plurality of audio segments, label a speaker for each sentence in the conversation segment corresponding to each audio segment according to the identification result, and use the labeled conversation segment as a conversation data set.
In this embodiment, the apparatus further includes:
and the extraction module is used for respectively extracting the keywords from each dialogue segment and recording the keywords of each dialogue segment and the corresponding weight values thereof.
And the calculation module is used for calculating the matching degree of each dialogue segment and the existing topic classification according to the keywords of each dialogue segment and the corresponding weight values of the keywords.
The computing module is specifically configured to respectively compute the word vector of the topic word of an existing topic classification and the word vectors of the keywords of the same dialogue segment, compute the similarity between the topic-word vector and each keyword vector using cosine similarity, and weighted-average the similarities by the keyword weights to obtain the matching degree between the dialogue segment and the existing topic classification.
And the classifying module is used for classifying each dialogue segment into the existing topic classification based on the matching degree.
The classification module is specifically configured to, under an existing topic classification, order each dialog segment according to a matching degree between each dialog segment and the existing topic classification.
In this embodiment, the apparatus further includes:
the recognition module is used for respectively recognizing each audio segment, matching and aligning the recognized characters with the text content of the dialogue segment corresponding to the audio segment, and discarding the audio segment if the audio segment which cannot be matched exists.
By obtaining dialogue data from subtitle data and then segmenting and labeling it, the embodiments of the application reduce the cost of generating a conversational dataset and improve the generation speed, generation efficiency, and diversity of the dataset.
The embodiment of the application further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements each process of the above conversational dataset generation method embodiment and achieves the same technical effects, which are not repeated here. The computer-readable storage medium may be, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk), including several instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those of ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are also within the protection of the present application.
Claims (10)
1. A method of generating a conversational dataset, comprising the steps of:
acquiring dialogue data from subtitle data corresponding to a multimedia file, wherein the dialogue data comprises dialogue text and corresponding timestamps;
segmenting the dialogue data into a plurality of dialogue segments, and segmenting the audio file corresponding to the multimedia file based on each dialogue segment and its start timestamp to obtain a plurality of audio segments corresponding to the dialogue segments;
and performing speaker recognition on the plurality of audio segments, labeling the speaker of each sentence in the dialogue segment corresponding to each audio segment according to the recognition result, and taking the labeled dialogue segments as a conversational dataset.
2. The method of claim 1, wherein after taking the labeled dialogue segments as a conversational dataset, the method further comprises:
extracting keywords from each dialogue segment respectively, and recording the keywords of each dialogue segment and the corresponding weight values thereof;
according to the keywords of each dialogue segment and the corresponding weight values thereof, the matching degree of each dialogue segment and the existing topic classification is calculated respectively;
based on the degree of matching, each dialog segment is classified into an existing topic classification.
3. The method according to claim 2, wherein the calculating the matching degree between each dialogue segment and the existing topic classification according to the keyword of each dialogue segment and the corresponding weight value thereof comprises:
respectively calculating word vectors of topic words classified by the existing topics and each keyword of the same dialogue section, calculating similarity of the word vectors of the topic words and the word vectors of each keyword by using cosine similarity, and carrying out weighted average on the similarity based on the weight of each keyword to obtain the matching degree of the dialogue section and the existing topic classification;
based on the matching degree, classifying each dialogue segment into an existing topic classification specifically comprises the following steps:
and under the existing topic classification, sequencing each dialog segment according to the matching degree of each dialog segment and the existing topic classification.
4. The method according to claim 1, wherein segmenting the dialogue data into a plurality of dialogue segments specifically comprises:
segmenting the dialogue data into a plurality of dialogue segments based on the minimum and maximum dialogue turns imposed by a sliding-window limit.
5. The method according to claim 1, wherein after segmenting the audio file corresponding to the multimedia file based on each dialogue segment and its start timestamp to obtain a plurality of audio segments corresponding to the dialogue segments, the method further comprises:
transcribing each audio segment respectively, matching and aligning the recognized text with the text content of the dialogue segment corresponding to the audio segment, and discarding any audio segment that cannot be matched.
6. A device for generating a conversational dataset, comprising:
the acquisition module is used for acquiring dialogue data from subtitle data corresponding to the multimedia file, wherein the dialogue data comprises dialogue text and corresponding timestamps;
the segmentation module is used for segmenting the dialogue data into a plurality of dialogue segments, and segmenting the audio file corresponding to the multimedia file based on each dialogue segment and its start timestamp to obtain a plurality of audio segments corresponding to the dialogue segments;
and the labeling module is used for performing speaker recognition on the plurality of audio segments, labeling the speaker of each sentence in the dialogue segment corresponding to each audio segment according to the recognition result, and taking the labeled dialogue segments as a conversational dataset.
7. The apparatus as recited in claim 6, further comprising:
the extraction module is used for extracting keywords from each dialogue segment respectively and recording the keywords of each dialogue segment and the corresponding weight values thereof;
the computing module is used for respectively computing the matching degree of each dialogue segment and the existing topic classification according to the keywords of each dialogue segment and the corresponding weight values thereof;
and the classifying module is used for classifying each dialogue segment into the existing topic classification based on the matching degree.
8. The apparatus of claim 7, wherein
the computing module is specifically configured to respectively compute the word vector of the topic word of an existing topic classification and the word vectors of the keywords of the same dialogue segment, compute the similarity between the topic-word vector and each keyword vector using cosine similarity, and weighted-average the similarities by the keyword weights to obtain the matching degree between the dialogue segment and the existing topic classification;
the classifying module is specifically configured to, under an existing topic classification, order each dialog segment according to a matching degree between each dialog segment and the existing topic classification.
9. The apparatus of claim 6, wherein
the segmentation module is specifically configured to segment the dialogue data into a plurality of dialogue segments based on the minimum and maximum dialogue turns imposed by the sliding-window limit.
10. The apparatus as recited in claim 6, further comprising:
the recognition module is used for transcribing each audio segment respectively, matching and aligning the recognized text with the text content of the dialogue segment corresponding to the audio segment, and discarding any audio segment that cannot be matched.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310505189.9A CN116229943B (en) | 2023-05-08 | 2023-05-08 | Conversational data set generation method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116229943A true CN116229943A (en) | 2023-06-06 |
CN116229943B CN116229943B (en) | 2023-08-15 |
Family
ID=86584638
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310505189.9A Active CN116229943B (en) | 2023-05-08 | 2023-05-08 | Conversational data set generation method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116229943B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108806668A (en) * | 2018-06-08 | 2018-11-13 | 国家计算机网络与信息安全管理中心 | A kind of audio and video various dimensions mark and model optimization method |
US20190318725A1 (en) * | 2018-04-13 | 2019-10-17 | Mitsubishi Electric Research Laboratories, Inc. | Methods and Systems for Recognizing Simultaneous Speech by Multiple Speakers |
KR102041621B1 (en) * | 2019-02-25 | 2019-11-06 | (주)미디어코퍼스 | System for providing artificial intelligence based dialogue type corpus analyze service, and building method therefor |
CN110717031A (en) * | 2019-10-15 | 2020-01-21 | 南京摄星智能科技有限公司 | Intelligent conference summary generation method and system |
CN112818680A (en) * | 2020-07-10 | 2021-05-18 | 腾讯科技(深圳)有限公司 | Corpus processing method and device, electronic equipment and computer-readable storage medium |
CN114996506A (en) * | 2022-05-24 | 2022-09-02 | 腾讯科技(深圳)有限公司 | Corpus generation method and device, electronic equipment and computer-readable storage medium |
CN115269884A (en) * | 2021-04-29 | 2022-11-01 | 华为云计算技术有限公司 | Method, device and related equipment for generating video corpus |
Also Published As
Publication number | Publication date |
---|---|
CN116229943B (en) | 2023-08-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112818906B (en) | Intelligent cataloging method of all-media news based on multi-mode information fusion understanding | |
US10304458B1 (en) | Systems and methods for transcribing videos using speaker identification | |
CN107305541B (en) | Method and device for segmenting speech recognition text | |
US6925455B2 (en) | Creating audio-centric, image-centric, and integrated audio-visual summaries | |
US8775174B2 (en) | Method for indexing multimedia information | |
CN106878632B (en) | Video data processing method and device | |
CN107491435B (en) | Method and device for automatically identifying user emotion based on computer | |
CN110705254B (en) | Text sentence-breaking method and device, electronic equipment and storage medium | |
CN113766314B (en) | Video segmentation method, device, equipment, system and storage medium | |
CN112668559A (en) | Multi-mode information fusion short video emotion judgment device and method | |
CN111797820B (en) | Video data processing method and device, electronic equipment and storage medium | |
US7349477B2 (en) | Audio-assisted video segmentation and summarization | |
CN111488813B (en) | Video emotion marking method and device, electronic equipment and storage medium | |
CN114598933B (en) | Video content processing method, system, terminal and storage medium | |
CN114996506A (en) | Corpus generation method and device, electronic equipment and computer-readable storage medium | |
CN114051154A (en) | News video strip splitting method and system | |
CN116229943B (en) | Conversational data set generation method and device | |
CN116017088A (en) | Video subtitle processing method, device, electronic equipment and storage medium | |
WO2011039773A2 (en) | Tv news analysis system for multilingual broadcast channels | |
CN114064968A (en) | News subtitle abstract generating method and system | |
Bechet et al. | Detecting person presence in tv shows with linguistic and structural features | |
JP2006251553A (en) | Method, device, and program for topic division processing | |
CN113470617B (en) | Speech recognition method, electronic equipment and storage device | |
CN114510585B (en) | Information characterization model construction method and information characterization method | |
JP2006135387A (en) | Moving image subject dividing method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP03 | Change of name, title or address |
- Address after: 411, 4th floor, building 4, No.44, Middle North Third Ring Road, Haidian District, Beijing 100088
- Patentee after: Beijing Qingshu Intelligent Technology Co.,Ltd.
- Address before: Building G, 4th Floor, Cultural and Educational Industrial Park, No. 44 North Third Ring Middle Road, Haidian District, Beijing, 100000
- Patentee before: BEIJING AISHU WISDOM TECHNOLOGY CO.,LTD.