WO2020155750A1 - Artificial intelligence-based corpus collecting method, apparatus, device, and storage medium - Google Patents

Artificial intelligence-based corpus collecting method, apparatus, device, and storage medium

Info

Publication number
WO2020155750A1
Authority
WO
WIPO (PCT)
Prior art keywords
subtitle
audio
video
state parameter
segmented audio
Application number
PCT/CN2019/117261
Other languages
French (fr)
Chinese (zh)
Inventor
杨雨晨 (Yang Yuchen)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2020155750A1

Classifications

    • H: ELECTRICITY
      • H04: ELECTRIC COMMUNICATION TECHNIQUE
        • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
          • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
            • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
              • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
                • H04N21/433: Content storage operation, e.g. storage operation in response to a pause request, caching operations
                • H04N21/435: Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
                • H04N21/439: Processing of audio elementary streams
                  • H04N21/4394: Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
                • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
                  • H04N21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
              • H04N21/47: End-user applications
                • H04N21/488: Data services, e.g. news ticker
                  • H04N21/4884: Data services for displaying subtitles
            • H04N21/80: Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
              • H04N21/83: Generation or processing of protective or descriptive data associated with content; Content structuring
                • H04N21/845: Structuring of content, e.g. decomposing content into time segments
                  • H04N21/8456: Structuring of content by decomposing the content in the time domain, e.g. in time segments

Definitions

  • This application belongs to the field of natural language processing technology, and relates to artificial intelligence-based corpus collection methods, devices, equipment, and storage media.
  • Artificial Intelligence (AI) is a new technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence. Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can respond in ways similar to human intelligence; research in this field includes robotics, speech recognition, image recognition, natural language processing, and expert systems.
  • Existing ways of collecting corpus for a given scene mainly include: (1) obtaining the corpus through free resource searches, which yields very limited corpus that rarely meets the need; (2) having a team record and annotate the corpus itself, which is inefficient and extremely labor-intensive; (3) purchasing the corpus through commercial channels, which is costly.
  • The embodiments of the present application disclose an artificial intelligence-based corpus collection method, device, equipment, and storage medium that can quickly collect corpus matching a given scenario.
  • Some embodiments of the present application disclose an artificial intelligence-based corpus collection method, including: obtaining configuration item information input by a user, the configuration item information including a target video keyword and a video website, the video website being the URL of a video website or the name of a video website; downloading from the video website the video data of the target video obtained by retrieving the target video keyword, the video data including a video file and an SRT subtitle file; separating an audio file from the video file, and splitting the subtitle text content parsed from the SRT subtitle file into subtitle blocks; segmenting the audio file according to the segment time of each subtitle block to obtain segmented audio; establishing associations between the segmented audio and the subtitle blocks; and classifying and filtering the associated segmented audio and subtitle blocks according to preset filtering keywords, then storing them together as the target corpus.
  • Some embodiments of the present application also disclose an artificial intelligence-based corpus collection device, including: a configuration item information acquisition module for acquiring configuration item information input by a user, the configuration item information including a target video keyword and a video website, the video website being the URL of a video website or the name of a video website; a video data download module for downloading from the video website the video data of the target video obtained by retrieving the target video keyword, the video data including a video file and an SRT subtitle file; an audio subtitle processing module for separating an audio file from the video file and splitting the subtitle text content parsed from the SRT subtitle file into subtitle blocks; an audio segmentation module for segmenting the audio file according to the segment time of each subtitle block to obtain segmented audio; an audio subtitle block association module for establishing associations between the segmented audio and the subtitle blocks; and a filtering module for classifying and filtering the associated segmented audio and subtitle blocks according to preset filtering keywords and storing them together as the target corpus.
  • Some embodiments of the present application also disclose a computer device, including a memory and a processor, where the memory stores computer-readable instructions, and the processor, when executing the computer-readable instructions, implements the steps of the above artificial intelligence-based corpus collection method.
  • Some embodiments of the present application also disclose a non-volatile readable storage medium storing computer-readable instructions which, when executed by a processor, implement the steps of the above artificial intelligence-based corpus collection method.
  • FIG. 1 is a flowchart of an artificial intelligence-based corpus collection method provided by an embodiment of the application;
  • FIG. 2 is a flowchart of a second specific implementation manner of step S106 in FIG. 1;
  • FIG. 3 is a flowchart of a third specific implementation manner of step S106 in FIG. 1;
  • FIG. 4 is a schematic flowchart of a specific implementation of step S405 in FIG. 3;
  • FIG. 5 is a schematic diagram of an artificial intelligence-based corpus collection device provided by an embodiment of the application;
  • FIG. 6 is a schematic diagram of the audio subtitle processing module in FIG. 5;
  • FIG. 7 is a schematic structural diagram of a second embodiment of the screening module in FIG. 5;
  • FIG. 8 is a schematic structural diagram of a third embodiment of the screening module in FIG. 5;
  • FIG. 9 is a schematic structural diagram of the voice state parameter score calculation module in FIG. 8;
  • FIG. 10 is a block diagram of the basic structure of a computer device 100 in an embodiment of the present application.
  • An embodiment of the application provides an artificial intelligence-based corpus collection method.
  • FIG. 1 is a schematic diagram of the artificial intelligence-based corpus collection provided by an embodiment of this application; as shown in FIG. 1, the method includes:
  • S101. Obtain configuration item information input by a user, where the configuration item information includes a target video keyword and a video website, and the video website is the URL of a video website or the name of a video website.
  • The target video keyword includes a keyword indicating a video name or video type; the video website may be the name of a video website, such as iQiyi or Youku, or the URL of a video website, such as the iQiyi or Youku homepage address.
  • S102. Download from the video website the video data of the target video obtained by retrieving the target video keyword, the video data including a video file and an SRT subtitle file.
  • Specifically, the video data in the embodiments of this application includes a subtitle file and a video file carrying audio and video signals. The video data can be, for example, movies, TV shows, variety shows, news, animations, or songs, or it can involve specific content such as consumer rights protection, complaints, meal-ordering dialogues, or specific cartoon content.
  • A web crawler (also known as a web spider or web robot) is a set of computer-readable instructions or a script that automatically crawls information on the World Wide Web according to certain rules. Specifically, implementations of downloading the video data may include the following.
  • In the first way, the web crawler opens the web page at the video URL entered by the user and automatically downloads the target video from it. For example, to download the movie The Pursuit of Happyness (《当幸福来敲门》), the user can preset a video URL containing that movie; the web crawler navigates to that URL, opens the web page containing the target video, and automatically downloads it.
  • In the second way, the video website input by the user is obtained; it can be the name of a video website, such as iQiyi or Youku, or the URL of a video website. When the input is a website name, the web crawler enters the name into a preset search engine such as Baidu to retrieve the website's URL, opens the video website, enters the target video keyword into the site's search box to search for the target video, and then, following the search results, opens each result page in turn and downloads all the videos. When the input is a website URL, the web crawler opens the corresponding video website directly, enters the target video keyword into the search box, and likewise opens each result page in turn and downloads all the videos.
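  • As a rough illustration of the second approach, the sketch below queries a video site's search page for a keyword and collects candidate result links. This is not the patent's implementation: the search endpoint, query parameter, and CSS selector are hypothetical placeholders, and a real site such as iQiyi or Youku would require its own URL format and typically an API or browser automation.

```python
# Minimal crawler sketch: search a video site and collect result links.
# The "/search" endpoint, "q" parameter, and "a.video-link" selector are
# assumptions for illustration, not any real site's interface.
import requests
from bs4 import BeautifulSoup

def search_video_pages(site_url, keyword):
    resp = requests.get(f"{site_url}/search", params={"q": keyword}, timeout=10)
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    # Each search result is assumed to be an <a class="video-link" href=...>.
    return [a["href"] for a in soup.select("a.video-link") if a.get("href")]

if __name__ == "__main__":
    for url in search_video_pages("https://example-video-site.com", "animation children"):
        print(url)  # each result page would then be opened to download its video
```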
  • The target video keyword may be, for example, the name of a cartoon such as Boonie Bears (《熊出没》), or a keyword describing video content such as "cooking".
  • Setting the target video keyword: in practice there are automatic customer-service complaint handling platforms. When certain movies are known to carry indignant emotion, the keyword can be preset to the movie's name; keywords can also be set to filter the type of video resource to download. For example, a certain type of program (such as mediation programs or after-sales rights-protection programs) is known to contain much complaining, anger, and dissatisfaction, so the name of such a program, for example "消费主张" (Consumer Proposition), can be set as the target video keyword. In other scenes the atmosphere is cheerful and lively; early childhood education, for example, also involves a great deal of speech recognition and related technology, and a keyword such as "animation + children" can be set, since animated programs are mostly watched by children and are well suited to early education. Further, to restrict the search to video resources, "video" can be appended to the keyword, for example "consumer proposition + video" or "animation + children + video".
  • S103. Separate the audio file from the video file, and split the subtitle text content parsed from the SRT subtitle file into subtitle blocks. Step S103 includes two sub-steps. First, the audio file is separated from the video file; specifically, the audio in the video file is extracted through video/audio separation technology to obtain a standalone audio file. Second, the subtitle text content parsed from the SRT subtitle file is split into subtitle blocks. The two sub-steps are parallel and have no required order.
  • Specifically, in this embodiment, parsing the SRT subtitle file can yield subtitle text content such as the following (translated from the original example):
    1
    00:00:00,162 --> 00:00:01,875
    From now on
    2
    00:00:02,800 --> 00:00:03,000
    I only love you, spoil you, and will never lie to you
    3
    00:00:06,560 --> 00:00:11,520
    No one can hit you, scold you, or bully you; if anyone bullies you, I will come to help you right away
  • Here "1", "2", and "3" are the serial numbers of the subtitles: "1" denotes the first subtitle appearing in the audio signal, "2" the second, and "3" the third. The audio signal mainly consists of parts with subtitles and blank parts without subtitles. Each subtitle corresponds to two times: the first time (to the left of "-->") is the start time of the subtitle in the audio signal, and the second time (to the right of "-->") is its end time; the span from start time to end time is the subtitle's playback time. For example, "00:00:00,162" is the start time of the first subtitle in the audio signal and "00:00:01,875" is its end time, so "00:00:00,162 --> 00:00:01,875" is the playback time of the first subtitle's content, "From now on". "From now on" is the content of the first subtitle, "I only love you, spoil you, and will never lie to you" is the content of the second, and "No one can hit you, scold you, or bully you; if anyone bullies you, I will come to help you right away" is the content of the third.
  • The subtitle text content is split into blocks by combining the playback times with the sentence breaks, yielding the subtitle blocks; for example, "From now on" is split into one subtitle block, "I only love you, spoil you, and will never lie to you" into another, and "No one can hit you, scold you, or bully you; if anyone bullies you, I will come to help you right away" into a third.
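  • A compact sketch of this parsing step follows, assuming UTF-8 SRT input and treating each numbered SRT entry as one subtitle block keyed by its start and end times (in milliseconds, so they can index the audio later):

```python
# Sketch: parse an SRT file into subtitle blocks of (start_ms, end_ms, text).
import re

TIME_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def to_ms(h, m, s, ms):
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

def parse_srt(path):
    blocks = []
    with open(path, encoding="utf-8") as f:
        # SRT entries are blank-line separated: index, time line, text line(s).
        for entry in f.read().strip().split("\n\n"):
            lines = entry.strip().splitlines()
            if len(lines) < 3:
                continue
            match = TIME_RE.match(lines[1])
            if not match:
                continue
            start = to_ms(*match.groups()[:4])
            end = to_ms(*match.groups()[4:])
            blocks.append((start, end, " ".join(lines[2:])))
    return blocks
```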
  • S104. Split the audio file according to the segment time of each subtitle block to obtain segmented audio. As described above, each subtitle corresponds to two times: the first is the start time of the subtitle in the audio signal, the second is its end time, and the span between them is the subtitle's playback time. Since the subtitle blocks are split according to the subtitles' playback times, the start and end time of each subtitle block can be read from the playback time, and the audio file is then cut at each block's start and end times, for example into the segments "00:00:00,162 --> 00:00:01,875", "00:00:02,800 --> 00:00:03,000", "00:00:06,560 --> 00:00:11,520", and so on. The resulting segmented audio corresponds one-to-one with the subtitle blocks.
  • S105. Establish associations between the segmented audio and the subtitle blocks. For example, the segmented audio for the period "00:00:00,162 --> 00:00:01,875" is associated with the subtitle block "From now on". The associated segmented audio and subtitle blocks can be stored at a designated folder address or stored separately, but the file names of the two must match.
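  • The slicing and association steps could look like the following sketch. It uses the pydub library as one possible tool (the patent names none) and encodes the association by giving each audio segment and its subtitle text the same file name:

```python
# Sketch: cut the separated audio at each subtitle block's start/end times
# and store audio and text under matching file names (the association).
import os
from pydub import AudioSegment  # pip install pydub; requires ffmpeg

def segment_and_associate(audio_path, blocks, out_dir):
    """blocks: list of (start_ms, end_ms, text), e.g. from parse_srt above."""
    audio = AudioSegment.from_file(audio_path)
    os.makedirs(out_dir, exist_ok=True)
    for i, (start_ms, end_ms, text) in enumerate(blocks):
        stem = os.path.join(out_dir, f"seg_{i:04d}")
        audio[start_ms:end_ms].export(stem + ".wav", format="wav")  # segmented audio
        with open(stem + ".txt", "w", encoding="utf-8") as f:
            f.write(text)                                           # subtitle block
```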
  • S106. The associated segmented audio and subtitle blocks are classified and filtered according to the preset filtering keywords and then stored together as the target corpus.
  • In the embodiments of the present application, the configuration item information input by the user is obtained and the video data of the target video is downloaded from the video website; the video data is then processed: the audio file is separated from the video file, the subtitle text content parsed from the SRT subtitle file is split into subtitle blocks, the audio is segmented according to the segment time of each subtitle block, the segmented audio is associated with the subtitle blocks, and the associated segmented audio and subtitle blocks are classified and filtered according to the preset filtering keywords and stored together as the target corpus. This achieves the purpose of quickly and automatically collecting the required corpus matching a certain type of scene, with high efficiency and low cost.
  • Specifically, the step of classifying and filtering the associated segmented audio and subtitle blocks according to the preset filtering keywords and storing them together as the target corpus includes: analyzing whether each subtitle block contains text matching the preset filtering keywords, and storing each subtitle block containing matching text, together with the segmented audio associated with it, in a designated first location. The preset keyword classification helps filter out the required corpus: people may use more insulting words when angry and more positive words when happy. Therefore, to collect corpus of angry emotion, the preset filtering keywords can be phrases such as "that's too much" or "I am angry", or curse words such as "idiot"; to collect corpus of positive emotion, the preset filtering keywords can be words such as "make progress every day", "strive", or "keep it up". The stored associated segmented audio and subtitle blocks are the target corpus.
  • FIG. 2 is a flowchart of a second specific implementation manner of step S106 in FIG. 1. In this implementation, the step of classifying and filtering the associated segmented audio and subtitle blocks according to the preset filtering keywords and storing them together as the target corpus includes: S301. Analyze whether each subtitle block contains text matching the preset filtering keywords. S302. Store each subtitle block containing matching text, together with the segmented audio associated with it, in the designated first location.
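  • A sketch of this matching step, assuming segment/subtitle pairs stored under matching file names as above: any pair whose text contains a preset filtering keyword is copied to the designated first location.

```python
# Sketch: keep only segment/subtitle pairs whose subtitle text contains one
# of the preset filtering keywords, copying them to the "first location".
import os
import shutil

def filter_by_keywords(pair_dir, first_location, keywords):
    os.makedirs(first_location, exist_ok=True)
    for name in os.listdir(pair_dir):
        if not name.endswith(".txt"):
            continue
        txt_path = os.path.join(pair_dir, name)
        with open(txt_path, encoding="utf-8") as f:
            text = f.read()
        if any(kw in text for kw in keywords):
            shutil.copy(txt_path, first_location)
            wav_path = txt_path[:-4] + ".wav"
            if os.path.exists(wav_path):
                shutil.copy(wav_path, first_location)

# e.g. filter_by_keywords("segments", "corpus/anger", ["I am angry", "too much"])
```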
  • In a further implementation, the screening configuration item information includes not only the filtering keywords used for screening but also voice state parameters used to assist in analyzing the emotion category of the segmented audio; the voice state parameters may include volume, frequency, amplitude, speech rate, and intonation.
  • Before the step of selecting the segmented audio whose voice state parameters all fall within preset standard intervals and storing the subtitle blocks associated with that segmented audio in a designated second location, the method further includes presetting the standard interval of each voice state parameter. This presetting step includes: obtaining corpus samples labeled with the target emotion category, statistically analyzing them to obtain, for each voice state parameter, the range within which the probability of samples under the target emotion category is greater than a preset value, and extracting an interval contained in that range as the preset standard interval. The corpus samples can be manually collected samples judged to match the desired emotion category, or existing samples collected by other means. The interval contained in the range can be the range itself or any interval within it; for example, if the range is 50 to 70, the interval can be 50 to 70, 50 to 60, 55 to 65, 60 to 70, and so on.
  • Taking the frequency parameter as an example: find a corpus sample library labeled with the target emotion category (such as anger), measure the frequency value of each corpus sample, and plot the normal probability distribution of frequency. If the probability that a sample's frequency falls in the range 50-70 Hz across all corpus samples is greater than the preset value (for example, 97%), then 50-70 Hz is the range of the frequency parameter in which the probability under the target emotion category exceeds the preset value. The same method yields such a range for each of the other voice state parameters, namely volume, amplitude, speech rate, and intonation. The range itself can be used as the preset standard interval, or a sub-interval of 50-70 Hz, such as 50-60 Hz, 55-65 Hz, or 60-70 Hz, can be selected. Segmented audio whose voice state parameters are all within the preset standard intervals means segmented audio whose five voice state parameters each lie within their respective preset standard intervals.
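  • One way to derive such an interval, sketched below under the assumption that the parameter values of the labeled samples have already been measured: take the central interval covering the preset probability mass (the patent only requires some interval whose sample probability exceeds the preset value).

```python
# Sketch: estimate the preset standard interval of one voice state parameter
# (e.g. frequency) from corpus samples labeled with the target emotion.
import numpy as np

def standard_interval(values, prob=0.97):
    tail = (1.0 - prob) / 2.0 * 100.0          # percent mass left in each tail
    lo, hi = np.percentile(values, [tail, 100.0 - tail])
    return lo, hi

# Stand-in data: frequencies of anger-labeled samples clustered around 60 Hz.
freqs = np.random.normal(60.0, 4.0, size=1000)
print(standard_interval(freqs))                # roughly (51, 69), i.e. ~50-70 Hz
```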
  • FIG. 3 is a flowchart of a third specific implementation manner of step S106 in FIG. 1. In this implementation, after step S302 of storing the subtitle block containing the matching text together with its associated segmented audio in the designated first location, the method further includes: S405. Calculate the score of each voice state parameter of each segmented audio stored in the first location. S406. Sum the scores of all voice state parameters of the same segmented audio and confirm whether the total score reaches a preset threshold; the preset threshold can be set by experience or demand, for example 80 points or 90 points. S407. Store the segmented audio whose total score reaches the preset threshold, together with the subtitle block associated with it, in a designated third location.
  • As above, the screening configuration item information in this embodiment includes not only the filtering keywords used for screening but also the voice state parameters used to assist in analyzing the emotion of the segmented audio; the voice state parameters include volume, frequency, amplitude, speech rate, and intonation.
  • FIG. 4 is a schematic flowchart of a specific implementation of step S405 in FIG. 3. More specifically, the step of calculating the score of each voice state parameter of each segmented audio stored in the first location includes the following. The corpus samples can be manually collected samples judged to match the desired emotion category, or corpus samples collected by other means. Taking the frequency parameter as an example: find a corpus sample library labeled with the target emotion category (such as anger), measure the frequency value of each corpus sample, and plot the normal probability distribution of frequency; if the probability of samples whose frequency lies in the range 50-70 Hz among all corpus samples is greater than the preset value (for example, 97%), then 50-70 Hz is the range of the frequency parameter in which the probability under the target emotion category exceeds the preset value. The same method yields such a range for each of the other voice state parameters.
  • A value within the range is selected as the preset standard value of the voice state parameter: the frequency standard value is denoted W_frequency, and W_volume, W_amplitude, W_speech_rate, and W_intonation denote the preset standard values of the other voice state parameters. The score of each voice state parameter is then calculated as M_i = 100 * S_i * (X_i / W_i), where M_i is the score of voice state parameter i, S_i is its weight, X_i is its measured value, W_i is its preset standard value, and i denotes the voice state parameter, specifically volume, frequency, amplitude, speech rate, or intonation.
  • The measured frequency value is denoted X_frequency, and X_volume, X_amplitude, X_speech_rate, and X_intonation denote the measured values of the other voice state parameters. The similarity of each voice state parameter to its standard value is denoted P_frequency for frequency, and P_volume, P_amplitude, P_speech_rate, and P_intonation for the others, computed as:
  • P_volume = X_volume / W_volume
  • P_frequency = X_frequency / W_frequency
  • P_amplitude = X_amplitude / W_amplitude
  • P_speech_rate = X_speech_rate / W_speech_rate
  • P_intonation = X_intonation / W_intonation
  • The weight value of each voice state parameter is denoted S_i: S_volume, S_frequency, S_amplitude, S_speech_rate, and S_intonation. A weight is set for each voice state parameter in advance; for example, when a person is angry the voice is noticeably much louder, so the weight of volume is relatively large and can be set to 60%.
  • M_volume = 100 * S_volume * (X_volume / W_volume)
  • M_frequency = 100 * S_frequency * (X_frequency / W_frequency)
  • M_amplitude = 100 * S_amplitude * (X_amplitude / W_amplitude)
  • M_speech_rate = 100 * S_speech_rate * (X_speech_rate / W_speech_rate)
  • M_intonation = 100 * S_intonation * (X_intonation / W_intonation)
  • Here M_volume, M_frequency, M_amplitude, M_speech_rate, and M_intonation are the scores of the respective voice state parameters, and S_volume, S_frequency, S_amplitude, S_speech_rate, and S_intonation are their weights. The total score of the same segmented audio is M = M_volume + M_frequency + M_amplitude + M_speech_rate + M_intonation.
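  • Putting the formulas together, the sketch below is a direct transcription of the scoring rule: each parameter's score is M_i = 100 * S_i * (X_i / W_i), and the segment's total score M is the sum over the five parameters. The weights, standard values, and threshold here are illustrative placeholders, not values from the patent.

```python
# Sketch: score one audio segment from its measured voice state parameters.
WEIGHTS = {    # S_i: illustrative weights, volume-heavy as in the anger example
    "volume": 0.60, "frequency": 0.10, "amplitude": 0.10,
    "speech_rate": 0.10, "intonation": 0.10,
}
STANDARDS = {  # W_i: illustrative preset standard values (e.g. 60 Hz frequency)
    "volume": 70.0, "frequency": 60.0, "amplitude": 0.8,
    "speech_rate": 5.0, "intonation": 1.0,
}

def total_score(measured):
    """measured: X_i values keyed like WEIGHTS; returns the total score M."""
    return sum(100.0 * WEIGHTS[i] * (measured[i] / STANDARDS[i]) for i in WEIGHTS)

segment = {"volume": 75.0, "frequency": 63.0, "amplitude": 0.9,
           "speech_rate": 5.5, "intonation": 1.1}
if total_score(segment) >= 90.0:   # preset threshold, e.g. 90 points
    print("store this segment in the third location")
```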
  • The advantage brought by the embodiments of the present application is that the required corpus matching a certain type of scene is collected automatically and quickly, with high efficiency and low cost. By setting multiple voice state parameters, statistically analyzing corpus samples to obtain a range, selecting an interval within the range as the preset standard interval or a value within it as the preset standard value, then measuring the segmented audio and calculating its score, the selected target corpus conforms more closely to the intended emotion.
  • FIG. 5 is a schematic structural diagram of a first embodiment of the artificial intelligence-based corpus collection device described in this application;
  • The artificial intelligence-based corpus collection device includes: a configuration item information acquisition module 51, a video data download module 52, an audio subtitle processing module 53, an audio segmentation module 54, an audio subtitle block association module 55, and a screening module 56.
  • The configuration item information acquisition module 51 is configured to obtain the configuration item information input by the user, where the configuration item information includes a target video keyword and a video website, and the video website is the URL of a video website or the name of a video website.
  • The video data download module 52 is used to download from the video website the video data of the target video obtained by retrieving the target video keyword, the video data including a video file and an SRT subtitle file.
  • The audio subtitle processing module 53 is used to separate the audio file from the video file and split the subtitle text content parsed from the SRT subtitle file into subtitle blocks.
  • The audio segmentation module 54 is used to segment the audio file according to the segment time of each subtitle block to obtain segmented audio.
  • The audio subtitle block association module 55 is used to establish associations between the segmented audio and the subtitle blocks. The filtering module 56 is used to classify and filter the associated segmented audio and subtitle blocks according to the preset filtering keywords and store them together as the target corpus.
  • FIG. 6 is a schematic structural diagram of the audio subtitle processing module in FIG. 5. Specifically, in an embodiment of the present application, the audio subtitle processing module 53 includes: a subtitle splitting module 531, configured to parse the SRT subtitle file to obtain the subtitle text content and to split that content into subtitle blocks by combining the playback times and sentence breaks; and an audio/video separation module 532, used to separate the audio file from the video file.
  • In one embodiment, the screening module 56 includes: a keyword matching module 561 for analyzing whether each subtitle block contains text matching the preset filtering keywords; and a first storage module 562 for storing each subtitle block containing matching text, together with the segmented audio associated with that subtitle block, in the designated first location.
  • FIG. 7 is a schematic structural diagram of the second embodiment of the screening module in FIG. 5. In this embodiment the screening module 56 includes, in addition to the keyword matching module 561 and the first storage module 562: a voice state parameter judgment module 563 for judging whether each voice state parameter of the segmented audio stored in the first location is within its preset standard interval, where the voice state parameters are included in the preset screening configuration item information to assist in analyzing the emotion of the segmented audio; and a second storage module 564 for selecting the segmented audio whose voice state parameters are all within the preset standard intervals and storing the subtitle blocks associated with that segmented audio in the designated second location.
  • FIG. 8 is a schematic structural diagram of the third embodiment of the screening module in FIG. 5. Specifically, in other embodiments, the screening module 56 includes, in addition to the keyword matching module 561 and the first storage module 562: a voice state parameter score calculation module 565, configured to calculate the score of each voice state parameter of each segmented audio stored in the first location; a total score calculation and judgment module 566, configured to sum the scores of all voice state parameters of the same segmented audio and confirm whether the total score reaches the preset threshold; and a third storage module 567, used to store the segmented audio whose total score reaches the preset threshold, together with the subtitle block associated with that segmented audio, in the designated third location.
  • FIG. 9 is a schematic structural diagram of the voice state parameter score calculation module in FIG. 8. Specifically, the voice state parameter score calculation module 565 includes: a range analysis module 5651 for obtaining corpus samples labeled with the target emotion category and analyzing them statistically to obtain, for each voice state parameter, the range in which the probability under the target emotion category is greater than the preset value; the corpus samples can be manually collected samples judged to match the desired target emotion category, or samples collected by other means.
  • Taking the frequency parameter as an example: find a corpus sample library labeled with the target emotion category (such as anger), measure the frequency value of each corpus sample, and plot the normal probability distribution of frequency; if the probability of samples whose frequency lies in the range 50-70 Hz among all corpus samples is greater than the preset value (for example, 97%), then 50-70 Hz is the range of the frequency parameter in which the probability exceeds the preset value. The same method yields such a range for each voice state parameter under the target emotion category.
  • The standard value setting module 5652 is used to select a value within the range as the preset standard value of the voice state parameter; the frequency standard value is denoted W_frequency, and W_volume, W_amplitude, W_speech_rate, and W_intonation denote the preset standard values of the other voice state parameters. The preset standard value can be the median or any value within the range; for example, in this embodiment the median frequency of 60 Hz is selected as the preset standard value of frequency.
  • The test value module 5653 is used to measure each voice state parameter value of each segmented audio stored in the first location.
  • The score calculation module 5654 is used to calculate the score of each voice state parameter from the preset standard value of the voice state parameter, the measured parameter value, and the received weight value according to the following formula:
  • M_i = 100 * S_i * (X_i / W_i), where M_i is the score of voice state parameter i, S_i is its weight, X_i is its measured value, W_i is its preset standard value, and i denotes the voice state parameter, specifically volume, frequency, amplitude, speech rate, or intonation.
  • FIG. 10 is a block diagram of the basic structure of the computer device 100 in an embodiment of the application.
  • the computer device 100 includes a memory 101, a processor 102, and a network interface 103 that are communicatively connected to each other through a system bus.
  • FIG. 10 only shows the computer device 100 with components 101-103, but it should be understood that it is not required to implement all the components shown, and more or fewer components may be implemented instead.
  • The computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), embedded devices, and the like.
  • the computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device.
  • The memory 101 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, and the like.
  • the memory 101 may be an internal storage unit of the computer device 100, such as a hard disk or memory of the computer device 100.
  • The memory 101 may also be an external storage device of the computer device 100, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the computer device 100.
  • the memory 101 may also include both an internal storage unit of the computer device 100 and an external storage device thereof.
  • the memory 101 is generally used to store an operating system and various application software installed in the computer device 100, such as the aforementioned artificial intelligence-based corpus collection method.
  • the memory 101 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 102 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips.
  • the processor 102 is generally used to control the overall operation of the computer device 100.
  • the processor 102 is configured to run computer-readable instructions or process data stored in the memory 101, for example, run the computer-readable instructions of the aforementioned artificial intelligence-based corpus collection method.
  • the network interface 103 may include a wireless network interface or a wired network interface, and the network interface 103 is generally used to establish a communication connection between the computer device 100 and other electronic devices.
  • This application also provides another implementation manner, that is, a non-volatile readable storage medium is provided, which stores computer-readable instructions executable by at least one processor, so that the at least one processor executes the steps of any of the foregoing artificial intelligence-based corpus collection methods.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

An artificial intelligence-based corpus collecting method, apparatus, and device, and a storage medium, related to the technical field of natural language processing. The method comprises: acquiring configuration item information inputted by a user (S101); downloading from a video website video data of a target video obtained by searching for a target video keyword, the video data comprising a video file and an SRT subtitle file (S102); separating an audio file from the video file and splitting subtitle text content parsed from the SRT subtitle file into subtitle blocks (S103); segmenting the audio file on the basis of the segment time of each subtitle block to acquire segmented audio (S104); establishing associations between the segmented audio and the subtitle blocks (S105); and sorting and screening the associated segmented audio and subtitle blocks according to preset screening keywords, then jointly storing them as a target corpus (S106). The method achieves the goal of automatically and quickly collecting a corpus satisfying the requirements of a certain type of scenario, and is highly efficient and inexpensive.

Description

Artificial intelligence-based corpus collection method, device, equipment and storage medium
This application is based on, and claims priority from, Chinese invention patent application No. 201910081793.7, filed on January 28, 2019 and titled "Artificial Intelligence-Based Corpus Collection Method, Device, Equipment, and Storage Medium".
Technical Field
This application belongs to the field of natural language processing technology, and relates to artificial intelligence-based corpus collection methods, devices, equipment, and storage media.
Background
Artificial Intelligence (AI) is a new technical science that studies and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence. Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can respond in ways similar to human intelligence; research in this field includes robotics, speech recognition, image recognition, natural language processing, and expert systems.
In reality, AI-based natural language processing often requires corpus matching various scenarios to be collected in advance. For example, an automatic customer-service complaint handling platform needs corpus expressing "complaints", "dissatisfaction", and time urgency, so that the priority of work-order intake and the assigned handler can be adjusted flexibly according to urgency and severity, helping complaints to be handled and resolved quickly. As another example, early childhood education and children's fun-dialogue applications need corpus based on children's voices with a cheerful, lively mood.
Existing ways of collecting corpus for a given scene mainly include: (1) obtaining the corpus through free resource searches, which yields very limited corpus that rarely meets the need; (2) having a team record and annotate the corpus itself, which is inefficient and extremely labor-intensive; (3) purchasing the corpus through commercial channels, which is costly.
Therefore, existing corpus collection methods are inefficient and costly; how to quickly collect corpus meeting the needs of a certain type of scene has become an urgent problem to solve.
Summary of the Invention
The embodiments of the present application disclose an artificial intelligence-based corpus collection method, device, equipment, and storage medium that can quickly collect corpus matching a given scenario.
Some embodiments of the present application disclose an artificial intelligence-based corpus collection method, including: obtaining configuration item information input by a user, the configuration item information including a target video keyword and a video website, the video website being the URL of a video website or the name of a video website; downloading from the video website the video data of the target video obtained by retrieving the target video keyword, the video data including a video file and an SRT subtitle file; separating an audio file from the video file, and splitting the subtitle text content parsed from the SRT subtitle file into subtitle blocks; segmenting the audio file according to the segment time of each subtitle block to obtain segmented audio; establishing associations between the segmented audio and the subtitle blocks; and classifying and filtering the associated segmented audio and subtitle blocks according to preset filtering keywords, then storing them together as the target corpus.
Some embodiments of the present application also disclose an artificial intelligence-based corpus collection device, including: a configuration item information acquisition module for acquiring configuration item information input by a user, the configuration item information including a target video keyword and a video website, the video website being the URL of a video website or the name of a video website; a video data download module for downloading from the video website the video data of the target video obtained by retrieving the target video keyword, the video data including a video file and an SRT subtitle file; an audio subtitle processing module for separating an audio file from the video file and splitting the subtitle text content parsed from the SRT subtitle file into subtitle blocks; an audio segmentation module for segmenting the audio file according to the segment time of each subtitle block to obtain segmented audio; an audio subtitle block association module for establishing associations between the segmented audio and the subtitle blocks; and a filtering module for classifying and filtering the associated segmented audio and subtitle blocks according to preset filtering keywords and storing them together as the target corpus.
Some embodiments of the present application also disclose a computer device, including a memory and a processor, where the memory stores computer-readable instructions, and the processor, when executing the computer-readable instructions, implements the steps of the above artificial intelligence-based corpus collection method.
Some embodiments of the present application also disclose a non-volatile readable storage medium storing computer-readable instructions which, when executed by a processor, implement the steps of the above artificial intelligence-based corpus collection method.
The details of one or more embodiments of the present application are set forth in the following drawings and description; other features and advantages of the present application will become apparent from the specification, drawings, and claims.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from them without creative effort.
FIG. 1 is a flowchart of an artificial intelligence-based corpus collection method provided by an embodiment of the application;
FIG. 2 is a flowchart of a second specific implementation manner of step S106 in FIG. 1;
FIG. 3 is a flowchart of a third specific implementation manner of step S106 in FIG. 1;
FIG. 4 is a schematic flowchart of a specific implementation of step S405 in FIG. 3;
FIG. 5 is a schematic diagram of an artificial intelligence-based corpus collection device provided by an embodiment of the application;
FIG. 6 is a schematic diagram of the audio subtitle processing module in FIG. 5;
FIG. 7 is a schematic structural diagram of a second embodiment of the screening module in FIG. 5;
FIG. 8 is a schematic structural diagram of a third embodiment of the screening module in FIG. 5;
FIG. 9 is a schematic structural diagram of the voice state parameter score calculation module in FIG. 8;
FIG. 10 is a block diagram of the basic structure of a computer device 100 in an embodiment of the present application.
Detailed Description
To facilitate understanding, the present application is described more fully below with reference to the relevant drawings, in which preferred embodiments of the application are shown. However, this application may be implemented in many different forms and is not limited to the embodiments described herein; rather, these embodiments are provided so that the disclosure of this application will be understood thoroughly and comprehensively.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of this application. The terms used in the specification are for the purpose of describing specific embodiments only and are not intended to limit the application.
An embodiment of the application provides an artificial intelligence-based corpus collection method. Referring to FIG. 1, a schematic diagram of the artificial intelligence-based corpus collection provided by an embodiment of this application; as shown in FIG. 1, the method includes:
S101. Obtain configuration item information input by a user, where the configuration item information includes a target video keyword and a video website, and the video website is the URL of a video website or the name of a video website.
The target video keyword includes a keyword indicating a video name or video type; the video website may be the name of a video website, such as iQiyi or Youku, or the URL of a video website, such as the iQiyi or Youku homepage address.
S102. Download from the video website the video data of the target video obtained by retrieving the target video keyword, the video data including a video file and an SRT subtitle file.
Specifically, the video data in the embodiments of this application includes a subtitle file and a video file carrying audio and video signals. The video data can be, for example, movies, TV shows, variety shows, news, animations, or songs, or it can involve specific content such as consumer rights protection, complaints, meal-ordering dialogues, or specific cartoon content.
A web crawler (also known as a web spider or web robot) is a set of computer-readable instructions or a script that automatically crawls information on the World Wide Web according to certain rules. Specifically, implementations of downloading the video data may include the following.
In the first way, the web crawler opens the web page at the video URL entered by the user and automatically downloads the target video from it. For example, to download the movie The Pursuit of Happyness (《当幸福来敲门》), the user can preset a video URL containing that movie; the web crawler navigates to that URL, opens the web page containing the target video, and automatically downloads it.
In the second way, the video website input by the user is obtained; it can be the name of a video website, such as iQiyi or Youku, or the URL of a video website. When the input is a website name, the web crawler enters the name into a preset search engine such as Baidu to retrieve the website's URL, opens the video website, enters the target video keyword into the site's search box to search for the target video, and then, following the search results, opens each result page in turn and downloads all the videos. When the input is a website URL, the web crawler opens the corresponding video website directly, enters the target video keyword into the search box, and likewise opens each result page in turn and downloads all the videos. The target video keyword may be, for example, the name of a cartoon such as Boonie Bears (《熊出没》), or a keyword describing video content such as "cooking".
Setting the target video keyword: in practice there may be, for example, an automated customer-service complaint-handling platform. When certain movies are known to carry a predominantly indignant emotional tone, the keyword may be preset to the movie's name, for example "XXX". Keywords may also be set to filter the type of video resources to be downloaded: knowing that a certain class of programs (such as mediation programs or after-sales rights-protection programs) contains a great deal of complaint, anger, and dissatisfaction, the name of such a program, for example "消费主张" (a consumer-affairs program), may be set as the target video keyword. In other scenarios the atmosphere is cheerful and lively; early-childhood education, for instance, also involves many technologies such as speech recognition, and knowing that animated programs are mostly watched by young children and are well suited to early education, the keyword may be set to "animation + toddler".
Further, to indicate that resources in video format are required, the word "video" may be appended to the target video keyword, for example "消费主张 + video" or "animation + toddler + video", restricting the search to video resources.
The above are merely examples and are not intended to limit this application.
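By way of illustration only, the following Python sketch shows how a simple crawler of the kind described above might search a video site and download the videos on the result pages. The "/search" path, the "q" query parameter, and the CSS selectors are hypothetical placeholders, not part of this application; a real video website would require its own site-specific handling.

```python
import os
import requests
from bs4 import BeautifulSoup

def crawl_and_download(site_url, keyword, out_dir="videos"):
    """Search a (hypothetical) video site for `keyword` and download the results."""
    os.makedirs(out_dir, exist_ok=True)
    # Placeholder search endpoint and parameter; site-specific in practice.
    resp = requests.get(site_url + "/search", params={"q": keyword + " video"})
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    for link in soup.select("a.result"):              # one result page per hit
        page = requests.get(link["href"])
        page_soup = BeautifulSoup(page.text, "html.parser")
        for src in page_soup.select("video source"):  # every video on the page
            video = requests.get(src["src"], stream=True)
            name = src["src"].rsplit("/", 1)[-1]
            with open(os.path.join(out_dir, name), "wb") as f:
                for chunk in video.iter_content(chunk_size=1 << 20):
                    f.write(chunk)
```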
S103. Separate the audio file from the video file, and split the subtitle text content parsed from the SRT subtitle file into subtitle blocks.
Specifically, step S103 includes two sub-steps. The first separates the audio file from the video file: the audio in the video file is extracted through audio-video separation technology to obtain a standalone audio file. The second splits the subtitle text content parsed from the SRT subtitle file into subtitle blocks.
The two sub-steps of step S103 are parallel and may be performed in either order.
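As a minimal sketch of the audio-video separation sub-step, assuming the ffmpeg command-line tool is installed, the audio track can be extracted into a standalone WAV file as follows; the file names are illustrative.

```python
import subprocess

def separate_audio(video_path: str, audio_path: str) -> None:
    # -vn drops the video stream; the audio is re-encoded as 16-bit PCM WAV
    # at 16 kHz, a convenient format for the later segmentation step.
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vn",
         "-acodec", "pcm_s16le", "-ar", "16000", audio_path],
        check=True,
    )

separate_audio("target_video.mp4", "target_video.wav")
```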
Specifically, in this embodiment, parsing the SRT subtitle file yields subtitle text content such as the following:
1
00:00:00,162 --> 00:00:01,875
从现在开始 (From now on)

2
00:00:02,800 --> 00:00:03,000
我只疼你，宠你，不会骗你 (I will only love you, spoil you, and never lie to you)

3
00:00:06,560 --> 00:00:11,520
没有人能打你，骂你，欺负你，有人欺负你我会第一时间出来帮你 (No one may hit you, scold you, or bully you; if anyone bullies you, I will come to your aid at once)
Here, "1", "2", and "3" are the sequence numbers of the subtitles: "1" denotes the first subtitle appearing in the audio signal, "2" the second, and "3" the third. The audio signal consists mainly of subtitled portions and blank portions without subtitles. Each subtitle corresponds to two times: the first time (to the left of "-->") is the start time at which the subtitle appears in the audio signal, and the second time (to the right of "-->") is the time at which the subtitle ends; the span from the start time to the end time is the subtitle's display period. For example, "00:00:00,162" is the start time of the first subtitle in the audio signal, "00:00:01,875" is its end time, and "00:00:00,162 --> 00:00:01,875" is the display period of the first subtitle's content "从现在开始" (From now on). "从现在开始" is the content of the first subtitle, "我只疼你，宠你，不会骗你" is the content of the second, and "没有人能打你，骂你，欺负你，有人欺负你我会第一时间出来帮你" is the content of the third.
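The SRT format above is regular enough to parse with a few lines of Python. The following sketch, using only the standard library, converts the subtitle text content into (start, end, text) triples; it assumes the comma-separated millisecond timestamp form shown above.

```python
import re
from datetime import timedelta

CUE = re.compile(
    r"(\d+)\s*\n"
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*\n"
    r"(.+?)(?:\n\n|\Z)",
    re.S,
)

def to_ms(h, m, s, ms):
    # Convert an hours/minutes/seconds/milliseconds tuple to milliseconds.
    return int(timedelta(hours=int(h), minutes=int(m), seconds=int(s),
                         milliseconds=int(ms)).total_seconds() * 1000)

def parse_srt(srt_text):
    # Returns a list of (start_ms, end_ms, subtitle_text) triples,
    # one per subtitle cue, in display order.
    cues = []
    for m in CUE.finditer(srt_text):
        start = to_ms(*m.group(2, 3, 4, 5))
        end = to_ms(*m.group(6, 7, 8, 9))
        cues.append((start, end, m.group(10).strip()))
    return cues
```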
Specifically, in this embodiment, the subtitle text content is divided into blocks according to the display period and the punctuation marking sentence breaks, yielding subtitle blocks. For example, "从现在开始" is split into one subtitle block, "我只疼你，宠你，不会骗你" into another, and "没有人能打你，骂你，欺负你，有人欺负你我会第一时间出来帮你" into a third.

S104. Cut the audio file according to the segment time of each subtitle block to obtain segmented audio.
In the parsed subtitle text, each subtitle corresponds to two times: the first is the start time at which the subtitle appears in the audio signal, and the second is the time at which it ends; the span between them is the subtitle's display period. Since the subtitle blocks are split according to the subtitles' display periods, the start and end time of each subtitle block can be obtained from those periods, and the audio file is then cut at the start and end time of each subtitle block, for example into the segments "00:00:00,162 --> 00:00:01,875", "00:00:02,800 --> 00:00:03,000", "00:00:06,560 --> 00:00:11,520", and so on. The resulting audio segments correspond one-to-one with the subtitle blocks.
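A minimal sketch of step S104, assuming the pydub library (which itself relies on ffmpeg) is available: pydub's AudioSegment supports slicing by milliseconds, which maps directly onto the cue times produced by the parser sketched above.

```python
from pydub import AudioSegment

def split_audio(audio_path, cues):
    # cues: list of (start_ms, end_ms, text) triples from parse_srt().
    audio = AudioSegment.from_file(audio_path)
    segments = []
    for start_ms, end_ms, _text in cues:
        segments.append(audio[start_ms:end_ms])  # pydub slices in milliseconds
    return segments
```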
S105. Establish an association between the segmented audio and the subtitle blocks.
The segmented audio is associated with the subtitle blocks; for example, the audio segment spanning "00:00:00,162 --> 00:00:01,875" is associated with the subtitle block "从现在开始". The associated audio segments and subtitle blocks may be stored under one designated folder path or stored separately, but in either case the file names of the two must match.
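One simple way to realize the association of step S105, as a sketch: write each audio segment and its subtitle block to disk under a shared base name, so that the pairing can be recovered from the file names alone. The directory layout and naming scheme are assumptions for illustration.

```python
import os

def store_pairs(segments, cues, out_dir="corpus_raw"):
    os.makedirs(out_dir, exist_ok=True)
    for idx, (segment, (_start, _end, text)) in enumerate(zip(segments, cues)):
        base = os.path.join(out_dir, f"utt_{idx:05d}")
        segment.export(base + ".wav", format="wav")    # segmented audio
        with open(base + ".txt", "w", encoding="utf-8") as f:
            f.write(text)                              # associated subtitle block
```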
S106. Classify and filter the associated segmented audio and subtitle blocks according to preset filtering keywords, and store them together as the target corpus.
In the embodiments of this application, the configuration item information entered by the user is obtained and the video data of the target video is downloaded from the video website; the video data is then processed: the audio file is separated from the video file, the subtitle text content parsed from the SRT subtitle file is split into subtitle blocks, the audio is cut according to the segment time of each subtitle block, the audio segments are associated with the subtitle blocks, and the associated segments and blocks are classified and filtered according to the preset filtering keywords and stored together as the target corpus. This achieves rapid, automatic collection of the required corpus matching a given class of scenario, for example a corpus matching the preset filtering keywords, with high efficiency and low cost.
Specifically, in a first specific implementation of step S106, the step of classifying and filtering the associated segmented audio and subtitle blocks according to the preset filtering keywords and storing them together as the target corpus includes: analyzing whether each subtitle block contains text matching a preset filtering keyword; and storing each subtitle block that contains matching text, together with the audio segment associated with that subtitle block, in a designated first location.
Specifically, in the embodiments of this application, classification and filtering by preset keywords helps select the required corpus. A person in an angry state tends to use more insulting vocabulary, while a happy person tends to use more positive vocabulary. Therefore, to collect a corpus of angry speech, the preset filtering keywords may be phrases such as "太过分" (that is going too far) or "我很生气" (I am very angry), or insults such as "贱人" or "傻子"; to collect a corpus of positive speech, the preset filtering keywords may be "天天向上" (improve every day), "奋斗" (strive), "加油" (keep it up), and so on.
The text of each subtitle block is extracted and compared against the preset filtering keywords to determine whether the subtitle block contains matching text; the keyword matching may be fuzzy matching. The associated audio segments and subtitle blocks that are stored constitute the target corpus.
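A sketch of this first implementation (steps S301/S302 below): each subtitle block's text is compared against the preset filtering keywords, here with simple substring matching standing in for the fuzzy matching mentioned above, and matching pairs are copied to the designated first location. The paths and keyword list are illustrative.

```python
import glob
import os
import shutil

ANGER_KEYWORDS = ["太过分", "我很生气"]  # illustrative preset filtering keywords

def filter_by_keywords(raw_dir="corpus_raw", first_location="corpus_matched",
                       keywords=ANGER_KEYWORDS):
    os.makedirs(first_location, exist_ok=True)
    for txt_path in glob.glob(os.path.join(raw_dir, "*.txt")):
        with open(txt_path, encoding="utf-8") as f:
            text = f.read()
        # Substring matching as a stand-in for fuzzy matching.
        if any(kw in text for kw in keywords):
            wav_path = txt_path[:-4] + ".wav"
            shutil.copy(txt_path, first_location)  # subtitle block
            shutil.copy(wav_path, first_location)  # associated segmented audio
```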
Referring to FIG. 2, FIG. 2 is a flowchart of a second specific implementation of step S106 in FIG. 1.
Specifically, in some embodiments, the step of classifying and filtering the associated segmented audio and subtitle blocks according to the preset filtering keywords and storing them together as the target corpus includes: S301. Analyze whether each subtitle block contains text matching a preset filtering keyword. S302. Store each subtitle block that contains matching text, together with the audio segment associated with that subtitle block, in a designated first location.
As in the first implementation, classification and filtering by the preset keywords helps select the required corpus; the keyword matching may be fuzzy matching, and the associated audio segments and subtitle blocks that are stored constitute the target corpus.
S303. Determine whether every voice state parameter of each audio segment stored in the first location lies within a preset standard interval. S304. Select the audio segments whose voice state parameters all lie within the preset standard intervals and store them, together with the subtitle blocks associated with those segments, in a designated second location.
Specifically, in this embodiment, the filtering configuration item information includes, in addition to the filtering keywords used for selection, voice state parameters used to assist in analyzing the emotion category of the segmented audio. The voice state parameters may include volume, frequency, amplitude, speech rate, and intonation.
Before the step of selecting the audio segments whose voice state parameters all lie within the preset standard intervals and storing them, together with their associated subtitle blocks, in the designated second location, the method further includes: presetting a standard interval for each voice state parameter.
Specifically, the step of presetting the standard interval for each voice state parameter includes: obtaining corpus samples labeled with a target emotion category and performing statistical analysis on them to obtain, for each voice state parameter, the range of values within which the proportion of samples under the target emotion category exceeds a preset value; and extracting, from that range, an interval contained within the range as the preset standard interval. The corpus samples may be samples collected manually and judged to match the desired emotion category, or existing samples collected by other means. The interval contained within the range may be identical to the range or may be a sub-interval of it; for example, if the range is 50 to 70, the interval may be 50 to 70, or 50 to 60, 55 to 65, 60 to 70, and so on.
More specifically, in this embodiment, take the frequency parameter as an example. A library of corpus samples labeled with the target emotion category (for example, anger) is obtained, the frequency value of each sample is measured, and a normal probability distribution of the frequencies is plotted. If it is found that samples with frequencies in the range 50 to 70 Hz account for a proportion of all corpus samples greater than the preset value (for example, 97%), the range of the frequency parameter under the target emotion category is obtained; the same method yields, for each voice state parameter, the range within which its proportion under the target emotion category exceeds the preset value. That range may be taken directly as the preset standard interval, or a sub-interval of 50 to 70 Hz, such as 50 to 60 Hz, 55 to 65 Hz, or 60 to 70 Hz, may be chosen as the preset standard interval. The other voice state parameters, namely volume, amplitude, speech rate, and intonation, are treated in the same way.
An audio segment whose voice state parameters are all within the preset standard intervals is one in which each of the five voice state parameters lies within its corresponding preset standard interval.
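As a sketch of how such a standard interval might be derived from the labeled samples: assuming the measured values of a parameter over the sample library are roughly normally distributed, an interval covering the stated proportion (for example, 97%) can be taken symmetrically around the mean of a fitted normal distribution. numpy and scipy are assumed available; measuring the parameter values themselves is outside this sketch.

```python
import numpy as np
from scipy import stats

def standard_interval(values, coverage=0.97):
    # values: measured values of one voice state parameter (e.g. frequency in Hz)
    # over a corpus sample library labeled with the target emotion category.
    mu, sigma = np.mean(values), np.std(values)
    # Interval around the mean covering `coverage` of the fitted normal distribution.
    lo, hi = stats.norm.interval(coverage, loc=mu, scale=sigma)
    return lo, hi

def in_all_intervals(params, intervals):
    # params: {"volume": x, ...}; intervals: same keys mapped to (lo, hi) pairs.
    return all(intervals[k][0] <= v <= intervals[k][1] for k, v in params.items())
```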
Referring to FIG. 3, FIG. 3 is a flowchart of a third specific implementation of step S106 in FIG. 1.
After step S302 stores the subtitle blocks containing matching text, together with the audio segments associated with those blocks, in the designated first location, the method further includes: S405. Calculate a score for each voice state parameter of each audio segment stored in the first location. S406. Sum the scores of all voice state parameters of the same audio segment and determine whether the total score reaches a preset threshold. The preset threshold may be set empirically or according to requirements, for example 80 or 90 points. S407. Store the audio segments whose total score reaches the preset threshold, together with the subtitle blocks associated with those segments, in a designated third location.
Specifically, the filtering configuration item information in this embodiment includes, in addition to the filtering keywords used for selection, voice state parameters used to assist in analyzing the emotion of the segmented audio; the voice state parameters include volume, frequency, amplitude, speech rate, and intonation.
Next, referring to FIG. 4, FIG. 4 is a schematic flowchart of a specific implementation of step S405 in FIG. 3. More specifically, the step of calculating the score of each voice state parameter of each audio segment stored in the first location includes:
S501. Obtain corpus samples labeled with the target emotion category and perform statistical analysis to obtain, for each voice state parameter, the range of values within which the proportion of samples under the target emotion category exceeds a preset value.
The corpus samples may be samples collected manually and judged to match the desired emotion category, or existing samples collected by other means. In this embodiment, taking frequency as an example, a library of corpus samples labeled with the target emotion category (for example, anger) is obtained, the frequency value of each sample is measured, and a normal probability distribution of the frequencies is plotted. If samples with frequencies in the range 50 to 70 Hz account for a proportion of all corpus samples greater than the preset value (for example, 97%), the range of the frequency parameter under the target emotion category is obtained; the same method yields the corresponding range for each voice state parameter.
S502. Select a value within that range, for example the median, as the preset standard value of the voice state parameter.
Here the frequency standard value is denoted W_frequency; W_volume, W_amplitude, W_speech_rate, and W_intonation denote the preset standard values of the other voice state parameters.
S503. Measure each voice state parameter value of each audio segment stored in the first location.
S504. Based on the preset standard values of the voice state parameters, the measured voice state parameter values, and the received weight values, calculate the score of each voice state parameter by the following formula:
M_i = 100 * S_i * (X_i / W_i), where M_i is the score of each voice state parameter, S_i is the weight value of each voice state parameter, X_i is the measured voice state parameter value, W_i is the preset standard value of the voice state parameter, and the subscript i denotes the voice state parameter, which may specifically be volume, frequency, amplitude, speech rate, or intonation.
Specifically, the measured value X_i of each voice state parameter of each audio segment stored in the first location is compared with the preset standard value W_i; the resulting ratio is called the similarity P_i, that is, P_i = X_i / W_i.
For example, the measured frequency value of each audio segment stored in the first location is compared with the preset frequency standard value W_frequency to obtain the frequency similarity P_frequency. The measured frequency value is denoted X_frequency; X_volume, X_amplitude, X_speech_rate, and X_intonation denote the measured values of the other voice state parameters, and P_volume, P_amplitude, P_speech_rate, and P_intonation denote their similarities. The specific formulas are as follows:
P_volume = X_volume / W_volume, P_frequency = X_frequency / W_frequency, P_amplitude = X_amplitude / W_amplitude, P_speech_rate = X_speech_rate / W_speech_rate, P_intonation = X_intonation / W_intonation.
The preset weight value of each voice state parameter is then received. The weight value is denoted S_i, and the weights of the individual parameters are S_volume, S_frequency, S_amplitude, S_speech_rate, and S_intonation. A weight is assigned to each voice state parameter in advance; for example, a person's voice is noticeably much louder when angry, so the weight of volume is relatively large and may be set to 60%.
From M_i = 100 * S_i * P_i, the formula M_i = 100 * S_i * (X_i / W_i) follows, and the score of each voice state parameter is calculated with reference to it. Specifically, the following formulas apply:
M_volume = 100 * S_volume * (X_volume / W_volume), M_frequency = 100 * S_frequency * (X_frequency / W_frequency), M_amplitude = 100 * S_amplitude * (X_amplitude / W_amplitude), M_speech_rate = 100 * S_speech_rate * (X_speech_rate / W_speech_rate), M_intonation = 100 * S_intonation * (X_intonation / W_intonation),
where M_volume, M_frequency, M_amplitude, M_speech_rate, and M_intonation denote the scores of the individual voice state parameters, and S_volume, S_frequency, S_amplitude, S_speech_rate, and S_intonation denote their weight values.
The score of each voice state parameter can thus be calculated.
Returning to FIG. 3: S406. Sum the scores of all voice state parameters of the same audio segment and determine whether the total score reaches the preset threshold. Specifically, with M denoting the total score of an audio segment, the total score of the segment is obtained from the formula M = M_volume + M_frequency + M_amplitude + M_speech_rate + M_intonation.
The total score of each audio segment is compared with the preset threshold to determine whether it reaches the threshold.
S407. Store the audio segments whose total score reaches the preset threshold, together with the subtitle blocks associated with those segments, in the designated third location. Specifically, if the total score is greater than or equal to the preset threshold, the audio segment is stored, together with its associated subtitle block, in the designated third location.
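The scoring of steps S405 to S407 can be written down directly from the formulas above. A sketch, assuming the standard values W_i, weights S_i, and measured values X_i for a segment are already available as dictionaries keyed by parameter name; the numeric values below are illustrative only.

```python
def parameter_scores(X, W, S):
    # M_i = 100 * S_i * (X_i / W_i) for i in {volume, frequency, amplitude,
    # speech_rate, intonation}; X[i] / W[i] is the similarity P_i.
    return {i: 100 * S[i] * (X[i] / W[i]) for i in X}

def total_score(X, W, S):
    # M = M_volume + M_frequency + M_amplitude + M_speech_rate + M_intonation
    return sum(parameter_scores(X, W, S).values())

# Illustrative values: a segment is kept (stored to the third location)
# when its total score reaches the preset threshold, e.g. 80 points.
W = {"volume": 70, "frequency": 60, "amplitude": 0.5, "speech_rate": 5.0, "intonation": 1.0}
S = {"volume": 0.6, "frequency": 0.1, "amplitude": 0.1, "speech_rate": 0.1, "intonation": 0.1}
X = {"volume": 75, "frequency": 62, "amplitude": 0.45, "speech_rate": 5.2, "intonation": 0.9}
keep = total_score(X, W, S) >= 80
```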
The benefit of the embodiments of this application is that the required corpus matching a given class of scenario is collected automatically and rapidly, with high efficiency and low cost. Multiple voice state parameters are configured; statistical analysis of corpus samples labeled with an emotion category yields a range, from which an interval is chosen as the preset standard interval or a specific value as the preset standard value; the audio segments are then measured, scored, and selected, so that the emotion of the selected target corpus conforms more closely to the standard.
An embodiment of this application provides an artificial intelligence-based corpus collection apparatus. Referring to FIG. 5, FIG. 5 is a schematic structural diagram of a first embodiment of the artificial intelligence-based corpus collection apparatus of this application.
The artificial intelligence-based corpus collection apparatus includes: a configuration item information acquisition module 51, a video data download module 52, an audio-subtitle processing module 53, an audio segmentation module 54, an audio-subtitle block association module 55, and a filtering module 56. The configuration item information acquisition module 51 is configured to obtain configuration item information entered by the user, where the configuration item information includes a target video keyword and a video website, the video website being the URL or the name of a video website. The video data download module 52 is configured to download, from the video website, the video data of the target video obtained by searching for the target video keyword, the video data including a video file and an SRT subtitle file. The audio-subtitle processing module 53 is configured to separate the audio file from the video file and split the subtitle text content parsed from the SRT subtitle file into subtitle blocks. The audio segmentation module 54 is configured to cut the audio file according to the segment time of each subtitle block to obtain segmented audio. The audio-subtitle block association module 55 is configured to establish associations between the audio segments and the subtitle blocks. The filtering module 56 is configured to classify and filter the associated audio segments and subtitle blocks according to the preset filtering keywords and store them together as the target corpus.
Referring to FIG. 6, FIG. 6 is a schematic structural diagram of the audio-subtitle processing module in FIG. 5. Specifically, in the embodiments of this application, the audio-subtitle processing module 53 includes: a subtitle splitting module 531, configured to parse the SRT subtitle file to obtain the subtitle text content and to divide the subtitle text content into blocks according to the display period and sentence-break punctuation to obtain subtitle blocks; and an audio-video separation module 532, configured to separate the audio file from the video file.
Referring to FIG. 7, FIG. 7 is a schematic structural diagram of a first embodiment of the filtering module in FIG. 5. Specifically, in some embodiments, the filtering module 56 includes: a keyword matching module 561, configured to analyze whether each subtitle block contains text matching a preset filtering keyword; and a first storage module 562, configured to store each subtitle block containing matching text, together with the audio segment associated with that subtitle block, in a designated first location.
Further, in other embodiments, the filtering module 56 includes, in addition to the keyword matching module 561 and the first storage module 562: a voice state parameter judgment module 563, configured to determine whether each voice state parameter of the audio segments stored in the first location lies within the preset standard interval, where the voice state parameters are included in the preset filtering configuration item information and are used to assist in analyzing the emotion of the segmented audio; and a second storage module 564, configured to select the audio segments whose voice state parameters all lie within the preset standard intervals and store them, together with the subtitle blocks associated with those segments, in a designated second location.
Referring to FIG. 8, FIG. 8 is a schematic structural diagram of a second embodiment of the filtering module in FIG. 5. Specifically, in other embodiments, the filtering module 56 includes, in addition to the keyword matching module 561 and the first storage module 562: a voice state parameter score calculation module 565, configured to calculate the score of each voice state parameter of each audio segment stored in the first location; a total score calculation and judgment module 566, configured to sum the scores of all voice state parameters of the same audio segment and determine whether the total score reaches the preset threshold; and a third storage module 567, configured to store the audio segments whose total score reaches the preset threshold, together with the subtitle blocks associated with those segments, in a designated third location.
Referring to FIG. 9, FIG. 9 is a schematic structural diagram of the voice state parameter score calculation module in FIG. 8. Specifically, the voice state parameter score calculation module 565 includes: a range analysis module 5651, configured to obtain corpus samples labeled with the target emotion category and perform statistical analysis to obtain, for each voice state parameter, the range within which the proportion of samples under the target emotion category exceeds the preset value. The corpus samples may be samples collected manually and judged to match the desired target emotion category, or existing samples collected by other means. In this embodiment, taking frequency as an example, a library of corpus samples labeled with the target emotion category (for example, anger) is obtained, the frequency value of each sample is measured, and a normal probability distribution of the frequencies is plotted; if samples with frequencies in the range 50 to 70 Hz account for a proportion of all corpus samples greater than the preset value (for example, 97%), the range of the frequency parameter under the target emotion category is obtained, and the same method yields the corresponding range for each voice state parameter. A standard value setting module 5652 is configured to select a value within the range as the preset standard value of the voice state parameter, where the frequency standard value is denoted W_frequency, and W_volume, W_amplitude, W_speech_rate, and W_intonation denote the preset standard values of the other parameters; the value selected as the preset standard value may be the median or any value within the range, and in this embodiment the median frequency of 60 Hz is selected as the preset standard value of frequency. A test value module 5653 is configured to measure each voice state parameter value of each audio segment stored in the first location. A score calculation module 5654 is configured to calculate, based on the preset standard values of the voice state parameters, the measured voice state parameter values, and the received weight values, the score of each voice state parameter by the following formula:
M_i = 100 * S_i * (X_i / W_i), where M_i is the score of each voice state parameter, S_i is the weight value of each voice state parameter, X_i is the measured voice state parameter value, W_i is the preset standard value of the voice state parameter, and i denotes the voice state parameter, which may specifically be volume, frequency, amplitude, speech rate, or intonation.
An embodiment of this application discloses a computer device. Referring to FIG. 10, FIG. 10 is a block diagram of the basic structure of a computer device 100 in an embodiment of this application. As shown in FIG. 10, the computer device 100 includes a memory 101, a processor 102, and a network interface 103 communicatively connected to one another through a system bus. It should be noted that FIG. 10 shows only a computer device 100 with components 101 to 103; it should be understood, however, that implementing all of the illustrated components is not required, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device capable of automatically performing numerical computation and/or information processing according to preset or stored instructions, and that its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.
The computer device may be a desktop computer, a notebook, a palmtop computer, a cloud server, or another computing device. The computer device may interact with the user through a keyboard, a mouse, a remote control, a touch pad, or a voice-control device.
The memory 101 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc, and the like. In some embodiments, the memory 101 may be an internal storage unit of the computer device 100, such as the hard disk or main memory of the computer device 100. In other embodiments, the memory 101 may be an external storage device of the computer device 100, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device 100. Of course, the memory 101 may also include both an internal storage unit of the computer device 100 and an external storage device. In this embodiment, the memory 101 is generally used to store the operating system and the various application software installed on the computer device 100, such as the artificial intelligence-based corpus collection method described above. In addition, the memory 101 may be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 102 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 102 is generally used to control the overall operation of the computer device 100. In this embodiment, the processor 102 is configured to run the computer-readable instructions or process the data stored in the memory 101, for example to run the computer-readable instructions of the artificial intelligence-based corpus collection method described above.
The network interface 103 may include a wireless or wired network interface and is generally used to establish communication connections between the computer device 100 and other electronic devices.
This application further provides another implementation: a non-volatile readable storage medium storing computer-readable instructions that can be executed by at least one processor, so that the at least one processor performs the steps of any of the artificial intelligence-based corpus collection methods described above.
Finally, it should be noted that the embodiments described above are clearly only some, not all, of the embodiments of this application; the drawings show preferred embodiments of this application but do not limit its patent scope. This application may be implemented in many different forms; rather, these embodiments are provided so that the disclosure of this application will be thorough and complete. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing specific embodiments or make equivalent replacements of some of their technical features. Any equivalent structure made using the contents of the description and drawings of this application, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of this application.

Claims (21)

1. An artificial intelligence-based corpus collection method, comprising:
    obtaining configuration item information entered by a user, the configuration item information comprising a target video keyword and a video website, the video website being a URL of a video website or a name of a video website;
    downloading, from the video website, video data of a target video obtained by searching for the target video keyword, the video data comprising a video file and an SRT subtitle file;
    separating an audio file from the video file, and splitting subtitle text content parsed from the SRT subtitle file into subtitle blocks;
    cutting the audio file according to a segment time of each of the subtitle blocks to obtain segmented audio;
    establishing an association between the segmented audio and the subtitle blocks; and
    classifying and filtering the associated segmented audio and subtitle blocks according to preset filtering keywords, and storing them together as a target corpus.
2. The artificial intelligence-based corpus collection method according to claim 1, wherein the step of classifying and filtering the associated segmented audio and subtitle blocks according to the preset filtering keywords and storing them together as the target corpus comprises:
    analyzing whether each subtitle block contains text matching a preset filtering keyword; and
    storing a subtitle block containing matching text, together with the segmented audio associated with the subtitle block, in a designated first location.
3. The artificial intelligence-based corpus collection method according to claim 2, wherein after the step of storing the subtitle block containing matching text, together with the segmented audio associated with the subtitle block, in the designated first location, the method further comprises:
    determining whether each voice state parameter of each audio segment stored in the first location lies within a preset standard interval; and
    selecting the audio segments whose voice state parameters all lie within the preset standard intervals and storing them, together with the subtitle blocks associated with those audio segments, in a designated second location.
4. The artificial intelligence-based corpus collection method according to claim 3, wherein the method of setting the preset standard interval comprises:
    obtaining corpus samples labeled with a target emotion category and performing statistical analysis to obtain, for each voice state parameter, a range of values within which the proportion of samples under the target emotion category exceeds a preset value; and
    extracting, from the range, an interval contained within the range as the preset standard interval.
5. The artificial intelligence-based corpus collection method according to claim 2, wherein after the step of storing the subtitle block containing matching text, together with the segmented audio associated with the subtitle block, in the designated first location, the method further comprises:
    calculating a score for each voice state parameter of each audio segment stored in the first location;
    summing the scores of all voice state parameters of the same audio segment and determining whether the total score reaches a preset threshold; and
    storing the audio segments whose total score reaches the preset threshold, together with the subtitle blocks associated with those audio segments, in a designated third location.
6. The artificial intelligence-based corpus collection method according to claim 5, wherein the step of calculating the score of each voice state parameter of each audio segment stored in the first location comprises:
    obtaining corpus samples labeled with a target emotion category and performing statistical analysis to obtain, for each voice state parameter, a range of values within which the proportion of samples under the target emotion category exceeds a preset value;
    selecting a value within the range as a preset standard value of the voice state parameter;
    measuring each voice state parameter value of each audio segment stored in the first location; and
    calculating, based on the preset standard values of the voice state parameters, the measured voice state parameter values, and received weight values, the score of each voice state parameter by the following formula:
    M_i = 100 * S_i * (X_i / W_i), where M_i is the score of each voice state parameter, S_i is the weight value of each voice state parameter, X_i is the measured voice state parameter value, W_i is the preset standard value of the voice state parameter, and i denotes the voice state parameter.
7. The artificial intelligence-based corpus collection method according to claim 4 or 6, wherein the step of obtaining corpus samples labeled with a target emotion category and performing statistical analysis comprises:
    obtaining a library of corpus samples labeled with the target emotion category;
    measuring a frequency value of each corpus sample in the library to obtain a normal probability distribution of the frequency values; and
    performing the statistical analysis based on the normal probability distribution.
8. The artificial intelligence-based corpus collection method according to any one of claims 1 to 7, wherein the step of splitting the subtitle text content parsed from the SRT subtitle file into subtitle blocks comprises:
    parsing the SRT subtitle file to obtain the subtitle text content; and
    dividing the subtitle text content into blocks according to the display period and sentence-break punctuation to obtain the subtitle blocks.
9. An artificial intelligence-based corpus collection apparatus, comprising:
    a configuration item information acquisition module, configured to obtain configuration item information entered by a user, the configuration item information comprising a target video keyword and a video website, the video website being a URL of a video website or a name of a video website;
    a video data download module, configured to download, from the video website, video data of a target video obtained by searching for the target video keyword, the video data comprising a video file and an SRT subtitle file;
    an audio-subtitle processing module, configured to separate an audio file from the video file and split subtitle text content parsed from the SRT subtitle file into subtitle blocks;
    an audio segmentation module, configured to cut the audio file according to a segment time of each subtitle block to obtain segmented audio;
    an audio-subtitle block association module, configured to establish an association between the segmented audio and the subtitle blocks; and
    a filtering module, configured to classify and filter the associated segmented audio and subtitle blocks according to preset filtering keywords and store them together as a target corpus.
10. The artificial intelligence-based corpus collection apparatus according to claim 9, wherein the audio-subtitle processing module comprises:
    a subtitle splitting module, configured to parse the SRT subtitle file to obtain the subtitle text content and to divide the subtitle text content into blocks according to the display period and sentence-break punctuation to obtain the subtitle blocks; and an audio-video separation module, configured to separate the audio file from the video file.
11. A computer device, comprising a memory and a processor, wherein the memory stores computer-readable instructions, and the processor, when executing the computer-readable instructions, implements the steps of the following artificial intelligence-based corpus collection method:
    obtaining configuration item information entered by a user, the configuration item information comprising a target video keyword and a video website, the video website being a URL of a video website or a name of a video website;
    downloading, from the video website, video data of a target video obtained by searching for the target video keyword, the video data comprising a video file and an SRT subtitle file;
    separating an audio file from the video file, and splitting subtitle text content parsed from the SRT subtitle file into subtitle blocks;
    cutting the audio file according to a segment time of each of the subtitle blocks to obtain segmented audio;
    establishing an association between the segmented audio and the subtitle blocks; and
    classifying and filtering the associated segmented audio and subtitle blocks according to preset filtering keywords, and storing them together as a target corpus.
12. The computer device according to claim 11, wherein the step of classifying and filtering the associated segmented audio and subtitle blocks according to the preset filtering keywords and storing them together as the target corpus comprises:
    analyzing whether each subtitle block contains text matching a preset filtering keyword; and
    storing a subtitle block containing matching text, together with the segmented audio associated with the subtitle block, in a designated first location.
13. The computer device according to claim 12, wherein after the step of storing the subtitle block containing matching text, together with the segmented audio associated with the subtitle block, in the designated first location, the method further comprises:
    determining whether each voice state parameter of each audio segment stored in the first location lies within a preset standard interval; and
    selecting the audio segments whose voice state parameters all lie within the preset standard intervals and storing them, together with the subtitle blocks associated with those audio segments, in a designated second location.
14. The computer device according to claim 13, wherein the method of setting the preset standard interval comprises:
    obtaining corpus samples labeled with a target emotion category and performing statistical analysis to obtain, for each voice state parameter, a range of values within which the proportion of samples under the target emotion category exceeds a preset value; and
    extracting, from the range, an interval contained within the range as the preset standard interval.
15. The computer device according to claim 12, wherein after the step of storing the subtitle block containing matching text, together with the segmented audio associated with the subtitle block, in the designated first location, the method further comprises:
    calculating a score for each voice state parameter of each audio segment stored in the first location;
    summing the scores of all voice state parameters of the same audio segment and determining whether the total score reaches a preset threshold; and
    storing the audio segments whose total score reaches the preset threshold, together with the subtitle blocks associated with those audio segments, in a designated third location.
16. One or more non-volatile readable storage media storing computer readable instructions, wherein the computer readable instructions, when executed by a processor, implement the steps of the following artificial intelligence-based corpus collection method:
    obtaining configuration item information input by a user, the configuration item information including a target video keyword and a video website, the video website being the URL or the name of a video website;
    downloading, from the video website, the video data of a target video obtained by retrieving the target video keyword, the video data including a video file and an SRT subtitle file;
    separating an audio file from the video file, and splitting the subtitle text content parsed from the SRT subtitle file into subtitle blocks;
    segmenting the audio file according to the segment time of each subtitle block to obtain segmented audio;
    establishing an association between the segmented audio and the subtitle blocks; and
    classifying and filtering the associated segmented audio and subtitle blocks according to preset filtering keywords, and then storing them together as the target corpus.
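For orientation, a condensed Python sketch of the pipeline in this claim, using pysrt for SRT parsing and the ffmpeg command line for audio separation and cutting. The download step is omitted, and every file name and field name here is illustrative rather than taken from the disclosure.

    import subprocess
    import pysrt  # pip install pysrt

    def extract_audio(video_file, audio_file="audio.wav"):
        # Separate the audio track from the downloaded video file.
        subprocess.run(["ffmpeg", "-y", "-i", video_file, "-vn", audio_file],
                       check=True)
        return audio_file

    def segment_by_subtitles(audio_file, srt_file):
        # Cut the audio at each subtitle block's time span and keep the
        # audio-to-subtitle association as a list of dicts.
        pairs = []
        for idx, sub in enumerate(pysrt.open(srt_file)):
            start = sub.start.ordinal / 1000.0  # milliseconds -> seconds
            end = sub.end.ordinal / 1000.0
            seg = f"segment_{idx:04d}.wav"
            subprocess.run(["ffmpeg", "-y", "-i", audio_file,
                            "-ss", str(start), "-to", str(end), seg],
                           check=True)
            pairs.append({"audio": seg, "subtitle": sub.text})
        return pairs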
17. The non-volatile readable storage medium according to claim 16, wherein the step of classifying and filtering the associated segmented audio and subtitle blocks according to the preset filtering keywords and then storing them together as the target corpus specifically comprises:
    analyzing whether each subtitle block contains text that matches the preset filtering keywords; and
    storing each subtitle block that contains matching text, together with the segmented audio associated with that subtitle block, in a designated first location.
18. The non-volatile readable storage medium according to claim 17, wherein, after the step of storing the subtitle block containing the matching text together with the segmented audio associated with the subtitle block in the designated first location, the method further comprises:
    determining whether each voice state parameter of each segmented audio stored in the first location is within a preset standard interval; and
    selecting the segmented audio whose voice state parameters are all within the preset standard interval, and storing it, together with the subtitle block associated with that segmented audio, in a designated second location.
19. The non-volatile readable storage medium according to claim 18, wherein the method for setting the preset standard interval specifically comprises:
    obtaining corpus samples marked with a target emotion category and performing statistical analysis on them, to obtain, for each voice state parameter under the target emotion category, a range within which that parameter occurs with a probability greater than a preset value; and
    extracting an interval contained within that range as the preset standard interval.
20. The non-volatile readable storage medium according to claim 17, wherein, after the step of storing the subtitle block containing the matching text together with the segmented audio associated with the subtitle block in the designated first location, the method further comprises:
    calculating a score for each voice state parameter of each segmented audio stored in the first location;
    summing the scores of all voice state parameters of the same segmented audio, and confirming whether the total score reaches a preset threshold; and
    storing the segmented audio whose total score reaches the preset threshold, together with the subtitle block associated with that segmented audio, in a designated third location.
21. [Corrected according to Rule 26 on 25.11.2019]
    The non-volatile readable storage medium according to claim 19, wherein the step of calculating the score of each voice state parameter of each segmented audio stored in the first location specifically comprises:
    obtaining corpus samples marked with the target emotion category and performing statistical analysis on them, to obtain, for each voice state parameter under the target emotion category, a range within which that parameter occurs with a probability greater than the preset value;
    selecting a value within that range as a preset standard value of the voice state parameter;
    measuring the value of each voice state parameter of each segmented audio stored in the first location; and
    calculating the score of each voice state parameter from the preset standard value of the voice state parameter, the measured voice state parameter value, and a received weight value, according to the following formula:
    M_i = 100 * S_i * (X_i / W_i), where M_i is the score of voice state parameter i, S_i is the weight value of that parameter, X_i is its measured value, and W_i is its preset standard value.
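The formula itself is simple to apply; a sketch with invented numbers (the weight values S_i are received as input in the claim, so the one below is a placeholder):

    def parameter_score(weight, measured, standard):
        # M_i = 100 * S_i * (X_i / W_i)
        return 100.0 * weight * (measured / standard)

    # Hypothetical example: weight 0.4, measured pitch 250 Hz against a
    # preset standard value of 200 Hz -> 100 * 0.4 * 1.25 = 50.0
    score = parameter_score(0.4, 250.0, 200.0)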
PCT/CN2019/117261 2019-01-28 2019-11-11 Artificial intelligence-based corpus collecting method, apparatus, device, and storage medium WO2020155750A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910081793.7 2019-01-28
CN201910081793.7A CN110008378B (en) 2019-01-28 2019-01-28 Corpus collection method, device, equipment and storage medium based on artificial intelligence

Publications (1)

Publication Number Publication Date
WO2020155750A1 (en)

Family

ID=67165610

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117261 WO2020155750A1 (en) 2019-01-28 2019-11-11 Artificial intelligence-based corpus collecting method, apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN110008378B (en)
WO (1) WO2020155750A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022228235A1 (en) * 2021-04-29 2022-11-03 华为云计算技术有限公司 Method and apparatus for generating video corpus, and related device

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008378B (en) * 2019-01-28 2024-03-19 平安科技(深圳)有限公司 Corpus collection method, device, equipment and storage medium based on artificial intelligence
CN110427930A (en) * 2019-07-29 2019-11-08 中国工商银行股份有限公司 Multimedia data processing method and device, electronic equipment and readable storage medium
CN112749299A (en) * 2019-10-31 2021-05-04 北京国双科技有限公司 Method and device for determining video type, electronic equipment and readable storage medium
CN111091811B (en) * 2019-11-22 2022-04-22 珠海格力电器股份有限公司 Method and device for processing voice training data and storage medium
CN111209461A (en) * 2019-12-30 2020-05-29 成都理工大学 Bilingual corpus collection system based on public identification words
CN111445902B (en) * 2020-03-27 2023-05-30 北京字节跳动网络技术有限公司 Data collection method, device, storage medium and electronic equipment
CN111629267B (en) * 2020-04-30 2023-06-09 腾讯科技(深圳)有限公司 Audio labeling method, device, equipment and computer readable storage medium
CN112818680B (en) * 2020-07-10 2023-08-01 腾讯科技(深圳)有限公司 Corpus processing method and device, electronic equipment and computer readable storage medium
CN114996506A (en) * 2022-05-24 2022-09-02 腾讯科技(深圳)有限公司 Corpus generation method and device, electronic equipment and computer-readable storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070253678A1 (en) * 2006-05-01 2007-11-01 Sarukkai Ramesh R Systems and methods for indexing and searching digital video content
CN103324685A (en) * 2013-06-03 2013-09-25 大连理工大学 Search method for video fragments of Japanese online video corpora
CN104978961A (en) * 2015-05-25 2015-10-14 腾讯科技(深圳)有限公司 Audio processing method, device and terminal
CN105047203A (en) * 2015-05-25 2015-11-11 腾讯科技(深圳)有限公司 Audio processing method, device and terminal
CN108122552A (en) * 2017-12-15 2018-06-05 上海智臻智能网络科技股份有限公司 Voice mood recognition methods and device
CN108268539A (en) * 2016-12-31 2018-07-10 上海交通大学 Video matching system based on text analyzing
CN110008378A (en) * 2019-01-28 2019-07-12 平安科技(深圳)有限公司 Corpus collection method, device, equipment and storage medium based on artificial intelligence

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070061303A1 (en) * 2005-09-14 2007-03-15 Jorey Ramer Mobile search result clustering
CN101021855B (en) * 2006-10-11 2010-04-07 北京新岸线网络技术有限公司 Video searching system based on content
KR102018295B1 (en) * 2017-06-14 2019-09-05 주식회사 핀인사이트 Apparatus, method and computer-readable medium for searching and providing sectional video

Also Published As

Publication number Publication date
CN110008378B (en) 2024-03-19
CN110008378A (en) 2019-07-12

Similar Documents

Publication Publication Date Title
WO2020155750A1 (en) Artificial intelligence-based corpus collecting method, apparatus, device, and storage medium
WO2018072071A1 (en) Knowledge map building system and method
US20190104342A1 (en) Cognitive digital video filtering based on user preferences
CN111814770B (en) Content keyword extraction method of news video, terminal device and medium
WO2021098648A1 (en) Text recommendation method, apparatus and device, and medium
CN111274442B (en) Method for determining video tag, server and storage medium
US11151191B2 (en) Video content segmentation and search
CN105224581B (en) The method and apparatus of picture are presented when playing music
CN109408672B (en) Article generation method, article generation device, server and storage medium
CN107229731B (en) Method and apparatus for classifying data
AU2017216520A1 (en) Common data repository for improving transactional efficiencies of user interactions with a computing device
JP2004078512A (en) Document management method and document management device
CN115982376B (en) Method and device for training model based on text, multimode data and knowledge
CN112929746B (en) Video generation method and device, storage medium and electronic equipment
JP2008282407A (en) Information processing apparatus
CN114297439A (en) Method, system, device and storage medium for determining short video label
US11756301B2 (en) System and method for automatically detecting and marking logical scenes in media content
JP2007012013A (en) Video data management device and method, and program
CN111090813A (en) Content processing method and device and computer readable storage medium
US20200349177A1 (en) Method, system, and computer program product for retrieving relevant documents
JP7395377B2 (en) Content search methods, devices, equipment, and storage media
CN114625918A (en) Video recommendation method, device, equipment, storage medium and program product
CN114254158A (en) Video generation method and device, and neural network training method and device
CN115580758A (en) Video content generation method and device, electronic equipment and storage medium
CN107133644B (en) Digital library's content analysis system and method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19913960

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19913960

Country of ref document: EP

Kind code of ref document: A1
