WO2019153406A1 - Audio paragraph recognition method and apparatus - Google Patents

Audio paragraph recognition method and apparatus

Info

Publication number
WO2019153406A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
keyword information
paragraph
mark
keyword
Prior art date
Application number
PCT/CN2018/078525
Other languages
French (fr)
Chinese (zh)
Inventor
陈滢朱
刘善果
刘胜强
Original Assignee
深圳市鹰硕技术有限公司
Priority date
Filing date
Publication date
Application filed by 深圳市鹰硕技术有限公司
Publication of WO2019153406A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/685 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using automatically derived transcript of audio data, e.g. lyrics
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/54 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for retrieval

Definitions

  • the present disclosure relates to the field of computer technology, and in particular to an audio passage recognition method, apparatus, electronic device, and computer readable storage medium.
  • recording events with electronic devices has brought great convenience to daily life.
  • for example, recording a teacher's lecture in the classroom makes it convenient for the teacher to teach the material again or for students to review it; likewise, at meetings or while watching live television, audio recorded with an electronic device can be replayed later or archived and consulted as electronic material.
  • however, because the paragraphs of the audio content cannot be seen directly in an audio file, when the file is long or a particular paragraph of the audio needs to be extracted or processed, the specified position in the audio cannot be located quickly; instead, several manual adjustments are needed before the corresponding audio content can be played or recognized.
  • an audio passage recognition method including:
  • the keyword information and the paragraph mark are analyzed, and the audio passage is recognized according to the analysis result.
  • matching the recorded audio in the pre-stored keyword information base includes:
  • the speech features are matched in the keyword information base based on a maximum likelihood function.
  • the method further includes:
  • the determining whether the keyword information is a valid keyword includes:
  • the optimal solution is obtained by calculating the fuzzy matrix equation, and the keyword information corresponding to the optimal solution is determined to be valid keyword information.
  • the method further includes:
  • Data training is performed according to the valid keyword information and the paragraph identifier, and the keyword information base is updated according to the training result.
  • searching for a paragraph mark within a preset audio range of audio corresponding to the keyword information includes:
  • the method further includes:
  • the correction flag is added to the plurality of audio segments identified by the same keyword information.
  • the method further includes:
  • after a correction instruction triggered according to the correction identifier is received, the weight value Q of the keyword information corresponding to the correction identifier is increased by 1 (Q+1);
  • Data training is performed according to each keyword information and a corresponding weight value in combination with the paragraph mark, and the keyword information base is updated according to the training result.
  • the method further includes:
  • the identified audio passage corresponding to the correction identifier is cancelled.
  • the paragraph mark is preset paragraph field information.
  • the method further includes:
  • a paragraph directory or a paragraph index corresponding to each audio passage is generated based on the keyword information corresponding to the audio passage.
  • the method further includes:
  • the audio clip is completed according to the audio passage.
  • the paragraph mark includes a pre-segment mark and an end-of-segment mark
  • the completing the audio clip according to the audio passage includes:
  • the audio is clipped between the paragraph end point and the nearest paragraph start point preceding that end point.
  • an audio passage recognition apparatus comprising:
  • a keyword information matching module configured to match the recorded audio in a pre-stored keyword information base
  • a paragraph mark searching module configured to search for a paragraph mark in a preset audio range of the audio corresponding to the keyword information after matching the corresponding keyword information in the keyword information base;
  • the audio passage recognition module is configured to analyze the keyword information and the paragraph mark after finding the paragraph mark, and identify the audio paragraph according to the analysis result.
  • an electronic device comprising:
  • a memory having computer readable instructions stored thereon, the computer readable instructions being executed by the processor to implement the method of any one of claims 1 to 7.
  • a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any of the above.
  • the audio passage recognition method in the exemplary embodiment of the present disclosure matches the recorded audio in a pre-stored keyword information base and, after matching the corresponding keyword information, searches for a paragraph mark within a preset audio range of the audio corresponding to the keyword information; after the paragraph mark is found, the keyword information and the paragraph mark are analyzed, and the audio paragraph is identified according to the analysis result.
  • on the one hand, because keyword information and paragraph marks are combined for recognition, the accuracy of audio paragraph recognition is improved; on the other hand, by identifying the paragraph information of the audio, a user of the audio can quickly locate and play the audio according to the keyword information, which greatly improves the usability of the audio and enhances the user experience.
  • FIG. 1 illustrates a flow chart of an audio passage recognition method according to an exemplary embodiment of the present disclosure
  • FIG. 2 shows a schematic block diagram of an audio passage recognition device according to an exemplary embodiment of the present disclosure
  • FIG. 3 schematically illustrates a block diagram of an electronic device in accordance with an exemplary embodiment of the present disclosure
  • FIG. 4 schematically illustrates a schematic diagram of a computer readable storage medium in accordance with an exemplary embodiment of the present disclosure.
  • an audio paragraph recognition method is first provided, which can be applied to an electronic device such as a computer; as shown in FIG. 1, the audio passage recognition method may include the following steps:
  • Step S110 Matching the recorded audio in the pre-stored keyword information base
  • Step S120 After matching the corresponding keyword information in the keyword information base, search for a paragraph mark in a preset audio range of the audio corresponding to the keyword information.
  • Step S130 If the paragraph mark is found, the keyword information and the paragraph mark are analyzed, and the audio passage is recognized according to the analysis result.
  • in the audio passage recognition method of the present exemplary embodiment, on the one hand, because keyword information and paragraph marks are combined for recognition, the accuracy of audio passage recognition is improved; on the other hand, by recognizing the paragraph information of the audio, the user of the audio can quickly locate and play the audio according to the keyword information, which greatly improves the usability of the audio and enhances the user experience.
  • in step S110, the recorded audio is matched in the pre-stored keyword information base
  • the recorded audio may be an audio file recorded by the user with an electronic device, in a format such as mp3 or wma.
  • for example: the teaching audio of a lesson recorded on a mobile phone while the user attends class; the audio of all the speeches of the conference presenter recorded with a recording pen while the user attends a conference; or the audio recorded by a home smart speaker while the user watches a live television broadcast.
  • the pre-stored keyword information base may be composed of keyword information selected according to speech content, meeting content, etc. that is known in advance, or of commonly used time-sequence words, order words, or other customizable keyword information.
  • for example, keyword information such as “morning”, “90s of the last century”, “first chapter”, “first”, “again”, etc.; it can also be user-defined keywords, such as chapter titles in a seventh-grade history textbook: "The origin of Chinese civilization", “The emergence of the state and the transformation of society”.
  • the recorded audio can be processed into a sound wave signal by a short-time Fourier transform, taking advantage of the high temporal resolution of the short-time Fourier transform.
  • the sound wave signal can be filtered by an auditory filter bank; different auditory filter banks are selected according to the audio attributes so as to simulate the sound wave signal as closely as possible, filter out the ambient noise of the sound wave signal, and extract the speech features.
  • the auditory filter bank includes, but is not limited to, a resonance filter, a Roex function filter, a Gammatone filter, and a Gammachirp filter.
  • the speech features may be matched in the keyword information base based on a maximum likelihood function.
  • for specified keyword information $x$, the likelihood function of the speech feature parameter $\theta$ is $L(\theta \mid x) = P(X = x \mid \theta)$; that is, $L(\theta \mid x)$ is numerically equal to the probability of the keyword information $X = x$ given the speech feature parameter $\theta$.
  • the method further includes: determining whether the keyword information is a valid keyword. If only one instance of the keyword information is matched in the recorded audio, that keyword information is determined to be valid keyword information; if several identical instances of the keyword information are matched in the recorded audio, a fuzzy matrix equation is established from each instance and its time code value, the optimal solution of the fuzzy matrix equation is calculated, and the keyword information corresponding to the optimal solution is determined to be valid keyword information.
  • after the valid keyword information is found, the step of searching for a paragraph mark in the preset audio range of the audio corresponding to the valid keyword information is performed.
  • in step S120, after the corresponding keyword information is matched in the keyword information base, a paragraph mark is searched for in a preset audio range of the audio corresponding to the keyword information;
  • searching for a paragraph mark in the preset audio range of the audio corresponding to the keyword information specifically includes: searching, within the preset audio range, for a sound wave signal whose duration is greater than a preset duration and whose signal strength is less than a preset intensity value; if such a signal exists, that sound wave signal is determined to be the paragraph mark. For example:
  • the user matches the above-mentioned classroom teaching audio in the pre-stored keyword information base and the corresponding keyword "first section" is matched; then, within a preset time range of the audio near the keyword "first section" (for example, 5s before and after the keyword), a search is made for a sound wave signal whose duration is longer than the preset duration and whose signal strength is less than the preset intensity value.
  • assume the preset duration is 2s and the preset intensity value is 2dB: if a sound wave signal whose average sound intensity is below the preset intensity value of 2dB is found, and the duration of that sound wave signal is greater than the preset duration of 2s, that is, greater than the interval between words in a normal sentence, so that there is a clear pause, then the pause is determined to be the paragraph mark corresponding to the keyword "first section"; that is, the audio information of the "first section" is recorded starting from this paragraph mark.
  • searching for a paragraph mark in a preset audio range of the audio corresponding to the keyword information further includes: searching for whether there is paragraph field information within the preset audio range. For example:
  • the user matches the above-mentioned classroom teaching audio in the pre-stored keyword information base and the corresponding keyword "first section" is matched; if the paragraph field information "first" is then found within the preset time range of the audio near the keyword “first section” (still assumed to be 5s before and after the keyword), the paragraph field information "first" can be judged to be the paragraph mark corresponding to the keyword "first section"; that is, the audio information of the "first section" is recorded starting from this paragraph mark.
  • unsupervised data training learning is performed according to the valid keyword information and the paragraph identifier, and the keyword information base is updated according to the training result.
  • different data training methods can be selected according to the audio content. For example, data training on classroom recordings of ancient poetry lessons together with a recitation and analysis database of 300 Tang poems allows more poems to be added to the keyword information base as keyword information.
  • likewise, data training on Korean language classroom recordings together with a standard Korean program database allows more Korean-specific grammar keywords to be added, such as the modal verbs commonly used at the end of a sentence.
  • the keyword information corresponding to the paragraph identifier is updated to the keyword information base.
  • Step S130 If the paragraph mark is found, the keyword information and the paragraph mark are analyzed, and the audio passage is recognized according to the analysis result.
  • the paragraph mark and the keyword information are in a corresponding relationship.
  • the correction identifier is added to the plurality of audio passages identified by the same keyword information, for example:
  • the weight value Q of the keyword information corresponding to the correction identifier is increased by 1 (Q+1); data training is performed according to each keyword information and its corresponding weight value in combination with the paragraph mark, and the keyword information base is updated according to the training result.
  • increasing the weight value each time a correction is triggered implements error correction of the keyword information. This is an active-learning update of the keyword information in the keyword information base; compared with unsupervised keyword information learning, active learning allows the keyword information base to grow more accurately.
  • the identification of the keyword information in the audio is relocated, and the correction identifier is cancelled.
  • the recognized audio passage uses the corrected keyword information as valid keyword information.
  • a paragraph directory or a paragraph index corresponding to each audio passage is generated based on the keyword information corresponding to the audio passage.
  • the audio passages can be classified according to their keyword information and organized hierarchically into a stored paragraph directory or paragraph index, so that the corresponding audio passage can be found quickly and efficiently; alternatively, the paragraph directory or paragraph index information for each position can be marked on the playback progress bar of the audio file, allowing the user to accurately locate a specified audio passage during audio playback.
  • the audio clip is completed according to the audio segment.
  • the clipped audio and the keyword information can be stored in correspondence, so that the entire audio file can be indexed quickly and the user can separately select, for example, the audio of the “first chapter” or of “The origin of Chinese civilization”; this makes efficient use of the audio passages and facilitates archiving and lookup.
  • the paragraph mark includes a pre-segment mark and an end-of-segment mark
  • the completing the audio clip according to the audio passage includes: determining a paragraph start point according to the keyword information and the pre-segment mark, and determining a paragraph end point according to the keyword information and the end-of-segment mark; the audio is then clipped between the paragraph end point and the nearest paragraph start point preceding that end point.
  • for example, the paragraph field information "first" is the pre-segment mark of the keyword "first section", and the paragraph field information "then" is both the pre-segment mark of the keyword "second section" and the end-of-segment mark of the keyword “first section”; combining the pre-segment mark and the end-of-segment mark determines the audio passage of the keyword "first section", and the audio clip is completed.
  • the paragraph field information "above is" may likewise serve as the end-of-segment mark of the keyword "first section" and be used to determine the audio passage of the "first section" and complete the audio clip.
  • the audio passage recognition apparatus 200 may include a keyword information matching module 210, a paragraph mark search module 220, and an audio passage recognition module 230, wherein:
  • the keyword information matching module 210 is configured to match the recorded audio in the pre-stored keyword information base;
  • the paragraph mark searching module 220 is configured to search for a paragraph mark in a preset audio range of the audio corresponding to the keyword information after matching the corresponding keyword information in the keyword information base;
  • the audio passage recognition module 230 is configured to analyze the keyword information and the paragraph mark after finding the paragraph mark, and identify the audio passage according to the analysis result.
  • although several modules or units of the audio passage recognition apparatus 200 are mentioned in the above detailed description, such division is not mandatory. Indeed, in accordance with embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one of the modules or units described above may be further divided into multiple modules or units.
  • an electronic device capable of implementing the above method is also provided.
  • aspects of the present invention can be implemented as a system, method, or program product. Accordingly, aspects of the present invention may be embodied in the form of a complete hardware embodiment, a complete software embodiment (including firmware, microcode, etc.), or a combination of hardware and software aspects, which may be collectively referred to herein as a "circuit," "module," or "system."
  • an electronic device 300 in accordance with such an embodiment of the present invention is described below with reference to FIG. 3. The electronic device 300 shown in FIG. 3 is merely an example and should not impose any limitation on the function and scope of use of the embodiments of the present invention.
  • electronic device 300 is embodied in the form of a general purpose computing device.
  • the components of the electronic device 300 may include, but are not limited to, the at least one processing unit 310, the at least one storage unit 320, the bus 330 connecting different system components (including the storage unit 320 and the processing unit 310), and the display unit 340.
  • the storage unit stores program code, which can be executed by the processing unit 310, such that the processing unit 310 performs various exemplary embodiments according to the present invention described in the "Exemplary Method" section of the present specification.
  • the processing unit 310 can perform steps S110 to S130 as shown in FIG. 1.
  • the storage unit 320 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 3201 and/or a cache storage unit 3202, and may further include a read only storage unit (ROM) 3203.
  • the storage unit 320 may also include a program/utility 3204 having a set (at least one) of program modules 3205, such program modules 3205 including but not limited to: an operating system, one or more applications, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment.
  • the bus 330 may represent one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, a graphics acceleration port, a processing unit, or a local bus using any of a variety of bus structures.
  • the electronic device 300 can also communicate with one or more external devices 370 (e.g., a keyboard, pointing device, Bluetooth device, etc.), with one or more devices that enable the user to interact with the electronic device 300, and/or with any device (e.g., router, modem, etc.) that enables the electronic device 300 to communicate with one or more other computing devices. This communication can take place via an input/output (I/O) interface 350. The electronic device 300 can also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 360. As shown, the network adapter 360 communicates with the other modules of the electronic device 300 via the bus 330.
  • the exemplary embodiments described herein may be implemented by software, or by software in combination with the necessary hardware. Therefore, the technical solution according to an embodiment of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a mobile hard disk, etc.) or on a network, and which includes a number of instructions to cause a computing device (which may be a personal computer, server, terminal device, or network device, etc.) to perform a method in accordance with an embodiment of the present disclosure.
  • a computer readable storage medium having stored thereon a program product capable of implementing the above method of the present specification.
  • aspects of the present invention may also be embodied in the form of a program product comprising program code which, when the program product runs on a terminal device, causes the terminal device to perform the steps according to various exemplary embodiments of the present invention described in the "Exemplary Method" section of the present specification.
  • a program product 400 for implementing the above method is illustrated in accordance with an embodiment of the present invention; it may employ a portable compact disk read only memory (CD-ROM), includes program code, and may run on a terminal device.
  • the program product of the present invention is not limited thereto, and in the present document, the readable storage medium may be any tangible medium containing or storing a program that can be used by or in connection with an instruction execution system, apparatus or device.
  • the program product can employ any combination of one or more readable media.
  • the readable medium can be a readable signal medium or a readable storage medium.
  • the readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: electrical connections with one or more wires, portable disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the foregoing.
  • the computer readable signal medium may include a data signal that is propagated in the baseband or as part of a carrier, carrying readable program code. Such propagated data signals can take a variety of forms including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the readable signal medium can also be any readable medium other than a readable storage medium that can transmit, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a readable medium can be transmitted using any suitable medium, including but not limited to wireless, wireline, optical cable, RF, etc., or any suitable combination of the foregoing.
  • Program code for performing the operations of the present invention may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, C++, etc., and conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code can execute entirely on the user computing device, partly on the user device, as a stand-alone software package, partly on the user computing device and partly on a remote computing device, or entirely on a remote computing device or server.
  • the remote computing device can be connected to the user computing device via any kind of network, including a local area network (LAN) or wide area network (WAN), or can be connected to an external computing device (e.g., via the Internet using an Internet service provider).
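
The pause-based paragraph mark search of step S120 (a quiet stretch longer than the preset duration, within the preset range around a matched keyword) can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the `Frame` type, the `find_pause_mark` function, the fixed 0.5s frame length, and the representation of the audio as per-frame average intensities are all hypothetical; only the 5s window, 2s duration, and 2dB threshold come from the example in the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Frame:
    time_s: float        # start time of the frame, in seconds
    intensity_db: float  # average intensity of the frame, in dB


def find_pause_mark(
    frames: List[Frame],
    keyword_time_s: float,
    window_s: float = 5.0,          # preset audio range: 5s before and after the keyword
    min_pause_s: float = 2.0,       # preset duration
    max_intensity_db: float = 2.0,  # preset intensity value
    frame_len_s: float = 0.5,       # assumed fixed frame length
) -> Optional[Tuple[float, float]]:
    """Return (start, end) of the first quiet run inside the window that lasts
    longer than min_pause_s, or None. Frames are assumed sorted and contiguous."""
    lo, hi = keyword_time_s - window_s, keyword_time_s + window_s
    run_start: Optional[float] = None
    run_end = 0.0
    for f in frames:
        in_window = lo <= f.time_s and f.time_s + frame_len_s <= hi
        quiet = f.intensity_db < max_intensity_db
        if in_window and quiet:
            if run_start is None:
                run_start = f.time_s  # a new quiet run begins
            run_end = f.time_s + frame_len_s
        else:
            # The quiet run (if any) just ended; accept it if it is long enough.
            if run_start is not None and run_end - run_start > min_pause_s:
                return (run_start, run_end)
            run_start = None
    if run_start is not None and run_end - run_start > min_pause_s:
        return (run_start, run_end)
    return None
```

A caller would pass the time code of a matched keyword such as "first section" as `keyword_time_s`; a returned interval plays the role of the paragraph mark, and `None` means no paragraph mark was found in the window.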

Landscapes

  • Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

An audio paragraph recognition method and apparatus, an electronic device, and a storage medium, relating to the technical field of computers. The method comprises: matching recorded audio in a pre-stored keyword information base (S110); if corresponding keyword information is found by matching in the keyword information base, finding whether a paragraph mark exists in a preset audio range of the audio corresponding to the keyword information (S120); and if a paragraph mark is found, analyzing the keyword information and the paragraph mark, and recognizing an audio paragraph according to the analysis result (S130). The audio paragraph of the recorded audio can be effectively recognized according to the keyword information.
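
The three steps in the abstract (S110 to S130) can be sketched end to end as below. This is a hedged illustration only: it assumes the audio has already been turned into time-stamped text tokens, which stands in for the acoustic feature matching the disclosure describes, and it uses paragraph field information (words such as "first" or "then") as the paragraph mark. The names `Token`, `Paragraph`, and `recognize_paragraphs`, and the rule of pairing a keyword with the nearest mark, are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set


@dataclass(frozen=True)
class Token:
    text: str      # recognized word or phrase
    time_s: float  # time code of the token, in seconds


@dataclass(frozen=True)
class Paragraph:
    keyword: str
    keyword_time_s: float
    mark: str
    mark_time_s: float


def recognize_paragraphs(
    tokens: Sequence[Token],
    keyword_base: Set[str],  # pre-stored keyword information base
    field_marks: Set[str],   # preset paragraph field information
    window_s: float = 5.0,   # preset audio range around the keyword
) -> List[Paragraph]:
    paragraphs: List[Paragraph] = []
    for tok in tokens:
        # S110: match the recorded audio against the keyword information base.
        if tok.text not in keyword_base:
            continue
        # S120: search for a paragraph mark within the preset range of the keyword.
        candidates = [
            m for m in tokens
            if m.text in field_marks and abs(m.time_s - tok.time_s) <= window_s
        ]
        if not candidates:
            continue  # no paragraph mark found, so no paragraph is recognized
        # S130 (assumed rule): pair the keyword with the nearest mark.
        mark = min(candidates, key=lambda m: abs(m.time_s - tok.time_s))
        paragraphs.append(Paragraph(tok.text, tok.time_s, mark.text, mark.time_s))
    return paragraphs
```

For a lesson transcript containing "first" shortly before "first section", the function yields one paragraph whose mark time can serve as the paragraph start; a keyword with no mark inside the window is skipped.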

Description

Audio paragraph recognition method and apparatus
Technical field
The present disclosure relates to the field of computer technology, and in particular to an audio paragraph recognition method and apparatus, an electronic device, and a computer-readable storage medium.
Background
At present, recording events with electronic devices brings great convenience to daily life. For example, a teacher's lecture can be recorded in class so that the teacher can teach from it again or students can review the lesson; likewise, at meetings, while watching live television, and on similar occasions, audio recorded with an electronic device can be replayed later or archived and consulted as electronic material.
However, the paragraphs of the audio content cannot be seen directly in an audio file. When the audio file is long, or when a particular paragraph of the audio needs to be extracted or processed, the specified position in the audio cannot be located quickly; instead, several manual adjustments are needed before the corresponding audio content can be played or identified.
Therefore, a technical solution that solves at least the above problems is needed.
It should be noted that the information disclosed in the Background section above is only intended to enhance understanding of the background of the present disclosure, and may therefore include information that does not constitute prior art known to those of ordinary skill in the art.
Summary of the invention
An object of the present disclosure is to provide an audio paragraph recognition method and apparatus, an electronic device, and a computer-readable storage medium that overcome, at least to some extent, one or more problems caused by the limitations and defects of the related art.
According to one aspect of the present disclosure, an audio paragraph recognition method is provided, including:
matching recorded audio against a pre-stored keyword information base;
after corresponding keyword information is matched in the keyword information base, searching for a paragraph mark within a preset audio range of the audio corresponding to the keyword information;
if a paragraph mark is found, analyzing the keyword information and the paragraph mark, and recognizing an audio paragraph according to the analysis result.
In an exemplary embodiment of the present disclosure, matching the recorded audio against the pre-stored keyword information base includes:
converting the recorded audio into a sound wave signal by short-time Fourier transform processing;
filtering the sound wave signal with an auditory filter bank to remove ambient noise from the sound wave signal and extract speech features;
matching the speech features in the keyword information base based on a maximum likelihood function.
In an exemplary embodiment of the present disclosure, after the corresponding keyword information is matched in the keyword information base, the method further includes:
determining whether the keyword information is a valid keyword, and if so, performing the step of searching for a paragraph mark within the preset audio range of the audio corresponding to the keyword information;
wherein determining whether the keyword information is a valid keyword includes:
if multiple identical pieces of keyword information are matched in the recorded audio, establishing a fuzzy matrix equation from each piece of keyword information and the time code value of the keyword information;
obtaining an optimal solution by solving the fuzzy matrix equation, and determining the keyword information corresponding to the optimal solution to be valid keyword information.
In an exemplary embodiment of the present disclosure, the method further includes:
performing data training according to the valid keyword information and the paragraph identifier, and updating the keyword information base according to the training result.
In an exemplary embodiment of the present disclosure, searching for a paragraph mark within the preset audio range of the audio corresponding to the keyword information includes:
searching within the preset audio range for a sound wave signal whose duration is greater than a preset duration and whose signal strength is less than a preset strength value, and if such a signal exists, determining that this sound wave signal is the paragraph mark found.
In an exemplary embodiment of the present disclosure, after the audio paragraph is recognized according to the analysis result, the method further includes:
if the keyword information identifying multiple audio paragraphs is the same, adding a correction flag to the multiple audio paragraphs identified by the same keyword information.
In an exemplary embodiment of the present disclosure, the method further includes:
after receiving a correction instruction triggered according to the correction flag, incrementing the weight value Q of the keyword information corresponding to the correction flag by 1 (Q+1);
performing data training according to each piece of keyword information and its corresponding weight value in combination with the paragraph mark, and updating the keyword information base according to the training result.
In an exemplary embodiment of the present disclosure, after the audio paragraph is recognized according to the analysis result, the method further includes:
after receiving a correction instruction triggered according to the correction flag, cancelling the recognized audio paragraph corresponding to the correction flag.
In an exemplary embodiment of the present disclosure, the paragraph mark is preset paragraph field information.
In an exemplary embodiment of the present disclosure, the method further includes:
when multiple audio paragraphs are recognized, generating a paragraph directory or paragraph index corresponding to each audio paragraph according to the keyword information corresponding to the audio paragraphs.
In an exemplary embodiment of the present disclosure, after the audio paragraph is recognized according to the analysis result, the method further includes:
completing audio clipping according to the audio paragraph.
In an exemplary embodiment of the present disclosure, the paragraph mark includes a paragraph-start mark and a paragraph-end mark, and completing the audio clipping according to the audio paragraph includes:
determining a paragraph start point according to the keyword information and the paragraph-start mark, and determining a paragraph end point according to the keyword information and the paragraph-end mark;
clipping according to the paragraph end point and the paragraph start point immediately preceding that end point.
In one aspect of the present disclosure, an audio paragraph recognition apparatus is provided, including:
a keyword information matching module, configured to match recorded audio against a pre-stored keyword information base;
a paragraph mark searching module, configured to search, after corresponding keyword information is matched in the keyword information base, for a paragraph mark within a preset audio range of the audio corresponding to the keyword information;
an audio paragraph recognition module, configured to analyze, after a paragraph mark is found, the keyword information and the paragraph mark, and to recognize an audio paragraph according to the analysis result.
In one aspect of the present disclosure, an electronic device is provided, including:
a processor; and
a memory storing computer-readable instructions which, when executed by the processor, implement the method according to any one of claims 1 to 7.
In one aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements the method according to any one of the above.
In the audio paragraph recognition method of the exemplary embodiments of the present disclosure, recorded audio is matched against a pre-stored keyword information base; after corresponding keyword information is matched, a paragraph mark is searched for within a preset audio range of the audio corresponding to the keyword information; and after a paragraph mark is found, the keyword information and the paragraph mark are analyzed and an audio paragraph is recognized according to the analysis result. On the one hand, because keyword information and paragraph marks are used together for recognition, the accuracy of audio paragraph recognition is improved; on the other hand, once the paragraph information of the audio has been recognized, a user of the audio can quickly locate and play the audio according to the keyword information, which greatly improves the usability of the audio and enhances the user experience.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Brief description of the drawings
The above and other features and advantages of the present disclosure will become more apparent from the detailed description of its example embodiments with reference to the accompanying drawings.
FIG. 1 shows a flowchart of an audio paragraph recognition method according to an exemplary embodiment of the present disclosure;
FIG. 2 shows a schematic block diagram of an audio paragraph recognition apparatus according to an exemplary embodiment of the present disclosure;
FIG. 3 schematically shows a block diagram of an electronic device according to an exemplary embodiment of the present disclosure; and
FIG. 4 schematically shows a computer-readable storage medium according to an exemplary embodiment of the present disclosure.
Detailed description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, example embodiments can be implemented in many forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar parts, and repeated descriptions of them will therefore be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided to give a thorough understanding of the embodiments of the present disclosure. However, those skilled in the art will appreciate that the technical solutions of the present disclosure may be practiced without one or more of the specific details, or with other methods, components, materials, devices, steps, and so on. In other cases, well-known structures, methods, devices, implementations, materials, or operations are not shown or described in detail, to avoid obscuring aspects of the present disclosure.
The block diagrams shown in the drawings are merely functional entities and do not necessarily correspond to physically independent entities. That is, these functional entities may be implemented in software, or these functional entities or parts of them may be implemented in one or more software-hardened modules, or in different networks and/or processor devices and/or microcontroller devices.
In this example embodiment, an audio paragraph recognition method is first provided, which can be applied to electronic devices such as computers. Referring to FIG. 1, the audio paragraph recognition method may include the following steps:
Step S110. Match recorded audio against a pre-stored keyword information base;
Step S120. After corresponding keyword information is matched in the keyword information base, search for a paragraph mark within a preset audio range of the audio corresponding to the keyword information;
Step S130. If a paragraph mark is found, analyze the keyword information and the paragraph mark, and recognize an audio paragraph according to the analysis result.
According to the audio paragraph recognition method in this example embodiment, on the one hand, because keyword information and paragraph marks are used together for recognition, the accuracy of audio paragraph recognition is improved; on the other hand, once the paragraph information of the audio has been recognized, a user of the audio can quickly locate and play the audio according to the keyword information, which greatly improves the usability of the audio and enhances the user experience.
The audio paragraph recognition method in this example embodiment is described further below.
In step S110, the recorded audio is matched against the pre-stored keyword information base.
In this example embodiment, the recorded audio may be an audio file recorded by a user with an electronic device, in any of various audio formats such as mp3 and wma. For example: teaching audio of a lesson recorded on a mobile phone during classroom teaching; audio of the full remarks of a meeting's main speaker, recorded with a voice recorder while the user attends the meeting; or audio of a live cooking program, recorded with a home smart speaker while the user watches live television at home.
The pre-stored keyword information base may be composed of keyword information selected from speech content, meeting content, and the like that is known in advance, or of commonly used time and sequence words or other customizable keyword information. Examples of keyword information include "morning", "the 1990s", "Chapter 1", "first", "again", and "another example"; keywords may also be user-defined, such as chapter titles in a seventh-grade history textbook: "The origin of Chinese civilization", "The emergence of the state and the transformation of society", and so on.
Because the above audio differs in recording environment, recording device, speaker, and so on, sound intensity and timbre also differ. Before the recorded audio is matched against the pre-stored keyword information base, the audio therefore needs to be converted into a uniform sound wave signal.
Further, because sound wave frequency is relatively stable, the high temporal resolution of the short-time Fourier transform can be exploited to convert the recorded audio into a sound wave signal by short-time Fourier transform processing.
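The short-time Fourier transform step can be illustrated with a minimal pure-Python sketch. The function name `stft`, the Hann window, and the frame length and hop size are assumptions chosen for illustration; the disclosure does not specify them.

```python
import cmath
import math

def stft(samples, frame_len=8, hop=4):
    """Return one magnitude spectrum per frame (bins 0..frame_len // 2).

    Each frame is multiplied by a periodic Hann window and transformed
    with a direct DFT. Window type and sizes are illustrative assumptions.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames

# A pure tone that falls exactly on bin 2 of an 8-sample frame.
tone = [math.sin(2 * math.pi * 2 * n / 8) for n in range(32)]
spec = stft(tone)
peak_bins = [frame.index(max(frame)) for frame in spec]
```

Each frame of the test tone peaks at frequency bin 2, which is the time-frequency representation that later filtering and feature extraction would operate on. A production system would more likely use an optimized FFT library than this direct DFT.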
Further, the sound wave signal may be filtered with an auditory filter bank. Different auditory filter banks are selected according to different audio attributes so as to model the sound wave signal as closely as possible, filter out ambient noise from the sound wave signal, and extract speech features. Auditory filter banks include, but are not limited to, resonance filters, Roex function filters, Gammatone filters, and Gammachirp filters.
The speech features are matched in the keyword information base to obtain the keyword information in the keyword information base that matches the speech features.
Further, the speech features may be matched in the keyword information base based on a maximum likelihood function. For specified keyword information x, the likelihood function of the speech feature parameter θ is:
L(θ|x)=P(X=x|θ)
L(θ|x) equals the probability of the keyword information X = x given the speech feature parameter θ.
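A minimal sketch of likelihood-based keyword matching follows. It assumes, purely for illustration, that each keyword in the information base is modelled by a diagonal Gaussian over a small feature vector and that a match below a log-likelihood floor is rejected; the names `keyword_models`, `match_keyword`, and `min_log_likelihood` are hypothetical and do not come from the disclosure.

```python
import math

def log_likelihood(features, mean, std):
    """Log-likelihood of a feature vector under a diagonal Gaussian."""
    total = 0.0
    for f, m, s in zip(features, mean, std):
        total += -0.5 * math.log(2 * math.pi * s * s) - (f - m) ** 2 / (2 * s * s)
    return total

def match_keyword(features, keyword_models, min_log_likelihood=-10.0):
    """Return the keyword whose model gives the highest likelihood,
    or None when even the best score is below the acceptance floor."""
    best_kw, best_ll = None, float("-inf")
    for kw, (mean, std) in keyword_models.items():
        ll = log_likelihood(features, mean, std)
        if ll > best_ll:
            best_kw, best_ll = kw, ll
    return best_kw if best_ll >= min_log_likelihood else None

# Hypothetical two-dimensional keyword models: (mean vector, std vector).
models = {
    "Section 1": ([1.0, 2.0], [0.5, 0.5]),
    "Chapter 1": ([4.0, 0.0], [0.5, 0.5]),
}
```

With these toy models, `match_keyword([1.1, 1.9], models)` selects "Section 1", while a feature vector far from every model returns `None`. Working in log-likelihoods avoids numerical underflow when feature vectors are long.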
In this example embodiment, after the corresponding keyword information is matched in the keyword information base, the method further includes determining whether the keyword information is a valid keyword. If only one piece of keyword information is matched in the recorded audio, that keyword information is determined to be valid keyword information. If multiple identical pieces of keyword information are matched in the recorded audio, a fuzzy matrix equation is established from each piece of keyword information and the time code value of the keyword information; an optimal solution is obtained by solving the fuzzy matrix equation, and the keyword information corresponding to the optimal solution is determined to be valid keyword information.
That is, a fuzzy matrix equation R is established from each piece of keyword information x ij (i=1,2,…,m, j=1,2,…,n) and the time code value of the keyword information y ij (i=1,2,…,m, j=1,2,…,n):
Fuzzy matrix equation
Figure PCTCN2018078525-appb-000001
λ=MAX[R(x,y)];
The optimal solution λ is obtained by solving the fuzzy matrix equation, and the keyword information corresponding to the optimal solution is determined to be valid keyword information. Once valid keyword information has been found, the step of searching for a paragraph mark within the preset audio range of the audio corresponding to the valid keyword information is performed.
In step S120, after the corresponding keyword information is matched in the keyword information base, a paragraph mark is searched for within the preset audio range of the audio corresponding to the keyword information.
In this example embodiment, searching for a paragraph mark within the preset audio range of the audio corresponding to the keyword information specifically includes: searching within the preset audio range for a sound wave signal whose duration is greater than a preset duration and whose signal strength is less than a preset strength value, and if such a signal exists, determining that this sound wave signal is the paragraph mark found. For example:
A user has recorded audio of a classroom lesson that contains the following: "Today we are going to study the chapter on human history. (pause) The content of Section 1 is…". When the user matches this audio against the pre-stored keyword information base, the keyword "Section 1" is matched. A search is then made within a preset time range around the audio of the keyword "Section 1" (for example, 5 s before and after the keyword) for a sound wave signal whose duration is greater than the preset duration and whose signal strength is less than the preset strength value. Suppose the preset duration is 2 s and the preset strength value is 2 dB. The search finds that, within the 5 s preset time range around the audio of the keyword "Section 1", there is a sound wave signal whose strength is 2 dB below the average sound intensity (the preset strength value) and whose duration exceeds the preset duration of 2 s, that is, longer than the normal gap between words in a sentence; in other words, there is an obvious pause. This pause is therefore determined to be the paragraph mark corresponding to the keyword "Section 1", and the audio information of "Section 1" is recorded starting from this paragraph mark.
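The pause search in this example can be sketched as follows. The sketch assumes the audio has already been reduced to one level value in dB per fixed-length frame, and it interprets the preset strength value as "average level minus 2 dB"; both the representation and that interpretation are assumptions, as are the names `find_pause` and `drop_db`.

```python
def find_pause(levels_db, frame_sec, keyword_sec,
               window_sec=5.0, min_pause_sec=2.0, drop_db=2.0):
    """Return (start_sec, end_sec) of the first quiet run near the keyword.

    A frame is "quiet" when its level is below the average level of the
    whole recording minus drop_db. A run qualifies as a paragraph mark
    when it lasts longer than min_pause_sec. Returns None if no run fits.
    """
    if not levels_db:
        return None
    threshold = sum(levels_db) / len(levels_db) - drop_db
    first = max(0, int((keyword_sec - window_sec) / frame_sec))
    last = min(len(levels_db), int((keyword_sec + window_sec) / frame_sec) + 1)
    run_start = None
    for i in range(first, last + 1):
        # Index `last` acts as a sentinel that closes a run at the window edge.
        quiet = i < last and levels_db[i] < threshold
        if quiet and run_start is None:
            run_start = i
        elif not quiet and run_start is not None:
            if (i - run_start) * frame_sec > min_pause_sec:
                return (run_start * frame_sec, i * frame_sec)
            run_start = None
    return None

# 0.5 s frames: speech at 30 dB, a 3 s pause from 4.0 s to 7.0 s, then speech.
levels = [30.0] * 8 + [0.0] * 6 + [30.0] * 6
pause = find_pause(levels, frame_sec=0.5, keyword_sec=7.5)
```

Here the keyword is assumed to be spoken at 7.5 s, and the 3 s pause just before it is returned as the paragraph mark. A pause of only 1 s would be rejected because it does not exceed the 2 s preset duration.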
In this example embodiment, searching for a paragraph mark within the preset audio range of the audio corresponding to the keyword information further includes: searching within the preset audio range for paragraph field information. For example:
A user has recorded audio of a classroom lesson that contains the following: "Today we are going to study the chapter on human history. First we study Section 1; the content of this section is…". When the user matches this audio against the pre-stored keyword information base, the keyword "Section 1" is matched. If the paragraph field information "first" is then found within the preset time range around the audio of the keyword "Section 1" (again assumed to be 5 s before and after the keyword), the paragraph field information "first" can be determined to be the paragraph mark corresponding to the keyword "Section 1", and the audio information of "Section 1" is recorded starting from this paragraph mark.
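The search for paragraph field information can be sketched in a few lines. The sketch assumes the recording has already been transcribed into `(word, time_in_seconds)` pairs and that the preset paragraph field information is a small word list; `PARAGRAPH_FIELDS` and `find_field_marks` are hypothetical names.

```python
# Hypothetical preset paragraph field information.
PARAGRAPH_FIELDS = {"first", "then", "next", "finally"}

def find_field_marks(words, keyword_sec, window_sec=5.0, fields=PARAGRAPH_FIELDS):
    """Return the (word, time) pairs that are paragraph field words and
    lie within window_sec before or after the keyword time."""
    return [(w, t) for w, t in words
            if w.lower() in fields and abs(t - keyword_sec) <= window_sec]

# Toy transcript; the keyword "Section 1" is assumed to be spoken at 7.0 s.
transcript = [("today", 0.0), ("First", 6.0), ("study", 6.5),
              ("Section 1", 7.0), ("then", 40.0)]
marks = find_field_marks(transcript, keyword_sec=7.0)
```

Only "First" at 6.0 s falls inside the 5 s window, so it becomes the candidate paragraph mark for "Section 1"; "then" at 40.0 s is outside the window and is ignored.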
It should be noted that the above examples are merely exemplary descriptions given to aid understanding of the present disclosure. Human language is complex, and modes of expression, language habits, and grammar all vary, so the various kinds of paragraph marks are not listed one by one here; recognizing paragraph information through other paragraph marks likewise falls within the protection scope of the present disclosure.
In this example embodiment, unsupervised data training is performed according to the valid keyword information and the paragraph identifier, and the keyword information base is updated according to the training result. Different data training methods can be selected for different audio content. For example, training on classroom recordings of classical poetry lessons together with a database of recitations and analyses of the Three Hundred Tang Poems allows more verses to be added to the keyword information base as keyword information; training on recordings of Korean classes together with a database of standard Korean programs allows more keywords specific to Korean grammar, such as the modal particle "[Figure PCTCN2018078525-appb-000002]" commonly used at the end of sentences, to be added to the keyword information base as keyword information corresponding to paragraph identifiers.
In step S130, if a paragraph mark is found, the keyword information and the paragraph mark are analyzed, and an audio paragraph is recognized according to the analysis result.
In this example embodiment, paragraph marks correspond to keyword information. For example:
A user has recorded audio of a classroom lesson that contains the following: "Today we are going to study the chapter on human history. First we study Section 1; first, let us look at the overview of this section…". When the user matches this audio against the pre-stored keyword information base, several instances of the paragraph field information "first" are found within the preset time range around the audio of the keyword "Section 1". Analysis in combination with the keyword "Section 1" shows, however, that only the paragraph field information "first" that precedes the keyword "Section 1" can serve as a paragraph mark, so the first "first" is used to establish the true position of the audio paragraph for the keyword "Section 1".
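The analysis in this example, keeping only a mark that precedes the keyword, can be sketched as below. Choosing the closest preceding mark when several precede the keyword is an added assumption, and `pick_start_mark` is a hypothetical name.

```python
def pick_start_mark(marks, keyword_sec):
    """From candidate (word, time) marks, keep those spoken at or before
    the keyword and return the one closest to it, or None if none precede."""
    before = [m for m in marks if m[1] <= keyword_sec]
    return max(before, key=lambda m: m[1]) if before else None

# Two occurrences of "first" around a keyword assumed to be spoken at 7.0 s.
candidates = [("first", 6.0), ("first", 8.0)]
start_mark = pick_start_mark(candidates, keyword_sec=7.0)
```

The occurrence at 8.0 s follows the keyword and is discarded, so the mark at 6.0 s fixes the start of the "Section 1" paragraph.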
In this example embodiment, after the audio paragraphs are recognized according to the analysis result, if the keyword information identifying multiple audio paragraphs is the same, a correction flag is added to the multiple audio paragraphs identified by the same keyword information. For example:
A user has recorded audio of a classroom lesson that contains the following: "Today we are going to study the chapter on human history. First we study Section 1. The content of Section 1 is…; that was the content of Section 1. Then we study Section 2, which continues the content of Section 1…". The keyword "Section 1" occurs several times in this audio, but not every occurrence can be treated as the start of a paragraph. A correction flag therefore needs to be added to the multiple identical occurrences of the keyword "Section 1" to prompt the user to make a correction. Alternatively, only the first occurrence of the keyword "Section 1", or the occurrence that can be matched with paragraph field information, is used as a valid keyword, and the other occurrences of the keyword "Section 1" are treated as unsuccessful matches.
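Adding a correction flag to paragraphs that share a keyword can be sketched as follows. Representing each recognized paragraph as a dictionary with `keyword` and `start` fields, and the flag as a `needs_correction` field, is an assumption made for illustration.

```python
from collections import defaultdict

def flag_duplicates(paragraphs):
    """Set 'needs_correction' to True on every paragraph whose keyword
    also identifies at least one other paragraph, and False otherwise."""
    by_keyword = defaultdict(list)
    for p in paragraphs:
        by_keyword[p["keyword"]].append(p)
    for group in by_keyword.values():
        for p in group:
            p["needs_correction"] = len(group) > 1
    return paragraphs

recognized = flag_duplicates([
    {"keyword": "Section 1", "start": 6.0},
    {"keyword": "Section 1", "start": 95.0},
    {"keyword": "Section 2", "start": 120.0},
])
```

Both "Section 1" paragraphs are flagged so that the user can be prompted to confirm which one is the real paragraph start, while the single "Section 2" paragraph is left unflagged.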
In this example embodiment, after a correction flag has been added to the multiple audio paragraphs identified by the same keyword information, when a correction instruction triggered according to the correction flag is received, the weight value Q of the keyword information corresponding to the correction flag is incremented by 1 (Q+1); data training is performed according to each piece of keyword information and its corresponding weight value in combination with the paragraph mark, and the keyword information base is updated according to the training result.
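The weight update can be sketched as a simple counter. Storing the weights in a dictionary keyed by keyword and starting unseen keywords at 0 are assumptions; the subsequent data training step is not specified in enough detail to sketch and is left out.

```python
def apply_correction(weights, keyword):
    """Increment the weight Q of a keyword by 1 when a correction
    instruction for that keyword is received (Q + 1)."""
    weights[keyword] = weights.get(keyword, 0) + 1
    return weights

keyword_weights = {"Section 1": 3}
apply_correction(keyword_weights, "Section 1")  # user corrects "Section 1"
apply_correction(keyword_weights, "Section 2")  # first correction for "Section 2"
```

After these two corrections the weights are 4 for "Section 1" and 1 for "Section 2"; these values would then feed whatever training procedure updates the keyword information base.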
Increasing the weight value when the correction flag is triggered implements manual error correction of keyword information. This is an active-learning update of the keyword information in the keyword information base; compared with unsupervised learning of keyword information, active learning allows the keyword information base to grow more accurately.
In this example embodiment, after the audio paragraph is recognized according to the analysis result, when a correction instruction triggered according to the correction flag is received, the recognition of the keyword information in the audio is repositioned, the recognized audio paragraph corresponding to the correction flag is cancelled, and the corrected keyword information is used as valid keyword information.
In this example embodiment, when multiple audio paragraphs are recognized, a paragraph directory or paragraph index corresponding to each audio paragraph is generated according to the keyword information corresponding to the audio paragraphs. The audio paragraphs can be classified and layered according to different keyword information to generate and store a paragraph directory or paragraph index, so that the corresponding audio paragraph can be found quickly and effectively. The paragraph directory or paragraph index information can also be marked at the corresponding positions on the playback progress bar of the audio file, so that the user can locate a specified audio paragraph accurately during playback.
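A paragraph directory of the kind described can be sketched as a list of time-stamped entries. The `mm:ss` entry format and the names `build_index` and `format_time` are assumptions for illustration.

```python
def format_time(seconds):
    """Format a position in seconds as mm:ss (fractions are truncated)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"

def build_index(paragraphs):
    """Return one 'mm:ss  keyword' entry per paragraph, ordered by start time."""
    ordered = sorted(paragraphs, key=lambda p: p["start"])
    return [f'{format_time(p["start"])}  {p["keyword"]}' for p in ordered]

index = build_index([
    {"keyword": "Section 2", "start": 754.0},
    {"keyword": "Section 1", "start": 65.5},
])
```

The resulting entries, `"01:05  Section 1"` and `"12:34  Section 2"`, could be shown as a table of contents or attached to the matching positions on a playback progress bar.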
This example embodiment further includes, after the audio paragraph has been recognized according to the analysis result, completing an audio clip according to the audio paragraph. The clipped audio can be stored in correspondence with the keyword information, which allows the whole audio file to be indexed quickly: the user can choose to play only the audio for "Chapter 1", the audio for "The Origin of Chinese Civilization", and so on. This makes efficient use of the audio segments and simplifies archiving and lookup.
In this example embodiment, the paragraph mark includes a pre-paragraph mark and an end-of-paragraph mark, and completing the audio clip according to the audio paragraph includes: determining a paragraph start point according to the keyword information and the pre-paragraph mark, and determining a paragraph end point according to the keyword information and the end-of-paragraph mark; and clipping according to the paragraph end point and the paragraph start point immediately preceding that end point. For example:
A user records the audio of a classroom lesson, and the audio contains the following: "Today we will study the chapter on human history. First we study Section 1; the content of this section is ...; the above is the content of Section 1. Then we study Section 2 ...". In this audio, the paragraph field information "first" is the pre-paragraph mark of the keyword "Section 1", while the paragraph field information "then" is both the pre-paragraph mark of the keyword "Section 2" and the end-of-paragraph mark of the keyword "Section 1". Combining these pre-paragraph and end-of-paragraph marks determines the audio paragraph of the keyword "Section 1", and the audio clip is completed.
In the same example, the paragraph field information "the above is" can also serve as an end-of-paragraph mark of the keyword "Section 1"; it can likewise be used to determine the audio paragraph of the keyword "Section 1" and complete the audio clip.
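The classroom example can be sketched on a timestamped transcript. This is an illustration under stated assumptions: the token format `(time_s, text)`, the mark sets, the `window` parameter, and the function `find_clip` are all hypothetical, and a real system would operate on recognized audio rather than ready-made tokens.

```python
# Assumed mark vocabularies, taken from the classroom example in the text.
PRE_MARKS = {"first", "then"}
END_MARKS = {"then", "the above is"}

def find_clip(tokens, keyword, window=3):
    """Return (start_s, end_s) for the paragraph of `keyword`, or None.

    tokens: list of (time_s, text) pairs in time order.
    window: how many tokens before the keyword to search for a pre-paragraph mark.
    """
    texts = [text for _, text in tokens]
    if keyword not in texts:
        return None
    k = texts.index(keyword)
    # Search backwards from the keyword for a pre-paragraph mark.
    start = None
    for i in range(k, max(k - window, 0) - 1, -1):
        if texts[i] in PRE_MARKS:
            start = tokens[i][0]
            break
    if start is None:
        return None
    # The first end-of-paragraph mark after the keyword closes the paragraph.
    for time_s, text in tokens[k + 1:]:
        if text in END_MARKS:
            return (start, time_s)
    return None
```

With the example lesson, "first" opens the "Section 1" paragraph and "the above is" (or, failing that, "then") closes it; the returned pair gives the cut points for the clip.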
It should be noted that although the steps of the method of the present disclosure are described in a particular order in the drawings, this does not require or imply that the steps must be performed in that order, or that all of the steps shown must be performed to achieve the desired result. Additionally or alternatively, some steps may be omitted, multiple steps may be combined into one step, and/or one step may be split into multiple steps.
In addition, this example embodiment also provides an audio paragraph recognition apparatus. Referring to FIG. 2, the audio paragraph recognition apparatus 200 may include a keyword information matching module 210, a paragraph mark search module 220, and an audio paragraph recognition module 230, where:
the keyword information matching module 210 is configured to match recorded audio against a pre-stored keyword information base;
the paragraph mark search module 220 is configured to, after corresponding keyword information has been matched in the keyword information base, search for a paragraph mark within a preset audio range of the audio corresponding to the keyword information;
the audio paragraph recognition module 230 is configured to, after a paragraph mark has been found, analyze the keyword information and the paragraph mark and recognize an audio paragraph according to the analysis result.
The specific details of each module of the audio paragraph recognition apparatus have already been described in detail in the corresponding audio paragraph recognition method and are therefore not repeated here.
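The division into three modules can be illustrated with a small pipeline. This sketch assumes the audio has already been turned into a list of text tokens, and the function names, the `preset_range` parameter, and the rule that a paragraph starts at the earlier of keyword and mark are all assumptions made for illustration.

```python
def match_keywords(tokens, keyword_base):
    """Stand-in for module 210: indices of tokens found in the keyword base."""
    return [i for i, tok in enumerate(tokens) if tok in keyword_base]

def find_paragraph_mark(tokens, index, marks, preset_range=2):
    """Stand-in for module 220: index of a mark within preset_range tokens, or None."""
    lo = max(index - preset_range, 0)
    hi = min(index + preset_range, len(tokens) - 1)
    for i in range(lo, hi + 1):
        if i != index and tokens[i] in marks:
            return i
    return None

def recognize_paragraphs(tokens, keyword_base, marks):
    """Stand-in for module 230: keep keywords with a nearby mark as paragraph starts."""
    starts = []
    for k in match_keywords(tokens, keyword_base):
        m = find_paragraph_mark(tokens, k, marks)
        if m is not None:
            # Assumed rule: the paragraph begins at the earlier of keyword and mark.
            starts.append((tokens[k], min(k, m)))
    return starts
```

A keyword that appears without a paragraph mark in range, such as a passing mention of "Section 1" later in the lesson, is dropped rather than treated as a paragraph boundary.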
It should be noted that although several modules or units of the audio paragraph recognition apparatus 200 are mentioned in the detailed description above, this division is not mandatory. In fact, according to embodiments of the present disclosure, the features and functions of two or more of the modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided among multiple modules or units.
In addition, an exemplary embodiment of the present disclosure also provides an electronic device capable of implementing the above method.
Those skilled in the art will understand that aspects of the present invention may be implemented as a system, a method, or a program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, and the like), or an embodiment combining hardware and software aspects, which may be referred to collectively herein as a "circuit", "module", or "system".
An electronic device 300 according to such an embodiment of the present invention is described below with reference to FIG. 3. The electronic device 300 shown in FIG. 3 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in FIG. 3, the electronic device 300 takes the form of a general-purpose computing device. Its components may include, but are not limited to: the at least one processing unit 310, the at least one storage unit 320, a bus 330 connecting the different system components (including the storage unit 320 and the processing unit 310), and a display unit 340.
The storage unit stores program code that can be executed by the processing unit 310, so that the processing unit 310 performs the steps of the various exemplary embodiments of the present invention described in the "Exemplary Methods" section of this specification. For example, the processing unit 310 may perform steps S110 to S130 shown in FIG. 1.
The storage unit 320 may include readable media in the form of volatile storage units, such as a random access memory (RAM) 3201 and/or a cache memory 3202, and may further include a read-only memory (ROM) 3203.
The storage unit 320 may also include a program/utility 3204 having a set of (at least one) program modules 3205. Such program modules 3205 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 330 may represent one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
The electronic device 300 may also communicate with one or more external devices 370 (for example a keyboard, a pointing device, or a Bluetooth device), with one or more devices that enable a user to interact with the electronic device 300, and/or with any device (for example a router or a modem) that enables the electronic device 300 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 350. The electronic device 300 may also communicate with one or more networks (for example a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 360. As shown, the network adapter 360 communicates with the other modules of the electronic device 300 through the bus 330. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 300, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
From the description of the above embodiments, those skilled in the art will readily understand that the example embodiments described here may be implemented in software, or in software combined with the necessary hardware. The technical solution according to embodiments of the present disclosure may therefore be embodied as a software product, which may be stored on a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes a number of instructions that cause a computing device (such as a personal computer, a server, a terminal apparatus, or a network device) to perform the method according to embodiments of the present disclosure.
An exemplary embodiment of the present disclosure also provides a computer-readable storage medium storing a program product capable of implementing the method described above in this specification. In some possible embodiments, aspects of the present invention may also be implemented as a program product that includes program code; when the program product runs on a terminal device, the program code causes the terminal device to perform the steps of the various exemplary embodiments of the present invention described in the "Exemplary Methods" section of this specification.
Referring to FIG. 4, a program product 400 for implementing the above method according to an embodiment of the present invention is described. It may take the form of a portable compact disc read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer. The program product of the present invention is not limited to this, however. In this document, a readable storage medium may be any tangible medium that contains or stores a program for use by, or in connection with, an instruction execution system, apparatus, or device.
The program product may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of these. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave and carrying readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the two. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device.
Program code contained on a readable medium may be transmitted over any suitable medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination of the above.
Program code for carrying out the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example through the Internet using an Internet service provider).
In addition, the above drawings are merely schematic illustrations of the processing included in the method according to exemplary embodiments of the present invention and are not intended to be limiting. It is easy to understand that the processing shown in the drawings does not indicate or restrict the chronological order of these processes. It is also easy to understand that these processes may be performed synchronously or asynchronously, for example in multiple modules.
Other embodiments of the present disclosure will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed here. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common general knowledge or customary technical means in the art that are not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure indicated by the claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.
Industrial Applicability
On the one hand, because keyword information and paragraph marks are combined for recognition, the accuracy of audio paragraph recognition is improved. On the other hand, once the paragraph information of the audio has been recognized, a user can quickly locate and play back parts of the audio by keyword information, which improves the usefulness of the audio and the user experience.

Claims (15)

  1. An audio paragraph recognition method, characterized in that the method comprises:
    matching recorded audio against a pre-stored keyword information base;
    after corresponding keyword information has been matched in the keyword information base, searching for a paragraph mark within a preset audio range of the audio corresponding to the keyword information;
    if a paragraph mark is found, analyzing the keyword information and the paragraph mark, and recognizing an audio paragraph according to the analysis result.
  2. The method according to claim 1, characterized in that matching the recorded audio against the pre-stored keyword information base comprises:
    converting the recorded audio into a sound wave signal by short-time Fourier transform processing;
    applying auditory filter bank filtering to the sound wave signal to filter out ambient noise and extract speech features;
    matching the speech features against the keyword information base on the basis of a maximum likelihood function.
  3. The method according to claim 1, characterized in that, after corresponding keyword information has been matched in the keyword information base, the method further comprises:
    determining whether the keyword information is a valid keyword and, if so, performing the step of searching for a paragraph mark within the preset audio range of the audio corresponding to the keyword information;
    wherein determining whether the keyword information is a valid keyword comprises:
    if multiple items of identical keyword information are matched in the recorded audio, establishing a fuzzy matrix equation from each item of keyword information and the time code value of that keyword information;
    obtaining an optimal solution by solving the fuzzy matrix equation, and determining the keyword information corresponding to the optimal solution to be the valid keyword information.
  4. The method according to claim 3, characterized in that the method further comprises:
    performing data training according to the valid keyword information and the paragraph identifier, and updating the keyword information base according to the training result.
  5. The method according to claim 2, characterized in that searching for a paragraph mark within the preset audio range of the audio corresponding to the keyword information comprises:
    searching within the preset audio range for a sound wave signal whose duration is greater than a preset duration and whose signal strength is less than a preset strength value and, if such a signal exists, determining that the paragraph mark found is that sound wave signal.
  6. The method according to claim 1, characterized in that, after the audio paragraph has been recognized according to the analysis result, the method further comprises:
    if the keyword information by which multiple audio paragraphs are recognized is the same, adding a correction identifier to the multiple audio paragraphs recognized by the same keyword information.
  7. The method according to claim 6, characterized in that the method further comprises:
    upon receiving a correction instruction triggered via the correction identifier, incrementing by 1 the weight value Q of the keyword information corresponding to the correction identifier (Q+1);
    performing data training on each item of keyword information and its corresponding weight value in combination with the paragraph mark, and updating the keyword information base according to the training result.
  8. The method according to claim 6, characterized in that, after the audio paragraph has been recognized according to the analysis result, the method further comprises:
    upon receiving a correction instruction triggered via the correction identifier, cancelling the recognized audio paragraph corresponding to the correction identifier.
  9. The method according to claim 1, characterized in that the paragraph mark is preset paragraph field information.
  10. The method according to claim 1, characterized in that the method further comprises:
    when multiple audio paragraphs are recognized, generating a paragraph directory or paragraph index corresponding to each audio paragraph from the keyword information corresponding to that audio paragraph.
  11. The method according to claim 1, characterized in that, after the audio paragraph has been recognized according to the analysis result, the method further comprises:
    completing an audio clip according to the audio paragraph.
  12. The method according to claim 11, characterized in that the paragraph mark comprises a pre-paragraph mark and an end-of-paragraph mark, and completing the audio clip according to the audio paragraph comprises:
    determining a paragraph start point according to the keyword information and the pre-paragraph mark, and determining a paragraph end point according to the keyword information and the end-of-paragraph mark;
    clipping according to the paragraph end point and the paragraph start point immediately preceding that end point.
  13. An audio paragraph recognition apparatus, characterized in that the apparatus comprises:
    a keyword information matching module, configured to match recorded audio against a pre-stored keyword information base;
    a paragraph mark search module, configured to, when corresponding keyword information is matched in the keyword information base, search for a paragraph mark within a preset audio range of the audio corresponding to the keyword information;
    an audio paragraph recognition module, configured to, after a paragraph mark has been found, analyze the keyword information and the paragraph mark and recognize an audio paragraph according to the analysis result.
  14. An electronic device, characterized by comprising:
    a processor; and
    a memory storing computer-readable instructions which, when executed by the processor, implement the method according to any one of claims 1 to 12.
  15. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 12.
PCT/CN2018/078525 2018-02-06 2018-03-09 Audio paragraph recognition method and apparatus WO2019153406A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810115684.8A CN108363765B (en) 2018-02-06 2018-02-06 Audio paragraph identification method and device
CN201810115684.8 2018-02-06

Publications (1)

Publication Number Publication Date
WO2019153406A1 true WO2019153406A1 (en) 2019-08-15

Family

ID=63004397

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/078525 WO2019153406A1 (en) 2018-02-06 2018-03-09 Audio paragraph recognition method and apparatus

Country Status (2)

Country Link
CN (1) CN108363765B (en)
WO (1) WO2019153406A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110335612A (en) * 2019-07-11 2019-10-15 招商局金融科技有限公司 Minutes generation method, device and storage medium based on speech recognition
CN113204668A (en) * 2021-05-21 2021-08-03 广州博冠信息科技有限公司 Audio clipping method and device, storage medium and electronic equipment
CN113507632B (en) * 2021-08-12 2023-02-28 北京字跳网络技术有限公司 Video processing method, device, terminal and storage medium
CN113691966B (en) * 2021-08-23 2023-09-05 上海联净电子科技有限公司 Audio playing method, system, equipment and storage medium based on simultaneous transmission of information and energy

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030084037A1 (en) * 2001-10-31 2003-05-01 Kabushiki Kaisha Toshiba Search server and contents providing system
CN102724598A (en) * 2011-12-05 2012-10-10 新奥特(北京)视频技术有限公司 Method for splitting news items
CN104778218A (en) * 2015-03-20 2015-07-15 广东欧珀移动通信有限公司 Method and device for processing incomplete song
CN106802885A (en) * 2016-12-06 2017-06-06 乐视控股(北京)有限公司 A kind of meeting summary automatic record method, device and electronic equipment
CN107480152A (en) * 2016-06-08 2017-12-15 北京新岸线网络技术有限公司 A kind of audio analysis and search method and system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8577684B2 (en) * 2005-07-13 2013-11-05 Intellisist, Inc. Selective security masking within recorded speech utilizing speech recognition techniques
CN107305541B (en) * 2016-04-20 2021-05-04 科大讯飞股份有限公司 Method and device for segmenting speech recognition text
CN107369085A (en) * 2017-06-28 2017-11-21 深圳市佰仟金融服务有限公司 A kind of information output method, device and terminal device
CN107481743A (en) * 2017-08-07 2017-12-15 捷开通讯(深圳)有限公司 The edit methods of mobile terminal, memory and recording file

Also Published As

Publication number Publication date
CN108363765B (en) 2020-12-08
CN108363765A (en) 2018-08-03

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18905326

Country of ref document: EP

Kind code of ref document: A1