WO2019153406A1

WO2019153406A1 - Audio paragraph recognition method and apparatus

Info

Publication number: WO2019153406A1
Application number: PCT/CN2018/078525
Authority: WO
Inventors: 陈滢朱; 刘善果; 刘胜强
Original assignee: 深圳市鹰硕技术有限公司
Priority date: 2018-02-06
Filing date: 2018-03-09
Publication date: 2019-08-15
Also published as: CN108363765B; CN108363765A

Abstract

An audio paragraph recognition method and apparatus, an electronic device, and a storage medium, relating to the technical field of computers. The method comprises: matching recorded audio in a pre-stored keyword information base (S110); if corresponding keyword information is found by matching in the keyword information base, finding whether a paragraph mark exists in a preset audio range of the audio corresponding to the keyword information (S120); and if a paragraph mark is found, analyzing the keyword information and the paragraph mark, and recognizing an audio paragraph according to the analysis result (S130). The audio paragraph of the recorded audio can be effectively recognized according to the keyword information.

Description

Audio paragraph recognition method and device

Technical field

The present disclosure relates to the field of computer technology, and in particular to an audio passage recognition method, apparatus, electronic device, and computer readable storage medium.

Background

At present, recording events with electronic devices has brought great convenience to daily life. For example, a teacher's lecture can be recorded in the classroom so that the teacher can teach the material again or students can review it; likewise, audio can be recorded with an electronic device during a meeting or while watching live television, so that it can be replayed, archived, or consulted later as electronic material.

However, because the paragraph structure of audio content cannot be seen at a glance, when an audio file is long, or when a particular paragraph of the audio needs to be retrieved and processed, the specified position in the audio cannot be located quickly. Instead, the user has to seek through the audio manually before the corresponding content can be played or recognized.

Therefore, it is desirable to provide a technical solution that at least solves the above problems.

It should be noted that the information disclosed in the Background section above is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.

Summary of the invention

It is an object of the present disclosure to provide an audio passage recognition method, apparatus, electronic device, and computer readable storage medium that overcomes at least some of the problems due to limitations and disadvantages of the related art.

According to an aspect of the present disclosure, an audio passage recognition method is provided, including:

Matching the recorded audio in a pre-stored keyword information base;

After matching the corresponding keyword information in the keyword information base, searching for a paragraph mark in a preset audio range of the audio corresponding to the keyword information;

If the paragraph mark is found, the keyword information and the paragraph mark are analyzed, and the audio passage is recognized according to the analysis result.

In an exemplary embodiment of the present disclosure, matching the recorded audio in the pre-stored keyword information base includes:

Converting the recorded audio into a sound wave signal by performing short-time Fourier transform processing on it;

Filtering the sound wave signal with an auditory filter bank to remove the ambient noise of the sound wave signal and extract speech features;

The speech features are matched in the keyword information base based on a maximum likelihood function.

In an exemplary embodiment of the present disclosure, after matching the corresponding keyword information in the keyword information base, the method further includes:

Determining whether the keyword information is a valid keyword, and if so, performing the step of searching for a paragraph mark in a preset audio range of the audio corresponding to the keyword information;

The determining whether the keyword information is a valid keyword includes:

If a plurality of identical keyword information are matched in the recorded audio, a fuzzy matrix equation is established for each keyword information and a time code value of the keyword information;

The optimal solution is obtained by calculating the fuzzy matrix equation, and the keyword information corresponding to the optimal solution is determined to be valid keyword information.

In an exemplary embodiment of the present disclosure, the method further includes:

Data training is performed according to the valid keyword information and the paragraph identifier, and the keyword information base is updated according to the training result.

In an exemplary embodiment of the present disclosure, searching for a paragraph mark within a preset audio range of audio corresponding to the keyword information includes:

Querying, within the preset audio range, whether there is a sound wave signal whose duration is greater than a preset duration and whose signal strength is less than a preset intensity value, and if so, determining that this sound wave signal is the paragraph mark.

In an exemplary embodiment of the present disclosure, after the audio passage is identified according to the analysis result, the method further includes:

If the keyword information identifying the plurality of audio segments is the same, the correction flag is added to the plurality of audio segments identified by the same keyword information.

After receiving the correction instruction triggered according to the correction identifier, the weight value Q of the keyword information corresponding to the correction identifier is increased to Q+1;

Data training is performed according to each keyword information and a corresponding weight value in combination with the paragraph mark, and the keyword information base is updated according to the training result.

After receiving the correction instruction triggered according to the correction identifier, the identified audio passage corresponding to the correction identifier is cancelled.

In an exemplary embodiment of the present disclosure, the paragraph mark is preset paragraph field information.

When the identified audio passages are plural, a paragraph directory or a paragraph index corresponding to each audio passage is generated based on the keyword information corresponding to the audio passage.

The audio clip is completed according to the audio passage.

In an exemplary embodiment of the present disclosure, the paragraph mark includes a pre-segment mark and an end-of-segment mark, and completing the audio clip according to the audio passage includes:

Determining a start point of the paragraph according to the keyword information and the pre-segment mark, and determining an end point of the paragraph according to the keyword information and the end-of-segment mark;

Clipping the audio between the end point of the paragraph and the paragraph start point that immediately precedes that end point.

In an aspect of the disclosure, an audio passage recognition apparatus is provided, comprising:

a keyword information matching module, configured to match the recorded audio in a pre-stored keyword information base;

a paragraph mark searching module, configured to search for a paragraph mark in a preset audio range of the audio corresponding to the keyword information after matching the corresponding keyword information in the keyword information base;

The audio passage recognition module is configured to analyze the keyword information and the paragraph mark after finding the paragraph mark, and identify the audio paragraph according to the analysis result.

In an aspect of the disclosure, an electronic device is provided, comprising:

A processor; and

A memory having computer readable instructions stored thereon, the computer readable instructions being executed by the processor to implement the method of any one of claims 1 to 7.

In an aspect of the present disclosure, a computer readable storage medium is provided, having stored thereon a computer program which, when executed by a processor, implements any of the methods described above.

The audio passage recognition method in the exemplary embodiment of the present disclosure matches the recorded audio in a pre-stored keyword information base; after corresponding keyword information is matched, it searches for a paragraph mark within a preset audio range of the audio corresponding to the keyword information; and after a paragraph mark is found, it analyzes the keyword information and the paragraph mark and identifies the audio passage according to the analysis result. On the one hand, recognition that combines keyword information with paragraph marks improves the accuracy of audio passage recognition; on the other hand, because the paragraph information of the audio is identified, a user of the audio can quickly locate and play it according to the keyword information, which greatly improves the usability of the audio and enhances the user experience.

The above general description and the following detailed description are intended to be illustrative and not restrictive.

Brief description of the drawings

The above and other features and advantages of the present disclosure will become more apparent from the detailed description.

FIG. 1 illustrates a flow chart of an audio passage recognition method according to an exemplary embodiment of the present disclosure;

FIG. 2 shows a schematic block diagram of an audio passage recognition device according to an exemplary embodiment of the present disclosure;

FIG. 3 schematically illustrates a block diagram of an electronic device in accordance with an exemplary embodiment of the present disclosure;

FIG. 4 schematically illustrates a schematic diagram of a computer readable storage medium in accordance with an exemplary embodiment of the present disclosure.

Detailed description

Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments can be embodied in a variety of forms and should not be construed as being limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure is thorough and complete and fully conveys the concept of the exemplary embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar parts, and repeated description of them is omitted.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are set forth to provide a thorough understanding of embodiments of the present disclosure. However, one skilled in the art will appreciate that the technical solution of the present disclosure may be practiced without one or more of the specific details, or that other methods, components, materials, devices, steps, etc. may be employed. In other instances, well-known structures, methods, devices, implementations, materials, or operations are not shown or described in detail to avoid obscuring aspects of the present disclosure.

The block diagrams shown in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, in one or more software-hardened modules, or in different network and/or processor devices and/or microcontroller devices.

In the present exemplary embodiment, an audio paragraph recognition method is first provided, which can be applied to an electronic device such as a computer; as shown in FIG. 1, the audio passage recognition method may include the following steps:

Step S110. Matching the recorded audio in the pre-stored keyword information base;

Step S120. After matching the corresponding keyword information in the keyword information base, search for a paragraph mark in a preset audio range of the audio corresponding to the keyword information.

Step S130. If the paragraph mark is found, the keyword information and the paragraph mark are analyzed, and the audio passage is recognized according to the analysis result.

According to the audio passage recognition method in the present exemplary embodiment, on the one hand, recognition that combines keyword information with paragraph marks improves the accuracy of audio passage recognition; on the other hand, because the paragraph information of the audio is recognized, the user of the audio can quickly locate and play it according to the keyword information, which greatly improves the usability of the audio and enhances the user experience.

Hereinafter, the audio passage recognition method in the present exemplary embodiment will be further described.

In step S110, the recorded audio is matched in the pre-stored keyword information base;

In this example embodiment, the recorded audio may be an audio file recorded by the user with an electronic device, in a format such as mp3 or wma. Examples include the audio of a lesson recorded on a mobile phone while the user teaches in a classroom; the audio of all the speeches given by the presenters at a conference, recorded with a voice recorder while the user attends; and the audio of a live food program, recorded by a home smart speaker while the user watches live television.

The pre-stored keyword information base may be composed of keyword information selected from previously learned lecture content, meeting content, and the like, or may be composed of commonly used time expressions, ordinal words, or other customizable keyword information. Examples of keyword information include "morning", "90s of the last century", "first chapter", "first", "again", and so on. Keywords may also be user-defined, such as chapter titles from a seventh-grade history textbook: "The origin of Chinese civilization", "The emergence of the state and the transformation of society".

Because recordings differ in recording environment, recording device, and speaker, their sound intensity and timbre also differ. Before the recorded audio is matched in the pre-stored keyword information base, it therefore needs to be converted into a unified sound wave signal.

Further, since the sound wave frequency is relatively stable, the recorded audio can be processed into a sound wave signal by a short-time Fourier transform, taking advantage of the high temporal resolution of that transform.
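The short-time Fourier transform step can be illustrated with a minimal sketch. It assumes NumPy, a Hann window, a 400-sample frame, and a 160-sample hop; none of these values come from the present disclosure, and the function name `stft` is hypothetical.

```python
import numpy as np

def stft(signal, frame_len=400, hop=160):
    """Short-time Fourier transform: one windowed FFT per overlapping frame.

    frame_len and hop are illustrative assumptions, not values from the disclosure.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Rows are time frames, columns are frequency bins.
    return np.fft.rfft(frames, axis=1)

# Usage: a 1 kHz test tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
spectrum = stft(tone)
peak_bin = int(np.abs(spectrum[0]).argmax())
peak_hz = peak_bin * sample_rate / 400
```

Each row of `spectrum` describes the frequency content of one short frame, which is what gives the transform its time resolution.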

Further, the sound wave signal can be filtered by an auditory filter bank. Different auditory filter banks are selected according to different audio attributes so as to model the sound wave signal as closely as possible, filter out its ambient noise, and extract speech features. The auditory filter bank includes, but is not limited to, a resonance filter, a Roex function filter, a Gammatone filter, and a Gammachirp filter.
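As one possible illustration of an auditory filter bank, the sketch below builds a small Gammatone bank with NumPy. The fourth-order impulse response, the equivalent rectangular bandwidth formula (the commonly used Glasberg and Moore approximation), and the centre frequencies are assumptions chosen for the example; the disclosure does not specify filter parameters.

```python
import numpy as np

def erb(fc_hz):
    """Equivalent rectangular bandwidth in Hz (assumed Glasberg-Moore approximation)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def gammatone_ir(fc_hz, sample_rate, duration=0.025, order=4):
    """Impulse response of one gammatone channel centred at fc_hz, peak-normalised."""
    t = np.arange(int(duration * sample_rate)) / sample_rate
    b = 1.019 * erb(fc_hz)
    ir = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc_hz * t)
    return ir / np.abs(ir).max()

def filter_bank(signal, sample_rate, centre_freqs):
    """Filter the signal through every channel; returns shape (channels, samples)."""
    return np.stack([np.convolve(signal, gammatone_ir(fc, sample_rate), mode="same")
                     for fc in centre_freqs])

# Usage: a 1 kHz tone should excite the 1 kHz channel far more than its neighbours.
sr = 16000
t = np.arange(sr // 4) / sr
tone = np.sin(2 * np.pi * 1000 * t)
channels = filter_bank(tone, sr, [250.0, 1000.0, 4000.0])
energy = (channels ** 2).sum(axis=1)
```

Per-channel energies such as `energy` are one simple kind of speech feature; energy outside the speech band can be discarded as ambient noise.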

The speech features are then matched in the keyword information base to obtain the keyword information in the base that matches them.

Further, the speech features may be matched in the keyword information base based on a maximum likelihood function. For specified keyword information x, the likelihood function of the speech feature parameter θ is:

L(θ|x)=P(X=x|θ)

That is, L(θ|x) equals the probability that the keyword information X takes the value x given the speech feature parameter θ.
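A maximum likelihood match can be sketched as follows. The sketch assumes that each keyword in the information base is described by a diagonal Gaussian model (a mean and a variance per feature dimension) and that a match is rejected below a log-likelihood threshold; both the model and the names `keyword_models` and `threshold` are hypothetical and are not taken from the disclosure.

```python
import math

def log_likelihood(feature, mean, var):
    """Log density of a diagonal Gaussian, summed over feature dimensions."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (f - m) ** 2 / v)
               for f, m, v in zip(feature, mean, var))

def match_keyword(feature, keyword_models, threshold=-10.0):
    """Return the keyword whose model gives the highest likelihood, or None."""
    best = max(keyword_models,
               key=lambda k: log_likelihood(feature, *keyword_models[k]))
    score = log_likelihood(feature, *keyword_models[best])
    return best if score >= threshold else None

# Hypothetical two-dimensional models: keyword -> (mean, variance).
keyword_models = {
    "first section": ([1.0, 2.0], [0.5, 0.5]),
    "second section": ([4.0, 0.0], [0.5, 0.5]),
}
matched = match_keyword([1.1, 1.9], keyword_models)
unmatched = match_keyword([10.0, 10.0], keyword_models)
```

Working with log-likelihoods rather than raw probabilities avoids numerical underflow when many feature dimensions are multiplied together.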

In this example embodiment, after the corresponding keyword information is matched in the keyword information base, the method further includes determining whether the keyword information is a valid keyword. If only one piece of keyword information is matched in the recorded audio, that keyword information is determined to be valid. If multiple identical pieces of keyword information are matched in the recorded audio, a fuzzy matrix equation is established from each piece of keyword information and its time code value; the optimal solution is obtained by solving the fuzzy matrix equation, and the keyword information corresponding to the optimal solution is determined to be the valid keyword information.

That is, the fuzzy matrix equation R is established from each keyword information x_ij (i = 1, 2, ..., m; j = 1, 2, ..., n) and the time code value y_ij (i = 1, 2, ..., m; j = 1, 2, ..., n) of that keyword information:

[Fuzzy matrix equation R; equation image not reproduced in this text]

λ = MAX[R(x, y)];

The optimal solution λ is obtained by solving the fuzzy matrix equation, and the keyword information corresponding to the optimal solution is determined to be the valid keyword information. After the valid keyword information is found, the step of searching for a paragraph mark in the preset audio range of the audio corresponding to the valid keyword information is performed.

In step S120, after matching the corresponding keyword information in the keyword information base, searching for a paragraph mark in a preset audio range of the audio corresponding to the keyword information;

In this example, searching for a paragraph mark in the preset audio range of the audio corresponding to the keyword information specifically includes: searching, within the preset audio range, for a sound wave signal whose duration is greater than a preset duration and whose signal strength is less than a preset intensity value; if such a signal exists, it is determined to be the paragraph mark. For example:

The user has recorded the audio of a class, and the audio contains the following: "Today we are going to study the chapter on human history, (pause) the content of the first section is...". When this classroom audio is matched in the pre-stored keyword information base, the keyword "first section" is matched. A preset time range around the keyword "first section" in the audio (for example, 5 s before and after the keyword) is then searched for a sound wave signal whose duration is greater than the preset duration and whose signal strength is less than the preset intensity value. Suppose the preset duration is 2 s and the preset intensity value is 2 dB. If the search finds, within the 5 s preset time range around the keyword "first section", a sound wave signal whose average sound intensity is below the preset intensity value of 2 dB and whose duration exceeds the preset duration of 2 s (that is, longer than the normal interval between words in a sentence, indicating a significant pause), the pause is determined to be the paragraph mark corresponding to the keyword "first section"; that is, the audio of the "first section" is taken to start from this paragraph mark.
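The pause search in this example can be sketched as below. The sketch assumes NumPy, measures signal strength as the RMS amplitude of 0.1 s frames rather than in dB, and uses a hypothetical function name and threshold; only the 5 s window and the 2 s minimum duration mirror the example above.

```python
import numpy as np

def find_pause(samples, sample_rate, keyword_time, window_s=5.0,
               min_duration_s=2.0, max_level=0.01, frame_s=0.1):
    """Return (start, end) in seconds of the first quiet run longer than
    min_duration_s within keyword_time +/- window_s, or None if there is none.

    max_level is an assumed linear RMS threshold, not the dB value of the example.
    """
    samples = np.asarray(samples, dtype=float)
    frame = int(frame_s * sample_rate)
    lo = max(0, int((keyword_time - window_s) * sample_rate))
    hi = min(len(samples), int((keyword_time + window_s) * sample_rate))
    quiet_start = None
    for pos in range(lo, hi, frame):
        chunk = samples[pos:pos + frame]
        quiet = np.sqrt(np.mean(chunk ** 2)) < max_level
        if quiet and quiet_start is None:
            quiet_start = pos
        if not quiet:
            if quiet_start is not None and (pos - quiet_start) / sample_rate > min_duration_s:
                return quiet_start / sample_rate, pos / sample_rate
            quiet_start = None
    # A quiet run may extend to the end of the search window.
    if quiet_start is not None and (hi - quiet_start) / sample_rate > min_duration_s:
        return quiet_start / sample_rate, hi / sample_rate
    return None

# Usage: 3 s of speech-level signal, a 2.5 s pause, then 4.5 s of signal.
rate = 1000
audio = np.concatenate([np.full(3000, 0.5), np.zeros(2500), np.full(4500, 0.5)])
pause = find_pause(audio, rate, keyword_time=6.0)
```

A run of quiet frames shorter than the minimum duration is treated as an ordinary gap between words and is ignored.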

In this example embodiment, searching for a paragraph mark in a preset audio range of the audio corresponding to the keyword information further includes: searching for whether there is paragraph field information within the preset audio range. For example:

The user has recorded the audio of a class, and the audio contains the following: "Today we are going to study the chapter on human history; first we study the first section, and the content of this section is...". When this classroom audio is matched in the pre-stored keyword information base, the keyword "first section" is matched. If the paragraph field information "first" is then found within the preset time range around the keyword "first section" in the audio (again assumed to be 5 s before and after the keyword), the paragraph field information "first" can be judged to be the paragraph mark corresponding to the keyword "first section"; that is, the audio of the "first section" is taken to start from this paragraph mark.
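A search for paragraph field information can be sketched on a time-stamped transcript. The transcript format (a list of text and time pairs), the set `PARAGRAPH_FIELDS`, and the function name are assumptions made for illustration; the disclosure does not describe how recognized words are stored.

```python
# Hypothetical preset paragraph field information.
PARAGRAPH_FIELDS = {"first", "then", "next"}

def find_field_marks(words, keyword_time, window_s=5.0):
    """words: list of (text, time_in_seconds) pairs.

    Return every paragraph field word spoken within window_s of the keyword.
    """
    return [(text, t) for text, t in words
            if text in PARAGRAPH_FIELDS and abs(t - keyword_time) <= window_s]

# Usage: the keyword "first section" was matched at 9.0 s.
words = [("today", 0.0), ("first", 8.0), ("learn", 8.4),
         ("first section", 9.0), ("then", 60.0)]
marks = find_field_marks(words, keyword_time=9.0)
```

The word "then" at 60 s is outside the 5 s window, so only the "first" at 8 s is returned as a candidate paragraph mark.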

It should be noted that the above examples are only for further understanding of the exemplary descriptions listed in the present disclosure. Since the human language is complicated, various expressions, language habits, and grammars are different, and various paragraph marks are not listed here. It is also within the scope of the present disclosure to identify paragraph information by other paragraph marks.

In this example embodiment, unsupervised data training is performed according to the valid keyword information and the paragraph identifier, and the keyword information base is updated according to the training result. Different data training methods can be selected for different audio content. For example, training on classroom recordings of ancient poetry lessons together with a recitation and analysis database of the Three Hundred Tang Poems allows more poems to be added to the keyword information base as keyword information; training on Korean language classroom recordings together with a standard Korean program database allows more Korean-specific grammatical keywords to be added, such as the modal words commonly used at the end of sentences.

The keyword information corresponding to the paragraph identifier is updated to the keyword information base.

In the example embodiment, the paragraph mark and the keyword information are in a corresponding relationship. For example:

The user has recorded the audio of a class, and the audio contains the following: "Today we are going to study the chapter on human history; first we study the first section, and first let us look at the overview of this section...". When this classroom audio is matched in the pre-stored keyword information base, the paragraph field information "first" is found more than once within the preset time range around the keyword "first section". Analysis in combination with the keyword "first section" shows that only the "first" that precedes the keyword can serve as a paragraph mark, so the first occurrence of "first" is used to establish the actual position of the audio passage for the keyword "first section".
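The disambiguation in this example can be sketched as a rule that keeps only candidate marks spoken before the keyword and picks the one nearest to it. The candidate format and the function name are hypothetical, and the nearest-preceding rule is one simple reading of the example rather than the analysis prescribed by the disclosure.

```python
def choose_paragraph_mark(candidates, keyword_time):
    """candidates: list of (text, time_in_seconds) pairs.

    Discard marks spoken at or after the keyword, then return the one
    closest to the keyword, or None if no candidate precedes it.
    """
    before = [c for c in candidates if c[1] < keyword_time]
    return max(before, key=lambda c: c[1]) if before else None

# Usage: "first" is heard at 8.0 s and again at 10.5 s; the keyword is at 9.0 s.
candidates = [("first", 8.0), ("first", 10.5)]
mark = choose_paragraph_mark(candidates, keyword_time=9.0)
```

Here the second "first" follows the keyword and is discarded, so the occurrence at 8.0 s is taken as the start of the passage.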

In this example embodiment, after the audio passage is identified according to the analysis result, if the keyword information of the plurality of audio passages is the same, the correction identifier is added to the plurality of audio passages identified by the same keyword information, for example:

The user has recorded the audio of a class, and the audio contains the following: "Today we are going to study the chapter on human history; first we study the first section, the content of the first section is..., the above is the content of the first section. Then we study the second section, which is a continuation of the first section...". The keyword "first section" occurs several times in this audio, but not every occurrence can be used as the starting information for a paragraph mark. In this case, a correction identifier needs to be added to the identical keywords "first section" to remind the user to make a correction. Alternatively, only the first occurrence of the keyword "first section", or the occurrence that can be matched with paragraph field information, is treated as a valid keyword, while the other occurrences of the keyword "first section" are treated as failed matches.

In this example embodiment, after the correction identifier has been added to the plurality of audio passages identified by the same keyword information, and a correction instruction triggered according to the correction identifier is received, the weight value Q of the keyword information corresponding to the correction identifier is increased to Q+1; data training is then performed according to each keyword information and its corresponding weight value, in combination with the paragraph mark, and the keyword information base is updated according to the training result.

Increasing the weight value when a correction is triggered implements an error correction function for the keyword information. This is an active-learning update of the keyword information in the keyword information base; compared with unsupervised learning of keyword information, active learning allows the keyword information base to grow more accurately.
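The weight update can be sketched with a small bookkeeping class. The class name, the starting weight of zero, and the ranking helper are assumptions for illustration; the disclosure only states that the weight value Q becomes Q+1 when a correction instruction is received.

```python
from collections import defaultdict

class KeywordWeights:
    """Tracks a weight value Q per keyword (assumed to start at 0)."""

    def __init__(self):
        self.weights = defaultdict(int)

    def on_correction(self, keyword):
        """A correction instruction was triggered for this keyword: Q becomes Q + 1."""
        self.weights[keyword] += 1
        return self.weights[keyword]

    def ranked(self):
        """Keywords ordered from highest to lowest weight, e.g. as training input."""
        return sorted(self.weights, key=lambda k: -self.weights[k])

# Usage: two corrections for "first section", one for "second section".
store = KeywordWeights()
store.on_correction("first section")
store.on_correction("second section")
store.on_correction("first section")
order = store.ranked()
```

The accumulated weights could then be passed, together with the paragraph marks, to whatever training procedure updates the keyword information base.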

In this example, after the audio passage is identified according to the analysis result and a correction instruction triggered according to the correction identifier is received, the keyword information is located again in the audio, the identified audio passage corresponding to the correction identifier is cancelled, and the corrected keyword information is used as the valid keyword information.

In the present exemplary embodiment, when a plurality of audio passages are identified, a paragraph directory or a paragraph index corresponding to each audio passage is generated based on the keyword information corresponding to the audio passage. The audio passages can be classified according to their keyword information and stored in a hierarchical paragraph directory or paragraph index, so that the corresponding audio passage can be found quickly and efficiently. Alternatively, the paragraph directory or paragraph index information for each position can be marked on the playback progress bar of the audio file, allowing the user to locate a specified audio passage accurately during playback.
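Generating a paragraph index can be sketched as follows. The passage records (dictionaries with `keyword`, `start`, and `end` in seconds), the mm:ss label, and the function name are hypothetical choices made for the example.

```python
def build_index(passages):
    """passages: list of dicts with 'keyword', 'start' and 'end' (seconds).

    Return index entries sorted by start time, each with a mm:ss label that
    could be shown on a playback progress bar.
    """
    index = []
    for p in sorted(passages, key=lambda p: p["start"]):
        minutes, seconds = divmod(int(p["start"]), 60)
        index.append({"title": p["keyword"],
                      "at": f"{minutes:02d}:{seconds:02d}",
                      "start": p["start"],
                      "end": p["end"]})
    return index

# Usage: two identified passages, supplied out of order.
passages = [
    {"keyword": "second section", "start": 754.0, "end": 1500.0},
    {"keyword": "first section", "start": 65.0, "end": 754.0},
]
index = build_index(passages)
```

A hierarchical directory could be built the same way by grouping entries under a chapter-level keyword before sorting.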

In this example embodiment, after the audio passage is identified according to the analysis result, audio clipping is completed according to the audio passage. The clipped audio and the keyword information can be stored in correspondence with each other, so that the entire audio file can be indexed quickly and the user can separately select, for example, the audio of the "first chapter" or of "The origin of Chinese civilization". This allows audio passages to be used efficiently and makes archiving and lookup easier.

In this example embodiment, the paragraph mark includes a pre-segment mark and an end-of-segment mark, and completing the audio clip according to the audio passage includes: determining a start point of the paragraph according to the keyword information and the pre-segment mark, and determining an end point of the paragraph according to the keyword information and the end-of-segment mark; and clipping the audio between that end point and the start point that immediately precedes it. For example:

The user has recorded the audio of a class, and the audio contains the following: "Today we are going to study the chapter on human history; first we study the first section, the content of this section is..., the above is the content of the first section. Then we study the second section...". In this audio, the paragraph field information "first" is the pre-segment mark of the keyword "first section", and the paragraph field information "then" is both the pre-segment mark of the keyword "second section" and the end-of-segment mark of the keyword "first section". The pre-segment mark and the end-of-segment mark can be combined to determine the audio passage of the keyword "first section" and complete the audio clip.

Meanwhile, in the above example, the paragraph field information "above is" may also serve as an end-of-segment mark of the keyword "first section", and can likewise be used to determine the audio passage of the keyword "first section" and complete the audio clip.
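The clipping logic of these examples can be sketched as below. The tuple format, the fallback to the next passage's start when no explicit end-of-segment mark exists, and both function names are assumptions for illustration; the fallback mirrors the case above in which "then" ends the first section and starts the second.

```python
def clip_passages(marks, total_duration):
    """marks: list of (keyword, start_time, end_time_or_None) in seconds.

    A missing end-of-segment mark falls back to the start of the next passage,
    or to the end of the audio for the last passage.
    """
    ordered = sorted(marks, key=lambda m: m[1])
    clips = []
    for i, (keyword, start, end) in enumerate(ordered):
        if end is None:
            end = ordered[i + 1][1] if i + 1 < len(ordered) else total_duration
        clips.append((keyword, start, end))
    return clips

def cut(samples, sample_rate, start, end):
    """Slice the sample sequence between two times given in seconds."""
    return samples[int(start * sample_rate):int(end * sample_rate)]

# Usage: "first section" has an explicit end mark, "second section" does not.
marks = [("second section", 50.0, None), ("first section", 8.0, 48.0)]
clips = clip_passages(marks, total_duration=120.0)
piece = cut(list(range(100)), sample_rate=10, start=1.0, end=2.5)
```

Each resulting clip can then be stored together with its keyword so that a single passage can be retrieved without scanning the whole file.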

It should be noted that, although the steps of the method of the present disclosure are described in a particular order in the drawings, this does not require or imply that the steps must be performed in that order, or that all of the steps shown must be performed, to achieve the desired results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be split into multiple steps.

Further, in the present exemplary embodiment, an audio passage recognition apparatus is also provided. Referring to FIG. 2, the audio passage recognition apparatus 200 may include a keyword information matching module 210, a paragraph mark searching module 220, and an audio passage recognition module 230, wherein:

The keyword information matching module 210 is configured to match the recorded audio in the pre-stored keyword information base;

The paragraph mark searching module 220 is configured to search for a paragraph mark in a preset audio range of the audio corresponding to the keyword information after matching the corresponding keyword information in the keyword information base;

The audio passage recognition module 230 is configured to analyze the keyword information and the paragraph mark after finding the paragraph mark, and identify the audio passage according to the analysis result.

The specific details of each of the audio passage recognition device modules have been described in detail in the corresponding audio passage recognition method, and thus will not be described herein.

It should be noted that although several modules or units of the audio passage recognition apparatus 200 are mentioned in the above detailed description, such division is not mandatory. Indeed, in accordance with embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one of the modules or units described above may be further divided into multiple modules or units.
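The module structure can be sketched as a class that composes three callables, one per module. The class name, the callable signatures, and the stub functions in the usage are hypothetical; they only illustrate how modules 210, 220, and 230 might hand results to one another, and because the modules are injected they can be merged or split freely.

```python
class AudioPassageRecognizer:
    """Hypothetical composition of modules 210, 220 and 230."""

    def __init__(self, match_keywords, find_mark, recognize):
        self.match_keywords = match_keywords  # keyword information matching module 210
        self.find_mark = find_mark            # paragraph mark searching module 220
        self.recognize = recognize            # audio passage recognition module 230

    def run(self, audio):
        passages = []
        for keyword in self.match_keywords(audio):
            mark = self.find_mark(audio, keyword)
            # A passage is recognized only when a paragraph mark is found.
            if mark is not None:
                passages.append(self.recognize(keyword, mark))
        return passages

# Usage with stub modules: only "first section" has a paragraph mark.
recognizer = AudioPassageRecognizer(
    match_keywords=lambda audio: ["first section", "second section"],
    find_mark=lambda audio, kw: 8.0 if kw == "first section" else None,
    recognize=lambda kw, mark: {"keyword": kw, "start": mark},
)
result = recognizer.run(audio=None)
```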

Further, in an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.

Those skilled in the art will appreciate that various aspects of the present invention can be implemented as a system, method, or program product. Accordingly, aspects of the present invention may be embodied in the form of a complete hardware embodiment, a complete software embodiment (including firmware, microcode, etc.), or a combination of hardware and software aspects, which may be collectively referred to herein as a "circuit," "module," or "system."

An electronic device 300 in accordance with such an embodiment of the present invention is described below with reference to FIG. 3. The electronic device 300 shown in FIG. 3 is merely an example and should not impose any limitation on the function and scope of use of the embodiments of the present invention.

As shown in FIG. 3, the electronic device 300 is embodied in the form of a general purpose computing device. The components of the electronic device 300 may include, but are not limited to, at least one processing unit 310, at least one storage unit 320, a bus 330 connecting different system components (including the storage unit 320 and the processing unit 310), and a display unit 340.

The storage unit stores program code that can be executed by the processing unit 310, such that the processing unit 310 performs the steps of the various exemplary embodiments of the present invention described in the "Exemplary Method" section of the present specification. For example, the processing unit 310 can perform steps S110 to S130 as shown in FIG. 1.

The storage unit 320 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 3201 and/or a cache storage unit 3202, and may further include a read only storage unit (ROM) 3203.

The storage unit 320 may also include a program/utility 3204 having a set (at least one) of program modules 3205, such program modules 3205 including but not limited to: an operating system, one or more applications, other program modules, and program data. Each or some combination of these examples may include an implementation of a network environment.

The bus 330 may represent one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, a graphics acceleration port, a processing unit, or a local bus using any of a variety of bus structures.

The electronic device 300 can also communicate with one or more external devices 370 (e.g., a keyboard, pointing device, Bluetooth device, etc.), with one or more devices that enable the user to interact with the electronic device 300, and/or with any device (e.g., router, modem, etc.) that enables the electronic device 300 to communicate with one or more other computing devices. This communication can take place via an input/output (I/O) interface 350. The electronic device 300 can also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 360. As shown, the network adapter 360 communicates with other modules of the electronic device 300 via the bus 330. It should be understood that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 300, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

From the description of the above embodiments, those skilled in the art can easily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with the necessary hardware. Therefore, the technical solution according to an embodiment of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a mobile hard disk, etc.) or on a network, and which includes a number of instructions that cause a computing device (which may be a personal computer, server, terminal device, network device, etc.) to perform a method in accordance with an embodiment of the present disclosure.

In an exemplary embodiment of the present disclosure, there is also provided a computer readable storage medium having stored thereon a program product capable of implementing the above method of the present specification. In some possible embodiments, aspects of the present invention may also be embodied in the form of a program product comprising program code which, when the program product runs on a terminal device, causes the terminal device to perform the steps according to various exemplary embodiments of the present invention described in the "Exemplary Method" section of the present specification.

Referring to FIG. 4, a program product 400 for implementing the above method in accordance with an embodiment of the present invention is illustrated. It may employ a portable compact disk read-only memory (CD-ROM), includes program code, and may run on a terminal device, for example a personal computer. However, the program product of the present invention is not limited thereto. In the present document, the readable storage medium may be any tangible medium containing or storing a program that can be used by or in connection with an instruction execution system, apparatus, or device.

The program product can employ any combination of one or more readable media. The readable medium can be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The computer readable signal medium may include a data signal that is propagated in baseband or as part of a carrier wave and that carries readable program code. Such propagated data signals can take a variety of forms including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing. The readable signal medium can also be any readable medium other than a readable storage medium that can transmit, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a readable medium can be transmitted using any suitable medium, including but not limited to wireless, wireline, optical cable, RF, etc., or any suitable combination of the foregoing.

Program code for performing the operations of the present invention may be written in any combination of one or more programming languages, including object oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can execute entirely on the user computing device, partially on the user device, as a stand-alone software package, partially on the user computing device and partially on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device can be connected to the user computing device via any kind of network, including a local area network (LAN) or wide area network (WAN), or can be connected to an external computing device (e.g., via the Internet using an Internet service provider).

Further, the above-described drawings are merely illustrative of the processes included in the method according to the exemplary embodiments of the present invention, and are not intended to be limiting. It is easy to understand that the processing shown in the above figures does not indicate or limit the chronological order of these processes. In addition, it is also easy to understand that these processes may be performed synchronously or asynchronously, for example, in a plurality of modules.

Other embodiments of the present disclosure will be apparent to those skilled in the art. The present application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles, including those that rely on common general knowledge or customary technical means in the art not disclosed in the present disclosure. The specification and examples are to be regarded as illustrative only.

It is to be understood that the invention is not limited to the details described above. The scope of the disclosure is to be limited only by the appended claims.

Industrial applicability

On the one hand, recognition combines keyword information with paragraph marks, which improves the accuracy of audio paragraph recognition; on the other hand, once the audio paragraph information has been recognized, a user of the audio can quickly locate and play it by keyword information, which greatly improves the usability of the audio and enhances the user experience.

Claims

An audio paragraph recognition method, characterized in that the method comprises:

Matching recorded audio in a pre-stored keyword information base;

If corresponding keyword information is matched in the keyword information base, searching for whether a paragraph mark exists in a preset audio range of the audio corresponding to the keyword information;

If the paragraph mark is found, analyzing the keyword information and the paragraph mark, and recognizing an audio paragraph according to the analysis result.
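
The three steps above (S110 to S130) can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes keyword matching has already produced time-stamped hits and that paragraph marks are available as time stamps. The names `KeywordHit`, `Paragraph`, `find_mark`, `recognize_paragraphs`, and `search_range_s` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class KeywordHit:
    keyword: str   # keyword matched in the keyword information base
    time_s: float  # position of the keyword in the recording, in seconds


@dataclass
class Paragraph:
    keyword: str    # keyword that names the paragraph
    mark_s: float   # position of the paragraph mark used as the boundary


def find_mark(hit_time: float, marks: List[float],
              search_range_s: float) -> Optional[float]:
    """Return the mark closest to the keyword within the preset range, if any."""
    nearby = [m for m in marks if abs(m - hit_time) <= search_range_s]
    return min(nearby, key=lambda m: abs(m - hit_time)) if nearby else None


def recognize_paragraphs(hits: List[KeywordHit], marks: List[float],
                         search_range_s: float = 5.0) -> List[Paragraph]:
    paragraphs = []
    for hit in hits:                                         # S110: matched keywords
        mark = find_mark(hit.time_s, marks, search_range_s)  # S120: look for a mark
        if mark is not None:                                 # S130: keyword + mark
            paragraphs.append(Paragraph(hit.keyword, mark))
    return paragraphs
```

A keyword with no paragraph mark inside the preset range is simply skipped in this sketch, which mirrors the conditional wording of the claim.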
The method of claim 1, wherein matching the recorded audio in the pre-stored keyword information base comprises:

Converting the recorded audio into a sound wave signal by performing short-time Fourier transform processing on the recorded audio;

Filtering the sound wave signal with an auditory filter bank to remove ambient noise from the sound wave signal, and extracting speech features;

Matching the speech features in the keyword information base based on a maximum likelihood function.
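
The matching chain above can be illustrated with a small pure-Python sketch. It is written under strong assumptions and is not the disclosed implementation: the auditory filter bank stage is omitted, the speech feature is just the time-averaged magnitude spectrum, and each keyword is modeled as a diagonal Gaussian. The function names, the tiny frame length, and the model format are all hypothetical.

```python
import cmath
import math
from typing import Dict, List, Tuple


def stft_magnitudes(samples: List[float], frame_len: int = 8,
                    hop: int = 4) -> List[List[float]]:
    """Short-time Fourier transform: magnitude spectrum of each Hann-windowed frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):  # non-negative frequency bins only
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def mean_feature(frames: List[List[float]]) -> List[float]:
    """Average spectrum over time: a deliberately crude speech feature."""
    return [sum(f[k] for f in frames) / len(frames)
            for k in range(len(frames[0]))]


def log_likelihood(feature: List[float], mean: List[float],
                   var: List[float]) -> float:
    """Log-likelihood of the feature under a diagonal Gaussian keyword model."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(feature, mean, var))


def match_keyword(feature: List[float],
                  models: Dict[str, Tuple[List[float], List[float]]]) -> str:
    """Return the keyword whose model gives the maximum likelihood."""
    return max(models, key=lambda kw: log_likelihood(feature, *models[kw]))
```

In practice the naive transform would be replaced by an FFT routine and the averaged spectrum by filter-bank features; the sketch only shows how the three stages connect.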
The method of claim 1, wherein after the corresponding keyword information is matched in the keyword information base, the method further comprises:

Determining whether the keyword information is a valid keyword, and if so, performing the step of searching for whether a paragraph mark exists in a preset audio range of the audio corresponding to the keyword information;

Wherein determining whether the keyword information is a valid keyword comprises:

If a plurality of identical pieces of keyword information are matched in the recorded audio, establishing a fuzzy matrix equation from each piece of keyword information and the time code value of that keyword information;

Obtaining an optimal solution by solving the fuzzy matrix equation, and determining the keyword information corresponding to the optimal solution to be valid keyword information.
The method of claim 3, wherein the method further comprises:

Performing data training according to the valid keyword information and the paragraph mark, and updating the keyword information base according to the training result.
The method according to claim 2, wherein searching for a paragraph mark in a preset audio range of the audio corresponding to the keyword information comprises:

Querying, within the preset audio range, whether there is a sound wave signal whose duration is greater than a preset duration and whose signal strength is less than a preset intensity value, and if so, determining that the found paragraph mark is the sound wave signal whose duration is greater than the preset duration and whose signal strength is less than the preset intensity value.
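
The query above amounts to finding a sufficiently long, sufficiently quiet stretch of signal. A minimal sketch follows, assuming the signal strength has already been reduced to one intensity value per frame; the function name, the frame-based units, and the parameter names are hypothetical.

```python
from typing import List, Optional, Tuple


def find_paragraph_mark(intensity: List[float], start: int, end: int,
                        min_frames: int,
                        max_intensity: float) -> Optional[Tuple[int, int]]:
    """Find the first quiet run inside the frame range [start, end).

    A run qualifies when every frame is below max_intensity (preset
    intensity value) and the run is longer than min_frames (preset
    duration). Returns (first_frame, end_frame_exclusive) or None.
    """
    end = min(end, len(intensity))
    run_start = None
    for i in range(start, end + 1):
        # The extra iteration at i == end closes a run that reaches the range end.
        quiet = i < end and intensity[i] < max_intensity
        if quiet:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start > min_frames:
                return (run_start, i)
            run_start = None
    return None
```

For example, with a frame length of 0.1 s, `min_frames=5` would correspond to a preset duration of 0.5 s.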
The method of claim 1, wherein after the audio paragraph is recognized according to the analysis result, the method further comprises:

If the keyword information identifying a plurality of audio paragraphs is the same, adding a correction identifier to the plurality of audio paragraphs identified by the same keyword information.
The method of claim 6, wherein the method further comprises:

After receiving a correction instruction triggered according to the correction identifier, incrementing by 1 the weight value Q of the keyword information corresponding to the correction identifier;

Performing data training according to each piece of keyword information and its corresponding weight value in combination with the paragraph mark, and updating the keyword information base according to the training result.
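
The correction bookkeeping can be sketched as below. This is an assumption-laden illustration: it covers only the duplicate-keyword check and the Q + 1 update, not the data training, and the names `flag_duplicates` and `KeywordWeights`, as well as the initial weight of 0, are hypothetical.

```python
from collections import Counter, defaultdict
from typing import List


def flag_duplicates(paragraph_keywords: List[str]) -> List[int]:
    """Indices of paragraphs whose keyword also identifies another paragraph.

    These are the paragraphs that would receive a correction identifier.
    """
    counts = Counter(paragraph_keywords)
    return [i for i, kw in enumerate(paragraph_keywords) if counts[kw] > 1]


class KeywordWeights:
    """Holds the weight value Q of each keyword (assumed to start at 0)."""

    def __init__(self) -> None:
        self.q = defaultdict(int)

    def on_correction(self, keyword: str) -> int:
        """Apply Q <- Q + 1 when a correction instruction is received."""
        self.q[keyword] += 1
        return self.q[keyword]
```

The accumulated weights would then be passed, together with the paragraph marks, to whatever training procedure updates the keyword information base.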
The method of claim 6, wherein after the audio paragraph is recognized according to the analysis result, the method further comprises:

After receiving a correction instruction triggered according to the correction identifier, cancelling the recognized audio paragraph corresponding to the correction identifier.
The method of claim 1, wherein the paragraph mark is preset paragraph field information.
The method of claim 1, wherein the method further comprises:

When a plurality of audio paragraphs are recognized, generating a paragraph directory or a paragraph index corresponding to each audio paragraph based on the keyword information corresponding to that audio paragraph.
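
A paragraph directory of this kind could be generated as in the sketch below, which assumes each recognized paragraph is represented as a (keyword, start time in seconds) pair. The function name `build_index` and the `mm:ss` layout are hypothetical choices.

```python
from typing import List, Tuple


def build_index(paragraphs: List[Tuple[str, float]]) -> List[str]:
    """Return one directory line per paragraph, ordered by start time."""
    lines = []
    ordered = sorted(paragraphs, key=lambda p: p[1])
    for number, (keyword, start_s) in enumerate(ordered, 1):
        minutes, seconds = divmod(int(start_s), 60)
        lines.append(f"{number}. {keyword} [{minutes:02d}:{seconds:02d}]")
    return lines
```

Each line pairs the keyword with a position in the recording, so a player could jump to the paragraph when the entry is selected.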
The method of claim 1, wherein after the audio paragraph is recognized according to the analysis result, the method further comprises:

Performing audio clipping according to the audio paragraph.
The method of claim 11, wherein the paragraph mark comprises a pre-paragraph mark and an end-of-paragraph mark, and performing audio clipping according to the audio paragraph comprises:

Determining a starting point of the paragraph according to the keyword information and the pre-paragraph mark, and determining an ending point of the paragraph according to the keyword information and the end-of-paragraph mark;

Clipping the audio according to the ending point of the paragraph and the starting point that precedes that ending point.
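
Once the starting and ending points are known, the clipping step reduces to slicing the sample sequence. The sketch below assumes the audio is held as a flat list of samples at a known sample rate; the function name and parameters are hypothetical.

```python
from typing import List


def clip_paragraph(samples: List[float], sample_rate: int,
                   start_s: float, end_s: float) -> List[float]:
    """Return the samples between the paragraph start and end points."""
    if not 0 <= start_s < end_s:
        raise ValueError("start point must precede end point")
    first = int(round(start_s * sample_rate))
    last = min(len(samples), int(round(end_s * sample_rate)))
    return samples[first:last]
```

An end point beyond the recording is clamped to the last sample, so a paragraph that runs to the end of the audio is still returned in full.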
An audio paragraph recognition device, characterized in that the device comprises:

a keyword information matching module, configured to match recorded audio in a pre-stored keyword information base;

a paragraph mark searching module, configured to search for a paragraph mark in a preset audio range of the audio corresponding to the keyword information when corresponding keyword information is matched in the keyword information base;

an audio paragraph recognition module, configured to analyze the keyword information and the paragraph mark after the paragraph mark is found, and to recognize the audio paragraph according to the analysis result.
An electronic device, characterized by comprising:

a processor; and

a memory having computer readable instructions stored thereon, the computer readable instructions, when executed by the processor, implementing the method of any one of claims 1 to 12.
A computer readable storage medium having stored thereon a computer program, the computer program, when executed by a processor, implementing the method of any one of claims 1 to 12.