CN110895575A - Audio processing method and device - Google Patents


Info

Publication number
CN110895575A
Authority
CN
China
Prior art keywords
audio
information
text
clip
search
Prior art date
Legal status
Granted
Application number
CN201810974926.9A
Other languages
Chinese (zh)
Other versions
CN110895575B (en)
Inventor
高欣羽
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201810974926.9A priority Critical patent/CN110895575B/en
Publication of CN110895575A publication Critical patent/CN110895575A/en
Application granted granted Critical
Publication of CN110895575B publication Critical patent/CN110895575B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses an audio processing method and device, comprising the following steps: converting first audio information to be processed into text information; searching the converted text information by using search information to obtain a text segment containing the search information; and processing the audio clip corresponding to the text segment containing the search information to obtain second audio information. By visually presenting the audio content and performing tasks such as searching, positioning, editing and splicing on it according to the search information, the method and the device allow audio information to be edited as conveniently, automatically and efficiently as text, greatly reducing the workload of the whole audio processing task.

Description

Audio processing method and device
Technical Field
The present application relates to, but is not limited to, speech recognition technology, and more particularly to an audio processing method and apparatus.
Background
When a user edits and splices multiple sections of audio, the user usually has to listen to each section first, manually mark the content segments to be spliced, and finally splice the audio of the marked segments. This approach clearly involves a large workload. Furthermore, manually recording the start and stop time points of a content segment is error-prone: if the exact stop time point required is 00:01:53 but the user records 00:02:01, extraneous sound is left in the audio piece to be spliced. Moreover, the user ends up with a collection of audio files plus hand-recorded time points and segment notes, and the content of the audio before and after splicing is hard to see at a glance, which greatly increases the difficulty of auditing, rechecking, modifying and re-editing, and forces a large amount of written description to be attached to the audio files when they are archived or handed over.
In summary, the audio processing approach in the related art is time-consuming, inefficient and labor-intensive, has a high error rate in time-point marking, and offers little intelligent automation.
Disclosure of Invention
The application provides an audio processing method and an audio processing device, which can accurately and efficiently process audio information.
The application provides an audio processing method, which comprises the following steps:
converting first audio information to be processed into text information;
searching the converted text information by utilizing the search information to obtain a text segment containing the search information;
and processing the audio clip corresponding to the text clip containing the search information to obtain second audio information.
Optionally, the searching the converted text information by using the search information to obtain the text segment containing the search information includes:
searching in the text information according to the search information to obtain at least one text segment containing the search information;
and respectively determining the start-stop time point information of the audio segment corresponding to each text segment according to the searched start-stop position of at least one text segment.
The start-stop time point information of each text segment containing the search information is thereby marked.
Optionally, the processing according to the audio segment corresponding to the text segment containing the search information includes:
splicing the obtained text segments containing the search information into a text message;
cutting each audio clip from the first audio information according to the start-stop time points of the audio clip corresponding to each text segment in the spliced text information;
and splicing the cut audio segments to obtain the second audio information.
Optionally, the method further comprises:
identifying an audio source of an audio segment corresponding to the text segment;
and adding audio source information to the audio clips.
Optionally, identifying an audio source of an audio segment corresponding to the text segment, and adding audio source information to the audio segment, including:
judging a speaker corresponding to the audio clip through voiceprint recognition;
and adding information of the speaker corresponding to the voiceprint to the text segment.
Optionally, the processing the audio clip corresponding to the text clip containing the search information includes:
converting the text information added with the speaker information into a corresponding audio clip by a speech synthesis technology; and splicing the converted audio segments into the second audio information.
Optionally, the method further comprises:
generating text information containing the additional information, and converting the text information containing the additional information into a system audio clip by using a speech synthesis technology;
the processing of the audio clip corresponding to the obtained text clip containing the search information includes: and splicing the obtained system audio clip and the audio clip corresponding to the text clip containing the search information to form the second audio information.
Optionally, after the splicing into one text message, the method further includes:
and editing the spliced text information according to the operation information from the user.
Optionally, the editing comprises: adding or deleting text and adding annotation and comment information.
The present application further provides an audio processing apparatus, comprising a memory and a processor, wherein the memory stores the following instructions executable by the processor: for performing the steps of the audio processing method of any of the above.
The present application further provides a computer-readable storage medium storing computer-executable instructions for performing any of the audio processing methods described above.
The present application further provides an audio processing apparatus comprising: a conversion unit, a search unit, and a processing unit; wherein:
the conversion unit is used for converting first audio information to be processed into text information;
the search unit is used for searching the converted text information by utilizing the search information to obtain a text segment containing the search information;
and the processing unit is used for processing the audio clip corresponding to the text clip containing the search information to obtain second audio information.
Optionally, the search unit is specifically configured to:
searching in the text information according to the search information to obtain at least one text segment containing the search information;
and respectively determining the start-stop time point information of the audio segment corresponding to each text segment according to the searched start-stop position of at least one text segment.
Optionally, the processing unit is specifically configured to: splicing the text segments containing the search information into a text message;
cutting each audio clip from the first audio information according to the start-stop time points of the audio clip corresponding to each text segment in the spliced text information;
and splicing the cut audio segments to obtain the second audio information.
Optionally, the processing unit is further configured to: and editing the spliced text information according to the operation information from the user.
Optionally, the apparatus further comprises: the adding unit is used for generating text information containing the additional information and converting the text information containing the additional information into a system audio clip;
the processing unit is specifically configured to: and splicing the obtained system audio clip and the obtained audio clip corresponding to the text clip containing the search information to form the second audio information.
The technical solution at least includes: converting first audio information to be processed into text information; searching the converted text information by using preset search information to obtain a text segment containing the search information; and processing the audio clip corresponding to the obtained text segment to form processed second audio information. By visually presenting the audio content and performing searching, positioning, editing and splicing on it according to search information such as keywords, the method and the device allow audio information to be edited as conveniently, automatically and efficiently as text, greatly reducing the workload of the whole audio processing task.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the claimed subject matter and are incorporated in and constitute a part of this specification, illustrate embodiments of the subject matter and together with the description serve to explain the principles of the subject matter and not to limit the subject matter.
FIG. 1 is a flow chart of an audio processing method in an embodiment of the present application;
fig. 2 is a schematic structural diagram of an audio processing apparatus according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more apparent, embodiments of the present application will be described in detail below with reference to the accompanying drawings. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.
In one exemplary configuration of the present application, a computing device includes one or more processors (CPUs), input/output interfaces, a network interface, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media does not include transitory media, such as modulated data signals and carrier waves.
The steps illustrated in the flow charts of the figures may be performed in a computer system, for example as a set of computer-executable instructions. Also, while a logical order is shown in the flow charts, in some cases the steps shown or described may be performed in a different order.
Fig. 1 is a flowchart of an audio processing method in an embodiment of the present application, as shown in fig. 1, including:
step 100: and converting the first audio information to be processed into text information.
The first audio information to be processed may include one or more audio files.
Optionally, in this step, a Speech conversion technology, such as a Speech-to-Text (STT) technology, may be used to convert each audio file into a Text file.
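As a minimal sketch of this step: the `speech_to_text` function below is a hypothetical placeholder standing in for any real STT engine (the application names STT only as an example technology); it is stubbed with canned text purely so the surrounding flow can be illustrated end to end.

```python
# Sketch of step 100: convert each audio file of the first audio
# information into text information. `speech_to_text` is a placeholder
# (an assumption, not an API from the application) for a real STT call.

def speech_to_text(audio_path):
    """Placeholder STT call: return the transcript of one audio file.

    A real engine would decode the audio; this stub returns canned
    text keyed by file name purely for illustration.
    """
    canned = {
        "audio_a.wav": "Aliyun is a relatively mature cloud product provider.",
        "audio_b.wav": "We rely on cloud computing for mass storage.",
    }
    return canned.get(audio_path, "")

def convert_all(audio_paths):
    """Step 100: one piece of text information per audio file."""
    return {path: speech_to_text(path) for path in audio_paths}

texts = convert_all(["audio_a.wav", "audio_b.wav"])
```

In a real deployment the placeholder would be replaced by a call to whatever STT service is available; the rest of the flow only needs the returned text.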
Step 101: and searching the converted text information by utilizing the search information to obtain a text segment containing the search information.
Optionally, the search information may be a preset keyword.
Optionally, this step may include: searching in the text information according to the search information to obtain at least one text segment containing the search information; and respectively determining the start-stop time point information of the audio segment corresponding to each text segment according to the searched start-stop position of at least one text segment.
In an exemplary embodiment, taking the search information as a keyword, the converted text information is searched with the keyword to obtain the text segments containing the search information, and the start-stop time point information of the audio segment corresponding to each such text segment is identified. Specifically: after the keyword search yields a text segment containing the search information, that segment is marked, i.e. a marked segment is a segment containing the search information. The mark carries the segment's start and stop time points; for example, if text segment A spans 00:04:32-00:25:01, it is marked as: audio A 00:04:32-00:25:01.
It should be noted that, according to the voice intelligent conversion technology provided in the related art, preliminary sentence-breaking processing can be performed. Then, the text segment containing the search information in the present application may be defined as: a full sentence containing the search information. In the step, the text segments containing the keywords are searched, so that the context containing the keywords can be quickly acquired, and the positioning effect on the related text information is realized.
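The definition above, a text segment being a full sentence containing the search information, can be sketched as follows (the sentence break here is a simple punctuation split, standing in for the preliminary sentence-breaking of the speech conversion technology):

```python
import re

def find_segments(text, keyword):
    """Step 101 sketch: split the converted text at sentence-ending
    punctuation (the preliminary sentence break) and return every full
    sentence containing the keyword, with its character start/stop
    offsets, which later map to audio start-stop time points."""
    segments = []
    pos = 0
    for sentence in re.split(r"(?<=[.!?\u3002])\s*", text):
        if not sentence:
            continue
        start = text.index(sentence, pos)
        end = start + len(sentence)
        if keyword in sentence:
            segments.append((sentence, start, end))
        pos = end
    return segments

text = ("Gene detection is maturing. Aliyun is a mature cloud provider. "
        "Aliyun provides various products.")
hits = find_segments(text, "Aliyun")  # two full sentences located
```

Returning the offsets alongside each sentence is what makes the later jump from text positions back to audio time points possible.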
Optionally, if the user considers that the sentence break of a text segment containing the search information is inaccurate or incomplete, the user may decide, starting from the marked position, which characters are selected as the text segment to be spliced subsequently. This may be implemented by providing the user with a human-computer interaction interface; the specific implementation form does not limit the scope of the present application.
It should be noted that a segment of speech may have several corresponding forms of information, such as a sound chart (waveform), a time axis, and text. The sound chart and time axis, like an audio track, carry time information. The principle of speech conversion is to turn sound clips into text; when the unit sound clip is small enough, its corresponding time point can be read off the time axis and pushed back to the text. For example: the transcription yields the word 'cloud'; the sound clip transcribed into 'cloud' lies on the voice file's time axis at 00:05:30; therefore the word 'cloud' in the text obtains the corresponding time point 00:05:30. In other words, the converted text information carries corresponding time information.
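The back-mapping just described can be sketched with word-level timestamps; assuming (this is an assumption, not stated in the application) that the STT engine emits one `(word, start_seconds, end_seconds)` triple per transcribed word:

```python
def time_of_word(word_timestamps, word):
    """Back-map a word in the converted text to the time span of the
    sound clip it was transcribed from, using the per-word timestamps
    emitted during transcription."""
    for w, start, end in word_timestamps:
        if w == word:
            return start, end
    return None

# e.g. the word 'cloud' was transcribed from the clip at 00:05:30
stamps = [("the", 328.0, 328.2), ("cloud", 330.0, 330.5)]
span = time_of_word(stamps, "cloud")
```

With such a table, any character position found by the keyword search can be converted into a time point on the audio's time axis.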
Step 102: and processing the audio clip corresponding to the text clip containing the search information to obtain second audio information.
Optionally, this step includes: splicing the obtained text segments containing the search information into one complete piece of text information; cutting each audio clip from the first audio information to be processed according to the start-stop time points of the audio clip corresponding to each text segment in the spliced text information; and splicing the cut audio clips to obtain the second audio information.
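A dependency-free sketch of the cut-and-splice: a library such as pydub would normally slice real audio, but here the audio is modeled as a plain list of PCM samples so the time arithmetic is visible (sample values and rate are illustrative only).

```python
def cut(samples, sample_rate, start_s, end_s):
    """Cut one audio clip [start_s, end_s) out of a PCM sample list,
    using the start-stop time points of the corresponding text segment."""
    return samples[int(start_s * sample_rate):int(end_s * sample_rate)]

def splice(clips):
    """Concatenate the cut clips into the second audio information."""
    out = []
    for clip in clips:
        out.extend(clip)
    return out

rate = 4                            # toy rate: 4 samples per second
audio_a = list(range(40))           # 10 seconds of stand-in 'audio'
clip1 = cut(audio_a, rate, 1, 3)    # clip for one marked text segment
clip2 = cut(audio_a, rate, 5, 6)    # clip for another marked segment
second_audio = splice([clip1, clip2])
```

The same two operations, slice by time, then concatenate, are what a real audio library performs on actual waveform data.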
It should be noted that, after the text segments containing the search information have been spliced in this step, the present application may further include: editing the spliced text information according to operation information from the user. The editing includes, but is not limited to: adding or deleting text, adding annotations and comments, and so on. For example, the user may edit the spliced text directly, such as inserting a label like 'explained by Mr. Wang' where appropriate. Such editing is difficult to achieve on a bare audio file; by operating on the spliced text, the audio is edited with the convenience of text editing, and blind editing of the audio information is avoided.
According to the audio processing method described above, by visually presenting the audio content and performing tasks such as keyword search and positioning, editing, and audio splicing on it, audio information can be edited as conveniently, automatically and efficiently as text (copy, paste, cut, and so on), greatly reducing the workload of the whole audio processing task.
Optionally, after step 101 and before step 102, the audio processing method of the present application further includes:
and identifying the audio source of the audio clip corresponding to the text clip containing the search information, and adding audio source information to the audio clip. Therefore, the audio source information is added to the audio texts with different audio sources.
Wherein, the audio source refers to information of a speaker, and identifying the audio source of the audio segment is to determine whether the speakers of different audio segments are the same person.
Optionally, identifying an audio source of an audio segment corresponding to the text segment, and adding audio source information to the audio segment, including:
determining the speaker corresponding to the audio clip through voiceprint recognition; and adding information of the speaker corresponding to the voiceprint to the text segment.
In an exemplary embodiment, a voiceprint recognition technique may be used to identify whether speakers of different audio segments corresponding to text segments containing search information are the same person, i.e., whether the audio sources are the same; if the audio sources are not the same speaker, namely different audio sources exist, the user can be prompted whether the information of the speaker needs to be added or not;
if an instruction to select to add speaker information is received from a user, the speaker information is directly added to text information containing search information.
Accordingly, step 102 comprises: converting the text information with the added speaker information into audio clips carrying the speaker information, using a speech synthesis technology; and splicing the converted audio clips into the second audio information.
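The text-side half of this flow, prepending the identified speaker to each segment before speech synthesis, can be sketched as follows (the speaker labels are given here; real voiceprint recognition is out of scope for the sketch):

```python
def label_speakers(segments):
    """Add speaker information (as determined by voiceprint
    recognition; supplied directly here) in front of each text
    segment, in the form 'User A says: ...'."""
    return [f"{speaker} says: {text}" for speaker, text in segments]

segments = [("User A", "Aliyun is a mature cloud provider."),
            ("User B", "We rely on Aliyun products.")]
labelled = label_speakers(segments)
```

The labelled text would then be fed to a TTS engine so the spliced audio announces each speaker.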
Optionally, the method further includes:
generating text information containing the additional information, and converting it into a system audio clip using a speech synthesis technology such as Text-to-Speech (TTS); correspondingly, step 102 specifically includes: splicing the obtained system audio clip with the audio clips corresponding to the text segments containing the search information to form the second audio information.
The audio processing method of the present application is described in detail below with reference to specific embodiments.
Suppose the user is a product manager who has visited customers and collected four audio files, audio A, audio B, audio C and audio D, from four customers respectively. In this embodiment, it is assumed that the raw audio material collected includes: audio file A for speaker A, audio file B for speaker B, audio file C for speaker C, and audio file D for speaker D. The customers' views on 'Aliyun' now need to be sorted out from the collected audio files, that is, all segments mentioning 'Aliyun' need to be found in the four audio files and spliced into one audio file of comments about 'Aliyun'. The audio processing method provided by the application specifically comprises the following steps:
first, the audio file a, the audio file B, the audio file C, and the audio file D are all converted into text information using a voice conversion technique. Take the example of converting audio file a into text file a:
"We are a company that focuses on human genome data analysis and genetic information application development. With the maturity of gene sequencing technology and the rapid decrease in its cost, gene detection is gradually moving from scientific research into ordinary families. The United States has introduced a 'Precision Medicine' initiative, the United Kingdom has launched a 100,000-person genome program, and it is believed that China will soon launch a corresponding program. When one person undergoes whole-genome sequencing, a data volume of 90 Gbp is generated. If tens of thousands, millions or even tens of millions of people undergo whole-genome sequencing, the resulting mass of data cannot be handled by setting up a few servers oneself; it requires the large-scale computation and mass storage of cloud computing. Aliyun is currently a relatively mature cloud product provider in China, covering computation, storage, security and other aspects; it saves the labor and material cost of building one's own machine room, and has good elasticity. Our massive genomic data analysis relies on multiple products such as ECS, OSS, OTS and BatchCompute. Fastq data generated by the Hiseq X Ten sequencer is transmitted directly to OSS over a high-speed network, which solves the problems of data storage and backup. In data analysis, ECS and BatchCompute read the genome data on OSS directly from the intranet, analyze multiple genomes concurrently, and quickly return gene interpretation results.
Aliyun provides a wide variety of products, so that we do not need to spend too much manpower and material resources on server deployment and maintenance; we can concentrate our strength on our products to the greatest extent and provide users with the best genome data interpretation and analysis service. We look forward to working hand in hand with Aliyun and to the early arrival of personalized medicine."
Then, the converted text information is searched with the keyword 'Aliyun', and the relevant content is quickly located. Taking text file A as an example: first, a preliminary sentence break can be performed using the intelligent speech conversion technology provided in the related art. The quickly located text segment containing the search information in this embodiment can then be defined as a complete sentence containing the search information, i.e. 'Aliyun is currently a relatively mature cloud product provider in China' and 'Aliyun provides a wide variety of products'. Next, the user decides, based on the two quickly located positions, how much context to select as the text segments to be spliced. For example, the user selects the following text of file A as the searched related content: "... it requires the large-scale computation and mass storage of cloud computing. Aliyun is currently a relatively mature cloud product provider in China, covering computation, storage, security and other aspects; it saves the labor and material cost of building one's own machine room, and has good elasticity. Our massive genomic data analysis relies on multiple products such as ECS, OSS, OTS and BatchCompute. Fastq data generated by the Hiseq X Ten sequencer is transmitted directly to OSS over a high-speed network, which solves the problems of data storage and backup. In data analysis, ECS and BatchCompute read the genome data on OSS directly from the intranet, analyze multiple genomes concurrently, and quickly return gene interpretation results. Aliyun provides a wide variety of products, so that we do not need to spend too much manpower and material resources on server deployment and maintenance; we can concentrate our strength on our products to the greatest extent and provide users with the best genome data interpretation and analysis service. We look forward to working hand in hand with Aliyun and to the early arrival of personalized medicine."
In this way, the user quickly locates the desired segment. Suppose the start and stop time points of the located text segment are 00:04:32-00:25:01; then this piece of text information A is automatically marked as: audio file A 00:04:32-00:25:01.
In this embodiment, it is assumed that the converted text information B, text information C and text information D are searched, located and marked in the same way, as: audio file B 00:05:45-00:35:06, audio file C 00:01:22-00:15:03, and audio file D 00:34:01-00:46:45.
Then, the marked audio clips are cut from the four audio files collected by the product manager and spliced into the processed audio file: "audio A 00:04:32-00:25:01 + audio B 00:05:45-00:35:06 + audio C 00:01:22-00:15:03 + audio D 00:34:01-00:46:45".
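The marks used above, start-stop ranges such as "00:04:32-00:25:01", must be turned into numeric offsets before any cutting can happen; a small parser sketch:

```python
def parse_mark(mark):
    """Parse a marked start-stop range like '00:04:32-00:25:01' into a
    (start, stop) pair of offsets in seconds, the form needed when
    cutting the corresponding audio clip."""
    def to_seconds(hms):
        h, m, s = (int(x) for x in hms.split(":"))
        return h * 3600 + m * 60 + s
    start, end = mark.split("-")
    return to_seconds(start), to_seconds(end)

span = parse_mark("00:04:32-00:25:01")  # (272, 1501) seconds
```

Each marked text segment in the embodiment carries exactly one such range, so the full splice is just `parse_mark` applied to every mark in order.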
Furthermore, voiceprint recognition can identify that the four audio segments to be spliced come from four different speakers, and the user can then be prompted whether the speakers' information needs to be added. If the user chooses to add it, the addition can be made directly in the text information, such as: "User A says: text information of audio file A 00:04:32-00:25:01. User B says: text information of audio file B 00:05:45-00:35:06. User C says: text information of audio file C 00:01:22-00:15:03. User D says: text information of audio file D 00:34:01-00:46:45." Then, the text information with the added information is converted into audio clips carrying that information by speech synthesis, and finally the converted clips are spliced into the processed audio file.
Optionally, the additional information may also be a description of the spliced audio file. For the four audio segments identified and cut out in the above embodiment, for example, a title or other description may be added, such as: "The content of this audio is the evaluation of Aliyun by four customers. I visited these four users in Beijing on December 3, 2016." In this case the additional information can stand alone as a piece of text information representing the additional information. In the subsequent splicing, only this text information is converted into an audio clip of the additional information by speech synthesis; finally, that clip is spliced with the marked audio clips cut from the four audio files collected by the product manager, giving the processed audio file: "audio clip of additional information + audio A 00:04:32-00:25:01 + audio B 00:05:45-00:35:06 + audio C 00:01:22-00:15:03 + audio D 00:34:01-00:46:45".
The audio processing method provided by the embodiments of the present application presents audio content visually, so that sound can be 'read'; this improves the speed and transparency of cognitive processing and greatly improves friendliness to listeners and to subsequent audio processing personnel.
An embodiment of the present application further provides an audio processing apparatus, including a memory and a processor, where the memory stores the following instructions executable by the processor: the executable instructions are for performing the steps of the audio processing method described in one or more of the embodiments above.
The embodiment of the present application further provides a computer-readable storage medium, which stores computer-executable instructions for executing the audio processing method described in one or more embodiments above.
Fig. 2 is a schematic structural diagram of an audio processing apparatus according to an embodiment of the present application, which at least includes: a conversion unit, a search unit, and a processing unit; wherein:
the conversion unit is used for converting the first audio information to be processed into text information;
the search unit is used for searching the converted text information by utilizing the search information to obtain a text segment containing the search information;
and the processing unit is used for processing the audio clip corresponding to the obtained text clip containing the search information to obtain second audio information.
Optionally, the search unit is specifically configured to:
searching in the text information according to the search information to obtain at least one text segment containing the search information;
and respectively determining the start-stop time point information of the audio segment corresponding to each text segment according to the searched start-stop position of at least one text segment.
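The search unit's behavior can be sketched as follows, assuming the speech-to-text step produces word-level timestamps (many ASR engines can emit these). The `Word` type, the `find_clips` name, and the fixed context window are illustrative assumptions, not part of the claims:

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the first audio information
    end: float

def find_clips(transcript, keyword, context_words=2):
    """Return (text_segment, clip_start, clip_end) for every occurrence
    of `keyword`, padded with a few words of context on each side; the
    start-stop times of each text segment locate the audio clip."""
    clips = []
    for i, w in enumerate(transcript):
        if keyword in w.text:
            lo = max(0, i - context_words)
            hi = min(len(transcript), i + context_words + 1)
            segment = transcript[lo:hi]
            clips.append((
                " ".join(x.text for x in segment),
                segment[0].start,   # start time point of the audio clip
                segment[-1].end,    # stop time point of the audio clip
            ))
    return clips
```

Searching a five-word transcript for "cloud" would return one text segment together with the start-stop time points of its corresponding audio clip.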
Optionally, the processing unit is specifically configured to:
splicing the obtained text segments containing the search information into a text message;
cutting each audio clip from the first audio information according to the start-stop time points of the audio clips corresponding to the text clips in the spliced text information;
and splicing the cut audio segments to obtain second audio information.
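Given the start-stop time points determined by the search unit, cutting and splicing reduces to slicing sample arrays. A minimal sketch, assuming mono PCM samples and time points in seconds (the function name and signature are illustrative):

```python
def splice_clips(samples, sample_rate, time_ranges):
    """Cut each (start_s, end_s) clip out of the mono PCM `samples` of
    the first audio information and concatenate the clips in order,
    producing the second audio information."""
    out = []
    for start_s, end_s in time_ranges:
        a = int(start_s * sample_rate)  # start-stop time points -> sample indices
        b = int(end_s * sample_rate)
        out.extend(samples[a:b])
    return out
```

A production implementation would typically operate on an audio container (e.g. WAV frames) rather than a bare sample list, but the index arithmetic is the same.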
Optionally, the processing unit is further configured to edit the spliced text information according to operation information from the user, where editing includes, but is not limited to, adding or deleting text, adding annotations and comments, and the like.
Optionally, the processing unit is further configured to:
and identifying the audio source of the audio clip corresponding to the text clip containing the search information, and adding audio source information for the audio clips with different audio sources.
Optionally, the audio processing apparatus of the present application further includes: an adding unit, used for generating text information containing the additional information and converting the text information containing the additional information into a system audio clip;
correspondingly, the processing unit is specifically configured to: and splicing the obtained system audio clip and the obtained audio clip corresponding to the text clip containing the search information to form the second audio information.
Alternatively, the search information may include keywords.
Although the embodiments disclosed in the present application are described above, the descriptions are only for the convenience of understanding the present application, and are not intended to limit the present application. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure as defined by the appended claims.

Claims (16)

1. An audio processing method, comprising:
converting first audio information to be processed into text information;
searching the converted text information by utilizing the search information to obtain a text segment containing the search information;
and processing the audio clip corresponding to the text clip containing the search information to obtain second audio information.
2. The audio processing method of claim 1, wherein the searching the converted text information by using the search information to obtain the text segment containing the search information comprises:
searching in the text information according to the search information to obtain at least one text segment containing the search information;
respectively determining the start-stop time point information of the audio clip corresponding to each text clip according to the searched start-stop position of at least one text clip;
and identifying the start-stop time point information of the text segment containing the search information.
3. The audio processing method according to claim 1, wherein the processing the audio clip corresponding to the text clip containing the search information comprises:
splicing the obtained text segments containing the search information into a text message;
cutting each audio clip from the first audio information according to the start-stop time points of the audio clips corresponding to the text clips in the spliced text information;
and splicing the cut audio segments to obtain the second audio information.
4. The audio processing method according to claim 1 or 3, characterized in that the method further comprises:
identifying an audio source of an audio segment corresponding to the text segment;
and adding audio source information to the audio clips.
5. The audio processing method according to claim 4, wherein identifying an audio source of an audio segment corresponding to the text segment and adding audio source information to the audio segment includes:
judging a speaker corresponding to the audio clip through voiceprint recognition;
and adding information of a speaker corresponding to the voiceprint in the text fragment.
6. The audio processing method according to claim 5, wherein the processing the audio segment corresponding to the text segment containing the search information comprises:
converting the text information added with the information of the speaker into a corresponding audio clip through voice synthesis;
and splicing the converted audio segments to obtain the second audio information.
7. The audio processing method according to claim 1 or 3, characterized in that the method further comprises:
generating text information containing the additional information, and converting the text information containing the additional information into a system audio clip through voice synthesis;
the processing of the audio clip corresponding to the obtained text clip containing the search information includes: and splicing the system audio clip and the audio clip corresponding to the text clip containing the search information to form the second audio information.
8. The audio processing method of claim 3, wherein after said concatenating into a text message, the method further comprises:
and editing the spliced text information according to the operation information from the user.
9. The audio processing method of claim 8, wherein the editing comprises: adding or deleting text and adding annotation and comment information.
10. An audio processing apparatus comprising a memory and a processor, wherein the memory has stored therein instructions executable by the processor for performing the steps of the audio processing method of any one of claims 1 to 9.
11. A computer-readable storage medium storing computer-executable instructions for performing the audio processing method of any one of claims 1 to 9.
12. An audio processing apparatus, comprising: a conversion unit, a search unit, and a processing unit; wherein:
the conversion unit is used for converting first audio information to be processed into text information;
the search unit is used for searching the converted text information by utilizing the search information to obtain a text segment containing the search information;
and the processing unit is used for processing the audio clip corresponding to the text clip containing the search information to obtain second audio information.
13. The audio processing apparatus according to claim 12, wherein the search unit is specifically configured to:
searching in the text information according to the search information to obtain at least one text segment containing the search information;
and respectively determining the start-stop time point information of the audio segment corresponding to each text segment according to the searched start-stop position of at least one text segment.
14. The audio processing device according to claim 12, wherein the processing unit is specifically configured to: splicing the text segments containing the search information into a text message;
cutting each audio clip from the first audio information according to the start-stop time points of the audio clips corresponding to the text clips in the spliced text information;
and splicing the cut audio segments to obtain the second audio information.
15. The audio processing device of claim 14, wherein the processing unit is further configured to: and editing the spliced text information according to the operation information from the user.
16. The audio processing apparatus according to claim 12 or 14, characterized in that the apparatus further comprises: the adding unit is used for generating text information containing the additional information and converting the text information containing the additional information into a system audio clip;
the processing unit is specifically configured to: and splicing the obtained system audio clip and the obtained audio clip corresponding to the text clip containing the search information to form the second audio information.
CN201810974926.9A 2018-08-24 2018-08-24 Audio processing method and device Active CN110895575B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810974926.9A CN110895575B (en) 2018-08-24 2018-08-24 Audio processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810974926.9A CN110895575B (en) 2018-08-24 2018-08-24 Audio processing method and device

Publications (2)

Publication Number Publication Date
CN110895575A true CN110895575A (en) 2020-03-20
CN110895575B CN110895575B (en) 2023-06-23

Family

ID=69784964

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810974926.9A Active CN110895575B (en) 2018-08-24 2018-08-24 Audio processing method and device

Country Status (1)

Country Link
CN (1) CN110895575B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113204668A (en) * 2021-05-21 2021-08-03 广州博冠信息科技有限公司 Audio clipping method and device, storage medium and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130041892A1 (en) * 2006-10-13 2013-02-14 Syscom Inc. Method and system for converting audio text files originating from audio files to searchable text and for processing the searchable text
US20130158992A1 (en) * 2011-12-17 2013-06-20 Hon Hai Precision Industry Co., Ltd. Speech processing system and method
DE102014203818A1 (en) * 2014-03-03 2015-09-03 Sennheiser Electronic Gmbh & Co. Kg Method and device for converting speech signals into text
CN106095799A (en) * 2016-05-30 2016-11-09 广州多益网络股份有限公司 The storage of a kind of voice, search method and device
CN106782545A (en) * 2016-12-16 2017-05-31 广州视源电子科技股份有限公司 A kind of system and method that audio, video data is changed into writing record
CN107644646A (en) * 2017-09-27 2018-01-30 北京搜狗科技发展有限公司 Method of speech processing, device and the device for speech processes
CN107798143A (en) * 2017-11-24 2018-03-13 珠海市魅族科技有限公司 A kind of information search method, device, terminal and readable storage medium storing program for executing


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
R. Ani et al.: "Smart Specs: Voice assisted text reading system for visually impaired persons using TTS method" *
Niu Songfeng et al.: "Design of an intelligent editing system for Chinese speech and text based on artificial intelligence" *


Also Published As

Publication number Publication date
CN110895575B (en) 2023-06-23

Similar Documents

Publication Publication Date Title
US10977299B2 (en) Systems and methods for consolidating recorded content
TW202008349A (en) Speech labeling method and apparatus, and device
US10410615B2 (en) Audio information processing method and apparatus
CN108305632A (en) A kind of the voice abstract forming method and system of meeting
US20200126583A1 (en) Discovering highlights in transcribed source material for rapid multimedia production
US20200126559A1 (en) Creating multi-media from transcript-aligned media recordings
JP2003289387A (en) Voice message processing system and method
TW201209804A (en) Digital media voice tags in social networks
WO2019169794A1 (en) Method and device for displaying annotation content of teaching system
US20200302112A1 (en) Speech to text enhanced media editing
CN111798833A (en) Voice test method, device, equipment and storage medium
WO2020182042A1 (en) Keyword sample determining method, voice recognition method and apparatus, device, and medium
US20160189107A1 (en) Apparatus and method for automatically creating and recording minutes of meeting
WO2017080235A1 (en) Audio recording editing method and recording device
US20160189103A1 (en) Apparatus and method for automatically creating and recording minutes of meeting
EP4322029A1 (en) Method and apparatus for generating video corpus, and related device
CN107680584B (en) Method and device for segmenting audio
CN112259083A (en) Audio processing method and device
CN110889266A (en) Conference record integration method and device
KR102036721B1 (en) Terminal device for supporting quick search for recorded voice and operating method thereof
CN110895575B (en) Audio processing method and device
KR20060100646A (en) Method and system for searching the position of an image thing
WO2023226726A1 (en) Voice data processing method and apparatus
CN114420125A (en) Audio processing method, device, electronic equipment and medium
JP5713782B2 (en) Information processing apparatus, information processing method, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40025596

Country of ref document: HK

GR01 Patent grant