CN109657094A - Audio-frequency processing method and terminal device - Google Patents


Info

Publication number
CN109657094A
CN109657094A
Authority
CN
China
Prior art keywords
entry
text
audio
searched
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811423356.0A
Other languages
Chinese (zh)
Inventor
彭捷
黄欣新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201811423356.0A priority Critical patent/CN109657094A/en
Publication of CN109657094A publication Critical patent/CN109657094A/en
Pending legal-status Critical Current


Abstract

The present invention is applicable to the field of computer application technology and provides an audio processing method, a terminal device, and a computer-readable storage medium. The method comprises: obtaining an audio file to be processed; parsing the audio file to obtain original text information, the original text information comprising the entry text of the audio file and the playing moment of each entry in the entry text; obtaining search text input by a user, and determining the entry in the entry text that matches the search text together with the target playing moment of that entry; and playing the audio corresponding to the entry according to the entry and the target playing moment. By locating the position and playing moment of a user-supplied entry in the entry text and playing from there, the audio file can be presented flexibly in the manner the user selects, improving the intelligence of audio playback software and the user experience.

Description

Audio-frequency processing method and terminal device
Technical field
The invention belongs to the field of computer application technology, and more particularly relates to an audio processing method, a terminal device, and a computer-readable storage medium.
Background technique
With the development of computer multimedia technology, there are now various types of audio and video playback software through which users can play music and enjoy videos, enriching leisure and entertainment. Existing audio playback software, however, can only play an audio file according to preset play modes; it cannot play audio according to the user's specific playback demand, and its flexibility is low.
Summary of the invention
In view of this, embodiments of the present invention provide an audio processing method, a terminal device, and a computer-readable storage medium, to solve the prior-art problem that audio cannot be played according to the user's playback demand and that flexibility is low.
A first aspect of the embodiments of the present invention provides an audio processing method, comprising:
obtaining an audio file to be processed;
parsing the audio file to obtain original text information, the original text information comprising the entry text of the audio file and the playing moment of each entry in the entry text;
obtaining search text input by a user, and determining the entry in the entry text that matches the search text together with the target playing moment of the entry;
playing audio corresponding to the entry according to the entry and the target playing moment.
A second aspect of the embodiments of the present invention provides a terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the following steps:
obtaining an audio file to be processed;
parsing the audio file to obtain original text information, the original text information comprising the entry text of the audio file and the playing moment of each entry in the entry text;
obtaining search text input by a user, and determining the entry in the entry text that matches the search text together with the target playing moment of the entry;
playing audio corresponding to the entry according to the entry and the target playing moment.
A third aspect of the embodiments of the present invention provides a terminal device, comprising:
an acquiring unit, configured to obtain an audio file to be processed;
a parsing unit, configured to parse the audio file to obtain original text information, the original text information comprising the entry text of the audio file and the playing moment of each entry in the entry text;
a matching unit, configured to obtain search text input by a user, and to determine the entry in the entry text that matches the search text together with the target playing moment of the entry;
a playback unit, configured to play audio corresponding to the entry according to the entry and the target playing moment.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program, the computer program comprising program instructions that, when executed by a processor, cause the processor to perform the method of the first aspect.
Compared with the prior art, the embodiments of the present invention have the following beneficial effects:
An embodiment of the present invention obtains an audio file to be processed; parses the audio file to obtain original text information, which comprises the entry text of the audio file and the playing moment of each entry in the entry text; obtains search text input by a user and determines the entry in the entry text that matches the search text, together with the target playing moment of the entry; and plays audio corresponding to the entry according to the entry and the target playing moment. By locating the position and playing moment of a user-supplied entry in the entry text and playing from there, the audio file can be presented flexibly in the manner the user selects, improving the intelligence of audio playback software and the user experience.
Detailed description of the invention
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for the embodiments or the prior-art description are briefly introduced below. Apparently, the drawings described below are only some embodiments of the present invention; those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of the audio processing method provided by Embodiment 1 of the present invention;
Fig. 2 is a flowchart of the audio processing method provided by Embodiment 2 of the present invention;
Fig. 3 is a schematic diagram of the terminal device provided by Embodiment 3 of the present invention;
Fig. 4 is a schematic diagram of the terminal device provided by Embodiment 4 of the present invention.
Specific embodiment
In the following description, for purposes of illustration rather than limitation, specific details such as particular system structures and techniques are set forth in order to provide a thorough understanding of the embodiments of the present invention. However, it will be clear to those skilled in the art that the present invention may also be implemented in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present invention.
To illustrate the technical solutions of the present invention, specific embodiments are described below.
Referring to Fig. 1, Fig. 1 is a flowchart of the audio processing method provided by Embodiment 1 of the present invention. The execution subject of the audio processing method in this embodiment is a terminal. The terminal includes, but is not limited to, mobile terminals such as smart phones, tablet computers, and wearable devices, and may also be a desktop computer or the like. The audio processing method shown in the figure may comprise the following steps:
S101: obtaining an audio file to be processed.
Before an audio file is processed, the audio file is first obtained. It may be obtained by wireless transmission, a wired network, or other means, which is not limited here. Audio files generally fall into two classes: sound files and Musical Instrument Digital Interface (MIDI) files. A sound file is original sound recorded by a sound recording device and directly stores the binary sampled data of the actual sound. A MIDI file is a sequence of musical performance instructions that is played by an audio output device or an electronic instrument connected to the computer. In this embodiment the audio file is a MIDI file.
Audio files are an important kind of file in internet multimedia. In this embodiment the format of the audio file may include, but is not limited to: Waveform Audio File Format (WAVE), Audio Interchange File Format (AIFF), the AU audio format (Audio, AU), Moving Picture Experts Group (MPEG), RealAudio (RAM), and Musical Instrument Digital Interface (MIDI). The WAVE format is a sound file format developed by Microsoft; it conforms to the Resource Interchange File Format (RIFF) specification, is used for storing audio information resources on the WINDOWS platform, and is supported by the WINDOWS platform and its applications. The WAVE format supports a variety of audio bit depths, sampling frequencies, and channel counts and is a popular sound file format on personal computers, but its files are relatively large, so it is chiefly used to store brief sound clips. An AU file is a compressed digital audio format commonly used in web applications. An MPEG file represents the moving-picture compression standard; the audio file format here refers to the audio part of the MPEG standard, i.e. the MPEG audio layer, which offers relatively high value for its sound quality and storage requirements. A RealAudio file is mainly used for real-time transmission of audio over low-rate wide-area networks; the sound quality obtained by the client varies with the connection rate. A MIDI file follows the unified international standard for digital music and synthesized instruments: it defines how computer music programs, synthesizers, and other electronic devices exchange music signals, and also specifies the protocol between electronic instruments of different manufacturers and the cables, hardware, and devices connecting them to the computer. It can be used to create digital audio simulating instruments such as the cello, violin, and piano. Every audio file contains audio file information, which may include the original text information of the audio file, the file format, the number of frames, the playing moment, finish moment, and duration of each piece of text, and so on. For example, the audio file information of a song may include the lyrics duration, composer, lyricist, singer, and so on.
S102: parsing the audio file to obtain original text information; the original text information comprises the entry text of the audio file and the playing moment of each entry in the entry text.
After the audio file to be processed is obtained, it is parsed to obtain the original text information. Specifically, since audio files differ in type, the encoding of each audio file and its corresponding parsing method also differ. In this scheme, parsing is performed according to the type of each audio file: by determining the format of the audio file and its encoding, the original text information of the audio file can be parsed out according to that encoding. The original text information comprises the entry text of the audio file and the playing moment of each entry in the entry text.
Illustratively, the audio file of a song contains at least song information and the lyrics text. By reading and parsing the lyrics file information of the audio, the original text information is obtained; for a song, the original text information obtained is exactly the lyrics of the song and their playing moments, where the moments before and after each line of lyrics are respectively the start moment and end moment of that line. The parsed text format is as follows:
32.34: having passed through the lane street Hou Gu: 35.89
35.89: it is oblique to look at the setting sun afar for you by green wall: 39.03
39.03: only because being casual quick glance: 42.79
42.79: upsetting my state of mind regardless of day and night: 46.07
46.07: wanting to be turned into village Zhou Biancheng butterfly: 49.57
49.57: driving high official position across numerous luxuriant leaf: 52.74
52.74: although be that hills and mountains are layer upon layer of: 56.56
56.56: not also being in the mood for flowing herein and even rest: 59.82
In this embodiment, the playing moment may be used to indicate the moment at which each sentence, entry, or word in the audio text starts to play; that is, taking second 0 of the audio file as the start, the moment at which the first character of an entry is played is the playing moment of that entry. For example, in the example above, the playing moment corresponding to the entry "although be that hills and mountains are layer upon layer of" is second 52.74, i.e. the entry is played 52.74 seconds after the audio file starts playing. Further, in order to indicate the playing moment of each word in the audio file more precisely, we may also determine the playing moment of each word in the audio text information, i.e. the moment at which the word starts to play.
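A minimal sketch of turning the timestamped lyric lines shown above into (start, end, entry) tuples. The "start: text : end" line shape is an assumption drawn from the example format, not the patent's actual parser, and the sample lyrics are placeholders.

```python
import re

# Matches lines of the form "start_seconds: entry text : end_seconds",
# as in the sample lyric format above (assumed shape, no colons in the text).
LINE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*:\s*(.*?)\s*:\s*(\d+(?:\.\d+)?)\s*$")

def parse_entries(raw_text):
    """Return a list of (start, end, entry) tuples sorted by start time."""
    entries = []
    for line in raw_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            start, text, end = m.groups()
            entries.append((float(start), float(end), text))
    return sorted(entries, key=lambda e: e[0])

sample = """32.34: first line of lyrics : 35.89
35.89: second line of lyrics : 39.03"""
print(parse_entries(sample))
# [(32.34, 35.89, 'first line of lyrics'), (35.89, 39.03, 'second line of lyrics')]
```

A table built this way gives each entry both its start and end moment, which is what the later matching step needs.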
The many audio compression methods all compress digital audio as much as possible while preserving sound quality, so that it occupies less storage space. MPEG compression is lossy, which means some audio information is bound to be lost in compression; however, the compression method is controlled so that the loss is hard to perceive: several extremely complex and exacting mathematical algorithms ensure that only portions of the original audio that would be barely audible are dropped. This leaves more space for the important information, yielding roughly 12:1 audio compression while preserving quality, which is why MPEG audio caught on. An MP3 (Moving Picture Experts Group Audio Layer III) file is broadly divided into three parts: a tag part ID3V2, the audio data frames, and a tag part ID3V1. ID3V2, at the start of the file, contains the author, composer, album, and similar information; its length is not fixed, and it extends the amount of information that ID3V1 can carry. The middle of the file contains a series of frames, whose number is determined by the file size and frame length. The length of each frame may or may not be fixed, as determined by the bit rate; each frame is further divided into a frame header and a data body. The frame header records the bit rate, sampling rate, MP3 version, and similar information, and the frames are mutually independent.
Illustratively, an MP3 file is composed of frames, and the frame is the smallest unit of an MP3 file. MPEG audio files are divided into three layers according to compression quality and coding complexity, corresponding respectively to the MP1, MP2, and MP3 file types, and different layers of coding are used for different purposes. The higher the MPEG audio layer, the more complex the encoder and the higher the compression ratio: MP1 and MP2 achieve compression ratios of 4:1 and 6:1-8:1 respectively, while MP3 reaches 10:1-12:1. One minute of CD-quality music requires 10 MB uncompressed but only about 1 MB after MP3 encoding. However, MP3 compresses the audio signal lossily: to reduce distortion, MP3 encoding first performs spectral analysis on the audio file, then filters out noise components with a filter, then quantizes and rearranges what remains, finally forming an MP3 file with a high compression ratio that still sounds close to the original source on playback.
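The three-part layout described above (ID3V2 at the start, frames in the middle, ID3V1 at the end) can be sketched as follows. The "ID3" magic, the syncsafe size bytes, and the fixed 128-byte trailing "TAG" block are details of the ID3 tag specifications rather than of this description, so treat this as a hedged illustration, not the patent's parser.

```python
def id3_layout(data: bytes):
    """Report which ID3 tag parts an MP3 byte stream carries.

    ID3v2 sits at the start ('ID3' magic, 10-byte header whose last 4
    bytes encode a syncsafe size: 7 significant bits per byte); ID3v1 is
    a fixed 128-byte block at the end of the file starting with 'TAG'.
    Everything between the two tags is the audio frame data.
    """
    info = {"id3v2_size": 0, "has_id3v1": False}
    if data[:3] == b"ID3" and len(data) >= 10:
        s = data[6:10]
        info["id3v2_size"] = (s[0] << 21) | (s[1] << 14) | (s[2] << 7) | s[3]
    if len(data) >= 128 and data[-128:-125] == b"TAG":
        info["has_id3v1"] = True
    return info
```

Because the ID3V2 length is not fixed, a reader must decode this size field before it can locate the first audio frame.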
WMA is a media file format defined by Microsoft, and it is a streaming format. In every WMA file the first 16 bytes are fixed, the hexadecimal sequence "30 26 B2 75 8E 66 CF 11 A6 D9 00 AA 00 62 CE 6C", and identify the file as WMA. The next 8 bytes are an integer, low byte first, giving the size of the entire WMA file header; this header contains all non-audio information such as the tag information, and the audio information follows it. Starting at offset 31 from the beginning of the file, the header houses many frames, including the standard tag information we need, the extended tag information, WMA file control information, and so on. The frames are not of equal length, but every frame header is a fixed 24 bytes, of which the first 16 bytes name the frame and the last 8 bytes give the frame's size. Since we only need to read and write the tag information, and the tag information is stored in two frames, the standard tag frame and the extended tag frame, only these two frames need to be processed; the other frames can be skipped entirely using the frame length obtained. The standard tag frame contains only four items: song title, artist, copyright, and remarks. Its frame name is the hexadecimal "33 26 B2 75 8E 66 CF 11 A6 D9 00 AA 00 62 CE 6C"; after the 24-byte frame header come five 2-byte integers, the first four of which give the sizes of the song title, artist, copyright, and remarks respectively. After these 10 bytes, the contents of the five items are stored in order. In a WMA file all text is stored as wide characters, and each string ends with a 0 terminator. The number of items inside the extended tag frame is not fixed, and each item is likewise organized frame-style. The frame name of the extended tag frame is the hexadecimal "A4 D0 D2 40 07 E3 D2 11 97 F0 00 A0 C9 5E A8 50". After the 24-byte frame header, a 2-byte integer first gives the number of extended items in this frame, followed by the items themselves. Each extended item contains an item name and a corresponding value: first a 2-byte integer gives the size of the item name, then the item name itself, then a 2-byte integer flag, then a 2-byte integer giving the size of the value, and then the value itself. When the item name is WMFSDKVersion, the value indicates the version of the WMA file; when the item name is WM/AlbumTitle, the value is the album name; when the item name is WM/Genre, the value is the genre; similarly, the purpose of a value is easy to infer from the item's name. The names and values of these extended items are almost all stored as wide-character strings. The integer flag matters only for the two item names WM/TrackNumber and WM/Track: when the flag is 3, the value that follows is expressed as a 4-byte integer, i.e. the track information, and when the flag is 0, the track information is expressed as an ordinary string.
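A minimal sketch of the first two fields described above: checking the fixed 16-byte magic and reading the 8-byte low-byte-first header size behind it. This covers only the outermost layout; the tag frames themselves are not parsed here.

```python
import struct

# Fixed 16-byte magic from the description above.
WMA_HEADER_MAGIC = bytes.fromhex("3026B2758E66CF11A6D900AA0062CE6C")

def read_wma_header_size(data: bytes):
    """Validate the 16-byte WMA magic and return the 8-byte little-endian
    header size that follows it, or None if the magic does not match."""
    if len(data) < 24 or data[:16] != WMA_HEADER_MAGIC:
        return None
    (header_size,) = struct.unpack_from("<Q", data, 16)
    return header_size
```

With the header size in hand, a reader knows where the non-audio information ends and the audio information begins, and can walk the tag frames inside the header using each frame's own 24-byte header.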
AMR (Adaptive Multi-Rate) is an audio compression coding format that optimizes speech coding and is dedicated to compressing speech effectively. AMR audio is mainly used for audio compression on mobile devices; its compression ratio is very high but its sound quality is poor, so it is mainly used to compress voice-class audio and is not suitable for music-class audio with higher quality demands. It uses eight different bit-rate encodings: AMR has 16 coding modes in total, of which modes 0-7 correspond to eight different encodings with different rates, and modes 8-15 are used for noise or are reserved. The AMR file header mark is 6 bytes; in this file each frame is 21 bytes. In the AMR file header, the header differs between the mono and multi-channel cases: in the mono case the header contains only a magic number, while in the multi-channel case the header contains both the magic number and a 32-bit channel description field. In the 32-bit channel description field of the multi-channel case, the first 28 bits are all reserved and must be set to 0, and the last 4 bits give the number of channels used. After the AMR file header come speech frame blocks that are consecutive in time; each frame block contains, for the several channels, several 8-bit-aligned speech frames arranged in order starting from the first channel. Every speech frame begins with an 8-bit frame header, in which P is a padding bit that must be set to 0, and every frame is 8-bit aligned.
It should be noted that the audio frame size differs between coding modes, and so does the bit rate. The audio data frame size is calculated as follows: one AMR frame corresponds to 20 ms, so one second contains 50 frames of audio data. Because the bit rates differ, the data size of each frame also differs. If the bit rate is 12.2 kbps, the number of audio data bits sampled per second per frame is 12200 / 50 = 244 bits = 30.5 bytes, rounded up to 31 bytes. Adding the one-byte frame header, the size of such a data frame is 32 bytes.
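The frame-size rule above can be expressed directly. The default 20 ms frame and 1-byte frame header come from the description; generalizing it to a function over the bit rate is this sketch's own step.

```python
import math

def amr_frame_size(bitrate_bps: int, frame_ms: int = 20, header_bytes: int = 1) -> int:
    """Frame size in bytes for an AMR mode, per the rule above:
    bits per frame = bitrate / frames-per-second, rounded up to whole
    bytes, plus the 1-byte frame header."""
    frames_per_second = 1000 // frame_ms      # 50 frames/s for 20 ms frames
    bits_per_frame = bitrate_bps / frames_per_second
    return math.ceil(bits_per_frame / 8) + header_bytes

print(amr_frame_size(12200))  # 12.2 kbps mode -> 32
```

The same formula applied to the 4.75 kbps mode gives 95 bits per frame, i.e. 12 data bytes plus the header, 13 bytes in total.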
S103: obtaining the search text input by the user, and determining the entry in the entry text that matches the search text together with the target playing moment of the entry.
After the entry text in the original text information is parsed out, the search text input by the user is obtained, and the entry in the entry text that matches the search text is determined, together with the target playing moment for playing that entry. Illustratively, the entry text may be the lyrics of a song audio file, and the search text input by the user may be a word or a sentence; the word or sentence input by the user is used to look up the corresponding playing moment in the original lyrics file. The object searched for in this embodiment is the entry input by the user, and each entry and its playing moment are stored in the original text information; by searching the original text information for the entry corresponding to the input and for that entry's playing moment, the target playing moment of the entry is determined.
In practice, the search text input by the user may be obtained by the user typing an entry in the window of the audio player, or by the user placing the cursor at some position in the entry text to determine the search text. When matching the search text against the entry text, a similarity coefficient may be computed to find the part of the entry text with the highest similarity to the search text, thereby determining the entry and the target playing moment at which to play it.
Specifically, when the similarity coefficient between the search text and the entry text is computed, the two objects may first be segmented into words to determine at least one entry in each. The similarity computation may determine an operational deviation value by computing the distance or similarity between the two sequences. Illustratively, the distance between the search text and the entry text may be computed by Euclidean distance, standardized Euclidean distance, Mahalanobis distance, Manhattan distance, Chebyshev distance, Minkowski distance, or Hamming distance; the similarity between the search text and the entry text may also be computed by cosine similarity, adjusted cosine similarity, the Pearson correlation coefficient, log-likelihood similarity, mutual information gain, or other similarity measures.
Illustratively, the operational deviation between the search text and the entry text may be computed via the Jaccard similarity coefficient: J(X, Y) = |X ∩ Y| / |X ∪ Y|, where X and Y are respectively used to denote the quantized entry sets of the search text and the entry text. By computing the Jaccard similarity coefficient between the search text and the entry text, the operational deviation value between them can be determined.
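A minimal sketch of entry matching with the Jaccard coefficient. The (start, end, words) tuple layout and the sample word lists are illustrative assumptions; only the coefficient itself comes from the text above.

```python
def jaccard(x, y):
    """Jaccard similarity J(X, Y) = |X ∩ Y| / |X ∪ Y| over word sets."""
    xs, ys = set(x), set(y)
    return len(xs & ys) / len(xs | ys) if xs | ys else 0.0

def best_entry(query_words, entries):
    """Return the (start, end, words) entry most similar to the query,
    i.e. the part of the entry text with the highest similarity."""
    return max(entries, key=lambda e: jaccard(query_words, e[2]))

entries = [
    (32.34, 35.89, ["walked", "past", "the", "old", "lane"]),
    (52.74, 56.56, ["hills", "and", "mountains", "layer", "upon", "layer"]),
]
print(best_entry(["hills", "mountains"], entries)[0])  # 52.74
```

The first element of the winning tuple is exactly the target playing moment used in step S104.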
S104: playing the audio corresponding to the entry according to the entry and the target playing moment.
After the entry in the entry text that matches the search text is determined, together with the target playing moment for playing the entry, the audio corresponding to the entry is played according to the entry and the target playing moment. Illustratively, in practice, after the user selects a certain line of lyrics or inputs a certain word, the position of those words in the entry text and the target playing moment are determined from the entry, and playback begins there.
Further, in this embodiment, after the entry and the target playing moment are determined, the target entry may be played; the play mode may be looping the entry continuously, playing the entry only once, or playing the entry and then continuing straight on with the audio that follows it. Other play modes are also possible, which is not limited in this embodiment.
In the above scheme, an audio file to be processed is obtained; the audio file is parsed to obtain original text information, which comprises the entry text of the audio file and the playing moment of each entry in the entry text; the search text input by the user is obtained, and the entry in the entry text that matches the search text is determined, together with the target playing moment of the entry; and the audio corresponding to the entry is played according to the entry and the target playing moment. By locating the position and playing moment of a user-supplied entry in the entry text and playing from there, the audio file can be presented flexibly in the manner the user selects, improving the intelligence of audio playback software and the user experience.
Referring to Fig. 2, Fig. 2 is a flowchart of the audio processing method provided by Embodiment 2 of the present invention. The execution subject of the audio processing method in this embodiment is a terminal. The terminal includes, but is not limited to, mobile terminals such as smart phones, tablet computers, and wearable devices, and may also be a desktop computer or the like. The audio processing method shown in the figure may comprise the following steps:
S201: obtaining an audio file to be processed.
The implementation of S201 in this embodiment is identical to that of S101 in the embodiment corresponding to Fig. 1; for details, refer to the description of S101 in the embodiment corresponding to Fig. 1, which is not repeated here.
S202: parsing the audio file to obtain original text information; the original text information comprises the entry text of the audio file and the playing moment of each entry in the entry text.
The implementation of S202 in this embodiment is identical to that of S102 in the embodiment corresponding to Fig. 1; for details, refer to the description of S102 in the embodiment corresponding to Fig. 1, which is not repeated here.
S203: obtaining the search text input by the user, and extracting at least one keyword from the search text.
After the audio file to be processed is obtained and the original text information in the audio file is parsed out, the search text input by the user is obtained; the search text may be a word or a sentence. From the search text input by the user, at least one keyword is extracted.
Further, step S203 may specifically include steps S2031-S2032:
S2031: obtaining the search text input by the user, and preprocessing the search text to obtain preprocessed text.
In practice, the search text input by the user may be obtained in real time or at fixed intervals, where the interval may be a period set by the user. By obtaining the user's search text periodically, the running quality of audio file processing is kept well under control.
After the search text is obtained, it is preprocessed. In this embodiment, preprocessing may include deleting redundant entries, correcting data, filtering data, and similar operations, which are not limited here. Specifically, the entries input or selected by the user often contain punctuation marks and other text that carries no word meaning; in this case, punctuation and other redundant entries can be deleted, improving audio processing efficiency. In many cases the entry text input by the user contains wrong characters; during preprocessing, the wrong characters in the search text can be identified, the intended meaning of the entries and the correct entries the user meant to input can be predicted, and the errors corrected, improving the accuracy of entry search. In many cases the user also inputs many duplicate entries, which would increase the duration and error rate of entry search; therefore, duplicate entries are identified and filtered out, reducing the entry data of the search text and improving the efficiency and accuracy of entry search.
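Two of the preprocessing operations described above, deleting punctuation-only entries and filtering out duplicates, can be sketched as follows. The punctuation set and the keep-first-seen dedup order are illustrative choices, not the patent's exact rules.

```python
import string

# ASCII punctuation plus a few common full-width marks (illustrative set).
PUNCT = set(string.punctuation) | set("，。！？、：；")

def preprocess(tokens):
    """Drop punctuation-only tokens and duplicates, keeping first-seen order."""
    seen, out = set(), []
    for tok in tokens:
        if not tok or all(ch in PUNCT for ch in tok):
            continue  # redundant entry: punctuation carries no word meaning
        if tok in seen:
            continue  # duplicate entry: would slow the search and raise errors
        seen.add(tok)
        out.append(tok)
    return out

print(preprocess(["hills", ",", "hills", "mountains", "!"]))
# ['hills', 'mountains']
```

Error-character correction, the third operation mentioned, would need a language model or dictionary and is deliberately left out of this sketch.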
S2032: segmenting the preprocessed text according to a pre-trained word segmentation model to obtain at least one keyword.
For any language, the word is the most basic unit; for a computer to understand and analyze natural language, it must first perform word segmentation on the original long text. Word segmentation is a technique by which a computer automatically identifies the words in a text. For Romance-style languages typified by English, segmentation has a natural advantage: words are separated by spaces by default. Chinese word segmentation, by contrast, is complex and far more difficult: the minimal unit of Chinese text is the character, and there is no obvious separator between words.
In this embodiment, the text to be matched is first segmented and at least one keyword is extracted from it. Optionally, segmentation may be performed by a string-matching algorithm based on a dictionary: the dictionary data is loaded into a suitable data structure, the input text string is cut according to a certain scanning order and matching strategy, and each candidate string is matched against the words in the dictionary; if the match succeeds, a word is considered identified. Segmentation based on dictionary matching has a clear approach, a simple principle, and is easy to implement. Alternatively, segmentation based on understanding may be used: the algorithm imitates the human process of understanding a sentence and analyzes the text from the semantic and grammatical angles, which requires a large amount of linguistic and grammatical information and knowledge to be prepared in advance.
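A minimal sketch of the dictionary-based string-matching segmentation described above, using forward maximum matching as the scanning order and matching strategy (one common choice; the dictionary and maximum word length are assumed for illustration):

```python
def forward_max_match(text, dictionary, max_len=4):
    """Dictionary-based segmentation by string matching: scan left to
    right, and at each position try the longest candidate word first.
    Unknown single characters fall through as one-character words."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words
```

For example, with a dictionary containing "ab" and "cd", the string "abcd" is cut into ["ab", "cd"]; characters not covered by the dictionary are emitted as single-character words.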
The word segmentation model is trained in advance on historical entry-text data. When training the model, manually annotated training data is first obtained. The training data is labeled with segmentation positions, determining the position each character occupies within a word, where the positions are the start, end, and middle of a word. Next, the training data is preprocessed and features are extracted. Non-target characters are filtered out: given a Chinese character, it is judged whether it is a punctuation mark, a digit, a Chinese numeral, or a letter; if it belongs to none of these, the positions the character occupies within words in the training corpus are counted and denoted B, M, E, or S. Here B indicates the character is the beginning of a word; M indicates it is in the middle of a word; E indicates it is the end of a word; and S indicates the character forms a word on its own. The positions of each character are tallied statistically, and the position class of the character is determined from the counts. Illustratively, the threshold adopted in this scheme is 90%: as long as one position accounts for more than 90% of a character's occurrences, the character is considered to mostly occupy that position within words.
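The B/M/E/S position labeling described above can be sketched as follows (function names are illustrative; the input is a pre-segmented sentence as produced by the manual annotation):

```python
def bmes_tags(word):
    """Tag each character of a segmented word with its position:
    B = beginning, M = middle, E = end, S = single-character word."""
    if len(word) == 1:
        return ["S"]
    return ["B"] + ["M"] * (len(word) - 2) + ["E"]

def label_corpus(segmented_sentence):
    """Turn a pre-segmented sentence (a list of words) into the
    (character, tag) training pairs used to fit the model."""
    pairs = []
    for word in segmented_sentence:
        for ch, tag in zip(word, bmes_tags(word)):
            pairs.append((ch, tag))
    return pairs
```

Tallying these tags per character over the whole corpus then gives the position frequencies against which the 90% threshold is applied.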
Next, the positions of key characters are predicted by the segmentation model. The features adopted by the segmentation model in this embodiment may include N-gram features, including but not limited to the ci, cici+1, and cici+2 templates. Here ci denotes the character at offset i around the current character, with i = -2, -1, 0, 1, 2 (five features); cici+1 denotes the combination of two adjacent characters, with i = -2, -1, 0, 1 (four features); and cici+2 denotes the combination of two characters separated by one character, with i = -1, 0 (two features). The features adopted by the segmentation model in this embodiment may also include a character-repetition feature, duplication(c0, ci), which computes whether the current character repeats one of the preceding characters, with i = -2, -1 (two features), and a character-class feature, which computes the types of the three characters preceding the current character.
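Assuming the offsets listed above, the N-gram feature templates can be sketched as follows (feature names and the padding token are illustrative):

```python
def ngram_features(chars, pos):
    """Sketch of the N-gram feature template: unigrams c_i for i in
    [-2..2], adjacent bigrams c_i c_{i+1}, and skip bigrams c_i c_{i+2},
    all relative to the character at index `pos`."""
    def c(i):
        j = pos + i
        return chars[j] if 0 <= j < len(chars) else "<pad>"
    feats = {}
    for i in (-2, -1, 0, 1, 2):
        feats[f"c{i}"] = c(i)
    for i in (-2, -1, 0, 1):
        feats[f"c{i}c{i+1}"] = c(i) + c(i + 1)
    for i in (-1, 0):
        feats[f"c{i}c{i+2}"] = c(i) + c(i + 2)
    # Character-repetition feature: does c0 repeat the previous character?
    feats["dup(c0,c-1)"] = c(0) == c(-1)
    return feats
```

Each character of a sentence yields one such feature dictionary, which the model maps to a B/M/E/S tag.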
Finally, in this embodiment the model parameters are learned by grid traversal. The indices traversed mainly include: learning rate, number of training iterations, batch size, and termination error. The conditions for ending model training include, but are not limited to, the number of training iterations reaching a certain count or the error reaching a certain index. When learning the parameters, the values taken for each index include but are not limited to the following: the learning rate is chosen from the three values 0.01, 0.02, and 0.03; the number of training iterations from 500, 1000, and 2000; the batch size from 100, 200, and 500; and the termination error from 0.05, 0.01, and 0.5. Traversing the different parameter combinations yields concrete parameter settings. Model training produces the model combinations formed by the different parameters, {params1, params2, params3, ..., params n}, where params n denotes one set of parameters obtained by training. After the training parameters are obtained, the models formed by these parameters are tested, the accuracy of each test is determined, and the model with the highest accuracy is chosen as the segmentation model, which then segments the preprocessed text to be searched to obtain at least one keyword denoting the entries in the text to be searched, so that the entry search can be carried out.
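The grid traversal over learning rate, training count, batch size, and termination error can be sketched as follows (`train_and_eval` is a placeholder for the actual train-then-test routine that returns an accuracy score):

```python
from itertools import product

def grid_search(train_and_eval):
    """Traverse every combination of the hyperparameter values named in
    the text; keep the combination with the highest accuracy as the
    final segmentation model's parameters."""
    grid = {
        "learning_rate": [0.01, 0.02, 0.03],
        "epochs": [500, 1000, 2000],
        "batch_size": [100, 200, 500],
        "stop_error": [0.05, 0.01, 0.5],
    }
    best_params, best_score = None, float("-inf")
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = train_and_eval(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

The grid above has 3 × 3 × 3 × 3 = 81 combinations, so each call to `train_and_eval` is run 81 times.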
S204: perform fuzzy matching in the original text information according to the keyword to obtain the target entry matched with the text to be searched.
After the keywords in the text to be searched have been determined, fuzzy matching is carried out in the entry text of the original text information according to the keywords to obtain the target entry matched with the text to be searched.
Further, step S204 may specifically include steps S2041 to S2044.
S2041: generate a first word vector corresponding to the keyword according to the keyword.
In the case where the obtained text to be matched is one or more keywords, the keywords can be searched for directly in the target text to determine their positions and the playing moments of the entries, so as to play the audio portion corresponding to the entry. Further, since a song, for example, often contains repeated text, the user may need to input at least two keywords, so that matching in the target text with a larger number of keywords determines the target entry accurately. In this embodiment, the keywords are quantified to obtain the first word vectors corresponding to them.
Assume that the keywords in the text to be matched and in the original text information are mutually independent, and represent the keywords of a text with a vector, thereby simplifying the complex relations between the keywords in the text. A text is regarded as composed of a group of mutually independent entries (T1, T2, T3, ..., Ti, ..., Tn). Each entry Ti is assigned a weight wi according to its importance in the text, and (T1, T2, T3, ..., Ti, ..., Tn) are regarded as the coordinate axes of an n-dimensional coordinate system, with w1, w2, ..., wi, ..., wm as the corresponding coordinate values. The orthogonal set of entry vectors obtained by decomposing (T1, T2, T3, ..., Ti, ..., Tn) in this way constitutes a text vector space.
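As a sketch of the weighting described above, using simple term frequency as the importance weight wi (one possible choice; the embodiment does not fix a specific weighting scheme):

```python
from collections import Counter

def to_term_vector(tokens, axes):
    """Represent a text as a vector in the term space spanned by `axes`
    (the independent entries T1..Tn); the weight w_i here is the term
    frequency of entry T_i in the text."""
    counts = Counter(tokens)
    return [counts[t] for t in axes]
```

Both the text to be matched and each simple sentence of the original text information are mapped into the same axes, so the resulting vectors are directly comparable.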
S2042: divide the original text information into simple sentences, and determine the second word vector of each simple sentence.
Based on the text vector space in step S2041, the text to be matched is simplified to the vector composed of its keywords, a = (wa1, wa2, ..., wai, ..., wam)T, and the original text information is simplified to the vector composed of its keywords, b = (wb1, wb2, ..., wbi, ..., wbm)T.
S2043: calculate, according to the first word vector and each second word vector, the simple-sentence matching degree between each entry in the original text information and the keyword.
In practical applications, the simple-sentence matching degree between each entry in the original text information and the keyword can be calculated from either the distance between the two vectors or their similarity. Optionally, the distance between the text to be searched and the entry text may be calculated by Euclidean distance, standardized Euclidean distance, Mahalanobis distance, Manhattan distance, Chebyshev distance, Minkowski distance, or Hamming distance; alternatively, the similarity coefficient between the text to be searched and the entry text may be calculated by the cosine similarity coefficient, the adjusted cosine similarity coefficient, the Pearson correlation coefficient, the log-likelihood similarity coefficient, mutual information gain, or a word-pair similarity coefficient.
The simple-sentence matching degree of the two is calculated from the keyword vectors as the cosine similarity:

sim(a, b) = (a · b) / (‖a‖ ‖b‖) = Σi wai·wbi / (√(Σi wai²) · √(Σi wbi²))

where a = (wa1, wa2, ..., wai, ..., wam)T denotes the vector composed of the keywords to which the text to be matched is simplified, and b = (wb1, wb2, ..., wbi, ..., wbm)T denotes the vector composed of the keywords to which the original text information is simplified.
S2044: identify the entry with the highest simple-sentence matching degree as the target entry.
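Steps S2043 and S2044 can be sketched with the cosine similarity coefficient as the matching degree (one of the measures listed above; function names are illustrative):

```python
import math

def cosine(a, b):
    """sim(a, b) = (a . b) / (|a| |b|); both vectors share the same
    keyword axes. Returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def best_match(query_vec, sentence_vecs):
    """S2043/S2044: score every simple-sentence vector against the
    keyword vector and return the index of the best one plus scores."""
    scores = [cosine(query_vec, v) for v in sentence_vecs]
    return max(range(len(scores)), key=scores.__getitem__), scores
```

The returned index identifies the target entry, whose recorded playing moment is then used in step S205.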
The portion of the original text information most similar to the keywords of the text to be matched is identified as the target text corresponding to the text to be matched. If the audio file is a song, the lyric position is quickly located through the keyword search function of the audio lyrics file. Clicking an item in the search result list jumps straight to the playing moment of that lyric and plays the audio portion corresponding to the entry, so that the audio source and the lyric text scroll and play in unison; that is, the user directly points at the corresponding original lyric with the mouse to control the audio to play from that point.
Further, a matching-degree threshold may be preset; the simple sentences whose matching degree is greater than or equal to the threshold are screened out and presented to the user, who then selects the corresponding entry, increasing the user's control.
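The threshold screening can be sketched as follows (the 0.8 default is an assumed value; the embodiment leaves the threshold to be preset):

```python
def screen_by_threshold(entries, scores, threshold=0.8):
    """Keep every entry whose simple-sentence matching degree is at or
    above the preset threshold, for presentation to the user."""
    return [(e, s) for e, s in zip(entries, scores) if s >= threshold]
```

The user then picks the final entry from the screened list instead of the system choosing the single highest-scoring one.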
S205: determine the target playing moment of the target entry according to the target entry and the playing moment of each entry in the original text information.
In this embodiment, the implementation of S205 is identical to that of S103 in the embodiment corresponding to Fig. 1; refer to the description of S103 in that embodiment, which is not repeated here.
S206: play the audio corresponding to the target entry according to the target entry and the target playing moment.
In this embodiment, the implementation of S206 is identical to that of S104 in the embodiment corresponding to Fig. 1; refer to the description of S104 in that embodiment, which is not repeated here.
S207: obtain the target audio played at the current playing moment, and identify the text content of the target audio.
While the audio is playing, the currently played target audio is obtained, that is, the segment of audio within a preset time of the current playing moment. The speech in the target audio is recognized to obtain the text content of the target audio. Optionally, an audio recognition model may be built by analyzing historical audio file data, and the text content of the target audio identified through the model. Specifically, building the audio recognition model comprises two stages, a training stage and a recognition stage. In the training stage, the text content in the audio files is identified manually, and the feature vectors of the audio files are stored in a template library as templates. In the recognition stage, the feature vector of the input audio file is compared for similarity with each template in the template library in turn, and the most similar template is output as the recognition result.
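The recognition stage of the template-matching scheme above can be sketched as follows (the similarity function and the feature vectors are placeholders; real systems would use acoustic features such as MFCCs):

```python
def recognize(feature_vec, template_library, similarity):
    """Compare the input audio's feature vector with every stored
    template and output the text of the most similar one."""
    best_text, best_sim = None, float("-inf")
    for text, template in template_library.items():
        s = similarity(feature_vec, template)
        if s > best_sim:
            best_text, best_sim = text, s
    return best_text
```

Any of the distance or similarity measures listed in step S2043 could serve as the `similarity` argument here.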
S208: correct, according to the text content and the current playing moment, the target playing moment recorded in the original text information for the entry text matched with the text content.
After the text content corresponding to the currently played target audio is recognized, the entry text and playing moments of the original text information are corrected according to the text content and the current playing moment. Illustratively, while a music player is playing music it also scrolls the lyrics, but the lyrics of the song the user hears may differ from the scrolled lyrics, or the playing moments may be inconsistent; in such cases the text content of the currently played audio needs to be recognized in order to correct the original lyrics.
The audio file information is modified while the audio file is playing, including modifying the text content and the playing moments of the played text, where the playing moments can be refined to the playing moment of each sentence or even each word. When modifying the text content, the text content played at the current moment is obtained and speech recognition is performed on it; the recognized text result is compared with the text in the audio file information, and if they are inconsistent, the playing moments and content in the audio file information are modified. When correcting the target playing moments corresponding to the entry text in the original text information, the audio playback progress component can be monitored by an Angular timed task, which synchronously controls the scrolling playback of the text.
Specifically, when correcting the playing moment of a text, the current moment at which the text is played is compared with the playing moment of that text recorded in the original text information. If they are inconsistent, the inconsistent sentence, entry, or single word is determined; the current moment at which those texts are played is determined and recorded as the correct playing moment, and the playing moment previously recorded for those texts in the original text information is modified at the same time. After the entries or playing moments in the whole audio file have been modified, a history file is backed up in the format of lyrics name plus timestamp, and the modified audio file is saved, realizing error correction of the text file after speech recognition and playback.
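The playing-moment comparison and correction can be sketched as follows (the 0.5-second tolerance is an assumed value; the embodiment only requires detecting inconsistency between the recorded and the observed moments):

```python
def correct_timestamps(lyric_times, recognized, tolerance=0.5):
    """`lyric_times` maps each entry to the playing moment recorded in
    the original text information; `recognized` maps entries to the
    moments they were actually heard during playback. Entries whose
    recorded moment drifts beyond `tolerance` seconds are overwritten
    with the observed moment; the modified entries are also returned."""
    corrected = dict(lyric_times)
    changed = []
    for entry, heard_at in recognized.items():
        recorded = corrected.get(entry)
        if recorded is not None and abs(recorded - heard_at) > tolerance:
            corrected[entry] = heard_at
            changed.append(entry)
    return corrected, changed
```

Backing up the pre-correction mapping under a lyrics-name-plus-timestamp file name, as described above, would happen before `corrected` overwrites the stored file.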
In the above scheme, an audio file to be processed is obtained; the audio file is parsed to obtain original text information, which includes the entry text of the audio file and the playing moment of each entry in the entry text; the text to be searched input by the user is obtained, and at least one keyword is extracted from it; fuzzy matching is performed in the original text information according to the keyword to obtain the target entry matched with the text to be searched; the target playing moment of the target entry is determined according to the target entry and the playing moment of each entry in the original text information; the audio corresponding to the target entry is played according to the target entry and the target playing moment; the target audio played at the current playing moment is obtained and its text content is identified; and, according to the text content and the current playing moment, the target playing moment recorded in the original text information for the target entry text matched with the text content is corrected. By directly playing the entry selected by the user, the playback event is determined by the user and the playback progress of the player is controlled quickly, and by calibrating and modifying the entry text during playback, the intelligence with which the audio software plays audio and the user experience are improved.
Referring to Fig. 3, Fig. 3 is a schematic diagram of a terminal device provided by Embodiment Three of the present invention. The units included in the terminal device are used to execute the steps in the embodiments corresponding to Figs. 1 and 2; refer to the relevant descriptions in those embodiments for details. For ease of description, only the parts related to this embodiment are shown. The terminal device 300 of this embodiment includes:
an acquiring unit 301, configured to obtain an audio file to be processed;
a resolution unit 302, configured to parse the audio file to obtain original text information, the original text information including the entry text of the audio file and the playing moment of each entry in the entry text;
a matching unit 303, configured to obtain a text to be searched input by a user, determine, in the entry text, a target entry matched with the text to be searched, and the target playing moment of the target entry;
a broadcast unit 304, configured to play, according to the target entry and the target playing moment, the audio corresponding to the target entry.
Further, the matching unit 303 may include:
an extraction unit, configured to obtain the text to be searched input by the user and extract at least one keyword from the text to be searched;
a search unit, configured to perform fuzzy matching in the original text information according to the keyword to obtain the target entry matched with the text to be searched;
a determination unit, configured to determine the target playing moment of the target entry according to the target entry and the playing moment of each entry in the original text information.
Further, the extraction unit may include:
a pretreatment unit, configured to obtain the text to be searched input by the user and preprocess the text to be searched to obtain a preprocessed text;
a participle unit, configured to segment the preprocessed text according to a pre-trained word segmentation model to obtain the at least one keyword.
Further, the search unit may include:
a primary vector unit, configured to generate a first word vector corresponding to the keyword according to the keyword;
a secondary vector unit, configured to divide the original text information into simple sentences and determine the second word vector of each simple sentence;
a computing unit, configured to calculate, according to the first word vector and each second word vector, the simple-sentence matching degree between each entry in the original text information and the keyword;
a recognition unit, configured to identify the entry with the highest simple-sentence matching degree as the target entry.
Further, the terminal device may also include:
a content recognition unit, configured to obtain the target audio played at the current playing moment and identify the text content of the target audio;
a correcting unit, configured to correct, according to the text content and the current playing moment, the target playing moment recorded in the original text information for the entry text matched with the text content.
In the above scheme, an audio file to be processed is obtained; the audio file is parsed to obtain original text information, which includes the entry text of the audio file and the playing moment of each entry in the entry text; the text to be searched input by the user is obtained, and at least one keyword is extracted from it; fuzzy matching is performed in the original text information according to the keyword to obtain the target entry matched with the text to be searched; the target playing moment of the target entry is determined according to the target entry and the playing moment of each entry in the original text information; the audio corresponding to the target entry is played according to the target entry and the target playing moment; the target audio played at the current playing moment is obtained and its text content is identified; and, according to the text content and the current playing moment, the target playing moment recorded in the original text information for the target entry text matched with the text content is corrected. By directly playing the entry selected by the user, the playback event is determined by the user and the playback progress of the player is controlled quickly, and by calibrating and modifying the entry text during playback, the intelligence with which the audio software plays audio and the user experience are improved.
Fig. 4 is a schematic diagram of the terminal device provided by Embodiment Four of the present invention. As shown in Fig. 4, the terminal device 4 of this embodiment includes a processor 40, a memory 41, and a computer program 42 stored in the memory 41 and runnable on the processor 40. When executing the computer program 42, the processor 40 implements the steps in each of the audio processing method embodiments above, such as steps 101 to 103 shown in Fig. 1; alternatively, when executing the computer program 42, the processor 40 implements the functions of each module/unit in each of the device embodiments above, such as the functions of units 301 to 303 shown in Fig. 3.
Illustratively, the computer program 42 may be divided into one or more modules/units, which are stored in the memory 41 and executed by the processor 40 to complete the present invention. The one or more modules/units may be a series of computer program instruction segments capable of completing specific functions, the instruction segments being used to describe the execution process of the computer program 42 in the terminal device 4.
The terminal device 4 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The terminal device may include, but is not limited to, the processor 40 and the memory 41. Those skilled in the art will understand that Fig. 4 is merely an example of the terminal device 4 and does not constitute a limitation on it; the terminal device may include more or fewer components than illustrated, combine certain components, or use different components. For example, the terminal device may also include input/output devices, network access devices, buses, and the like.
The processor 40 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 41 may be an internal storage unit of the terminal device 4, such as a hard disk or memory of the terminal device 4. The memory 41 may also be an external storage device of the terminal device 4, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card (FC) equipped on the terminal device 4. Further, the memory 41 may include both an internal storage unit of the terminal device 4 and an external storage device. The memory 41 is used to store the computer program and other programs and data required by the terminal device, and may also be used to temporarily store data that has been or will be output.
It will be clear to those skilled in the art that, for convenience and brevity of description, only the division of the above functional units and modules is illustrated by example; in practical applications, the above functions may be assigned to different functional units and modules as needed, that is, the internal structure of the device may be divided into different functional units or modules to complete all or part of the functions described above. The functional units in the embodiments may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing them from each other and are not intended to limit the protection scope of the present application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the above embodiments, each embodiment is described with its own emphasis; for parts not detailed in one embodiment, refer to the relevant descriptions of the other embodiments.
The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected as needed to achieve the purpose of the solution of this embodiment.
If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the present invention may implement all or part of the processes in the methods of the above embodiments by instructing the relevant hardware through a computer program, which may be stored in a computer-readable storage medium.
The above embodiments are merely illustrative of the technical solutions of the present invention and are not intended to limit them. Although the invention has been explained in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features equivalently replaced; such modifications or replacements do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be included within the protection scope of the present invention.

Claims (10)

1. An audio processing method, characterized by comprising:
obtaining an audio file to be processed;
parsing the audio file to obtain original text information, the original text information including the entry text of the audio file and the playing moment of each entry in the entry text;
obtaining a text to be searched input by a user, determining, in the entry text, a target entry matched with the text to be searched, and the target playing moment of the target entry; and
playing, according to the target entry and the target playing moment, audio corresponding to the target entry.
2. The audio processing method according to claim 1, wherein obtaining the text to be searched input by the user, determining, in the entry text, the target entry matched with the text to be searched, and the target playing moment of the target entry comprises:
obtaining the text to be searched input by the user, and extracting at least one keyword from the text to be searched;
performing fuzzy matching in the original text information according to the keyword to obtain the target entry matched with the text to be searched; and
determining the target playing moment of the target entry according to the target entry and the playing moment of each entry in the original text information.
3. The audio processing method according to claim 2, wherein obtaining the text to be searched input by the user and extracting at least one keyword from the text to be searched comprises:
obtaining the text to be searched input by the user, and preprocessing the text to be searched to obtain a preprocessed text; and
segmenting the preprocessed text according to a pre-trained word segmentation model to obtain the at least one keyword.
4. The audio processing method according to claim 2, wherein performing fuzzy matching in the original text information according to the keyword to obtain the target entry matched with the text to be searched comprises:
generating a first word vector corresponding to the keyword according to the keyword;
dividing the original text information into simple sentences, and determining a second word vector of each simple sentence;
calculating, according to the first word vector and each second word vector, a simple-sentence matching degree between each entry in the original text information and the keyword; and
identifying the entry with the highest simple-sentence matching degree as the target entry.
5. The audio processing method according to any one of claims 1 to 4, further comprising, after playing the audio corresponding to the target entry according to the target entry and the target playing moment:
obtaining the target audio played at the current playing moment, and identifying the text content of the target audio; and
correcting, according to the text content and the current playing moment, the target playing moment recorded in the original text information for the entry text matched with the text content.
6. A terminal device, characterized by comprising a memory and a processor, the memory storing a computer program runnable on the processor, wherein the processor, when executing the computer program, implements the following steps:
obtaining an audio file to be processed;
parsing the audio file to obtain original text information, the original text information including the entry text of the audio file and the playing moment of each entry in the entry text;
obtaining a text to be searched input by a user, determining, in the entry text, a target entry matched with the text to be searched, and the target playing moment of the target entry; and
playing, according to the target entry and the target playing moment, audio corresponding to the target entry.
7. The terminal device according to claim 6, wherein obtaining the text to be searched input by the user, determining, in the entry text, the target entry matched with the text to be searched, and the target playing moment of the target entry comprises:
obtaining the text to be searched input by the user, and extracting at least one keyword from the text to be searched;
performing fuzzy matching in the original text information according to the keyword to obtain the target entry matched with the text to be searched; and
determining the target playing moment of the target entry according to the target entry and the playing moment of each entry in the original text information.
8. The terminal device according to claim 6, wherein obtaining the text to be searched input by the user and extracting at least one keyword from the text to be searched comprises:
obtaining the text to be searched input by the user, and preprocessing the text to be searched to obtain a preprocessed text; and
segmenting the preprocessed text according to a pre-trained word segmentation model to obtain the at least one keyword.
9. A terminal device, characterized by comprising:
an acquiring unit, configured to obtain an audio file to be processed;
a resolution unit, configured to parse the audio file to obtain original text information, the original text information including the entry text of the audio file and the playing moment of each entry in the entry text;
a matching unit, configured to obtain a text to be searched input by a user, determine, in the entry text, a target entry matched with the text to be searched, and the target playing moment of the target entry; and
a broadcast unit, configured to play, according to the target entry and the target playing moment, audio corresponding to the target entry.
10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 5 are implemented.
CN201811423356.0A 2018-11-27 2018-11-27 Audio-frequency processing method and terminal device Pending CN109657094A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811423356.0A CN109657094A (en) 2018-11-27 2018-11-27 Audio-frequency processing method and terminal device

Publications (1)

Publication Number Publication Date
CN109657094A true CN109657094A (en) 2019-04-19

Family

ID=66111614

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811423356.0A Pending CN109657094A (en) 2018-11-27 2018-11-27 Audio-frequency processing method and terminal device

Country Status (1)

Country Link
CN (1) CN109657094A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110750230A (en) * 2019-09-30 2020-02-04 北京淇瑀信息科技有限公司 Voice interface display method and device and electronic equipment
CN111161738A (en) * 2019-12-27 2020-05-15 苏州欧孚网络科技股份有限公司 Voice file retrieval system and retrieval method thereof
CN114115674A (en) * 2022-01-26 2022-03-01 荣耀终端有限公司 Method for positioning sound recording and document content, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130158992A1 (en) * 2011-12-17 2013-06-20 Hon Hai Precision Industry Co., Ltd. Speech processing system and method
CN104301771A (en) * 2013-07-15 2015-01-21 中兴通讯股份有限公司 Method and device for adjusting playing progress of video file
CN104409087A (en) * 2014-11-18 2015-03-11 广东欧珀移动通信有限公司 Method and system of playing song documents
CN107071542A (en) * 2017-04-18 2017-08-18 百度在线网络技术(北京)有限公司 Video segment player method and device
CN107798143A (en) * 2017-11-24 2018-03-13 珠海市魅族科技有限公司 A kind of information search method, device, terminal and readable storage medium storing program for executing
CN108399150A (en) * 2018-02-07 2018-08-14 深圳壹账通智能科技有限公司 Text handling method, device, computer equipment and storage medium


Similar Documents

Publication Publication Date Title
CN108304375B (en) Information identification method and equipment, storage medium and terminal thereof
WO2023065544A1 (en) Intention classification method and apparatus, electronic device, and computer-readable storage medium
Kotti et al. Speaker segmentation and clustering
CN108052499B (en) Text error correction method and device based on artificial intelligence and computer readable medium
US7162482B1 (en) Information retrieval engine
US7634407B2 (en) Method and apparatus for indexing speech
CN107741928A (en) A kind of method to text error correction after speech recognition based on field identification
CN108460011B (en) Entity concept labeling method and system
US20080208891A1 (en) System and methods for recognizing sound and music signals in high noise and distortion
CN113836277A (en) Machine learning system for digital assistant
US20050240413A1 (en) Information processing apparatus and method and program for controlling the same
JP2009508156A (en) Music analysis
CN109657094A (en) Audio-frequency processing method and terminal device
CN107247768A (en) Method for ordering song by voice, device, terminal and storage medium
CN110222225A (en) The abstraction generating method and device of GRU codec training method, audio
KR20170136200A (en) Method and system for generating playlist using sound source content and meta information
CN111414513A (en) Music genre classification method and device and storage medium
US20220414338A1 (en) Topical vector-quantized variational autoencoders for extractive summarization of video transcripts
CN113407775B (en) Video searching method and device and electronic equipment
CN110516109B (en) Music label association method and device and storage medium
Hong et al. Content-based video-music retrieval using soft intra-modal structure constraint
Gupta et al. Songs recommendation using context-based semantic similarity between lyrics
CN115359785A (en) Audio recognition method and device, computer equipment and computer-readable storage medium
Owen et al. Computed synchronization for multimedia applications
KR20190009821A (en) Method and system for generating playlist using sound source content and meta information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination