CN105391708A

CN105391708A - Audio data detection method and device

Info

Publication number: CN105391708A
Application number: CN201510731621.1A
Authority: CN
Inventors: 彭波
Original assignee: Beijing Ruian Technology Co Ltd
Current assignee: Beijing Ruian Technology Co Ltd
Priority date: 2015-11-02
Filing date: 2015-11-02
Publication date: 2016-03-09

Abstract

The invention discloses an audio data detection method and device. The method comprises: obtaining target audio data from the network data packets of at least one instant messaging software; searching for at least one piece of preset keyword information contained in the target audio data and counting the occurrences of each piece; determining a target score for the target audio data from the occurrence count of each piece of preset keyword information and the preset weight corresponding to it; and, if the target score is greater than a preset score, determining that the audio data is sensitive data. Audio data exchanged during instant messaging can thus be obtained and scored according to the preset keyword information it contains to determine whether it is sensitive data, so that sensitive data in instant messaging audio is detected by machine and network security is improved.

Description

Audio data detection method and device

Technical field

Embodiments of the present invention relate to data analysis technology, and in particular to an audio data detection method and device.

Background art

With the arrival of the fourth-generation (4G) mobile communication era and the spread of smart terminals, more and more users communicate through the voice function of instant messaging applications on smart terminals.

For example, user A records a voice message in an instant chat application on terminal a, terminal a sends the message to terminal b over a wireless network, and user B receives it on terminal b.

However, some offenders use the voice function of smart terminals for illegal activities such as those related to terrorism or violence. The prior art provides technical solutions that analyze text data to determine whether a text message contains sensitive words, but it cannot detect sensitive words contained in the audio data carried by internet data packets, which leaves a network security gap.

Summary of the invention

The invention provides an audio data detection method and device that detect sensitive words in audio data and thereby improve network security.

In a first aspect, an embodiment of the invention provides an audio data detection method, comprising:

obtaining target audio data from the network data packets of at least one instant chat software;

searching for at least one piece of preset keyword information contained in the target audio data, and counting the occurrences of each piece of preset keyword information;

determining a target score for the target audio data from the occurrence count of each piece of preset keyword information and the preset weight corresponding to that piece of preset keyword information;

if the target score is greater than a preset score, determining that the audio data is sensitive data.

In a second aspect, an embodiment of the invention also provides an audio data detection device, comprising:

a target audio data obtaining unit, configured to obtain target audio data from the network data packets of at least one instant chat software;

a preset keyword information searching unit, configured to search for at least one piece of preset keyword information contained in the target audio data obtained by the target audio data obtaining unit;

an occurrence counting unit, configured to count the occurrences of each piece of preset keyword information found by the preset keyword information searching unit;

a target score calculating unit, configured to determine a target score for the target audio data from the occurrence count of each piece of preset keyword information obtained by the occurrence counting unit and the preset weight corresponding to each piece of preset keyword information;

a sensitive data determining unit, configured to determine that the audio data is sensitive data if the target score obtained by the target score calculating unit is greater than a preset score.

The invention obtains target audio data from the network data packets of at least one instant chat software, counts the occurrences of preset keyword information in the target audio data, scores the target audio data according to those counts, and determines from the score whether the audio data is sensitive data. Whereas the prior art cannot perform machine detection on audio data packets on the internet, the invention can obtain the audio data packets exchanged during instant chat, score the parsed audio data according to the preset keyword information it contains, and then determine whether the audio data is sensitive data. Sensitive words in audio data packets exchanged during instant chat are thus detected by machine, which improves network security.

Brief description of the drawings

Fig. 1 is a flow chart of an audio data detection method in Embodiment one of the invention;

Fig. 2 is a flow chart of a first audio data detection method in Embodiment two of the invention;

Fig. 3 is a flow chart of a second audio data detection method in Embodiment two of the invention;

Fig. 4 is a flow chart of a third audio data detection method in Embodiment two of the invention;

Fig. 5 is a flow chart of a fourth audio data detection method in Embodiment two of the invention;

Fig. 6 is a flow chart of a fifth audio data detection method in Embodiment two of the invention;

Fig. 7 is a structural diagram of a first audio data detection device in Embodiment three of the invention;

Fig. 8 is a structural diagram of a second audio data detection device in Embodiment three of the invention;

Fig. 9 is a structural diagram of a third audio data detection device in Embodiment three of the invention;

Fig. 10 is a structural diagram of a fourth audio data detection device in Embodiment three of the invention;

Fig. 11 is a structural diagram of a fifth audio data detection device in Embodiment three of the invention.

Detailed description

The invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention rather than the entire structure.

Embodiment one

Fig. 1 is a flow chart of the audio data detection method provided by Embodiment one of the invention. This embodiment applies to security inspection of audio data on a network. The method may be performed by a network device with a message forwarding function (such as a server) and comprises the following steps:

Step 110: obtain target audio data from the network data packets of at least one instant chat software.

When using instant chat software, a user can input text, voice, video, and picture information. To transmit this information, the sending end usually divides it into multiple smaller pieces of sub-information and sends them over the network to the receiving end, which recombines them to recover the information the user entered at the sending end. A network data packet of the instant chat software is a packet that carries such sub-information. Network data packets are sent as messages, and the receiving end can determine the type of information a packet carries from its message header. For example, if the header carries the "session" feature code that creates an audio session, the packet can be determined to carry audio data.

After receiving the network data packets, the receiving end parses them to obtain the audio sub-packets they carry. The audio sub-packets are then ordered, combined, and decoded to obtain the target audio data input by the sending-end user. The target audio data may be a pulse code modulation (Pulse Code Modulation, PCM) file.

Step 120: search for at least one piece of preset keyword information contained in the target audio data, and count the occurrences of each piece of preset keyword information.

Preset keywords may be terms related to terrorism or violence, for example "explosion", "bomb", or firearm model names. Preset keyword information is the segment of acoustic signal corresponding to a preset keyword. Acoustic signals of different users or testers reading a given preset keyword are recorded in advance, and the recorded signals are the preset keyword information. The multiple pieces of preset keyword information for each preset keyword are all obtained in this way.

Because the target audio data obtained in step 110 may be a PCM file, the target audio data can be determined to contain preset keyword information when it includes an acoustic signal identical to that of a preset keyword. Starting from the first data bit of the target audio data, sub-segments of the target audio data are compared in turn with the preset keyword information, each sub-segment having the same number of data bits as the preset keyword information.
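The bit-for-bit comparison described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `count_exact_matches`, the 2-byte sample width, and the choice to skip past a match before searching again are all hypothetical.

```python
def count_exact_matches(target: bytes, keyword: bytes, sample_width: int = 2) -> int:
    """Count sample-aligned, non-overlapping exact copies of `keyword` in `target`.

    Both arguments are raw PCM bytes in the same format. `sample_width` is the
    number of bytes per sample (an assumption; 2 corresponds to 16-bit PCM).
    """
    if not keyword or len(keyword) > len(target):
        return 0
    count = 0
    pos = 0
    last_start = len(target) - len(keyword)
    while pos <= last_start:
        # Compare a sub-segment of the same length as the keyword signal.
        if target[pos:pos + len(keyword)] == keyword:
            count += 1
            pos += len(keyword)      # skip past the match (assumed policy)
        else:
            pos += sample_width      # slide forward by one sample
    return count


pcm = b"\x01\x00\x02\x00\x01\x00\x02\x00\x03\x00"
word = b"\x01\x00\x02\x00"
matches = count_exact_matches(pcm, word)
```

Exact matching is shown only because it is what this step describes; the feature-vector comparison of Embodiment two is the more tolerant alternative given later in the description.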

Step 130: determine the target score for the target audio data from the occurrence count of each piece of preset keyword information and the preset weight corresponding to each piece of preset keyword information.

Counting the occurrences of each piece of preset keyword information yields, for each piece, a pair $(K_x, N_x)$, where $K_x$ denotes the x-th piece of preset keyword information and $N_x$ is the number of times it occurs in the target audio data.

Suppose there are M pieces of preset keyword information. Then M pairs are obtained: $\{(K_1, N_1), (K_2, N_2), (K_3, N_3), \ldots, (K_M, N_M)\}$. Each piece of preset keyword information has a corresponding preset weight, $\{w_1, w_2, w_3, \ldots, w_M\}$. The target score is $S = (N_1 w_1 + N_2 w_2 + N_3 w_3 + \cdots + N_M w_M) \cdot C$, where C may be 1, a fraction greater than 0 and less than 1, or a natural number greater than 1, and every preset weight is greater than zero and less than one. Preferably, C = 10. The preset weights can be set according to the content of the preset keyword information.

Step 140: if the target score is greater than the preset score, determine that the audio data is sensitive data.

The preset score is greater than zero. Preferably, the preset score is greater than or equal to 50. When the audio data is determined to be sensitive data, it is sent to the server of the relevant department (such as a network security department) so that the department can analyze the user who originated the target audio data.
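Steps 130 and 140 amount to a weighted sum followed by a threshold test. The sketch below assumes that each term of the sum is an occurrence count multiplied by its preset weight, as step 130 describes; the function names, the keyword labels, and the sample weights are illustrative only, while C = 10 and a preset score of 50 are the preferred values stated in the text.

```python
def target_score(counts: dict, weights: dict, c: float = 10) -> float:
    """Weighted sum of occurrence counts, scaled by the constant C."""
    return c * sum(n * weights[keyword] for keyword, n in counts.items())


def is_sensitive(score: float, preset_score: float = 50) -> bool:
    """Step 140: the audio is sensitive only if the score exceeds the preset score."""
    return score > preset_score


# Hypothetical keywords and weights (each weight lies between 0 and 1).
weights = {"keyword_a": 0.5, "keyword_b": 0.25}
low = target_score({"keyword_a": 3, "keyword_b": 1}, weights)
high = target_score({"keyword_a": 12, "keyword_b": 0}, weights)
```

With these sample values, the first recording stays below the preset score and the second exceeds it.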

This embodiment obtains target audio data from the network data packets of at least one instant chat software, counts the occurrences of preset keyword information in the target audio data, scores the target audio data according to those counts, and determines from the score whether the audio data is sensitive data. Whereas the prior art cannot perform machine detection on internet audio data packets, this embodiment can obtain the audio data exchanged during instant chat, score it according to the preset keyword information it contains, and then determine whether it is sensitive data. Sensitive words in audio data packets exchanged during instant chat are thus detected by machine, which improves network security.

Embodiment two

An embodiment of the invention also provides an audio data detection method that further elaborates Embodiment one. As shown in Fig. 2, step 110, obtaining target audio data from the network data packets of at least one instant chat software, may be implemented as follows:

Step 111: parse the network data packets of at least one instant chat software according to a preset network protocol to obtain at least one audio sub-packet, the audio sub-packet sequence numbers, and the encoding type.

A network data packet is a packet encapsulated according to the preset network protocol so that it can be forwarded across the network. On arrival, the receiving end parses the packet according to the same protocol: the message header is stripped to obtain the audio sub-packet. After lower-layer (physical layer) protocol parsing and protocol feature code identification, parsing proceeds to the application-layer protocol of the instant chat software, which yields information about the sending-end instant chat software (such as Google Talk): the login IP, the login account, the session start flag, the feature code of the audio session, and the encoding format (payload-type) used for the call. The protocol feature code identifies the network protocol, so the preset network protocol can be determined from it and used to decapsulate the packet. Because instant chat software must operate in real time, the Real-time Transport Protocol (RTP) may be used to carry the audio data the user inputs.
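Where the audio is carried over RTP, the fixed 12-byte RTP header defined in RFC 3550 already exposes the payload type and the sequence number that the later steps rely on. The following sketch reads only that fixed header; the function name `parse_rtp_header` is hypothetical, and CSRC lists, header extensions, and padding are deliberately ignored.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed 12-byte RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit synchronization source identifier.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,            # top two bits of the first byte
        "payload_type": b1 & 0x7F,     # low seven bits of the second byte
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[12:],        # assumes no CSRC entries or extension
    }


# A synthetic packet: version 2, payload type 4, sequence number 7.
sample = struct.pack("!BBHII", 0x80, 4, 7, 160, 1) + b"\xaa\xbb"
header = parse_rtp_header(sample)
```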

In one implementation, the voice data transmission between the inviting party (terminal a) and the invited party (terminal b) proceeds as shown in Fig. 3 and comprises the following steps:

Step 101: terminal a sends a voice interaction request packet to terminal b through a server that acts as a relay.

An exemplary voice negotiation request message contains the following:

<iq to="UserB888gmail.com" type="set" id="60">
  <session xmlns="http://www.google.com/session" type="initiate" id="2484656951" initiator="UserA888gmail.com">
    <description xmlns="http://www.google.com/session/phone">
      <payload-type xmlns="http://www.google.com/session/phone" id="4" name="G723"/>
      <payload-type xmlns="http://www.google.com/session/phone" id="106" name="audio/telephone-event"/>
    </description>
  </session>
</iq>

Here, <session> is the feature code that creates an audio session; to="UserB888gmail.com/Talk.UserB" is the user name of the invited party (terminal b); type="initiate" indicates that the packet is of the initialization type; initiator="UserA888gmail.com/Talk.UserA" is the session initiator, that is, the inviting party (terminal a); <description> holds descriptors of the session, similar in function to the Session Description Protocol (SDP); payload-type is the encoding format used for the call, such as G723; and id is the identifier of the corresponding tag.
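As a rough illustration of how the fields described above could be read from such a request, the sketch below parses a cleaned-up copy of the example message with Python's standard `xml.etree.ElementTree`. The function name `parse_session_request` and the shape of the returned dictionary are assumptions; the attribute values are copied from the example as printed.

```python
import xml.etree.ElementTree as ET

SESSION_NS = "http://www.google.com/session"
PHONE_NS = "http://www.google.com/session/phone"

REQUEST = """
<iq to="UserB888gmail.com" type="set" id="60">
  <session xmlns="http://www.google.com/session" type="initiate"
           id="2484656951" initiator="UserA888gmail.com">
    <description xmlns="http://www.google.com/session/phone">
      <payload-type xmlns="http://www.google.com/session/phone" id="4" name="G723"/>
      <payload-type xmlns="http://www.google.com/session/phone" id="106" name="audio/telephone-event"/>
    </description>
  </session>
</iq>
"""


def parse_session_request(text: str) -> dict:
    """Extract the session type, the two parties, and the offered payload types."""
    root = ET.fromstring(text.strip())
    # A <session> child is the feature code marking an audio session.
    session = root.find(f"{{{SESSION_NS}}}session")
    if session is None:
        return {}
    payload_types = [
        (p.get("id"), p.get("name"))
        for p in session.iter(f"{{{PHONE_NS}}}payload-type")
    ]
    return {
        "invited": root.get("to"),
        "initiator": session.get("initiator"),
        "type": session.get("type"),
        "payload_types": payload_types,
    }


info = parse_session_request(REQUEST)
```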

Step 102: after receiving the request, terminal b sends a response packet to terminal a through the server.

Step 103: if type="accept" in the response packet, terminal a starts voice data transmission with terminal b.

In the response packet, type="accept" means that terminal b (the invited party) accepts the request of terminal a (the inviting party), and type="reject" means that terminal b refuses it.

Step 104: when the call ends, one end (for example terminal a) sends a call-end packet through the server to the opposite end (for example terminal b).

In the call-end packet, type="terminate" indicates that both parties end the call.

As this flow shows, parsing the message header of a network data packet yields the payload-type, from which the decoding method for the audio sub-data can be determined. Decapsulating the network data packet yields the audio sub-packet it carries.

Step 112: order and combine the at least one audio sub-packet according to the audio sub-packet sequence numbers to obtain the target audio data packet.

When terminal a sends the acoustic signal input by the user to terminal b, it first encodes the signal with the encoding specified by payload-type, then divides the encoded data into small pieces of sub-data and adds a corresponding sequence number to each piece so that the opposite end can recombine them. Accordingly, after the audio sub-packets are intercepted, the captured sub-packets are sorted by sequence number. When the sequence numbers are continuous and the total length is consistent with the feature code of the audio session, the sub-packets are combined to obtain the target audio data packet corresponding to the acoustic information the user input at the opposite end.
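The ordering and continuity check of step 112 can be sketched as follows. The function name `reassemble`, the `(sequence_number, payload)` tuple layout, and the optional `expected_length` argument are assumptions made for illustration; sequence number wrap-around is not handled.

```python
from typing import Iterable, Optional, Tuple


def reassemble(
    packets: Iterable[Tuple[int, bytes]],
    expected_length: Optional[int] = None,
) -> Optional[bytes]:
    """Sort sub-packets by sequence number and join them.

    Returns None when the sequence numbers are not continuous or when the
    combined length differs from `expected_length` (if one is given).
    """
    ordered = sorted(packets, key=lambda item: item[0])
    if not ordered:
        return None
    # Every sequence number must follow its predecessor directly.
    for (previous, _), (current, _) in zip(ordered, ordered[1:]):
        if current != previous + 1:
            return None
    data = b"".join(payload for _, payload in ordered)
    if expected_length is not None and len(data) != expected_length:
        return None
    return data


complete = reassemble([(3, b"c"), (1, b"a"), (2, b"b")], expected_length=3)
with_gap = reassemble([(1, b"a"), (3, b"c")])
```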

Step 113: decode the target audio data packet according to the encoding type to obtain the target audio data.

The target audio data packet is decoded according to the encoding described by payload-type to obtain the target audio data.

Examples of instant chat software include Google Talk, Yahoo Messenger, and Facebook. Each encodes the sound input by the user differently. The decoding algorithm corresponding to the encoding (payload-type) must therefore be determined first, and the target audio data packet is then decoded with that algorithm to obtain the waveform file input by the user at the opposite end.

The technical solution of this embodiment obtains the target audio data carried in network data packets by parsing them, which improves the efficiency of processing network data packets.

Further, the inventor found that different instant chat software uses different sampling rates, sampling resolutions, and encodings when transmitting voice, so to make keyword detection on the audio data practical, the restored data must be transcoded. An embodiment of the invention therefore also provides an audio data detection method that further elaborates the above embodiment. As shown in Fig. 4, after step 113, decoding the target audio data packet according to the encoding type to obtain the target audio data, the method further comprises:

Step 114: transcode the target audio data to obtain target audio data in a preset format.

In addition to transcoding the target audio data, the preset keyword information must also be in the preset format, that is, in the same format as the transcoded target audio data.

For example, the preset format is a waveform file with a sampling rate of 8000 Hz and a bit depth of 16 bits. The waveform file may be a pulse code modulation (Pulse Code Modulation, PCM) file. Optionally, for ease of playback, the PCM file is converted to the WAV file format provided by Microsoft.
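The optional PCM-to-WAV conversion mentioned above can be done with Python's standard `wave` module, as in the sketch below. The 8000 Hz sampling rate and 16-bit depth come from the example format; the single channel and the function name `pcm_to_wav` are assumptions, since the text does not state a channel count.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 8000,
               sample_width: int = 2, channels: int = 1) -> bytes:
    """Wrap raw PCM samples in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(channels)       # mono is an assumption
        writer.setsampwidth(sample_width)   # 2 bytes = 16 bits per sample
        writer.setframerate(sample_rate)
        writer.writeframes(pcm)
    return buffer.getvalue()


# One second of silence in the preset format.
wav_bytes = pcm_to_wav(b"\x00\x00" * 8000)

with wave.open(io.BytesIO(wav_bytes), "rb") as reader:
    rate = reader.getframerate()
    width = reader.getsampwidth()
    frames = reader.getnframes()
```

This only adds the container header; resampling audio that arrives at a different rate is a separate transcoding step that the sketch does not cover.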

Accordingly, step 120, searching for at least one piece of preset keyword information contained in the target audio data, may also be implemented as follows:

Step 120': search for at least one piece of preset keyword information contained in the target audio data in the preset format.

The technical solution of this embodiment unifies the format of the target audio data packet and the format of the preset keyword information, which improves the accuracy of the keyword search.

An embodiment of the invention also provides an audio data detection method that further elaborates the above embodiment. As shown in Fig. 5, step 120, searching for at least one piece of preset keyword information contained in the target audio data, may be implemented as follows:

Step 121: obtain the target feature vector parameter of the target audio data and the preset feature vector parameter corresponding to each piece of preset keyword information.

The target audio data is divided into multiple frames by frame length. Each frame has a preset duration, for example 20 ms or 30 ms. The frames are fed into a hidden Markov model (Hidden Markov Model, HMM) to obtain the feature vector parameter. The hidden Markov model can extract short-time energy, zero-crossing rate, and frequency-band variance as the feature parameters for endpoint detection, and uses 16-dimensional critical-band feature vectors and 12-dimensional Mel cepstral coefficients as the feature parameters for pattern recognition modeling. The calculation yields a set of feature vector parameters, namely the target feature vector parameter. The feature vector parameter of the preset keyword information is obtained in the same way.

For example, dividing the target audio data yields n frames, and feeding the n frames into the HMM yields the target feature vector parameter $X = \{x_1, x_2, \ldots, x_n\}$.

Further, to strengthen the correlation between adjacent frames, two adjacent frames share the data of an overlapping duration. For example, the first frame covers 0-20 ms, the second frame 10-30 ms, and the third frame 20-40 ms.
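The overlapping framing in this example (20 ms frames that advance by 10 ms) can be sketched as below. The function name `split_frames` and the decision to drop a trailing partial frame are assumptions; the 8000 Hz rate matches the example preset format.

```python
def split_frames(samples, sample_rate=8000, frame_ms=20, hop_ms=10):
    """Cut a sample sequence into fixed-length frames that overlap.

    With the defaults, frame k covers [10k ms, 10k + 20 ms), so adjacent
    frames share 10 ms of data. A trailing partial frame is dropped.
    """
    frame_len = sample_rate * frame_ms // 1000   # samples per frame
    hop = sample_rate * hop_ms // 1000           # samples between frame starts
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, hop)]


# 50 ms of audio at 8000 Hz is 400 samples.
frames = split_frames(list(range(400)))
```

Here the four frames start at samples 0, 80, 160, and 240, which correspond to 0 ms, 10 ms, 20 ms, and 30 ms.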

Step 122: search for at least one preset feature vector parameter contained in the target feature vector parameter.

In one implementation, the following steps are carried out for each preset feature vector parameter:

Obtain the number m of feature vectors that the preset feature vector parameter contains. Starting from the first feature vector of the target feature vector parameter, judge whether the 1st through m-th feature vectors are identical to the preset feature vector parameter. If they are, add 1 to the count for the current preset feature vector parameter. If they are not, judge whether the 2nd through (m+1)-th feature vectors are identical to the preset feature vector parameter. If they are, add 1 to the count; if not, judge whether the 3rd through (m+2)-th feature vectors are identical, and so on, until the number of occurrences of the current preset feature vector parameter in the target feature vector parameter is obtained. Each preset feature vector parameter is processed in turn as the current parameter, giving the number of occurrences of each preset feature vector parameter.
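The window-by-window comparison described above can be sketched as a sliding window whose length equals the number of vectors in the preset parameter. The function name `count_template` is hypothetical, feature vectors are represented as tuples for easy comparison, and the sketch assumes that every window position is tested, so overlapping matches are each counted.

```python
def count_template(target, template):
    """Count windows of `target` that equal `template` element by element.

    `target` and `template` are lists of feature vectors (tuples here).
    The window length m is the number of vectors in the template.
    """
    m = len(template)
    if m == 0 or m > len(target):
        return 0
    count = 0
    for start in range(len(target) - m + 1):
        # Compare m consecutive feature vectors with the preset parameter.
        if target[start:start + m] == template:
            count += 1
    return count


X = [(1, 0), (2, 5), (1, 0), (2, 5), (9, 9)]
occurrences = count_template(X, [(1, 0), (2, 5)])
```

Exact equality is used only because the text describes an identity test; a practical system would more likely compare vectors within a tolerance.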

In another implementation, the preset feature vector parameter M corresponding to a piece of preset keyword information is extracted, and the segment of the speech to be recognized that maximizes its posterior probability is sought, namely

$\max_{1 \le b \le e \le n} P(M \mid x_b, \ldots, x_e)$

where M is the preset feature vector parameter (HMM model) corresponding to the keyword information, $X = \{x_1, x_2, \ldots, x_n\}$ is the target feature vector parameter of the target audio data, $\{x_b, \ldots, x_e\}$ ($1 \le b \le e \le n$) is a segment of feature vectors contained in X with starting point b and end point e, and $P(\cdot)$ is the function that computes the posterior probability.

Further, because the acoustic signal is non-stationary, a Hamming window is applied to each frame, and the feature vector parameter of each windowed frame is then obtained.
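Applying a Hamming window to a frame means multiplying each sample by the standard coefficient $0.54 - 0.46\cos(2\pi k/(N-1))$, which tapers the frame edges. The sketch below uses only the standard library; the function names are illustrative.

```python
import math


def hamming(n: int) -> list:
    """Return the n coefficients of a symmetric Hamming window."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def apply_window(frame) -> list:
    """Multiply each sample of a frame by the matching window coefficient."""
    window = hamming(len(frame))
    return [sample * coeff for sample, coeff in zip(frame, window)]


w = hamming(5)
windowed = apply_window([100, 100, 100, 100, 100])
```

The coefficients are smallest at the two ends of the frame and reach 1.0 at the center, so the center sample of the example frame is left unchanged.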

Accordingly, counting the occurrences of each piece of preset keyword information comprises:

Step 123: count the occurrences of each preset feature vector parameter contained in the target feature vector parameter.

The technical solution of this embodiment converts the target audio data and the preset keyword information into feature vector parameters for comparison. Because feature vector parameters take up little space and represent the target audio data and the preset keyword information accurately, this further improves the accuracy of the keyword search and the efficiency of keyword detection.

Further, an embodiment of the invention also provides a detection confirmation method for audio data that further elaborates the above embodiment. As shown in Fig. 6, after step 122, searching for at least one preset feature vector parameter contained in the target feature vector parameter, the method further comprises:

Step 124: calculate the likelihood ratio corresponding to each preset feature vector parameter found.

The likelihood ratio is calculated as follows:

$f(w \mid \bar{\lambda}) = \ln \frac{P(w \mid \lambda)}{P(w \mid \bar{\lambda})}$ (Formula 1)

where $\lambda$ is the HMM model corresponding to the target sub-word and $\bar{\lambda}$ is its alternative hypothesis model. $P(w \mid \lambda)$ is the probability that a segment of speech feature vectors w is correctly recognized as the word $\lambda$, and the alternative hypothesis $P(w \mid \bar{\lambda})$ is the probability that w is recognized as not being the word $\lambda$. Preset feature vector parameters whose likelihood ratio is greater than a preset value proceed to residence probability confirmation.
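Formula 1 is a log-likelihood ratio, so the confirmation step reduces to a logarithm and a comparison once the two probabilities are known. In the sketch below the probabilities are plain inputs, because how the HMM and its alternative model produce them is outside this illustration; the function names and the sample threshold are assumptions.

```python
import math


def log_likelihood_ratio(p_target: float, p_alternative: float) -> float:
    """ln of P(w | model) over P(w | alternative model); both must be positive."""
    if p_target <= 0 or p_alternative <= 0:
        raise ValueError("probabilities must be positive")
    return math.log(p_target / p_alternative)


def passes_confirmation(p_target: float, p_alternative: float,
                        preset_value: float) -> bool:
    """Keep a candidate only if its likelihood ratio exceeds the preset value."""
    return log_likelihood_ratio(p_target, p_alternative) > preset_value


ratio = log_likelihood_ratio(0.8, 0.2)
kept = passes_confirmation(0.8, 0.2, preset_value=1.0)
dropped = passes_confirmation(0.5, 0.5, preset_value=1.0)
```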

Step 125: determine the residence probability of each preset feature vector parameter according to the likelihood ratio confirmation result.

After likelihood ratio confirmation, a residence feature is extracted to strengthen the confirmation of the target sub-word. A word whose residence time is too short is given a lower score.

The residence feature d(t) is calculated as follows:

$d(t) = \ln K(\alpha, \rho) + (\rho - 1)\ln p(t) - \alpha t$ (Formula 2)

where t is the residence time, $K(\alpha, \rho)$ is a constant, and $\alpha$ and $\rho$ are obtained from the sample mean and variance of the sub-word residence times in the training corpus.

The residence probability DP of a preset feature vector parameter is the minimum of the residence features of the sub-words it contains. A sub-word is the data corresponding to one frame.

$DP = \min_i d_i(t)$ (Formula 3)

where i is the index of the preset keyword information, 1<i<m, and m is the total number of pieces of preset keyword information.

Step 126: determine the preset feature vector parameters whose residence probability is greater than a preset probability to be the preset feature vector parameters contained in the target feature vector parameter.
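Steps 125 and 126 can be sketched as taking the minimum residence (resident) feature over a candidate's sub-words and keeping only the candidates whose minimum exceeds the preset probability. The residence feature values are treated as given inputs here, since computing them requires constants estimated from a training corpus; the function names, candidate labels, and numbers are all hypothetical.

```python
def residence_probability(residence_features) -> float:
    """Formula 3: the minimum residence feature among the sub-words."""
    return min(residence_features)


def confirm_candidates(candidates: dict, preset_probability: float) -> list:
    """Step 126: keep candidates whose residence probability exceeds the preset."""
    return [
        name
        for name, features in candidates.items()
        if residence_probability(features) > preset_probability
    ]


# Hypothetical residence features d_i(t) for two candidate matches.
candidates = {
    "candidate_a": [-1.0, -0.5, -0.8],
    "candidate_b": [-3.0, -0.2, -0.4],
}
confirmed = confirm_candidates(candidates, preset_probability=-2.0)
```

Taking the minimum means a single sub-word with an implausibly short residence time is enough to reject the whole candidate.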

The technical solution of this embodiment screens the preset keyword information found in step 122 by its residence probability, which improves the accuracy of the preset keyword information, the detection efficiency, and network security.

Further, a wav file is generated from the target audio data, and the wav file is combined with associated instant chat information, such as the sending end's Internet Protocol address (IP address) and login account, into a BCP format file. Subsequent data analysis is based on the BCP format file; for example, it may count how often an account sends such wav files.

Embodiment three

An embodiment of the invention also provides an audio data detection device 1, which is located in a server. As shown in Fig. 7, the device 1 comprises:

a target audio data obtaining unit 11, configured to obtain target audio data from the network data packets of at least one instant chat software;

a preset keyword information searching unit 12, configured to search for at least one piece of preset keyword information contained in the target audio data obtained by the target audio data obtaining unit 11;

an occurrence counting unit 13, configured to count the occurrences of each piece of preset keyword information found by the preset keyword information searching unit 12;

a target score calculating unit 14, configured to determine the target score for the target audio data from the occurrence count of each piece of preset keyword information obtained by the occurrence counting unit 13 and the preset weight corresponding to each piece of preset keyword information;

a sensitive data determining unit 15, configured to determine that the audio data is sensitive data if the target score obtained by the target score calculating unit 14 is greater than the preset score.

Further, as shown in Fig. 8, the target audio data obtaining unit 11 comprises:

a parsing subunit 111, configured to parse the network data packets of at least one instant chat software according to a preset network protocol to obtain at least one audio sub-packet, the audio sub-packet sequence numbers, and the encoding type;

a combining subunit 112, configured to order and combine the at least one audio sub-packet according to the audio sub-packet sequence numbers obtained by the parsing subunit 111, to obtain the target audio data packet;

a decoding subunit 113, configured to decode the target audio data packet obtained by the combining subunit 112 according to the encoding type, to obtain the target audio data.

Further, as shown in Fig. 9, the target audio data obtaining unit 11 also comprises:

a transcoding subunit 114, configured to transcode the target audio data obtained by the decoding subunit 113, to obtain target audio data in a preset format.

Accordingly, the preset keyword information searching unit 12 is further configured to search for at least one piece of preset keyword information contained in the target audio data in the preset format obtained by the transcoding subunit 114.

Further, as shown in Fig. 10, the preset keyword information searching unit 12 comprises:

a feature vector obtaining subunit 121, configured to obtain the target feature vector parameter of the target audio data and the preset feature vector parameter corresponding to each piece of preset keyword information;

a preset feature vector searching subunit 122, configured to search for at least one preset feature vector parameter contained in the target feature vector parameter.

Accordingly, the occurrence counting unit 13 is further configured to count the occurrences of each preset feature vector parameter contained in the target feature vector parameter obtained by the feature vector obtaining subunit 121.

Further, as shown in Fig. 11, the preset keyword information searching unit 12 also comprises:

a likelihood ratio calculating subunit 123, configured to calculate the likelihood ratio corresponding to each preset feature vector parameter found;

a residence probability determining subunit 124, configured to determine the residence probability of each preset feature vector parameter according to the likelihood ratio obtained by the likelihood ratio calculating subunit 123;

a preset feature vector determining subunit 125, configured to determine the preset feature vector parameters whose residence probability, as obtained by the residence probability determining subunit 124, is greater than a preset probability to be the preset feature vector parameters contained in the target feature vector parameter.

The above device 1 can perform the methods provided in Embodiment One and Embodiment Two of the present invention, and has the functional modules and beneficial effects corresponding to performing those methods. For technical details not described in detail in this embodiment, refer to the methods provided in Embodiment One and Embodiment Two of the present invention.

Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments and substitutions can be made without departing from the scope of protection of the present invention. Therefore, although the present invention has been described in detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the appended claims.

Claims

1. An audio data detection method, characterized by comprising:

2. The audio data detection method according to claim 1, characterized in that obtaining target audio data according to the network packet of at least one instant messaging software comprises:

Parsing the network packet of the at least one instant messaging software according to a preset network protocol to obtain at least one audio sub-packet, an audio sub-packet sequence number and a coding type;

Sorting and combining the at least one audio sub-packet according to the audio sub-packet sequence number to obtain a target audio data packet;

Decoding the target audio data packet according to the coding type to obtain the target audio data.

3. The audio data detection method according to claim 2, characterized in that, after decoding the target audio data packet according to the coding type to obtain the target audio data, the method further comprises:

Transcoding the target audio data to obtain target audio data in a preset format;

Correspondingly, searching for at least one piece of preset keyword information contained in the target audio data comprises:

Searching for at least one piece of preset keyword information contained in the target audio data in the preset format.

4. The audio data detection method according to any one of claims 1 to 3, characterized in that searching for at least one piece of preset keyword information contained in the target audio data comprises:

Obtaining the target feature vector parameter of the target audio data and the preset feature vector corresponding to each piece of preset keyword information;

Searching for at least one preset feature vector contained in the target feature vector parameter;

Correspondingly, counting the number of occurrences of each piece of preset keyword information comprises:

Counting the number of occurrences of each preset feature vector contained in the target feature vector parameter.

5. The audio data detection method according to claim 4, characterized in that, after searching for at least one preset feature vector contained in the target feature vector parameter, the method further comprises:

Calculating the likelihood ratio corresponding to each preset feature vector that is found;

Determining the resident probability of each preset feature vector according to the likelihood ratio;

Determining each preset feature vector whose resident probability is greater than a preset probability as a preset feature vector contained in the target feature vector parameter.

6. An audio data detection device, characterized by comprising:

7. The audio data detection device according to claim 6, characterized in that the target audio data acquisition unit comprises:

A parsing subunit, configured to parse the network packet of at least one instant messaging software according to a preset network protocol to obtain at least one audio sub-packet, an audio sub-packet sequence number and a coding type;

A combining subunit, configured to sort and combine the at least one audio sub-packet according to the audio sub-packet sequence number obtained by the parsing subunit, to obtain a target audio data packet;

A decoding subunit, configured to decode, according to the coding type, the audio payload data of the target audio data packet obtained by the combining subunit, to obtain the target audio data.

8. The audio data detection device according to claim 7, characterized in that the target audio data acquisition unit also comprises:

A transcoding subunit, configured to transcode the target audio data obtained by the decoding subunit to obtain target audio data in a preset format;

Correspondingly, the preset keyword information search unit is also configured to search for at least one piece of preset keyword information contained in the target audio data in the preset format obtained by the transcoding subunit.

9. The audio data detection device according to any one of claims 6 to 8, characterized in that the preset keyword information search unit comprises:

A feature vector acquisition subunit, configured to obtain the target feature vector parameter of the target audio data and the preset feature vector corresponding to each piece of preset keyword information;

A preset feature vector search subunit, configured to search for at least one preset feature vector contained in the target feature vector parameter;

Correspondingly, the occurrence counting unit is also configured to count the number of occurrences of each preset feature vector contained in the target feature vector parameter obtained by the feature vector acquisition subunit.

10. The audio data detection device according to claim 9, characterized in that the preset keyword information search unit also comprises:

A likelihood ratio calculation subunit, configured to calculate the likelihood ratio corresponding to each preset feature vector that is found;

A resident probability determination subunit, configured to determine the resident probability of each preset feature vector according to the likelihood ratio obtained by the likelihood ratio calculation subunit;

A preset feature vector determination subunit, configured to determine each preset feature vector whose resident probability, as obtained by the resident probability determination subunit, is greater than a preset probability as a preset feature vector contained in the target feature vector parameter.