CN114038470A - Method, device, equipment and medium for extracting voiceprint features of transformer signals - Google Patents

Method, device, equipment and medium for extracting voiceprint features of transformer signals

Info

Publication number
CN114038470A
CN114038470A (application CN202111280080.7A)
Authority
CN
China
Prior art keywords
signal
frame
acoustic signal
frequency value
reconstructed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111280080.7A
Other languages
Chinese (zh)
Inventor
周春雷
李洋
宋金伟
季良
李俊妮
朱广新
宣东海
张璧君
陈相舟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Big Data Center Of State Grid Corp Of China
Original Assignee
Big Data Center Of State Grid Corp Of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Big Data Center Of State Grid Corp Of China filed Critical Big Data Center Of State Grid Corp Of China
Priority to CN202111280080.7A priority Critical patent/CN114038470A/en
Publication of CN114038470A publication Critical patent/CN114038470A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/02: Preprocessing operations, e.g. segment selection; pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; feature selection or extraction
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification techniques
    • G10L17/26: Recognition of special voice characteristics, e.g. for use in lie detectors; recognition of animal voices
    • H: ELECTRICITY
    • H02: GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02J: CIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J13/00: Circuit arrangements for providing remote indication of network conditions, e.g. an instantaneous record of the open or closed condition of each circuit breaker in the network; circuit arrangements for providing remote control of switching means in a power distribution network, e.g. switching in and out of current consumers by using a pulse code signal carried by the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Power Engineering (AREA)
  • Telephonic Communication Services (AREA)

Abstract

Embodiments of the invention disclose a method, device, equipment, and medium for extracting voiceprint features from transformer signals. The method comprises the following steps: acquiring an acoustic signal of the transformer in its operating state, and reconstructing the acoustic signal to obtain a reconstructed acoustic signal; selecting one frame of the reconstructed acoustic signal as a reference frame; acquiring at least one frame adjacent to the reference frame, and obtaining the frequency value of each adjacent-frame acoustic signal; processing these frequency values, and determining the average of the processed values as the frequency value of the reference frame; and calculating the voiceprint feature parameters of the reference frame from the frequency value of the reference frame. Because the reference frame's frequency is derived from its neighbours, differences in the frequency-domain signal caused by slight time-shift deviations of the in-frame signal are avoided; compared with conventional methods that extract voiceprint features directly from the frequency values of the signal, this improves the stationarity of voiceprint feature extraction and the voiceprint recognition rate.

Description

Method, device, equipment and medium for extracting voiceprint features of transformer signals
Technical Field
The embodiment of the invention relates to the technical field of audio data processing, in particular to a transformer signal voiceprint feature extraction method, device, equipment and medium.
Background
As one of the most important electrical devices in a power system, a power transformer must be kept in a reliable operating state. At present, transformer troubleshooting relies mainly on periodic patrols and visual inspection by dedicated inspection personnel. This approach consumes considerable manpower and is highly subjective, and because inspections are periodic, transformer faults often cannot be reported and resolved in real time, ultimately causing large property losses to the national power system.
When a transformer operates, its iron core and windings vibrate, and the resulting vibration sound is transmitted through the oil and the tank and radiated into the surroundings. In addition, partial discharge inside the transformer often emits an acoustic signal. The vibration sound of a transformer therefore carries rich information about its operating state and can serve as a basis for assessing that state.
However, the acoustic sensor that collects the operating-state acoustic signal of the transformer also picks up corona acoustic signals, environmental noise, and the like, so the useful operating-state information is submerged in various interferences; the information contained in the acoustic signal is then difficult to extract and to process further. Effectively processing the transformer's acoustic signal and extracting the voiceprint features it contains has therefore become the key to subsequently diagnosing transformer faults accurately.
Disclosure of Invention
Embodiments of the invention provide a transformer-signal voiceprint feature extraction method, device, equipment, and medium that avoid differences in the frequency-domain signal caused by slight time-shift deviations of the in-frame signal, thereby improving the stationarity of voiceprint feature extraction and the voiceprint recognition rate.
In a first aspect, an embodiment of the present invention provides a transformer signal voiceprint feature extraction method, where the method includes:
acquiring an acoustic signal of the transformer in its operating state, and reconstructing the acoustic signal to obtain a reconstructed acoustic signal;
selecting one frame of the reconstructed acoustic signal as a reference frame;
acquiring at least one frame adjacent to the reference frame, and obtaining the frequency value of each adjacent-frame acoustic signal;
processing the frequency value of each adjacent-frame acoustic signal, and determining the average of the processed frequency values as the frequency value of the reference frame;
and calculating the voiceprint feature parameters of the reference frame from the frequency value of the reference frame.
In a second aspect, an embodiment of the present invention further provides a transformer signal voiceprint feature extraction apparatus, where the apparatus includes:
a signal reconstruction module, configured to acquire an acoustic signal in the operating state and reconstruct it to obtain a reconstructed acoustic signal;
a signal selection module, configured to select one frame of the reconstructed acoustic signal as a reference frame;
a frequency acquisition module, configured to acquire at least one frame adjacent to the reference frame and obtain the frequency value of each adjacent-frame acoustic signal;
a frequency determination module, configured to process the frequency value of each adjacent-frame acoustic signal and determine the average of the processed frequency values as the frequency value of the reference frame;
and a feature extraction module, configured to calculate the voiceprint feature parameters of the reference frame from the frequency value of the reference frame.
In a third aspect, an embodiment of the present invention further provides a transformer signal voiceprint feature extraction device, where the device includes:
one or more processors;
storage means for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the transformer signal voiceprint feature extraction method of any of the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, implement the transformer signal voiceprint feature extraction method of any one of the first aspect.
By adjusting the frequency value of the reference frame with the frequency values of its adjacent frames, the method and device avoid differences in the frequency-domain signal caused by slight time-shift deviations of the in-frame signal; compared with conventional methods that extract voiceprint features directly from the frequency values of the signal, they improve the stationarity of voiceprint feature extraction and the voiceprint recognition rate.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:
fig. 1 is a flowchart of a transformer signal voiceprint feature extraction method according to an embodiment of the present invention;
fig. 2 is a flowchart of a transformer signal voiceprint feature extraction method provided in the second embodiment of the present invention;
Fig. 3 is a block diagram of a structure of a transformer signal voiceprint feature extraction apparatus according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of a transformer signal voiceprint feature extraction device provided in the fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a transformer signal voiceprint feature extraction method according to an embodiment of the present invention. The method is applicable to situations where power personnel extract features from the acoustic signal of a transformer in its operating state; the extracted features can serve multiple purposes, such as fault determination. The method may be executed by the transformer signal voiceprint feature extraction apparatus of this embodiment, which may be implemented in software and/or hardware and configured in an electronic device with signal-processing capability. The method specifically comprises the following steps:
and S110, acquiring the running state acoustic signal of the transformer, and reconstructing the acoustic signal to obtain a reconstructed acoustic signal.
The transformer may be an older unit retrofitted with sound acquisition equipment, a newer unit shipped from the factory with acquisition equipment already installed, or a transformer whose vibration level is monitored remotely to obtain the corresponding acoustic signal.
Specifically, the electronic device acquires the acoustic signal from the acoustic signal acquisition device. The grid often requires transformers to be shut down for maintenance, power scheduling, and the like. The embodiment therefore acquires acoustic signals only from transformers in the operating state, avoiding meaningless acquisition and processing of signals from transformers that are switched off, which saves computing resources and improves processing speed. Meanwhile, a transformer generates considerable noise of its own while working, and several transformers often operate simultaneously and interfere with one another; the acoustic signal collected at one transformer may therefore contain environmental noise such as wind and human voices as well as noise produced by other power-transformation equipment. The directly acquired raw acoustic signal is reconstructed to apply a degree of noise reduction and filtering, which ensures that subsequent signal processing is effective.
And S120, selecting one frame in the reconstructed acoustic signal as a reference frame.
Each frame is a segment of the acoustic signal whose duration is shorter than that of the complete reconstructed acoustic signal.
Specifically, the reconstructed acoustic signal is long, so processing it directly involves a large amount of data, which lowers efficiency and prevents accurate processing; moreover, transformer anomalies are often short-lived and cannot be detected accurately in a long reconstructed signal. The embodiment therefore divides the reconstructed acoustic signal into frames and selects one frame as the reference frame for processing. This reduces the amount of data and makes it easier to detect anomalies promptly and accurately within a shorter time window.
S130, acquiring at least one frame adjacent to the reference frame, and obtaining the frequency value of each adjacent-frame acoustic signal.
The adjacent frames are the frame immediately before and the frame immediately after the reference frame in the framed reconstructed signal.
Specifically, if the reference frame is the first frame, it has only one adjacent frame, namely the second frame; if the reference frame is the last frame, it likewise has only one adjacent frame, namely the second-to-last frame; in all other cases the reference frame has two adjacent frames. The frequency values of the one or two adjacent frames are acquired as reference data for subsequently calculating the frequency of the reference frame.
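The neighbour-selection rule above can be sketched as a small helper. This is an illustrative implementation assuming 0-based frame indices; the function name is not from the patent:

```python
def adjacent_frames(ref_idx, num_frames):
    """Indices of the frames adjacent to the reference frame.

    The first frame has a single neighbour (the second frame), the last
    frame has a single neighbour (the second-to-last frame), and every
    interior frame has two neighbours.
    """
    neighbours = []
    if ref_idx > 0:
        neighbours.append(ref_idx - 1)  # previous frame
    if ref_idx < num_frames - 1:
        neighbours.append(ref_idx + 1)  # next frame
    return neighbours
```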
S140, processing the frequency value of each adjacent-frame acoustic signal, and determining the average of the processed frequency values as the frequency value of the reference frame.
Here, processing means applying a preset operation to each adjacent frame so that the resulting frequency values can be used to calculate the frequency value of the reference frame.
Specifically, if there is only one adjacent frame, the average of its processed frequency value and the frequency value of the reference frame is taken; if there are two adjacent frames, the average of their two processed frequency values is taken. In either case, the resulting reference-frame frequency is not raw, directly acquired data but data related to the adjacent frames, which reduces the influence of errors or other factors on single-frame data.
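A minimal sketch of this averaging rule, assuming the processed frequency values are already available as plain numbers (the preset per-neighbour processing operation itself is left abstract in the text, so it is omitted here):

```python
def reference_frame_frequency(neighbour_freqs, ref_freq=None):
    """Averaging rule from the text: with a single adjacent frame, average
    its processed frequency with the reference frame's own frequency; with
    two adjacent frames, average the two processed frequencies."""
    if len(neighbour_freqs) == 1:
        if ref_freq is None:
            raise ValueError("single-neighbour case needs the reference frequency")
        return (neighbour_freqs[0] + ref_freq) / 2.0
    return sum(neighbour_freqs) / len(neighbour_freqs)
```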
S150, calculating the voiceprint feature parameters of the reference frame from the frequency value of the reference frame.
The voiceprint feature parameters are used to discriminate the detected acoustic signals of the transformer and determine whether the signal in the current operating state is abnormal, so that transformer faults can be found in time.
Specifically, the voiceprint feature parameters of the reference frame are calculated from the reference-frame frequency value obtained on the basis of the adjacent-frame frequency data. This avoids differences in the frequency-domain signal caused by slight time-shift deviations of the in-frame signal; compared with conventional methods that calculate the voiceprint feature parameters directly from the frequency value of the current frame, it improves the stationarity of voiceprint feature extraction and the voiceprint recognition rate.
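The patent does not name the concrete voiceprint feature parameters it computes. As a purely illustrative stand-in, a common choice for acoustic-signature work is cepstral coefficients (log power spectrum followed by a DCT); the frame size, sampling rate, and coefficient count below are assumptions, not values from the source:

```python
import numpy as np
from scipy.fftpack import dct

def cepstral_features(frame, num_coeffs=12):
    """Illustrative voiceprint-style features for one frame:
    Hamming window -> power spectrum -> log -> DCT, keep low-order terms."""
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    log_spec = np.log(power + 1e-10)  # small floor avoids log(0)
    return dct(log_spec, norm='ortho')[:num_coeffs]

# a 0.04 s frame at an assumed 8 kHz containing a 100 Hz, hum-like tone
frame = np.sin(2 * np.pi * 100 * np.arange(320) / 8000)
feats = cepstral_features(frame)
print(feats.shape)  # (12,)
```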
In the technical scheme of this embodiment, the operating-state acoustic signal of the transformer is acquired and reconstructed into a reconstructed acoustic signal; one frame of the reconstructed signal is selected as the reference frame; at least one frame adjacent to the reference frame is acquired and the frequency value of each adjacent-frame acoustic signal is obtained; these frequency values are processed and their average is taken as the frequency value of the reference frame; and the voiceprint feature parameters of the reference frame are calculated from that frequency value. Differences in the frequency-domain signal caused by slight time-shift deviations of the in-frame signal are thus avoided, and compared with conventional methods that extract voiceprint features directly from the frequency values of the signal, the stationarity of voiceprint feature extraction and the voiceprint recognition rate are improved.
On the basis of the above technical scheme, acquiring the operating-state acoustic signal of the transformer and reconstructing it to obtain the reconstructed acoustic signal preferably comprises:
windowing and framing the acoustic signal with a Hamming window and performing a short-time Fourier transform; performing underdetermined blind source separation on the transformed signal to obtain a de-noised signal; decomposing the de-noised signal into four layers of intrinsic mode function (IMF) components by variational mode decomposition; filtering the high-frequency and low-frequency IMF components using a 6-layer wavelet decomposition with the 3σ principle; and obtaining the reconstructed acoustic signal from the filtered high-frequency and low-frequency components together with the intermediate-frequency components.
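The full reconstruction chain (underdetermined blind source separation, variational mode decomposition, wavelet filtering with the 3σ rule) requires specialised tooling, so only the first step, windowed framing plus a Hamming-window short-time Fourier transform, is sketched here with SciPy on a synthetic stand-in signal. The sampling rate and frame sizes are assumptions, not values from the patent:

```python
import numpy as np
from scipy.signal import stft

fs = 8000                                    # assumed sampling rate (Hz)
rng = np.random.default_rng(0)
t = np.arange(0, 2.0, 1 / fs)
# synthetic stand-in for a transformer acoustic signal: 100 Hz hum + noise
x = np.sin(2 * np.pi * 100 * t) + 0.1 * rng.standard_normal(t.size)

# windowing + framing + short-time Fourier transform with a Hamming window
f, seg_t, Z = stft(x, fs=fs, window='hamming',
                   nperseg=320, noverlap=240)  # 0.04 s frames, 0.01 s shift
print(Z.shape[0])  # 161 frequency bins (nperseg // 2 + 1)
```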
On the basis of the above, the intrinsic mode function components are further filtered to obtain new components, which are then reconstructed into the final de-noised signal. This effectively separates the substation operating sound from environmental noise, yields a good de-noising effect, and further improves the stationarity of voiceprint feature extraction and the voiceprint recognition rate.
Example two
Fig. 2 is a flowchart of a transformer signal voiceprint feature extraction method according to the second embodiment of the present invention. Building on the above embodiment, it further explains how one frame of the reconstructed acoustic signal is selected as the reference frame. Specifically, referring to fig. 2, the method may include:
s210, obtaining the running state acoustic signal of the transformer, and reconstructing the acoustic signal to obtain a reconstructed acoustic signal.
S220, judging whether the reconstructed acoustic signal is valid; if it is invalid, re-acquiring and reconstructing the acoustic signal to obtain a valid reconstructed acoustic signal.
Because of possible calculation or processing errors during reconstruction, or because a given segment of the acoustic signal lacks detail, the reconstructed signal may deviate from the original and affect the subsequent calculation of the voiceprint features. The embodiment therefore checks the validity of the reconstructed signal; if it is invalid, the acoustic signal is re-acquired and reconstructed, ensuring that the reconstructed acoustic signal that is finally framed is valid.
Preferably, a minimum hash algorithm is used to calculate the similarity between the reconstructed acoustic signal and a preset standard acoustic signal; whether the similarity exceeds a preset threshold is then judged; and if it does, the reconstructed acoustic signal is determined to be valid.
Illustratively, the minimum hash algorithm computes the similarity of the reconstructed signal as follows:

T = [T1, T2, …, TN]

Si = [Si1, Si2, …, SiN]

J(T, Si) = |T ∩ Si| / |T ∪ Si|

where J(T, Si) ∈ (0, 1); i denotes the index of the stored voiceprint feature information of the reconstructed acoustic signal, T denotes the voiceprint feature information set of the reconstructed acoustic signal, and Si denotes the voiceprint feature information set of the preset acoustic signal. The larger the similarity coefficient J(T, Si), the higher the similarity between the two sets.
By using the minimum hash algorithm to calculate the similarity between the reconstructed acoustic signal and the preset standard acoustic signal, the embodiment improves the efficiency of the similarity computation.
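The MinHash estimate of the Jaccard similarity J(T, Si) between two voiceprint-feature sets can be sketched as follows; the hash-family parameters and set contents are illustrative only:

```python
import random

P = 2_147_483_647  # large prime modulus for the affine hash family

def minhash_signature(items, num_hashes=128, seed=42):
    """Signature of a set under num_hashes random affine hash functions."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(num_hashes)]
    return [min((a * hash(x) + b) % P for x in items) for a, b in params]

def minhash_similarity(sig_a, sig_b):
    """Fraction of matching signature slots estimates the Jaccard coefficient."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

T = {1, 2, 3, 4, 5, 6, 7, 8}
Si = {1, 2, 3, 4, 9, 10, 11, 12}
est = minhash_similarity(minhash_signature(T), minhash_signature(Si))
# exact Jaccard similarity is |T ∩ Si| / |T ∪ Si| = 4 / 12; est approximates it
```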
S230, dividing the reconstructed acoustic signal into a plurality of frames according to a preset frame length and a preset frame shift.
Here the frame length is the duration of each frame and the frame shift is the offset of each frame relative to the previous one; both may be preset to fixed values or determined adaptively from the reconstructed acoustic signal.
Illustratively, take a reconstructed acoustic signal 2 s long as one sample and frame each sample; with a frame length of 0.04 s and a frame shift of 0.01 s, each sample can be divided into 197 frames in the time domain.
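Framing with a fixed frame length and frame shift can be sketched as below. With simple left-aligned framing, a 2 s signal at 0.04 s frames and a 0.01 s shift yields 1 + (2 − 0.04)/0.01 = 197 frames; other boundary conventions change the count. The 8 kHz sampling rate is an assumption:

```python
import numpy as np

def frame_signal(x, fs, frame_len=0.04, frame_shift=0.01):
    """Split signal x (sampled at fs Hz) into overlapping frames."""
    flen = int(round(frame_len * fs))      # samples per frame
    fshift = int(round(frame_shift * fs))  # samples per shift
    n = 1 + (len(x) - flen) // fshift      # number of complete frames
    return np.stack([x[k * fshift: k * fshift + flen] for k in range(n)])

frames = frame_signal(np.zeros(16000), fs=8000)  # a 2 s sample at 8 kHz
print(frames.shape)  # (197, 320): 197 frames of 320 samples each
```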
S240, selecting one frame in the reconstructed acoustic signal as a reference frame.
S250, acquiring at least one frame adjacent to the reference frame, and obtaining the frequency value of each adjacent-frame acoustic signal.
S260, processing the frequency value of each adjacent-frame acoustic signal, and determining the average of the processed frequency values as the frequency value of the reference frame.
And S270, calculating the voice voiceprint characteristic parameters of the reference frame according to the frequency value of the reference frame.
According to the technical scheme of this embodiment, differences in the frequency-domain signal caused by slight time-shift deviations of the in-frame signal are avoided, so that, compared with conventional methods that extract voiceprint features directly from the frequency values of the signal, the stationarity of voiceprint feature extraction and the voiceprint recognition rate are improved. In addition, a validity check is performed on the reconstructed acoustic signal: if the reconstructed signal is invalid, the acoustic signal is re-acquired and reconstructed, which guarantees the validity of the reconstructed signal; and the similarity is computed with a minimum hash algorithm, which improves computational efficiency.
On the basis of the above embodiments, processing the frequency values of the adjacent-frame acoustic signals and determining their average as the frequency value of the reference frame preferably comprises:
setting a corresponding sampling condition for each adjacent-frame acoustic signal, and acquiring the frequency value of each adjacent-frame acoustic signal according to its sampling condition.
The sampling condition is determined by the time-length offset and the number of adjacent frames.
Specifically, let the time offset be ΔFS and the number of adjacent frames be N, where N is a positive integer. ΔFS and N must satisfy ΔFS << FS and ΔFS × N < 2 × FL, where FL is the preset frame length and FS is the frame shift.
Because ΔFS << FS, the frequency values sampled from the reference frame and its adjacent frames after framing are mutually independent, which prevents overlap between sampled data from affecting voiceprint feature extraction and thereby improves its reliability and accuracy. Meanwhile, because ΔFS × N < 2 × FL, the sampled frequency values remain continuous and cross-frame sampling is avoided, further improving the reliability and accuracy of voiceprint feature extraction.
Illustratively, the sampling conditions are obtained from the time-length offset and the number of adjacent frames as follows:
assume the current frame is the i-th frame of the framed data stream; sampling conditions, each comprising a sampling start point and a sampling cut-off point, are then set for each of the N adjacent frames according to ΔFS and N.
For the current frame, the sampling start point is i × FS and the cut-off point is i × FS + FL; for the j-th of the N adjacent frames, the sampling start point is i × FS + j × ΔFS and the cut-off point is i × FS + j × ΔFS + FL.
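The sampling conditions above can be computed directly. The sketch below uses integer sample counts (frame length FL, frame shift FS, offset ΔFS written dFS) and checks the two constraints from the text; the concrete numbers are illustrative only:

```python
def sampling_windows(i, FL, FS, dFS, N):
    """Sampling start/cut-off points for frame i and its N offset samplings.

    Constraints from the text: dFS << FS and dFS * N < 2 * FL.
    """
    assert dFS < FS, "time offset must be (much) smaller than the frame shift"
    assert dFS * N < 2 * FL, "offset samplings must stay within two frame lengths"
    windows = {0: (i * FS, i * FS + FL)}  # j = 0: the current frame itself
    for j in range(1, N + 1):
        start = i * FS + j * dFS
        windows[j] = (start, start + FL)
    return windows

# e.g. FL = 320 samples (0.04 s at 8 kHz), FS = 80 samples, dFS = 8, N = 4
w = sampling_windows(i=10, FL=320, FS=80, dFS=8, N=4)
print(w[0], w[1])  # (800, 1120) (808, 1128)
```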
EXAMPLE III
Fig. 3 is a block diagram of a transformer signal voiceprint feature extraction apparatus according to the third embodiment of the present invention. The apparatus can execute the transformer signal voiceprint feature extraction method of any embodiment of the present invention, and it has the corresponding functional modules and beneficial effects. As shown in fig. 3, the apparatus may include:
the signal reconstruction module 310, configured to acquire an acoustic signal in the operating state and reconstruct it to obtain a reconstructed acoustic signal;
the signal selection module 320, configured to select one frame of the reconstructed acoustic signal as the reference frame;
the frequency acquisition module 330, configured to acquire at least one frame adjacent to the reference frame and obtain the frequency value of each adjacent-frame acoustic signal;
the frequency determination module 340, configured to process the frequency value of each adjacent-frame acoustic signal and determine the average of the processed frequency values as the frequency value of the reference frame;
and the feature extraction module 350, configured to calculate the voiceprint feature parameters of the reference frame from the frequency value of the reference frame.
The transformer signal voiceprint feature extraction apparatus of this embodiment acquires the operating-state acoustic signal of the transformer and reconstructs it; selects one frame of the reconstructed signal as the reference frame; acquires at least one adjacent frame and obtains the frequency value of each adjacent-frame acoustic signal; processes these frequency values and takes their average as the frequency value of the reference frame; and calculates the voiceprint feature parameters of the reference frame from that frequency value. Differences in the frequency-domain signal caused by slight time-shift deviations of the in-frame signal are thus avoided, and compared with conventional methods that extract voiceprint features directly from the frequency values of the signal, the stationarity of voiceprint feature extraction and the voiceprint recognition rate are improved.
Optionally, the signal reconstructing module 310 is further configured to:
performing a short-time Fourier transform on the acoustic signal after windowing and framing with a Hamming window;
performing underdetermined blind source separation on the transformed signal to obtain a de-noised signal;
decomposing the de-noised signal into four layers of intrinsic mode function (IMF) components by variational mode decomposition;
filtering the high-frequency and low-frequency IMF components using a 6-layer wavelet decomposition with the 3σ principle;
and obtaining the reconstructed acoustic signal from the filtered high-frequency and low-frequency components together with the intermediate-frequency components.
Optionally, the frequency determination module 340 is further configured to set a corresponding sampling condition for each adjacent-frame acoustic signal, and to acquire the frequency value of each adjacent-frame acoustic signal according to its sampling condition.
The sampling condition is set according to the time-length offset and the number of adjacent frames.
Optionally, the transformer signal voiceprint feature extraction device further includes: a signal framing module 360.
And the signal framing module is used for dividing the reconstructed acoustic signal into a plurality of frames according to a preset frame duration and a preset frame shift.
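A minimal sketch of this framing step follows, assuming typical speech-processing defaults of a 25 ms frame and a 10 ms shift, since the preset values are left open by the embodiment:

```python
import numpy as np

def split_frames(signal, fs, frame_ms=25.0, shift_ms=10.0):
    # Divide the reconstructed signal into overlapping frames using a
    # preset frame duration and frame shift (defaults are assumed,
    # illustrative values; the embodiment does not fix them).
    frame_len = int(fs * frame_ms / 1000)
    hop = int(fs * shift_ms / 1000)
    n = 1 + max(0, len(signal) - frame_len) // hop
    return [signal[i * hop : i * hop + frame_len] for i in range(n)]
```

With these defaults, one second of signal at 8 kHz yields 200-sample frames advanced 80 samples at a time.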
Optionally, the transformer signal voiceprint feature extraction device further includes: a signal detection module 370.
The signal detection module 370 is configured to determine whether the reconstructed acoustic signal is valid; and, if it is invalid, to acquire and reconstruct the acoustic signal again to obtain a new reconstructed acoustic signal.
Preferably, the signal detection module 370 further includes a hash calculation unit 371;
the hash calculation unit 371 is configured to calculate the similarity between the reconstructed acoustic signal and a preset standard acoustic signal using the MinHash (minimum hash) algorithm; determine whether the similarity exceeds a preset threshold; and, if it does, determine that the reconstructed acoustic signal is valid.
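A minimal MinHash sketch is shown below, under the assumption that the reconstructed and standard acoustic signals have already been discretized into sets of feature values; the hash family and signature length are illustrative choices, not taken from the patent:

```python
import random

def minhash_signature(items, n_hashes=64, seed=0):
    # MinHash signature of a set: for each hash function in the family,
    # keep the minimum hashed value over all items in the set.
    rng = random.Random(seed)
    prime = 2_147_483_647
    # Hypothetical hash family: h(x) = (a*x + b) mod prime.
    params = [(rng.randrange(1, prime), rng.randrange(prime))
              for _ in range(n_hashes)]
    return [min((a * hash(it) + b) % prime for it in items)
            for a, b in params]

def minhash_similarity(set_a, set_b, n_hashes=64):
    # Estimated Jaccard similarity: fraction of signature slots where
    # the two sets produced the same minimum hash value.
    sa = minhash_signature(set_a, n_hashes)
    sb = minhash_signature(set_b, n_hashes)
    return sum(x == y for x, y in zip(sa, sb)) / n_hashes
```

Comparing the estimate against the preset threshold then decides whether the reconstructed signal is close enough to the standard signal to be considered valid.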
Example four
Fig. 4 is a schematic structural diagram of a transformer signal voiceprint feature extraction apparatus according to a fourth embodiment of the present invention, as shown in fig. 4, the apparatus includes a processor 40, a memory 41, an input device 42, and an output device 43; the number of processors 40 in the device may be one or more, and one processor 40 is taken as an example in fig. 4; the processor 40, the memory 41, the input device 42 and the output device 43 in the transformer signal voiceprint feature extraction apparatus can be connected by a bus or other means, and the bus connection is taken as an example in fig. 4.
The memory 41 is a computer-readable storage medium, and can be used to store software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the transformer signal voiceprint feature extraction method in the embodiment of the present invention (for example, the signal reconstruction module 310, the signal selection module 320, the frequency acquisition module 330, the frequency determination module 340, and the feature extraction module 350). The processor 40 executes various functional applications and data processing of the device by running software programs, instructions and modules stored in the memory 41, that is, implements the transformer signal voiceprint feature extraction method described above.
The memory 41 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 41 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, memory 41 may further include memory located remotely from processor 40, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 42 is operable to receive input numeric or character information and to generate key signal inputs relating to user settings and function controls of the apparatus. The output device 43 may include a display device such as a display screen.
Example five
An embodiment of the present invention further provides a storage medium containing computer-executable instructions, where the computer-executable instructions are executed by a computer processor to perform a transformer signal voiceprint feature extraction method, where the method includes:
acquiring an operation state acoustic signal of the transformer, and reconstructing the acoustic signal to obtain a reconstructed acoustic signal;
selecting one frame in the reconstructed acoustic signal as a reference frame;
acquiring at least one adjacent frame neighboring the reference frame to obtain a frequency value of at least one adjacent-frame acoustic signal;
processing the frequency value of the at least one adjacent-frame acoustic signal, and determining the average of the processed frequency values as the frequency value of the reference frame;
and calculating the voiceprint feature parameters of the reference frame according to the frequency value of the reference frame.
Of course, the storage medium containing computer-executable instructions provided by the embodiments of the present invention is not limited to the method operations described above, and may also perform related operations of the transformer signal voiceprint feature extraction method provided by any embodiment of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the embodiment of the above extraction apparatus, the included units and modules are divided merely according to functional logic, and the division is not limited thereto as long as the corresponding functions can be implemented; in addition, the specific names of the functional units are only for convenience of distinguishing them from each other and are not intended to limit the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A transformer signal voiceprint feature extraction method is characterized by comprising the following steps:
acquiring an operation state acoustic signal of the transformer, and reconstructing the acoustic signal to obtain a reconstructed acoustic signal;
selecting one frame in the reconstructed acoustic signal as a reference frame;
acquiring at least one adjacent frame neighboring the reference frame to obtain a frequency value of at least one adjacent-frame acoustic signal;
processing the frequency value of the at least one adjacent-frame acoustic signal, and determining the average of the processed frequency values as the frequency value of the reference frame;
and calculating the voiceprint feature parameters of the reference frame according to the frequency value of the reference frame.
2. The transformer signal voiceprint feature extraction method according to claim 1, wherein reconstructing the acoustic signal to obtain a reconstructed acoustic signal comprises:
performing a short-time Fourier transform on the acoustic signal after windowing and framing with a Hamming window;
performing underdetermined blind source separation on the transformed acoustic signal to obtain a denoised signal;
performing four-layer intrinsic mode function decomposition on the denoised signal by a variational mode decomposition method;
for the decomposed acoustic signal, obtaining the reconstructed acoustic signal from the filtered high-frequency component, the filtered low-frequency component, and the intermediate-frequency component of the intrinsic mode functions;
wherein the filtered high-frequency component and the filtered low-frequency component of the intrinsic mode functions are obtained by filtering with the 3σ principle based on a six-layer wavelet decomposition.
3. The method according to claim 1, wherein before selecting a frame in the reconstructed acoustic signal as a reference frame, the method further comprises:
judging whether the reconstructed acoustic signal is effective or not;
and if the reconstructed acoustic signal is invalid, acquiring and reconstructing the acoustic signal again to obtain a new reconstructed acoustic signal.
4. The transformer signal voiceprint feature extraction method according to claim 3, wherein judging whether the reconstructed acoustic signal is valid comprises:
calculating the similarity between the reconstructed acoustic signal and a preset standard acoustic signal using the MinHash algorithm;
judging whether the similarity exceeds a preset threshold;
and if the similarity exceeds the preset threshold, determining that the reconstructed acoustic signal is valid.
5. The method according to claim 1, wherein before selecting a frame in the reconstructed acoustic signal as a reference frame, the method further comprises:
and dividing the reconstructed acoustic signal into a plurality of frames according to the preset frame time length and the preset frame shifting.
6. The transformer signal voiceprint feature extraction method according to claim 1, wherein processing the frequency value of the at least one adjacent-frame acoustic signal and determining the average of the processed frequency values as the frequency value of the reference frame comprises:
respectively setting corresponding sampling conditions for each adjacent frame of acoustic signals;
and acquiring the frequency value of each adjacent frame acoustic signal according to the sampling condition corresponding to each adjacent frame acoustic signal.
7. The transformer signal voiceprint feature extraction method according to claim 6, wherein the sampling condition is set according to a time-length offset and the number of adjacent frames.
8. A transformer signal voiceprint feature extraction device, comprising:
the signal reconstruction module is used for acquiring an acoustic signal of the transformer in an operating state and reconstructing the acoustic signal to obtain a reconstructed acoustic signal;
the signal selection module is used for selecting one frame in the reconstructed acoustic signal as a reference frame;
the frequency acquisition module is used for acquiring at least one adjacent frame neighboring the reference frame to obtain a frequency value of at least one adjacent-frame acoustic signal;
the frequency determining module is used for processing the frequency value of the at least one adjacent-frame acoustic signal and determining the average of the processed frequency values as the frequency value of the reference frame;
and the feature extraction module is used for calculating the voiceprint feature parameters of the reference frame according to the frequency value of the reference frame.
9. A transformer signal voiceprint feature extraction apparatus, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the transformer signal voiceprint feature extraction method of any one of claims 1 to 7.
10. A storage medium containing computer executable instructions for performing the transformer signal voiceprint feature extraction method of any one of claims 1 to 7 when executed by a computer processor.
CN202111280080.7A 2021-10-28 2021-10-28 Method, device, equipment and medium for extracting voiceprint features of transformer signals Pending CN114038470A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111280080.7A CN114038470A (en) 2021-10-28 2021-10-28 Method, device, equipment and medium for extracting voiceprint features of transformer signals

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111280080.7A CN114038470A (en) 2021-10-28 2021-10-28 Method, device, equipment and medium for extracting voiceprint features of transformer signals

Publications (1)

Publication Number Publication Date
CN114038470A true CN114038470A (en) 2022-02-11

Family

ID=80142348

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111280080.7A Pending CN114038470A (en) 2021-10-28 2021-10-28 Method, device, equipment and medium for extracting voiceprint features of transformer signals

Country Status (1)

Country Link
CN (1) CN114038470A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114659620A (en) * 2022-03-28 2022-06-24 广东电网有限责任公司 Acoustic signal extraction method and system of circuit breaker
CN114659620B (en) * 2022-03-28 2024-01-26 广东电网有限责任公司 Sound signal extraction method and system of circuit breaker
CN115240691A (en) * 2022-09-23 2022-10-25 山西振中电力股份有限公司 Substation equipment running state monitoring control system based on data analysis
CN115240691B (en) * 2022-09-23 2022-12-06 山西振中电力股份有限公司 Substation equipment running state monitoring control system based on data analysis

Similar Documents

Publication Publication Date Title
CN114038470A (en) Method, device, equipment and medium for extracting voiceprint features of transformer signals
US20230123117A1 (en) Method and Apparatus for Inspecting Wind Turbine Blade, And Device And Storage Medium Thereof
CN110490071A (en) A kind of substation's Abstraction of Sound Signal Characteristics based on MFCC
CN106885971B (en) Intelligent background noise reduction method for cable fault detection pointing instrument
CN112946442B (en) Switch cabinet partial discharge detection method, terminal equipment and storage medium
US20230326468A1 (en) Audio processing of missing audio information
CN111077386A (en) Early fault signal noise reduction method for electrical equipment
CN111239565B (en) Oil-filled casing partial discharge pulse signal processing method and system based on layered denoising model
CN113421586A (en) Sleeptalking recognition method, device and electronic equipment
CN108594156B (en) Improved current transformer saturation characteristic identification method
CN115376526A (en) Power equipment fault detection method and system based on voiceprint recognition
CN108090270A (en) A kind of transient oscillation parameter identification method based on morphologic filtering and blind source separating
CN114441174B (en) Method, system, equipment and medium for diagnosing composite fault of rolling bearing
CN116631424A (en) Method, device and medium for separating and eliminating noise interference of acoustic environment of transformer substation
CN114462452B (en) Asynchronous motor rotor broken bar fault diagnosis method using successive variable mode decomposition algorithm
CN115762551A (en) Snore detection method and device, computer equipment and storage medium
CN115902528A (en) Direct-current traction network oscillation and short-circuit fault identification method
CN115774199A (en) Fault diagnosis method and system for new energy battery, electronic device and storage medium
CN115881142A (en) Training method and device for bone conduction speech coding model and storage medium
CN113030570A (en) Harmonic electric energy detection method and adaptive filter
CN118425707B (en) Self-adaptive denoising method, system, equipment and medium for partial discharge detection
CN111781439B (en) Power cable partial discharge signal detection method and device
CN114019321B (en) DC series arc fault detection method and device based on normalized standard deviation and wavelet entropy
CN117496995A (en) Power transformer fault detection method based on voiceprint feature screening
CN113190788A (en) Method and device for adaptively extracting and reducing bus characteristics of power distribution system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination