WO2017161829A1

WO2017161829A1 - Voice signal information processing method and device

Info

Publication number: WO2017161829A1
Application number: PCT/CN2016/096988
Authority: WO
Inventors: 王永庆
Original assignee: 乐视控股（北京）有限公司; 乐视致新电子科技（天津）有限公司
Priority date: 2016-03-25
Filing date: 2016-08-26
Publication date: 2017-09-28
Also published as: CN105845138A

Abstract

A voice signal processing method and device. The method comprises: receiving a voice signal, wherein the voice signal comprises at least one voice segment (S101); acquiring signal loss information about the at least one voice segment (S102); determining a signal loss level of the voice signal according to the signal loss information about the at least one voice segment (S103), and performing voice recognition processing on the voice signal according to the signal loss level of the voice signal (S104). In the present method, a corresponding processing means is adopted according to a signal loss level of a voice signal, thus facilitating improvement of the accuracy of voice signal recognition.

Description

Speech signal processing method and device

The present application claims priority to Chinese Patent Application No. 201610179999.X, filed on March 25, 2016, the entire disclosure of which is hereby incorporated by reference.

Technical field

The present invention relates to the field of voice recognition technology, and in particular, to a voice signal processing method and apparatus.

Background art

With the development of smart TV technology, voice TV services have emerged, allowing users to interact with the television by voice. To support the voice TV service, the voice remote controller has appeared as an extension of the conventional remote controller, and the user interacts with the television through it.

Specifically, the voice remote controller records the user's voice to generate an analog voice signal, performs analog-to-digital conversion on it to obtain a digital voice signal, and transmits the digital voice signal to the television terminal. The television terminal recognizes the digital voice signal and performs the corresponding operation according to the recognition result, thereby realizing human-computer interaction.

In the prior art, a wireless transmission technology in the 2.4 GHz band, such as Wi-Fi or Bluetooth, is mainly used between the voice remote controller and the television terminal. Because such technologies are susceptible to interference from external factors, signal loss may occur while the voice signal is being transmitted, which reduces the accuracy of voice recognition and degrades the user experience.

Summary of the invention

The invention provides a speech signal processing method and device for performing speech recognition and improving the accuracy of speech signal recognition.

The embodiment of the invention provides a voice signal processing method, including:

Receiving a voice signal, the voice signal including at least one voice segment;

Obtaining signal loss information of the at least one voice segment;

Determining a signal loss degree of the voice signal according to signal loss information of the at least one voice segment;

Performing speech recognition processing on the voice signal according to the signal loss degree of the voice signal.

An embodiment of the present invention provides a voice signal processing apparatus, including:

a receiving module, configured to receive a voice signal, where the voice signal includes at least one voice segment;

an acquiring module, configured to acquire signal loss information of the at least one voice segment;

a determining module, configured to determine a signal loss degree of the voice signal according to the signal loss information of the at least one voice segment;

and a processing module, configured to perform voice recognition processing on the voice signal according to the signal loss degree of the voice signal.

Embodiments of the present invention also provide a non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer-executable instructions for executing the above-described voice signal processing method.

An embodiment of the present invention further provides an electronic device, including: one or more processors; and a memory; wherein the memory stores instructions executable by the one or more processors, the instructions being configured to perform the above-described voice signal processing method.

Embodiments of the present invention also provide a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to execute the above-described speech signal processing method.

The voice signal processing method and device provided by the embodiments of the present invention obtain signal loss information of each voice segment included in a voice signal, determine the signal loss degree of the voice signal according to the signal loss information of each voice segment, and perform speech recognition processing on the voice signal according to that signal loss degree. The embodiments of the invention thus take into account the influence of signal loss on the subsequent processing of the voice signal and adopt a processing manner matched to the signal loss degree, which helps improve the accuracy of voice signal recognition.

Brief description of the drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a voice signal processing method according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of a voice signal processing apparatus according to another embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a voice signal processing apparatus according to another embodiment of the present invention;

FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

FIG. 1 is a schematic flowchart diagram of a voice signal processing method according to an embodiment of the present invention. As shown in Figure 1, the method includes:

101. Receive a voice signal, the voice signal including at least one voice segment.

102. Acquire signal loss information of at least one voice segment.

103. Determine a signal loss degree of the voice signal according to the signal loss information of the at least one voice segment.

104. Perform speech recognition processing on the speech signal according to the signal loss degree of the speech signal.

The embodiment provides a voice signal processing method, which can be executed by a voice signal processing device to improve the accuracy of voice signal recognition.

The method provided in this embodiment is applicable to any application scenario that requires voice signal recognition, and in particular to scenarios in which the voice signal is transmitted over a wireless technology in the 2.4 GHz band, such as Wi-Fi or Bluetooth. Because Wi-Fi, Bluetooth, and similar technologies are susceptible to interference from external factors, signal loss is more likely to occur while the voice signal is being transmitted, so the method is especially suited to such scenarios. For example, in a voice television service, the voice signal processing device may be implemented on the television terminal or on the server side corresponding to the television terminal, so that the voice signal sent by the voice remote controller is recognized using the method of this embodiment and the accuracy of voice recognition is improved.

The principle and flow of the method in this embodiment are described in detail below.

Specifically, the voice signal processing device receives the voice signal. For example, it can receive a voice signal transmitted by a voice signal collecting device (e.g., a voice remote controller or a smartphone) in each application scenario. The voice signal collecting device collects an analog voice signal, performs analog-to-digital conversion on it, and sends the converted voice signal to the voice signal processing device.

Optionally, the voice signal collecting device may also encode, compress, or otherwise process the voice signal before transmitting it to the voice signal processing device. If the received voice signal has been encoded and compressed, the voice signal processing device decompresses and decodes it after reception.

Because a voice signal is a short-term stationary signal, the voice signal processing device can divide it into at least one voice segment. The segmentation can be implemented by weighting the signal with a movable window of finite length. Each voice segment includes multiple signal points. This embodiment does not limit the length of a voice segment, which is given by the number of signal points the segment contains and can be set according to the application scenario, for example to 256 or 1024.

In addition, adjacent voice segments may be contiguous or overlapping. Preferably, overlapping segmentation is used, that is, each voice segment shares an overlapping portion with the next one, which ensures a smooth transition between segments and maintains continuity.
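The segmentation described above can be sketched as follows. This is only an illustration under stated assumptions: the source does not name a window function, so a Hamming window is used here as one common choice, and the names `hamming`, `split_into_segments`, `seg_len`, and `hop` are hypothetical.

```python
import math
from typing import List


def hamming(n: int) -> List[float]:
    """Hamming window of n points (an assumed choice; the source names no window)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def split_into_segments(signal: List[float], seg_len: int = 256, hop: int = 128) -> List[List[float]]:
    """Slide a window of seg_len points along the signal in steps of hop points.

    hop == seg_len yields contiguous segments; hop < seg_len yields overlapping ones.
    A trailing remainder shorter than seg_len is dropped in this sketch.
    """
    if seg_len <= 0 or hop <= 0:
        raise ValueError("seg_len and hop must be positive")
    window = hamming(seg_len)
    segments = []
    for start in range(0, len(signal) - seg_len + 1, hop):
        frame = signal[start:start + seg_len]
        # Weight each signal point by the corresponding window coefficient.
        segments.append([x * w for x, w in zip(frame, window)])
    return segments
```

With `hop` set to half of `seg_len`, each segment shares half of its points with the next one, which corresponds to the preferred overlapping segmentation.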

After dividing the voice signal into at least one voice segment, the voice signal processing device may acquire signal loss information of the at least one voice segment. The signal loss information of a voice segment mainly includes information that reflects the loss of signal points in the segment, such as which signal points are lost and the number of consecutively lost signal points. For convenience of description, this embodiment regards a run of consecutively lost signal points as a segment, called a signal loss segment, and takes the number of consecutively lost signal points it contains as the length of the signal loss segment.

Based on the foregoing, in an optional implementation, for each of the at least one voice segment, the voice signal processing device may multiply the amplitudes of every two adjacent signal points in the voice segment, take the adjacent signal points whose product is greater than or equal to 0 as the lost signal points of the voice segment (such points may also be called non-zero-crossing points), and count the length of each signal loss segment formed by consecutively lost signal points. Note that a voice segment can include one or more signal loss segments. For example, suppose a voice segment includes 200 signal points, in which the 20th to 40th signal points are all lost, forming a signal loss segment of length 21, and the 80th to 120th signal points are all lost, forming another signal loss segment of length 41.
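The counting of signal loss segments in the 200-point example can be sketched as a run-length scan. Note the simplification: instead of the adjacent-amplitude product test, this sketch assumes that lost signal points arrive zero-filled and marks a point as lost when its amplitude is exactly 0. The names `lost_mask` and `loss_runs` are hypothetical.

```python
from typing import List, Tuple


def lost_mask(segment: List[float]) -> List[bool]:
    """Mark lost points. Assumption: lost points are zero-filled by the receiver."""
    return [x == 0 for x in segment]


def loss_runs(mask: List[bool]) -> List[Tuple[int, int]]:
    """Return (start_index, length) for every run of consecutively lost points."""
    runs = []
    start = None
    for i, lost in enumerate(mask):
        if lost and start is None:
            start = i                      # a new signal loss segment begins
        elif not lost and start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:                  # run reaches the end of the segment
        runs.append((start, len(mask) - start))
    return runs


# The example from the text: 200 points, the 20th-40th and 80th-120th are lost.
segment = [1.0 if i % 2 == 0 else -1.0 for i in range(200)]
for i in list(range(19, 40)) + list(range(79, 120)):   # 0-based indices
    segment[i] = 0.0
runs = loss_runs(lost_mask(segment))
print(runs)  # [(19, 21), (79, 41)]
```

The two reported lengths, 21 and 41, match the signal loss segments in the example.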

Since the voice signal is divided into at least one voice segment, the signal loss information of the at least one voice segment reflects the signal loss condition of the voice signal as a whole. Therefore, after acquiring the signal loss information of the voice segments, the voice signal processing device can determine the signal loss degree of the voice signal according to that information. The signal loss degree reflects the extent to which the voice signal is lost, for example zero-degree loss (no loss), slight loss, or heavy loss.

In an optional implementation, the voice signal processing apparatus may count the number of lost voice segments among the at least one voice segment according to the signal loss information, compare that number with a preset segment number threshold, and determine the signal loss degree of the voice signal according to the comparison result.

Here, a lost voice segment is a voice segment in which signal points have been lost and the lost signal points satisfy a specified condition. That is, when the signal loss information of a voice segment shows that signal points have been lost in the segment and that the lost signal points satisfy the specified condition, the segment is determined to be a lost voice segment. The specified condition may be that the total number of lost signal points is greater than a first specified number; for example, if the first specified number is 100, the voice signal processing device identifies a voice segment whose total number of lost signal points is greater than 100 as a lost voice segment. Alternatively, the specified condition may be that the number of consecutively lost signal points is greater than a second specified number; for example, if the second specified number is 60, the device identifies a voice segment in which more than 60 consecutive signal points are lost as a lost voice segment.
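A minimal sketch of the lost-voice-segment test follows. The text presents the two conditions as alternatives; combining them with a logical OR, as done here, is an assumption, as are the names `longest_run` and `is_lost_segment`. The default values 100 and 60 are the example values from the text.

```python
from typing import List


def longest_run(mask: List[bool]) -> int:
    """Length of the longest run of consecutively lost points in the mask."""
    best = current = 0
    for lost in mask:
        current = current + 1 if lost else 0
        best = max(best, current)
    return best


def is_lost_segment(mask: List[bool], first_number: int = 100, second_number: int = 60) -> bool:
    """True if the segment counts as a lost voice segment.

    mask[i] is True when signal point i is lost. Either condition suffices here
    (an assumption; an implementation could also apply only one of them).
    """
    total_lost = sum(mask)
    return total_lost > first_number or longest_run(mask) > second_number
```

For a 256-point segment, 61 consecutively lost points are enough to mark it as lost, while 60 are not.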

It is to be noted that this embodiment does not limit the value of the segment number threshold, which may be set according to the application scenario.

Further, if there is one segment number threshold, the voice signal processing device compares the number of lost voice segments with it. If the number of lost voice segments is greater than the threshold, the voice signal is determined to have heavy signal loss, meaning that signal point loss is severe. If the number is less than or equal to the threshold but not 0, the voice signal is determined to have slight signal loss, meaning that signal point loss is relatively minor. If the number is 0, the voice signal is determined to have zero-degree signal loss, meaning that no signal loss has occurred. For example, suppose the segment number threshold is 10 and the voice signal is divided into 60 voice segments. If more than 10 of the 60 voice segments are lost voice segments, the voice signal has heavy signal loss; if some of the 60 are lost voice segments but no more than 10, it has slight signal loss; if none of the 60 is a lost voice segment, it has zero-degree signal loss.
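The single-threshold rule can be written down directly. In this sketch the function name `loss_degree_by_count` and the string labels are hypothetical; the default threshold of 10 is the example value from the text.

```python
def loss_degree_by_count(lost_segments: int, segment_threshold: int = 10) -> str:
    """Map the number of lost voice segments to a signal loss degree."""
    if lost_segments == 0:
        return "zero"      # no lost voice segment at all
    if lost_segments <= segment_threshold:
        return "slight"    # some loss, but within the threshold
    return "heavy"         # more lost voice segments than the threshold allows
```

With 60 voice segments and a threshold of 10, a count of 10 lost segments still gives slight loss, and 11 gives heavy loss.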

Further optionally, the segment number threshold includes a first segment threshold and a second segment threshold, and the second segment threshold is greater than the first segment threshold. Based on this, the voice signal processing apparatus may compare the number of lost voice segments with the first segment threshold and the second segment threshold respectively. If the number of lost voice segments is less than or equal to the first segment threshold, the voice signal is determined to have zero-degree signal loss; if the number is greater than the first segment threshold but less than or equal to the second segment threshold, the voice signal is determined to have slight signal loss; if the number is greater than the second segment threshold, the voice signal is determined to have heavy signal loss.

It is noted herein that the segment number threshold may further include a plurality of segment number thresholds, thereby dividing the signal loss condition of the speech signal into more different signal loss degrees, such as one degree, two degrees, three degrees, four degrees, five degrees, and the like.
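One way to generalize the two-threshold rule to any number of thresholds is a sorted lookup. This is a sketch, not the method prescribed by the text: the name `loss_level` and the threshold values used in the comments are hypothetical, and Python's standard `bisect_left` is used because it returns the number of thresholds strictly below the count, which matches the "less than or equal to" boundaries above.

```python
from bisect import bisect_left
from typing import Sequence


def loss_level(lost_segments: int, thresholds: Sequence[int]) -> int:
    """Return a loss level in 0..len(thresholds).

    thresholds must be sorted in strictly ascending order. A count that is less
    than or equal to thresholds[0] gives level 0, a count above thresholds[k-1]
    but not above thresholds[k] gives level k, and so on.
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly ascending")
    return bisect_left(thresholds, lost_segments)


# With two thresholds (3, 10): level 0 = zero-degree, 1 = slight, 2 = heavy.
# With five thresholds the same function yields levels 0 through 5.
```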

In another optional implementation, the voice signal processing apparatus may count, according to the signal loss information of the at least one voice segment, the length of the signal loss segment formed by consecutively lost signal points in each voice segment, and determine the signal loss degree of the voice signal according to the result of comparing that length with a preset point threshold.

Wherein, the above-mentioned signal loss segment refers to a segment formed by consecutively lost signal points in the speech segment. The length of the signal loss segment refers to the number of consecutive lost signal points included in the signal loss segment.

If one voice segment contains a long run of consecutively lost signal points, the voice signal cannot be recognized correctly even if the other voice segments have little or no signal loss. The voice signal processing device can therefore count the length of the signal loss segment in each voice segment according to the signal loss information, compare each length with the preset point threshold, and determine the signal loss degree of the voice signal from the comparison results.

Further optionally, if there is one point threshold, the voice signal processing apparatus compares the length of the signal loss segment in each voice segment with it. If any voice segment contains a signal loss segment longer than the point threshold, the voice signal is determined to have heavy signal loss, meaning that signal point loss is severe; conversely, if no voice segment contains a signal loss segment longer than the point threshold, the voice signal is determined to have slight signal loss.

Further, optionally, the point threshold includes a first point threshold and a second point threshold, and the second point threshold is greater than the first point threshold. Based on this, the voice signal processing apparatus may compare the length of the signal loss segment in each voice segment with the first point threshold and the second point threshold respectively. If no voice segment among the at least one voice segment contains a signal loss segment longer than the first point threshold, the voice signal is determined to have zero-degree signal loss; if some voice segment contains a signal loss segment longer than the first point threshold but none contains one longer than the second point threshold, the voice signal is determined to have slight signal loss; if some voice segment contains a signal loss segment longer than the second point threshold, the voice signal is determined to have heavy signal loss.
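Because each condition above only asks whether some voice segment exceeds a threshold, the decision reduces to comparing the longest signal loss segment over all voice segments with the two point thresholds. A minimal sketch, in which the function name and the threshold values 20 and 60 are hypothetical:

```python
from typing import Iterable


def loss_degree_by_run_length(
    run_lengths: Iterable[int],
    first_point_threshold: int = 20,
    second_point_threshold: int = 60,
) -> str:
    """Classify by the longest signal loss segment found in any voice segment.

    run_lengths holds, for each voice segment, the length of its longest
    signal loss segment (0 if the segment lost nothing).
    """
    worst = max(run_lengths, default=0)
    if worst > second_point_threshold:
        return "heavy"
    if worst > first_point_threshold:
        return "slight"
    return "zero"
```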

In still another optional implementation, the voice signal processing apparatus may determine the signal loss degree of the voice signal based on both the result of comparing the number of lost voice segments with the preset segment number threshold and the result of comparing the length of the signal loss segment in each voice segment with the preset point threshold.

For example, suppose the segment number threshold includes a first segment threshold and a second segment threshold, and the point threshold includes a first point threshold and a second point threshold. If the number of lost voice segments is greater than the second segment threshold and some voice segment contains a signal loss segment longer than the second point threshold, the voice signal is determined to have heavy signal loss. If the number of lost voice segments is less than or equal to the first segment threshold and no voice segment contains a signal loss segment longer than the first point threshold, the voice signal is determined to have zero-degree signal loss. In all other cases, the voice signal is determined to have slight signal loss.
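The combined rule of this example can be sketched as below. The function name, parameter names, and default threshold values are hypothetical; only the comparison logic follows the text.

```python
from typing import Tuple


def combined_loss_degree(
    lost_segments: int,
    worst_run: int,
    segment_thresholds: Tuple[int, int] = (3, 10),
    point_thresholds: Tuple[int, int] = (20, 60),
) -> str:
    """Combine the segment-count test and the run-length test.

    lost_segments: number of lost voice segments.
    worst_run: longest signal loss segment found in any voice segment.
    """
    first_seg, second_seg = segment_thresholds
    first_pt, second_pt = point_thresholds
    if lost_segments > second_seg and worst_run > second_pt:
        return "heavy"     # both tests report the most severe level
    if lost_segments <= first_seg and worst_run <= first_pt:
        return "zero"      # both tests report the mildest level
    return "slight"        # every other combination
```

Under this rule a single long run with few lost segments, or many lost segments with only short runs, is classified as slight rather than heavy.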

It should be noted that, based on the implementations described above for determining the signal loss degree of the voice signal, those skilled in the art can readily conceive of similar extensions. These extensions all fall within the scope of protection of the present invention and are not described one by one here.

After determining the signal loss degree of the speech signal, the speech signal processing device may perform speech recognition processing on the speech signal according to the signal loss degree of the speech signal.

If the voice signal is determined to have zero-degree signal loss, that is, no signal loss has occurred, the voice signal processing device can perform speech recognition processing on the voice signal directly, which preserves both the efficiency and the accuracy of speech recognition and improves the user experience.

If the voice signal is determined to have slight signal loss, the signal has been lost but not severely, and remains within the range that can be recognized correctly. In this case, for each lost voice segment among the at least one voice segment, the voice signal processing apparatus uses the signal points that are not lost in the lost voice segment to compensate for its lost signal points, and then performs speech recognition processing on the compensated voice segments, which preserves the accuracy of speech recognition and improves the user experience.

It should be noted that this embodiment does not limit the specific manner in which the non-lost signal points of a lost voice segment are used to compensate for its lost signal points. Preferably, the signal points of the whole lost voice segment are divided into two parts, and the lost signal points in each part are compensated using the non-lost signal points in the same part. Because the signal points within a part are relatively close to one another, compensating with nearby signal points brings the compensated voice signal closer to the original, undamaged signal, and performing speech recognition on the compensated signal helps improve recognition accuracy.

Further, in the above signal compensation process, each lost signal point is preferably compensated using the non-lost signal point closest to it.

In the above signal compensation process, the number of non-lost signal points in each part is generally greater than the number of lost signal points, so the lost signal points in each part can be fully compensated. In some special cases, however, a part may contain more lost signal points than non-lost ones. In that case only some of the lost signal points in the part can be compensated from within the part, and the remaining ones can be compensated one by one using the closest signal points in the other part.
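The compensation scheme can be sketched as a nearest-neighbor fill within each half of the lost voice segment. This sketch simplifies the description in two ways, both assumptions: a non-lost signal point may be reused for several lost points, and the other half is consulted only when a half contains no non-lost point at all. The names `nearest_intact` and `compensate` are hypothetical.

```python
from typing import List, Optional


def nearest_intact(mask: List[bool], i: int, lo: int, hi: int) -> Optional[int]:
    """Index of the non-lost point in [lo, hi) closest to i, or None if there is none."""
    candidates = [j for j in range(lo, hi) if not mask[j]]
    if not candidates:
        return None
    return min(candidates, key=lambda j: abs(j - i))


def compensate(segment: List[float], mask: List[bool]) -> List[float]:
    """Fill each lost point with the value of the nearest non-lost point.

    mask[i] is True when signal point i is lost. The segment is split into two
    halves and each lost point is filled from its own half where possible.
    """
    n = len(segment)
    mid = n // 2
    out = list(segment)
    for i in range(n):
        if not mask[i]:
            continue
        lo, hi = (0, mid) if i < mid else (mid, n)
        j = nearest_intact(mask, i, lo, hi)
        if j is None:
            # The whole half is lost: borrow the closest point from the other half.
            j = nearest_intact(mask, i, 0, n)
        if j is not None:
            out[i] = segment[j]
    return out
```

For the segment `[1, 2, 0, 0, 5, 6, 7, 8]` with the third and fourth points lost, both are filled from the second point of the same half, giving `[1, 2, 2, 2, 5, 6, 7, 8]`.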

If the voice signal is determined to have heavy signal loss, the loss is severe and exceeds the range that can be recognized correctly. The voice signal processing device therefore outputs prompt information to the user indicating that the voice signal cannot be recognized because of heavy signal loss. The prompt information may be output as text or voice, for example as a text prompt on the interactive interface or as a voice prompt. Based on the prompt information, the user can take corresponding measures in time, for example re-inputting the voice signal, so as to obtain the required voice service promptly, which improves the user experience.
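The three processing branches can be summarized as a small dispatch function. Everything here is a hypothetical sketch: `recognize`, `compensate`, and `prompt_user` are placeholder callables supplied by the caller, the string labels are illustrative, and `compensate` is assumed to return segments without loss unchanged.

```python
from typing import Callable, List, Optional

Segment = List[float]


def process_by_loss_degree(
    segments: List[Segment],
    degree: str,
    recognize: Callable[[List[Segment]], str],
    compensate: Callable[[Segment], Segment],
    prompt_user: Callable[[str], None],
) -> Optional[str]:
    """Choose the processing path that matches the signal loss degree."""
    if degree == "zero":
        # No loss: recognize the signal as received.
        return recognize(segments)
    if degree == "slight":
        # Repair lost points first, then recognize the compensated segments.
        return recognize([compensate(seg) for seg in segments])
    if degree == "heavy":
        # Too damaged to recognize: tell the user instead of guessing.
        prompt_user("The voice signal could not be recognized because of heavy signal loss. Please speak again.")
        return None
    raise ValueError(f"unknown loss degree: {degree!r}")
```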

FIG. 2 is a schematic structural diagram of a voice signal processing apparatus according to another embodiment of the present invention. As shown in FIG. 2, the apparatus includes: a receiving module 21, an obtaining module 22, a determining module 23, and a processing module 24.

The receiving module 21 is configured to receive a voice signal, where the voice signal includes at least one voice segment.

The obtaining module 22 is configured to acquire signal loss information of at least one voice segment.

The determining module 23 is configured to determine a signal loss degree of the voice signal according to the signal loss information of the at least one voice segment.

The processing module 24 is configured to perform voice recognition processing on the voice signal according to the signal loss degree of the voice signal.

In an optional implementation, the obtaining module 22 is specifically configured to:

For each of the at least one voice segment, multiply the amplitudes of every two adjacent signal points in the voice segment, take the adjacent signal points whose product is greater than or equal to 0 as the lost signal points of the voice segment, and count the length of the signal loss segment formed by consecutively lost signal points in the voice segment.

In an optional implementation, as shown in FIG. 3, an implementation structure of the determining module 23 includes at least one of a first determining unit 231 and a second determining unit 232.

The first determining unit 231 is configured to count, according to the signal loss information of the at least one voice segment, the number of lost voice segments among the at least one voice segment, and to determine the signal loss degree of the voice signal according to the result of comparing the number of lost voice segments with the preset segment number threshold.

The second determining unit 232 is configured to count, according to the signal loss information of the at least one voice segment, the length of the signal loss segment formed by consecutively lost signal points in each voice segment, and to determine the signal loss degree of the voice signal according to the result of comparing the length of the signal loss segment in each voice segment with the preset point threshold.

Further, optionally, the segment number threshold includes a first segment threshold and a second segment threshold greater than the first segment threshold; correspondingly, the point threshold includes a first point threshold and a second point threshold greater than the first point threshold.

Based on the above, the first determining unit 231 is specifically configured to:

If the number of lost voice segments is less than or equal to the first segment threshold, determine that the voice signal has zero-degree signal loss;

If the number of lost voice segments is greater than the first segment threshold but less than or equal to the second segment threshold, determine that the voice signal has slight signal loss;

If the number of lost voice segments is greater than the second segment threshold, determine that the voice signal has heavy signal loss.

Correspondingly, the second determining unit 232 is specifically configured to:

If no voice segment among the at least one voice segment contains a signal loss segment longer than the first point threshold, determine that the voice signal has zero-degree signal loss;

If some voice segment among the at least one voice segment contains a signal loss segment longer than the first point threshold but none contains a signal loss segment longer than the second point threshold, determine that the voice signal has slight signal loss;

If some voice segment among the at least one voice segment contains a signal loss segment longer than the second point threshold, determine that the voice signal has heavy signal loss.

In an optional implementation, the processing module 24 is specifically configured to:

If the voice signal has zero-degree signal loss, perform speech recognition processing on the voice signal directly;

If the voice signal has slight signal loss, then for each lost voice segment among the at least one voice segment, compensate for the lost signal points in the lost voice segment using the signal points that are not lost in it, and perform speech recognition processing on the compensated voice segments;

If the voice signal has heavy signal loss, output prompt information to the user indicating that the voice signal cannot be recognized because of heavy signal loss.

The voice signal processing apparatus provided in this embodiment fully considers the influence of signal loss on the subsequent processing of the voice signal, and can adopt a corresponding processing manner according to the signal loss degree of the voice signal, which is beneficial to improve the accuracy of the voice signal recognition.

The embodiment of the present application further provides a non-transitory computer-readable storage medium storing computer-executable instructions that can execute the voice signal processing method in any of the above method embodiments.

FIG. 4 is a schematic structural diagram of the hardware of an electronic device for performing a voice signal processing method according to an embodiment of the present disclosure. As shown in FIG. 4, the device includes:

One or more processors 410 and a memory 420; one processor 410 is taken as an example in FIG. 4.

The apparatus for performing the voice signal processing method may further include: an input device 430 and an output device 440.

The processor 410, the memory 420, the input device 430, and the output device 440 may be connected by a bus or by other means; a bus connection is taken as an example in FIG. 4.

The memory 420 is a non-volatile computer-readable storage medium and can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the voice signal processing method in the embodiment of the present application (for example, the receiving module 21, the obtaining module 22, the determining module 23, and the processing module 24 shown in FIG. 2). By running the non-volatile software programs, instructions, and modules stored in the memory 420, the processor 410 executes the various functional applications and data processing of the electronic device, that is, implements the voice signal processing method of the above method embodiment.

The memory 420 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application required for at least one function, and the storage data area may store data created according to use of the voice signal processing device, and the like. Further, the memory 420 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 420 may optionally include memory located remotely from the processor 410, and such remote memory may be connected to the voice signal processing device over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device 430 can receive input numeric or character information and generate key signal inputs related to user settings and function control of the voice signal processing device. The output device 440 can include a display device such as a display screen.

The one or more modules are stored in the memory 420 and, when executed by the one or more processors 410, perform the voice signal processing method in any of the above method embodiments.

The above product can perform the methods provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to performing those methods. For technical details not described in detail in this embodiment, reference may be made to the methods provided by the embodiments of the present application.

The electronic device of the embodiments of the present invention may take various forms, including but not limited to:

(1) Mobile communication devices: These devices are characterized by mobile communication functions, and their main purpose is to provide voice and data communication. Such terminals include smart phones (such as the iPhone), multimedia phones, feature phones, and low-end phones.

(2) Ultra-mobile personal computer equipment: This type of equipment belongs to the category of personal computers, has computing and processing functions, and generally has mobile Internet access. Such terminals include: PDAs, MIDs, and UMPC devices, such as the iPad.

(3) Portable entertainment devices: These devices can display and play multimedia content. Such devices include audio and video players (such as the iPod), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.

(4) Servers: Devices that provide computing services. A server consists of a processor, a hard disk, a memory, a system bus, and so on. A server is similar in architecture to a general-purpose computer, but because it must provide highly reliable services, it has high requirements for processing power, stability, reliability, security, scalability, manageability, and other aspects.

(5) Other electronic devices with data interaction functions.

Finally, those skilled in the art can understand that all or part of the processes of the above embodiments can be completed by a computer program instructing the relevant hardware, and that the program can be stored in a non-volatile computer-readable storage medium. When executed, the program may include the flows of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.

From the description of the above embodiments, those skilled in the art can clearly understand that the various embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solutions may, in essence, be embodied in the form of a software product, which may be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk, or an optical disc, and which includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the various embodiments or in portions of the embodiments.

It should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and are not intended to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

A voice signal processing method, applied to an electronic device, comprising:

receiving a voice signal, the voice signal comprising at least one voice segment;

obtaining signal loss information of the at least one voice segment;

determining a signal loss degree of the voice signal according to the signal loss information of the at least one voice segment; and

performing voice recognition processing on the voice signal according to the signal loss degree of the voice signal.
The method according to claim 1, wherein the obtaining signal loss information of the at least one voice segment comprises:

for each voice segment of the at least one voice segment, multiplying the amplitudes of every two adjacent signal points in the voice segment, taking the adjacent signal points whose multiplication result is greater than or equal to 0 as lost signal points in the voice segment, and counting the length of the signal loss segment formed by consecutive lost signal points in the voice segment.
The method according to claim 1, wherein the determining a signal loss degree of the voice signal according to the signal loss information of the at least one voice segment comprises:

counting, according to the signal loss information of the at least one voice segment, the number of lost voice segments in the at least one voice segment, and determining the signal loss degree of the voice signal according to a comparison result between the number of lost voice segments and a preset segment number threshold; and/or

counting, according to the signal loss information of the at least one voice segment, the length of the signal loss segment formed by consecutive lost signal points in each voice segment, and determining the signal loss degree of the voice signal according to a comparison result between the length of the signal loss segment in each voice segment and a preset point threshold.
The method according to claim 3, wherein the segment number threshold comprises a first segment number threshold and a second segment number threshold greater than the first segment number threshold, and the point threshold comprises a first point threshold and a second point threshold greater than the first point threshold;

the determining the signal loss degree of the voice signal according to a comparison result between the number of lost voice segments and the preset segment number threshold comprises:

if the number of lost voice segments is less than or equal to the first segment number threshold, determining that the voice signal has zero signal loss;

if the number of lost voice segments is greater than the first segment number threshold but less than or equal to the second segment number threshold, determining that the voice signal has a slight signal loss;

if the number of lost voice segments is greater than the second segment number threshold, determining that the voice signal has a heavy signal loss;

the determining the signal loss degree of the voice signal according to a comparison result between the length of the signal loss segment in each voice segment and the preset point threshold comprises:

if the at least one voice segment contains no voice segment whose signal loss segment length is greater than the first point threshold, determining that the voice signal has zero signal loss;

if the at least one voice segment contains a voice segment whose signal loss segment length is greater than the first point threshold but contains no voice segment whose signal loss segment length is greater than the second point threshold, determining that the voice signal has a slight signal loss;

if the at least one voice segment contains a voice segment whose signal loss segment length is greater than the second point threshold, determining that the voice signal has a heavy signal loss.
The method according to any one of claims 1 to 4, wherein the performing voice recognition processing on the voice signal according to the signal loss degree of the voice signal comprises:

if the voice signal has zero signal loss, performing voice recognition processing directly on the voice signal;

if the voice signal has a slight signal loss, for each lost voice segment in the at least one voice segment, compensating for the lost signal points in the lost voice segment by using the lost signal points in the lost voice segment, and performing voice recognition processing on the compensated voice segment;

if the voice signal has a heavy signal loss, outputting prompt information to the user to indicate that the voice signal cannot be recognized because of the heavy signal loss.
A voice signal processing device, comprising:

a receiving module, configured to receive a voice signal, the voice signal comprising at least one voice segment;

an obtaining module, configured to obtain signal loss information of the at least one voice segment;

a determining module, configured to determine a signal loss degree of the voice signal according to the signal loss information of the at least one voice segment; and

a processing module, configured to perform voice recognition processing on the voice signal according to the signal loss degree of the voice signal.
The device according to claim 6, wherein the obtaining module is specifically configured to:

for each voice segment of the at least one voice segment, multiply the amplitudes of every two adjacent signal points in the voice segment, take the adjacent signal points whose multiplication result is greater than or equal to 0 as lost signal points in the voice segment, and count the length of the signal loss segment formed by consecutive lost signal points in the voice segment.
The device according to claim 6, wherein the determining module comprises:

a first determining unit, configured to count, according to the signal loss information of the at least one voice segment, the number of lost voice segments in the at least one voice segment, and to determine the signal loss degree of the voice signal according to a comparison result between the number of lost voice segments and a preset segment number threshold; and/or

a second determining unit, configured to count, according to the signal loss information of the at least one voice segment, the length of the signal loss segment formed by consecutive lost signal points in each voice segment, and to determine the signal loss degree of the voice signal according to a comparison result between the length of the signal loss segment in each voice segment and a preset point threshold.
The device according to claim 8, wherein the segment number threshold comprises a first segment number threshold and a second segment number threshold greater than the first segment number threshold, and the point threshold comprises a first point threshold and a second point threshold greater than the first point threshold;

the first determining unit is specifically configured to:

if the number of lost voice segments is less than or equal to the first segment number threshold, determine that the voice signal has zero signal loss;

if the number of lost voice segments is greater than the first segment number threshold but less than or equal to the second segment number threshold, determine that the voice signal has a slight signal loss;

if the number of lost voice segments is greater than the second segment number threshold, determine that the voice signal has a heavy signal loss;

the second determining unit is specifically configured to:

if the at least one voice segment contains no voice segment whose signal loss segment length is greater than the first point threshold, determine that the voice signal has zero signal loss;

if the at least one voice segment contains a voice segment whose signal loss segment length is greater than the first point threshold but contains no voice segment whose signal loss segment length is greater than the second point threshold, determine that the voice signal has a slight signal loss;

if the at least one voice segment contains a voice segment whose signal loss segment length is greater than the second point threshold, determine that the voice signal has a heavy signal loss.
The device according to any one of claims 6 to 9, wherein the processing module is specifically configured to:

if the voice signal has zero signal loss, perform voice recognition processing directly on the voice signal;

if the voice signal has a slight signal loss, for each lost voice segment in the at least one voice segment, compensate for the lost signal points in the lost voice segment by using the lost signal points in the lost voice segment, and perform voice recognition processing on the compensated voice segment;

if the voice signal has a heavy signal loss, output prompt information to the user to indicate that the voice signal cannot be recognized because of the heavy signal loss.
A non-transitory computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being configured to:

receive a voice signal, the voice signal comprising at least one voice segment;

obtain signal loss information of the at least one voice segment;

determine a signal loss degree of the voice signal according to the signal loss information of the at least one voice segment; and

perform voice recognition processing on the voice signal according to the signal loss degree of the voice signal.
An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to:

receive a voice signal, the voice signal comprising at least one voice segment;

obtain signal loss information of the at least one voice segment;

determine a signal loss degree of the voice signal according to the signal loss information of the at least one voice segment; and

perform voice recognition processing on the voice signal according to the signal loss degree of the voice signal.
A computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the method according to any one of claims 1 to 5.