WO2017161829A1 - Voice signal information processing method and device - Google Patents
Voice signal information processing method and device
- Publication number
- WO2017161829A1 (PCT/CN2016/096988)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- signal
- voice
- segment
- signal loss
- threshold
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/422—Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
- H04N21/42204—User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
- H04N21/42206—User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor characterized by hardware details
- H04N21/42222—Additional components integrated in the remote control device, e.g. timer, speaker, sensors for detecting position, direction or movement of the remote control, microphone or battery charging device
Definitions
- the present invention relates to the field of voice recognition technology, and in particular, to a voice signal processing method and apparatus.
- Voice TV services have emerged, allowing users to interact with the television by voice.
- a voice remote controller has appeared on the basis of the conventional remote controller. The user interacts with the television via a voice remote control.
- the voice remote controller records the user voice, generates an analog voice signal, performs analog-to-digital conversion on the analog voice signal to obtain a digital voice signal, and then transmits the digital voice signal to the television terminal, and the television terminal identifies the digital voice signal. According to the recognition result, the corresponding operation is performed to realize human-computer interaction.
- A wireless transmission technology in the 2.4 GHz band, such as Wi-Fi or Bluetooth, is mainly used between the voice remote controller and the television terminal. Because wireless transmission technologies such as Wi-Fi and Bluetooth are susceptible to interference from external factors, signal loss may occur during the transmission of voice signals, which may reduce the accuracy of voice recognition and affect the user experience.
- the invention provides a speech signal processing method and device for performing speech recognition and improving the accuracy of speech signal recognition.
- the embodiment of the invention provides a voice signal processing method, including:
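- receiving a voice signal, the voice signal including at least one voice segment;
- acquiring signal loss information of the at least one voice segment;
- determining a signal loss degree of the voice signal according to the signal loss information of the at least one voice segment;
- performing speech recognition processing on the voice signal according to the signal loss degree of the voice signal.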
- An embodiment of the present invention provides a voice signal processing apparatus, including:
- a receiving module configured to receive a voice signal, where the voice signal includes at least one voice segment
- An acquiring module configured to acquire signal loss information of the at least one voice segment
- a determining module configured to determine a signal loss degree of the voice signal according to the signal loss information of the at least one voice segment
- a processing module configured to perform voice recognition processing on the voice signal according to a signal loss degree of the voice signal.
- Embodiments of the present invention also provide a non-transitory computer readable storage medium, wherein the non-transitory computer readable storage medium stores computer-executable instructions for executing the above voice signal processing method.
- An embodiment of the present invention further provides an electronic device, including: one or more processors; and a memory; wherein the memory stores instructions executable by the one or more processors, the instructions being configured to perform the above-described voice signal processing method.
- Embodiments of the present invention also provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to execute the above-described voice signal processing method.
- The voice signal processing method and device provided by the embodiments of the present invention obtain signal loss information of each voice segment included in a voice signal, determine the signal loss degree of the voice signal according to the signal loss information of each voice segment, and perform speech recognition processing on the voice signal according to its signal loss degree.
- the embodiment of the invention fully considers the influence of signal loss on the subsequent processing of the speech signal, and can adopt a corresponding processing manner according to the signal loss degree of the speech signal, which is beneficial to improve the accuracy of the speech signal recognition.
- FIG. 1 is a schematic flowchart of a voice signal processing method according to an embodiment of the present invention
- FIG. 2 is a schematic structural diagram of a voice signal processing apparatus according to another embodiment of the present invention.
- FIG. 3 is a schematic structural diagram of a voice signal processing apparatus according to another embodiment of the present invention.
- FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
- FIG. 1 is a schematic flowchart diagram of a voice signal processing method according to an embodiment of the present invention. As shown in Figure 1, the method includes:
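- receiving a voice signal, the voice signal including at least one voice segment;
- acquiring signal loss information of the at least one voice segment;
- determining a signal loss degree of the voice signal according to the signal loss information of the at least one voice segment;
- performing speech recognition processing on the voice signal according to the signal loss degree of the voice signal.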
- the embodiment provides a voice signal processing method, which can be executed by a voice signal processing device to improve the accuracy of voice signal recognition.
- the method provided in this embodiment is applicable to various application scenarios that require voice signal identification, and in particular, an application scenario for performing voice signal transmission by using a wireless transmission technology in a 2.4 GHz band, such as Wi-Fi or Bluetooth.
- Wireless transmission technologies such as Wi-Fi and Bluetooth are susceptible to interference from external factors, so signal loss is more likely to occur during the transmission of the voice signal. The method provided in this embodiment is therefore particularly suitable for this application scenario.
- The voice signal processing device may be implemented on the television terminal or on a service terminal corresponding to the television terminal, so that speech recognition processing is performed on the voice signal sent by the voice remote controller using the method provided in this embodiment, improving the accuracy of speech recognition.
- the voice signal processing device receives the voice signal.
- The voice signal processing device can receive a voice signal transmitted by a voice signal collecting device (e.g., a voice remote controller or a smart phone) in each application scenario.
- The voice signal collecting device collects the analog voice signal, performs analog-to-digital conversion on it, and then sends the converted voice signal to the voice signal processing device.
- The voice signal collecting device may further encode, compress, or otherwise process the voice signal before transmitting it to the voice signal processing device. If the voice signal received by the voice signal processing device is an encoded and compressed signal, the voice signal processing device decompresses and decodes the voice signal after receiving it.
- the speech signal processing device can segment the speech signal to obtain at least one speech segment.
- Segmentation can be implemented by weighting the speech signal with a movable window of finite length.
- Each speech segment includes multiple signal points. This embodiment does not limit the length of the voice segment, and the length of the voice segment is determined by the number of signal points included in the voice segment.
- The length of the voice segment can be set adaptively according to the application scenario; for example, it can be 256 or 1024 signal points.
- Adjacent speech segments may be contiguous or overlapping.
- The segmentation may be overlapping, that is, the previous speech segment and the following speech segment share an overlapping portion, which ensures a smooth transition between adjacent speech segments and maintains their continuity.
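The segmentation described above can be sketched as follows. This is a minimal illustration, assuming the signal is a list of amplitude values; the names `split_into_segments` and `hann_window`, the `hop` parameter, and the choice of a Hann window are assumptions made for the example, since the text only calls for a movable window of finite length and allows contiguous or overlapping segments.

```python
import math

def hann_window(length):
    """Hann window weights (an assumed choice; the text names no window type)."""
    if length == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]

def split_into_segments(signal, segment_length=256, hop=128, window=None):
    """Slide a finite window over the signal.

    hop == segment_length gives contiguous segments;
    hop < segment_length gives overlapping segments.
    Trailing points that do not fill a whole segment are dropped.
    """
    if window is None:
        window = [1.0] * segment_length  # rectangular window: no weighting
    segments = []
    for start in range(0, len(signal) - segment_length + 1, hop):
        frame = signal[start:start + segment_length]
        segments.append([x * w for x, w in zip(frame, window)])
    return segments
```

With 1024 signal points, a segment length of 256 and a hop of 128, each segment shares half of its points with the previous one; setting the hop to 256 yields contiguous segments instead.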
- the speech signal processing device may acquire signal loss information of at least one speech segment.
- the signal loss information of the voice segment mainly includes information that can reflect the loss of signal points in the voice segment, such as missing signal points and the number of consecutive lost signal points.
- the present embodiment regards a continuously lost signal point as a segment, which is called a signal loss segment, and takes the number of consecutive lost signal points included in the signal loss segment as the length of the signal loss segment.
- The voice signal processing device may multiply the amplitudes of every two adjacent signal points in the voice segment and take the adjacent signal points whose multiplication result is greater than or equal to 0 as lost signal points of the speech segment; such adjacent signal points may also be referred to as signal points that do not cross zero. The device then counts the length of each signal loss segment formed by consecutively lost signal points in the speech segment. It is worth noting that a speech segment can include one or more signal loss segments.
- For example, suppose a voice segment includes 200 signal points. If the 20th to 40th signal points are all lost, they form a signal loss segment of length 21; if, in addition, the 80th to 120th signal points are all lost, they form another signal loss segment of length 41.
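The two steps above (flagging lost signal points and measuring signal loss segments) can be sketched as follows. This is only an illustration: the function names are hypothetical, and marking both points of a pair whose product is greater than or equal to 0 is one assumed reading of the adjacent-product rule, which the text does not spell out in more detail.

```python
def lost_point_mask(samples):
    """Flag points using the adjacent-product rule.

    Assumption: when the product of two adjacent amplitudes is >= 0,
    both points of the pair are marked as lost.
    """
    lost = [False] * len(samples)
    for i in range(len(samples) - 1):
        if samples[i] * samples[i + 1] >= 0:
            lost[i] = True
            lost[i + 1] = True
    return lost

def loss_run_lengths(lost):
    """Return the length of every run of consecutively lost points."""
    runs = []
    current = 0
    for flag in lost:
        if flag:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs

# The 200-point example: points 20..40 and 80..120 (1-based) are lost.
example_mask = [False] * 200
for position in list(range(20, 41)) + list(range(80, 121)):
    example_mask[position - 1] = True
example_runs = loss_run_lengths(example_mask)  # [21, 41]
```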
- The signal loss information of the at least one speech segment can reflect the signal loss condition of the speech signal. Therefore, after acquiring the signal loss information of the voice segment, the voice signal processing device can determine the signal loss degree of the voice signal according to the signal loss information of the at least one voice segment. The signal loss degree reflects the extent to which the speech signal is lost, for example zero loss (i.e., no loss), slight loss, or heavy loss.
- The voice signal processing apparatus may count the number of lost voice segments in the at least one voice segment according to the signal loss information of the at least one voice segment, compare the number of lost voice segments with a preset segment number threshold, and determine the signal loss degree of the voice signal according to the comparison result.
- A lost speech segment is a speech segment in which signal points are lost and the lost signal points satisfy a specified condition. For example, when the signal loss information of a speech segment shows that signal points in the segment are lost and that the lost signal points satisfy the specified condition, the speech segment is determined to be a lost speech segment.
- The specified condition may be that the total number of lost signal points is greater than a first specified number, for example 100, in which case the voice signal processing device identifies any voice segment whose total number of lost signal points exceeds the first specified number as a lost voice segment. Alternatively, the specified condition may be that the number of consecutively lost signal points is greater than a second specified number, for example 60, in which case the speech signal processing device identifies any speech segment whose number of consecutively lost signal points is greater than 60 as a lost speech segment.
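A minimal sketch of the lost-segment test follows, assuming each speech segment is represented by a list of booleans marking its lost signal points. The names `is_lost_segment`, `total_limit` and `consecutive_limit` are hypothetical; the two limits stand for the first and second specified numbers, and either one may be left unset because the text presents the two conditions as alternatives.

```python
def longest_run(lost):
    """Length of the longest run of consecutively lost points."""
    best = 0
    current = 0
    for flag in lost:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best

def is_lost_segment(lost, total_limit=None, consecutive_limit=None):
    """Return True when the segment meets a configured loss condition.

    total_limit: first specified number (total lost points must exceed it).
    consecutive_limit: second specified number (longest run must exceed it).
    """
    if total_limit is not None and sum(lost) > total_limit:
        return True
    if consecutive_limit is not None and longest_run(lost) > consecutive_limit:
        return True
    return False
```

For a 200-point segment with loss runs of 21 and 41 points (62 lost points in total), neither example value from the text (100 total, 60 consecutive) is exceeded, so the segment would not count as a lost speech segment under those settings.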
- the value of the segment threshold is not limited in this embodiment, and may be adaptively set according to an application scenario.
- The voice signal processing device may compare the number of lost voice segments with a preset segment number threshold. If the number of lost voice segments is greater than the preset segment number threshold, the voice signal is determined to have heavy signal loss, meaning that the loss of signal points is relatively serious. Conversely, if the number of lost speech segments is less than or equal to the preset segment number threshold but not 0, the speech signal is determined to have slight signal loss, meaning that the loss of signal points is relatively minor. If the number of lost speech segments is equal to 0, the voice signal is determined to have zero-degree signal loss, indicating that no signal loss has occurred.
- For example, suppose the segment number threshold is 10 and the voice signal is divided into 60 voice segments. If more than 10 of the 60 voice segments are lost voice segments, the voice signal has heavy signal loss; if some of the 60 voice segments are lost voice segments but no more than 10, the voice signal has slight signal loss; if none of the 60 voice segments is a lost voice segment, the voice signal has zero-degree signal loss.
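The single-threshold rule can be written down directly. The sketch below uses a hypothetical function name and plain strings for the three degrees; only the comparison logic comes from the text.

```python
def loss_degree_by_segment_count(lost_segment_count, segment_threshold):
    """Classify a voice signal from its number of lost speech segments."""
    if lost_segment_count == 0:
        return "zero"
    if lost_segment_count <= segment_threshold:
        return "slight"
    return "heavy"
```

With the threshold of 10 from the example, 0 lost segments gives "zero", 1 to 10 lost segments gives "slight", and 11 or more gives "heavy".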
- The foregoing segment number threshold may include a first segment threshold and a second segment threshold, where the second segment threshold is greater than the first segment threshold.
- The voice signal processing apparatus may compare the number of lost voice segments with the first segment threshold and the second segment threshold respectively. If the number of lost voice segments is less than or equal to the first segment threshold, the voice signal is determined to have zero-degree signal loss; if the number of lost speech segments is greater than the first segment threshold but less than or equal to the second segment threshold, the speech signal is determined to have slight signal loss; if the number of lost speech segments is greater than the second segment threshold, the speech signal is determined to have heavy signal loss.
- The segment number threshold may further include more than two thresholds, thereby dividing the signal loss condition of the speech signal into more signal loss degrees, such as one degree, two degrees, three degrees, four degrees, five degrees, and the like.
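A sketch of the two-threshold variant and of its generalization to any number of thresholds follows. Function names are hypothetical, and returning an integer level from `loss_level` is an assumption made for illustration; the text only says that more thresholds yield more degrees.

```python
from bisect import bisect_left

def loss_degree_two_thresholds(lost_segment_count, first_threshold, second_threshold):
    """Classify using a first and a larger second segment threshold."""
    if not first_threshold < second_threshold:
        raise ValueError("second threshold must be greater than the first")
    if lost_segment_count <= first_threshold:
        return "zero"
    if lost_segment_count <= second_threshold:
        return "slight"
    return "heavy"

def loss_level(lost_segment_count, thresholds):
    """Map a count to a level 0..len(thresholds).

    Level 0 means count <= smallest threshold; each threshold that the
    count strictly exceeds raises the level by one.
    """
    return bisect_left(sorted(thresholds), lost_segment_count)
```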
- The voice signal processing apparatus may count, according to the signal loss information of the at least one voice segment, the length of the signal loss segment formed by consecutively lost signal points in each voice segment, and determine the signal loss degree of the speech signal according to the result of comparing the length of the signal loss segment in each voice segment with a preset point threshold.
- the above-mentioned signal loss segment refers to a segment formed by consecutively lost signal points in the speech segment.
- the length of the signal loss segment refers to the number of consecutive lost signal points included in the signal loss segment.
- The speech signal processing device can count the length of the signal loss segment in each voice segment based on the signal loss information of the at least one speech segment, compare the length of the signal loss segment in each voice segment with the preset point threshold, and determine the signal loss degree of the speech signal according to the comparison results.
- The voice signal processing apparatus may compare the length of the signal loss segment in each voice segment with the point threshold. If there is a voice segment whose signal loss segment is longer than the point threshold, the voice signal is determined to have heavy signal loss, meaning that the loss of signal points is relatively serious; conversely, if there is no speech segment whose signal loss segment is longer than the point threshold, the speech signal is determined to have slight signal loss.
- the point threshold includes: a first point threshold and a second point threshold, and the second point threshold is greater than the first point threshold.
- The voice signal processing apparatus may compare the lengths of the signal loss segments in each voice segment with the first point threshold and the second point threshold respectively. If the at least one voice segment contains no voice segment whose signal loss segment is longer than the first point threshold, the voice signal is determined to have zero-degree signal loss; if the at least one voice segment contains a voice segment whose signal loss segment is longer than the first point threshold but no voice segment whose signal loss segment is longer than the second point threshold, the voice signal is determined to have slight signal loss; if the at least one voice segment contains a voice segment whose signal loss segment is longer than the second point threshold, the speech signal is determined to have heavy signal loss.
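The point-threshold comparison can be sketched as below, assuming each speech segment is summarized by the list of its signal loss segment lengths. The function name is hypothetical. The sketch reads zero-degree loss as the case in which no speech segment has a signal loss segment longer than the first point threshold, which matches the description of the second determining unit later in the text.

```python
def loss_degree_by_run_length(run_lengths_per_segment,
                              first_point_threshold,
                              second_point_threshold):
    """Classify a voice signal from the loss-run lengths of its segments.

    run_lengths_per_segment: one list of run lengths per speech segment
    (an empty list means the segment has no lost points).
    """
    # "Some segment has a run longer than T" is the same as
    # "the longest run over all segments is longer than T".
    longest = max((max(runs) for runs in run_lengths_per_segment if runs),
                  default=0)
    if longest > second_point_threshold:
        return "heavy"
    if longest > first_point_threshold:
        return "slight"
    return "zero"
```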
- The voice signal processing apparatus may also determine the signal loss degree of the voice signal from both comparison results together: the comparison between the number of lost voice segments and the preset segment number threshold, and the comparison between the length of the signal loss segment in each voice segment and the preset point threshold.
- For example, suppose the segment number threshold includes a first segment threshold and a second segment threshold, and the point threshold includes a first point threshold and a second point threshold. If the number of lost speech segments is greater than the second segment threshold, and the at least one voice segment contains a voice segment whose signal loss segment is longer than the second point threshold, the voice signal is determined to have heavy signal loss. If the number of lost voice segments is less than or equal to the first segment threshold, and the at least one voice segment contains no voice segment whose signal loss segment is longer than the first point threshold, the speech signal is determined to have zero-degree signal loss. In all other cases, the speech signal is determined to have slight signal loss.
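The combined rule can be sketched as a single function. The name and argument layout are hypothetical; `longest_loss_run` stands for the longest signal loss segment found in any speech segment, so that "some segment exceeds the threshold" becomes a single comparison.

```python
def combined_loss_degree(lost_segment_count, longest_loss_run,
                         first_segment_threshold, second_segment_threshold,
                         first_point_threshold, second_point_threshold):
    """Combine the segment-count and run-length comparisons."""
    if (lost_segment_count > second_segment_threshold
            and longest_loss_run > second_point_threshold):
        return "heavy"
    if (lost_segment_count <= first_segment_threshold
            and longest_loss_run <= first_point_threshold):
        return "zero"
    return "slight"  # every other combination
```

Note that under this rule a signal with many lost segments but only short loss runs is classified as slight rather than heavy, because both heavy conditions must hold at once.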
- the speech signal processing device may perform speech recognition processing on the speech signal according to the signal loss degree of the speech signal.
- If the speech signal is determined to have zero-degree signal loss, the speech signal processing device can directly perform speech recognition processing on the speech signal, which ensures the efficiency and accuracy of the speech recognition and improves the user experience.
- If the speech signal is determined to have slight signal loss, the voice signal processing apparatus can, for each lost voice segment in the at least one voice segment, use the signal points that are not lost in the lost voice segment to compensate the lost signal points in that segment, and perform speech recognition processing on the compensated speech segments, ensuring the accuracy of speech recognition and improving the user experience.
- the embodiment does not limit the specific manner of compensating for missing signal points in the lost speech segment by using signal points that are not lost in the lost speech segment.
- For example, the signal points of the entire lost speech segment can be divided into two parts, and within each part the lost signal points are compensated using the signal points that are not lost in that part. Because the signal points within each part are relatively close to each other, compensating with nearby signal points keeps the compensated speech signal closer to the speech signal before the loss, and speech recognition based on the compensated speech signal helps improve recognition accuracy.
- The lost signal points are preferably compensated using the non-lost signal points closest to them.
- The number of signal points that are not lost in each part is generally greater than the number of lost signal points, so the lost signal points in each part can be completely compensated.
- In some cases the number of lost signal points in a part may be greater than the number of non-lost signal points. In that part only some of the lost signal points can be compensated, but the remaining lost signal points can be compensated one by one using the closest signal points in the other part.
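One possible way to realize this compensation is sketched below. It is a simplification made for illustration, not the method prescribed by the text: each lost point simply copies the amplitude of the nearest non-lost point in its own half of the segment, and falls back to the nearest non-lost point anywhere in the segment only when its own half has none. The function names are hypothetical.

```python
def nearest_kept_index(index, kept_indices):
    """Return the kept index closest to `index`, or None if there is none."""
    if not kept_indices:
        return None
    return min(kept_indices, key=lambda k: abs(k - index))

def compensate_segment(samples, lost):
    """Fill lost points of one speech segment from nearby non-lost points."""
    n = len(samples)
    mid = n // 2
    parts = [range(0, mid), range(mid, n)]  # the two parts of the segment
    kept_by_part = [[i for i in part if not lost[i]] for part in parts]
    all_kept = kept_by_part[0] + kept_by_part[1]
    result = list(samples)
    for part, kept in zip(parts, kept_by_part):
        for i in part:
            if not lost[i]:
                continue
            source = nearest_kept_index(i, kept)
            if source is None:  # nothing usable in this part
                source = nearest_kept_index(i, all_kept)
            if source is not None:
                result[i] = samples[source]
    return result
```

If every point of the segment is lost, the sketch returns the samples unchanged, since there is nothing to copy from.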
- If the speech signal is determined to have heavy signal loss, the signal loss is serious and has exceeded the range in which the speech signal can be correctly recognized. The speech signal processing device therefore outputs prompt information to the user, indicating that the speech signal cannot be recognized because of heavy signal loss.
- The voice signal processing device may output the prompt information to the user as text or voice, for example by displaying text prompt information on the interactive interface or by playing voice prompt information. Based on the prompt information, the user can take corresponding measures in time, for example re-inputting the voice signal, so as to obtain the required voice service promptly, which improves the user experience.
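The three handling paths can be tied together in a small dispatcher. Everything passed in here is a hypothetical callback (`recognize`, `is_lost`, `compensate`, `notify_user`); the sketch only shows the branching on the signal loss degree described above, and the prompt wording is invented for the example.

```python
def process_by_loss_degree(degree, segments, recognize, is_lost,
                           compensate, notify_user):
    """Choose the handling path for a voice signal from its loss degree."""
    if degree == "zero":
        # No loss: recognize the signal as received.
        return recognize(segments)
    if degree == "slight":
        # Repair only the lost speech segments, then recognize.
        repaired = [compensate(seg) if is_lost(seg) else seg
                    for seg in segments]
        return recognize(repaired)
    if degree == "heavy":
        # Beyond what can be recognized correctly: tell the user instead.
        notify_user("Voice input could not be recognized because of "
                    "heavy signal loss. Please try again.")
        return None
    raise ValueError(f"unknown signal loss degree: {degree!r}")
```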
- FIG. 2 is a schematic structural diagram of a voice signal processing apparatus according to another embodiment of the present invention. As shown in FIG. 2, the apparatus includes: a receiving module 21, an obtaining module 22, a determining module 23, and a processing module 24.
- the receiving module 21 is configured to receive a voice signal, where the voice signal includes at least one voice segment.
- the obtaining module 22 is configured to acquire signal loss information of at least one voice segment.
- the determining module 23 is configured to determine a signal loss degree of the voice signal according to the signal loss information of the at least one voice segment.
- the processing module 24 is configured to perform voice recognition processing on the voice signal according to the signal loss degree of the voice signal.
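The cooperation of the four modules can be sketched as a small class. This is only a structural illustration: the class name, the field names, and the representation of each module as a plain callable are assumptions, while the order of the calls follows the module descriptions above.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence

@dataclass
class VoiceSignalProcessor:
    # Each field stands in for one module of the apparatus.
    receive: Callable[[], List[Sequence[float]]]                  # receiving module 21
    acquire_loss_info: Callable[[Sequence[float]], List[bool]]    # obtaining module 22
    determine_degree: Callable[[List[List[bool]]], str]           # determining module 23
    process: Callable[[List[Sequence[float]], str], Any]          # processing module 24

    def run(self):
        """Receive segments, gather loss info, grade the loss, then process."""
        segments = self.receive()
        loss_info = [self.acquire_loss_info(seg) for seg in segments]
        degree = self.determine_degree(loss_info)
        return self.process(segments, degree)
```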
- the obtaining module 22 is specifically configured to:
- an implementation structure of the determining module 23 includes at least one of a first determining unit 231 and a second determining unit 232.
- The first determining unit 231 is configured to count, according to the signal loss information of the at least one voice segment, the number of lost voice segments in the at least one voice segment, and to determine the signal loss degree of the voice signal according to the result of comparing the number of lost voice segments with the preset segment number threshold.
- The second determining unit 232 is configured to count, according to the signal loss information of the at least one voice segment, the length of the signal loss segment formed by consecutively lost signal points in each voice segment, and to determine the signal loss degree of the speech signal according to the result of comparing the length of the signal loss segment in each voice segment with the preset point threshold.
- The foregoing segment number threshold includes a first segment threshold and a second segment threshold greater than the first segment threshold; correspondingly, the point threshold includes a first point threshold and a second point threshold greater than the first point threshold.
- the first determining unit 231 is specifically configured to:
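- determine that the voice signal has zero-degree signal loss if the number of lost voice segments is less than or equal to the first segment threshold;
- determine that the voice signal has slight signal loss if the number of lost voice segments is greater than the first segment threshold but less than or equal to the second segment threshold;
- determine that the voice signal has heavy signal loss if the number of lost voice segments is greater than the second segment threshold.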
- the second determining unit 232 is specifically configured to:
- determine that the voice signal has zero-degree signal loss if there is no voice segment in the at least one voice segment whose signal loss segment is longer than the first point threshold;
- determine that the voice signal has slight signal loss if the at least one voice segment contains a voice segment whose signal loss segment is longer than the first point threshold but no voice segment whose signal loss segment is longer than the second point threshold;
- determine that the voice signal has heavy signal loss if the at least one voice segment contains a voice segment whose signal loss segment is longer than the second point threshold.
- processing module 24 is specifically configured to:
- if the voice signal has zero-degree signal loss, directly perform speech recognition processing on the voice signal;
- if the voice signal has slight signal loss, for each lost voice segment in the at least one voice segment, compensate the lost signal points in the lost voice segment using the signal points that are not lost in that segment, and perform speech recognition processing on the compensated voice segment;
- if the voice signal has heavy signal loss, output prompt information to the user indicating that the voice signal cannot be recognized because of heavy signal loss.
- the voice signal processing apparatus provided in this embodiment fully considers the influence of signal loss on the subsequent processing of the voice signal, and can adopt a corresponding processing manner according to the signal loss degree of the voice signal, which is beneficial to improve the accuracy of the voice signal recognition.
- The embodiment of the present application further provides a non-transitory computer readable storage medium storing computer-executable instructions that can execute the voice signal processing method in any of the above method embodiments.
- FIG. 4 is a schematic structural diagram of hardware of an electronic device for performing a voice signal processing method according to an embodiment of the present disclosure. As shown in FIG. 4, the device includes:
- one or more processors 410 and a memory 420; one processor 410 is taken as an example in FIG. 4.
- the apparatus for performing the voice signal processing method may further include: an input device 430 and an output device 440.
- The processor 410, the memory 420, the input device 430, and the output device 440 may be connected by a bus or by other means; a bus connection is taken as an example in FIG. 4.
- The memory 420 is a non-volatile computer readable storage medium and can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the voice signal processing method in the embodiment of the present application (for example, the receiving module 21, the obtaining module 22, the determining module 23, and the processing module 24 shown in FIG. 2).
- the processor 410 executes various functional applications and data processing of the electronic device by executing non-volatile software programs, instructions, and modules stored in the memory 420, that is, implementing the voice signal processing method of the above method embodiment.
- The memory 420 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application required for at least one function, and the data storage area may store data created through use of the voice signal processing device, and the like. Further, the memory 420 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state memory device. In some embodiments, the memory 420 can optionally include memory remotely located relative to the processor 410, which can be connected to the voice signal processing device over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
- the input device 430 can receive the input digital or character information and generate a key signal input related to user settings and function control of the voice signal processing device.
- Output device 440 can include a display device such as a display screen.
- the one or more modules are stored in the memory 420, and when executed by the one or more processors 410, perform a speech signal processing method in any of the above method embodiments.
- the electronic device of the embodiment of the invention exists in various forms, including but not limited to:
- Mobile communication devices: these devices are characterized by mobile communication functions and are mainly aimed at providing voice and data communication.
- Such terminals include smart phones (such as the iPhone), multimedia phones, feature phones, and low-end phones.
- Ultra-mobile personal computer devices: this type of device belongs to the category of personal computers, has computing and processing functions, and generally also has mobile Internet access.
- Such terminals include PDAs, MIDs, and UMPC devices, such as the iPad.
- Portable entertainment devices: these devices can display and play multimedia content. Such devices include audio and video players (such as the iPod), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.
- the server consists of a processor, a hard disk, a memory, a system bus, etc.
- the server is similar in architecture to a general-purpose computer, but because it must provide highly reliable services, it is subject to higher requirements in terms of processing power, stability, reliability, security, scalability, and manageability.
- the program when executed, may include the flow of an embodiment of the methods as described above.
- the storage medium may be a magnetic disk, an optical disk, a read only memory (ROM), or a random access memory (RAM).
- the device embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, i.e., they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Telephonic Communication Services (AREA)
- Telephone Function (AREA)
Abstract
A voice signal processing method and device. The method comprises: receiving a voice signal, wherein the voice signal comprises at least one voice segment (S101); acquiring signal loss information about the at least one voice segment (S102); determining a signal loss level of the voice signal according to the signal loss information about the at least one voice segment (S103), and performing voice recognition processing on the voice signal according to the signal loss level of the voice signal (S104). In the present method, a corresponding processing means is adopted according to a signal loss level of a voice signal, thus facilitating improvement of the accuracy of voice signal recognition.
Description
This application claims priority to Chinese Patent Application No. 201610179999.X, entitled "Voice Signal Processing Method and Device" and filed with the Chinese Patent Office on March 25, 2016, the entire contents of which are incorporated herein by reference.
The present invention relates to the field of speech recognition technology, and in particular to a voice signal processing method and device.
With the development of smart TV technology, voice TV services have emerged, which allow users to interact with a television by voice. To support voice TV services, voice remote controls have been developed on the basis of conventional remote controls. The user interacts with the television by voice through the voice remote control.
Specifically, the voice remote control records the user's voice to generate an analog voice signal, performs analog-to-digital conversion on the analog voice signal to obtain a digital voice signal, and then transmits the digital voice signal to the television terminal. The television terminal recognizes the digital voice signal and performs the corresponding operation according to the recognition result, thereby realizing human-computer interaction.
In the prior art, the voice remote control and the television terminal mainly communicate using wireless transmission technologies in the 2.4 GHz band, such as Wi-Fi and Bluetooth. Because these technologies are easily disturbed by external factors, signal loss is likely to occur during transmission of the voice signal, which reduces the accuracy of speech recognition and degrades the user experience.
Summary of the Invention
The present invention provides a voice signal processing method and device for performing speech recognition and improving the accuracy of voice signal recognition.
An embodiment of the present invention provides a voice signal processing method, including:
receiving a voice signal, the voice signal including at least one voice segment;
acquiring signal loss information of the at least one voice segment;
determining a signal loss level of the voice signal according to the signal loss information of the at least one voice segment;
performing speech recognition processing on the voice signal according to the signal loss level of the voice signal.
An embodiment of the present invention provides a voice signal processing device, including:
a receiving module, configured to receive a voice signal, the voice signal including at least one voice segment;
an acquiring module, configured to acquire signal loss information of the at least one voice segment;
a determining module, configured to determine a signal loss level of the voice signal according to the signal loss information of the at least one voice segment;
a processing module, configured to perform speech recognition processing on the voice signal according to the signal loss level of the voice signal.
An embodiment of the present invention further provides a non-volatile computer-readable storage medium, wherein the non-volatile computer-readable storage medium stores computer-executable instructions for executing the above voice signal processing method.
An embodiment of the present invention further provides an electronic device, including: one or more processors; and a memory; wherein the memory stores instructions executable by the one or more processors, the instructions being configured to execute the above voice signal processing method.
An embodiment of the present invention further provides a computer program product, the computer program product including a computer program stored on a non-volatile computer-readable storage medium, the computer program including program instructions which, when executed by a computer, cause the computer to execute the above voice signal processing method.
The voice signal processing method and device provided by the embodiments of the present invention acquire the signal loss information of each voice segment included in a voice signal, determine the signal loss level of the voice signal according to the signal loss information of each voice segment, and perform speech recognition processing on the voice signal based on its signal loss level. The embodiments of the present invention take full account of the effect of signal loss on subsequent processing of the voice signal and can adopt a processing manner that corresponds to the signal loss level of the voice signal, which helps improve the accuracy of voice signal recognition.
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a voice signal processing method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a voice signal processing device according to another embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a voice signal processing device according to yet another embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
FIG. 1 is a schematic flowchart of a voice signal processing method according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
101. Receive a voice signal, the voice signal including at least one voice segment.
102. Acquire signal loss information of the at least one voice segment.
103. Determine a signal loss level of the voice signal according to the signal loss information of the at least one voice segment.
104. Perform speech recognition processing on the voice signal according to the signal loss level of the voice signal.
This embodiment provides a voice signal processing method that can be executed by a voice signal processing device to improve the accuracy of voice signal recognition.
The method provided in this embodiment is applicable to any application scenario that requires voice signal recognition. It is particularly suitable for scenarios in which voice signals are transmitted using wireless transmission technologies in the 2.4 GHz band, such as Wi-Fi and Bluetooth: because these technologies are easily disturbed by external factors, signal loss is more likely to occur during transmission of the voice signal. For example, in a voice TV service scenario, the voice signal processing device may be located in the television terminal or in a server corresponding to the television terminal, so that the method provided in this embodiment is used to perform speech recognition processing on the voice signal sent by the voice remote control and to improve the accuracy of speech recognition.
The principle and flow of the method in this embodiment are described in detail below.
Specifically, the voice signal processing device receives a voice signal. For example, the voice signal processing device may receive a voice signal sent by a voice signal collecting device (such as a voice remote control or a smart phone) in each application scenario. The voice signal collecting device collects an analog voice signal, may perform analog-to-digital conversion on it, and then sends the converted voice signal to the voice signal processing device.
Optionally, before sending the voice signal to the voice signal processing device, the voice signal collecting device may also encode, compress, or otherwise process the voice signal. If the voice signal received by the voice signal processing device has been encoded and compressed, the voice signal processing device must decompress and decode it after reception.
Because a voice signal is a short-time stationary signal, the voice signal processing device can divide the voice signal into segments to obtain at least one voice segment. The device may do so by weighting the signal with a movable window of finite length. Each voice segment includes multiple signal points. This embodiment does not limit the length of a voice segment, which is determined by the number of signal points the segment contains. The length can be set adaptively for different application scenarios, for example 256 or 1024.
In addition, adjacent voice segments may be either contiguous or overlapping. Preferably, overlapping segmentation is used, that is, a preceding voice segment and the following voice segment share an overlapping portion, which ensures a smooth transition between voice segments and preserves their continuity.
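The segmentation described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function name `split_into_segments` and the `hop` parameter (the distance between the starts of consecutive segments) are hypothetical and not taken from the embodiment, and no window weighting is applied, which amounts to assuming a rectangular window.

```python
from typing import List, Sequence


def split_into_segments(
    signal: Sequence[float], segment_len: int = 256, hop: int = 128
) -> List[List[float]]:
    """Split `signal` into voice segments of up to `segment_len` points.

    `hop` is a hypothetical parameter: hop == segment_len yields contiguous
    segments, hop < segment_len yields overlapping segments.
    """
    if segment_len <= 0 or hop <= 0:
        raise ValueError("segment_len and hop must be positive")
    segments: List[List[float]] = []
    for start in range(0, len(signal), hop):
        segments.append(list(signal[start:start + segment_len]))
        if start + segment_len >= len(signal):
            break  # this segment already reaches the end of the signal
    return segments


signal = [float(n) for n in range(1024)]
overlapping = split_into_segments(signal, segment_len=256, hop=128)
contiguous = split_into_segments(signal, segment_len=256, hop=256)
```

With `hop` equal to half the segment length, each segment shares half of its points with its neighbor, which corresponds to the preferred overlapping case.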
After dividing the voice signal into at least one voice segment, the voice signal processing device can acquire the signal loss information of the at least one voice segment. The signal loss information of a voice segment mainly includes information that reflects the loss of signal points in the segment, such as which signal points are lost and how many signal points are lost consecutively. For ease of description, this embodiment treats consecutively lost signal points as one segment, called a signal loss segment, and takes the number of consecutively lost signal points it contains as the length of the signal loss segment.
Based on the above, in an optional implementation, for each of the at least one voice segment, the voice signal processing device may multiply the amplitudes of every two adjacent signal points in the voice segment and take the adjacent signal points whose product is greater than or equal to 0 as the lost signal points of that voice segment (such adjacent signal points may also be called non-zero-crossing signal points), and then count the length of each signal loss segment formed by consecutively lost signal points in the voice segment. Note that a voice segment may include one or more signal loss segments. For example, suppose a voice segment includes 200 signal points, in which the 20th to 40th signal points are all lost, forming one signal loss segment of length 21, and the 80th to 120th signal points are all lost, forming another signal loss segment of length 41.
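The counting of signal loss segments can be illustrated with a short sketch. It assumes that a per-point flag marking each signal point as lost or not lost has already been obtained (for instance from the adjacent-amplitude test described above); the function name `find_loss_runs` is hypothetical. The sample data reproduces the 200-point example from the text.

```python
from typing import List, Sequence, Tuple


def find_loss_runs(lost: Sequence[bool]) -> List[Tuple[int, int]]:
    """Return (start_index, length) for each run of consecutive lost points."""
    runs: List[Tuple[int, int]] = []
    run_start = None
    for i, is_lost in enumerate(lost):
        if is_lost and run_start is None:
            run_start = i  # a new signal loss segment begins here
        elif not is_lost and run_start is not None:
            runs.append((run_start, i - run_start))
            run_start = None
    if run_start is not None:
        # the voice segment ends inside a signal loss segment
        runs.append((run_start, len(lost) - run_start))
    return runs


# 200 signal points, numbered from 1; points 20-40 and 80-120 are lost.
lost_flags = [20 <= n <= 40 or 80 <= n <= 120 for n in range(1, 201)]
runs = find_loss_runs(lost_flags)
lengths = [length for _, length in runs]
```

Here `start_index` is zero-based, so the first run starts at index 19, which is the 20th signal point.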
Because the voice signal is divided into at least one voice segment, the signal loss information of the at least one voice segment reflects the signal loss of the voice signal as a whole. Therefore, after acquiring the signal loss information of the voice segments, the voice signal processing device can determine the signal loss level of the voice signal according to the signal loss information of the at least one voice segment. The signal loss level of a voice signal reflects the degree to which the signal has been lost, and may be, for example, zero loss (that is, no loss), mild loss, or severe loss.
In an optional implementation, the voice signal processing device may count, according to the signal loss information of the at least one voice segment, the number of lost voice segments among the at least one voice segment, compare the number of lost voice segments with a preset segment-count threshold, and determine the signal loss level of the voice signal according to the comparison result.
Here, a lost voice segment is a voice segment in which signal loss exists and the number of lost signal points satisfies a specified condition. For example, when it is determined from the signal loss information of a voice segment that signal points have indeed been lost in that segment, and the lost signal points satisfy the specified condition, the segment is determined to be a lost voice segment. The specified condition may be that the total number of lost signal points is greater than a first specified number, for example 100, in which case the voice signal processing device identifies voice segments whose total number of lost signal points is greater than 100 as lost voice segments. Alternatively, the specified condition may be that the number of consecutively lost signal points is greater than a second specified number, for example 60, in which case the voice signal processing device identifies voice segments in which the number of consecutively lost signal points is greater than 60 as lost voice segments.
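The two alternative conditions for a lost voice segment might be sketched as below. The function names are hypothetical, the input is assumed to be the list of signal loss segment lengths already measured for one voice segment, and the defaults 100 and 60 are simply the example values from the text.

```python
from typing import Sequence


def exceeds_total_loss(
    loss_run_lengths: Sequence[int], first_specified_number: int = 100
) -> bool:
    """Condition 1: the total number of lost signal points is too large."""
    return sum(loss_run_lengths) > first_specified_number


def exceeds_consecutive_loss(
    loss_run_lengths: Sequence[int], second_specified_number: int = 60
) -> bool:
    """Condition 2: some run of consecutively lost points is too long."""
    return max(loss_run_lengths, default=0) > second_specified_number
```

For a voice segment with signal loss segments of lengths 21 and 41, neither condition holds (62 lost points in total, longest run 41), so under these example values it would not count as a lost voice segment.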
Note that this embodiment does not limit the value of the segment-count threshold, which can be set adaptively according to the application scenario.
Further optionally, when there is a single segment-count threshold, the voice signal processing device may compare the number of lost voice segments with the preset segment-count threshold. If the number of lost voice segments is greater than the threshold, the voice signal is determined to have severe signal loss, meaning that the loss of signal points is relatively serious. Conversely, if the number of lost voice segments is less than or equal to the threshold but not 0, the voice signal is determined to have mild signal loss, meaning that the loss of signal points is relatively slight. If the number of lost voice segments is 0, the voice signal is determined to have zero signal loss, meaning that no signal loss has occurred. For example, suppose the segment-count threshold is 10 and the voice signal is divided into 60 voice segments in total. If more than 10 of the 60 voice segments have signal loss, the voice signal has severe signal loss; if some but no more than 10 of the 60 voice segments have signal loss, the voice signal has mild signal loss; if none of the 60 voice segments has signal loss, the voice signal has zero signal loss.
Further optionally, the segment-count threshold includes a first segment-count threshold and a second segment-count threshold, and the second segment-count threshold is greater than the first. On this basis, the voice signal processing device may compare the number of lost voice segments with the first and second segment-count thresholds respectively. If the number of lost voice segments is less than or equal to the first segment-count threshold, the voice signal is determined to have zero signal loss; if it is greater than the first segment-count threshold but less than or equal to the second, the voice signal is determined to have mild signal loss; if it is greater than the second segment-count threshold, the voice signal is determined to have severe signal loss.
Note that the segment-count threshold may also include more than two thresholds, so that the signal loss of the voice signal is divided into more signal loss levels, for example first degree, second degree, third degree, fourth degree, fifth degree, and so on.
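The threshold comparisons above generalize to any number of ascending segment-count thresholds. The sketch below is one possible formulation, not the embodiment's own: the function name is hypothetical, and the level is returned as an integer index (0 for the lowest level). The single-threshold variant with a separate zero level is expressed by passing 0 as the first threshold.

```python
from bisect import bisect_left
from typing import Sequence

LEVEL_NAMES = ["zero", "mild", "severe"]  # names for the three-level case


def loss_level_by_segment_count(
    lost_segment_count: int, segment_thresholds: Sequence[int]
) -> int:
    """Map a lost-segment count to a level index using ascending thresholds.

    A count <= thresholds[0] gives level 0; a count greater than thresholds[k-1]
    but <= thresholds[k] gives level k; a count above every threshold gives
    len(thresholds).
    """
    if list(segment_thresholds) != sorted(segment_thresholds):
        raise ValueError("thresholds must be in ascending order")
    return bisect_left(segment_thresholds, lost_segment_count)


# Example from the text: threshold 10, with a count of 0 treated as zero loss.
thresholds = [0, 10]
severe = LEVEL_NAMES[loss_level_by_segment_count(12, thresholds)]
mild = LEVEL_NAMES[loss_level_by_segment_count(10, thresholds)]
zero = LEVEL_NAMES[loss_level_by_segment_count(0, thresholds)]
```

`bisect_left` returns the number of thresholds strictly below the count, which matches the "greater than" comparisons used in the text.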
In another optional implementation, the voice signal processing device may count, according to the signal loss information of the at least one voice segment, the length of the signal loss segments formed by consecutively lost signal points in each voice segment, and determine the signal loss level of the voice signal according to the result of comparing those lengths with a preset point-count threshold.
Here, a signal loss segment is a segment formed by consecutively lost signal points in a voice segment, and the length of a signal loss segment is the number of consecutively lost signal points it contains.
If a large number of signal points are lost consecutively in some voice segment, the voice signal may not be recognized correctly even if the other voice segments have no signal loss or only slight signal loss. The voice signal processing device can therefore count the lengths of the signal loss segments in each voice segment according to the signal loss information of the at least one voice segment, compare those lengths with the preset point-count threshold, and determine the signal loss level of the voice signal according to the comparison result.
Further optionally, when there is a single point-count threshold, the voice signal processing device may compare the length of the signal loss segments in each voice segment with that threshold. If there is a voice segment containing a signal loss segment longer than the point-count threshold, the voice signal is determined to have severe signal loss, meaning that the loss of signal points is relatively serious. Conversely, if no voice segment contains a signal loss segment longer than the point-count threshold, the voice signal is determined to have mild signal loss.
Further optionally, the point-count threshold includes a first point-count threshold and a second point-count threshold, and the second point-count threshold is greater than the first. On this basis, the voice signal processing device may compare the length of the signal loss segments in each voice segment with the first and second point-count thresholds respectively. If no voice segment contains a signal loss segment longer than the first point-count threshold, the voice signal is determined to have zero signal loss. If some voice segment contains a signal loss segment longer than the first point-count threshold but no voice segment contains one longer than the second point-count threshold, the voice signal is determined to have mild signal loss. If some voice segment contains a signal loss segment longer than the second point-count threshold, the voice signal is determined to have severe signal loss.
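Because each rule above only asks whether some voice segment contains a signal loss segment longer than a threshold, the two-threshold variant can be sketched by comparing the longest signal loss segment over all voice segments with both thresholds. The function name and the string level labels are hypothetical, and the input is assumed to be one list of signal loss segment lengths per voice segment.

```python
from typing import Sequence


def loss_level_by_run_length(
    runs_per_segment: Sequence[Sequence[int]],
    first_point_threshold: int,
    second_point_threshold: int,
) -> str:
    """Classify the voice signal by its longest signal loss segment."""
    if second_point_threshold <= first_point_threshold:
        raise ValueError("second threshold must exceed the first")
    longest = max(
        (length for runs in runs_per_segment for length in runs), default=0
    )
    if longest > second_point_threshold:
        return "severe"
    if longest > first_point_threshold:
        return "mild"
    return "zero"
```

For instance, with thresholds 30 and 50, a signal whose voice segments contain signal loss segments of lengths 21 and 41 would be classified as mild.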
In yet another optional implementation, the voice signal processing device may determine the signal loss level of the voice signal according to both the result of comparing the number of lost voice segments with the preset segment-count threshold and the result of comparing the length of the signal loss segments in each voice segment with the preset point-count threshold.
For example, taking the case in which the segment-count threshold includes a first and a second segment-count threshold and the point-count threshold includes a first and a second point-count threshold: if the number of lost voice segments is greater than the second segment-count threshold and some voice segment contains a signal loss segment longer than the second point-count threshold, the voice signal is determined to have severe signal loss; if the number of lost voice segments is less than or equal to the first segment-count threshold and no voice segment contains a signal loss segment longer than the first point-count threshold, the voice signal is determined to have zero signal loss; in all other cases, the voice signal is determined to have mild signal loss.
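The combined rule in this example can be written out as a small decision function. This is a sketch under assumptions: the function name, the tuple layout of the thresholds, and the string labels are hypothetical, and `longest_run` stands for the length of the longest signal loss segment found in any voice segment.

```python
from typing import Tuple


def combined_loss_level(
    lost_segment_count: int,
    longest_run: int,
    segment_thresholds: Tuple[int, int],
    point_thresholds: Tuple[int, int],
) -> str:
    """Combine the segment-count test and the point-count test."""
    first_seg, second_seg = segment_thresholds
    first_pt, second_pt = point_thresholds
    if lost_segment_count > second_seg and longest_run > second_pt:
        return "severe"  # both tests indicate severe loss
    if lost_segment_count <= first_seg and longest_run <= first_pt:
        return "zero"  # both tests indicate no significant loss
    return "mild"  # every remaining case
```

Note that under this rule a signal is classified as severe only when both criteria agree, so a signal with many lost voice segments but only short signal loss segments falls into the mild category.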
Note that, based on the above implementations for determining the signal loss level of the voice signal, a person skilled in the art can readily conceive of similar extensions. These extensions all fall within the protection scope of the present invention and are not described one by one here.
After determining the signal loss level of the voice signal, the voice signal processing device can perform speech recognition processing on the voice signal according to its signal loss level.
If the voice signal is determined to have zero signal loss, that is, no signal loss has occurred, the voice signal processing device can perform speech recognition processing on the voice signal directly, which preserves the efficiency of speech recognition processing, ensures the accuracy of speech recognition, and improves the user experience.
若确定语音信号为轻度信号丢失,说明语音信号发生信号丢失但不是很严重,还在能够正确识别的范围内,则语音信号处理装置可以对至少一个语音段中的丢失语音段,利用丢失语音段中未丢失的信号点对丢失语音段中丢失的信号点进行补偿,并对补偿后的语音段进行语音识别处理,以保证语音识别的准确率,提高用户体验。If it is determined that the voice signal is a slight signal loss, it indicates that the voice signal is lost but not very serious, and within a range that can be correctly recognized, the voice signal processing apparatus can use the lost voice for the lost voice segment in at least one voice segment. The signal points that are not lost in the segment compensate the missing signal points in the lost speech segment, and perform speech recognition processing on the compensated speech segments to ensure the accuracy of speech recognition and improve the user experience.
It should be noted that this embodiment does not limit the specific way in which the non-lost signal points of a lost voice segment are used to compensate for its lost signal points. Preferably, the signal points of the whole lost voice segment are divided evenly into two parts and, for each part, the lost signal points in that part are compensated using the non-lost signal points in the same part. Because the signal points within one part are relatively close to each other, compensating with nearby points keeps the compensated voice signal closer to a voice signal that suffered no loss, which helps improve accuracy when speech recognition is performed on the compensated signal.
Further, in the above signal compensation process, each lost signal point is preferably compensated using the non-lost signal point closest to it.
Usually, in the above signal compensation process, each part contains more non-lost signal points than lost signal points, so the lost signal points in each part can be fully compensated. In some special cases, however, a part may contain more lost signal points than non-lost ones. In that part only some of the lost signal points can be compensated, and each remaining uncompensated signal point can be compensated, one by one, using the closest signal point in the other part.
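The two-part compensation described above can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the text does not say what "compensate" means numerically, so the code assumes that a lost point simply takes the amplitude of its donor point. It also assumes that each non-lost point serves as a donor at most once within its own part, while donors from the other part may be reused for the leftover points. The function name `compensate_segment` and its parameters are hypothetical.

```python
from typing import List


def compensate_segment(segment: List[float], lost: List[bool]) -> List[float]:
    """Fill lost points of one voice segment from nearby non-lost points.

    Assumption: "compensating" a point means copying the donor's amplitude.
    """
    n = len(segment)
    mid = n // 2
    halves = [range(0, mid), range(mid, n)]  # split evenly into two parts
    out = list(segment)
    leftovers = []  # (index of lost point, index of the other part)

    for h, half in enumerate(halves):
        # Non-lost points of this part; each is used at most once here.
        donors = [i for i in half if not lost[i]]
        for i in half:
            if not lost[i]:
                continue
            if donors:
                nearest = min(donors, key=lambda d: abs(d - i))
                out[i] = segment[nearest]
                donors.remove(nearest)
            else:
                # More lost than non-lost points in this part.
                leftovers.append((i, 1 - h))

    # Remaining points take the closest non-lost point of the other part.
    for i, other in leftovers:
        donors = [j for j in halves[other] if not lost[j]]
        if donors:
            nearest = min(donors, key=lambda d: abs(d - i))
            out[i] = segment[nearest]
    return out
```

For example, with the segment `[5.0, 0.0, 0.0, 0.0, 7.0, 8.0, 0.0, 9.0]` and lost points at indices 1, 2, 3 and 6, the first part has a single donor (index 0), so indices 2 and 3 fall back to the nearest non-lost point of the second part (index 4).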
If the voice signal is determined to have severe signal loss, the loss is serious and beyond the range that can be recognized correctly, so the voice signal processing apparatus outputs prompt information to the user to indicate that the voice signal cannot be recognized normally because of severe signal loss. The apparatus may output the prompt information as text or as speech, for example by displaying a text prompt on the interactive interface or by playing a voice prompt. Based on the prompt information, the user can take corresponding measures in time, such as re-entering the voice signal, so as to obtain the required voice service promptly, which improves the user experience.
FIG. 2 is a schematic structural diagram of a voice signal processing apparatus according to another embodiment of the present invention. As shown in FIG. 2, the apparatus includes a receiving module 21, an obtaining module 22, a determining module 23, and a processing module 24.
The receiving module 21 is configured to receive a voice signal, where the voice signal includes at least one voice segment.
The obtaining module 22 is configured to obtain signal loss information of the at least one voice segment.
The determining module 23 is configured to determine a signal loss degree of the voice signal according to the signal loss information of the at least one voice segment.
The processing module 24 is configured to perform speech recognition processing on the voice signal according to the signal loss degree of the voice signal.
In an optional implementation, the obtaining module 22 is specifically configured to:
for each voice segment of the at least one voice segment, multiply the amplitudes of every two adjacent signal points in the voice segment, take the adjacent signal points whose product is greater than or equal to 0 as the lost signal points of the voice segment, and count the length of each signal loss fragment formed by consecutive lost signal points in the voice segment.
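The detection rule above can be sketched in a few lines. The sketch applies the rule literally as worded (both points of any adjacent pair whose amplitude product is greater than or equal to 0 are marked as lost) and represents a voice segment as a plain list of amplitudes. Both function names are hypothetical and chosen only for illustration.

```python
from typing import List


def lost_point_mask(segment: List[float]) -> List[bool]:
    """Mark both points of every adjacent pair whose amplitude product is >= 0."""
    mask = [False] * len(segment)
    for i in range(len(segment) - 1):
        if segment[i] * segment[i + 1] >= 0:
            mask[i] = True
            mask[i + 1] = True
    return mask


def loss_fragment_lengths(mask: List[bool]) -> List[int]:
    """Return the length of each run of consecutive lost points."""
    lengths = []
    current = 0
    for is_lost in mask:
        if is_lost:
            current += 1
        elif current:
            lengths.append(current)
            current = 0
    if current:
        lengths.append(current)
    return lengths
```

The lengths returned by `loss_fragment_lengths` are the per-segment values that the determining module then compares against the point-count thresholds.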
In an optional implementation, as shown in FIG. 3, one implementation structure of the determining module 23 includes at least one of a first determining unit 231 and a second determining unit 232.
The first determining unit 231 is configured to count, according to the signal loss information of the at least one voice segment, the number of lost voice segments among the at least one voice segment, and to determine the signal loss degree of the voice signal according to the result of comparing the number of lost voice segments with a preset segment-count threshold.
The second determining unit 232 is configured to count, according to the signal loss information of the at least one voice segment, the length of each signal loss fragment formed by consecutive lost signal points in each voice segment, and to determine the signal loss degree of the voice signal according to the result of comparing the lengths of the signal loss fragments in each voice segment with a preset point-count threshold.
Further optionally, the segment-count threshold includes a first segment-count threshold and a second segment-count threshold greater than the first segment-count threshold; correspondingly, the point-count threshold includes a first point-count threshold and a second point-count threshold greater than the first point-count threshold.
Based on the above, the first determining unit 231 may be specifically configured to:
if the number of lost voice segments is less than or equal to the first segment-count threshold, determine that the voice signal has zero-degree signal loss;
if the number of lost voice segments is greater than the first segment-count threshold but less than or equal to the second segment-count threshold, determine that the voice signal has mild signal loss;
if the number of lost voice segments is greater than the second segment-count threshold, determine that the voice signal has severe signal loss.
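The three-way comparison performed by the first determining unit can be sketched as below. The function name, the parameter names and the string labels `"zero"`, `"mild"` and `"severe"` are hypothetical; the text does not give concrete threshold values, so they are passed in as arguments.

```python
def degree_from_lost_segment_count(
    lost_segment_count: int,
    first_threshold: int,
    second_threshold: int,
) -> str:
    """Map the number of lost voice segments to a signal loss degree."""
    if first_threshold >= second_threshold:
        raise ValueError("second threshold must be greater than the first")
    if lost_segment_count <= first_threshold:
        return "zero"
    if lost_segment_count <= second_threshold:
        return "mild"
    return "severe"
```

With illustrative thresholds of 2 and 5, a signal with 2 lost segments is classified as `"zero"`, one with 3 to 5 lost segments as `"mild"`, and one with 6 or more as `"severe"`.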
Correspondingly, the second determining unit 232 may be specifically configured to:
if no voice segment among the at least one voice segment has a signal loss fragment longer than the first point-count threshold, determine that the voice signal has zero-degree signal loss;
if at least one voice segment has a signal loss fragment longer than the first point-count threshold, but no voice segment has a signal loss fragment longer than the second point-count threshold, determine that the voice signal has mild signal loss;
if at least one voice segment has a signal loss fragment longer than the second point-count threshold, determine that the voice signal has severe signal loss.
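Because the three conditions above depend only on whether any fragment exceeds each threshold, they reduce to comparing the longest signal loss fragment across all voice segments with the two point-count thresholds. The sketch below relies on that observation. The function name, the input layout (one list of fragment lengths per voice segment) and the string labels are assumptions made for illustration.

```python
from typing import List


def degree_from_fragment_lengths(
    fragment_lengths_per_segment: List[List[int]],
    first_threshold: int,
    second_threshold: int,
) -> str:
    """Map the longest signal loss fragment to a signal loss degree."""
    if first_threshold >= second_threshold:
        raise ValueError("second threshold must be greater than the first")
    # Longest fragment over all voice segments; 0 if there are none.
    longest = max(
        (length for lengths in fragment_lengths_per_segment for length in lengths),
        default=0,
    )
    if longest <= first_threshold:
        return "zero"
    if longest <= second_threshold:
        return "mild"
    return "severe"
```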
In an optional implementation, the processing module 24 is specifically configured to:
if the voice signal has zero-degree signal loss, perform speech recognition processing directly on the voice signal;
if the voice signal has mild signal loss, for each lost voice segment among the at least one voice segment, compensate for the lost signal points in the lost voice segment using the non-lost signal points in the same segment, and perform speech recognition processing on the compensated voice segment;
if the voice signal has severe signal loss, output prompt information to the user to indicate that the voice signal cannot be recognized normally because of severe signal loss.
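The branching performed by the processing module can be sketched as a small dispatch function. The recognizer, the compensation step and the user prompt are passed in as callables because the text does not specify them; `process_voice_signal`, its parameter names, the string labels and the prompt wording are all hypothetical.

```python
from typing import Callable, List, Optional


def process_voice_signal(
    signal: List[float],
    degree: str,
    recognize: Callable[[List[float]], str],
    compensate: Callable[[List[float]], List[float]],
    notify_user: Callable[[str], None],
) -> Optional[str]:
    """Choose the handling path according to the signal loss degree."""
    if degree == "zero":
        # No meaningful loss: recognize the signal as received.
        return recognize(signal)
    if degree == "mild":
        # Repair lost points first, then recognize the repaired signal.
        return recognize(compensate(signal))
    if degree == "severe":
        # Too damaged to recognize: tell the user instead.
        notify_user("Severe signal loss: the voice signal cannot be recognized.")
        return None
    raise ValueError(f"unknown signal loss degree: {degree!r}")
```

Returning `None` in the severe case is a design choice of this sketch; it lets the caller distinguish "no recognition attempted" from an empty recognition result.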
The voice signal processing apparatus provided in this embodiment takes full account of the effect of signal loss on subsequent processing of the voice signal and selects a processing mode according to the signal loss degree of the voice signal, which helps improve the accuracy of speech recognition.
An embodiment of the present application further provides a non-volatile computer-readable storage medium storing computer-executable instructions that can execute the voice signal processing method of any of the above method embodiments.
FIG. 4 is a schematic diagram of the hardware structure of an electronic device for performing the voice signal processing method according to an embodiment of the present application. As shown in FIG. 4, the device includes:
one or more processors 410 and a memory 420; FIG. 4 takes one processor 410 as an example.
The device for performing the voice signal processing method may further include an input device 430 and an output device 440.
The processor 410, the memory 420, the input device 430, and the output device 440 may be connected by a bus or in another manner; FIG. 4 takes a bus connection as an example.
As a non-volatile computer-readable storage medium, the memory 420 can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the voice signal processing method in the embodiments of the present application (for example, the receiving module 21, the obtaining module 22, the determining module 23, and the processing module 24 shown in FIG. 2). By running the non-volatile software programs, instructions, and modules stored in the memory 420, the processor 410 executes the various functional applications and data processing of the electronic device, that is, implements the voice signal processing method of the above method embodiments.
The memory 420 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created through use of the voice signal processing apparatus, and the like. In addition, the memory 420 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some embodiments, the memory 420 optionally includes memory located remotely from the processor 410, and such remote memory may be connected to the voice signal processing apparatus over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 430 can receive input numeric or character information and generate key signal inputs related to user settings and function control of the voice signal processing apparatus. The output device 440 may include a display device such as a display screen.
The one or more modules are stored in the memory 420 and, when executed by the one or more processors 410, perform the voice signal processing method of any of the above method embodiments.
The above product can perform the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to performing the method. For technical details not described in detail in this embodiment, refer to the method provided by the embodiments of the present application.
The electronic device of the embodiments of the present invention exists in various forms, including but not limited to:
(1) Mobile communication devices: these devices are characterized by mobile communication functions, and their main purpose is to provide voice and data communication. Such terminals include smartphones (for example, the iPhone), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing functions, and generally also support mobile Internet access. Such terminals include PDA, MID, and UMPC devices, for example the iPad.
(3) Portable entertainment devices: these devices can display and play multimedia content. They include audio and video players (for example, the iPod), handheld game consoles, e-book readers, smart toys, and portable in-vehicle navigation devices.
(4) Servers: devices that provide computing services. A server consists of a processor, a hard disk, memory, a system bus, and so on. Its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, it has higher requirements for processing capability, stability, reliability, security, scalability, and manageability.
(5) Other electronic devices with data interaction functions.
Finally, it should be noted that a person of ordinary skill in the art will understand that all or part of the processes of the above method embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement it without creative effort.
From the description of the above embodiments, a person skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solutions, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (13)
- A voice signal processing method, applied to an electronic device, comprising: receiving a voice signal, the voice signal including at least one voice segment; obtaining signal loss information of the at least one voice segment; determining a signal loss degree of the voice signal according to the signal loss information of the at least one voice segment; and performing speech recognition processing on the voice signal according to the signal loss degree of the voice signal.
- The method according to claim 1, wherein obtaining the signal loss information of the at least one voice segment comprises: for each voice segment of the at least one voice segment, multiplying the amplitudes of every two adjacent signal points in the voice segment, taking the adjacent signal points whose product is greater than or equal to 0 as the lost signal points of the voice segment, and counting the length of each signal loss fragment formed by consecutive lost signal points in the voice segment.
- The method according to claim 1, wherein determining the signal loss degree of the voice signal according to the signal loss information of the at least one voice segment comprises: counting, according to the signal loss information of the at least one voice segment, the number of lost voice segments among the at least one voice segment, and determining the signal loss degree of the voice signal according to the result of comparing the number of lost voice segments with a preset segment-count threshold; and/or counting, according to the signal loss information of the at least one voice segment, the length of each signal loss fragment formed by consecutive lost signal points in each voice segment, and determining the signal loss degree of the voice signal according to the result of comparing the lengths of the signal loss fragments in each voice segment with a preset point-count threshold.
- The method according to claim 3, wherein the segment-count threshold includes a first segment-count threshold and a second segment-count threshold greater than the first segment-count threshold, and the point-count threshold includes a first point-count threshold and a second point-count threshold greater than the first point-count threshold; determining the signal loss degree of the voice signal according to the result of comparing the number of lost voice segments with the preset segment-count threshold comprises: if the number of lost voice segments is less than or equal to the first segment-count threshold, determining that the voice signal has zero-degree signal loss; if the number of lost voice segments is greater than the first segment-count threshold but less than or equal to the second segment-count threshold, determining that the voice signal has mild signal loss; if the number of lost voice segments is greater than the second segment-count threshold, determining that the voice signal has severe signal loss; and determining the signal loss degree of the voice signal according to the result of comparing the lengths of the signal loss fragments in each voice segment with the preset point-count threshold comprises: if no voice segment among the at least one voice segment has a signal loss fragment longer than the first point-count threshold, determining that the voice signal has zero-degree signal loss; if at least one voice segment has a signal loss fragment longer than the first point-count threshold but no voice segment has a signal loss fragment longer than the second point-count threshold, determining that the voice signal has mild signal loss; if at least one voice segment has a signal loss fragment longer than the second point-count threshold, determining that the voice signal has severe signal loss.
- The method according to any one of claims 1 to 4, wherein performing speech recognition processing on the voice signal according to the signal loss degree of the voice signal comprises: if the voice signal has zero-degree signal loss, performing speech recognition processing directly on the voice signal; if the voice signal has mild signal loss, for each lost voice segment among the at least one voice segment, compensating for the lost signal points in the lost voice segment using the non-lost signal points in the same segment, and performing speech recognition processing on the compensated voice segment; if the voice signal has severe signal loss, outputting prompt information to the user to indicate that the voice signal cannot be recognized normally because of severe signal loss.
- A voice signal processing apparatus, comprising: a receiving module, configured to receive a voice signal, the voice signal including at least one voice segment; an obtaining module, configured to obtain signal loss information of the at least one voice segment; a determining module, configured to determine a signal loss degree of the voice signal according to the signal loss information of the at least one voice segment; and a processing module, configured to perform speech recognition processing on the voice signal according to the signal loss degree of the voice signal.
- The apparatus according to claim 6, wherein the obtaining module is specifically configured to: for each voice segment of the at least one voice segment, multiply the amplitudes of every two adjacent signal points in the voice segment, take the adjacent signal points whose product is greater than or equal to 0 as the lost signal points of the voice segment, and count the length of each signal loss fragment formed by consecutive lost signal points in the voice segment.
- The apparatus according to claim 6, wherein the determining module comprises: a first determining unit, configured to count, according to the signal loss information of the at least one voice segment, the number of lost voice segments among the at least one voice segment, and to determine the signal loss degree of the voice signal according to the result of comparing the number of lost voice segments with a preset segment-count threshold; and/or a second determining unit, configured to count, according to the signal loss information of the at least one voice segment, the length of each signal loss fragment formed by consecutive lost signal points in each voice segment, and to determine the signal loss degree of the voice signal according to the result of comparing the lengths of the signal loss fragments in each voice segment with a preset point-count threshold.
- The apparatus according to claim 8, wherein the segment-count threshold includes a first segment-count threshold and a second segment-count threshold greater than the first segment-count threshold, and the point-count threshold includes a first point-count threshold and a second point-count threshold greater than the first point-count threshold; the first determining unit is specifically configured to: if the number of lost voice segments is less than or equal to the first segment-count threshold, determine that the voice signal has zero-degree signal loss; if the number of lost voice segments is greater than the first segment-count threshold but less than or equal to the second segment-count threshold, determine that the voice signal has mild signal loss; if the number of lost voice segments is greater than the second segment-count threshold, determine that the voice signal has severe signal loss; and the second determining unit is specifically configured to: if no voice segment among the at least one voice segment has a signal loss fragment longer than the first point-count threshold, determine that the voice signal has zero-degree signal loss; if at least one voice segment has a signal loss fragment longer than the first point-count threshold but no voice segment has a signal loss fragment longer than the second point-count threshold, determine that the voice signal has mild signal loss; if at least one voice segment has a signal loss fragment longer than the second point-count threshold, determine that the voice signal has severe signal loss.
- The apparatus according to any one of claims 6 to 9, wherein the processing module is specifically configured to: if the voice signal has zero-degree signal loss, perform speech recognition processing directly on the voice signal; if the voice signal has mild signal loss, for each lost voice segment among the at least one voice segment, compensate for the lost signal points in the lost voice segment using the non-lost signal points in the same segment, and perform speech recognition processing on the compensated voice segment; if the voice signal has severe signal loss, output prompt information to the user to indicate that the voice signal cannot be recognized normally because of severe signal loss.
- A non-volatile computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being configured to: receive a voice signal, the voice signal including at least one voice segment; obtain signal loss information of the at least one voice segment; determine a signal loss degree of the voice signal according to the signal loss information of the at least one voice segment; and perform speech recognition processing on the voice signal according to the signal loss degree of the voice signal.
- An electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the one processor, and the instructions are executed by the at least one processor to enable the at least one processor to: receive a voice signal, the voice signal including at least one voice segment; obtain signal loss information of the at least one voice segment; determine a signal loss degree of the voice signal according to the signal loss information of the at least one voice segment; and perform speech recognition processing on the voice signal according to the signal loss degree of the voice signal.
- A computer program product, comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the method according to any one of claims 1 to 5.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610179999.X | 2016-03-25 | ||
CN201610179999.XA CN105845138A (en) | 2016-03-25 | 2016-03-25 | Voice signal processing method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017161829A1 (en) | 2017-09-28 |
Family
ID=56583905
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2016/096988 WO2017161829A1 (en) | 2016-03-25 | 2016-08-26 | Voice signal information processing method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN105845138A (en) |
WO (1) | WO2017161829A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113270096A (en) * | 2021-05-13 | 2021-08-17 | 前海七剑科技(深圳)有限公司 | Voice response method and device, electronic equipment and computer readable storage medium |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105845138A (en) * | 2016-03-25 | 2016-08-10 | 乐视控股(北京)有限公司 | Voice signal processing method and apparatus |
CN106856093A (en) * | 2017-02-23 | 2017-06-16 | 海信集团有限公司 | Audio-frequency information processing method, intelligent terminal and Voice command terminal |
CN107170451A (en) * | 2017-06-27 | 2017-09-15 | 乐视致新电子科技(天津)有限公司 | Audio signal processing method and device |
CN107316638A (en) * | 2017-06-28 | 2017-11-03 | 北京粉笔未来科技有限公司 | A kind of poem recites evaluating method and system, a kind of terminal and storage medium |
CN107990908B (en) * | 2017-11-20 | 2020-08-14 | Oppo广东移动通信有限公司 | Voice navigation method and device based on Bluetooth communication |
CN108831438B (en) * | 2018-07-24 | 2021-01-08 | Oppo(重庆)智能科技有限公司 | Voice data generation method and device, electronic device and computer readable storage medium |
CN109003619A (en) * | 2018-07-24 | 2018-12-14 | Oppo(重庆)智能科技有限公司 | Voice data generation method and relevant apparatus |
CN108965562B (en) * | 2018-07-24 | 2021-04-13 | Oppo(重庆)智能科技有限公司 | Voice data generation method and related device |
CN109065017B (en) * | 2018-07-24 | 2021-04-16 | Oppo(重庆)智能科技有限公司 | Voice data generation method and related device |
CN109121042B (en) * | 2018-07-26 | 2020-12-08 | Oppo广东移动通信有限公司 | Voice data processing method and related product |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1604572A (en) * | 2004-11-09 | 2005-04-06 | 北京中星微电子有限公司 | A semantic integrity ensuring method under IP network environment |
CN1731718A (en) * | 2004-08-06 | 2006-02-08 | 北京中星微电子有限公司 | Noise reduction method and device concerning IP network voice data packet lost |
CN102057634A (en) * | 2008-06-11 | 2011-05-11 | 日本电信电话株式会社 | Audio quality estimation method, audio quality estimation device, and program |
CN102568470A (en) * | 2012-01-11 | 2012-07-11 | 广州酷狗计算机科技有限公司 | Acoustic fidelity identification method and system for audio files |
CN103632679A (en) * | 2012-08-21 | 2014-03-12 | 华为技术有限公司 | An audio stream quality assessment method and an apparatus |
CN105845138A (en) * | 2016-03-25 | 2016-08-10 | 乐视控股(北京)有限公司 | Voice signal processing method and apparatus |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002297180A (en) * | 2001-03-29 | 2002-10-11 | Sanyo Electric Co Ltd | Voice recognizing device |
CN107170451A (en) * | 2017-06-27 | 2017-09-15 | 乐视致新电子科技(天津)有限公司 | Audio signal processing method and device |
2016
- 2016-03-25 CN CN201610179999.XA patent/CN105845138A/en active Pending
- 2016-08-26 WO PCT/CN2016/096988 patent/WO2017161829A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN105845138A (en) | 2016-08-10 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
WO2017161829A1 (en) | Voice signal information processing method and device | |
EP2994911B1 (en) | Adaptive audio frame processing for keyword detection | |
US11039204B2 (en) | Frequency band selection and processing techniques for media source detection | |
CN108920128B (en) | Operation method and system of presentation | |
CN109473104B (en) | Voice recognition network delay optimization method and device | |
WO2017166649A1 (en) | Voice signal processing method and device | |
WO2017166650A1 (en) | Voice recognition method and device | |
US11202066B2 (en) | Video data encoding and decoding method, device, and system, and storage medium | |
US20170195617A1 (en) | Image processing method and electronic device | |
US10103999B2 (en) | Jitter buffer level estimation | |
CN104125509A (en) | Program identification method, device and server | |
CN110827858B (en) | Voice endpoint detection method and system | |
CN110503944B (en) | Method and device for training and using voice awakening model | |
CN106896933B (en) | method and device for converting voice input into text input and voice input equipment | |
CN105786441A (en) | Audio processing method, server, user equipment and system | |
US11196868B2 (en) | Audio data processing method, server, client and server, and storage medium | |
CN106412676A (en) | Video code stream switching method and device, and electronic device | |
CN102111617A (en) | Streaming media decoding method and device | |
US10431236B2 (en) | Dynamic pitch adjustment of inbound audio to improve speech recognition | |
CN107493478B (en) | Method and device for setting coding frame rate | |
CN109741756B (en) | Method and system for transmitting operation signal based on USB external equipment | |
US20170280193A1 (en) | Method and device for processing a streaming media file | |
CN110890104A (en) | Voice endpoint detection method and system | |
CN105551500B (en) | Acoustic signal processing method and device | |
WO2017031853A1 (en) | Grouping management method for playing devices, and terminal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 16895178; Country of ref document: EP; Kind code of ref document: A1 |
 | 122 | EP: PCT application non-entry in European phase | Ref document number: 16895178; Country of ref document: EP; Kind code of ref document: A1 |