CN108520760B - Voice signal processing method and terminal - Google Patents


Info

Publication number
CN108520760B
Authority
CN
China
Prior art keywords
voice signal
sentence
content
voice
content corresponding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810259017.7A
Other languages
Chinese (zh)
Other versions
CN108520760A (en)
Inventor
符升升
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd
Priority to CN201810259017.7A
Publication of CN108520760A
Application granted
Publication of CN108520760B

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/72: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for transmitting results of analysis
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00: User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/52: User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail, for supporting social networking services
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M1/00: Substation equipment, e.g. for use by subscribers
    • H04M1/72: Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724: User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403: User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/72406: User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality by software upgrading or downloading
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M1/00: Substation equipment, e.g. for use by subscribers
    • H04M1/72: Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724: User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403: User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243: User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality, with interactive means for internal management of messages
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M1/00: Substation equipment, e.g. for use by subscribers
    • H04M1/72: Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724: User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72448: User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions
    • H04M1/72454: User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to context-related or environment-related conditions

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Computing Systems (AREA)
  • Environmental & Geological Engineering (AREA)
  • Telephone Function (AREA)

Abstract

Embodiments of the invention disclose a voice signal processing method and a terminal. The method, applied to the terminal, comprises: detecting voice signals received by the terminal during a voice call; when a first voice signal among the received signals is detected not to satisfy a parameter condition, determining the content corresponding to the first voice signal according to a second voice signal, the second voice signal comprising at least one of a voice signal of preset duration received before the first voice signal and a voice signal of preset duration received after it; and displaying the content corresponding to the first voice signal on a display interface for the user's reference. Interruptions and requests to repeat between the two parties to the call are thereby reduced, the call proceeds smoothly, and user experience is improved.

Description

Voice signal processing method and terminal
Technical Field
Embodiments of the invention relate to the field of information processing, and in particular to a voice signal processing method and a terminal.
Background
Social software provides real-time interaction over a network: once users have added each other as friends, they can interact in real time through text messages, voice, video, and other forms.
When a user holds a voice call with a friend through social software, fluctuation in the network to which the terminal is connected can distort the voice signal, so that the user cannot hear the friend's speech clearly. To recover the information, the user typically interrupts the call and asks the friend to repeat it; this repetition, however, adds to the friend's workload, lengthens the call, and degrades the user experience.
Disclosure of Invention
The invention provides a voice signal processing method that addresses the problems arising when a voice signal is distorted: repeated voice messages add to the other party's workload, lengthen the call, and degrade the user experience.
In a first aspect, a voice signal processing method applied to a terminal is provided, comprising:
detecting a voice signal received by the terminal during a voice call;
when a first voice signal among the received voice signals is detected not to satisfy a parameter condition, determining the content corresponding to the first voice signal according to a second voice signal, the second voice signal comprising at least one of a voice signal of preset duration received before the first voice signal and a voice signal of preset duration received after it;
and displaying the content corresponding to the first voice signal on a display interface.
In a second aspect, a terminal is provided, comprising:
a signal detection module configured to detect a voice signal received by the terminal during a voice call;
a content determining module configured to determine, when a first voice signal among the received voice signals is detected not to satisfy a parameter condition, the content corresponding to the first voice signal according to a second voice signal, the second voice signal comprising at least one of a voice signal of preset duration received before the first voice signal and a voice signal of preset duration received after it;
and a content display module configured to display the content corresponding to the first voice signal on a display interface.
In this way, during a voice call the terminal detects the received voice signals. When a first voice signal is detected not to satisfy a parameter condition, it is judged to be distorted; the content corresponding to it is then determined from a second voice signal received before or after it and displayed on a display interface for the user's reference. Interruptions and requests to repeat between the two parties are thereby reduced, the call proceeds smoothly, and user experience is improved.
The foregoing is only an overview of the technical solutions of the invention. So that its technical means may be understood more clearly, and so that the above and other objects, features, and advantages may become more apparent, embodiments of the invention are described below.
Drawings
To illustrate the technical solutions of the embodiments more clearly, the drawings used in their description are briefly introduced below. The drawings cover only some embodiments of the invention; those skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flow chart of a speech signal processing method of one embodiment of the present invention;
FIG. 2 is a flow chart of a speech signal processing method according to another embodiment of the present invention;
FIG. 3 is a flow chart of a speech signal processing method of one example of the present invention;
FIG. 4 is a block diagram of a terminal of one embodiment of the present invention;
Fig. 5 is a schematic diagram of a hardware structure of a mobile terminal according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the invention; all other embodiments that those skilled in the art can derive from them without creative effort fall within the protection scope of the invention.
Example one
Fig. 1 is a flowchart of a speech signal processing method according to an embodiment of the present invention. The voice signal processing method shown in fig. 1 is applied to a terminal, and the method comprises the following steps:
Step 101, detecting a voice signal received by a terminal in the process of carrying out voice communication.
The terminal may take various forms, such as a fixed terminal or a mobile terminal. A fixed terminal may be, for example, a desktop computer; a mobile terminal may be, for example, a mobile phone, a notebook computer, or a tablet.
Social software with a real-time interaction function is installed on the terminal. When the terminal is connected to a network, such as a mobile network or a wireless network, the user can interact with friends in real time through voice calls, text, and other means. During a voice call, voice signals travel between the terminal used by the user and the terminal used by the friend, and the user's terminal receives the voice signals sent by the friend's terminal.
During the call, factors such as network quality, environmental noise, and the speaker's speech rate can distort the voice signal, so that the user cannot hear the friend's speech clearly.
To solve this problem, the invention detects the voice signals received by the terminal, determines whether they are distorted, and, when distortion is detected, processes the signals so that the user can follow the friend's speech clearly.
Step 102, when it is detected that a first voice signal in the voice signals does not satisfy a parameter condition, determining content corresponding to the first voice signal according to a second voice signal, where the second voice signal includes at least one of a voice signal of a preset duration received before and a voice signal of a preset duration received after the first voice signal in the voice signals.
The first voice signal is a portion of a voice signal received by the terminal. The second voice signal is another part of the voice signal received by the terminal, and is a voice signal with a preset time length received before the first voice signal, a voice signal with a preset time length received after the first voice signal, or a combination of the voice signal with the preset time length received before and the voice signal with the preset time length received after the first voice signal.
The preset duration may be defined in various ways: it may be a fixed, pre-specified length, or a variable length adapted to the actual call. In general, to improve the processing result, a variable length adapted to the actual call is preferable.
The invention presets a parameter condition for the voice signal, and the terminal evaluates each received signal against it: a signal that satisfies the parameter condition is judged undistorted, and a signal that does not is judged distorted.
When the first voice signal is detected not to satisfy the parameter condition, it is judged distorted, and the user of the terminal cannot hear the friend's speech clearly; the content corresponding to the first voice signal is then determined from the content of the undistorted second voice signal. Specifically, the content of the first voice signal may be inferred semantically from the content of the second voice signal.
In practice, after the first voice signal is detected not to satisfy the parameter condition, its content may be determined directly from the content of the second voice signal; alternatively, speech recognition may first be attempted on the first voice signal, and the content determined from the second voice signal only if recognition fails.
The parameter condition may take various forms, for example at least one of a frequency condition, a noise-ratio condition, and a speech-rate condition; other conditions may also be set according to the actual situation. A frequency condition may define a frequency range or a range of frequency variation; a noise-ratio condition may define a noise-ratio range; a speech-rate condition may define a speech-rate range or a range of speech-rate variation.
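As an illustrative sketch only (the patent prescribes no implementation), the parameter check for a segment might look like the following; the statistics, field names, and thresholds are hypothetical examples, not values specified by the invention.

```python
from dataclasses import dataclass

@dataclass
class SegmentStats:
    dominant_freq_hz: float  # dominant frequency of the segment
    noise_ratio: float       # noise energy / total energy, in [0, 1]
    speech_rate_cps: float   # recognized characters per second

def satisfies_parameter_condition(s: SegmentStats) -> bool:
    """Judge a segment undistorted only if it meets all three example conditions."""
    freq_ok = 85.0 <= s.dominant_freq_hz <= 3400.0  # example frequency range
    noise_ok = s.noise_ratio <= 0.4                 # example noise-ratio range
    rate_ok = 1.0 <= s.speech_rate_cps <= 8.0       # example speech-rate range
    return freq_ok and noise_ok and rate_ok
```

A terminal following this scheme would treat a segment for which the check fails as the "first voice signal" of step 102.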
And 103, displaying the content corresponding to the first voice signal on a display interface.
After the content corresponding to the first voice signal is determined, it is displayed on the terminal's display interface for the user's reference, reducing unnecessary repetition between the two parties, keeping the call running smoothly, and improving the user experience of the call. The content may be displayed as text, as a picture, or as a combination of the two.
To help the user grasp the call content as a whole, the content corresponding to the first voice signal and the content corresponding to the second voice signal may be displayed together on the display interface, ordered by the time the signals were received.
The voice signal processing function may be started in various ways: it may start automatically when the user begins a voice call with a friend, or after the terminal receives a start instruction from the user, for example a selection of a preset option or button; other suitable start mechanisms may also be used.
According to this embodiment of the invention, during a voice call the terminal detects the received voice signals; when a first voice signal is detected not to satisfy a parameter condition, it is judged distorted, and the content corresponding to it is determined from a second voice signal received before or after it and displayed on a display interface for the user's reference. Interruptions and requests to repeat between the two parties are thereby reduced, the call proceeds smoothly, and user experience is improved.
Example two
Fig. 2 is a flowchart of a speech signal processing method according to another embodiment of the present invention. The voice signal processing method shown in fig. 2 is applied to a terminal, and the method includes:
Step 201, in the process of carrying out voice communication, detecting a voice signal received by a terminal.
During a voice call, the terminal used by the user receives voice signals sent by the friend's terminal, and the user's terminal detects each received signal to determine whether it is normal.
Step 202, when it is detected that the first voice signal in the voice signals does not satisfy the parameter condition, constructing a sentence to be corrected according to the content corresponding to the second voice signal, wherein the sentence to be corrected has a vacancy at the sentence position corresponding to the first voice signal.
The second voice signal is a voice signal received before the first voice signal, one received after it, or a combination of the two; it satisfies the parameter condition, so the content corresponding to it can be determined.
In this embodiment, the terminal stores the voice call content at a designated storage location during the call. Various storage schemes are possible: storing the call content of the friend, or of both parties, for the whole call; or storing it only for a preset historical window, such as the past half minute; other suitable schemes may also be used.
After the first voice signal is detected not to satisfy the parameter condition, the content corresponding to the second voice signal is retrieved from the designated storage location, and a sentence to be corrected is constructed from it, with a gap at the sentence position corresponding to the first voice signal.
Because the second voice signal may be received before the first voice signal, after it, or both, the gap may fall in various positions: the middle, the end, or the front of the sentence to be corrected.
The sentence to be corrected can be constructed in various ways. For example: first, from the relationship between the second voice signal's duration and the length of its content, determine the content length that matches the duration of the first voice signal; second, build the sentence from the content of the second voice signal together with a gap of that length.
Illustratively, after the first voice signal is judged distorted, the per-character duration of the undistorted second voice signal is computed as t1/N1, where N1 is the number of characters in its content and t1 its signal duration. Dividing the duration t2 of the distorted first voice signal by t1/N1 gives the number of characters its content contains, N1 · t2/t1; a gap of that many characters is combined with the content of the second voice signal to form the sentence to be corrected.
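The character-count estimate above can be written out directly; this helper is an illustrative sketch, not code from the patent.

```python
def estimate_gap_chars(n1: int, t1: float, t2: float) -> int:
    """Estimate the character count of the distorted first voice signal.

    n1: number of characters in the undistorted second voice signal
    t1: signal duration (seconds) of the second voice signal
    t2: signal duration (seconds) of the distorted first voice signal
    One character lasts about t1/n1 seconds, so the gap holds
    roughly n1 * t2 / t1 characters.
    """
    if n1 <= 0 or t1 <= 0:
        raise ValueError("reference signal must be non-empty")
    return max(1, round(n1 * t2 / t1))
```

For instance, a 5-second reference containing 10 characters implies 0.5 s per character, so a 1-second distorted stretch is estimated at 2 characters.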
Step 203, searching a target statement matched with the statement to be corrected from the statement database.
A sentence database is preset, and a large number of sentences are recorded in the sentence database. And after constructing the sentence to be corrected according to the content corresponding to the second voice signal, searching a target sentence matched with the sentence to be corrected from the sentence database.
And step 204, taking the content corresponding to the vacancy in the target sentence as the content corresponding to the first voice signal.
In the target sentences matched from the sentence database, the content corresponding to the vacancy of the sentence to be corrected is the content corresponding to the first voice signal.
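One minimal way to realize steps 202 to 204 is to treat the sentence to be corrected as a pattern whose gap matches any run of the estimated length, and scan the sentence database with it. This is only an assumed implementation; the patent does not prescribe regular expressions, and all names here are hypothetical.

```python
import re

def find_target_sentences(before: str, after: str, gap_len: int,
                          sentence_db: list[str]) -> list[str]:
    """Return database sentences shaped as: before + gap_len characters + after."""
    pattern = re.compile(re.escape(before) + f".{{{gap_len}}}" + re.escape(after))
    return [s for s in sentence_db if pattern.fullmatch(s)]

def gap_content(target: str, before: str, after: str) -> str:
    """Extract the characters at the gap position of a matched target sentence."""
    return target[len(before):len(target) - len(after)]
```

The gap content returned by `gap_content` corresponds to step 204: it is taken as the content of the first voice signal.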
One target sentence may be matched from the sentence database, or more than one. When two or more target sentences are matched, the gap content of all of them may be taken as the content corresponding to the first voice signal and displayed on the terminal's display interface in a later step; alternatively, the matched target sentences may first be ranked and the gap content of the top N taken, where N is a positive integer greater than or equal to 1 whose size may be set according to practice.
The target sentences may be ranked in various ways, for example according to at least one of the reception time of the first voice signal, the position information of the terminal, and the pronunciation characteristics corresponding to the first voice signal. They may also be ranked by other parameters; the embodiments of the present invention are not limited in this respect.
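As one illustration of ranking by the reception time of the first voice signal, the scorer below favors candidates that mention the current part of the day; the rule and all names are invented for this sketch, not taken from the patent.

```python
from datetime import datetime

def rank_targets(targets: list[str], score) -> list[str]:
    """Order candidate target sentences by a context score, best first."""
    return sorted(targets, key=score, reverse=True)

def time_of_day_score(received_at: datetime):
    """Hypothetical scorer: favor sentences naming the current part of day."""
    word = "morning" if received_at.hour < 12 else "evening"
    return lambda sentence: 1 if word in sentence else 0
```

Scorers based on terminal position or pronunciation characteristics could be combined with this one, for instance by summing their scores.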
For example, suppose the content corresponding to the second voice signal received before the first voice signal is "You", and the content corresponding to the second voice signal received after it is "how was the meal in the morning?". From the signal duration of the second voice signal and the number of characters in its content, together with the signal duration of the first voice signal, the content corresponding to the first voice signal is estimated to contain two characters, so the sentence to be corrected is "You □□ how was the meal in the morning?", where each "□" represents one missing character (in the original Chinese example, two-character time words such as "today", "yesterday", and "weekend" fit the gap). Matching the sentence to be corrected against the sentence database yields five target sentences, ranked as follows: "How did you eat this morning?", "How did you eat yesterday morning?", "How did you eat on the weekend morning?", "Do you remember how you ate in the morning?", and "How did you not eat in the morning?". The top 3 target sentences are selected, and the content corresponding to their gaps is taken as the content corresponding to the first voice signal.
And step 205, displaying the content corresponding to the first voice signal on a display interface.
After the gap content of the target sentences is taken as the content corresponding to the first voice signal, it is displayed on the display interface for the user. For example, "How did you eat this morning?", "How did you eat yesterday morning?", and "How did you eat on the weekend morning?" are displayed in turn on the display interface.
In operation, either the gap content of the target sentence alone may be displayed, or the whole target sentence containing it, that is, the content corresponding to the first voice signal and the content corresponding to the second voice signal displayed together.
To make the invention clearer to those skilled in the art, the voice signal processing method of the embodiment is now described in detail through the following example.
Fig. 3 is a flow chart of a speech signal processing method according to an example of the present invention. Referring to fig. 3, the voice signal processing method includes:
And S1, detecting that the voice call function of the social software is started.
And S2, starting network quality detection.
The quality of a network to which the terminal is connected is detected.
And S3, constructing a buffer pool, and recording the content corresponding to the voice signals of the two voice call parties within the preset duration n seconds into the buffer pool.
S4, judging whether the network connected with the terminal is abnormal, if not, executing step S5, if yes, executing step S6.
S5, judging whether the voice call is finished, if yes, the method is finished, if not, the step S4 is executed.
And S6, judging whether the current voice signal newly added into the buffer pool is distorted, if not, executing the step S5, and if so, executing the step S7.
In this example, the current voice signal is a voice signal newly received by the terminal at present, and the current voice signal includes two parts, one part of the voice signal satisfies the parameter condition, and the other part of the voice signal does not satisfy the parameter condition.
The distortion of the voice signal in this example means that the voice signal does not satisfy a preset parameter condition. The contents of the parameter conditions may be as described above with reference to the embodiments of the present invention.
S7, speech recognition is performed on the part of the current speech signal that is not distorted, and a text corresponding to the part of the current speech signal that is not distorted is recognized.
S8, judging whether the distorted part of the current voice signal can be recognized by voice, if not, executing steps S9-S11, if yes, executing step S12.
S9, natural language context estimation is performed on the content corresponding to the distorted partial speech signal based on the content corresponding to the undistorted partial speech signal.
The content corresponding to the distorted part of the speech signal can be inferred by adopting the recorded sentence database searching mode, and other applicable inference modes can also be adopted.
S10, sorting the plurality of inferred contents, and screening the contents m before sorting, wherein m is a positive integer greater than or equal to 1.
And S11, placing the deduced m contents at a designated position to generate a corrected content corresponding to the current voice signal, wherein the designated position is a sentence position corresponding to a part of the voice signal which is distorted in a sentence corresponding to the current voice signal.
After the end of step S11, S13 is executed.
And S12, placing the content recognized by the voice at a specified position, and generating the correction content corresponding to the current voice signal, wherein the specified position is the sentence position corresponding to the part of the voice signal which is distorted in the sentence corresponding to the current voice signal.
After the end of step S12, S13 is executed.
S13, displaying the corrected content corresponding to the current voice signal on the display interface for the user's reference.
After the end of step S13, S5 is executed.
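The S7-S13 flow can be sketched as follows. The helpers `recognize` (returns text, or None when the audio cannot be recognized) and `infer` (returns candidate contents, best first) stand in for the speech-recognition engine and the context-inference step, which the text does not implement here; the bracketed display format is likewise an illustrative choice.

```python
# Minimal sketch of steps S7-S13 (hypothetical helpers: `recognize` returns
# text or None, `infer` returns candidates ordered best-first; the bracketed
# display format is an illustrative choice, not the patent's).

def correct_utterance(clean_text, gap_pos, distorted_audio, recognize, infer, m=1):
    """Return the corrected content to show on the display interface (S13)."""
    text = recognize(distorted_audio)            # S8: try to recognize the distorted part
    if text is None:                             # unrecognizable -> S9-S11
        text = "/".join(infer(clean_text)[:m])   # keep the top-m inferred contents
    # S11/S12: place the content at the sentence position of the distorted part
    return clean_text[:gap_pos] + "[" + text + "]" + clean_text[gap_pos:]

# Example: recognition fails, so the top-2 context inferences fill the gap.
out = correct_utterance("see you ", 8, b"\x00\x00",
                        recognize=lambda audio: None,
                        infer=lambda ctx: ["tonight", "today"], m=2)
assert out == "see you [tonight/today]"
```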
In this method, voice recording, speech recognition, and natural-language context inference are used to assist in correcting the content corresponding to voice signals transiently distorted by network fluctuation, and the corrected content is displayed on the display interface for the user's reference. By checking the content on the display interface, the user can roughly understand what the distorted voice signal conveyed, which reduces unnecessary repetition by both parties during the voice call, ensures that the call proceeds smoothly, and improves the user experience of network voice calls.
According to the embodiment of the present invention, voice signals received by a terminal are detected during a voice call. When it is detected that a first voice signal among the voice signals does not satisfy the parameter condition, the first voice signal is judged to be distorted; then, according to a second voice signal received before or after the first voice signal, the content corresponding to the first voice signal is determined and displayed on a display interface for the user's reference. This reduces the need for the two parties to interrupt each other and repeat voice content, ensures that the voice call proceeds smoothly, and improves the user experience.
EXAMPLE III
Fig. 4 is a block diagram of a terminal of one embodiment of the present invention. The terminal shown in fig. 4 includes:
The signal detection module 301 is configured to detect a voice signal received by the terminal in a voice call process.
A content determining module 302, configured to determine, when it is detected that a first voice signal in the voice signals does not satisfy a parameter condition, a content corresponding to the first voice signal according to a second voice signal, where the second voice signal includes at least one of a voice signal of a preset duration received before and a voice signal of a preset duration received after the first voice signal in the voice signals.
A content display module 303, configured to display a content corresponding to the first voice signal on a display interface.
In this embodiment of the present invention, preferably, the content determining module 302 includes:
The sentence construction submodule is used for constructing a sentence to be corrected according to the content corresponding to the second voice signal, and the sentence to be corrected has a vacancy at the sentence position corresponding to the first voice signal;
The target sentence searching submodule is used for searching a target sentence matched with the sentence to be corrected from a sentence database;
And the content obtaining submodule is used for taking the content corresponding to the vacancy in the target sentence as the content corresponding to the first voice signal.
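A toy version of these three submodules might look like the sketch below, using an in-memory list as the sentence database and a regular-expression match as the "target sentence matched with the sentence to be corrected" search; the real database and matching strategy are not specified in the patent.

```python
# Toy sketch of the sentence-construction / target-search / content-obtaining
# submodules. The in-memory list and regex matcher are assumptions; the patent
# does not specify the sentence database or its matching strategy.
import re

def fill_gap(sentence_with_gap, database, gap="___"):
    """Find a target sentence matching the sentence to be corrected and
    return the content corresponding to the vacancy, or None if no match."""
    pattern = re.compile(
        "^" + re.escape(sentence_with_gap).replace(re.escape(gap), "(.+)") + "$")
    for target in database:          # target sentence searching submodule
        m = pattern.match(target)
        if m:
            return m.group(1)        # content obtaining submodule
    return None

db = ["the weather is nice today", "the weather is bad today"]
assert fill_gap("the weather is ___ today", db) == "nice"
```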
In the embodiment of the present invention, preferably, the sentence construction sub-module includes:
The content length determining submodule is used for determining the content length of the first voice signal matched with the signal duration of the first voice signal according to the corresponding relation between the signal duration of the second voice signal and the content length of the second voice signal;
And the sentence obtaining submodule is used for constructing the sentence to be corrected according to the content corresponding to the second voice signal and the vacancy corresponding to the content length.
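One plausible reading of the duration-to-length correspondence is a proportional speaking-rate model: characters per second observed in the second (undistorted) voice signal, multiplied by the duration of the first (distorted) signal. The sketch below uses that model, which is an assumption rather than the patent's stated formula.

```python
# Assumed proportional speaking-rate model for the content-length-determining
# and sentence-obtaining submodules; the patent only states that a
# correspondence between duration and content length is used.

def build_sentence_to_correct(second_text, second_duration_s, first_duration_s):
    """Construct the sentence to be corrected, with a vacancy whose length
    matches the signal duration of the first (distorted) voice signal."""
    rate = len(second_text) / second_duration_s        # characters per second
    gap_len = max(1, round(rate * first_duration_s))   # matched content length
    return second_text + " " + "_" * gap_len           # vacancy of that length

assert build_sentence_to_correct("see you at", 2.0, 1.0) == "see you at _____"
```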
In this embodiment of the present invention, preferably, the content determining module 302 further includes:
The sentence sequencing sub-module is used for sequencing the at least two target sentences after the target sentences matched with the sentence to be corrected are searched out from the sentence database, the number of target sentences found in the database being at least two;
The content obtaining sub-module is specifically configured to take the content corresponding to the vacancy in the top-N target sentences in the sorting as the content corresponding to the first voice signal, where N is a positive integer greater than or equal to 1.
In this embodiment of the present invention, preferably, the sentence sequencing sub-module is specifically configured to sort the at least two target sentences according to at least one of the receiving time of the first voice signal, the position information of the terminal, and the pronunciation effect corresponding to the first voice signal.
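The three ranking cues could be combined as a weighted score, as in the illustrative sketch below; the weights and the individual scoring functions are assumptions, since the embodiment only names the cues.

```python
# Illustrative weighted-score ranking over the three cues the embodiment
# names (receiving time, terminal position, pronunciation effect). Weights
# and scoring functions are assumptions.

def rank_targets(targets, time_score, place_score, sound_score,
                 w=(0.2, 0.3, 0.5)):
    """Sort target sentences best-first by a weighted combination of cues."""
    def score(t):
        return w[0] * time_score(t) + w[1] * place_score(t) + w[2] * sound_score(t)
    return sorted(targets, key=score, reverse=True)

targets = ["see you tonight", "see you at night"]
ranked = rank_targets(
    targets,
    time_score=lambda t: 1.0 if "tonight" in t else 0.5,  # call received in the evening
    place_score=lambda t: 0.5,                            # no location cue available
    sound_score=lambda t: 0.9 if "night" in t else 0.1,   # both candidates sound alike
)
assert ranked[0] == "see you tonight"
```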
In this embodiment of the present invention, preferably, the content display module 303 is specifically configured to display the target sentence including the content corresponding to the vacancy on the display interface.
According to the embodiment of the present invention, voice signals received by a terminal are detected during a voice call. When it is detected that a first voice signal among the voice signals does not satisfy the parameter condition, the first voice signal is judged to be distorted; then, according to a second voice signal received before or after the first voice signal, the content corresponding to the first voice signal is determined and displayed on a display interface for the user's reference. This reduces the need for the two parties to interrupt each other and repeat voice content, ensures that the voice call proceeds smoothly, and improves the user experience.
Fig. 5 is a schematic diagram of a hardware structure of a mobile terminal implementing various embodiments of the present invention.
The mobile terminal 400 includes, but is not limited to: radio frequency unit 401, network module 402, audio output unit 403, input unit 404, sensor 405, display unit 406, user input unit 407, interface unit 408, memory 409, processor 410, and power supply 411. Those skilled in the art will appreciate that the mobile terminal architecture shown in fig. 5 is not intended to be limiting of mobile terminals, and that a mobile terminal may include more or fewer components than shown, or some components may be combined, or a different arrangement of components. In the embodiment of the present invention, the mobile terminal includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted terminal, a wearable device, a pedometer, and the like.
The radio frequency unit 401 is configured to receive a voice signal sent by an opposite terminal performing a voice call with a terminal in a process of performing the voice call.
The processor 410 is configured to detect a voice signal received by the terminal in a voice call process, and when it is detected that a first voice signal in the voice signal does not satisfy a parameter condition, determine content corresponding to the first voice signal according to a second voice signal, where the second voice signal includes at least one of a voice signal of a preset duration received before and a voice signal of a preset duration received after the first voice signal in the voice signal, and display the content corresponding to the first voice signal on a display interface.
It should be understood that, in the embodiment of the present invention, the radio frequency unit 401 may be used for receiving and sending signals during a message transceiving process or a call process; specifically, it receives downlink data from a base station and forwards the received downlink data to the processor 410 for processing, and it transmits uplink data to the base station. Typically, the radio frequency unit 401 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 401 can also communicate with a network and other devices through a wireless communication system.
The mobile terminal provides the user with wireless broadband internet access through the network module 402, such as helping the user send and receive e-mails, browse web pages, and access streaming media.
The audio output unit 403 may convert audio data received by the radio frequency unit 401 or the network module 402 or stored in the memory 409 into an audio signal and output as sound. Also, the audio output unit 403 may also provide audio output related to a specific function performed by the mobile terminal 400 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output unit 403 includes a speaker, a buzzer, a receiver, and the like.
The input unit 404 is used to receive audio or video signals. The input unit 404 may include a graphics processing unit (GPU) 4041 and a microphone 4042. The graphics processor 4041 processes image data of still pictures or video obtained by an image capturing apparatus (such as a camera) in a video capturing mode or an image capturing mode. The processed image frames may be displayed on the display unit 406. The image frames processed by the graphics processor 4041 may be stored in the memory 409 (or other storage medium) or transmitted via the radio frequency unit 401 or the network module 402. The microphone 4042 may receive sound and can process such sound into audio data. In the phone call mode, the processed audio data may be converted into a format transmittable to a mobile communication base station and output via the radio frequency unit 401.
The mobile terminal 400 also includes at least one sensor 405, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor that can adjust the brightness of the display panel 4061 according to the brightness of ambient light, and a proximity sensor that can turn off the display panel 4061 and/or the backlight when the mobile terminal 400 is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally three axes), detect the magnitude and direction of gravity when stationary, and can be used to identify the posture of the mobile terminal (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), and vibration identification related functions (such as pedometer, tapping); the sensors 405 may also include a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, etc., which will not be described in detail herein.
The display unit 406 may include a display panel 4061, and the display panel 4061 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.
The user input unit 407 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the mobile terminal. Specifically, the user input unit 407 includes a touch panel 4071 and other input devices 4072. The touch panel 4071, also referred to as a touch screen, may collect touch operations by a user on or near it (e.g., operations by a user on or near the touch panel 4071 using a finger, a stylus, or any suitable object or attachment). The touch panel 4071 may include two parts: a touch detection device and a touch controller. The touch detection device detects the position touched by the user, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, sends the coordinates to the processor 410, receives commands from the processor 410, and executes them. In addition, the touch panel 4071 can be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 4071, the user input unit 407 may include other input devices 4072. Specifically, the other input devices 4072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and switch keys), a track ball, a mouse, and a joystick, which are not described herein again.
Further, the touch panel 4071 can be overlaid on the display panel 4061, and when the touch panel 4071 detects a touch operation thereon or nearby, the touch operation is transmitted to the processor 410 to determine the type of the touch event, and then the processor 410 provides a corresponding visual output on the display panel 4061 according to the type of the touch event. Although in fig. 5, the touch panel 4071 and the display panel 4061 are two separate components to implement the input and output functions of the mobile terminal, in some embodiments, the touch panel 4071 and the display panel 4061 may be integrated to implement the input and output functions of the mobile terminal, which is not limited herein.
The interface unit 408 is an interface through which an external device is connected to the mobile terminal 400. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 408 may be used to receive input (e.g., data information, power, etc.) from external devices and transmit the received input to one or more elements within the mobile terminal 400 or may be used to transmit data between the mobile terminal 400 and external devices.
The memory 409 may be used to store software programs as well as various data. The memory 409 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. Further, the memory 409 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
The processor 410 is a control center of the mobile terminal, connects various parts of the entire mobile terminal using various interfaces and lines, and performs various functions of the mobile terminal and processes data by operating or executing software programs and/or modules stored in the memory 409 and calling data stored in the memory 409, thereby integrally monitoring the mobile terminal. Processor 410 may include one or more processing units; preferably, the processor 410 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 410.
The mobile terminal 400 may further include a power supply 411 (e.g., a battery) for supplying power to various components, and preferably, the power supply 411 may be logically connected to the processor 410 through a power management system, so as to implement functions of managing charging, discharging, and power consumption through the power management system.
In addition, the mobile terminal 400 includes some functional modules that are not shown, and thus, are not described in detail herein.
Preferably, an embodiment of the present invention further provides a terminal, which includes a processor 410, a memory 409, and a computer program that is stored in the memory 409 and can be run on the processor 410, and when being executed by the processor 410, the computer program implements each process of the foregoing speech signal processing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not described here again.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the foregoing speech signal processing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (6)

1. A voice signal processing method is applied to a terminal, and is characterized by comprising the following steps:
Detecting a voice signal received by the terminal in the process of carrying out voice communication;
When detecting that a first voice signal in the voice signals does not meet parameter conditions, determining content corresponding to the first voice signal according to a second voice signal, wherein the second voice signal comprises at least one of a voice signal with a preset time length received before and a voice signal with a preset time length received after the first voice signal in the voice signals;
Displaying the content corresponding to the first voice signal on a display interface;
Wherein the determining the content corresponding to the first voice signal according to the second voice signal comprises:
Constructing a sentence to be corrected according to the content corresponding to the second voice signal, wherein the sentence to be corrected has a vacancy at the sentence position corresponding to the first voice signal;
Searching a target sentence matched with the sentence to be corrected from a sentence database;
Taking the content corresponding to the vacancy in the target sentence as the content corresponding to the first voice signal;
When at least two target sentences are searched out from the database, after the target sentences matched with the sentence to be corrected are searched from the sentence database, the method further comprises:
Ordering at least two of the target sentences;
The taking the content corresponding to the vacancy in the target sentence as the content corresponding to the first voice signal includes:
Taking the content corresponding to the vacancy in the top-N target sentences in the sorting as the content corresponding to the first voice signal, wherein N is a positive integer greater than or equal to 1;
Wherein said ordering at least two of said target statements comprises:
And sequencing at least two target sentences according to at least one of the receiving time of the first voice signal, the position information of the terminal and the pronunciation effect corresponding to the first voice signal.
2. The method of claim 1, wherein constructing the sentence to be modified according to the content corresponding to the second speech signal comprises:
Determining the content length of the first voice signal matched with the signal duration of the first voice signal according to the corresponding relation between the signal duration of the second voice signal and the content length of the second voice signal;
And constructing the sentence to be corrected according to the content corresponding to the second voice signal and the vacancy corresponding to the length of the content.
3. The method of claim 1, wherein the displaying the content corresponding to the first voice signal on a display interface comprises:
Displaying the target sentence comprising the content corresponding to the vacancy on the display interface.
4. A terminal, comprising:
The signal detection module is used for detecting the voice signal received by the terminal in the process of voice call;
The content determining module is used for determining the content corresponding to a first voice signal according to a second voice signal when detecting that the first voice signal in the voice signals does not meet parameter conditions, wherein the second voice signal comprises at least one of a voice signal with a preset time length received before and a voice signal with a preset time length received after the first voice signal in the voice signals;
The content display module is used for displaying the content corresponding to the first voice signal on a display interface;
Wherein the content determination module comprises:
The sentence construction submodule is used for constructing a sentence to be corrected according to the content corresponding to the second voice signal, and the sentence to be corrected has a vacancy at the sentence position corresponding to the first voice signal;
The target sentence searching submodule is used for searching a target sentence matched with the sentence to be corrected from a sentence database;
A content obtaining submodule, configured to use content corresponding to the vacancy in the target sentence as content corresponding to the first voice signal;
Wherein the content determination module further comprises:
The sentence sequencing sub-module is used for sequencing the at least two target sentences after the target sentences matched with the sentence to be corrected are searched out from the sentence database, the number of target sentences found in the database being at least two;
The content obtaining sub-module is specifically configured to take the content corresponding to the vacancy in the top-N target sentences in the sorting as the content corresponding to the first voice signal, where N is a positive integer greater than or equal to 1;
The sentence sequencing sub-module is specifically configured to sort the at least two target sentences according to at least one of the receiving time of the first voice signal, the position information of the terminal, and the pronunciation effect corresponding to the first voice signal.
5. The terminal of claim 4, wherein the sentence construction sub-module comprises:
The content length determining submodule is used for determining the content length of the first voice signal matched with the signal duration of the first voice signal according to the corresponding relation between the signal duration of the second voice signal and the content length of the second voice signal;
And the sentence obtaining submodule is used for constructing the sentence to be corrected according to the content corresponding to the second voice signal and the vacancy corresponding to the content length.
6. The terminal of claim 4, wherein:
The content display module is specifically configured to display the target sentence including the content corresponding to the vacancy on the display interface.
CN201810259017.7A 2018-03-27 2018-03-27 Voice signal processing method and terminal Active CN108520760B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810259017.7A CN108520760B (en) 2018-03-27 2018-03-27 Voice signal processing method and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810259017.7A CN108520760B (en) 2018-03-27 2018-03-27 Voice signal processing method and terminal

Publications (2)

Publication Number Publication Date
CN108520760A CN108520760A (en) 2018-09-11
CN108520760B true CN108520760B (en) 2020-07-24

Family

ID=63434318

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810259017.7A Active CN108520760B (en) 2018-03-27 2018-03-27 Voice signal processing method and terminal

Country Status (1)

Country Link
CN (1) CN108520760B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109286554B (en) * 2018-09-14 2021-07-13 腾讯科技(深圳)有限公司 Social function unlocking method and device in social application
CN113422868B (en) * 2021-05-19 2022-08-09 北京荣耀终端有限公司 Voice call method and device, electronic equipment and computer readable storage medium
CN115798465B (en) * 2023-02-07 2023-04-07 天创光电工程有限公司 Voice input method, system and readable storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8000452B2 (en) * 2004-07-26 2011-08-16 General Motors Llc Method and system for predictive interactive voice recognition
US8200691B2 (en) * 2006-11-29 2012-06-12 Sap Ag Action prediction based on interactive history and context between sender and recipient
US8762156B2 (en) * 2011-09-28 2014-06-24 Apple Inc. Speech recognition repair using contextual information
CN104160392B (en) * 2012-03-07 2017-03-08 三菱电机株式会社 Semantic estimating unit, method
US20170194000A1 (en) * 2014-07-23 2017-07-06 Mitsubishi Electric Corporation Speech recognition device and speech recognition method
CN106847280B (en) * 2017-02-23 2020-09-15 海信集团有限公司 Audio information processing method, intelligent terminal and voice control terminal
CN106856093A (en) * 2017-02-23 2017-06-16 海信集团有限公司 Audio-frequency information processing method, intelligent terminal and Voice command terminal

Also Published As

Publication number Publication date
CN108520760A (en) 2018-09-11

Similar Documents

Publication Publication Date Title
CN108234289B (en) Message display method and device and mobile terminal
CN108391008B (en) Message reminding method and mobile terminal
CN110072012B (en) Reminding method for screen state switching and mobile terminal
CN108616448B (en) Information sharing path recommendation method and mobile terminal
CN111130989B (en) Information display and sending method and electronic equipment
CN109388456B (en) Head portrait selection method and mobile terminal
CN109523253B (en) Payment method and device
CN109412932B (en) Screen capturing method and terminal
CN108984066B (en) Application icon display method and mobile terminal
CN110096203B (en) Screenshot method and mobile terminal
CN108920040B (en) Application icon sorting method and mobile terminal
CN108520760B (en) Voice signal processing method and terminal
CN108446339B (en) Application icon classification method and mobile terminal
CN109982273B (en) Information reply method and mobile terminal
CN110784394A (en) Prompting method and electronic equipment
CN108628534B (en) Character display method and mobile terminal
CN108270928B (en) Voice recognition method and mobile terminal
CN110825474A (en) Interface display method and device and electronic equipment
CN108093119B (en) Strange incoming call number marking method and mobile terminal
CN108307048B (en) Message output method and device and mobile terminal
CN111142759B (en) Information sending method and electronic equipment
CN109274814B (en) Message prompting method and device and terminal equipment
CN109660657B (en) Application program control method and device
CN111597435A (en) Voice search method and device and electronic equipment
CN109684006B (en) Terminal control method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant