CN107331405A

CN107331405A - Voice information processing method and server

Info

Publication number: CN107331405A
Application number: CN201710519934.XA
Authority: CN
Inventors: 王明川
Original assignee: Shenzhen Jinli Communication Equipment Co Ltd
Current assignee: Shenzhen Jinli Communication Equipment Co Ltd
Priority date: 2017-06-30
Filing date: 2017-06-30
Publication date: 2017-11-07

Abstract

An embodiment of the invention discloses a voice information processing method and a server. The method includes: a server receives voice information sent by a first terminal, where the voice information includes call voice information and environmental noise information; the server screens out the environmental noise information in the voice information to obtain target voice information; and the server sends the target voice information to a second terminal, where the first terminal and the second terminal are peer terminals in a call with each other. With this embodiment, the environmental noise information in the voice information can be screened out (filtered) automatically to obtain the call voice information, which improves the clarity of the voice call and avoids problems such as excessive environmental noise masking the call voice and making it unintelligible to the listener.

Description

Voice information processing method and server

Technical Field

The present invention relates to the field of terminal technologies, and in particular, to a voice information processing method and a server.

Background

With the development of the mobile internet, communication modes have changed dramatically: information transfer has progressed from ancient beacon fires to the wired telephone of the 20th century and then to today's intelligent terminals, and the pace of development is rapid. However, how to deliver information that is actually usable by the user remains a common problem that needs to be solved. For example, during a call, if the surrounding environment is noisy, such as in a subway or a dense crowd, raising the terminal volume does not help, and the other party cannot hear the information we want to convey.

To solve the above problem, users generally adopt one of the following methods. First, the user of the answering terminal manually presses the volume key to raise the playback volume of the terminal. Second, the user deliberately moves to a relatively quiet environment to talk. In practice, the first method does not solve the problem that the call voice cannot be heard clearly because of loud noise: when the terminal volume is increased, the noise is amplified together with the call voice, the noise can mask the call voice, and the user still cannot hear the call voice clearly. The second method is constrained by the objective environment and depends on user behavior, so it is not very practical.

Disclosure of Invention

The embodiment of the invention provides a voice information processing method and a server, which can automatically screen out environmental noise information in voice information so as to improve the clarity of voice calls.

In a first aspect, an embodiment of the present invention provides a method for processing voice information, where the method includes:

the method comprises the steps that a server receives voice information sent by a first terminal, wherein the voice information comprises call voice information and environmental noise information;

the server selects information of which the voice decibel exceeds a first decibel threshold value from the voice information as the environmental noise information;

the server screens out environmental noise information in the voice information to obtain target voice information;

and the server sends the target voice information to a second terminal, and the first terminal and the second terminal are opposite terminals in a communication state.

In a second aspect, an embodiment of the present invention provides a server, where the server includes:

the receiving unit is used for receiving voice information sent by a first terminal, wherein the voice information comprises call voice information and environmental noise information;

the selection unit is used for selecting information of which the voice decibel exceeds a first decibel threshold value from the voice information as the environmental noise information;

the processing unit is used for screening out the environmental noise information in the voice information to obtain target voice information;

and the sending unit is used for sending the target voice information to a second terminal, and the first terminal and the second terminal are opposite terminals in a communication state.

In a third aspect, an embodiment of the present invention provides another server, where the server includes a processor, an input device, an output device, and a memory, where the processor, the input device, the output device, and the memory are connected to each other, where the memory is used to store an application program code that supports a terminal to execute the foregoing method, and the processor is configured to execute the foregoing method according to the first aspect.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, in which a computer program is stored, the computer program comprising program instructions, which, when executed by a processor, cause the processor to perform the method of the first aspect.

In the embodiment of the invention, a server receives voice information sent by a first terminal, where the voice information comprises call voice information and environmental noise information. The server then selects, from the voice information, information whose voice decibel exceeds a first decibel threshold as the environmental noise information, and screens out the environmental noise information in the voice information to obtain target voice information. Finally, the server sends the target voice information to a second terminal, where the first terminal and the second terminal are opposite terminals in a call state. In this way, environmental noise information in the voice information can be screened out automatically, the clarity of the voice call is improved, and problems such as the call voice being masked by excessive environmental noise and becoming unintelligible are avoided.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

FIG. 1 is a schematic flow chart of a voice information processing method according to an embodiment of the present invention;

FIG. 2 is a schematic flow chart diagram of a method for processing voice information according to another embodiment of the present invention;

FIG. 3 is a schematic block diagram of a server provided by an embodiment of the present invention;

FIG. 4 is a schematic block diagram of a server provided by another embodiment of the present invention;

FIG. 5 is a schematic block diagram of a server according to another embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".

In particular implementations, the terminals described in embodiments of the invention include, but are not limited to, other portable devices such as mobile phones, laptop computers, or tablet computers having touch sensitive surfaces (e.g., touch screen displays and/or touch pads). It should also be understood that in some embodiments, the device is not a portable communication device, but is a desktop computer having a touch-sensitive surface (e.g., a touch screen display and/or touchpad).

In the discussion that follows, a terminal that includes a display and a touch-sensitive surface is described. However, it should be understood that the terminal may include one or more other physical user interface devices such as a physical keyboard, mouse, and/or joystick.

The terminal supports various applications, such as one or more of the following: a drawing application, a presentation application, a word processing application, a website creation application, a disc burning application, a spreadsheet application, a gaming application, a telephone application, a video conferencing application, an email application, an instant messaging application, an exercise support application, a photo management application, a digital camera application, a web browsing application, a digital music player application, and/or a digital video player application.

Various applications that may be executed on the terminal may use at least one common physical user interface device, such as a touch-sensitive surface. One or more functions of the touch-sensitive surface and corresponding information displayed on the terminal can be adjusted and/or changed between applications and/or within respective applications. In this way, a common physical architecture (e.g., touch-sensitive surface) of the terminal can support various applications with user interfaces that are intuitive and transparent to the user.

Referring to fig. 1, which is a schematic flowchart of a method for processing voice information according to an embodiment of the present invention, the method shown in fig. 1 may include the following implementation steps:

Step S101, a server receives voice information sent by a first terminal, wherein the voice information comprises call voice information and environmental noise information.

In the embodiment of the present invention, the first terminal and the second terminal are in a call state, and each has a built-in sound sensor, which may also be referred to as a microphone (mic) sensor. The sound sensor needs high sensitivity; for example, it can monitor sound between 30 decibels (dB) and 120 dB, although the invention is not limited to this range.

The first terminal can collect voice information by using the sound sensor. The voice information can include the call voice information of the user and can also include environmental noise information from the environment where the user is located. Specifically, when a first user talks on the first terminal in a noisy environment, the noise level of the external environment is high, which may increase the current generated by the diaphragm in the earpiece of the first terminal (that is, the electret film in the earpiece vibrates more strongly, which increases the current). When the current exceeds a preset current threshold, the first terminal can determine that the first user is making a voice call in a noisy environment, and the voice information collected by the sound sensor then needs to be monitored and filtered. The first terminal may then send the voice information to a server over a network, and the server receives the voice information accordingly.

Alternatively, after the first terminal collects the voice information by using the sound sensor, it can detect whether the voice decibel of the voice information is within a preset decibel threshold range. If not, the first terminal can determine that the voice information needs to be monitored and filtered, and likewise sends the voice information to the server. The preset decibel threshold range may be configured by the system or by the user, for example 80-90 dB.
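The decibel-range check on the first terminal can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the invention: the samples are assumed to be calibrated sound-pressure values in pascals, the level is computed as an RMS value relative to the conventional 20 micropascal reference, and the names `level_db` and `needs_server_filtering` are hypothetical.

```python
import math
from typing import Sequence

# Conventional 0 dB SPL reference; assumes samples are calibrated in pascals.
REF_PRESSURE_PA = 20e-6


def level_db(samples: Sequence[float]) -> float:
    """Return the RMS level of a block of samples in decibels."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return float("-inf")
    return 20.0 * math.log10(rms / REF_PRESSURE_PA)


def needs_server_filtering(samples: Sequence[float],
                           low_db: float = 80.0,
                           high_db: float = 90.0) -> bool:
    """True when the level falls outside the preset decibel range,
    i.e. when the terminal would forward the audio to the server."""
    return not (low_db <= level_db(samples) <= high_db)
```

With the default 80-90 dB range from the example in the text, a block measured at about 100 dB would be forwarded to the server, while a block at about 85 dB would not.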

In some embodiments, the first terminal may encode the voice information in a preset format, and send the encoded voice information in the preset format to the server. The preset format herein includes, but is not limited to, any of the following: binary format, octal format, hexadecimal format, etc.
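One possible reading of the preset formats is a textual encoding of the raw audio bytes in base 2, 8, or 16. The sketch below assumes that reading; the text does not specify the byte layout, and the function names `encode_voice` and `decode_voice` are hypothetical.

```python
# Digits per byte and numeric base for each assumed text format.
_FORMATS = {"binary": (8, 2, "b"), "octal": (3, 8, "o"), "hex": (2, 16, "x")}


def encode_voice(raw: bytes, fmt: str = "hex") -> str:
    """Encode raw audio bytes as fixed-width digits in the chosen base."""
    if fmt not in _FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    width, _, spec = _FORMATS[fmt]
    return "".join(format(byte, f"0{width}{spec}") for byte in raw)


def decode_voice(text: str, fmt: str = "hex") -> bytes:
    """Reverse encode_voice on the receiving side (e.g. the server)."""
    if fmt not in _FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    width, base, _ = _FORMATS[fmt]
    return bytes(int(text[i:i + width], base)
                 for i in range(0, len(text), width))
```

Fixed-width digits per byte keep the encoding unambiguous, so the receiver can split the string without separators.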

Step S102, the server selects information with a voice decibel exceeding a first decibel threshold value from the voice information as the environmental noise information.

Specifically, the server may select, from the voice information, the information whose voice decibel (i.e., sound level in decibels) exceeds a first decibel threshold as the environmental noise information. The server can then delete each piece of environmental noise information whose voice decibel exceeds the first decibel threshold to obtain the call voice information in the voice information. The first decibel threshold can be configured by the system or by the user, for example 90 dB.
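The threshold-based selection in step S102 can be sketched as below. The sketch assumes the voice information has already been split into segments whose levels were measured beforehand; the `Segment` type and the function `select_noise` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    level_db: float        # measured level of the segment (assumed precomputed)
    samples: List[float]   # audio samples belonging to the segment


def select_noise(segments: List[Segment],
                 first_threshold_db: float = 90.0
                 ) -> Tuple[List[Segment], List[Segment]]:
    """Split segments into (environmental noise, call voice).

    A segment counts as environmental noise only when its level
    exceeds the first decibel threshold.
    """
    noise = [s for s in segments if s.level_db > first_threshold_db]
    call_voice = [s for s in segments if s.level_db <= first_threshold_db]
    return noise, call_voice
```

A segment exactly at the threshold is kept as call voice, because the text selects only information that exceeds the threshold.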

Step S103, the server screens out the environmental noise information in the voice information to obtain target voice information.

The server can intelligently filter invalid information (namely environmental noise information) in the voice information by analyzing the voice information, so as to acquire effective call voice information in the voice information. In an embodiment of the present invention, the filtering, by the server, the environmental noise information in the voice information to obtain target voice information includes: the server screens out environmental noise information in the voice information to obtain the call voice information; and the server amplifies the call voice information to obtain target voice information.

Specifically, the server may delete the environmental noise information in which the voice decibel exceeds the first decibel threshold value in the voice information, so as to obtain the call voice information. Then, the server can amplify the whole voice decibel of the call voice information to obtain the target voice information, so that the situation that the call voice decibel of the first user is too small and the user of the answering party cannot hear clearly is avoided.

In another embodiment of the present invention, the server amplifies the call voice information to obtain target voice information, including: and the server amplifies the call voice information of which the voice decibel does not exceed a second decibel threshold value to obtain the target voice information.

Specifically, to reduce voice distortion, the server can amplify only the information in the call voice information whose voice decibel does not exceed the second decibel threshold, and leave unprocessed the information whose voice decibel exceeds the second decibel threshold, so as to obtain the target voice information. This avoids the situation in which the first user's call voice is too quiet for the answering user to hear clearly.
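Both amplification variants (amplifying all call voice, or only the part that does not exceed the second decibel threshold) can be sketched with one function. This is an illustrative sketch only: frames are assumed to carry a precomputed level, the gain value is arbitrary, and `amplify_call_voice` is a hypothetical name.

```python
from typing import List, Optional, Tuple

Frame = Tuple[float, List[float]]  # (level in dB, samples); level assumed precomputed


def amplify_call_voice(call_frames: List[Frame],
                       gain_db: float = 6.0,
                       second_threshold_db: Optional[float] = None
                       ) -> List[Frame]:
    """Amplify call voice frames to obtain the target voice information.

    With second_threshold_db=None every frame is amplified; otherwise only
    frames whose level does not exceed the threshold are amplified.
    """
    factor = 10 ** (gain_db / 20)  # convert a dB gain to a linear amplitude factor
    target: List[Frame] = []
    for level, samples in call_frames:
        if second_threshold_db is not None and level > second_threshold_db:
            target.append((level, list(samples)))  # already loud: left unprocessed
        else:
            target.append((level + gain_db, [s * factor for s in samples]))
    return target
```

Leaving the louder frames untouched limits clipping, which is one way to read the stated goal of reducing voice distortion.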

Step S104, the server sends the target voice information to a second terminal, and the first terminal and the second terminal are opposite terminals in a communication state.

In the embodiment of the invention, the server can send the target voice information to the second terminal, so that the second terminal can listen to the target voice information after the environmental noise information is removed. Specifically, the server may further compress the target voice information to send the compressed target voice information to the second terminal, so as to save transmission resources.
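The text does not name a compression scheme. As a stand-in, the sketch below uses the general-purpose `zlib` module from the Python standard library; a real telephony system would more likely use a dedicated audio codec. The function names are hypothetical.

```python
import zlib


def compress_voice(target_voice: bytes) -> bytes:
    """Compress the target voice bytes before sending (zlib is an assumed stand-in)."""
    return zlib.compress(target_voice, 6)


def decompress_voice(blob: bytes) -> bytes:
    """Restore the target voice bytes on the second terminal."""
    return zlib.decompress(blob)
```

The compression is lossless, so the second terminal recovers exactly the bytes the server produced.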

In an embodiment of the present invention, before the server sends the target voice information to the second terminal, the method further includes: the server carries out voice recognition on the target voice information to obtain target character information corresponding to the target voice information; the server sends the target voice information to a second terminal, and the method comprises the following steps: and the server sends the target voice information and the target character information to a second terminal so as to synchronously display the target character information when the target voice information is played by the second terminal.

Specifically, the server may further perform speech recognition or speech coding on the target voice information to obtain the corresponding target text information. The server can send the target voice information together with the target text information to the second terminal, so that the target text information is displayed synchronously while the second terminal plays the target voice information, making it easier for the user to follow what the other party is saying. Optionally, the server may perform compression coding on the target voice information and the target text information, and send the compressed target voice information and target text information to the second terminal, so as to save transmission resources.
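Sending the target voice information and the target text information together can be sketched as a single message. The JSON layout, the field names `voice` and `text`, and the function names below are assumptions made for illustration; the speech recognition step itself is outside this sketch and its result is passed in as a plain string.

```python
import base64
import json
from typing import Tuple


def build_payload(target_voice: bytes, target_text: str) -> bytes:
    """Bundle audio and recognized text into one message for the second terminal."""
    message = {
        "voice": base64.b64encode(target_voice).decode("ascii"),
        "text": target_text,
    }
    return json.dumps(message, ensure_ascii=False).encode("utf-8")


def parse_payload(payload: bytes) -> Tuple[bytes, str]:
    """Unpack the message so the terminal can play the audio and show the text."""
    message = json.loads(payload.decode("utf-8"))
    return base64.b64decode(message["voice"]), message["text"]
```

Because both parts arrive in the same message, the second terminal can start playback and text display at the same moment.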

In yet another embodiment of the present invention, the method further comprises: and the server records the voice information or the target voice information according to a preset format. Specifically, the server may store the voice information with a preset format sent by the first terminal in a database, or the server may store the call voice information with the environmental noise information removed in the database, where the call voice information also has the preset format, for example, the server stores the voice information or the target voice information in the database in a binary form.

In yet another embodiment of the present invention, the method further comprises: the server destroys the voice information or the target voice information whose recording time exceeds a preset time threshold. Specifically, to save storage resources, the database records only the latest call voice information for each call pairing. The server may delete (destroy) the voice information or the target voice information that has been recorded in the database for longer than a preset time threshold (e.g., one week).
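The retention rule can be sketched as a periodic purge. The sketch assumes each record is identified by a string key and stores the Unix timestamp at which it was recorded; the in-memory dictionary stands in for the database, and `purge_expired` is a hypothetical name.

```python
import time
from typing import Dict, Optional

ONE_WEEK_SECONDS = 7 * 24 * 3600  # example threshold taken from the text


def purge_expired(records: Dict[str, float],
                  max_age_s: float = ONE_WEEK_SECONDS,
                  now: Optional[float] = None) -> Dict[str, float]:
    """Return only the records whose age does not exceed the time threshold.

    `records` maps a record id to the Unix time at which it was stored.
    """
    current = time.time() if now is None else now
    return {rid: stored_at for rid, stored_at in records.items()
            if current - stored_at <= max_age_s}
```

Passing `now` explicitly keeps the function deterministic, which makes the rule easy to check without waiting for real time to pass.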

Referring to fig. 2, which is a schematic flow chart of a method for processing voice information according to another embodiment of the present invention, the method shown in fig. 2 may include the following implementation steps:

Step S201, a server receives voice information sent by a first terminal, wherein the voice information comprises call voice information and environmental noise information.

Step S202, the server selects information of which the voice decibel exceeds a first decibel threshold value from the voice information as environmental noise information.

Step S203, the server screens out the environmental noise information in the voice information to obtain the call voice information.

Step S204, the server amplifies the call voice information of which the voice decibel does not exceed a second decibel threshold value to obtain the target voice information.

In an embodiment of the present invention, another alternative implementation manner of step S204 may be: and the server amplifies the call voice information to obtain target voice information.

Step S205, the server performs voice recognition on the target voice information to obtain target text information corresponding to the target voice information.

Step S206, the server sends the target voice information and the target character information to a second terminal so as to synchronously display the target character information when the second terminal plays the target voice information, wherein the first terminal and the second terminal are opposite terminals in a communication state.

It should be noted that, for content that is not shown in the embodiment of the present invention, reference may be made to the description related to the embodiment described in fig. 1, and details are not described here again.

Referring to FIG. 3, which is a schematic block diagram of a server according to an embodiment of the present invention, the server 30 may include: a receiving unit 301, a selecting unit 302, a processing unit 303, and a sending unit 304; wherein,

the receiving unit 301 is configured to receive voice information sent by a first terminal, where the voice information includes call voice information and environmental noise information;

the selecting unit 302 is configured to select, from the voice information, information that a voice decibel exceeds a first decibel threshold as the environmental noise information;

the processing unit 303 is configured to perform filtering processing on the environmental noise information in the voice information to obtain target voice information;

the sending unit 304 is configured to send the target voice information to a second terminal, where the first terminal and the second terminal are opposite terminals in a call state.

Referring to FIG. 4, which is a schematic block diagram of a server according to another embodiment of the present invention, the server 30 may include the receiving unit 301, the selecting unit 302, the processing unit 303, and the sending unit 304 described above. In addition, for use before the sending unit 304 sends the target voice information to the second terminal, the server 30 further includes a recognition unit 305; wherein,

the recognition unit 305 is configured to perform voice recognition on the target voice information to obtain target text information corresponding to the target voice information;

the sending unit 304 is specifically configured to send the target voice information and the target text information to a second terminal, so that the target text information is synchronously displayed when the target voice information is played by the second terminal.

In some possible embodiments, the processing unit 303 is configured to perform filtering processing on environmental noise information in the voice information to obtain the call voice information; the processing unit 303 is further configured to amplify the call voice information to obtain target voice information.

Referring to FIG. 5, a schematic block diagram of a server according to another embodiment of the present invention is shown. The server 30 in the present embodiment may include: one or more processors 801, one or more input devices 802, one or more output devices 803, and a memory 804. The processor 801, the input device 802, the output device 803, and the memory 804 are connected by a bus 805. The memory 804 is used to store instructions, and the processor 801 is used to execute the instructions stored in the memory 804. The processor 801 is configured to:

receiving voice information sent by a first terminal, wherein the voice information comprises call voice information and environmental noise information;

selecting information of which the voice decibel exceeds a first decibel threshold value from the voice information as the environmental noise information;

screening out environmental noise information in the voice information to obtain target voice information;

and sending the target voice information to a second terminal, wherein the first terminal and the second terminal are opposite communication terminals in a communication state.

In some possible embodiments, the processor 801 is configured to perform a filtering process on the environmental noise information in the voice information to obtain the call voice information; and amplifying the call voice information to obtain target voice information.

In some possible embodiments, the processor 801 is configured to amplify the call voice information with a voice decibel not exceeding the second decibel threshold, so as to obtain the target voice information.

In some possible embodiments, before the processor 801 is configured to send the target voice information to the second terminal, the processor 801 is further configured to perform voice recognition on the target voice information to obtain target text information corresponding to the target voice information; the processor 801 is configured to send the target voice information and the target text information to a second terminal, so as to synchronously display the target text information when the second terminal plays the target voice information.

It should be understood that in the present embodiment, the Processor 801 may be a Central Processing Unit (CPU), and the Processor may be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

The input device 802 may include a touch pad, a fingerprint sensor (for collecting fingerprint information of a user and direction information of the fingerprint), a microphone, etc., and the output device 803 may include a display (LCD, etc.), a speaker, etc.

The memory 804 may include both read-only memory and random access memory, and provides instructions and data to the processor 801. A portion of the memory 804 may also include non-volatile random access memory. For example, the memory 804 may also store device type information.

In a specific implementation, the processor 801, the input device 802, and the output device 803 described in this embodiment of the present invention may execute the implementation manners described in the first embodiment and the second embodiment of the method for processing voice information provided in this embodiment of the present invention, and may also execute the implementation manners of the terminal described in this embodiment of the present invention, which is not described herein again.

In a further embodiment of the invention, a computer-readable storage medium is provided, which stores a computer program which, when executed by a processor, carries out any of the steps of the method embodiments described above.

The computer readable storage medium may be an internal storage unit of the terminal according to any of the foregoing embodiments, for example, a hard disk or a memory of the terminal. The computer readable storage medium may also be an external storage device of the terminal, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the terminal. Further, the computer-readable storage medium may also include both an internal storage unit and an external storage device of the terminal. The computer-readable storage medium is used for storing the computer program and other programs and data required by the terminal. The computer readable storage medium may also be used to temporarily store data that has been output or is to be output.

Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and that the components and steps of the examples have been described above generally in terms of their functions in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the terminal and the unit described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the several embodiments provided in the present application, it should be understood that the disclosed terminal and method can be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electric, mechanical or other form of connection.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the part of the technical solution of the present invention that essentially contributes over the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other various media capable of storing program code.

While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method for processing speech information, the method comprising:

2. The method of claim 1, wherein the filtering out the environmental noise information in the voice information to obtain the target voice information comprises:

screening out environmental noise information in the voice information to obtain the call voice information;

and amplifying the call voice information to obtain target voice information.

3. The method according to claim 2, wherein the amplifying the call voice information to obtain the target voice information comprises:

and amplifying the call voice information of which the voice decibel does not exceed the second decibel threshold value to obtain the target voice information.

4. The method of claim 1, wherein before sending the target voice message to the second terminal, further comprising:

performing voice recognition on the target voice information to obtain target character information corresponding to the target voice information;

the sending the target voice information to a second terminal includes:

and sending the target voice information and the target character information to a second terminal so as to synchronously display the target character information when the target voice information is played by the second terminal.

5. The method according to any one of claims 1-4, further comprising:

and destroying the voice information or the target voice information with the recording time exceeding the preset time threshold.

6. A server, comprising:

7. The server according to claim 6,

the processing unit is specifically configured to screen out environmental noise information in the voice information to obtain the call voice information;

the processing unit is further specifically configured to amplify the call voice information to obtain target voice information.

8. The server according to claim 6, further comprising a recognition unit, wherein before the sending unit sends the target voice information to the second terminal,

the recognition unit is used for carrying out voice recognition on the target voice information to obtain target character information corresponding to the target voice information;

the sending unit is specifically configured to send the target voice information and the target text information to a second terminal, so that the target text information is synchronously displayed when the target voice information is played by the second terminal.

9. A server comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the method of any one of claims 1 to 5 when executing the computer program.

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 5.