CN109960910B - Voice processing method, device, storage medium and terminal equipment - Google Patents


Info

Publication number
CN109960910B
Authority
CN
China
Prior art keywords
voice information
information
voice
verification
stored
Prior art date
Legal status
Active
Application number
CN201711339174.0A
Other languages
Chinese (zh)
Other versions
CN109960910A (en)
Inventor
陈岩
刘耀勇
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201711339174.0A (granted as CN109960910B)
Priority to PCT/CN2018/116587 (published as WO2019114507A1)
Publication of CN109960910A
Application granted
Publication of CN109960910B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31User authentication
    • G06F21/32User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints

Abstract

According to the voice processing method, voice processing device, storage medium and terminal device provided by the embodiments of the present application, voice information to be verified is acquired and verified; if the verification fails, secondary verification is started, where the secondary verification comprises a verification mode other than voice verification; secondary verification information input by a user is received, and whether the user passes the secondary verification is judged according to the secondary verification information; and if the user passes the secondary verification, the voice information to be verified is determined as pre-stored voice information and stored in a recognition library. In the embodiments of the present application, when verification of the user's voice information fails, the user's identity is checked in another way to decide whether the voice information that failed verification should be stored in the recognition library, which can improve the accuracy of voice verification the next time the user uses it.

Description

Voice processing method, device, storage medium and terminal equipment
Technical Field
The embodiment of the application relates to the technical field of computer voice processing, in particular to a voice processing method, a voice processing device, a storage medium and terminal equipment.
Background
With the development of terminal devices, terminal devices have become indispensable, portable tools in daily life, and they also hold a great deal of private or important information about their users. Users therefore generally set up verification on the terminal device to prevent others from stealing their information through the device. One verification mode verifies the user's identity based on voice recognition: the terminal device acquires the voice spoken by the user and judges whether it is the user's own voice. However, the user's voice may change under the influence of the user's own condition or external factors. For example, if the user catches a cold, the cold-affected voice may sound deeper and hoarser than the normal voice, and the terminal device may fail to recognize it, which results in low accuracy of voice recognition.
Disclosure of Invention
The embodiment of the application provides a voice processing method, a voice processing device, a storage medium and a terminal device, which can improve the accuracy of voice verification.
In a first aspect, an embodiment of the present application provides a speech processing method, including:
acquiring voice information to be verified, and verifying the voice information to be verified;
if the verification fails, starting secondary verification; wherein, the secondary verification comprises verification modes except for voice verification;
receiving secondary verification information input by a user, and judging whether the user passes the secondary verification according to the secondary verification information;
and if the user passes the secondary verification, determining the voice information to be verified as pre-stored voice information and storing the pre-stored voice information into an identification library.
In a second aspect, an embodiment of the present application provides a speech processing apparatus, including:
the voice verification module is used for acquiring voice information to be verified and verifying the voice information to be verified;
the secondary starting module is used for starting secondary verification when the verification fails; wherein, the secondary verification comprises verification modes except for voice verification;
the secondary verification module is used for receiving secondary verification information input by a user and judging whether the user passes the secondary verification according to the secondary verification information;
and the voice storage module is used for determining the voice information to be verified as prestored voice information and storing the prestored voice information into an identification library when the user passes the secondary verification.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements a speech processing method according to an embodiment of the present application.
In a fourth aspect, an embodiment of the present application provides a terminal device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the computer program to implement the voice processing method according to the embodiment of the present application.
According to the voice processing scheme provided by the embodiments of the present application, voice information to be verified is acquired and verified; if the verification fails, secondary verification is started, where the secondary verification comprises a verification mode other than voice verification; secondary verification information input by a user is received, and whether the user passes the secondary verification is judged according to the secondary verification information; and if the user passes the secondary verification, the voice information to be verified is determined as pre-stored voice information and stored in a recognition library. With this technical solution, when verification of the user's voice information fails, the user's identity is judged in another way to decide whether the voice information that failed verification should be stored in the recognition library, so the accuracy of voice verification can be improved the next time the user uses voice verification.
Drawings
Fig. 1 is a schematic flowchart of a speech processing method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of another speech processing method according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of another speech processing method according to an embodiment of the present application;
FIG. 4 is a schematic flow chart illustrating another speech processing method according to an embodiment of the present application;
FIG. 5 is a flowchart illustrating another speech processing method according to an embodiment of the present application;
FIG. 6 is a flowchart illustrating another speech processing method according to an embodiment of the present application;
fig. 7 is a block diagram of a speech processing apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a terminal device according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of another terminal device provided in the embodiment of the present application.
Detailed Description
The technical solutions of the present application are further described below through specific embodiments with reference to the accompanying drawings. It is to be understood that the specific embodiments described herein merely illustrate the application and do not limit it. It should further be noted that, for convenience of description, the drawings show only the structures related to the present application rather than all structures.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
Terminal devices may include smart phones, tablets, and other devices having an operating system. Unlocking the terminal device may use a voice verification mode, and opening an application program on the terminal device may also use a voice verification mode; however, if the user's voice changes because of a cold, the voice verification cannot be passed. In that case, the user's identity can be verified in another way, the cold-affected voice can then be kept as a verification reference voice, and the accuracy of subsequent voice verification can be improved.
Fig. 1 is a flowchart of a speech processing method provided in an embodiment of the present application, where the speech processing method may be executed by a speech processing apparatus, where the speech processing apparatus may be implemented by software and/or hardware, and may be generally integrated in a terminal device, or may be integrated in other devices installed with an operating system. As shown in fig. 1, the method includes:
s110, obtaining the voice information to be verified, and verifying the voice information to be verified.
The voice information to be verified may be the voice information output by a user of the terminal device during voice verification and acquired by the terminal device. Voice verification may be a verification mode in which the terminal device judges, from the voice information to be verified output by the user, whether the user is the administrator himself or herself. The voice information to be verified may be voice information output by the user to unlock the terminal device, or voice information output by the user to unlock an application program on the terminal device. Optionally, the application program may be WeChat; the user speaks preset characters or numbers, and whether the user passes verification is judged according to the spoken voice.
After the user outputs the voice information to be verified, the terminal device can analyze and verify it. Optionally, the voice information to be verified may be verified by comparing it with preset voice information in the recognition library. The preset voice information may be standard voice information recorded when the administrator of the terminal device first set up the corresponding voice verification; by comparing the voice information to be verified with the standard voice information, it can be determined whether the two match, and further whether the user is the administrator of the terminal device himself or herself.
S111, if the verification fails, starting secondary verification; wherein, the secondary verification comprises verification modes except voice verification.
Optionally, the secondary verification may include at least one of fingerprint verification, password verification, pattern verification, face recognition verification, and iris recognition verification. If the voice information to be verified fails to be verified, whether the user is the administrator can be verified by starting secondary verification. The authentication mode on the terminal device includes multiple modes, and the secondary authentication mode may be determined according to user setting or system preset, which is not limited herein in the embodiments of the present application.
And S112, receiving secondary verification information input by a user, and judging whether the user passes the secondary verification according to the secondary verification information.
And the secondary verification information is verification information input by the user according to the started secondary verification. Exemplarily, if the secondary authentication is fingerprint authentication, the secondary authentication information is fingerprint information of the user; and if the secondary verification is password verification, the secondary verification information is password information input by the user. And judging whether the user passes the secondary verification according to the secondary verification information, so that whether the user is the administrator can be determined.
And S113, if the user passes the secondary verification, determining the voice information to be verified as pre-stored voice information, and storing the pre-stored voice information in an identification library.
If the user passes the secondary verification, the user can be determined to be the administrator himself, and the corresponding voice information to be verified is therefore also the administrator's own voice. Since the user's voice may have changed because of some internal or external factor, the voice information to be verified can be determined as pre-stored voice information and stored in the recognition library. If the user's voice changes again next time because of the same internal or external factor, the voice information to be verified that the user outputs can then pass verification.
For example, if the user's voice changes because of a cold, the cold-affected voice information to be verified can be determined as pre-stored voice information and stored in the recognition library. The next time the user catches a cold, because the cold-affected voice information is already stored in the recognition library, the user's cold-affected voice information to be verified can also pass verification. Therefore, determining the voice information to be verified as pre-stored voice information and storing it in the recognition library can improve the accuracy of subsequent voice verification.
Optionally, if relatives or friends of the administrator need to share the terminal device or an application program on it with the administrator, the technical solution of the embodiments of the present application can also be used to determine the voice information to be verified of those relatives or friends as pre-stored voice information and store it in the recognition library. Illustratively, the voice information to be verified of another person (such as a relative or friend) is acquired and verified; since it differs from the user's standard voice information, the verification fails and the secondary verification is started accordingly. At this point the user (the administrator himself) can input the secondary verification information, and after the secondary verification passes, the other person's voice information to be verified is stored in the recognition library as pre-stored voice information, so that the other person can subsequently pass the voice verification of the terminal device or of the application program with his or her own voice. The embodiments of the present application can thus improve the usability of voice recognition and extend the function of voice verification.
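To make the flow of operations S110 to S113 concrete, the following is a minimal Python sketch of one possible implementation. It is only an illustration under assumptions: the function and variable names (process_voice, voices_match, recognition_library) and the password-style secondary check are not taken from the patent, and a real matcher would compare acoustic feature parameters rather than strings.

```python
from typing import List

def voices_match(sample: str, reference: str) -> bool:
    # Placeholder matcher; a real system would compare acoustic feature parameters.
    return sample == reference

def process_voice(sample: str,
                  recognition_library: List[str],
                  secondary_input: str,
                  expected_secret: str) -> str:
    # S110: verify the voice information to be verified against the recognition library.
    if any(voices_match(sample, ref) for ref in recognition_library):
        return "voice verification passed"

    # S111/S112: verification failed, so start a secondary verification
    # in a mode other than voice (here sketched as a password check).
    if secondary_input != expected_secret:
        return "verification failed"

    # S113: the user's identity is confirmed another way, so keep the failed
    # sample as pre-stored voice information for future voice verification.
    recognition_library.append(sample)
    return "sample stored as pre-stored voice information"

# Example: a hoarse "cold" voice fails voice verification, is stored once the
# correct password is entered, and then passes by voice alone.
library = ["normal voice"]
print(process_voice("hoarse voice", library, "0000", "1234"))  # verification failed
print(process_voice("hoarse voice", library, "1234", "1234"))  # sample stored
print(process_voice("hoarse voice", library, "0000", "1234"))  # voice verification passed
```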
Fig. 2 is a schematic flow chart of another speech processing method provided in the embodiment of the present application, which is optimized based on the technical solution provided in the foregoing embodiment, and optionally, as shown in fig. 2, the method includes:
and S120, acquiring the voice information to be verified, and verifying the voice information to be verified.
S121, if the verification fails, starting secondary verification; wherein, the secondary verification comprises verification modes except voice verification.
And S122, receiving secondary verification information input by a user, and judging whether the user passes the secondary verification according to the secondary verification information.
And S123, if the user passes the secondary verification, determining the voice information to be verified as pre-stored voice information, and storing the pre-stored voice information in an identification library.
For the above-mentioned specific implementation of the operations, reference may be made to the above-mentioned related description, and further description is omitted here.
S124, acquiring second voice information, and verifying the second voice information according to the pre-stored voice information in the recognition library; wherein, the pre-stored voice information is at least one.
The second voice information may be voice information to be verified that a user outputs during subsequent voice verification. The second voice information is verified according to the pre-stored voice information in the recognition library; the pre-stored voice information may be the voice information to be verified that was determined as pre-stored voice information in the above embodiment, or other pre-stored voice information kept in the recognition library. Illustratively, the other pre-stored voice information may be the standard voice information recorded when the administrator of the terminal device first used or enabled voice verification. There is therefore at least one piece of pre-stored voice information in the recognition library, namely at least the standard voice information recorded by the administrator when first using or enabling voice verification. If there is more than one, the voice information to be verified that was determined as pre-stored voice information in the above embodiment may also be included.
For example, the pre-stored voice information in the recognition library may also consist only of the voice information to be verified that was determined as pre-stored voice information in the above embodiment, while the standard voice information recorded when the administrator first used or enabled voice verification is kept in another storage module. As long as the recognition library includes the voice information to be verified that was determined as pre-stored voice information in the above embodiment, additional comparison material is available for voice verification, and the accuracy of voice verification is improved.
Optionally, the verifying the second voice information according to the voice information in the recognition library may be performed by:
s1240, comparing the second voice message with the pre-stored voice message.
S1241, if the comparison result of any one of the pre-stored voice information and the second voice information meets a preset condition, determining that the second voice information passes verification.
Comparing the second voice information with the pre-stored voice information may mean comparing the feature information of the second voice information with the feature information of the pre-stored voice information, and the preset condition may be that the comparison error value is smaller than a set error value. If the error value between the second voice information and a piece of pre-stored voice information is smaller than the set error value, it is determined that the two match successfully and that the second voice information passes verification.
And if the comparison result of any one of the pre-stored voice information and the second voice information meets the preset condition, the second voice information can be determined to pass the verification. If the number of the pre-stored voice information stored in the recognition library is more than one, the recognition library certainly comprises the voice information to be verified which is determined to be the pre-stored voice information, and can also comprise standard voice information recorded by an administrator at the earliest use or when the voice verification is started. Therefore, as long as the comparison result of any one of the pre-stored voice information and the second voice information meets the preset condition, the user outputting the second voice information can be determined to be the administrator of the terminal device, and the user can pass the voice verification.
By comparing the second voice information with the pre-stored voice information and determining that the second voice information passes verification when the comparison result against any piece of pre-stored voice information meets the preset condition, the second voice information output by the user is compared with several different pieces of comparison material, which can improve the accuracy of voice verification.
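The "compare against every stored entry and pass on any match" logic of S1240 and S1241 can be sketched as follows. This is an assumption-laden illustration: the feature vectors, the mean-absolute-error comparison, and the set error value of 0.5 are placeholders chosen for the example (a later embodiment uses the Euclidean distance instead), not values given in the patent.

```python
import numpy as np

def passes_verification(second_features: np.ndarray,
                        stored_feature_list: list,
                        set_error_value: float = 0.5) -> bool:
    # S1240/S1241: the second voice information passes if its comparison error
    # against ANY piece of pre-stored voice information is below the set error value.
    return any(float(np.abs(second_features - stored).mean()) < set_error_value
               for stored in stored_feature_list)

# Two stored templates: the original enrolment voice and a later "cold" voice.
stored = [np.array([0.9, 0.5, 0.2]), np.array([0.4, 0.6, 0.3])]
print(passes_verification(np.array([0.42, 0.58, 0.31]), stored))  # True
print(passes_verification(np.array([2.0, 2.0, 2.0]), stored))     # False
```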
Fig. 3 is a schematic flow chart of another speech processing method provided in an embodiment of the present application, where based on the technical solution provided in any of the embodiments, an operation of determining, if the user passes the secondary authentication, that the speech information to be authenticated is pre-stored speech information and storing the pre-stored speech information in a recognition library is optimized, and optionally, as shown in fig. 3, the method includes:
s130, obtaining the voice information to be verified, and verifying the voice information to be verified.
S131, if the verification fails, starting secondary verification; wherein, the secondary verification comprises verification modes except voice verification.
S132, receiving secondary verification information input by a user, and judging whether the user passes the secondary verification according to the secondary verification information.
For the above-mentioned specific implementation of the operations, reference may be made to the above-mentioned related description, and further description is omitted here.
S133, determining the voice information to be verified as pre-stored voice information.
S134, preprocessing the pre-stored voice information to obtain characteristic parameters of the pre-stored voice information; wherein the characteristic parameters comprise characteristic parameters which embody sound characteristics.
Optionally, the preprocessing may include framing the pre-stored voice information to obtain at least one voice frame; optionally, the frame length of each voice frame may be any value from 20 ms to 50 ms. Because a voice signal is non-stationary and its content generally changes quickly, framing it yields voice frames that are long enough to process yet short enough that the signal within each frame does not change drastically, which improves the efficiency of the subsequent processing.
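As an illustration only, the framing step could look like the following sketch; the 16 kHz sample rate, the 25 ms frame length (within the 20-50 ms range mentioned above) and the function name are assumptions, not values prescribed by the patent.

```python
import numpy as np

def split_into_frames(signal: np.ndarray, sample_rate: int = 16000,
                      frame_ms: int = 25, hop_ms: int = 25) -> np.ndarray:
    # Split a mono signal into frames of frame_ms milliseconds each.
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop_len)
    return np.stack([signal[i * hop_len: i * hop_len + frame_len]
                     for i in range(n_frames)])

# One second of (silent) dummy audio at 16 kHz -> 40 frames of 400 samples each.
frames = split_into_frames(np.zeros(16000))
print(frames.shape)  # (40, 400)
```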
Feature parameters are extracted separately for each voice frame. Optionally, the feature parameters may include the centroid, the root mean square, and Mel-frequency cepstrum coefficients (MFCC). The centroid can represent the dominant frequency band of the signal in the voice frame, the root mean square can represent the signal strength of the voice frame, and the Mel cepstrum coefficients model how the human ear perceives sounds of different frequencies and can effectively characterize the features of the human voice. The feature parameters may also include the sound energy value, the pitch frequency, the formant, and other parameters that can characterize the sound.
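One way to compute such feature parameters is sketched below using librosa, a commonly used audio analysis library; the patent itself only names the parameters, so the choice of library, the use of 13 MFCCs, and the averaging over frames are assumptions made for the example.

```python
import numpy as np
import librosa  # one possible toolkit; the patent does not prescribe a library

def extract_features(signal: np.ndarray, sample_rate: int = 16000) -> np.ndarray:
    # Spectral centroid: reflects the dominant frequency band of the signal.
    centroid = float(librosa.feature.spectral_centroid(y=signal, sr=sample_rate).mean())
    # Root mean square: reflects the signal strength per frame.
    rms = float(librosa.feature.rms(y=signal).mean())
    # Mel-frequency cepstrum coefficients: model human perception of frequency.
    mfcc = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=13).mean(axis=1)
    return np.concatenate(([centroid, rms], mfcc))

features = extract_features(np.random.randn(16000).astype(np.float32))
print(features.shape)  # (15,) -- centroid + RMS + 13 averaged MFCCs
```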
And S135, storing the characteristic parameters of the pre-stored voice information into a recognition library.
Voice information is generally large: a clip of roughly 10 s typically takes tens of kilobytes (KB) or more, so directly storing the pre-stored voice information in the recognition library could occupy too much storage space. The pre-stored voice information is normally used only for background calculation, and the user does not need to hear it again; therefore only the feature parameters of the pre-stored voice information, i.e. the parameters that reflect the sound characteristics, need to be stored in the recognition library. This is sufficient for the subsequent calculation while occupying little storage space.
Fig. 4 is a schematic flow chart of another speech processing method provided in the embodiment of the present application, where on the basis of the technical solution provided in the embodiment, the operation of verifying the second speech information according to the speech information in the recognition library is optimized, and optionally, as shown in fig. 4, the method includes:
s140, voice information to be verified is obtained, and verification is carried out on the voice information to be verified.
S141, if the verification fails, starting secondary verification; wherein, the secondary verification comprises verification modes except voice verification.
S142, receiving secondary verification information input by a user, and judging whether the user passes the secondary verification according to the secondary verification information.
S143, determining the voice information to be verified as pre-stored voice information.
S144, preprocessing the pre-stored voice information to obtain characteristic parameters of the pre-stored voice information; wherein the characteristic parameters comprise characteristic parameters which embody sound characteristics.
S145, storing the characteristic parameters of the pre-stored voice information to a recognition library.
For the above-mentioned specific implementation of the operations, reference may be made to the above-mentioned related description, and further description is omitted here.
S146, second voice information is obtained, and the second voice information is preprocessed to obtain characteristic parameters of the second voice information.
The feature parameters may include parameters that embody the sound characteristics, such as the centroid, the root mean square, and Mel-frequency cepstrum coefficients (MFCC), and may also include the sound energy value, the pitch frequency, the formant, and other parameters capable of characterizing the sound. Reference may be made to the above description for specific implementations, which are not repeated here.
And S147, calculating the Euclidean distance between the characteristic parameters of the second voice message and the characteristic parameters of the pre-stored voice message.
S148, if the Euclidean distance between the characteristic parameter of any pre-stored voice message and the characteristic parameter of the second voice message is smaller than a preset threshold value, determining that the second voice message passes verification.
The Euclidean distance refers to the real distance between two points in an N-dimensional space, or the natural length of the vector formed by the two points, where N may be any natural number greater than 0. Illustratively, if the feature parameters include the centroid, the root mean square and the Mel-frequency cepstrum coefficient, the Euclidean distance is the natural length of the vector between the point (a1, b1, c1) of the second voice information and the point (a2, b2, c2) of the pre-stored voice information in the three-dimensional space formed by these three parameters, where a1 and a2 are centroids, b1 and b2 are root mean square values, and c1 and c2 are Mel-frequency cepstrum coefficients. If the Euclidean distance between the feature parameters of a piece of pre-stored voice information and the feature parameters of the second voice information is smaller than the preset threshold, the two can be determined to be close and matched, and the second voice information can therefore be determined to pass verification.
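The (a1, b1, c1) versus (a2, b2, c2) comparison above maps directly onto the Euclidean norm, as in the short sketch below; the numeric feature values and the preset threshold of 20.0 are illustrative assumptions, not values from the patent.

```python
import numpy as np

def euclidean_distance(p: np.ndarray, q: np.ndarray) -> float:
    # Natural length of the vector between two points in N-dimensional space.
    return float(np.linalg.norm(p - q))

# (a, b, c) = (centroid, root mean square, Mel-frequency cepstrum coefficient)
second_voice = np.array([1510.0, 0.082, 21.4])   # features of the second voice information
pre_stored   = np.array([1495.0, 0.079, 20.8])   # features of one pre-stored voice information

preset_threshold = 20.0  # illustrative value only
if euclidean_distance(second_voice, pre_stored) < preset_threshold:
    print("second voice information passes verification")
```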
By calculating the Euclidean distance between the characteristic parameter of the second voice message and the characteristic parameter of the pre-stored voice message, if the Euclidean distance between the characteristic parameter of any one of the pre-stored voice messages and the characteristic parameter of the second voice message is smaller than a preset threshold value, the second voice message is determined to pass verification, the characteristic parameter which embodies the sound characteristics in the two voice messages can be compared, and the comparison efficiency can be effectively improved.
Fig. 5 is a schematic flow chart of another speech processing method provided in an embodiment of the present application, which is optimized based on the technical solution provided in any of the above embodiments, and optionally, as shown in fig. 5, the method includes:
s150, obtaining the voice information to be verified, and verifying the voice information to be verified.
S151, if the verification fails, starting secondary verification; wherein, the secondary verification comprises verification modes except voice verification.
S152, receiving secondary verification information input by a user, and judging whether the user passes the secondary verification according to the secondary verification information.
For the above-mentioned specific implementation of the operations, reference may be made to the above-mentioned related description, and further description is omitted here.
S153, if the user passes the secondary verification, obtaining failure category information corresponding to the voice information to be verified.
The failure category information may be selected from several categories pre-stored in the system: the pre-stored categories are presented to the user, and one or more of them are determined as the failure category information according to the user's selection. The failure category information may also be custom content entered by the user. Illustratively, the categories pre-stored by the system may include a cold, a hoarse voice, an inflamed throat, and so on, and the user may also input custom content such as "cold", "hoarse voice" or "relatives and friends" as the failure category information.
S154, determining the voice information to be verified as pre-stored voice information, and storing the pre-stored voice information and the failure category information into an identification library.
If the voice information to be verified input by the user fails verification in operation S150 but the secondary verification information input by the user passes the secondary verification, the voice information to be verified that failed can be determined to come from the administrator himself or from someone authorized by the administrator, and the reason it failed can be determined by obtaining the corresponding failure category information. Storing the failure category information together with the voice information to be verified that is determined as pre-stored voice information in the recognition library makes it possible to provide corresponding service information to the user during subsequent voice verification.
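A recognition-library entry that keeps the failure category information alongside the feature parameters could be modelled as below; the field names, the dataclass representation, and the example values are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class LibraryEntry:
    features: List[float]               # feature parameters of the pre-stored voice information
    failure_category: str = "standard"  # e.g. "cold", "hoarse voice", "relatives and friends"

recognition_library = [
    LibraryEntry(features=[1495.0, 0.079, 20.8]),                           # enrolment voice
    LibraryEntry(features=[1310.0, 0.065, 18.2], failure_category="cold"),  # stored after secondary verification
]
```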
S155, comparing the second voice message with the pre-stored voice message.
And S156, if the comparison result of any one of the pre-stored voice messages and the second voice message meets the preset condition, determining that the second voice message passes the verification.
For the above-mentioned specific implementation of the operations, reference may be made to the above-mentioned related description, and further description is omitted here.
And S157, acquiring failure category information corresponding to target voice information from the recognition library, wherein the target voice information is preset voice information corresponding to a comparison result meeting preset conditions.
If the second voice information passes verification, the failure category information corresponding to the target voice information whose comparison result with the second voice information meets the preset condition is determined, so the state of the user when the second voice information was output can be inferred. Illustratively, if the comparison result between the second voice information input by the user and the target voice information meets the preset condition and the failure category information corresponding to the target voice information is a cold, it can be determined that the user probably has a cold at this time. Corresponding service information can then be provided according to the acquired failure category information; for example, a friendly reminder such as "drink more hot water" can be pushed to the user when the acquired failure category information is a cold. The failure category information can also be transmitted to a health application program on the terminal device, which can record the user's health state according to the failure category information and provide corresponding service information, making the terminal device more intelligent.
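A minimal mapping from the stored failure category information to a pushed reminder, in the spirit of the "cold" example above, might look like this; the category names beyond "cold" and the message texts are assumptions.

```python
SERVICE_INFO = {
    "cold": "Drink more hot water and get some rest.",
    "hoarse voice": "Consider resting your voice today.",
}

def push_service_info(failure_category: str) -> str:
    # Look up the reminder matching the failure category of the target voice information.
    return SERVICE_INFO.get(failure_category, "")

print(push_service_info("cold"))  # Drink more hot water and get some rest.
```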
The operation after the failure category information is acquired may be set according to the requirement of the actual application, and the embodiment of the present application is not limited herein.
Fig. 6 is a schematic flow chart of another speech processing method provided in an embodiment of the present application, which is optimized based on the technical solution provided in any of the above embodiments, and optionally, as shown in fig. 6, the method includes:
and S160, acquiring the voice information to be verified, and verifying the voice information to be verified.
S161, if the verification fails, starting secondary verification; wherein, the secondary verification comprises verification modes except voice verification.
And S162, receiving secondary verification information input by a user, and judging whether the user passes the secondary verification according to the secondary verification information.
And S163, after the user passes the secondary verification, acquiring failure category information corresponding to the voice information to be verified.
For the above-mentioned specific implementation of the operations, reference may be made to the above-mentioned related description, and further description is omitted here.
And S164, sending the voice information to be verified and the failure category information to a background server.
The voice information to be verified and the failure category information can be uploaded to a background server, where development staff can analyze the voice information to be verified according to the failure category information, which assists in the development and management of voice corresponding to that failure category.
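An upload of the failed sample and its failure category to a background server could be sketched as follows; the endpoint URL, the payload fields, and the use of the requests HTTP client are assumptions, since the patent only states that the information is sent to a background server.

```python
import base64
import requests  # a common HTTP client; not prescribed by the patent

def upload_failed_sample(audio_bytes: bytes, failure_category: str) -> None:
    payload = {
        # Hypothetical field names; base64 keeps the raw audio JSON-safe.
        "voice_to_verify": base64.b64encode(audio_bytes).decode("ascii"),
        "failure_category": failure_category,
    }
    # Hypothetical endpoint; the patent only refers to "a background server".
    response = requests.post("https://example.com/api/voice-failures",
                             json=payload, timeout=10)
    response.raise_for_status()
```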
Fig. 7 is a block diagram of a speech processing apparatus according to an embodiment of the present application, where the speech processing apparatus may execute a speech processing method, and as shown in fig. 7, the speech processing apparatus includes:
the voice verification module 210 is configured to acquire voice information to be verified, and verify the voice information to be verified;
a secondary starting module 211, configured to start secondary verification when verification fails; wherein, the secondary verification comprises verification modes except for voice verification;
a secondary verification module 212, configured to receive secondary verification information input by a user, and determine whether the user passes the secondary verification according to the secondary verification information;
and the voice storage module 213 is configured to determine the voice information to be verified as pre-stored voice information when the user passes the secondary verification, and store the pre-stored voice information in the recognition library.
By the technical scheme provided by the embodiment of the application, when the voice information verification of the user fails, whether the voice information which fails in verification is stored in the recognition library or not can be determined by judging the identity of the user, and the accuracy of the voice verification can be improved when the subsequent user uses the voice verification again.
Optionally, the method further comprises:
the second voice verification module is used for obtaining second voice information after the voice information to be verified is determined to be pre-stored voice information and is stored in the recognition library, and verifying the second voice information according to the pre-stored voice information in the recognition library; wherein, the pre-stored voice information is at least one.
Optionally, the secondary verification module comprises:
the comparison unit is used for comparing the second voice information with the pre-stored voice information;
and the verification unit is used for determining that the second voice information passes verification when the comparison result of any one of the pre-stored voice information and the second voice information meets the preset condition.
Optionally, the voice storage module comprises:
the determining unit is used for determining the voice information to be verified as pre-stored voice information;
the preprocessing unit is used for preprocessing the pre-stored voice information to obtain the characteristic parameters of the pre-stored voice information; wherein the characteristic parameters comprise characteristic parameters which embody sound characteristics;
and the storage unit is used for storing the characteristic parameters of the pre-stored voice information to an identification library.
Optionally, the alignment unit comprises:
the preprocessing subunit is configured to preprocess the second voice information to obtain a feature parameter of the second voice information;
the calculation subunit is used for calculating the Euclidean distance between the characteristic parameter of the second voice message and the characteristic parameter of the pre-stored voice message;
correspondingly, the verification unit is specifically configured to:
and when the Euclidean distance between the characteristic parameter of any pre-stored voice message and the characteristic parameter of the second voice message is smaller than a preset threshold value, determining that the second voice message passes verification.
Optionally, the method further comprises:
the category information acquisition module is used for acquiring failure category information corresponding to the voice information to be verified after the user passes the secondary verification;
correspondingly, the voice storage module is specifically configured to:
determining the voice information to be verified as pre-stored voice information, and storing the pre-stored voice information and the failure category information into an identification library;
correspondingly, the method further comprises the following steps:
and the category information determining module is used for acquiring failure category information corresponding to the target voice information from the recognition library after the second voice information is determined to pass the verification, wherein the target voice information is preset voice information corresponding to the comparison result meeting the preset condition.
Optionally, the method further comprises:
the category information acquisition module is used for acquiring failure category information corresponding to the voice information to be verified after the user passes the secondary verification;
and the background sending module is used for sending the voice information to be verified and the failure category information to a background server.
The storage medium containing the computer-executable instructions provided by the embodiments of the present application is not limited to the voice processing operation described above, and may also perform related operations in the voice processing method provided by any embodiments of the present application.
Embodiments of the present application also provide a storage medium containing computer-executable instructions, which when executed by a computer processor, perform a method of speech processing, the method comprising:
acquiring voice information to be verified, and verifying the voice information to be verified;
if the verification fails, starting secondary verification; wherein, the secondary verification comprises verification modes except for voice verification;
receiving secondary verification information input by a user, and judging whether the user passes the secondary verification according to the secondary verification information;
and if the user passes the secondary verification, determining the voice information to be verified as pre-stored voice information and storing the pre-stored voice information into an identification library.
Storage medium - any of various types of memory devices or storage devices. The term "storage medium" is intended to include: installation media such as CD-ROM, floppy disk, or tape devices; computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; non-volatile memory such as flash memory, magnetic media (e.g., a hard disk) or optical storage; registers or other similar types of memory elements, etc. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in the first computer system in which the program is executed, or in a different, second computer system connected to the first computer system through a network (such as the Internet). The second computer system may provide program instructions to the first computer for execution. The term "storage medium" may include two or more storage media that reside in different locations, for example in different computer systems connected by a network. The storage medium may store program instructions (e.g., embodied as a computer program) that are executable by one or more processors.
The embodiment of the application provides a terminal device, and the voice processing device provided by the embodiment of the application can be integrated in the terminal device.
Fig. 8 is a schematic structural diagram of a terminal device according to an embodiment of the present application, where the terminal device according to the embodiment of the present application includes a memory 31, a processor 32, and a computer program stored in the memory 31 and executable by the processor, and the processor implements the voice processing method according to the above embodiment when executing the computer program. The terminal equipment provided by the embodiment of the application can determine whether to store the voice information which fails to be verified into the recognition base by judging the identity of the user when the voice information verification of the user fails, and can improve the accuracy of the voice verification when the subsequent user reuses the voice verification.
Fig. 9 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 9, the terminal device may include: a casing (not shown), a touch screen (not shown), touch keys (not shown), a memory 301, a Central Processing Unit (CPU) 302 (also called a processor, hereinafter referred to as CPU), a circuit board (not shown), and a power circuit (not shown). The circuit board is arranged in a space enclosed by the shell; the CPU302 and the memory 301 are disposed on the circuit board; the power supply circuit is used for supplying power to each circuit or device of the terminal equipment; the memory 301 is used for storing executable program codes; the CPU302 executes a computer program corresponding to the executable program code by reading the executable program code stored in the memory 301 to implement the steps of:
acquiring voice information to be verified, and verifying the voice information to be verified;
if the verification fails, starting secondary verification; wherein, the secondary verification comprises verification modes except for voice verification;
receiving secondary verification information input by a user, and judging whether the user passes the secondary verification according to the secondary verification information;
and if the user passes the secondary verification, determining the voice information to be verified as pre-stored voice information and storing the pre-stored voice information into an identification library.
The terminal device further includes: peripheral interface 303, RF (Radio Frequency) circuitry 305, audio circuitry 306, speakers 311, power management chip 308, input/output (I/O) subsystems 309, touch screen 312, other input/control devices 310, and external ports 304, which communicate via one or more communication buses or signal lines 307.
It should be understood that the illustrated terminal device 300 is only one example of a terminal device, and that the terminal device 300 may have more or fewer components than shown in the figures, may combine two or more components, or may have a different configuration of components. The various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
The following describes in detail the terminal device for implementing voice processing provided in this embodiment, where the terminal device is a mobile phone as an example.
A memory 301. The memory 301 can be accessed by the CPU 302, the peripheral interface 303, and the like. The memory 301 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
A peripheral interface 303, said peripheral interface 303 being capable of connecting input and output peripherals of the device to the CPU302 and the memory 301.
I/O subsystem 309, the I/O subsystem 309 may connect input and output peripherals on the device, such as touch screen 312 and other input/control devices 310, to the peripheral interface 303. The I/O subsystem 309 may include a display controller 3091 and one or more input controllers 3092 for controlling other input/control devices 310. Where one or more input controllers 3092 receive electrical signals from or send electrical signals to other input/control devices 310, the other input/control devices 310 may include physical buttons (push buttons, rocker buttons, etc.), dials, slide switches, joysticks, click wheels. It is noted that the input controller 3092 may be connected to any of the following: a keyboard, an infrared port, a USB interface, and a pointing device such as a mouse.
A touch screen 312, which touch screen 312 is an input interface and an output interface between the user terminal device and the user, displays visual output to the user, which may include graphics, text, icons, video, and the like.
The display controller 3091 in the I/O subsystem 309 receives electrical signals from the touch screen 312 or transmits electrical signals to the touch screen 312. The touch screen 312 detects a contact on the touch screen, and the display controller 3091 converts the detected contact into an interaction with a user interface object displayed on the touch screen 312, i.e., implements a human-machine interaction, and the user interface object displayed on the touch screen 312 may be an icon for running a game, an icon networked to a corresponding network, or the like. It is worth mentioning that the device may also comprise a light mouse, which is a touch sensitive surface that does not show visual output, or an extension of the touch sensitive surface formed by the touch screen.
The RF circuit 305 is mainly used to establish communication between the mobile phone and the wireless network (i.e., the network side), and implement data reception and transmission between the mobile phone and the wireless network. Such as sending and receiving short messages, e-mails, etc. In particular, the RF circuitry 305 receives and transmits RF signals, also referred to as electromagnetic signals, through which the RF circuitry 305 converts electrical signals to or from electromagnetic signals and communicates with communication networks and other devices. RF circuitry 305 may include known circuitry for performing these functions including, but not limited to, an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a CODEC (CODEC) chipset, a Subscriber Identity Module (SIM), and so forth.
The audio circuit 306 is mainly used to receive audio data from the peripheral interface 303, convert the audio data into an electric signal, and transmit the electric signal to the speaker 311.
And a speaker 311 for converting the voice signal received by the handset from the wireless network through the RF circuit 305 into sound and playing the sound to the user.
And the power management chip 308 is used for supplying power and managing power to the hardware connected with the CPU302, the I/O subsystem, and the peripheral interface.
The terminal equipment provided by the embodiment of the application can improve the accuracy of voice verification.
The voice processing device, the storage medium and the terminal device provided in the above embodiments may execute the voice processing method provided in any embodiment of the present application, and have corresponding functional modules and beneficial effects for executing the method. For technical details that are not described in detail in the above embodiments, reference may be made to a speech processing method provided in any embodiment of the present application.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present application and the technical principles employed. It will be understood by those skilled in the art that the present application is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the application. Therefore, although the present application has been described in more detail with reference to the above embodiments, the present application is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present application, and the scope of the present application is determined by the scope of the appended claims.

Claims (9)

1. A method of speech processing, comprising:
acquiring voice information to be verified, and verifying the voice information to be verified;
if the verification fails, starting secondary verification; wherein, the secondary verification comprises verification modes except for voice verification;
receiving secondary verification information input by a user, and judging whether the user passes the secondary verification according to the secondary verification information;
if the user passes the secondary verification, determining the voice information to be verified as pre-stored voice information, and storing the pre-stored voice information in an identification library;
acquiring second voice information, and verifying the second voice information according to the pre-stored voice information in the recognition library; wherein, the pre-stored voice information is at least one;
and if the second voice information passes the verification, determining failure category information corresponding to target voice information of which the comparison result accords with preset conditions with the second voice information, and providing corresponding service information for the user according to the acquired failure category information.
2. The method of claim 1, wherein verifying the second speech information based on the speech information in the recognition repository comprises:
comparing the second voice information with the pre-stored voice information;
and if the comparison result of any one of the pre-stored voice information and the second voice information meets a preset condition, determining that the second voice information passes verification.
3. The method of claim 2, wherein the determining the voice information to be verified as pre-stored voice information and storing the pre-stored voice information to a recognition library comprises:
determining the voice information to be verified as pre-stored voice information;
preprocessing the pre-stored voice information to obtain characteristic parameters of the pre-stored voice information; wherein the characteristic parameters comprise characteristic parameters which embody sound characteristics;
and storing the characteristic parameters of the pre-stored voice information into an identification library.
4. The method of claim 3, wherein comparing the second voice information with the pre-stored voice information comprises:
preprocessing the second voice information to obtain a characteristic parameter of the second voice information;
calculating the Euclidean distance between the characteristic parameter of the second voice information and the characteristic parameter of each piece of pre-stored voice information;
correspondingly, if the comparison result between any one piece of the pre-stored voice information and the second voice information meets the preset condition, determining that the second voice information passes the verification comprises:
and if the Euclidean distance between the characteristic parameter of any piece of pre-stored voice information and the characteristic parameter of the second voice information is smaller than a preset threshold, determining that the second voice information passes the verification.
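The matching rule of claim 4 reduces to a nearest-neighbour test against a fixed threshold. A minimal sketch, assuming the feature vectors all have the same length and that the threshold value has been tuned beforehand (both the function name and the default threshold are invented for illustration):

```python
import numpy as np

def passes_verification(second_features: np.ndarray,
                        stored_feature_list: list[np.ndarray],
                        threshold: float = 0.5) -> bool:
    """Accept the second utterance if its feature vector lies within
    `threshold` (Euclidean distance) of ANY stored feature vector."""
    return any(
        np.linalg.norm(second_features - stored) < threshold
        for stored in stored_feature_list
    )

# Tiny worked example with 3-dimensional toy features:
stored = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
probe = np.array([0.9, 0.1, 0.0])           # distance to the first vector is about 0.14
print(passes_verification(probe, stored))   # True with the default threshold of 0.5
```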
5. The method of any one of claims 2 to 4, further comprising, after the user passes the secondary verification:
acquiring failure category information corresponding to the voice information to be verified;
correspondingly, determining the voice information to be verified as pre-stored voice information and storing the pre-stored voice information in the recognition library comprises:
determining the voice information to be verified as pre-stored voice information, and storing the pre-stored voice information and the failure category information in the recognition library.
6. The method of any one of claims 1 to 4, further comprising, after the user passes the secondary verification:
acquiring failure category information corresponding to the voice information to be verified;
and sending the voice information to be verified and the failure category information to a background server.
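In claim 6 the device uploads the failed sample and its failure category to a background server instead of (or in addition to) storing them locally. A hedged sketch using the third-party requests library follows; the endpoint URL, field names and payload format are invented for illustration and are not specified by the patent.

```python
import requests  # third-party HTTP client

def report_failure(audio_bytes: bytes, failure_category: str,
                   url: str = "https://example.com/api/voice-failures") -> bool:
    """Upload the voice sample that failed verification, together with its
    failure category, to a background server (all names here are hypothetical)."""
    response = requests.post(
        url,
        files={"audio": ("sample.wav", audio_bytes, "audio/wav")},
        data={"failure_category": failure_category},
        timeout=10,
    )
    return response.status_code == 200
```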
7. A speech processing apparatus, comprising:
the voice verification module is used for acquiring voice information to be verified and verifying the voice information to be verified;
the secondary starting module is used for starting secondary verification when the verification fails, wherein the secondary verification comprises a verification mode other than voice verification;
the secondary verification module is used for receiving secondary verification information input by a user and judging, according to the secondary verification information, whether the user passes the secondary verification;
the voice storage module is used for determining the voice information to be verified as pre-stored voice information when the user passes the secondary verification, and storing the pre-stored voice information in a recognition library;
the second voice verification module is used for acquiring second voice information after the voice information to be verified has been determined as pre-stored voice information and stored in the recognition library, and verifying the second voice information according to the pre-stored voice information in the recognition library, wherein there is at least one piece of pre-stored voice information;
and the category information determining module is used for determining, after the second voice information passes the verification, failure category information corresponding to the target voice information whose comparison result with the second voice information satisfies a preset condition, and providing corresponding service information to the user according to the acquired failure category information.
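Claim 7 restates the method as a set of cooperating modules. Mapped onto code, each module becomes one responsibility of a single class; the sketch below is an illustrative decomposition only, with every class, method and attribute name assumed rather than taken from the patent.

```python
class VoiceProcessor:
    """Illustrative counterpart of the claim-7 apparatus: one method per module."""

    def __init__(self, primary_verifier, secondary_verifier, comparator, service_lookup):
        self.recognition_library = []                  # backing store for the voice storage module
        self.primary_verifier = primary_verifier       # voice verification module
        self.secondary_verifier = secondary_verifier   # secondary starting/verification modules
        self.comparator = comparator                   # used by the second voice verification module
        self.service_lookup = service_lookup           # category information determining module

    def verify_or_enrol(self, voice_info, failure_category):
        """Primary check; on failure, secondary check and enrolment."""
        if self.primary_verifier(voice_info):
            return True
        if self.secondary_verifier():
            self.recognition_library.append((voice_info, failure_category))
            return True
        return False

    def verify_second(self, second_voice_info):
        """Match a later utterance and return the service chosen from the
        failure category of the matching stored entry, or None."""
        for stored, category in self.recognition_library:
            if self.comparator(stored, second_voice_info):
                return self.service_lookup(category)
        return None
```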
8. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the voice processing method according to any one of claims 1 to 6.
9. A terminal device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the voice processing method according to any one of claims 1 to 6 when executing the computer program.
CN201711339174.0A 2017-12-14 2017-12-14 Voice processing method, device, storage medium and terminal equipment Active CN109960910B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201711339174.0A CN109960910B (en) 2017-12-14 2017-12-14 Voice processing method, device, storage medium and terminal equipment
PCT/CN2018/116587 WO2019114507A1 (en) 2017-12-14 2018-11-21 Voice processing method and apparatus, storage medium, and terminal device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711339174.0A CN109960910B (en) 2017-12-14 2017-12-14 Voice processing method, device, storage medium and terminal equipment

Publications (2)

Publication Number Publication Date
CN109960910A CN109960910A (en) 2019-07-02
CN109960910B (en) 2021-06-08

Family

ID=66818905

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711339174.0A Active CN109960910B (en) 2017-12-14 2017-12-14 Voice processing method, device, storage medium and terminal equipment

Country Status (2)

Country Link
CN (1) CN109960910B (en)
WO (1) WO2019114507A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112351047B (en) * 2021-01-07 2021-08-24 北京远鉴信息技术有限公司 Double-engine based voiceprint identity authentication method, device, equipment and storage medium
CN113254901B (en) * 2021-06-11 2021-10-01 长扬科技(北京)有限公司 Data security access method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102306493A (en) * 2011-08-18 2012-01-04 鸿富锦精密工业(深圳)有限公司 Terminating machine, voice identification system and voice identification method thereof
CN104538034A (en) * 2014-12-31 2015-04-22 深圳雷柏科技股份有限公司 Voice recognition method and system
CN105376196A (en) * 2014-08-19 2016-03-02 深圳市科瑞电子有限公司 System log-in method, device and terminal based on the biological features of a human body
CN107068167A (en) * 2017-03-13 2017-08-18 广东顺德中山大学卡内基梅隆大学国际联合研究院 Merge speaker's cold symptoms recognition methods of a variety of end-to-end neural network structures

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004305330A (en) * 2003-04-03 2004-11-04 Nec Corp Mobile communication terminal device and audio information generating system
US8467320B2 (en) * 2005-10-06 2013-06-18 Telecommunication Systems, Inc. Voice over internet protocol (VoIP) multi-user conferencing
CN104915582B (en) * 2015-05-28 2017-12-22 努比亚技术有限公司 unlocking method and device
CN105512535A (en) * 2016-01-08 2016-04-20 广东德生科技股份有限公司 User authentication method and user authentication device
CN106446631A (en) * 2016-09-20 2017-02-22 深圳市金立通信设备有限公司 Method and terminal for checking screen lock notification information
CN106453859B (en) * 2016-09-23 2019-11-15 维沃移动通信有限公司 A kind of sound control method and mobile terminal
CN106503513A (en) * 2016-09-23 2017-03-15 北京小米移动软件有限公司 Method for recognizing sound-groove and device
CN106653033A (en) * 2016-10-28 2017-05-10 努比亚技术有限公司 Voice unlocking device, terminal and method
CN106961418A (en) * 2017-02-08 2017-07-18 北京捷通华声科技股份有限公司 Identity identifying method and identity authorization system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102306493A (en) * 2011-08-18 2012-01-04 鸿富锦精密工业(深圳)有限公司 Terminating machine, voice identification system and voice identification method thereof
CN105376196A (en) * 2014-08-19 2016-03-02 深圳市科瑞电子有限公司 System log-in method, device and terminal based on the biological features of a human body
CN104538034A (en) * 2014-12-31 2015-04-22 深圳雷柏科技股份有限公司 Voice recognition method and system
CN107068167A (en) * 2017-03-13 2017-08-18 广东顺德中山大学卡内基梅隆大学国际联合研究院 Merge speaker's cold symptoms recognition methods of a variety of end-to-end neural network structures

Also Published As

Publication number Publication date
CN109960910A (en) 2019-07-02
WO2019114507A1 (en) 2019-06-20

Similar Documents

Publication Publication Date Title
RU2763392C1 (en) Voice control method, wearable device and terminal
JP6096333B2 (en) Method, apparatus and system for verifying payment
CN106297802B (en) Method and apparatus for executing voice command in electronic device
CN108766438B (en) Man-machine interaction method and device, storage medium and intelligent terminal
JP6738867B2 (en) Speaker authentication method and voice recognition system
KR20190089422A (en) An electronic device and method for authenricating a user by using an audio signal
WO2020088483A1 (en) Audio control method and electronic device
CN109085975A (en) Screenshotss method, apparatus, storage medium and electronic device
CN109254661B (en) Image display method, image display device, storage medium and electronic equipment
CN109960910B (en) Voice processing method, device, storage medium and terminal equipment
CN107371144B (en) Method and device for intelligently sending information
KR101995443B1 (en) Method for verifying speaker and system for recognizing speech
CN114331448A (en) Biological information verification method and device
CN110278273B (en) Multimedia file uploading method, device, terminal, server and storage medium
US20190130084A1 (en) Authentication method, electronic device, and computer-readable program medium
US11721346B2 (en) Authentication device
US11244676B2 (en) Apparatus for processing user voice input
WO2023124248A1 (en) Voiceprint recognition method and apparatus
WO2022017152A1 (en) Resource transfer method and apparatus, computer device, and storage medium
JP2012190273A (en) Information processing device, information processing method and program
CN114222302A (en) Calling method and device for abnormal call, electronic equipment and storage medium
KR102098237B1 (en) Method for verifying speaker and system for recognizing speech
CN109660988B (en) Communication authentication processing method and device and electronic equipment
TW201944320A (en) Payment authentication method, device, equipment and storage medium
CN115174161B (en) Account login method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: No. 18 Haibin Road, Wusha, Chang'an Town, Dongguan, Guangdong 523860

Applicant after: GUANGDONG OPPO MOBILE TELECOMMUNICATIONS Corp.,Ltd.

Address before: No. 18 Haibin Road, Wusha, Chang'an Town, Dongguan, Guangdong 523860

Applicant before: GUANGDONG OPPO MOBILE TELECOMMUNICATIONS Corp.,Ltd.

GR01 Patent grant