CN105895105B - Voice processing method and device - Google Patents

Voice processing method and device

Info

Publication number
CN105895105B
Authority
CN
China
Prior art keywords
age
model
age range
voice
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610394300.1A
Other languages
Chinese (zh)
Other versions
CN105895105A (en)
Inventor
黄宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Unisound Intelligent Technology Co Ltd
Original Assignee
Beijing Yunzhisheng Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yunzhisheng Information Technology Co Ltd filed Critical Beijing Yunzhisheng Information Technology Co Ltd
Priority to CN201610394300.1A priority Critical patent/CN105895105B/en
Publication of CN105895105A publication Critical patent/CN105895105A/en
Application granted granted Critical
Publication of CN105895105B publication Critical patent/CN105895105B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification
    • G10L17/06: Decision making techniques; Pattern matching strategies
    • G10L17/14: Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
    • G10L17/04: Training, enrolment or model building
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention relates to a voice processing method and a voice processing device. The method comprises the following steps: receiving voice information input by a user; performing voiceprint recognition on the voice information and determining the age of the user according to the recognition result; determining a target age range to which the age of the user belongs; determining a target speech processing model corresponding to the target age range; and processing the voice information with the target speech processing model. In this technical scheme, the user's age is determined from the input voice information, and the corresponding target speech processing model is then selected to process that information. By setting up different speech processing models for different age groups and processing the voice information of each age group accordingly, the processing effect is better, the accuracy of speech processing is improved, and the user experience is enhanced.

Description

Voice processing method and device
Technical Field
The present invention relates to the field of speech processing technologies, and in particular, to a speech processing method and apparatus.
Background
Speech recognition is a cross-disciplinary field. Over the last two decades, speech recognition technology has advanced significantly and has begun to move from the laboratory to the market. It is expected that within the next 10 years voice recognition technology will enter fields such as industry, home appliances, communications, automotive electronics, medical care, home services, and consumer electronics. The application of speech recognition dictation machines in certain fields was rated by the U.S. press as one of the ten major computer developments of 1997. Many experts consider speech recognition one of the ten most important technological developments in the information technology field between 2000 and 2010. The fields involved in speech recognition technology include signal processing, pattern recognition, probability and information theory, sound production and hearing mechanisms, artificial intelligence, and the like.
Disclosure of Invention
Embodiments of the invention provide a speech processing method and a speech processing apparatus, which improve the success rate and accuracy of semantic analysis while ensuring the accuracy of speech processing, thereby improving the user experience.
According to a first aspect of the embodiments of the present invention, there is provided a speech processing method, including:
receiving voice information input by a user;
performing voiceprint recognition on the voice information, and determining the age of the user according to a recognition result;
judging a target age range to which the age of the user belongs;
determining a target speech processing model corresponding to the target age range;
and processing the voice information by using the target voice processing model.
In this embodiment, the age of the user is determined from the voice information input by the user, the corresponding target speech processing model is then determined according to that age, and the target speech processing model is used to process the voice information. In this way, different speech processing models are set for different age groups and the voice information of each age group is processed in a targeted manner, so that the processing effect is better, the accuracy of speech processing is improved, and the user experience is enhanced.
In one embodiment, determining a target speech processing model corresponding to the target age range includes:
determining the target speech processing model according to a preset correspondence between age ranges and speech processing models.
In one embodiment, the age ranges include a first age range, a second age range, and a third age range, wherein the age in the first age range is greater than the age in the second age range, and the age in the second age range is greater than the age in the third age range, and the speech processing model corresponding to the first age range is a first speech processing model, the speech processing model corresponding to the second age range is a second speech processing model, and the speech processing model corresponding to the third age range is a third speech processing model.
In one embodiment, the first speech processing model includes a first speech model and a first semantic model, the second speech processing model includes a second speech model and a second semantic model, and the third speech processing model includes a third speech model.
In one embodiment, the age range is positively correlated with the degree of match of the corresponding speech processing model.
In this embodiment, different speech processing models may be used to process the voice information of users of different ages. A speech processing model includes a speech model and a semantic model, and the speech model may in turn include an acoustic model and a language model. Specifically, the older the age range, the higher the matching degree of its speech processing model may be, thereby ensuring the accuracy of the processing result.
For example, the speech processing model for adults requires a high degree of exact matching, so the speech model and the semantic model may both adopt a high matching degree.
The speech processing model for children requires a degree of fuzzy matching: for example, the acoustic model and the language model adopt a higher matching degree, while the semantic model adopts a medium matching degree.
An infant may correspond to an acoustic model only, recognizing sound but not text. Since a baby cannot yet speak and only produces sounds, an acoustic model with a low matching degree is adopted, and neither language nor semantics is recognized.
According to a second aspect of the embodiments of the present invention, there is provided a speech processing apparatus including:
the receiving module is used for receiving voice information input by a user;
the first determining module is used for carrying out voiceprint recognition on the voice information and determining the age of the user according to a recognition result;
the judging module is used for judging a target age range to which the age of the user belongs;
the second determining module is used for determining a target voice processing model corresponding to the target age range;
and the processing module is used for processing the voice information by using the target voice processing model.
In one embodiment, the second determination module is to:
determining a target speech processing model corresponding to the target age range according to a preset correspondence between age ranges and speech processing models.
In one embodiment, the age ranges include a first age range, a second age range, and a third age range, wherein the age in the first age range is greater than the age in the second age range, and the age in the second age range is greater than the age in the third age range, and the speech processing model corresponding to the first age range is a first speech processing model, the speech processing model corresponding to the second age range is a second speech processing model, and the speech processing model corresponding to the third age range is a third speech processing model.
In one embodiment, the first speech processing model includes a first speech model and a first semantic model, the second speech processing model includes a second speech model and a second semantic model, and the third speech processing model includes a third speech model.
In one embodiment, the age range is positively correlated with the degree of match of the corresponding speech processing model.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a flow diagram illustrating a method of speech processing according to an example embodiment.
Fig. 2 is a flowchart illustrating step S104 in a voice processing method according to an exemplary embodiment.
FIG. 3 is a block diagram illustrating a speech processing apparatus according to an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
FIG. 1 is a flowchart illustrating a speech processing method according to an exemplary embodiment. The method is applied to a terminal device, which may be any device with a speech processing function, such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet, a medical device, fitness equipment, or a personal digital assistant. As shown in fig. 1, the method comprises steps S101-S105:
in step S101, receiving voice information input by a user;
in step S102, performing voiceprint recognition on the voice information, and determining the age of the user according to a recognition result;
voiceprint (Voiceprint) is a spectrum of sound waves carrying verbal information displayed by an electro-acoustic apparatus. The generation of human language is a complex physiological and physical process between the human language center and the pronunciation organs, and the vocal print maps of any two people are different because the vocal organs used by a person in speaking, namely the tongue, the teeth, the larynx, the lung and the nasal cavity, are different greatly in size and shape. The speech acoustic characteristics of each person are both relatively stable and variable, not absolute, but invariant. The variation can come from physiology, pathology, psychology, simulation, camouflage and is also related to environmental interference. However, since the vocal organs of each person are different, it is possible to distinguish the voices of different persons or determine whether the voices are the same person in general.
By performing voiceprint recognition on the voice information, specific characteristics of the user, such as the age, the gender and the like of the user, can be recognized.
In step S103, a target age range to which the age of the user belongs is determined;
in one embodiment, the age ranges include a first age range, a second age range, and a third age range, wherein the age in the first age range is greater than the age in the second age range, and the age in the second age range is greater than the age in the third age range, and the speech processing model corresponding to the first age range is a first speech processing model, the speech processing model corresponding to the second age range is a second speech processing model, and the speech processing model corresponding to the third age range is a third speech processing model.
The first age range may be, for example, adults older than 11 years, the second age range children aged 3-10 years, and the third age range infants aged 1-3 years. In this way, different speech processing models are set for different age groups and the voice information of each age group is processed in a targeted manner, so that the processing effect is better.
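A minimal sketch of this age-bracket decision, assuming the example boundaries above; the function name and the choice to count ages falling between the stated brackets as children are illustrative assumptions, since the disclosure leaves those boundaries open:

```python
def classify_age_range(age: int) -> str:
    """Map a recognized age to its target age range.

    Boundaries follow the example in the text (adults above 11,
    children 3-10, infants 1-3); ages between the stated brackets
    are assigned to the child range here as an assumption.
    """
    if age > 11:
        return "adult"   # first age range
    elif age >= 3:
        return "child"   # second age range
    else:
        return "infant"  # third age range

print(classify_age_range(30))  # adult
print(classify_age_range(7))   # child
print(classify_age_range(2))   # infant
```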
In step S104, a target speech processing model corresponding to the target age range is determined;
in step S105, the speech information is processed using the target speech processing model.
In this embodiment, the age of the user is determined from the voice information input by the user, the corresponding target speech processing model is then determined according to that age, and the target speech processing model is used to process the voice information. In this way, different speech processing models are set for different age groups and the voice information of each age group is processed in a targeted manner, so that the processing effect is better, the accuracy of speech processing is improved, and the user experience is enhanced.
Fig. 2 is a flowchart illustrating step S104 in a voice processing method according to an exemplary embodiment.
As shown in fig. 2, in one embodiment, the step S104 includes the step S201:
In step S201, the target speech processing model corresponding to the target age range is determined according to a preset correspondence between age ranges and speech processing models.
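Step S201 amounts to a table lookup against the preset correspondence. A hedged sketch, in which the dictionary keys and model names are placeholders invented for illustration, not identifiers from the patent:

```python
# Preset correspondence between age ranges and speech processing models
# (the string values stand in for actual model objects).
PRESET_MODELS = {
    "adult":  "first_speech_processing_model",
    "child":  "second_speech_processing_model",
    "infant": "third_speech_processing_model",
}

def select_target_model(target_age_range: str) -> str:
    """Determine the target model from the preset correspondence (S201)."""
    try:
        return PRESET_MODELS[target_age_range]
    except KeyError:
        raise ValueError(f"no preset model for age range {target_age_range!r}")

print(select_target_model("child"))  # second_speech_processing_model
```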
In one embodiment, the first speech processing model includes a first speech model and a first semantic model, the second speech processing model includes a second speech model and a second semantic model, and the third speech processing model includes a third speech model.
In one embodiment, the age range is positively correlated with the degree of match of the corresponding speech processing model.
In this embodiment, different speech processing models may be used to process the voice information of users of different ages. A speech processing model includes a speech model and a semantic model, and the speech model may in turn include an acoustic model and a language model. Specifically, the older the age range, the higher the matching degree of its speech processing model may be, thereby ensuring the accuracy of the processing result.
For example, the speech processing model for adults requires a high degree of exact matching, so the speech model and the semantic model may both adopt a high matching degree.
The speech processing model for children requires a degree of fuzzy matching: for example, the acoustic model and the language model adopt a higher matching degree, while the semantic model adopts a medium matching degree.
An infant may correspond to an acoustic model only, recognizing sound but not text. Since a baby cannot yet speak and only produces sounds, an acoustic model with a low matching degree is adopted, and neither language nor semantics is recognized.
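The per-age-group composition and matching degrees described above can be written down as a configuration table. This is a sketch under the assumption that matching degree can be represented as a simple label; the class and field names are invented for illustration:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SpeechProcessingModel:
    """Per-age-group model composition. The 'high'/'medium'/'low'
    labels are illustrative matching degrees; None means that kind
    of recognition is not performed for the age group."""
    acoustic_match: str
    language_match: Optional[str]
    semantic_match: Optional[str]

PRESET = {
    # Adult: exact matching; speech and semantic models both high.
    "adult":  SpeechProcessingModel("high", "high", "high"),
    # Child: fuzzy matching; acoustic/language higher, semantic medium.
    "child":  SpeechProcessingModel("high", "high", "medium"),
    # Infant: acoustic model only, low matching degree; sound is
    # recognized but neither language nor semantics.
    "infant": SpeechProcessingModel("low", None, None),
}
```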
The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention.
Fig. 3 is a block diagram illustrating a speech processing apparatus according to an exemplary embodiment, which may be implemented as part or all of a terminal device through software, hardware, or a combination of both. As shown in fig. 3, the speech processing apparatus includes:
a receiving module 31, configured to receive voice information input by a user;
a first determining module 32, configured to perform voiceprint recognition on the voice information, and determine an age of the user according to a recognition result;
a judging module 33, configured to judge a target age range to which the age of the user belongs;
a second determining module 34, configured to determine a target speech processing model corresponding to the target age range;
and the processing module 35 is configured to process the voice information by using the target voice processing model.
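Wiring the five modules together, one plausible object-level sketch follows; the recognizer and model interfaces are assumptions made for illustration, not identifiers from the patent:

```python
class SpeechProcessingApparatus:
    """Sketch of the apparatus of Fig. 3. `recognizer` is assumed to
    expose estimate_age(voice), and each model in `models` is assumed
    to expose process(voice); both interfaces are illustrative."""

    def __init__(self, recognizer, models):
        self.recognizer = recognizer  # backs the first determining module (32)
        self.models = models          # preset age-range -> model map (module 34)

    def process(self, voice_information):
        # Receiving module (31): voice_information is the user's input.
        age = self.recognizer.estimate_age(voice_information)  # module 32
        if age > 11:                                           # judging module 33
            target_range = "adult"
        elif age >= 3:
            target_range = "child"
        else:
            target_range = "infant"
        model = self.models[target_range]                      # module 34
        return model.process(voice_information)                # module 35
```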
In this embodiment, the age of the user is determined from the voice information input by the user, the corresponding target speech processing model is then determined according to that age, and the target speech processing model is used to process the voice information. In this way, different speech processing models are set for different age groups and the voice information of each age group is processed in a targeted manner, so that the processing effect is better, the accuracy of speech processing is improved, and the user experience is enhanced.
In one embodiment, the second determination module is to:
determining a target speech processing model corresponding to the target age range according to a preset correspondence between age ranges and speech processing models.
In one embodiment, the age ranges include a first age range, a second age range, and a third age range, wherein the age in the first age range is greater than the age in the second age range, and the age in the second age range is greater than the age in the third age range, and the speech processing model corresponding to the first age range is a first speech processing model, the speech processing model corresponding to the second age range is a second speech processing model, and the speech processing model corresponding to the third age range is a third speech processing model.
The first age range may be, for example, adults older than 11 years, the second age range children aged 3-10 years, and the third age range infants aged 1-3 years. In this way, different speech processing models are set for different age groups and the voice information of each age group is processed in a targeted manner, so that the processing effect is better.
In one embodiment, the first speech processing model includes a first speech model and a first semantic model, the second speech processing model includes a second speech model and a second semantic model, and the third speech processing model includes a third speech model.
In one embodiment, the age range is positively correlated with the degree of match of the corresponding speech processing model.
In this embodiment, different speech processing models may be used to process the voice information of users of different ages. A speech processing model includes a speech model and a semantic model, and the speech model may in turn include an acoustic model and a language model. Specifically, the older the age range, the higher the matching degree of its speech processing model may be, thereby ensuring the accuracy of the processing result.
For example, the speech processing model for adults requires a high degree of exact matching, so the speech model and the semantic model may both adopt a high matching degree.
The speech processing model for children requires a degree of fuzzy matching: for example, the acoustic model and the language model adopt a higher matching degree, while the semantic model adopts a medium matching degree.
An infant may correspond to an acoustic model only, recognizing sound but not text. Since a baby cannot yet speak and only produces sounds, an acoustic model with a low matching degree is adopted, and neither language nor semantics is recognized.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (4)

1. A method of speech processing, comprising:
receiving voice information input by a user;
performing voiceprint recognition on the voice information, and determining the age of the user according to a recognition result;
judging a target age range to which the age of the user belongs;
determining a target speech processing model corresponding to the target age range;
processing the voice information by using the target voice processing model;
the age range comprises a first age range, a second age range and a third age range, wherein the age in the first age range is larger than the age in the second age range, the age in the second age range is larger than the age in the third age range, the voice processing model corresponding to the first age range is a first voice processing model, the voice processing model corresponding to the second age range is a second voice processing model, and the voice processing model corresponding to the third age range is a third voice processing model;
the first voice processing model comprises a first voice model and a first semantic model, the second voice processing model comprises a second voice model and a second semantic model, and the third voice processing model comprises a third voice model;
the age range is positively correlated with the degree of match of the corresponding speech processing model.
2. The method of claim 1, wherein determining the target speech processing model corresponding to the target age range comprises:
and determining a target voice processing model corresponding to the target age range according to the corresponding relation between the preset age range and the preset voice processing model.
3. A speech processing apparatus, comprising:
the receiving module is used for receiving voice information input by a user;
the first determining module is used for carrying out voiceprint recognition on the voice information and determining the age of the user according to a recognition result;
the judging module is used for judging a target age range to which the age of the user belongs;
the second determining module is used for determining a target voice processing model corresponding to the target age range;
the processing module is used for processing the voice information by using the target voice processing model;
the age range comprises a first age range, a second age range and a third age range, wherein the age in the first age range is larger than the age in the second age range, the age in the second age range is larger than the age in the third age range, the voice processing model corresponding to the first age range is a first voice processing model, the voice processing model corresponding to the second age range is a second voice processing model, and the voice processing model corresponding to the third age range is a third voice processing model;
the first voice processing model comprises a first voice model and a first semantic model, the second voice processing model comprises a second voice model and a second semantic model, and the third voice processing model comprises a third voice model;
the age range is positively correlated with the degree of match of the corresponding speech processing model.
4. The apparatus of claim 3, wherein the second determining module is configured to:
and determining a target voice processing model corresponding to the target age range according to the corresponding relation between the preset age range and the preset voice processing model.
CN201610394300.1A 2016-06-06 2016-06-06 Voice processing method and device Active CN105895105B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610394300.1A CN105895105B (en) 2016-06-06 2016-06-06 Voice processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610394300.1A CN105895105B (en) 2016-06-06 2016-06-06 Voice processing method and device

Publications (2)

Publication Number Publication Date
CN105895105A (en) 2016-08-24
CN105895105B (en) 2020-05-05

Family

ID=56710682

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610394300.1A Active CN105895105B (en) 2016-06-06 2016-06-06 Voice processing method and device

Country Status (1)

Country Link
CN (1) CN105895105B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107193972A (en) * 2017-05-25 2017-09-22 山东浪潮云服务信息科技有限公司 A kind of sorted users method and device based on big data
TWI638352B (en) * 2017-06-02 2018-10-11 元鼎音訊股份有限公司 Electronic device capable of adjusting output sound and method of adjusting output sound
CN107170456A (en) * 2017-06-28 2017-09-15 北京云知声信息技术有限公司 Method of speech processing and device
CN108281138B (en) * 2017-12-18 2020-03-31 百度在线网络技术(北京)有限公司 Age discrimination model training and intelligent voice interaction method, equipment and storage medium
CN108364526A (en) * 2018-02-28 2018-08-03 上海乐愚智能科技有限公司 A kind of music teaching method, apparatus, robot and storage medium
CN109171644A (en) * 2018-06-22 2019-01-11 平安科技(深圳)有限公司 Health control method, device, computer equipment and storage medium based on voice recognition
CN109859764A (en) * 2019-01-04 2019-06-07 四川虹美智能科技有限公司 A kind of sound control method and intelligent appliance
CN110265040B (en) * 2019-06-20 2022-05-17 Oppo广东移动通信有限公司 Voiceprint model training method and device, storage medium and electronic equipment
CN110798318B (en) * 2019-09-18 2022-06-24 深圳云知声信息技术有限公司 Equipment management method and device
CN110808052A (en) * 2019-11-12 2020-02-18 深圳市瑞讯云技术有限公司 Voice recognition method and device and electronic equipment
CN110853642B (en) * 2019-11-14 2022-03-25 广东美的制冷设备有限公司 Voice control method and device, household appliance and storage medium
CN112908312B (en) * 2021-01-30 2022-06-24 云知声智能科技股份有限公司 Method and device for improving wake-up performance
CN113539274A (en) * 2021-06-15 2021-10-22 复旦大学附属肿瘤医院 Voice processing method and device
CN113707154B (en) * 2021-09-03 2023-11-10 上海瑾盛通信科技有限公司 Model training method, device, electronic equipment and readable storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101390155A (en) * 2006-02-21 2009-03-18 索尼电脑娱乐公司 Voice recognition with speaker adaptation and registration with pitch
CN103024530A (en) * 2012-12-18 2013-04-03 天津三星电子有限公司 Intelligent television voice response system and method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20100003672A (en) * 2008-07-01 2010-01-11 (주)디유넷 Speech recognition apparatus and method using visual information
CN101944359B (en) * 2010-07-23 2012-04-25 杭州网豆数字技术有限公司 Voice recognition method oriented to specific user groups
CN103236259B (en) * 2013-03-22 2016-06-29 乐金电子研发中心(上海)有限公司 Voice recognition processing and feedback system, voice replying method
CN105306815A (en) * 2015-09-30 2016-02-03 努比亚技术有限公司 Shooting mode switching device, method and mobile terminal
CN105489221B (en) * 2015-12-02 2019-06-14 北京云知声信息技术有限公司 Voice recognition method and device

Also Published As

Publication number Publication date
CN105895105A (en) 2016-08-24

Similar Documents

Publication Publication Date Title
CN105895105B (en) Voice processing method and device
Hou et al. Audio-visual speech enhancement using multimodal deep convolutional neural networks
CN106782536B (en) Voice wake-up method and device
CN105654952B (en) Electronic device, server and method for outputting voice
US11475897B2 (en) Method and apparatus for response using voice matching user category
CN106575500B (en) Method and apparatus for synthesizing speech based on facial structure
CN103943104B (en) Voice information recognition method and terminal device
CN107170456A (en) Voice processing method and device
Tran et al. Improvement to a NAM-captured whisper-to-speech system
CN104700843A (en) Method and device for identifying ages
CN106128467A (en) Voice processing method and device
CN104538043A (en) Real-time emotion reminder for calls
WO2020253128A1 (en) Voice recognition-based communication service method, apparatus, computer device, and storage medium
CN102404278A (en) Song request system based on voiceprint recognition and application method thereof
CN110148399A (en) Control method, apparatus, device and medium for a smart device
CN111312222A (en) Wake-up and voice recognition model training method and device
CN109377979B (en) Method and system for updating welcome language
WO2017108142A1 (en) Linguistic model selection for adaptive automatic speech recognition
CN112735371A (en) Method and device for generating speaker video based on text information
US20180197535A1 (en) Systems and Methods for Human Speech Training
CN112580669A (en) Training method and device for voice information
CN106653003A (en) Voice recognition method and device
CN113539274A (en) Voice processing method and device
CN113555027B (en) Voice emotion conversion method and device, computer equipment and storage medium
CN111128127A (en) Voice recognition processing method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: Room 101, 1st floor, building 1, Xisanqi building materials City, Haidian District, Beijing 100096

Patentee after: Yunzhisheng Intelligent Technology Co.,Ltd.

Address before: 100191 Beijing, Huayuan Road, Haidian District No. 2 peony technology building, 5 floor, A503

Patentee before: BEIJING UNISOUND INFORMATION TECHNOLOGY Co.,Ltd.
