CN105957524B - Voice processing method and device - Google Patents


Info

Publication number: CN105957524B
Application number: CN201610264283.XA
Authority: CN (China)
Legal status: Active
Prior art keywords: volume value, index parameter, energy index, content, item
Original language: Chinese (zh)
Other versions: CN105957524A
Inventors: 李霄寒, 田伟
Original assignee: Beijing Yunzhisheng Information Technology Co Ltd
Current assignee: Unisound Intelligent Technology Co Ltd
Application filed by Beijing Yunzhisheng Information Technology Co Ltd; priority to CN201610264283.XA
Publication of application CN105957524A; application granted; publication of CN105957524B

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/08 — Speech classification or search
    • G10L15/18 — Speech classification or search using natural language modelling
    • G10L15/1815 — Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 — Execution procedure of a spoken command

Abstract

The invention relates to a voice processing method and device. The method comprises: receiving voice information input by a user; recognizing the voice information to obtain first text corresponding to a pre-operation entry and second text corresponding to entry content; determining a first volume value corresponding to the first voice information and a second volume value corresponding to the second voice information; determining a first energy index parameter corresponding to the first volume value and a second energy index parameter corresponding to the second volume value; when the second energy index parameter is greater than both the first energy index parameter and a preset energy index parameter, searching the candidate entry content database for target entry content matching the second text; and filling the target entry content into the corresponding entry content table. Through this technical scheme, the success rate and accuracy of semantic analysis can be improved while the accuracy of voice processing is preserved, thereby improving the user experience.

Description

Voice processing method and device
Technical Field
The present invention relates to the field of speech recognition technologies, and in particular, to a speech processing method and apparatus.
Background
In speech processing, semantic understanding depends on the quality of speech recognition: if recognition is poor, semantic analysis suffers. For example, with the entry template of FIG. 1, the user wants a selection to be made automatically by dictating "executing department radiology department"; the selection control is shown in FIG. 2. From the perspective of the semantic analyzer, "executing department" is a pre-operation entry and "radiology department" is entry content; the analyzer can distinguish the two through a dictionary template and then convert them into an execution command. However, if even one word in the sentence is recognized poorly, semantic analysis fails. For example, the text may be recognized as "department radiology", with one character dropped from "executing department". The whole recognition process then fails, the corresponding operation cannot be performed from the user's voice input, and the user experience suffers.
Disclosure of Invention
The embodiment of the invention provides a voice processing method and a voice processing device, which are used for improving the success rate and the accuracy rate of semantic analysis on the basis of ensuring the accuracy rate of voice processing, so that the use experience of a user is improved.
According to a first aspect of the embodiments of the present invention, there is provided a speech processing method, including:
receiving voice information input by a user, wherein the voice information comprises first voice information corresponding to a pre-operation item and second voice information corresponding to item content;
recognizing the voice information to obtain a first character corresponding to the pre-operation item and a second character corresponding to the item content;
determining a first volume value corresponding to the first voice message and a second volume value corresponding to the second voice message;
determining a first energy index parameter corresponding to the first volume value and a second energy index parameter corresponding to the second volume value;
when the second energy index parameter is larger than the first energy index parameter and the second energy index parameter is larger than a preset energy index parameter, searching target item content matched with the second character in an item content database to be selected;
and filling the target entry content into the corresponding entry content table.
In this embodiment, after the voice information (comprising first voice information corresponding to the pre-operation entry and second voice information corresponding to the entry content) is recognized, the volume value of each piece of voice information is determined, and an energy index parameter is derived from each volume value. When the energy index parameter of the second voice information is greater than both that of the first voice information and the preset energy index parameter, the text corresponding to the entry content is preferentially matched against the candidate entry contents in the candidate entry content database, and the matched target entry content is filled into the corresponding entry content table. A user can therefore fill in a form by voice input, without manual selection, and is allowed to speak different words at different volumes; different energy index parameters are derived from the different volumes, and whether the form-filling operation is executed is decided according to those parameters. This avoids the situation in which a form cannot be filled after a speech-recognition error, improves the success rate and accuracy of semantic analysis while preserving the accuracy of voice processing, and improves both the success rate of filling in forms by voice and the user experience.
In one embodiment, the determining a first energy indicator parameter corresponding to the first volume value and a second energy indicator parameter corresponding to the second volume value includes:
acquiring a corresponding relation between a volume value interval and an energy index parameter, wherein the volume value interval and the energy index parameter are in positive correlation;
determining a first volume value interval to which the first volume value belongs and a second volume value interval to which the second volume value belongs;
and determining a first energy index parameter corresponding to the first volume value interval and a second energy index parameter corresponding to the second volume value interval according to the corresponding relation between the volume value interval and the energy index parameters.
In this embodiment, the correspondence between volume value intervals and energy index parameters may be preset, so that the first energy index parameter corresponding to the first volume value interval and the second energy index parameter corresponding to the second volume value interval are determined from this correspondence. Specifically, the volume value interval and the energy index parameter may be positively correlated: the larger the values in the volume interval, the larger the energy index parameter, and vice versa. The larger the energy index parameter, the more likely the text corresponding to the entry content is to be matched preferentially. Adding the energy index parameter on top of the underlying speech recognition therefore improves the success rate and accuracy of semantic analysis while preserving the accuracy of voice processing, and improves both the success rate of filling in forms by voice and the user experience.
In one embodiment, searching the content of the target item matched with the second text in the content database of the item to be selected includes:
calculating a first similarity between the second characters and each item content to be selected in the item content database to be selected;
and determining the content of the item to be selected with the highest first similarity as the target item content.
In this embodiment, a first similarity is calculated between the second text and each candidate entry content in the candidate entry content database, and the candidate with the highest similarity is taken as the target entry content. This preserves the accuracy of speech recognition, avoids the situation in which semantic analysis is impossible because the recognition result is wrong, and improves the success rate and accuracy of semantic analysis.
In one embodiment, the filling out the target entry content into the corresponding entry content table includes:
determining a target operation item corresponding to the target item content;
calculating a second similarity between the first characters corresponding to the target operation item and the pre-operation item;
and when the second similarity is greater than or equal to the preset similarity, filling the target entry content into an entry content table corresponding to the target operation entry.
In this embodiment, before the target entry content is filled into the corresponding entry content table, the target operation entry corresponding to the target entry content may be determined, and a similarity may then be calculated between that target operation entry and the first text corresponding to the pre-operation entry. If this similarity is greater than the preset similarity, the target operation entry and the pre-operation entry match; that is, the entry the user intended to operate on is the target operation entry. This further ensures the accuracy of the semantic analysis result.
In one embodiment, before determining a first volume value corresponding to the first voice message and a second volume value corresponding to the second voice message, the method further includes:
calculating the confidence of the voice information;
judging whether the confidence coefficient is smaller than a preset confidence coefficient;
and when the confidence coefficient is smaller than a preset confidence coefficient, executing the step of determining a first volume value corresponding to the first voice message and a second volume value corresponding to the second voice message.
In this embodiment, the confidence of the voice information may be calculated first. If the confidence is greater than or equal to the preset confidence, the recognition result is reliable and semantic analysis is likely to succeed, so a conventional semantic analysis scheme from the related art may be used. If the confidence is less than the preset confidence, the recognition result is unreliable and semantic analysis may fail; in that case, the volume-difference scheme of the present invention may be used.
According to a second aspect of the embodiments of the present invention, there is provided a speech processing apparatus including:
the receiving module is used for receiving voice information input by a user, wherein the voice information comprises first voice information corresponding to a pre-operation item and second voice information corresponding to item content;
the recognition module is used for recognizing the voice information to obtain a first character corresponding to the pre-operation entry and a second character corresponding to the entry content;
the first determining module is used for determining a first volume value corresponding to the first voice message and a second volume value corresponding to the second voice message;
a second determining module, configured to determine a first energy index parameter corresponding to the first volume value and a second energy index parameter corresponding to the second volume value;
the searching module is used for searching the target item content matched with the second character in the item content database to be selected when the second energy index parameter is greater than the first energy index parameter and the second energy index parameter is greater than a preset energy index parameter;
and the filling module is used for filling the target entry content into the corresponding entry content table.
In one embodiment, the second determining module comprises:
the obtaining submodule is used for obtaining a corresponding relation between a volume value interval and an energy index parameter, wherein the volume value interval and the energy index parameter are in positive correlation;
the first determining submodule is used for determining a first volume value interval to which the first volume value belongs and a second volume value interval to which the second volume value belongs;
and the second determining submodule is used for determining a first energy index parameter corresponding to the first volume value interval and a second energy index parameter corresponding to the second volume value interval according to the corresponding relation between the volume value interval and the energy index parameter.
In one embodiment, the lookup module comprises:
the first calculation submodule is used for calculating a first similarity between the second characters and each item content to be selected in the item content database to be selected;
and the content determining submodule is used for determining the item content to be selected with the highest first similarity as the target item content.
In one embodiment, the filling module includes:
the item determining submodule is used for determining a target operation item corresponding to the target item content;
the second calculation submodule is used for calculating a second similarity between the first characters corresponding to the target operation item and the pre-operation item;
and the filling sub-module is used for filling the target item content into the item content table corresponding to the target operation item when the second similarity is greater than or equal to the preset similarity.
In one embodiment, the apparatus further comprises:
the calculation module is used for calculating the confidence of the voice information before determining a first volume value corresponding to the first voice information and a second volume value corresponding to the second voice information;
the judging module is used for judging whether the confidence coefficient is smaller than the preset confidence coefficient;
and the triggering module is used for triggering the second determining module to determine a first volume value corresponding to the first voice information and a second volume value corresponding to the second voice information when the confidence coefficient is smaller than a preset confidence coefficient.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a schematic diagram of an entry template in the related art.
Fig. 2 is a diagram illustrating item content options in the related art.
FIG. 3 is a flow diagram illustrating a method of speech processing according to an example embodiment.
Fig. 4 is a flowchart illustrating step S304 in a voice processing method according to an exemplary embodiment.
Fig. 5 is a flowchart illustrating step S305 in a voice processing method according to an exemplary embodiment.
Fig. 6 is a flowchart illustrating step S306 in a voice processing method according to an exemplary embodiment.
FIG. 7 is a flow diagram illustrating another method of speech processing according to an example embodiment.
FIG. 8 is a block diagram illustrating a speech processing apparatus according to an example embodiment.
FIG. 9 is a block diagram illustrating a second determination module in a speech processing apparatus according to an example embodiment.
FIG. 10 is a block diagram illustrating a lookup module in a speech processing device according to an example embodiment.
FIG. 11 is a block diagram illustrating a fill module in a speech processing device according to an example embodiment.
FIG. 12 is a block diagram illustrating another speech processing apparatus according to an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
FIG. 3 is a flow diagram illustrating a method of speech processing according to an example embodiment. The method is applied to a terminal device, which can be any device with a voice control function, such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet, a medical device, fitness equipment or a personal digital assistant. As shown in fig. 3, the method comprises steps S301-S306:
in step S301, receiving voice information input by a user, where the voice information includes first voice information corresponding to a pre-operation item and second voice information corresponding to an item content;
in some tables, there are operation entries and entry contents, for example, when a score sheet is filled, the operation entries include names, characters, scores and the like, and the specific zhang san, maid and 90 cents are corresponding entry contents. For another example, the examination category, the department performing, the execution time, and the like in fig. 1 all belong to the operation entry, and the corresponding plain film, radiology department, 3/8/2011, and the like all belong to the entry content. The user can input the operation item by voice when the user wants to perform voice operation on the operation item, and the operation item is the pre-operation item.
In step S302, voice information is identified to obtain a first text corresponding to the pre-operation entry and a second text corresponding to the entry content;
in step S303, a first volume value corresponding to the first voice message and a second volume value corresponding to the second voice message are determined;
To aid semantic analysis, the user may speak different words at different volumes to mark emphasis. For example, when the user says "executing department radiology department" with "radiology department" noticeably louder, this indicates that "radiology department" is the focus of the semantic analysis.
In step S304, a first energy index parameter corresponding to the first volume value and a second energy index parameter corresponding to the second volume value are determined;
Different volume values correspond to different energy index parameters, so whether the form is filled in is decided according to the energy index parameters.
In step S305, when the second energy index parameter is greater than the first energy index parameter and the second energy index parameter is greater than the preset energy index parameter, searching a target item content matched with the second text in the item content database to be selected;
in step S306, the target entry content is filled in the corresponding entry content table.
In this embodiment, after the voice information (comprising first voice information corresponding to the pre-operation entry and second voice information corresponding to the entry content) is recognized, the volume value of each piece of voice information is determined, and an energy index parameter is derived from each volume value. When the energy index parameter of the second voice information is greater than both that of the first voice information and the preset energy index parameter, the text corresponding to the entry content is preferentially matched against the candidate entry contents in the candidate entry content database, and the matched target entry content is filled into the corresponding entry content table. A user can therefore fill in a form by voice input, without manual selection, and is allowed to speak different words at different volumes; different energy index parameters are derived from the different volumes, and whether the form-filling operation is executed is decided according to those parameters. This avoids the situation in which a form cannot be filled after a speech-recognition error, improves the success rate and accuracy of semantic analysis while preserving the accuracy of voice processing, and improves both the success rate of filling in forms by voice and the user experience.
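As a minimal illustration of the decision in step S305, the gating condition can be sketched as follows (Python; the function name and the default preset value of 2 are assumptions for illustration, not prescribed by the patent):

```python
def should_fill_form(first_energy_index: int,
                     second_energy_index: int,
                     preset_energy_index: int = 2) -> bool:
    """Step S305 gate: fill the entry content table only when the
    entry-content speech has a strictly larger energy index than the
    pre-operation speech AND exceeds the preset energy index parameter.
    The default preset value of 2 is an assumption for illustration."""
    return (second_energy_index > first_energy_index
            and second_energy_index > preset_energy_index)
```

Note that both comparisons are strict: speech at the same energy level as the pre-operation entry, or merely equal to the preset parameter, does not trigger the form-filling operation.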
In one embodiment, as shown in FIG. 4, the step S304 includes steps S401-S403:
in step S401, a corresponding relationship between a volume value interval and an energy index parameter is obtained, wherein the volume value interval and the energy index parameter are in positive correlation;
the corresponding relationship between the volume value interval and the energy index parameter can be preset, so that a first energy index parameter corresponding to the first volume value interval and a second energy index parameter corresponding to the second volume value interval are determined according to the corresponding relationship between the volume value interval and the energy index parameter. Specifically, the volume value interval and the energy index parameter may be in positive correlation, that is, the larger the value of the volume value interval is, the larger the energy index parameter is, and the smaller the value of the volume value interval is, the smaller the energy index parameter is.
For example, the volume value may be expressed as a decibel value. To improve the success rate of filling in a form by voice, the higher the decibel value of the speech, the larger the energy index parameter. In this example, the correspondence between volume value intervals and energy index parameters is shown in Table 1.
TABLE 1

    Decibel value    Energy index parameter
    0~20             1
    21~30            2
    31~60            3
    61~80            4
In step S402, a first volume value interval to which the first volume value belongs and a second volume value interval to which the second volume value belongs are determined;
in step S403, a first energy index parameter corresponding to the first volume value interval and a second energy index parameter corresponding to the second volume value interval are determined according to the correspondence between the volume value intervals and the energy index parameters.
In this embodiment, the larger the energy index parameter, the more likely the text corresponding to the entry content is to be matched preferentially. Adding the energy index parameter on top of the existing speech recognition technology therefore improves the success rate and accuracy of semantic analysis while preserving the accuracy of voice processing, and improves both the success rate of filling in forms by voice and the user experience.
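Using the intervals of Table 1, the lookup of steps S401-S403 can be sketched as follows (Python; the handling of volumes above the last interval is an assumption, since Table 1 does not cover them):

```python
# Interval table from Table 1: (low dB, high dB, energy index parameter)
DB_INTERVALS = [(0, 20, 1), (21, 30, 2), (31, 60, 3), (61, 80, 4)]

def energy_index(volume_db: float) -> int:
    """Return the energy index parameter for a volume value in decibels.
    Larger volume intervals map to larger parameters (positive correlation)."""
    for low, high, index in DB_INTERVALS:
        if low <= volume_db <= high:
            return index
    # Assumption: volumes beyond the last interval take the largest index.
    return DB_INTERVALS[-1][2]
```

Applying this to both recognized segments yields the first and second energy index parameters compared in step S305.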
In one embodiment, as shown in fig. 5, the step S305 may include steps S501-S502:
in step S501, a first similarity between the second text and each item content to be selected in the item content database to be selected is calculated;
The candidate entry content database contains many candidate entry contents. For a medical database, for example, these may include executing departments such as radiology, imaging, general medicine, gastroenterology and endocrinology, as well as doctors such as Zhang San, Li Si and Wang Wu. A database of another class, such as student scores, may include subjects such as politics, history and geography.
In step S502, the item content to be selected with the highest first similarity is determined as the target item content.
In this embodiment, a first similarity is calculated between the second text and each candidate entry content in the candidate entry content database, and the candidate with the highest similarity is determined as the target entry content. This preserves the accuracy of speech recognition, avoids the situation in which semantic analysis is impossible because the recognition result is wrong, and improves the success rate and accuracy of semantic analysis.
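The matching of steps S501-S502 can be sketched with a generic string-similarity measure; the patent does not fix one, so `difflib.SequenceMatcher` is used here purely as an illustration:

```python
import difflib

def best_matching_content(second_text: str, candidates: list[str]) -> str:
    """Steps S501-S502: compute a similarity between the recognized second
    text and every candidate entry content, and return the candidate with
    the highest similarity."""
    def similarity(candidate: str) -> float:
        return difflib.SequenceMatcher(None, second_text, candidate).ratio()
    return max(candidates, key=similarity)
```

Even a mis-recognized string such as "radiolog department" still resolves to "radiology department", which is exactly the recognition-error failure mode described in the Background that this step is meant to absorb.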
In one embodiment, as shown in FIG. 6, the step S306 includes steps S601-S603:
in step S601, a target operation item corresponding to the content of the target item is determined;
In the candidate entry content database, each candidate entry content should be stored together with the operation entry to which it belongs, so that the target operation entry can be determined from the target entry content.
In step S602, calculating a second similarity between the first characters corresponding to the target operation item and the pre-operation item;
in step S603, when the second similarity is greater than or equal to the preset similarity, filling the target entry content in the entry content table corresponding to the target operation entry.
In this embodiment, before the target entry content is filled into the corresponding entry content table, the target operation entry corresponding to the target entry content may be determined, and a similarity may then be calculated between that target operation entry and the first text corresponding to the pre-operation entry. If this similarity is greater than the preset similarity, the target operation entry and the pre-operation entry match; that is, the entry the user intended to operate on is the target operation entry. This further ensures the accuracy of the semantic analysis result.
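The check of steps S601-S603 can be sketched as follows; the similarity measure and the preset threshold of 0.6 are assumptions for illustration, as the patent leaves both open:

```python
import difflib

PRESET_SIMILARITY = 0.6  # assumed threshold; the patent does not fix a value

def fill_if_entry_matches(table: dict, target_operation_entry: str,
                          first_text: str, target_content: str) -> bool:
    """Steps S601-S603: fill the target entry content only when the target
    operation entry is sufficiently similar to the recognized first text."""
    second_similarity = difflib.SequenceMatcher(
        None, target_operation_entry, first_text).ratio()
    if second_similarity >= PRESET_SIMILARITY:
        table[target_operation_entry] = target_content
        return True
    return False
```

The second similarity acts as a final sanity check: content is only written into the table when the operation entry recovered from the database agrees with what the user actually said.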
In one embodiment, as shown in fig. 7, before determining the first volume value corresponding to the first voice message and the second volume value corresponding to the second voice message, the method further includes steps S701-S703:
in step S701, a confidence of the voice information is calculated;
The confidence takes a value in the range 0-1 and is used to evaluate the reliability of the speech recognition result: the higher the confidence, the more accurate the recognition result.
In step S702, it is determined whether the confidence is less than a preset confidence; the preset confidence likewise takes a value in the range 0-1.
In step S703, when the confidence is smaller than the preset confidence, a step of determining a first volume value corresponding to the first voice message and a second volume value corresponding to the second voice message is performed.
In this embodiment, the confidence of the voice information may be calculated first. If the confidence is greater than or equal to the preset confidence, the recognition result is reliable and semantic analysis is likely to succeed, so a conventional semantic analysis scheme from the related art may be used. If the confidence is less than the preset confidence, the recognition result is unreliable and semantic analysis may fail; in that case, the volume-difference scheme of the present invention may be used.
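The confidence gate of steps S701-S703 can be sketched as follows; the threshold of 0.8 is an assumption, since the patent only states that both values lie in the range 0-1:

```python
PRESET_CONFIDENCE = 0.8  # assumed threshold in [0, 1]

def needs_volume_based_analysis(confidence: float) -> bool:
    """Steps S702-S703: route to the volume-based scheme only when the
    recognition confidence falls below the preset confidence."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return confidence < PRESET_CONFIDENCE
```

High-confidence recognitions thus continue through conventional semantic analysis unchanged; the volume-based path is a fallback for unreliable results only.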
The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention.
Fig. 8 is a block diagram illustrating a voice processing apparatus, which may be implemented as part or all of a terminal device by software, hardware, or a combination of both, according to an example embodiment. As shown in fig. 8, the speech processing apparatus includes:
the receiving module 81 is configured to receive voice information input by a user, where the voice information includes first voice information corresponding to a pre-operation entry and second voice information corresponding to an entry content;
the recognition module 82 is configured to recognize the voice information to obtain a first text corresponding to the pre-operation entry and a second text corresponding to the entry content;
a first determining module 83, configured to determine a first volume value corresponding to the first voice information and a second volume value corresponding to the second voice information;
a second determining module 84, configured to determine a first energy index parameter corresponding to the first volume value and a second energy index parameter corresponding to the second volume value;
a searching module 85, configured to search a candidate entry content database for target entry content matching the second text when the second energy index parameter is greater than the first energy index parameter and greater than a preset energy index parameter;
and a filling module 86, configured to fill the target entry content into the corresponding entry content table.
In this embodiment, after the voice information, which includes the first voice information corresponding to the pre-operation entry and the second voice information corresponding to the entry content, is recognized, the volume values of the two pieces of voice information are determined separately, and an energy index parameter is then determined from each volume value. When the energy index parameter of the second voice information is greater than both the energy index parameter of the first voice information and the preset energy index parameter, the text corresponding to the entry content is preferentially matched against the candidate entry contents in the candidate entry content database, and the matched target entry content is filled into the corresponding entry content table. A user can therefore fill in a form through voice input without manual selection, and may use different volumes for different words: different energy index parameters are determined from the different volumes, and whether the form-filling operation is executed is decided according to those parameters. This avoids the problem that the form cannot be filled in after a speech recognition error, improves the success rate and accuracy of semantic analysis while preserving the accuracy of speech processing, and improves the success rate and user experience of filling in forms by voice input.
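The module chain of fig. 8 (volume values, energy indices, search, fill) can be connected end to end as in the sketch below. Recognition and volume measurement are assumed to have happened upstream; the interval boundaries, the similarity measure, and all names are illustrative assumptions rather than the patented implementation.

```python
# Hypothetical end-to-end sketch of the fig. 8 pipeline.
from difflib import SequenceMatcher

def energy_index(volume_db):
    # Assumed volume-interval -> energy-index mapping (positively correlated).
    if volume_db < 50:
        return 1
    if volume_db < 70:
        return 2
    return 3

def process(first_volume, second_volume, second_text, candidates,
            preset_energy_index, form, field):
    """Fill `form[field]` with the best-matching candidate entry content
    when the entry-content utterance is emphatically louder."""
    e1 = energy_index(first_volume)
    e2 = energy_index(second_volume)
    if e2 > e1 and e2 > preset_energy_index:
        target = max(
            candidates,
            key=lambda c: SequenceMatcher(None, second_text, c).ratio())
        form[field] = target
    return form
```

When the second utterance is not louder than the first, the form is left untouched, mirroring the condition in the searching module.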
In one embodiment, as shown in fig. 9, the second determining module 84 includes:
the obtaining submodule 91 is configured to obtain a corresponding relationship between a volume value interval and an energy index parameter, where the volume value interval and the energy index parameter are in positive correlation;
a first determining submodule 92, configured to determine a first volume value interval to which the first volume value belongs and a second volume value interval to which the second volume value belongs;
the second determining submodule 93 is configured to determine, according to a correspondence between a volume value interval and an energy index parameter, a first energy index parameter corresponding to the first volume value interval and a second energy index parameter corresponding to the second volume value interval.
In this embodiment, the larger the energy index parameter, the higher the probability that the text corresponding to the entry content is preferentially matched. By adding the energy index parameter on top of the existing speech recognition technology, the success rate and accuracy of semantic analysis can be improved while preserving the accuracy of speech processing, and the success rate and user experience of filling in forms by voice input are also improved.
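The correspondence acquired by submodule 91 can be represented as an explicit lookup table, as in this sketch. The interval boundaries and index values are invented for illustration; the patent only requires that higher volume intervals map to higher energy index parameters.

```python
# Hypothetical volume-value-interval -> energy-index correspondence
# (positively correlated), as used by submodules 91-93.
VOLUME_TO_ENERGY_INDEX = [
    ((0.0, 40.0), 1),
    ((40.0, 60.0), 2),
    ((60.0, 80.0), 3),
    ((80.0, float("inf")), 4),
]

def lookup_energy_index(volume_db):
    """Find the interval the volume value belongs to and return the
    corresponding energy index parameter."""
    for (lo, hi), index in VOLUME_TO_ENERGY_INDEX:
        if lo <= volume_db < hi:
            return index
    raise ValueError("volume value out of range: %r" % volume_db)
```

Applying this lookup to both volume values yields the first and second energy index parameters compared by the searching module.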
As shown in fig. 10, in one embodiment, the lookup module 85 includes:
a first calculation submodule 101, configured to calculate a first similarity between the second text and each candidate entry content in the candidate entry content database;
a content determining submodule 102, configured to determine the candidate entry content with the highest first similarity as the target entry content.
In this embodiment, the first similarity between the second text and each candidate entry content in the candidate entry content database is calculated, and the candidate with the highest similarity is determined as the target entry content. This preserves the accuracy of speech recognition, avoids the problem that semantic analysis cannot proceed because the recognition result is wrong, and improves the success rate and accuracy of semantic analysis.
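The highest-similarity selection performed by submodules 101 and 102 can be sketched as follows; `difflib` is a stand-in similarity measure, since the patent does not prescribe one.

```python
# Hypothetical sketch of submodules 101-102: score every candidate entry
# content against the recognized second text and keep the best match.
from difflib import SequenceMatcher

def best_matching_entry(second_text, candidates):
    """Return the candidate entry content with the highest first
    similarity to the second text."""
    return max(
        candidates,
        key=lambda c: SequenceMatcher(None, second_text, c).ratio())
```

Because the best candidate is chosen rather than the raw recognition output, a slightly misrecognized word can still resolve to the intended entry content.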
As shown in fig. 11, in one embodiment, the filling module 86 includes:
an entry determining submodule 111, configured to determine a target operation entry corresponding to the target entry content;
a second calculation submodule 112, configured to calculate a second similarity between the target operation entry and the first text corresponding to the pre-operation entry;
a filling submodule 113, configured to fill the target entry content into the entry content table corresponding to the target operation entry when the second similarity is greater than or equal to a preset similarity.
In this embodiment, before the target entry content is filled into the corresponding entry content table, the target operation entry corresponding to the target entry content may first be determined, and the similarity between the target operation entry and the first text corresponding to the pre-operation entry may be calculated. If this similarity is greater than or equal to the preset similarity, the target operation entry matches the pre-operation entry, that is, the user's pre-operation entry is the target operation entry, which further ensures the accuracy of the semantic analysis result.
As shown in fig. 12, in one embodiment, the apparatus further comprises:
a calculating module 121, configured to calculate a confidence of the voice information before a first volume value corresponding to the first voice information and a second volume value corresponding to the second voice information are determined;
a judging module 122, configured to judge whether the confidence is less than a preset confidence;
a triggering module 123, configured to trigger the first determining module to determine the first volume value corresponding to the first voice information and the second volume value corresponding to the second voice information when the confidence is less than the preset confidence.
In this embodiment, the confidence of the voice information may be calculated first. If the confidence is greater than or equal to the preset confidence, the recognition result is reliable and semantic analysis is likely to succeed, so a conventional semantic analysis scheme from the related art may be used. If the confidence is less than the preset confidence, the recognition result is unreliable and semantic analysis may fail; in that case, the volume-difference-based semantic analysis scheme of the present invention may be used.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (6)

1. A speech processing method, comprising:
receiving voice information input by a user, the voice information comprising first voice information corresponding to a pre-operation entry and second voice information corresponding to entry content;
recognizing the voice information to obtain a first text corresponding to the pre-operation entry and a second text corresponding to the entry content;
determining a first volume value corresponding to the first voice information and a second volume value corresponding to the second voice information;
determining a first energy index parameter corresponding to the first volume value and a second energy index parameter corresponding to the second volume value;
when the second energy index parameter is greater than the first energy index parameter and greater than a preset energy index parameter, searching a candidate entry content database for target entry content matching the second text; and
filling the target entry content into a corresponding entry content table;
wherein, before the first volume value corresponding to the first voice information and the second volume value corresponding to the second voice information are determined, the method further comprises:
calculating a confidence of the voice information;
judging whether the confidence is less than a preset confidence; and
when the confidence is less than the preset confidence, performing the step of determining the first volume value corresponding to the first voice information and the second volume value corresponding to the second voice information;
wherein determining the first energy index parameter corresponding to the first volume value and the second energy index parameter corresponding to the second volume value comprises:
acquiring a correspondence between volume value intervals and energy index parameters, wherein the volume value intervals and the energy index parameters are positively correlated;
determining a first volume value interval to which the first volume value belongs and a second volume value interval to which the second volume value belongs; and
determining, according to the correspondence, the first energy index parameter corresponding to the first volume value interval and the second energy index parameter corresponding to the second volume value interval.
2. The method of claim 1, wherein searching the candidate entry content database for the target entry content matching the second text comprises:
calculating a first similarity between the second text and each candidate entry content in the candidate entry content database; and
determining the candidate entry content with the highest first similarity as the target entry content.
3. The method of claim 1, wherein filling the target entry content into the corresponding entry content table comprises:
determining a target operation entry corresponding to the target entry content;
calculating a second similarity between the target operation entry and the first text corresponding to the pre-operation entry; and
when the second similarity is greater than or equal to a preset similarity, filling the target entry content into an entry content table corresponding to the target operation entry.
4. A speech processing apparatus, comprising:
a receiving module, configured to receive voice information input by a user, the voice information comprising first voice information corresponding to a pre-operation entry and second voice information corresponding to entry content;
a recognition module, configured to recognize the voice information to obtain a first text corresponding to the pre-operation entry and a second text corresponding to the entry content;
a first determining module, configured to determine a first volume value corresponding to the first voice information and a second volume value corresponding to the second voice information;
a second determining module, configured to determine a first energy index parameter corresponding to the first volume value and a second energy index parameter corresponding to the second volume value;
a searching module, configured to search a candidate entry content database for target entry content matching the second text when the second energy index parameter is greater than the first energy index parameter and greater than a preset energy index parameter; and
a filling module, configured to fill the target entry content into a corresponding entry content table;
wherein the apparatus further comprises:
a calculating module, configured to calculate a confidence of the voice information before the first volume value corresponding to the first voice information and the second volume value corresponding to the second voice information are determined;
a judging module, configured to judge whether the confidence is less than a preset confidence; and
a triggering module, configured to trigger the first determining module to determine the first volume value corresponding to the first voice information and the second volume value corresponding to the second voice information when the confidence is less than the preset confidence;
wherein the second determining module comprises:
an acquiring submodule, configured to acquire a correspondence between volume value intervals and energy index parameters, wherein the volume value intervals and the energy index parameters are positively correlated;
a first determining submodule, configured to determine a first volume value interval to which the first volume value belongs and a second volume value interval to which the second volume value belongs; and
a second determining submodule, configured to determine, according to the correspondence, the first energy index parameter corresponding to the first volume value interval and the second energy index parameter corresponding to the second volume value interval.
5. The apparatus of claim 4, wherein the searching module comprises:
a first calculation submodule, configured to calculate a first similarity between the second text and each candidate entry content in the candidate entry content database; and
a content determining submodule, configured to determine the candidate entry content with the highest first similarity as the target entry content.
6. The apparatus of claim 4, wherein the filling module comprises:
an entry determining submodule, configured to determine a target operation entry corresponding to the target entry content;
a second calculation submodule, configured to calculate a second similarity between the target operation entry and the first text corresponding to the pre-operation entry; and
a filling submodule, configured to fill the target entry content into an entry content table corresponding to the target operation entry when the second similarity is greater than or equal to the preset similarity.
CN201610264283.XA 2016-04-25 2016-04-25 Voice processing method and device Active CN105957524B (en)

Publications (2)

Publication Number Publication Date
CN105957524A CN105957524A (en) 2016-09-21
CN105957524B true CN105957524B (en) 2020-03-31

