CN105957524B - Voice processing method and device - Google Patents


Info

Publication number: CN105957524B
Application number: CN201610264283.XA
Authority: CN (China)
Legal status: Active
Prior art keywords: volume value, index parameter, energy index, content, item
Original language: Chinese (zh)
Other versions: CN105957524A
Inventors: 李霄寒, 田伟
Original assignee: Beijing Yunzhisheng Information Technology Co Ltd
Current assignee: Unisound Intelligent Technology Co Ltd
Application filed by Beijing Yunzhisheng Information Technology Co Ltd; priority to CN201610264283.XA
Publication of application CN105957524A; application granted; publication of CN105957524B

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 — Speech recognition
    • G10L15/08 — Speech classification or search
    • G10L15/18 — Speech classification or search using natural language modelling
    • G10L15/1815 — Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 — Execution procedure of a spoken command

Abstract

The invention relates to a voice processing method and device. The method comprises: receiving voice information input by a user; recognizing the voice information to obtain first text corresponding to a pre-operation entry and second text corresponding to entry content; determining a first volume value corresponding to the first voice information and a second volume value corresponding to the second voice information; determining a first energy index parameter corresponding to the first volume value and a second energy index parameter corresponding to the second volume value; when the second energy index parameter is greater than both the first energy index parameter and a preset energy index parameter, searching the candidate entry content database for target entry content matching the second text; and filling the target entry content into the corresponding entry content table. Through this technical scheme, the success rate and accuracy of semantic analysis can be improved while the accuracy of voice processing is preserved, thereby improving the user experience.

Description

Voice processing method and device
Technical Field
The present invention relates to the field of speech recognition technologies, and in particular, to a speech processing method and apparatus.
Background
In speech processing, semantic understanding depends on the quality of speech recognition: if recognition is poor, semantic analysis suffers. For example, with the entry template of FIG. 1, the user wants a selection to be made automatically by dictating "executing department radiology department"; the selection control is shown in FIG. 2. From the perspective of the semantic analyzer, "executing department" is a pre-operation entry and "radiology department" is entry content; the analyzer can distinguish the two through a dictionary template and then convert them into an execution command. However, if even one word in the sentence is recognized poorly, semantic analysis fails. For example, the text may be recognized as "department radiology", with one character dropped from "executing department". The whole recognition process then fails, the corresponding operation cannot be performed from the user's voice input, and the user experience suffers.
Disclosure of Invention
The embodiment of the invention provides a voice processing method and a voice processing device, which are used for improving the success rate and the accuracy rate of semantic analysis on the basis of ensuring the accuracy rate of voice processing, so that the use experience of a user is improved.
According to a first aspect of the embodiments of the present invention, there is provided a speech processing method, including:
receiving voice information input by a user, wherein the voice information comprises first voice information corresponding to a pre-operation item and second voice information corresponding to item content;
recognizing the voice information to obtain a first character corresponding to the pre-operation item and a second character corresponding to the item content;
determining a first volume value corresponding to the first voice message and a second volume value corresponding to the second voice message;
determining a first energy index parameter corresponding to the first volume value and a second energy index parameter corresponding to the second volume value;
when the second energy index parameter is larger than the first energy index parameter and the second energy index parameter is larger than a preset energy index parameter, searching target item content matched with the second character in an item content database to be selected;
and filling the target entry content into the corresponding entry content table.
In this embodiment, after the voice information (comprising first voice information corresponding to the pre-operation entry and second voice information corresponding to the entry content) is recognized, the volume value of each piece of voice information is determined, and an energy index parameter is derived from each volume value. When the energy index parameter of the second voice information is greater than both that of the first voice information and the preset energy index parameter, the text corresponding to the entry content is preferentially matched against the candidate entry contents in the candidate entry content database, and the matched target entry content is filled into the corresponding entry content table. A user can therefore fill in a form by voice input, without manual selection, and is allowed to speak different words at different volumes; different energy index parameters are derived from the different volumes, and whether the form-filling operation is executed is decided according to those parameters. This avoids the situation in which a form cannot be filled after a speech-recognition error, improves the success rate and accuracy of semantic analysis while preserving the accuracy of voice processing, and improves both the success rate of filling in forms by voice and the user experience.
In one embodiment, the determining a first energy indicator parameter corresponding to the first volume value and a second energy indicator parameter corresponding to the second volume value includes:
acquiring a corresponding relation between a volume value interval and an energy index parameter, wherein the volume value interval and the energy index parameter are in positive correlation;
determining a first volume value interval to which the first volume value belongs and a second volume value interval to which the second volume value belongs;
and determining a first energy index parameter corresponding to the first volume value interval and a second energy index parameter corresponding to the second volume value interval according to the corresponding relation between the volume value interval and the energy index parameters.
In this embodiment, the correspondence between volume value intervals and energy index parameters may be preset, so that the first energy index parameter corresponding to the first volume value interval and the second energy index parameter corresponding to the second volume value interval are determined from this correspondence. Specifically, the volume value interval and the energy index parameter may be positively correlated: the larger the values in the volume interval, the larger the energy index parameter, and vice versa. The larger the energy index parameter, the more likely the text corresponding to the entry content is to be matched preferentially. Adding the energy index parameter on top of the underlying speech recognition therefore improves the success rate and accuracy of semantic analysis while preserving the accuracy of voice processing, and improves both the success rate of filling in forms by voice and the user experience.
In one embodiment, searching the content of the target item matched with the second text in the content database of the item to be selected includes:
calculating a first similarity between the second characters and each item content to be selected in the item content database to be selected;
and determining the content of the item to be selected with the highest first similarity as the target item content.
In this embodiment, a first similarity is calculated between the second text and each candidate entry content in the candidate entry content database, and the candidate with the highest similarity is taken as the target entry content. This preserves the accuracy of speech recognition, avoids the situation in which semantic analysis is impossible because the recognition result is wrong, and improves the success rate and accuracy of semantic analysis.
In one embodiment, the filling out the target entry content into the corresponding entry content table includes:
determining a target operation item corresponding to the target item content;
calculating a second similarity between the first characters corresponding to the target operation item and the pre-operation item;
and when the second similarity is greater than or equal to the preset similarity, filling the target entry content into an entry content table corresponding to the target operation entry.
In this embodiment, before the target entry content is filled into the corresponding entry content table, the target operation entry corresponding to the target entry content may be determined, and a similarity may then be calculated between that target operation entry and the first text corresponding to the pre-operation entry. If this similarity is greater than the preset similarity, the target operation entry and the pre-operation entry match; that is, the entry the user intended to operate on is the target operation entry. This further ensures the accuracy of the semantic analysis result.
In one embodiment, before determining a first volume value corresponding to the first voice message and a second volume value corresponding to the second voice message, the method further includes:
calculating the confidence of the voice information;
judging whether the confidence coefficient is smaller than a preset confidence coefficient;
and when the confidence coefficient is smaller than a preset confidence coefficient, executing the step of determining a first volume value corresponding to the first voice message and a second volume value corresponding to the second voice message.
In this embodiment, the confidence of the voice information may be calculated first. If the confidence is greater than or equal to the preset confidence, the recognition result is reliable and semantic analysis is likely to succeed, so a conventional semantic analysis scheme from the related art may be used. If the confidence is less than the preset confidence, the recognition result is unreliable and semantic analysis may fail; in that case, the volume-difference scheme of the present invention may be used.
According to a second aspect of the embodiments of the present invention, there is provided a speech processing apparatus including:
the receiving module is used for receiving voice information input by a user, wherein the voice information comprises first voice information corresponding to a pre-operation item and second voice information corresponding to item content;
the recognition module is used for recognizing the voice information to obtain a first character corresponding to the pre-operation entry and a second character corresponding to the entry content;
the first determining module is used for determining a first volume value corresponding to the first voice message and a second volume value corresponding to the second voice message;
a second determining module, configured to determine a first energy index parameter corresponding to the first volume value and a second energy index parameter corresponding to the second volume value;
the searching module is used for searching the target item content matched with the second character in the item content database to be selected when the second energy index parameter is greater than the first energy index parameter and the second energy index parameter is greater than a preset energy index parameter;
and the filling module is used for filling the target entry content into the corresponding entry content table.
In one embodiment, the second determining module comprises:
the obtaining submodule is used for obtaining a corresponding relation between a volume value interval and an energy index parameter, wherein the volume value interval and the energy index parameter are in positive correlation;
the first determining submodule is used for determining a first volume value interval to which the first volume value belongs and a second volume value interval to which the second volume value belongs;
and the second determining submodule is used for determining a first energy index parameter corresponding to the first volume value interval and a second energy index parameter corresponding to the second volume value interval according to the corresponding relation between the volume value interval and the energy index parameter.
In one embodiment, the lookup module comprises:
the first calculation submodule is used for calculating a first similarity between the second characters and each item content to be selected in the item content database to be selected;
and the content determining submodule is used for determining the item content to be selected with the highest first similarity as the target item content.
In one embodiment, the filling module includes:
the item determining submodule is used for determining a target operation item corresponding to the target item content;
the second calculation submodule is used for calculating a second similarity between the first characters corresponding to the target operation item and the pre-operation item;
and the filling sub-module is used for filling the target item content into the item content table corresponding to the target operation item when the second similarity is greater than or equal to the preset similarity.
In one embodiment, the apparatus further comprises:
the calculation module is used for calculating the confidence of the voice information before determining a first volume value corresponding to the first voice information and a second volume value corresponding to the second voice information;
the judging module is used for judging whether the confidence coefficient is smaller than the preset confidence coefficient;
and the triggering module is used for triggering the second determining module to determine a first volume value corresponding to the first voice information and a second volume value corresponding to the second voice information when the confidence coefficient is smaller than a preset confidence coefficient.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a schematic diagram of an entry template in the related art.
Fig. 2 is a diagram illustrating item content options in the related art.
FIG. 3 is a flow diagram illustrating a method of speech processing according to an example embodiment.
Fig. 4 is a flowchart illustrating step S304 in a voice processing method according to an exemplary embodiment.
Fig. 5 is a flowchart illustrating step S305 in a voice processing method according to an exemplary embodiment.
Fig. 6 is a flowchart illustrating step S306 in a voice processing method according to an exemplary embodiment.
FIG. 7 is a flow diagram illustrating another method of speech processing according to an example embodiment.
FIG. 8 is a block diagram illustrating a speech processing apparatus according to an example embodiment.
FIG. 9 is a block diagram illustrating a second determination module in a speech processing apparatus according to an example embodiment.
FIG. 10 is a block diagram illustrating a lookup module in a speech processing device according to an example embodiment.
FIG. 11 is a block diagram illustrating a fill module in a speech processing device according to an example embodiment.
FIG. 12 is a block diagram illustrating another speech processing apparatus according to an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
FIG. 3 is a flow diagram illustrating a method of speech processing according to an example embodiment. The method is applied to a terminal device, which can be any device with a voice control function, such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet, a medical device, fitness equipment or a personal digital assistant. As shown in fig. 3, the method comprises steps S301-S306:
in step S301, receiving voice information input by a user, where the voice information includes first voice information corresponding to a pre-operation item and second voice information corresponding to an item content;
in some tables, there are operation entries and entry contents, for example, when a score sheet is filled, the operation entries include names, characters, scores and the like, and the specific zhang san, maid and 90 cents are corresponding entry contents. For another example, the examination category, the department performing, the execution time, and the like in fig. 1 all belong to the operation entry, and the corresponding plain film, radiology department, 3/8/2011, and the like all belong to the entry content. The user can input the operation item by voice when the user wants to perform voice operation on the operation item, and the operation item is the pre-operation item.
In step S302, voice information is identified to obtain a first text corresponding to the pre-operation entry and a second text corresponding to the entry content;
in step S303, a first volume value corresponding to the first voice message and a second volume value corresponding to the second voice message are determined;
To aid semantic analysis, the user may speak different words at different volumes to mark emphasis. For example, when the user says "executing department radiology department" with "radiology department" noticeably louder, this indicates that "radiology department" is the focus of the semantic analysis.
In step S304, a first energy index parameter corresponding to the first volume value and a second energy index parameter corresponding to the second volume value are determined;
Different volume values correspond to different energy index parameters, so whether the form is filled in is decided according to the energy index parameters.
In step S305, when the second energy index parameter is greater than the first energy index parameter and the second energy index parameter is greater than the preset energy index parameter, searching a target item content matched with the second text in the item content database to be selected;
in step S306, the target entry content is filled in the corresponding entry content table.
In this embodiment, after the voice information (comprising first voice information corresponding to the pre-operation entry and second voice information corresponding to the entry content) is recognized, the volume value of each piece of voice information is determined, and an energy index parameter is derived from each volume value. When the energy index parameter of the second voice information is greater than both that of the first voice information and the preset energy index parameter, the text corresponding to the entry content is preferentially matched against the candidate entry contents in the candidate entry content database, and the matched target entry content is filled into the corresponding entry content table. A user can therefore fill in a form by voice input, without manual selection, and is allowed to speak different words at different volumes; different energy index parameters are derived from the different volumes, and whether the form-filling operation is executed is decided according to those parameters. This avoids the situation in which a form cannot be filled after a speech-recognition error, improves the success rate and accuracy of semantic analysis while preserving the accuracy of voice processing, and improves both the success rate of filling in forms by voice and the user experience.
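As a minimal illustration of the decision in step S305, the gating condition can be sketched as follows (Python; the function name and the default preset value of 2 are assumptions for illustration, not prescribed by the patent):

```python
def should_fill_form(first_energy_index: int,
                     second_energy_index: int,
                     preset_energy_index: int = 2) -> bool:
    """Step S305 gate: fill the entry content table only when the
    entry-content speech has a strictly larger energy index than the
    pre-operation speech AND exceeds the preset energy index parameter.
    The default preset value of 2 is an assumption for illustration."""
    return (second_energy_index > first_energy_index
            and second_energy_index > preset_energy_index)
```

Note that both comparisons are strict: speech at the same energy level as the pre-operation entry, or merely equal to the preset parameter, does not trigger the form-filling operation.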
In one embodiment, as shown in FIG. 4, the step S304 includes steps S401-S403:
in step S401, a corresponding relationship between a volume value interval and an energy index parameter is obtained, wherein the volume value interval and the energy index parameter are in positive correlation;
the corresponding relationship between the volume value interval and the energy index parameter can be preset, so that a first energy index parameter corresponding to the first volume value interval and a second energy index parameter corresponding to the second volume value interval are determined according to the corresponding relationship between the volume value interval and the energy index parameter. Specifically, the volume value interval and the energy index parameter may be in positive correlation, that is, the larger the value of the volume value interval is, the larger the energy index parameter is, and the smaller the value of the volume value interval is, the smaller the energy index parameter is.
For example, the volume value may be expressed as a decibel value. To improve the success rate of filling in a form by voice, the higher the decibel value of the speech, the larger the energy index parameter. In this example, the correspondence between volume value intervals and energy index parameters is shown in Table 1.
TABLE 1

    Decibel value    Energy index parameter
    0~20             1
    21~30            2
    31~60            3
    61~80            4
In step S402, a first volume value interval to which the first volume value belongs and a second volume value interval to which the second volume value belongs are determined;
in step S403, a first energy index parameter corresponding to the first volume value interval and a second energy index parameter corresponding to the second volume value interval are determined according to the correspondence between the volume value intervals and the energy index parameters.
In this embodiment, the larger the energy index parameter, the more likely the text corresponding to the entry content is to be matched preferentially. Adding the energy index parameter on top of the existing speech recognition technology therefore improves the success rate and accuracy of semantic analysis while preserving the accuracy of voice processing, and improves both the success rate of filling in forms by voice and the user experience.
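Using the intervals of Table 1, the lookup of steps S401-S403 can be sketched as follows (Python; the handling of volumes above the last interval is an assumption, since Table 1 does not cover them):

```python
# Interval table from Table 1: (low dB, high dB, energy index parameter)
DB_INTERVALS = [(0, 20, 1), (21, 30, 2), (31, 60, 3), (61, 80, 4)]

def energy_index(volume_db: float) -> int:
    """Return the energy index parameter for a volume value in decibels.
    Larger volume intervals map to larger parameters (positive correlation)."""
    for low, high, index in DB_INTERVALS:
        if low <= volume_db <= high:
            return index
    # Assumption: volumes beyond the last interval take the largest index.
    return DB_INTERVALS[-1][2]
```

Applying this to both recognized segments yields the first and second energy index parameters compared in step S305.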
In one embodiment, as shown in fig. 5, the step S305 may include steps S501-S502:
in step S501, a first similarity between the second text and each item content to be selected in the item content database to be selected is calculated;
The candidate entry content database contains many candidate entry contents. For a medical database, for example, these may include executing departments such as radiology, imaging, general medicine, gastroenterology and endocrinology, as well as doctors such as Zhang San, Li Si and Wang Wu. A database of another class, such as student scores, may include subjects such as politics, history and geography.
In step S502, the item content to be selected with the highest first similarity is determined as the target item content.
In this embodiment, a first similarity is calculated between the second text and each candidate entry content in the candidate entry content database, and the candidate with the highest similarity is determined as the target entry content. This preserves the accuracy of speech recognition, avoids the situation in which semantic analysis is impossible because the recognition result is wrong, and improves the success rate and accuracy of semantic analysis.
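The matching of steps S501-S502 can be sketched with a generic string-similarity measure; the patent does not fix one, so `difflib.SequenceMatcher` is used here purely as an illustration:

```python
import difflib

def best_matching_content(second_text: str, candidates: list[str]) -> str:
    """Steps S501-S502: compute a similarity between the recognized second
    text and every candidate entry content, and return the candidate with
    the highest similarity."""
    def similarity(candidate: str) -> float:
        return difflib.SequenceMatcher(None, second_text, candidate).ratio()
    return max(candidates, key=similarity)
```

Even a mis-recognized string such as "radiolog department" still resolves to "radiology department", which is exactly the recognition-error failure mode described in the Background that this step is meant to absorb.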
In one embodiment, as shown in FIG. 6, the step S306 includes steps S601-S603:
in step S601, a target operation item corresponding to the content of the target item is determined;
In the candidate entry content database, each candidate entry content should be stored together with the operation entry to which it belongs, so that the target operation entry can be determined from the target entry content.
In step S602, calculating a second similarity between the first characters corresponding to the target operation item and the pre-operation item;
in step S603, when the second similarity is greater than or equal to the preset similarity, filling the target entry content in the entry content table corresponding to the target operation entry.
In this embodiment, before the target entry content is filled into the corresponding entry content table, the target operation entry corresponding to the target entry content may be determined, and a similarity may then be calculated between that target operation entry and the first text corresponding to the pre-operation entry. If this similarity is greater than the preset similarity, the target operation entry and the pre-operation entry match; that is, the entry the user intended to operate on is the target operation entry. This further ensures the accuracy of the semantic analysis result.
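The check of steps S601-S603 can be sketched as follows; the similarity measure and the preset threshold of 0.6 are assumptions for illustration, as the patent leaves both open:

```python
import difflib

PRESET_SIMILARITY = 0.6  # assumed threshold; the patent does not fix a value

def fill_if_entry_matches(table: dict, target_operation_entry: str,
                          first_text: str, target_content: str) -> bool:
    """Steps S601-S603: fill the target entry content only when the target
    operation entry is sufficiently similar to the recognized first text."""
    second_similarity = difflib.SequenceMatcher(
        None, target_operation_entry, first_text).ratio()
    if second_similarity >= PRESET_SIMILARITY:
        table[target_operation_entry] = target_content
        return True
    return False
```

The second similarity acts as a final sanity check: content is only written into the table when the operation entry recovered from the database agrees with what the user actually said.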
In one embodiment, as shown in fig. 7, before determining the first volume value corresponding to the first voice message and the second volume value corresponding to the second voice message, the method further includes steps S701-S703:
in step S701, a confidence of the voice information is calculated;
The confidence takes a value in the range 0-1 and is used to evaluate the reliability of the speech recognition result: the higher the confidence, the more accurate the recognition result.
In step S702, it is determined whether the confidence is less than a preset confidence; the preset confidence likewise takes a value in the range 0-1.
In step S703, when the confidence is smaller than the preset confidence, a step of determining a first volume value corresponding to the first voice message and a second volume value corresponding to the second voice message is performed.
In this embodiment, the confidence of the voice information may be calculated first. If the confidence is greater than or equal to the preset confidence, the recognition result is reliable and semantic analysis is likely to succeed, so a conventional semantic analysis scheme from the related art may be used. If the confidence is less than the preset confidence, the recognition result is unreliable and semantic analysis may fail; in that case, the volume-difference scheme of the present invention may be used.
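The confidence gate of steps S701-S703 can be sketched as follows; the threshold of 0.8 is an assumption, since the patent only states that both values lie in the range 0-1:

```python
PRESET_CONFIDENCE = 0.8  # assumed threshold in [0, 1]

def needs_volume_based_analysis(confidence: float) -> bool:
    """Steps S702-S703: route to the volume-based scheme only when the
    recognition confidence falls below the preset confidence."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return confidence < PRESET_CONFIDENCE
```

High-confidence recognitions thus continue through conventional semantic analysis unchanged; the volume-based path is a fallback for unreliable results only.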
The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention.
Fig. 8 is a block diagram illustrating a voice processing apparatus, which may be implemented as part or all of a terminal device by software, hardware, or a combination of both, according to an example embodiment. As shown in fig. 8, the speech processing apparatus includes:
the receiving module 81 is configured to receive voice information input by a user, where the voice information includes first voice information corresponding to a pre-operation entry and second voice information corresponding to an entry content;
the recognition module 82 is configured to recognize the voice information to obtain a first text corresponding to the pre-operation entry and a second text corresponding to the entry content;
a first determining module 83, configured to determine a first volume value corresponding to the first voice information and a second volume value corresponding to the second voice information;
a second determining module 84, configured to determine a first energy index parameter corresponding to the first volume value and a second energy index parameter corresponding to the second volume value;
a searching module 85, configured to search a candidate entry content database for target entry content matching the second text when the second energy index parameter is greater than the first energy index parameter and greater than a preset energy index parameter;
and a filling module 86, configured to fill the target entry content into the corresponding entry content table.
In this embodiment, after the voice information, which includes the first voice information corresponding to the pre-operation entry and the second voice information corresponding to the entry content, is recognized, the volume values of the two pieces of voice information are determined separately, and an energy index parameter is then determined from each volume value. When the energy index parameter of the second voice information is greater than both the energy index parameter of the first voice information and the preset energy index parameter, the text corresponding to the entry content is preferentially matched against the candidate entry contents in the candidate entry content database, and the matched target entry content is filled into the corresponding entry content table. A user can therefore fill in a form through voice input without manual selection, and may use different volumes for different words: different energy index parameters are determined from the different volumes, and whether the form-filling operation is executed is decided according to those parameters. This avoids the problem that the form cannot be filled in after a speech recognition error, improves the success rate and accuracy of semantic analysis while preserving the accuracy of speech processing, and improves the success rate and user experience of filling in forms by voice input.
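The module chain of fig. 8 (volume values, energy indices, search, fill) can be connected end to end as in the sketch below. Recognition and volume measurement are assumed to have happened upstream; the interval boundaries, the similarity measure, and all names are illustrative assumptions rather than the patented implementation.

```python
# Hypothetical end-to-end sketch of the fig. 8 pipeline.
from difflib import SequenceMatcher

def energy_index(volume_db):
    # Assumed volume-interval -> energy-index mapping (positively correlated).
    if volume_db < 50:
        return 1
    if volume_db < 70:
        return 2
    return 3

def process(first_volume, second_volume, second_text, candidates,
            preset_energy_index, form, field):
    """Fill `form[field]` with the best-matching candidate entry content
    when the entry-content utterance is emphatically louder."""
    e1 = energy_index(first_volume)
    e2 = energy_index(second_volume)
    if e2 > e1 and e2 > preset_energy_index:
        target = max(
            candidates,
            key=lambda c: SequenceMatcher(None, second_text, c).ratio())
        form[field] = target
    return form
```

When the second utterance is not louder than the first, the form is left untouched, mirroring the condition in the searching module.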
In one embodiment, as shown in fig. 9, the second determining module 84 includes:
the obtaining submodule 91 is configured to obtain a corresponding relationship between a volume value interval and an energy index parameter, where the volume value interval and the energy index parameter are in positive correlation;
a first determining submodule 92, configured to determine a first volume value interval to which the first volume value belongs and a second volume value interval to which the second volume value belongs;
the second determining submodule 93 is configured to determine, according to a correspondence between a volume value interval and an energy index parameter, a first energy index parameter corresponding to the first volume value interval and a second energy index parameter corresponding to the second volume value interval.
In this embodiment, the larger the energy index parameter, the higher the probability that the text corresponding to the entry content is preferentially matched. By adding the energy index parameter on top of the existing speech recognition technology, the success rate and accuracy of semantic analysis can be improved while preserving the accuracy of speech processing, and the success rate and user experience of filling in forms by voice input are also improved.
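The correspondence acquired by submodule 91 can be represented as an explicit lookup table, as in this sketch. The interval boundaries and index values are invented for illustration; the patent only requires that higher volume intervals map to higher energy index parameters.

```python
# Hypothetical volume-value-interval -> energy-index correspondence
# (positively correlated), as used by submodules 91-93.
VOLUME_TO_ENERGY_INDEX = [
    ((0.0, 40.0), 1),
    ((40.0, 60.0), 2),
    ((60.0, 80.0), 3),
    ((80.0, float("inf")), 4),
]

def lookup_energy_index(volume_db):
    """Find the interval the volume value belongs to and return the
    corresponding energy index parameter."""
    for (lo, hi), index in VOLUME_TO_ENERGY_INDEX:
        if lo <= volume_db < hi:
            return index
    raise ValueError("volume value out of range: %r" % volume_db)
```

Applying this lookup to both volume values yields the first and second energy index parameters compared by the searching module.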
As shown in fig. 10, in one embodiment, the lookup module 85 includes:
a first calculation submodule 101, configured to calculate a first similarity between the second text and each candidate entry content in the candidate entry content database;
a content determining submodule 102, configured to determine the candidate entry content with the highest first similarity as the target entry content.
In this embodiment, the first similarity between the second text and each candidate entry content in the candidate entry content database is calculated, and the candidate with the highest similarity is determined as the target entry content. This preserves the accuracy of speech recognition, avoids the problem that semantic analysis cannot proceed because the recognition result is wrong, and improves the success rate and accuracy of semantic analysis.
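The highest-similarity selection performed by submodules 101 and 102 can be sketched as follows; `difflib` is a stand-in similarity measure, since the patent does not prescribe one.

```python
# Hypothetical sketch of submodules 101-102: score every candidate entry
# content against the recognized second text and keep the best match.
from difflib import SequenceMatcher

def best_matching_entry(second_text, candidates):
    """Return the candidate entry content with the highest first
    similarity to the second text."""
    return max(
        candidates,
        key=lambda c: SequenceMatcher(None, second_text, c).ratio())
```

Because the best candidate is chosen rather than the raw recognition output, a slightly misrecognized word can still resolve to the intended entry content.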
As shown in fig. 11, in one embodiment, the filling module 86 includes:
an entry determining submodule 111, configured to determine a target operation entry corresponding to the target entry content;
a second calculation submodule 112, configured to calculate a second similarity between the target operation entry and the first text corresponding to the pre-operation entry;
a filling submodule 113, configured to fill the target entry content into the entry content table corresponding to the target operation entry when the second similarity is greater than or equal to a preset similarity.
In this embodiment, before the target entry content is filled into the corresponding entry content table, the target operation entry corresponding to the target entry content may first be determined, and the similarity between the target operation entry and the first text corresponding to the pre-operation entry may be calculated. If this similarity is greater than or equal to the preset similarity, the target operation entry matches the pre-operation entry, that is, the user's pre-operation entry is the target operation entry, which further ensures the accuracy of the semantic analysis result.
As shown in fig. 12, in one embodiment, the apparatus further comprises:
a calculating module 121, configured to calculate a confidence of the voice information before a first volume value corresponding to the first voice information and a second volume value corresponding to the second voice information are determined;
a judging module 122, configured to judge whether the confidence is less than a preset confidence;
a triggering module 123, configured to trigger the first determining module to determine the first volume value corresponding to the first voice information and the second volume value corresponding to the second voice information when the confidence is less than the preset confidence.
In this embodiment, the confidence of the voice information may be calculated first. If the confidence is greater than or equal to the preset confidence, the recognition result is reliable and semantic analysis is likely to succeed, so a conventional semantic analysis scheme from the related art may be used. If the confidence is less than the preset confidence, the recognition result is unreliable and semantic analysis may fail; in that case, the volume-difference-based semantic analysis scheme of the present invention may be used.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (6)

1. A speech processing method, comprising:
receiving voice information input by a user, the voice information comprising first voice information corresponding to a pre-operation entry and second voice information corresponding to entry content;
recognizing the voice information to obtain a first text corresponding to the pre-operation entry and a second text corresponding to the entry content;
determining a first volume value corresponding to the first voice information and a second volume value corresponding to the second voice information;
determining a first energy index parameter corresponding to the first volume value and a second energy index parameter corresponding to the second volume value;
when the second energy index parameter is greater than the first energy index parameter and greater than a preset energy index parameter, searching a candidate entry content database for target entry content matching the second text; and
filling the target entry content into a corresponding entry content table;
wherein, before the first volume value corresponding to the first voice information and the second volume value corresponding to the second voice information are determined, the method further comprises:
calculating a confidence of the voice information;
judging whether the confidence is less than a preset confidence; and
when the confidence is less than the preset confidence, performing the step of determining the first volume value corresponding to the first voice information and the second volume value corresponding to the second voice information;
wherein determining the first energy index parameter corresponding to the first volume value and the second energy index parameter corresponding to the second volume value comprises:
acquiring a correspondence between volume value intervals and energy index parameters, wherein the volume value intervals and the energy index parameters are positively correlated;
determining a first volume value interval to which the first volume value belongs and a second volume value interval to which the second volume value belongs; and
determining, according to the correspondence, the first energy index parameter corresponding to the first volume value interval and the second energy index parameter corresponding to the second volume value interval.
2. The method of claim 1, wherein searching the candidate entry content database for the target entry content matching the second text comprises:
calculating a first similarity between the second text and each candidate entry content in the candidate entry content database; and
determining the candidate entry content with the highest first similarity as the target entry content.
3. The method of claim 1, wherein filling the target entry content into the corresponding entry content table comprises:
determining a target operation entry corresponding to the target entry content;
calculating a second similarity between the target operation entry and the first text corresponding to the pre-operation entry; and
when the second similarity is greater than or equal to a preset similarity, filling the target entry content into an entry content table corresponding to the target operation entry.
4. A speech processing apparatus, comprising:
a receiving module, configured to receive voice information input by a user, the voice information comprising first voice information corresponding to a pre-operation entry and second voice information corresponding to entry content;
a recognition module, configured to recognize the voice information to obtain a first text corresponding to the pre-operation entry and a second text corresponding to the entry content;
a first determining module, configured to determine a first volume value corresponding to the first voice information and a second volume value corresponding to the second voice information;
a second determining module, configured to determine a first energy index parameter corresponding to the first volume value and a second energy index parameter corresponding to the second volume value;
a searching module, configured to search a candidate entry content database for target entry content matching the second text when the second energy index parameter is greater than the first energy index parameter and greater than a preset energy index parameter; and
a filling module, configured to fill the target entry content into a corresponding entry content table;
wherein the apparatus further comprises:
a calculating module, configured to calculate a confidence of the voice information before the first volume value corresponding to the first voice information and the second volume value corresponding to the second voice information are determined;
a judging module, configured to judge whether the confidence is less than a preset confidence; and
a triggering module, configured to trigger the first determining module to determine the first volume value corresponding to the first voice information and the second volume value corresponding to the second voice information when the confidence is less than the preset confidence;
wherein the second determining module comprises:
an acquiring submodule, configured to acquire a correspondence between volume value intervals and energy index parameters, wherein the volume value intervals and the energy index parameters are positively correlated;
a first determining submodule, configured to determine a first volume value interval to which the first volume value belongs and a second volume value interval to which the second volume value belongs; and
a second determining submodule, configured to determine, according to the correspondence, the first energy index parameter corresponding to the first volume value interval and the second energy index parameter corresponding to the second volume value interval.
5. The apparatus of claim 4, wherein the searching module comprises:
a first calculation submodule, configured to calculate a first similarity between the second text and each candidate entry content in the candidate entry content database; and
a content determining submodule, configured to determine the candidate entry content with the highest first similarity as the target entry content.
6. The apparatus of claim 4, wherein the filling module comprises:
an entry determining submodule, configured to determine a target operation entry corresponding to the target entry content;
a second calculation submodule, configured to calculate a second similarity between the target operation entry and the first text corresponding to the pre-operation entry; and
a filling submodule, configured to fill the target entry content into an entry content table corresponding to the target operation entry when the second similarity is greater than or equal to the preset similarity.
CN201610264283.XA 2016-04-25 2016-04-25 Voice processing method and device Active CN105957524B (en)

Publications (2)

Publication Number Publication Date
CN105957524A CN105957524A (en) 2016-09-21
CN105957524B true CN105957524B (en) 2020-03-31

