CN111192572A - Semantic recognition method, device and system - Google Patents


Info

Publication number
CN111192572A
CN111192572A
Authority
CN
China
Prior art keywords
pinyin
semantic
voice
information
semantic recognition
Legal status
Pending
Application number
CN201911421165.5A
Other languages
Chinese (zh)
Inventor
蔡勇
Current Assignee
Zebra Network Technology Co Ltd
Original Assignee
Zebra Network Technology Co Ltd
Application filed by Zebra Network Technology Co Ltd


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1815 Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G10L15/1822 Parsing for meaning understanding

Abstract

The invention provides a semantic recognition method, device, and system. The method includes: acquiring voice information and extracting a voice state from the voice information; and inputting the voice state into a target semantic recognition model, where the target semantic recognition model obtains pinyin characteristics, or pinyin characteristics together with character characteristics, from the voice state so as to obtain the semantic information corresponding to the voice information. The method achieves domain-specific semantic extraction, improves semantic-understanding accuracy, reduces semantic-understanding errors caused by mis-recognized homophones, is broadly applicable, and suits command-style voice recognition scenarios such as automobiles and homes.

Description

Semantic recognition method, device and system
Technical Field
The invention relates to the technical field of computer natural language processing, in particular to a semantic recognition method, a semantic recognition device and a semantic recognition system.
Background
With the rapid development of ASR (automatic speech recognition), the semantic understanding technology based on the characters recognized by ASR has also been widely applied and developed.
Although ASR technology has matured, its recognition quality is not ideal in certain application domains. For example, in medicine, biology, and chemistry, ASR can produce output, but its recognition accuracy is low. Because requirements differ from field to field, separate development is needed for each one, development costs are high, and ASR performs poorly in professional domains.
Since semantic understanding relies on the words recognized by ASR, any deviation in those words severely affects semantic understanding.
Disclosure of Invention
The invention provides a semantic recognition method, device, and system that achieve domain-specific semantic recognition, improve recognition accuracy, reduce semantic-understanding errors caused by ASR homophone mis-recognition, are broadly applicable, and suit command-style voice recognition scenarios such as automobiles and homes.
In a first aspect, a method for semantic recognition provided in an embodiment of the present invention includes:
acquiring voice information and extracting a voice state according to the voice information;
and inputting the voice state into a target semantic recognition model, wherein the target semantic recognition model is used for obtaining pinyin characteristics or pinyin characteristics and character characteristics according to the voice state to obtain semantic information corresponding to the voice information.
In one possible design, before inputting the speech state into the target semantic recognition model, the method further includes:
acquiring a training data set;
inputting the training data set into an initial semantic recognition model, wherein the initial semantic recognition model comprises a pinyin conversion branch and a matching branch, the pinyin conversion branch is used for obtaining pinyin characteristics or pinyin characteristics and character characteristics according to the voice state, and the matching branch is used for obtaining corresponding semantic information according to the pinyin characteristics, so as to obtain the target semantic recognition model.
In one possible design,
obtaining pinyin characteristics or pinyin characteristics and character characteristics according to the voice state, including:
sequentially obtaining character characteristics corresponding to each voice state according to a plurality of sequentially arranged voice states, and sequentially obtaining corresponding pinyin characteristics according to the character characteristics;
or obtaining corresponding character characteristics according to a plurality of sequentially arranged voice states, wherein the character characteristics comprise character characteristics corresponding to the first voice state, and sequentially obtaining corresponding pinyin characteristics from the character characteristics corresponding to the first voice state to the character characteristics at the front end and the rear end until obtaining pinyin characteristics corresponding to all the character characteristics.
In one possible design, further comprising:
and marking the corresponding tone characteristics for the pinyin characteristics, wherein the tone characteristics are used for obtaining corresponding semantic information by combining the pinyin characteristics.
In one possible design, further comprising:
space marks are arranged among a plurality of pinyin characteristics, and the pinyin characteristics are connected into a pinyin characteristic string.
In one possible design, obtaining corresponding semantic information according to the pinyin features includes:
acquiring the highest semantic information probability corresponding to the pinyin feature string according to the pinyin feature string;
and if the highest semantic information probability is not less than a probability threshold, determining semantic information corresponding to the pinyin features.
In one possible design, after obtaining semantic information corresponding to the speech information, the method further includes:
and displaying the semantic information.
In a second aspect, an embodiment of the present invention provides a method for semantic recognition, including:
acquiring voice information and extracting a voice state;
and inputting the voice state into a target semantic recognition model, wherein the target semantic recognition model is used for recognizing the voice state to obtain semantic information corresponding to the voice information.
In a third aspect, an apparatus for semantic recognition provided in an embodiment of the present invention includes:
the acquisition module is used for acquiring voice information and extracting a voice state according to the voice information;
and the recognition module is used for obtaining pinyin characteristics or pinyin characteristics and character characteristics according to the voice state and obtaining semantic information corresponding to the voice information.
In one possible design, before inputting the speech state into the target semantic recognition model, the method further includes:
acquiring a training data set;
inputting the training data set into an initial semantic recognition model, wherein the initial semantic recognition model comprises a pinyin conversion branch and a matching branch, the pinyin conversion branch is used for obtaining pinyin characteristics or pinyin characteristics and character characteristics according to the voice state, and the matching branch is used for obtaining corresponding semantic information according to the pinyin characteristics, so as to obtain the target semantic recognition model.
In one possible design,
obtaining pinyin characteristics or pinyin characteristics and character characteristics according to the voice state, including:
sequentially obtaining character characteristics corresponding to each voice state according to a plurality of sequentially arranged voice states, and sequentially obtaining corresponding pinyin characteristics according to the character characteristics;
or obtaining corresponding character characteristics according to a plurality of sequentially arranged voice states, wherein the character characteristics comprise character characteristics corresponding to the first voice state, and sequentially obtaining corresponding pinyin characteristics from the character characteristics corresponding to the first voice state to the character characteristics at the front end and the rear end until obtaining pinyin characteristics corresponding to all the character characteristics.
In one possible design, further comprising:
and marking the corresponding tone characteristics for the pinyin characteristics, wherein the tone characteristics are used for obtaining corresponding semantic information by combining the pinyin characteristics.
In one possible design, further comprising:
space marks are arranged among a plurality of pinyin characteristics, and the pinyin characteristics are connected into a pinyin characteristic string.
In one possible design, obtaining corresponding semantic information according to the pinyin features includes:
acquiring the highest semantic information probability corresponding to the pinyin feature string according to the pinyin feature string;
and if the highest semantic information probability is not less than a probability threshold, determining semantic information corresponding to the pinyin features.
In one possible design, after obtaining semantic information corresponding to the speech information, the method further includes:
and displaying the semantic information.
In a fourth aspect, an apparatus for semantic recognition provided in an embodiment of the present invention includes:
the acquisition module is used for acquiring voice information and extracting a voice state;
and the recognition module is used for inputting the voice state into a target semantic recognition model, wherein the target semantic recognition model is used for recognizing the voice state to obtain semantic information corresponding to the voice information.
In a fifth aspect, a system for semantic recognition provided in an embodiment of the present invention includes: a memory and a processor, wherein the memory stores executable instructions of the processor; and the processor is configured to perform the method of semantic recognition of any one of the first aspect via execution of the executable instructions.
In a sixth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for semantic recognition according to any one of the first aspect.
The invention provides a semantic recognition method, device, and system. The method includes: acquiring voice information and extracting a voice state from the voice information; and inputting the voice state into a target semantic recognition model, where the target semantic recognition model obtains pinyin characteristics, or pinyin characteristics together with character characteristics, from the voice state so as to obtain the semantic information corresponding to the voice information. The method achieves domain-specific semantic extraction, improves semantic-understanding accuracy, reduces semantic-understanding errors caused by mis-recognized homophones, is broadly applicable, and suits command-style voice recognition scenarios such as automobiles and homes.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic diagram of an exemplary scenario in accordance with the present invention;
FIG. 2 is a flowchart of a semantic recognition method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a target semantic recognition model in the semantic recognition method according to an embodiment of the present invention;
FIG. 4 is a first schematic diagram of a target semantic recognition model in the semantic recognition method according to the first embodiment of the present invention;
FIG. 5 is a second schematic diagram of a target semantic recognition model in the semantic recognition method according to the second embodiment of the present invention;
FIG. 6 is a flowchart of a semantic recognition method according to a third embodiment of the present invention;
FIG. 7 is a diagram illustrating a target semantic recognition model in the semantic recognition method according to the third embodiment of the present invention;
FIG. 8 is a flowchart of a semantic recognition method according to a fourth embodiment of the present invention;
FIG. 9 is a schematic diagram illustrating a partial effect in the semantic recognition method according to the fourth embodiment of the present invention;
FIG. 10 is a schematic structural diagram of an apparatus for semantic recognition according to a fifth embodiment of the present invention;
FIG. 11 is a schematic structural diagram of a semantic recognition system according to a sixth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The following describes the technical solutions of the present invention and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of a typical scenario of the present invention. As shown in Fig. 1, voice information is obtained through a voice acquisition device 11, and the semantic recognition system of the present invention can then recognize it and output the corresponding semantic information. The semantic information may include the recognized text and may be displayed in JSON format; this improves recognition accuracy, reduces the homophone recognition error rate, is broadly applicable, and suits command-style voice recognition scenarios such as automobiles and homes. Fig. 2 is a flowchart of a semantic recognition method according to an embodiment of the present invention; as shown in Fig. 2, the method in this embodiment may include:
s201, acquiring voice information, and extracting a voice state according to the voice information.
In the present embodiment, voice information is collected as a continuous audio stream, usually in a compressed format such as mp3 or wmv. In an alternative embodiment, the voice information must be converted into an uncompressed waveform file for processing, such as a Windows PCM file, which consists of a file header and the sound waveform. In an alternative embodiment, the acquired voice information is pre-processed, for example by cutting off the silence at the beginning and end, to reduce interference with subsequent processing. The sound is then framed, i.e. cut into small segments, each of which is a frame. Framing is not a simple cut: frames generally overlap one another, and a moving window is used to implement the framing.
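The overlapping-window framing step can be sketched as follows; the frame and hop sizes are assumptions for illustration, since the text does not specify values:

```python
def frame_signal(samples, frame_len=400, hop_len=160):
    """Split a waveform (a list of samples) into overlapping frames with a
    moving window. At a 16 kHz sample rate, frame_len=400 and hop_len=160
    correspond to a common 25 ms window with a 10 ms hop; these values are
    assumptions, not figures taken from the patent."""
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames

# Each frame would then be reduced to an acoustic feature vector
# (e.g. MFCCs) before being fed to the recognition model.
samples = [0.0] * 1600              # 100 ms of silence at 16 kHz
print(len(frame_signal(samples)))   # 8 overlapping frames
```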
After framing, the voice state is extracted, i.e. acoustic features such as MFCCs (Mel-frequency cepstral coefficients) are computed, so that the extracted voice state can later be input into the target semantic recognition model to obtain the semantic information corresponding to the voice information. The voice state can be represented by splitting each basic syllable in the voice information into 3 states; a syllable represented in this way is called a tri-phone. A segment of speech can therefore be represented by a series of states, with every 3 states representing one syllable.
S202, inputting the voice state into a target semantic recognition model, wherein the target semantic recognition model is used for obtaining pinyin characteristics or pinyin characteristics and character characteristics according to the voice state to obtain semantic information corresponding to the voice information.
Continuing the above example, the extracted voice state is input into the target semantic recognition model. Referring to fig. 3, a schematic diagram of the target semantic recognition model in the semantic recognition method provided in the embodiment of the present invention: as shown in fig. 3, the target semantic recognition model may include a pinyin conversion branch and a matching branch, where the pinyin conversion branch obtains pinyin features and character features from the voice state, and the matching branch obtains the corresponding semantic information from the pinyin features.
In an alternative embodiment, the target semantic recognition model is established based on semantic rules: each production of the grammar is given a set of attributed rules, which represent the combination relationships among the components of a Chinese sentence and are called semantic rules. Using the semantic rules, sentences can be generalized for semantic understanding, and a label, i.e. a representation of the semantic information, is output to express that the sentence carries a certain class of semantics. The rules are generally grammars, for example: <ac_down> [please] turn the air conditioner (down|lower) {intent=airconditioner_down}. This grammar means that whenever the user speaks one of the four sentences it covers, such as "turn the air conditioner down", the semantic intent airconditioner_down is emitted. A grammar is expressed as a tree-shaped state-diagram data structure. Driven by the input characters, the state diagram moves along the arrow directions; when it can reach the endNode, the input sentence matches the semantic rule, and the semantic information airconditioner_down represented by the diagram is output at that moment.
Because existing ASR language models are not accurate enough, they can output errors, especially for homophones: for example, "turn the air conditioner down" may be recognized as "turn the air conditioner on". With the existing character-based rules, the state diagram then gets stuck at a node and cannot advance, so the sentence cannot be understood.
In this embodiment, however, when the target semantic recognition model based on ASR semantic rules outputs the text, it also outputs the corresponding pinyin feature, here ba3 kong1 tiao2 da3 di1 (including tone features). Starting from the existing character-based rules, rules rewritten in pinyin by the compiler can be adapted to the pinyin features to establish semantic rules; for example, <ac_down> is modified to: [qing] ba kong tiao (da|tiao) di {intent=airconditioner_down} (without regard to tone features).
Preferably, the tone features can also be considered, giving: [qing3] ba3 kong1 tiao2 (da3|tiao2) di1 {intent=airconditioner_down}.
Preferably, when the pinyin features of two or more characters are joined together, a space is inserted between them. The pinyin feature sequence ba3 kong1 tiao2 da3 di1 (combined with tone features) is then used in the matching branch to drive the state diagram forward; when it reaches the endNode, the match succeeds and the semantic information is output.
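The pinyin-rule matching described in this section can be sketched as follows. This is an illustrative reimplementation, not the patent's compiler: the rule table, the intent name, and the use of a regular expression in place of the tree-shaped state diagram are assumptions for demonstration.

```python
import re

# Hypothetical rule table mirroring the grammar form
# "[qing3] ba3 kong1 tiao2 (da3|tiao2) di1 {intent=airconditioner_down}":
# optional tokens become (?:...)? groups, alternatives become (?:a|b).
RULES = {
    "airconditioner_down": r"^(?:qing3 )?ba3 kong1 tiao2 (?:da3|tiao2) di1$",
}

def match_intent(pinyin_string):
    """Return the intent whose pinyin rule matches the feature string, or None.

    pinyin_string is the space-separated, tone-marked pinyin feature string
    produced by the conversion branch, e.g. "ba3 kong1 tiao2 da3 di1".
    """
    for intent, pattern in RULES.items():
        if re.match(pattern, pinyin_string):
            return intent
    return None

print(match_intent("ba3 kong1 tiao2 da3 di1"))          # airconditioner_down
print(match_intent("qing3 ba3 kong1 tiao2 tiao2 di1"))  # airconditioner_down
print(match_intent("da3 kai1 kong1 tiao2"))             # None
```

Because matching is done on tone-marked pinyin rather than characters, two homophonic character sequences produce the same feature string and hit the same rule, which is the robustness the text describes.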
The semantic-rule-based approach in this embodiment can thus solve the prior-art ASR problem of homophones that are written as different characters. In an alternative embodiment, the pinyin conversion branch may obtain the pinyin features directly; the technical principle and implementation process are similar to the above and are not repeated here.
In an alternative embodiment, a target semantic recognition model based on a neural network is established, and semantic information is output through it using pinyin features and labels, as listed in Table 1 below.
TABLE 1
[Table 1 is rendered as an image in the original publication.]
Specifically, referring to fig. 4, a schematic diagram of the target semantic recognition model in the semantic recognition method provided in the first embodiment of the present invention: as shown in fig. 4, the pinyin features are fed into models such as a CNN/DNN, and after multiple convolutions a predicted label is output. Comparing the distributions of the predicted label and the annotated label gives a distribution distance (the loss), which is used to update the CNN/DNN weights. After many iterations, the predicted distribution comes closest to the annotated one at the highest semantic-information probability, yielding the prediction probability p(label) for the label. The voice state is converted into the corresponding pinyin features by the pinyin conversion branch, and the CNN/DNN matching branch then obtains the corresponding semantic information from the pinyin features. In an alternative embodiment, the pinyin conversion branch may be established based on semantic rules; its implementation process and technical principles are as described in the examples above and are not repeated here.
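The final decision step, where the label with the highest probability is emitted only if p(label) reaches the threshold T, can be sketched as follows. This is a pure-Python stand-in: the scores, label names, and threshold value are illustrative assumptions, and in the described system the scores would come from the CNN/DNN.

```python
import math

def softmax(scores):
    """Turn raw model scores into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(scores, labels, threshold=0.5):
    """Emit the label with the highest p(label) only if it reaches the
    threshold T, mirroring the p(label) > T check described in the text;
    otherwise emit nothing (no semantic information)."""
    best_p, best_label = max(zip(softmax(scores), labels))
    return best_label if best_p >= threshold else None

print(predict([2.0, 0.1, -1.0], ["ac_down", "ac_up", "none"]))  # ac_down
print(predict([0.1, 0.0, -0.1], ["ac_down", "ac_up", "none"]))  # None
```

The second call returns None because no candidate clears the threshold, which corresponds to the model declining to output semantic information for an ambiguous input.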
In an optional embodiment, the target semantic recognition model in this embodiment is built on a pre-trained language model, an important achievement in natural language understanding developed by Google (BERT). Referring to fig. 5, a schematic diagram of the target semantic recognition model in the semantic recognition method provided by the second embodiment of the present invention: as shown in fig. 5, the voice state is converted into the corresponding pinyin features through the pinyin conversion branch; then, through the pre-trained BERT model and models such as a CNN/DNN, multiple iterations yield the predicted label closest to the distribution of the annotated label, together with its prediction probability p(label). If p(label) is greater than T (a preset threshold), the predicted label is confirmed and the corresponding semantic information is output. In other words, the voice state is converted into the corresponding pinyin features by the pinyin conversion branch, and the matching branch (BERT, CNN/DNN, and the like) obtains the corresponding semantic information from the pinyin features. In an alternative embodiment, the pinyin conversion branch may be established based on semantic rules; its implementation process and technical principles are as described in the examples above and are not repeated here.
In the embodiment, the voice state is converted into the pinyin characteristics, so that the error rate of homophone recognition is reduced, and the accuracy rate of semantic recognition is improved.
Fig. 6 is a flowchart of a semantic recognition method provided in the third embodiment of the present invention, and as shown in fig. 6, the semantic recognition method in this embodiment may include:
s301, acquiring voice information and extracting a voice state.
In this embodiment, voice information is collected as a continuous audio stream, commonly in compressed formats such as mp3 and wmv; it may be collected in real time or in advance. In an alternative embodiment, the audio is converted into an uncompressed waveform file for processing, such as a Windows PCM file, which consists of a file header and the sound waveform.
S302, inputting the voice state into a target semantic recognition model, wherein the target semantic recognition model obtains pinyin features, or pinyin features and character features, from the voice state so as to obtain the semantic information corresponding to the voice information.
Referring specifically to fig. 7, a schematic diagram of the target semantic recognition model in the semantic recognition method according to the third embodiment of the present invention: as shown in fig. 7, the collected voice information is input into the target semantic model, where it is recognized by an acoustic model and then passed to the language model to obtain the semantic information corresponding to the voice information. In an optional embodiment, the matching branch may also use a language model such as BERT or a CNN/DNN to obtain, through multiple iterations, the predicted label closest to the distribution of the annotated label, together with its prediction probability p(label); if p(label) is greater than T (a preset threshold), the predicted label is confirmed and the corresponding semantic information is output. In another optional embodiment, when the maximum p(label) reaches T (the preset threshold), the predicted label is confirmed and the corresponding semantic information is output.
In an optional embodiment, if a key feature corresponding to the voice information is detected while the voice state is being extracted (for example, ambient sound suggesting "buying things"), semantic recognition can be combined with that key feature (for example, the corresponding environment, a shopping mall, is recognized from the ambient sound, and semantic recognition is then performed in that context) to improve recognition accuracy. This embodiment can achieve domain-specific semantic extraction, improve semantic-understanding accuracy, and reduce semantic-understanding errors caused by mis-recognized homophones.
Fig. 8 is a flowchart of a semantic recognition method according to a fourth embodiment of the present invention, as shown in fig. 8, the semantic recognition method in this embodiment may add step S200 on the basis of fig. 2 before inputting the speech state into the target semantic recognition model, specifically,
s200: acquiring a training data set; inputting the training data set into an initial semantic recognition model, wherein the initial semantic recognition model comprises a pinyin conversion branch and a matching branch, the pinyin conversion branch is used for obtaining pinyin characteristics or pinyin characteristics and character characteristics according to a voice state, and the matching branch is used for obtaining corresponding semantic information according to the pinyin characteristics to obtain a target semantic recognition model.
In this embodiment, a large volume of text is crawled from the web and its pinyin features are derived. This yields not only the probability of individual Chinese characters and phrases for a given pinyin but, most importantly, the probabilities of semantic information for pinyin features of different lengths. In a prior-art semantic recognition method, the training data set may give, for example, P("open-empty" | the sound "da kai kong") = 2/3 and P("open-control" | the sound "da kai kong") = 1/3, while two longer candidates such as "open the air conditioner" and "open the ice[box]" may each have probability 1/2 under their shared sound. The character probability is obtained by counting the probabilities of different Chinese characters under the same sound. Similarly, the probabilities of different semantics appearing under the same pinyin feature are counted to obtain the semantic probability, so that the semantic information with the highest probability is output.
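The probability statistics described above can be sketched by counting how often each character string occurs under the same pinyin in a corpus. The corpus below is a toy stand-in for web-crawled text, and the English glosses stand in for the actual Chinese character strings.

```python
from collections import Counter, defaultdict

# Toy (pinyin feature string, recognized characters) pairs standing in for
# text crawled from the web; the entries are illustrative, not real data.
corpus = [
    ("da kai kong", "open-empty"),
    ("da kai kong", "open-empty"),
    ("da kai kong", "open-control"),
]

counts = defaultdict(Counter)
for pinyin, text in corpus:
    counts[pinyin][text] += 1

def prob(text, pinyin):
    """Estimate P(text | pinyin) by relative frequency in the corpus."""
    total = sum(counts[pinyin].values())
    return counts[pinyin][text] / total if total else 0.0

print(round(prob("open-empty", "da kai kong"), 3))   # 0.667, i.e. the 2/3 above
```

The same counting scheme extends to P(semantic label | pinyin feature string) by replacing the character strings with semantic labels.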
The training data set is then input into the initial semantic recognition model: corresponding pinyin features are obtained from the voice state through the pinyin conversion branch of the initial semantic recognition model, corresponding semantic information is obtained from the pinyin features through the matching branch, and training yields the target semantic recognition model.
In an alternative embodiment, obtaining the pinyin characteristics or the pinyin characteristics and the text characteristics according to the voice state includes:
sequentially obtaining character characteristics corresponding to each voice state according to a plurality of sequentially arranged voice states, and sequentially obtaining corresponding pinyin characteristics according to the character characteristics;
or obtaining corresponding character characteristics according to a plurality of sequentially arranged voice states, wherein the character characteristics comprise character characteristics corresponding to the first voice state, and sequentially obtaining corresponding pinyin characteristics from the character characteristics corresponding to the first voice state to the character characteristics at the front end and the rear end until obtaining the pinyin characteristics corresponding to all the character characteristics.
In this embodiment, the pinyin features are obtained from the voice state based on ASR semantic rules, expressed as a tree-diagram data structure in the target semantic recognition model. The arranged character features can be obtained sequentially according to the time order of the voice states, and each character feature is then converted into a pinyin feature. Taking the voice state for "please turn the air conditioner low" as an example, refer to fig. 9, which is a partial effect schematic diagram of the semantic recognition method provided by the fourth embodiment of the present invention: the voice states are traversed in the arrow direction, and when the end node (endNode) is reached, each voice state is sequentially converted into a pinyin feature.
Alternatively, to improve conversion speed and efficiency, in an optional embodiment corresponding character features may be obtained from the voice states, the character feature corresponding to a first voice state is selected from among them, and conversion starts from that character feature: pinyin features are obtained sequentially from the character feature of the first voice state outward toward the character features at the front end and the rear end, until the pinyin features corresponding to all character features are obtained. For example, starting from the character feature of the first voice state and proceeding in the arrow direction, pinyin features are obtained toward the character features at the front and rear ends until all character features have been converted. This embodiment does not limit which voice state is the first voice state or which character feature corresponds to it. Obtaining pinyin features from the voice state can also be based on Chinese character rules; for example, a dictionary is used to convert the voice state into character features, and the pinyin features are then obtained from the character features.
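The outward, front-and-rear expansion from the first voice state can be sketched as below. The character-to-pinyin dictionary and the choice of starting index are hypothetical stand-ins for the dictionary rules and the unrestricted first voice state mentioned above.

```python
def to_pinyin_from_first(chars, pinyin_dict, first):
    """Convert character features to pinyin features, starting from the
    character at index `first` and expanding alternately toward the
    front and rear ends until every character is converted."""
    n = len(chars)
    order = [first]
    left, right = first - 1, first + 1
    while left >= 0 or right < n:
        if right < n:               # advance toward the rear end
            order.append(right)
            right += 1
        if left >= 0:               # advance toward the front end
            order.append(left)
            left -= 1
    pinyin = [None] * n
    for i in order:                 # convert in expansion order
        pinyin[i] = pinyin_dict[chars[i]]
    return pinyin

# Hypothetical toy dictionary for the example utterance above.
d = {"把": "ba3", "空": "kong1", "调": "tiao2", "打": "da3", "低": "di1"}
features = to_pinyin_from_first(list("把空调打低"), d, first=2)
# → ["ba3", "kong1", "tiao2", "da3", "di1"]
```

The result is position-ordered regardless of the conversion order, so the choice of first voice state affects only speed, not output.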
In an alternative embodiment, the method further comprises:
and marking the corresponding tone characteristics for the pinyin characteristics, wherein the tone characteristics are used for obtaining corresponding semantic information by combining the pinyin characteristics.
Specifically, the corresponding tone features can be labeled during conversion of the voice state, and the corresponding semantic information is obtained by combining them with the pinyin features; the tone features thus help identify more accurate semantic information from the voice information. In combination with the above example, the pinyin features are labeled with the corresponding tone features "ba3 kong1 tiao2 da3 di1".
In an optional embodiment, further comprising:
space marks are arranged among the pinyin characteristics, and the pinyin characteristics are connected into a pinyin characteristic string.
In combination with the above example, the pinyin features may be connected into a pinyin feature string, with space identifiers set between the pinyin features to separate each feature, thereby avoiding confusion and improving recognition accuracy.
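A minimal sketch of the feature-string construction, using the tone-labeled example above (the function name is illustrative):

```python
def make_feature_string(pinyin_features):
    """Connect pinyin features into one feature string, with a space
    identifier between adjacent features so segmentations cannot be
    confused (e.g. "xi an" vs. "xian")."""
    return " ".join(pinyin_features)

s = make_feature_string(["ba3", "kong1", "tiao2", "da3", "di1"])
# → "ba3 kong1 tiao2 da3 di1"
```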
Wherein, obtain corresponding semantic information according to the spelling characteristic, include:
acquiring the highest semantic information probability corresponding to the pinyin feature string according to the pinyin feature string;
and if the highest semantic information probability is not less than the probability threshold, determining the semantic information corresponding to the pinyin features.
Specifically, the pinyin features are matched to corresponding semantic information through the matching branch in the initial semantic recognition model, based on semantic rules established over ASR. In an alternative implementation, the highest semantic information probability corresponding to the pinyin feature string is obtained from the pinyin feature string; for example, p(label | s1, s2, …) is output through the matching branch, and if max(p(label | s1, s2, …) × p(s1, s2, …)) is greater than a probability threshold T, the semantic information corresponding to the pinyin features is determined and the input voice information is determined to correspond to the semantic label. Here, label is the semantic information, s1, s2, … are voice states, and the probability threshold T is not limited in this embodiment. Each basic syllable in the voice information may be split into 3 such states, and a syllable represented in this way is called a triphone.
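The threshold decision can be sketched as follows. The label names and probability values are made up for illustration; `label_probs` stands in for the matching branch's output p(label | s1, s2, …).

```python
def decide_label(label_probs, threshold):
    """Pick the semantic label with the highest probability for a pinyin
    feature string; return it only if that probability is not less than
    the threshold T, otherwise report no match."""
    best = max(label_probs, key=label_probs.get)
    if label_probs[best] >= threshold:
        return best
    return None

# Hypothetical matching-branch output for "ba3 kong1 tiao2 da3 di1".
probs = {"ac_temp_down": 0.82, "ac_power_off": 0.11, "unknown": 0.07}
best = decide_label(probs, threshold=0.5)   # accepted: 0.82 >= 0.5
miss = decide_label(probs, threshold=0.9)   # rejected: 0.82 < 0.9
```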
In an alternative embodiment, the initial semantic recognition model is based on a neural network model, for example various models including CNN/DNN. The predicted label is output through the matching branch, the distribution distance (loss) is obtained by comparing the distributions of the predicted label and the labeled label, the weights of the CNN/DNN are modified through the loss, and through multiple iterations the distributions of the predicted label and the labeled label become closest, thereby determining the semantic information corresponding to the pinyin features.
In an optional embodiment, the initial semantic recognition model is established based on a pre-trained language model developed by Google, for example a pre-trained model such as BERT together with various models such as CNN/DNN. The predicted label is output through the matching branch of the model, the distribution distance (loss) is obtained by comparing the distributions of the predicted label and the labeled label, the weights of the CNN/DNN are modified through the loss, and multiple iterations make the distributions of the predicted label and the labeled label closest, thereby determining the semantic information corresponding to the pinyin features.
In an optional embodiment, the initial semantic recognition model may be established either without any pre-trained model, or based on a pre-trained language model developed by Google. Through the training data set, voice states are input into either initial semantic recognition model, the voice states are converted into corresponding pinyin features, and corresponding semantic information is further obtained from the pinyin features through the matching branch, so as to train the target semantic recognition model. For example, the target semantic recognition model can be obtained based on a pre-trained pinyin BERT model developed by Google.
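The iterative loop described in the embodiments above (predict a label, compare its distribution to the labeled label, adjust the weights through the loss) can be reduced to the following toy sketch. A one-layer logistic model stands in for the CNN/DNN or BERT matching branch; the feature vectors and labels are invented for illustration.

```python
import math

def train(samples, labels, epochs=200, lr=0.5):
    """Minimal sketch of the iterative training: predict, measure the
    cross-entropy distance to the labeled distribution, and adjust the
    weights until predicted and labeled labels are close."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability
            grad = p - y                     # gradient of cross-entropy loss
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b

def predict(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Toy binary task: does a (made-up) pinyin feature vector mean "turn down"?
X = [[1, 0], [1, 1], [0, 1], [0, 0]]
y = [1, 1, 0, 0]
w, b = train(X, y)
# After training, predictions track the labels: predict(w, b, [1, 0]) > 0.5
```

A real matching branch would replace the one-layer model with the CNN/DNN or pre-trained BERT weights, but the predict-compare-update cycle is the same.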
In an optional embodiment, after obtaining the semantic information corresponding to the voice information, the method further includes:
and displaying the semantic information.
For example, when semantic information corresponding to the voice information is obtained through the target semantic recognition model, the semantic information is displayed; for instance, the semantic information corresponding to the voice information "turn the air conditioner low" is displayed.
In the speech recognition process, Chinese characters may correspond to homophones; that is, one pinyin feature may correspond to a plurality of Chinese characters. This embodiment therefore converts the voice state into pinyin features, obtains the highest semantic information probability from the determined pinyin features, and, if the highest semantic information probability is not less than the probability threshold, determines the semantic information corresponding to the pinyin features. The method not only improves recognition accuracy and reduces the error rate of homophone recognition, but also has wide applicability and is suitable for control-type voice recognition scenarios such as automobiles and homes.
In an alternative embodiment, the method for semantic recognition in this embodiment may add step S300 (not shown) on the basis of fig. 6 before inputting the speech information into the target semantic recognition model, specifically,
S300: acquiring a training data set, and inputting the training data set into an initial semantic recognition model, wherein the initial semantic recognition model recognizes sound features of a large amount of pre-stored voice information and then obtains semantic information corresponding to the voice information through the sound features, so as to obtain the target semantic recognition model. In an alternative embodiment, the initial semantic recognition model may include a matching branch used to derive the corresponding semantic information from the sound features. In an optional embodiment, the matching branch may also use a language model such as BERT or CNN/DNN to obtain, through multiple iterations, the predicted label closest to the distribution of the labeled label, together with the corresponding predicted probability p(label); if p(label) is greater than a preset threshold T, the predicted label is determined and the corresponding semantic information is output. In an optional embodiment, when the maximum p(label) reaches the preset threshold T, the predicted label is determined and the corresponding semantic information is output.
In an optional embodiment, after obtaining the semantic information corresponding to the voice information, the method further includes: and displaying the semantic information.
The embodiment not only improves the recognition accuracy rate and reduces the error rate of homophone character recognition, but also has wide universality and is suitable for control type voice recognition scenes of automobiles, homes and the like.
Fig. 10 is a schematic structural diagram of a semantic recognition apparatus according to a fifth embodiment of the present invention, and as shown in fig. 10, the semantic recognition apparatus according to this embodiment may include:
an obtaining module 31, configured to obtain voice information and extract a voice state according to the voice information;
an identification module 32, configured to input the voice state into a target semantic recognition model, wherein the target semantic recognition model is used for obtaining pinyin characteristics or pinyin characteristics and character characteristics according to the voice state to obtain semantic information corresponding to the voice information.
In one possible design, before inputting the speech state into the target semantic recognition model, the method further includes:
acquiring a training data set;
inputting the training data set into an initial semantic recognition model, wherein the initial semantic recognition model comprises a pinyin conversion branch and a matching branch, the pinyin conversion branch is used for obtaining pinyin characteristics or pinyin characteristics and character characteristics according to the voice state, and the matching branch is used for obtaining corresponding semantic information according to the pinyin characteristics, to obtain a target semantic recognition model.
In one possible design, obtaining pinyin characteristics or pinyin characteristics and character characteristics according to the voice state includes:
sequentially obtaining character characteristics corresponding to each voice state according to a plurality of sequentially arranged voice states, and sequentially obtaining corresponding pinyin characteristics according to the character characteristics;
or obtaining corresponding character characteristics according to a plurality of sequentially arranged voice states, wherein the character characteristics comprise character characteristics corresponding to the first voice state, and sequentially obtaining corresponding pinyin characteristics from the character characteristics corresponding to the first voice state to the character characteristics at the front end and the rear end until obtaining the pinyin characteristics corresponding to all the character characteristics.
In one possible design, further comprising:
and marking the corresponding tone characteristics for the pinyin characteristics, wherein the tone characteristics are used for obtaining corresponding semantic information by combining the pinyin characteristics.
In one possible design, further comprising:
space marks are arranged among the pinyin characteristics, and the pinyin characteristics are connected into a pinyin characteristic string.
In one possible design, obtaining corresponding semantic information according to the pinyin features includes:
acquiring the highest semantic information probability corresponding to the pinyin feature string according to the pinyin feature string;
and if the highest semantic information probability is not less than the probability threshold, determining the semantic information corresponding to the pinyin features.
In one possible design, after obtaining semantic information corresponding to the speech information, the method further includes:
and displaying the semantic information.
The device for semantic recognition in this embodiment may execute the technical solutions in the methods shown in fig. 2 and fig. 8, and the specific implementation process and technical principle of the device refer to the related descriptions in the methods shown in fig. 2 and fig. 8, which are not described herein again.
Fig. 11 is a schematic structural diagram of a semantic recognition system according to a sixth embodiment of the present invention, and as shown in fig. 11, the semantic recognition system 40 according to this embodiment may include: a processor 41 and a memory 42.
A memory 42 for storing a computer program (e.g., an application program, a functional module, etc. implementing the above-described method of semantic recognition), computer instructions, etc.;
the computer programs, computer instructions, etc. described above may be stored in one or more memories 42 in partitions. And the above-mentioned computer program, computer instructions, data, etc. can be called by the processor 41.
A processor 41 for executing the computer program stored in the memory 42 to implement the steps of the method according to the above embodiments.
Reference may be made in particular to the description relating to the preceding method embodiment.
The processor 41 and the memory 42 may be separate structures or may be integrated structures integrated together. When the processor 41 and the memory 42 are separate structures, the memory 42 and the processor 41 may be coupled by a bus 43.
The server in this embodiment may execute the technical solutions in the methods shown in fig. 2 and fig. 8, and the specific implementation process and technical principle of the server refer to the relevant descriptions in the methods shown in fig. 2 and fig. 8, which are not described herein again.
In addition, embodiments of the present application further provide a computer-readable storage medium, in which computer-executable instructions are stored, and when at least one processor of the user equipment executes the computer-executable instructions, the user equipment performs the above-mentioned various possible methods.
Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. Additionally, the ASIC may reside in user equipment. Of course, the processor and the storage medium may also reside as discrete components in a communication device.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (12)

1. A method of semantic recognition, comprising:
acquiring voice information and extracting a voice state according to the voice information;
and inputting the voice state into a target semantic recognition model, wherein the target semantic recognition model is used for obtaining pinyin characteristics or pinyin characteristics and character characteristics according to the voice state to obtain semantic information corresponding to the voice information.
2. The method of claim 1, further comprising, prior to inputting the speech state into the target semantic recognition model:
acquiring a training data set;
inputting the training data set into an initial semantic recognition model, wherein the initial semantic recognition model comprises a pinyin conversion branch and a matching branch, the pinyin conversion branch is used for obtaining pinyin characteristics or pinyin characteristics and character characteristics according to the voice state, and the matching branch is used for obtaining corresponding semantic information according to the pinyin characteristics to obtain the target semantic recognition model.
3. The method of claim 2, wherein obtaining the pinyin feature, or the pinyin feature and the text feature, based on the voice state comprises:
sequentially obtaining character characteristics corresponding to each voice state according to a plurality of sequentially arranged voice states, and sequentially obtaining corresponding pinyin characteristics according to the character characteristics;
or obtaining corresponding character characteristics according to a plurality of sequentially arranged voice states, wherein the character characteristics comprise character characteristics corresponding to the first voice state, and sequentially obtaining corresponding pinyin characteristics from the character characteristics corresponding to the first voice state to the character characteristics at the front end and the rear end until obtaining pinyin characteristics corresponding to all the character characteristics.
4. The method of claim 3, further comprising:
and marking the corresponding tone characteristics for the pinyin characteristics, wherein the tone characteristics are used for obtaining corresponding semantic information by combining the pinyin characteristics.
5. The method of claim 3, further comprising:
space marks are arranged among a plurality of pinyin characteristics, and the pinyin characteristics are connected into a pinyin characteristic string.
6. The method of claim 5, wherein obtaining corresponding semantic information according to the pinyin features comprises:
acquiring the highest semantic information probability corresponding to the pinyin feature string according to the pinyin feature string;
and if the highest semantic information probability is not less than a probability threshold, determining semantic information corresponding to the pinyin features.
7. The method according to any one of claims 1-6, further comprising, after obtaining semantic information corresponding to the speech information:
and displaying the semantic information.
8. A method of semantic recognition, comprising:
acquiring voice information and extracting a voice state;
and inputting the voice state into a target semantic recognition model, wherein the target semantic recognition model is used for recognizing the voice state to obtain semantic information corresponding to the voice information.
9. An apparatus for semantic recognition, comprising:
the acquisition module is used for acquiring voice information and extracting a voice state according to the voice information;
and the recognition module is used for inputting the voice state into a target semantic recognition model, wherein the target semantic recognition model is used for obtaining pinyin characteristics or pinyin characteristics and character characteristics according to the voice state to obtain semantic information corresponding to the voice information.
10. An apparatus for semantic recognition, comprising:
the acquisition module is used for acquiring voice information and extracting a voice state;
and the recognition module is used for inputting the voice state into a target semantic recognition model, wherein the target semantic recognition model is used for recognizing the voice state to obtain semantic information corresponding to the voice information.
11. A system for semantic recognition, comprising: the device comprises a memory and a processor, wherein the memory stores executable instructions of the processor; wherein the processor is configured to perform the method of semantic recognition of any one of claims 1-7 via execution of the executable instructions.
12. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method of semantic recognition according to any one of claims 1 to 7.
CN201911421165.5A 2019-12-31 2019-12-31 Semantic recognition method, device and system Pending CN111192572A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911421165.5A CN111192572A (en) 2019-12-31 2019-12-31 Semantic recognition method, device and system


Publications (1)

Publication Number Publication Date
CN111192572A true CN111192572A (en) 2020-05-22

Family

ID=70709799

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911421165.5A Pending CN111192572A (en) 2019-12-31 2019-12-31 Semantic recognition method, device and system

Country Status (1)

Country Link
CN (1) CN111192572A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112017647A (en) * 2020-09-04 2020-12-01 北京蓦然认知科技有限公司 Semantic-combined speech recognition method, device and system
CN112185356A (en) * 2020-09-29 2021-01-05 北京百度网讯科技有限公司 Speech recognition method, speech recognition device, electronic device and storage medium
CN115148189A (en) * 2022-07-27 2022-10-04 中国第一汽车股份有限公司 Multifunctional synchronous implementation system and method for driver-defined voice command
US11862143B2 (en) 2020-07-27 2024-01-02 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for processing speech dialogues
CN112017647B (en) * 2020-09-04 2024-05-03 深圳海冰科技有限公司 Semantic-combined voice recognition method, device and system

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1499484A (en) * 2002-11-06 2004-05-26 北京天朗语音科技有限公司 Recognition system of Chinese continuous speech
CN103578467A (en) * 2013-10-18 2014-02-12 威盛电子股份有限公司 Acoustic model building method, voice recognition method and electronic device
CN103578464A (en) * 2013-10-18 2014-02-12 威盛电子股份有限公司 Language model establishing method, speech recognition method and electronic device
US8700404B1 (en) * 2005-08-27 2014-04-15 At&T Intellectual Property Ii, L.P. System and method for using semantic and syntactic graphs for utterance classification
CN107644642A (en) * 2017-09-20 2018-01-30 广东欧珀移动通信有限公司 Method for recognizing semantics, device, storage medium and electronic equipment
CN108446278A (en) * 2018-07-17 2018-08-24 弗徕威智能机器人科技(上海)有限公司 A kind of semantic understanding system and method based on natural language
CN108549637A (en) * 2018-04-19 2018-09-18 京东方科技集团股份有限公司 Method for recognizing semantics, device based on phonetic and interactive system
CN109192202A (en) * 2018-09-21 2019-01-11 平安科技(深圳)有限公司 Voice safety recognizing method, device, computer equipment and storage medium
CN109326285A (en) * 2018-10-23 2019-02-12 出门问问信息科技有限公司 Voice information processing method, device and non-transient computer readable storage medium
CN109410918A (en) * 2018-10-15 2019-03-01 百度在线网络技术(北京)有限公司 For obtaining the method and device of information
CN109545190A (en) * 2018-12-29 2019-03-29 联动优势科技有限公司 A kind of audio recognition method based on keyword
CN109976702A (en) * 2019-03-20 2019-07-05 青岛海信电器股份有限公司 A kind of audio recognition method, device and terminal
CN110008471A (en) * 2019-03-26 2019-07-12 北京博瑞彤芸文化传播股份有限公司 A kind of intelligent semantic matching process based on phonetic conversion
CN110060677A (en) * 2019-04-04 2019-07-26 平安科技(深圳)有限公司 Voice remote controller control method, device and computer readable storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200522